Modal Auto Endpoints: Optimized inference you own

Modal allows leading teams like Cognition, Decagon, Fathom, and DoorDash to own their inference without compromising on cost-performance or developer velocity.

Now you can do the same with a single command:

modal endpoint create --name agent --model zai-org/GLM-5.2-FP8

Introducing Modal Auto Endpoints: a smooth, self-serve on-ramp to production-grade LLM inference.

Take it for a spin right now, or read on to learn more about how we built it and why.

Built for the era of actually owning your inference

Proprietary model providers can silently degrade models or suddenly retract access. If you don't own your inference, you don't own your destiny.

If you work with open models served by an inference provider, you gain some control. But we think ownership runs deeper than the API. To actually own your inference, you need to own, understand, and optimize the code that runs the inference.

Managed inference providers make it easy to get an API, but the serving stack is a black box. So until now, teams that wanted proper ownership of their inference have had only one option: roll their own inference service. That gives you control, but now you own a lot more than just inference: engine tuning, endpoint benchmarking, container deployment, replica autoscaling and routing, and inference metrics.

That's why we built Modal Auto Endpoints, and why they look very different from what's offered by traditional inference providers.
