LiteLLM (YC W23): Founding Reliability Engineer – $200K-$270K and 0.5-1.0% equity

TLDR

LiteLLM is an open-source AI gateway (36K+ GitHub stars) that routes hundreds of millions of LLM API calls daily for companies like NASA, Adobe, Netflix, Stripe, and Nvidia. We're at $7M ARR, 10 people, YC W23.

When LiteLLM goes down, our customers' entire AI stack goes down. We need someone who makes sure that doesn't happen.

You'd be the first dedicated reliability hire. You'll own reliability, performance, and production stability end-to-end. Nobody will tell you how to do it.

What this job actually is

We'll be straight with you: this role is roughly 60% operational reliability and 40% deep performance engineering. In any given week you might be:

Hunting a memory leak in our async streaming handler that causes OOMs after 4 hours under load

Fixing a race condition where PodLockManager releases another pod's lock

Profiling why update_database() does 7 deep copies per request in the spend tracking hot path

Helping a Fortune 500 customer debug why their 20-pod deployment is exhausting Postgres connections
