Outsourcing plus LocalAI will soon become more economical vs Frontier labs
TL;DR: This essay tries to answer at what point it becomes more economical to hire an engineer in a lower-cost country and give them a DeepSeek/local-AI API key than to use frontier closed-source LLMs. It concludes that, at the very least, this dynamic puts a price ceiling on the frontier labs' offerings. We use DeepSeek as a proxy for local-AI costs.
We keep hearing that inference costs are supposed to be on a downward trajectory, but evidently they are not, at least not for the frontier US labs.
GPT-5.5 ($5/$30 per 1M input/output tokens), released less than 2 months after GPT-5.4, doubled API pricing across the board. GPT-5.5 costs 3x to 4x what GPT-5 cost 8 months ago ($1.25/$10): 4x on input and 3x on output.
Gemini 3.5 Flash ($1.50/$9.00) tripled API pricing across the board over its predecessor, Gemini-3-flash-preview ($0.50/$3.00), which was itself a price hike over its predecessor, 2.5 Flash ($0.30/$2.50).
Anthropic released Opus-4.7 with a new tokenizer that effectively increased the token consumption by 32% to 47% over its immediate predecessor Opus-4.6.
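As a quick sanity check on the multiples quoted above, here is a minimal Python sketch that recomputes them from the listed prices. The dictionary and the price_multiple helper are illustrative names, not part of any provider's API, and the prices are simply the ($ input / $ output per 1M tokens) pairs quoted in the text.

```python
# Prices in $ per 1M tokens as (input, output), copied from the text above.
PRICES = {
    "GPT-5": (1.25, 10.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini-3-flash-preview": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
}


def price_multiple(old: str, new: str) -> tuple[float, float]:
    """Return (input multiple, output multiple) of `new` relative to `old`."""
    old_in, old_out = PRICES[old]
    new_in, new_out = PRICES[new]
    return new_in / old_in, new_out / old_out


# Compare each model with the predecessor named in the text.
for old, new in [
    ("GPT-5", "GPT-5.5"),
    ("Gemini 2.5 Flash", "Gemini-3-flash-preview"),
    ("Gemini-3-flash-preview", "Gemini 3.5 Flash"),
]:
    in_x, out_x = price_multiple(old, new)
    print(f"{old} -> {new}: input {in_x:.2f}x, output {out_x:.2f}x")
```

The Opus-4.7 tokenizer change is not a list-price change, so it is left out here. If per-token prices stayed the same (an assumption), a 32% to 47% rise in token consumption would translate into a roughly proportional rise in the bill for the same work.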
How do the frontier OSS and closed-source models compare?
For this comparison, we used a ‘blend token consumption ratio’ that assumes that for every 1M input (plus cached) tokens, there are 50k output tokens (just under 5% of the total). If anything, this is a conservative estimate, since large agentic loops are dominated by reads because of their large number of turns.
Then we take caching into account for each provider (source: openrouter.ai) and compare the average blended price per million agentic tokens.
Provider | Input Price ($/1M) | Output Price ($/1M) | Cache Hit Rate
Anthropic | $1.57 | $25.00 | 79.6%
OpenAI | $1.30 | $30.22 | 84.8%
DeepSeek | $0.055 | $0.870 | 88.1%
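The blend described above can be sketched in a few lines of Python. This is a rough illustration under two assumptions that the text does not spell out: that the input prices in the table are already cache-adjusted effective prices, and that "per million agentic tokens" means dividing by total tokens (input plus output). The names blended_price and PROVIDERS are hypothetical.

```python
# 50k output tokens for every 1M input (plus cached) tokens, as stated above.
OUTPUT_PER_INPUT = 50_000 / 1_000_000

# (input $/1M, output $/1M) copied from the table above.
# Assumption: the input price already reflects each provider's cache hit rate.
PROVIDERS = {
    "Anthropic": (1.57, 25.00),
    "OpenAI": (1.30, 30.22),
    "DeepSeek": (0.055, 0.870),
}


def blended_price(input_price: float, output_price: float,
                  ratio: float = OUTPUT_PER_INPUT) -> float:
    """Average $ per 1M tokens for a workload of 1M input + ratio * 1M output."""
    cost = input_price + output_price * ratio  # dollars for the whole workload
    total_millions = 1.0 + ratio               # workload size in millions of tokens
    return cost / total_millions


for name, (inp, out) in PROVIDERS.items():
    print(f"{name}: ${blended_price(inp, out):.3f} per 1M blended tokens")
```

Dividing by input tokens only, instead of total tokens, would scale every figure by the same factor of 1.05, so the relative gap between providers would not change.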