
AI companies charge you 60% more based on your language, BPE tokens

Why This Matters

This article highlights how AI companies charge users based on tokens, which are not standardized across providers, leading to potentially significant and hidden cost disparities. Consumers and businesses may unknowingly pay up to 60% more depending on the provider and language used, especially for non-English inputs. The lack of standardization and transparency in tokenization practices raises concerns about fairness and cost predictability in AI services.


How AI Companies Are Charging You More Without You Even Realizing It

You pay for what you use. That's the deal. Except it's not.

When you use an AI model — GPT-4, Claude, Gemini — you do not pay per word. You pay per token. And that tiny technical detail can quietly cost you up to 60% more for the exact same request, depending on which company you choose.

60% — extra cost for non-English speakers
420× — price gap between the cheapest and priciest model
0 — standardization across providers
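The arithmetic behind that 60% figure is simple: even at an identical per-token price, a tokenizer that splits your text into more pieces bills you for more of them. A minimal sketch, with entirely hypothetical prices and token counts:

```python
# Hypothetical numbers: the same prompt, the same per-token price,
# two vendors whose tokenizers split the text differently.
price_per_token = 0.00001   # assumed identical per-token rate
tokens_vendor_a = 1000      # efficient tokenization of the prompt
tokens_vendor_b = 1600      # same prompt split into more, smaller tokens

cost_a = tokens_vendor_a * price_per_token
cost_b = tokens_vendor_b * price_per_token
overhead = cost_b / cost_a - 1

print(f"{overhead:.0%} more for identical content")  # prints "60% more for identical content"
```

The per-token rate never changed; only the count did. That is why two invoices for the same request can diverge so sharply.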

What Is a Token, Really?

Before we get to the money, a crash course. Tokens are not words. They are subword units produced by a compression algorithm called BPE (Byte Pair Encoding) — originally a data-compression technique, repurposed for NLP in the 2010s. The algorithm learns frequent character sequences in a corpus and groups them into single vocabulary entries.
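The learning loop described above can be sketched in a few lines: count every adjacent symbol pair in the corpus, merge the most frequent one into a new vocabulary entry, and repeat. This is a toy version of the classic BPE training procedure; the corpus and merge count below are made up for illustration:

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge rules from a toy corpus (a sketch of the classic algorithm)."""
    # Each word starts as a sequence of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Greedily merge the most frequent pair into a single new symbol.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Toy corpus: ('w', 'e') is the most frequent adjacent pair, so it merges first.
corpus = ["low", "low", "lower", "newest", "newest", "newest"]
print(learn_bpe_merges(corpus, 3))
```

Because the learned merges depend entirely on the training corpus, two companies training on different data end up with different vocabularies, which is exactly where the pricing gap comes from.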

The catch: every AI company trains its own tokenizer on its own corpus with its own vocabulary size. The result is that the same word gets sliced differently depending on who's counting:

OpenAI (tiktoken): "unbelievable" → un | believ | able = 3 tokens
Google (SentencePiece): "unbelievable" → ▁un | believable = 2 tokens
Anthropic (proprietary): "unbelievable" → un | be | liev | able = 4 tokens

Same word. Three different prices. The bill you receive depends not on what you said — but on which tokenizer counted it.
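You can see the mechanism directly by replaying different merge tables over the same word. The two merge lists below are invented for illustration (they are not the real OpenAI, Google, or Anthropic vocabularies), but they show how the same input produces different billable counts:

```python
def apply_merges(word, merges):
    """Tokenize a word by replaying a vendor's learned merge rules in order."""
    tokens = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)  # fuse the matched pair into one token
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Two made-up merge tables standing in for two vendors' tokenizers.
vendor_a = [("u", "n"), ("a", "b"), ("l", "e"), ("ab", "le")]
vendor_b = [("u", "n"), ("b", "e")]

print(apply_merges("unbelievable", vendor_a))  # 8 tokens
print(apply_merges("unbelievable", vendor_b))  # 10 tokens
```

Identical input, identical algorithm, different merge tables, different bill. Scale that gap across millions of requests and the disparity stops being a rounding error.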

The Dirty Secret — Tokens Are Not Standardized

... continue reading