Half a million 'Words with Spaces' missing from dictionaries

Words with Spaces

There are nearly half a million compound phrases that aren’t in any dictionary—simply because they contain spaces. “Boiling water.” “Saturday night.” “Help me.” I got interested in this because I make word games. I wanted to understand when to include words with spaces—and the legacy effects of traditional dictionaries on our sense of “what is a word.”

Here’s a slider. Look at expressions at different familiarity levels. Gold words are missing from traditional dictionaries:

[Interactive slider: slide the knob from Familiar to Obscure to see missing terms, with an option to exclude long terms (>11 chars). Legend: totally missing; in Wiktionary; in traditional dictionaries (MW, Oxford, or both).]

Crowd-sourced Wiktionary has 16 times more headwords than Merriam-Webster’s already hefty tabletop book. Yet even Wiktionary leaves gaps.

Focused on singletons

When dictionaries were planned and created, lexicographers focused on the building blocks of language—and overwhelmingly preferred individual words. Even the technical term is clinical: “multi-word expressions” (MWEs)—as if they’re a deviation from the norm.

[Chart: Coverage drops as you go deeper. Series: MW, +Wikt.]

Merriam-Webster (green) covers just 18% of the top 10,000 MWEs, dropping to 7% by 100K. Adding Wiktionary (yellow) brings coverage to 75%, but even that drops to 48%.
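A coverage figure like the ones above can be read as the share of the top N ranked expressions that appear as headwords in a given dictionary. The sketch below is only an illustration of that arithmetic, not the author's actual method: the function name `coverage`, the ranked list, and both headword sets are hypothetical toy data, not real dictionary contents.

```python
def coverage(ranked_phrases, headwords, top_n):
    """Return the fraction of the top_n ranked phrases found in headwords."""
    top = ranked_phrases[:top_n]
    if not top:
        return 0.0
    hits = sum(1 for phrase in top if phrase.lower() in headwords)
    return hits / len(top)


# Hypothetical toy data: phrases ordered from most to least familiar.
ranked = ["ice cream", "boiling water", "saturday night", "help me", "hot dog"]

# Hypothetical headword sets (illustrative only, not actual dictionary data).
mw = {"ice cream", "hot dog"}
wikt = {"ice cream", "hot dog", "saturday night"}

mw_cov = coverage(ranked, mw, 5)               # 2 of 5 phrases covered
combined_cov = coverage(ranked, mw | wikt, 5)  # 3 of 5 phrases covered

print(f"MW: {mw_cov:.0%}, MW + Wiktionary: {combined_cov:.0%}")
```

Running the same function with a larger `top_n` over a longer ranked list is what would show coverage falling as the list reaches less familiar expressions.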

A few selected, non-obvious expressions (called “opaque compounds”) would be included if they seemed interesting enough. And dictionary pages were not wasted on self-evident combinations. But what was lost?