Spell Checking a Year's Worth of Hacker News

The Spelling of Others

18th of February, 2026

We're using language models to automate the wrong things; rather than removing creativity and human agency, let's automate the boring stuff. Independently, it would be great if there was a way to use language models to be more kind to other people, instead of trying to directly derive economic value from them all the time. So I was wondering: What is something that I do that is (1) boring, (2) automatable, and (3) kind? One answer is spel checkng. When reading blogs, I sometimes find errata and mail them to the author. It's repetitive, automatable, and usually the author appreciates the gesture.

The problem decomposes into the following steps. First, we need inbound: perhaps a couple thousand pieces of writing that plausibly contain errors. Then we identify the errors and the author's email address using a language model. Finally, we send out our well-meaning spam.

As a source of blogs by potentially receptive but still error-prone authors, the Hacker News front page is ideal. It is also famously easy to query; we use the Algolia endpoint. As a first filter, we drop links to about 100 hand-picked common websites that are not blogs. The remaining links are then crawled and fed to a language model. The model's job is to classify whether the page belongs to a single person (we only want to help out individuals for now) and to list spelling errors with confidence scores. The latter is why modern language models are the enabling technology here; spellcheckers have been around forever, but only now can we specify what kinds of errors we have in mind and get meaningful probabilities of how sure the model is. When an error is found, another model is tasked with finding the email address, with a budget of two more hops on the author's website. Since we want to operate at scale and don't need that much intelligence, we use a small model, Haiku 4.5. We fetch the posts on today's front page as a pilot.
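The first two steps (querying the front page and dropping non-blog domains) can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the endpoint URL and its `tags=front_page` and `hitsPerPage` parameters are my assumption about the Algolia HN search API, and the blocklist is a tiny hypothetical stand-in for the hand-picked list of about 100 sites.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumed endpoint: Algolia's public HN search API, filtered to the front page.
FRONT_PAGE_URL = "https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=30"

# Hypothetical stand-in for the hand-picked list of non-blog websites.
NON_BLOG_DOMAINS = {"github.com", "youtube.com", "nytimes.com", "arxiv.org"}


def fetch_front_page(url=FRONT_PAGE_URL):
    """Return the list of story dicts ("hits") from the endpoint."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)["hits"]


def domain_of(url):
    """Lowercased host name without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_blocked(url, blocked=NON_BLOG_DOMAINS):
    """True if the host is a blocked domain or one of its subdomains."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in blocked)


def candidate_links(hits, blocked=NON_BLOG_DOMAINS):
    """Keep external links whose domain is not on the blocklist."""
    links = []
    for hit in hits:
        url = hit.get("url")  # text-only posts (e.g. Ask HN) carry no URL
        if url and not is_blocked(url, blocked):
            links.append(url)
    return links
```

Calling `candidate_links(fetch_front_page())` would give the list of pages to crawl. Keeping the filter separate from the network call makes it easy to test against saved responses.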

Curiously, even for a task as simple as spellchecking (in 2026!) it's hard to get the model to consistently output what was intended. British vs. American spelling, slang, creative neologisms, stylizations ("oh nooooo" vs. "oh no"), text encoding mishaps and more all lead to false positives. Even worse, once in a while the model fails for no reason in particular, flagging something completely correct. Luckily the confidence score plus some prompting affords enough maneuvering room to bring the false positive rate way down. We don't care much about false negatives, so I didn't spend much time debugging for those. However, even after many iterations, the system is not robust enough to skip manual review; I still need to have at least a quick look at every flagged error. As for fetching email addresses, there are also many edge cases to consider, but these are more from the realm of web crawling, which is beside the point here.
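To illustrate how a confidence score gives room to trade false negatives for fewer false positives, here is a small sketch of a post-filter. The `Flag` record, the 0.9 threshold, and the two sanity checks are all hypothetical choices for illustration; the actual output schema and cutoff used in the project are not shown in this post. Whatever survives still goes to manual review.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    """One flagged error (hypothetical schema for the model's output)."""
    wrong: str
    right: str
    context: str
    confidence: float  # model-reported, between 0.0 and 1.0


def keep_for_review(flags, threshold=0.9):
    """Drop low-confidence and obviously inconsistent flags.

    A high threshold accepts more false negatives in exchange for
    fewer false positives.
    """
    kept = []
    for flag in flags:
        if flag.confidence < threshold:
            continue  # model is not sure enough
        if flag.wrong == flag.right:
            continue  # "correction" changes nothing
        if flag.wrong not in flag.context:
            continue  # flagged word does not occur in the quoted context
        kept.append(flag)
    # Most confident first, so manual review starts with the likeliest errors.
    return sorted(kept, key=lambda flag: flag.confidence, reverse=True)
```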

The pilot yielded 3 out of 30 posts with addressable errors and available addresses. We are now ready to think about how we want the emails to read, which is arguably the most important part. Importantly, I decided not to hide behind the automation: I put my name to the emails and send them from my school account. I also want to make it as easy as possible for the site owner to correct the error, so the context of the error is included for easy searching.

Hi {name},

Lennart (blog) here, I was reading {article name} and noticed a spelling error:

{wrong} should probably be {right} (in "{context}")

All the best, keep up the good work
Lennart

And analogously with an HTML list for multiple errors. The wording is a bit embellished in terms of my investment in the article, but since I ultimately manually check every error, I felt comfortable writing it like this. Also, since every post made it at least to the second page of HN, and I generally would want to encourage anyone writing a blog to continue, it seems fine to indiscriminately call everybody's work good.
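For concreteness, the single-error and multi-error variants of the template could be rendered along these lines. This is a sketch under my own assumptions: the function names, the exact HTML structure, and the plural wording are hypothetical, and the blog link from the greeting is left out.

```python
from html import escape


def describe(error):
    """Format one (wrong, right, context) tuple, escaped for HTML."""
    wrong, right, context = (escape(part) for part in error)
    return f'{wrong} should probably be {right} (in "{context}")'


def render_email(name, article, errors, sender="Lennart"):
    """Return an HTML email body for one or more reviewed errors."""
    if not errors:
        raise ValueError("need at least one error to report")
    intro = f"{escape(sender)} here, I was reading {escape(article)} and noticed"
    if len(errors) == 1:
        finding = f"<p>{intro} a spelling error: {describe(errors[0])}</p>"
    else:
        # Multiple errors go into an HTML list, one item per error.
        items = "".join(f"<li>{describe(error)}</li>" for error in errors)
        finding = f"<p>{intro} some spelling errors:</p><ul>{items}</ul>"
    return (
        f"<p>Hi {escape(name)},</p>"
        f"{finding}"
        f"<p>All the best, keep up the good work<br>{escape(sender)}</p>"
    )
```

Escaping every interpolated value matters here because titles and contexts come from crawled pages and may contain characters such as `&` or `<`.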

Before sending the first mails, I stopped for a moment to reassess whether this was a good idea. What ultimately helped me decide was whether I would like to get the mail as an author. The answer was a definite yes; errors make a blog appear less professional, and when readers send me a version of the above email, I'm happy. So I ventured to send mail.
