We locate web content using special addresses called URLs. We are all familiar with addresses like https://google.com. Sometimes, URLs get long and become difficult to read. Thus, we might be tempted to format them in HTML using newline and tab characters, like so:
<a href="https://lemire.me/blog/2026/02/21/
	how-fast-do-browsers-correct-utf-16-strings/">my blog post</a>
It will work.
Let us refer to the WHATWG URL specification that browsers follow. It makes two statements in sequence.
If input contains any ASCII tab or newline, invalid-URL-unit validation error. Remove all ASCII tab or newline from input.
Notice how it reports an error if there is a tab or newline character, but continues anyway? The specification says that "a validation error does not mean that the parser terminates" and it encourages systems to report errors somewhere. Effectively, the error is ignored, although it might be logged. Thus our HTML is fine in practice.
The following is also fine:
<a href="https://go
ogle.c
om" class="button">Visit Google</a>
You can also use tabs. But you cannot arbitrarily insert any other whitespace.
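To make the rule concrete, here is a minimal Python sketch of just the preprocessing step quoted from the specification. It is not a full URL parser, and the function name `strip_tab_and_newline` is my own invention for illustration. It assumes that "ASCII tab or newline" means the three code points U+0009, U+000A and U+000D.

```python
from typing import Tuple

# Assumption: "ASCII tab or newline" covers U+0009 (tab), U+000A (LF), U+000D (CR).
ASCII_TAB_OR_NEWLINE = "\t\n\r"


def strip_tab_and_newline(url: str) -> Tuple[str, bool]:
    """Hypothetical helper: mimic the two statements from the specification.

    Returns the input with every ASCII tab or newline removed, together
    with a flag saying whether a validation error would have been reported.
    """
    # First statement: note the validation error, but do not stop.
    has_validation_error = any(ch in ASCII_TAB_OR_NEWLINE for ch in url)
    # Second statement: remove all ASCII tab or newline from the input.
    cleaned = url.translate({ord(ch): None for ch in ASCII_TAB_OR_NEWLINE})
    return cleaned, has_validation_error


# The URL from the first HTML example: a newline followed by a tab.
href = "https://lemire.me/blog/2026/02/21/\n\thow-fast-do-browsers-correct-utf-16-strings/"
cleaned, had_error = strip_tab_and_newline(href)
print(cleaned)
print(had_error)

# A plain space is not an ASCII tab or newline, so this step leaves it in place.
print(strip_tab_and_newline("https://go ogle.com"))
```

The first call prints the URL on a single line and reports that a validation error occurred. The last call returns its input unchanged because a space is not removed by this step, which is why you cannot freely insert other whitespace.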