Sanitizing HTML is the practice of taking a piece of HTML and removing some unwanted elements and attributes. Most often this is done to allow user-generated content with HTML but without causing XSS bugs. When imported from a library, a sanitizer typically looks like this:
    const clean = DOMPurify.sanitize(input);
    context.innerHTML = clean;

However, the API that we are building doesn't look like this at all. The core feature of the Sanitizer API is actually just Element.setHTML(input).
This blog post will explain why.
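For orientation, here is a rough sketch of how the new call might be used in application code. Only setHTML(input) comes from the text above; the helper name renderUntrusted and the plain-text fallback are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of any library): insert untrusted markup
// into `context`, using the Sanitizer API when the browser provides it.
function renderUntrusted(context, input) {
  if (typeof context.setHTML === "function") {
    // One call: the browser parses, sanitizes and inserts the input.
    context.setHTML(input);
    return true;
  }
  // Assumed fallback for browsers without setHTML(): show the input as
  // plain text, so no markup from `input` is ever parsed.
  context.textContent = input;
  return false;
}
```

The return value only reports which path was taken, so a caller could log or test it.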
To do so, we have to study the two lines of code from the DOMPurify example above. They result in the following steps:
1. Take an input string (and optionally a list of allowed elements as parameter).
2. Parse the input into an HTML fragment (no context element given).
3. Traverse the HTML fragment and remove elements as configured.
4. Serialize the remaining fragment into a string.
5. Parse the sanitized string (again), this time with context as context node into a fragment.
6. Insert the new fragment below context in the DOM tree.
Quick exercise for the reader: Can you spot where line 1 (DOMPurify.sanitize()) stops and line 2 (the innerHTML assignment) starts?

Solution: DOMPurify.sanitize() includes steps 1 through 4. The innerHTML assignment is steps 5 and 6.
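The six steps can be sketched in browser JavaScript roughly as follows. This is a simplified illustration and not DOMPurify's actual implementation: the function names sanitizeToString and insertSanitized and the allowedElements set are hypothetical, and the filter only looks at element names (attributes such as event handlers are left alone), so it must not be used as a real sanitizer.

```javascript
// Hypothetical stand-in for DOMPurify.sanitize(): covers steps 1 through 4.
function sanitizeToString(input, allowedElements) {
  // Step 2: parse the string with no knowledge of where it will be inserted.
  const doc = new DOMParser().parseFromString(input, "text/html");
  // Step 3: walk all parsed elements and drop those not on the allow-list.
  for (const el of doc.body.querySelectorAll("*")) {
    if (!allowedElements.has(el.localName)) {
      el.remove();
    }
  }
  // Step 4: serialize what is left back into a string.
  return doc.body.innerHTML;
}

// Hypothetical wrapper showing where the two lines from the example meet.
function insertSanitized(context, input, allowedElements) {
  const clean = sanitizeToString(input, allowedElements); // steps 1-4
  // Steps 5 and 6: the browser parses `clean` a second time, now with
  // `context` as the context node, and inserts the resulting fragment.
  context.innerHTML = clean;
}
```

In a browser, a call could look like insertSanitized(document.body, input, new Set(["b", "i", "p"])). Note that the string is parsed twice: once inside sanitizeToString() without a context, and once more by the innerHTML assignment.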
This is pretty similar to the Sanitizer that I wanted to build into the browser:
    const mySanitizer = new Sanitizer(/* config */);
    // XXX This never shipped.
    context.innerHTML = mySanitizer.sanitize(input);