
An NSFW filter for Marginalia search

Why This Matters

This article describes the development of a fast, CPU-efficient NSFW filter for Marginalia Search, addressing the challenge of balancing speed and accuracy in content moderation. By building a small neural network from scratch, the project shows the value of tailored solutions when a safety filter has to run inside a real-time search engine without compromising performance.

Key Takeaways

… optional, that is.

I’ve been working on an NSFW filter for Marginalia Search, since some people, primarily API consumers, have asked for one.

The search engine has had some domain-based filtering for a while, based on the UT1 lists, but that isn’t a very comprehensive approach.
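Domain-based filtering amounts to a set lookup on a result's hostname and its parent domains. The following is a minimal sketch assuming a UT1-style list with one domain per line. The helper names `load_blocklist` and `is_blocked_domain` and the example domains are hypothetical and are not Marginalia's code. The last call shows the limitation: any site missing from the list passes straight through.

```python
def load_blocklist(lines):
    """Build a set of lowercase domains; assumes one domain per line, '#' for comments."""
    domains = set()
    for line in lines:
        line = line.strip().lower()
        if line and not line.startswith("#"):
            domains.add(line)
    return domains


def is_blocked_domain(host, blocklist):
    """Return True if the host or any of its parent domains is on the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    # For "a.b.example.com" this checks "a.b.example.com", "b.example.com",
    # "example.com" and finally "com".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False


# Hypothetical list contents, for illustration only.
blocklist = load_blocklist(["example-adult.com", "# comment", ""])
print(is_blocked_domain("www.example-adult.com", blocklist))
print(is_blocked_domain("unlisted-adult-site.example.net", blocklist))
```

A lookup like this is cheap, but it can only ever be as complete as the list it is built from.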

We’ll end up with a single-hidden-layer neural network implemented from scratch, but many other approaches were tried before landing on it.

This is largely an abbreviated account of the way there.
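To give a rough picture of what a single-hidden-layer classifier implemented from scratch involves, here is a minimal Python sketch with one tanh hidden layer, a sigmoid output, and plain SGD on binary cross-entropy. The activation, the toy vocabulary `VOCAB`, the binary bag-of-words features, and the class name `OneHiddenLayerNet` are all assumptions for illustration. This excerpt does not describe the article's actual features or training setup.

```python
import math
import random

# Hypothetical toy vocabulary; a real filter would use a far larger feature set.
VOCAB = ["explicit", "adult", "recipe", "garden"]


def featurize(text):
    """Binary bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in VOCAB]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneHiddenLayerNet:
    """Input -> tanh hidden layer -> one sigmoid output (probability of NSFW)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-s, s) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr=0.5):
        """One SGD step on binary cross-entropy for a single example."""
        h, p = self.forward(x)
        dz = p - y  # gradient of the loss w.r.t. the output logit
        for j, hj in enumerate(h):
            # Gradient into hidden unit j, taken before w2[j] is updated.
            dh = dz * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz


# Hypothetical training examples (label 1.0 = NSFW).
examples = [
    ("explicit adult content", 1.0),
    ("adult explicit videos", 1.0),
    ("garden recipe ideas", 0.0),
    ("an easy recipe from the garden", 0.0),
]
net = OneHiddenLayerNet(n_in=len(VOCAB), n_hidden=8)
for _ in range(500):
    for text, label in examples:
        net.train_step(featurize(text), label)

print(net.forward(featurize("explicit adult"))[1])
print(net.forward(featurize("recipe for the garden"))[1])
```

Inference is just one dot product per hidden unit plus one for the output. That is why a model of this shape can be cheap enough to run on CPUs.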

There is a tension between speed and generality in classification.

Building something that is both fast and reasonably correct in its assessments is incredibly fiddly work, even if the solution itself is often pretty straightforward.

The main constraint on a filter that runs in a search engine is that it needs to be really fast and run well on CPUs.

This immediately disqualifies transformer-based models and other state-of-the-art approaches: capable as they are, they check neither of those boxes.
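One common way a small network stays fast on CPUs is to exploit sparse inputs. This is an assumption here, not something this excerpt attributes to Marginalia. With bag-of-words features most inputs are zero, so the hidden layer can be computed by summing only the weight columns of features that actually occur. The cost is then proportional to the number of distinct features present times the hidden size, rather than the full vocabulary size. The function names, the tanh activation, and the column-major layout are all illustrative assumptions.

```python
import math
import random


def dense_hidden(w1, b1, x):
    """Reference version: h_j = tanh(sum_i w1[j][i] * x[i] + b1[j]) over every input."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w1, b1)]


def sparse_hidden(w1_cols, b1, active):
    """Same result, touching only the weight columns of features that are present.

    w1_cols[i] holds the hidden-layer weights for input feature i (column-major),
    and active maps feature index -> value for the nonzero features only.
    """
    acc = list(b1)
    for i, value in active.items():
        col = w1_cols[i]
        for j in range(len(acc)):
            acc[j] += col[j] * value
    return [math.tanh(a) for a in acc]


rng = random.Random(42)
n_in, n_hidden = 20, 5
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
# Store weights column-major so each feature's weights are contiguous in memory.
w1_cols = [list(col) for col in zip(*w1)]

active = {3: 1.0, 7: 2.0, 15: 1.0}  # e.g. counts of the tokens that occur
x = [active.get(i, 0.0) for i in range(n_in)]

dense = dense_hidden(w1, b1, x)
sparse = sparse_hidden(w1_cols, b1, active)
print(all(abs(a - b) < 1e-9 for a, b in zip(dense, sparse)))
```

The two functions agree up to floating-point rounding. Only the sparse one scales with document length instead of vocabulary size.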

fastText
