Fast(er) regular expression engines in Ruby
Published on: 2025-07-27 06:00:39
Introduction
With modern, overengineered, and over-obfuscated websites, we at SerpApi face increasing challenges in extracting data from them. Besides the usual HTML parsing, sometimes we're literally forced to fall back to good ol' regular expressions, e.g. for extracting embedded JS data. And while regexps do the trick, they can come at a cost.
Onigmo, the default regexp engine in Ruby, was substantially updated in Ruby 3.2, but it still has weak points that can really hurt scan time, adding latency to our search requests.
Let's find out what alternatives are available in the wild and how they compare to Ruby.
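To make "scan time" concrete, here is a minimal sketch of how a single scan could be timed with Ruby's built-in engine. The payload, the pattern, and the `time_scan` helper are made up for illustration; they are not the benchmark used in this article.

```ruby
# Hypothetical payload: a page with JS data embedded in a <script> tag,
# repeated to give the engine a non-trivial amount of text to scan.
html = '<script>var data = {"id": 42, "title": "example"};</script>' * 1_000

# Illustrative pattern only (an assumption, not one of the article's patterns).
pattern = /"id":\s*(\d+)/

# Hypothetical helper: runs String#scan and returns the matches together with
# the elapsed time in seconds. A monotonic clock is used so that system clock
# adjustments don't skew the measurement.
def time_scan(text, pattern)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  matches = text.scan(pattern)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [matches, elapsed]
end

matches, elapsed = time_scan(html, pattern)
puts "#{matches.size} matches in #{(elapsed * 1000).round(3)} ms"
```

A single run like this is noisy; for a real comparison one would repeat the scan many times and look at the distribution rather than one number.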
Contenders
re2
It's developed by Google, and it's widely used in various Google products. Under the hood it uses what they call "an on-the-fly deterministic finite-state automaton algorithm based on Ken Thompson's Plan 9 grep". It is stated that re2 was designed with an explicit goal of being able to handle regular expressions from untrusted sources, i.e. to be
...