Fast(er) regular expression engines in Ruby

Published on: 2025-07-27 06:00:39

Introduction

With modern, overengineered, and over-obfuscated websites, we at SerpApi face increasing challenges extracting data from them. Besides the usual HTML parsing, sometimes we're literally forced to fall back to good ol' regular expressions, e.g. for extracting embedded JS data. And while regexps do the trick, they might come at a cost. Onigmo, the default regexp engine in Ruby, while substantially updated in Ruby 3.2, still has weak points that can noticeably increase scan time, adding latency to our search requests. Let's find out what alternatives are available in the wild and how they compare to Ruby.

Contenders

re2

It's developed by Google and widely used in various Google products. Under the hood it uses what they call "an on-the-fly deterministic finite-state automaton algorithm based on Ken Thompson's Plan 9 grep". It is stated that re2 was designed with an explicit goal of being able to handle regular expressions from untrusted sources, i.e. to be ...
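As a rough illustration of the embedded-JS extraction and scan-time concern described in the introduction, here is a minimal sketch using only Ruby's built-in engine and standard library. The page fragment, the `pageData` variable name, the pattern, and the `extract_page_data` helper are all hypothetical and chosen for illustration; they are not taken from SerpApi's code.

```ruby
require "benchmark"
require "json"

# Hypothetical helper: pull a JSON object assigned to a JS variable
# out of a <script> tag and parse it. Returns nil when nothing matches.
def extract_page_data(html)
  # Lazy match up to the first "};</script>"; the /m flag lets "." cross newlines.
  pattern = /var pageData = (\{.*?\});<\/script>/m
  match = pattern.match(html)
  match && JSON.parse(match[1])
end

# Hypothetical page fragment with data embedded in a script tag.
html = <<~PAGE
  <html><body>
  <script>var pageData = {"title":"Ruby regexps","results":[1,2,3]};</script>
  </body></html>
PAGE

data = extract_page_data(html)
puts data["title"]

# Time repeated scans with the default engine (Onigmo) to get a baseline.
elapsed = Benchmark.realtime { 1_000.times { extract_page_data(html) } }
puts format("1000 scans: %.4f s", elapsed)
```

On a fragment this small the timing is negligible; the same harness becomes more telling when pointed at real, multi-megabyte pages and at patterns that backtrack heavily, which is where an alternative engine would be compared against this baseline.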