I made a search engine worse than Elasticsearch (2024)
Published on: 2025-06-07 20:37:20
I want you to share in my shame at daring to make a search library. And in this shame, you too can experience the humility and understanding of what a real, honest-to-goodness, not-a-side-project search engine does to make lexical search fast.
BEIR is a set of Information Retrieval benchmarks, oriented around question-answer use cases.
My side project, SearchArray, adds full text search to Pandas. So naturally, to stand in awe at my amazing developer skills, I wanted to use BEIR to compare SearchArray to Elasticsearch (w/ the same query + tokenization). So I spent a Saturday integrating SearchArray into BEIR, and measuring its relevance and performance on the MSMarco Passage Retrieval corpus (8M docs).
… and 🥁
| Library             | Elasticsearch    | SearchArray        |
|---------------------|------------------|--------------------|
| NDCG@10             | 0.2275           | 0.225              |
| Search Throughput   | 90 QPS           | ~18 QPS            |
| Indexing Throughput | 10K Docs Per Sec | ~3.5K Docs Per Sec |
… Sad trombone 🎺
It’s worse in every dimension
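If NDCG@10 is new to you, here is a rough sketch of what that row in the table measures. This is not BEIR's evaluation code, just a minimal illustration under my own assumptions: the function names `dcg_at_k` and `ndcg_at_k` are made up for this example, relevance judgments are a plain dict of doc id to graded label, and gain is linear in the label (some definitions use `2**rel - 1` instead).

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k relevance labels.

    The result at 0-based position i is discounted by log2(i + 2),
    so the top result gets full credit (log2(2) == 1).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: doc ids in the order the search engine returned them.
    judgments: hypothetical dict of doc id -> graded relevance label;
               unjudged docs count as 0 (an assumption of this sketch).
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking puts the most relevant judged docs first.
    ideal_gains = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg


# One relevant doc, returned in second place instead of first.
score = ndcg_at_k(["d2", "d1", "d3"], {"d1": 1})
print(round(score, 3))  # 1 / log2(3), about 0.631
```

A benchmark score like the ones above is this per-query value averaged over every query in the test set, so two engines with the same BM25 math and tokenization should land close to each other.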
At least NDCG@10 is nearly right, so our BM25 calculation is correct (probably due to n