Full-Text Search with DuckDB

Published on: 4/29/2026

Overview

This is a follow-up to my first post about DuckDB: A Dab of DuckDB. If you’re new to DuckDB, you may want to start there.

The basic DuckDB workflow of making a data source quickly and easily discoverable is incredibly powerful … but there are limits. Some use cases, like searching the contents of historical publications or a tranche of emails, would be constrained by basic text queries. As mentioned in my first post, I'm interested in exploring some of the more powerful DuckDB features, and in this post I'll focus on full-text search (FTS). I have a decent amount of experience with other FTS solutions, like Elasticsearch and Postgres (both with the built-in options and with extensions like pgvector and pg_search). So, here is a quick tour of the current state of FTS in DuckDB.

An abbreviated FTS primer

A full FTS tutorial is outside the scope of this post. If you're interested in learning more, the Postgres docs are a worthwhile read.

FTS allows for more comprehensive and configurable queries than can be achieved with SQL operators like =, ilike or regexen. Results can also be ranked by relevance using a scoring algorithm, like Okapi BM25, which is what DuckDB offers.
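To make the ranking idea concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not DuckDB's implementation: the function name `bm25_scores`, the whitespace tokenizer, the `k1` and `b` values, and the particular IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with one common BM25 variant.

    Illustrative only: tokenization is a lowercase whitespace split, and
    k1 / b are commonly used values, not settings read from any engine.
    """
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1 if every document is empty.
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalized by length (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the duck swims in the pond",
    "a database for analytics",
    "duck duck goose",
]
scores = bm25_scores("duck", docs)
```

In this toy corpus the second document never mentions "duck" and scores zero, while the short third document, which repeats the term, outranks the longer first one. That is the behaviour a plain ilike filter cannot express: it can tell you whether a row matches, but not how well.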

Index options:

- Stemming: reduces words to a common root and handles some forms of inflection (walk, walks, walked, walking, etc.) but there are gaps for unconventional forms (e.g. mice and mouse)
- Stop words: removal of common "stop words" like "the", "and" and "of" whose presence may skew results
- Strip accents: normalize "á", "ä", and "a"
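The three options above can be sketched as a small text-normalization pipeline. This is a rough illustration rather than how DuckDB builds its index: `STOP_WORDS`, `SUFFIXES`, `naive_stem` and `index_terms` are hypothetical names, and the suffix-chopping stemmer is a toy stand-in for a real stemming algorithm.

```python
import unicodedata

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "of"}
# Naive suffixes, checked in this order. A real stemmer has far more rules.
SUFFIXES = ("ing", "ed", "s")


def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word):
    """Chop one known suffix, keeping at least three characters of root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Strip accents, lowercase, drop stop words, then stem what is left."""
    tokens = strip_accents(text).lower().split()
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


terms = index_terms("The walks and walked of walking")
```

Here "walks", "walked" and "walking" all collapse to the same root, and the stop words disappear entirely. The gap mentioned above also shows up: `naive_stem("mice")` and `naive_stem("mouse")` come back unchanged, so a search for one would not match the other. Punctuation handling is left out to keep the sketch short.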
