Full-Text Search with DuckDB

Why This Matters

This article covers full-text search (FTS) in DuckDB, a lightweight analytical database, and how it helps when searching large textual datasets. FTS features such as stemming, stop word removal, and relevance scoring make DuckDB more useful for data discovery and analysis, for developers and data analysts alike.

Published on: 4/29/2026

Overview

This is a follow-up to my first post about DuckDB: A Dab of DuckDB. If you’re new to DuckDB, you may want to start there.

The basic DuckDB workflow of making a data source quickly and easily discoverable is incredibly powerful … but there are limits. Some use cases, like searching the contents of historical publications or a tranche of emails, would be constrained by basic text queries. As mentioned in my first post, I’m interested in exploring some of the more powerful DuckDB features, and in this post I’ll be focused on full-text search (FTS). I have a decent amount of experience using other FTS solutions, like Elasticsearch and Postgres (both with the in-built options and extensions like pgvector and pg_search). So, I will take you through a quick tour of the current state of FTS in DuckDB.

An abbreviated FTS primer

A full FTS tutorial is outside the scope of this post. If you’re interested in learning more, the Postgres docs are a worthwhile read.

FTS allows for more comprehensive and configurable queries than can be achieved using SQL operators like =, ilike, or regexen. Results can also be ranked by relevance using a scoring algorithm; DuckDB offers Okapi BM25.
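To make the scoring idea concrete, here is a minimal sketch of Okapi BM25 in plain Python. It is not DuckDB's implementation: the function name `bm25_scores`, the whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Return one BM25 score per document (illustrative sketch only)."""
    if not docs:
        return []

    # Assumption: naive lowercase + whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Rare terms weigh more (non-negative IDF variant).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturates, and long documents are penalized.
            tf = term_freq[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["the duck walks", "a goose swims in the pond", "duck duck goose"]
scores = bm25_scores(docs, "duck")
```

Unlike an `ilike` filter, which only says whether a row matches, a score like this lets results be ordered: here the document that mentions "duck" twice ranks above the one that mentions it once, and the document without the term scores zero.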

Index options:

- Stemming: reduces words to a common root and handles some forms of inflection (walk, walks, walked, walking, etc.) but there are gaps for unconventional forms (e.g. mice and mouse)
- Stop words: removal of common "stop words" like "the", "and" and "of" whose presence may skew results
- Strip accents: normalize "á", "ä", and "a"
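The three index options above can be sketched as a small text-normalization pipeline. This is a toy illustration in plain Python, not what DuckDB does internally: the stop word set, the suffix list, and the helper names `strip_accents`, `naive_stem`, and `index_terms` are all hypothetical, and a real stemmer (such as Porter's) is considerably more careful than suffix chopping.

```python
import unicodedata

# Assumption: a tiny stop word list, just the examples from the text.
STOP_WORDS = {"the", "and", "of"}

# Assumption: crude suffix stripping standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")


def strip_accents(text):
    """Decompose characters, then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Strip accents, lowercase, drop stop words, then stem what remains."""
    words = strip_accents(text).lower().split()
    return [naive_stem(word) for word in words if word not in STOP_WORDS]


terms = index_terms("The walking of mice and the mouse")
```

With these toy rules, "walk", "walks", "walked", and "walking" all reduce to "walk", while "mice" and "mouse" stay distinct, which is the kind of gap the stemming bullet describes. Punctuation handling is deliberately left out of the sketch.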
