Show HN: DuckDB community extension for prefiltered HNSW using ACORN-1

Why This Matters

This extension adds ACORN-1 filtered HNSW search to DuckDB's vector similarity search. It addresses a key limitation of the upstream extension, where filters were applied only after the index search, so a selective filter could leave a query with too few rows. Evaluating the filter during the search lets filtered queries return a full result set with high recall, which matters for embedded-database workloads such as recommendation engines and data analytics.

Key Takeaways

This is a fork of duckdb/duckdb-vss that adds ACORN-1 filtered HNSW search. The upstream extension has a critical limitation: WHERE clauses are applied after the HNSW index returns results, so SELECT ... WHERE category = 'X' ORDER BY distance LIMIT 10 often returns fewer than 10 rows. This fork pushes filter predicates into the HNSW graph traversal using the ACORN-1 algorithm, ensuring filtered queries return the correct number of results with high recall.

What changed:

Filter predicates are evaluated during HNSW graph traversal, not after

ACORN-1 two-hop expansion through failed neighbors recovers graph connectivity under selective filtering

Selectivity-based strategy switching: >60% selectivity uses post-filter, 1-60% uses ACORN-1, <1% uses brute-force exact scan

Per-node expansion threshold (Lucene's 90% rule) skips two-hop when the neighborhood is already well-connected

Configurable thresholds: SET hnsw_acorn_threshold = 0.6 and SET hnsw_bruteforce_threshold = 0.01

Benchmark (228k movies, 768-dim Nomic embeddings):

Filter           Selectivity   Upstream   ACORN-1
English only     ~60%          ~10/10     10/10
Japanese only    ~3%           0-1/10     10/10
Korean only      ~1%           0/10       10/10
Rating >= 8.0    ~5%           0/10       10/10

Query: movies similar to The Matrix, filtered by language → returns Matrix Revolutions, Gunhed (ja), Savior of the Earth (ko).

See test/benchmark/movies_real_benchmark.sql for the full benchmark.
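The selectivity-based strategy switching listed above can be sketched as a small decision function. This is an illustrative Python sketch, not the extension's code: the names `Strategy` and `choose_strategy` are hypothetical, selectivity is taken to mean the fraction of rows that pass the filter (as in the benchmark table), and the handling of values exactly at a boundary is an assumption. How the extension actually estimates selectivity is not described in the source. The two default values mirror the documented hnsw_acorn_threshold and hnsw_bruteforce_threshold settings.

```python
from enum import Enum


class Strategy(Enum):
    POST_FILTER = "post-filter"   # search the index first, apply WHERE afterwards
    ACORN_1 = "acorn-1"           # evaluate the filter during graph traversal
    BRUTE_FORCE = "brute-force"   # exact scan over the rows that pass the filter


def choose_strategy(matching_rows, total_rows,
                    acorn_threshold=0.6, bruteforce_threshold=0.01):
    """Pick a search strategy from the fraction of rows that pass the filter.

    Hypothetical helper: boundary values (exactly 60% or exactly 1%) are
    assigned to ACORN-1 here, which is an assumption.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    selectivity = matching_rows / total_rows
    if selectivity > acorn_threshold:
        return Strategy.POST_FILTER
    if selectivity < bruteforce_threshold:
        return Strategy.BRUTE_FORCE
    return Strategy.ACORN_1


# Example: a filter matching 3% of rows falls in the ACORN-1 band.
print(choose_strategy(30, 1000).value)
```

The intuition behind the three bands: when most rows pass, post-filtering rarely discards enough results to matter; when very few rows pass, scanning them exactly is cheap and avoids traversing a graph that is almost entirely filtered out.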

Original README follows.

Vector Similarity Search for DuckDB

This is an experimental extension for DuckDB that adds indexing support to accelerate Vector Similarity Search using DuckDB's new fixed-size ARRAY type added in version v0.10.0. This extension is based on the usearch library and serves as a proof of concept for providing a custom index type, in this case an HNSW index, from within an extension and exposing it to DuckDB.

Filtered Search (ACORN-1)

This fork adds support for filtered vector search. Queries with WHERE clauses now push filter predicates into the HNSW index traversal.
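To make "filtering during traversal" concrete, here is a toy Python sketch of a best-first search over a single-layer neighbor graph, with two-hop expansion through neighbors that fail the filter. It is a simplified illustration under stated assumptions, not the fork's implementation (which lives in the extension and the usearch library): real HNSW is multi-layer, the function names `filtered_neighbors` and `filtered_search` are hypothetical, and the 90% rule is read here as "skip the two-hop step when at least 90% of a node's direct neighbors pass the filter".

```python
import heapq


def filtered_neighbors(graph, node, passes, two_hop=True, expand_threshold=0.9):
    """Neighbors of `node` that pass the filter. With two_hop enabled, also
    reach through failed neighbors to their passing neighbors."""
    direct = graph[node]
    matching = [n for n in direct if passes(n)]
    if not two_hop or not direct:
        return matching
    # Assumed reading of the 90% rule: a well-connected neighborhood
    # needs no extra expansion.
    if len(matching) / len(direct) >= expand_threshold:
        return matching
    seen = set(matching) | {node}
    result = list(matching)
    for failed in direct:
        if passes(failed):
            continue
        for hop2 in graph[failed]:
            if hop2 not in seen and passes(hop2):
                seen.add(hop2)
                result.append(hop2)
    return result


def filtered_search(graph, vectors, passes, query, entry, k, ef=16, two_hop=True):
    """Best-first search that only ever collects nodes passing the filter."""
    def dist(n):
        return sum((a - b) ** 2 for a, b in zip(vectors[n], query))

    visited = {entry}
    frontier = [(dist(entry), entry)]   # min-heap of nodes to expand
    results = []                        # max-heap (negated distance), size <= ef
    if passes(entry):
        results.append((-dist(entry), entry))
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) >= ef and d > -results[0][0]:
            break  # closest unexpanded node is worse than the worst result
        for nb in filtered_neighbors(graph, node, passes, two_hop):
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [n for _, n in sorted((-neg, n) for neg, n in results)][:k]


# Toy data: 8 points on a line, each linked to its immediate neighbors.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
vectors = {i: (float(i),) for i in range(8)}
is_even = lambda n: n % 2 == 0

print(filtered_search(graph, vectors, is_even, (4.2,), entry=0, k=2))
print(filtered_search(graph, vectors, is_even, (4.2,), entry=0, k=2, two_hop=False))
```

In this toy graph every even node is surrounded by odd nodes, so without the two-hop step the traversal cannot leave the entry point and the search returns only node 0. With two-hop expansion it steps across the failed neighbors and reaches the two nearest matching nodes, 4 and 6.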