# Subtleties of SQLite Indexes

Sep 29, 2025

In the last 6 months, Scour has gone from ingesting 330,000 pieces of content per month to over 1.4 million this month. The massive increase in the number of items slowed down the ranking for users' feeds and sent me looking for ways to speed it up again. After spending too many hours trying in vain to squeeze more performance out of my queries and indexes, I dug into how SQLite's query planner uses indexes, learned some of the subtleties that explained why my initial tweaks weren't working, and sped up one of my main queries by ~35%.

## Scour's `items` table

Scour is a personalized content feed that finds articles, blog posts, etc. related to users' interests. For better and for worse, Scour does its ranking on the fly whenever users load their feeds page. Initially, this took 100 milliseconds or less, thanks to binary vector embeddings and the fact that it uses SQLite, so there is no network latency when loading data.

The most important table in Scour's database is the `items` table. It includes an ID, URL, title, language, publish date (stored as a Unix timestamp), and a text quality rating.

Scour's main ranking query filters items based on when they were published, whether they are in a language the user understands, and whether they are above a certain quality threshold.

The question is: what indexes do we need to speed up this query?

## Don't bother with multiple single-column indexes

When I first set up Scour's database, I put a bunch of indexes on the `items` table without really thinking about whether they would help. For example, I had separate indexes on the published date, the language, and the quality rating. Useless.

It's more important to have one or a small handful of good composite indexes on multiple columns than to have separate indexes on each column. In most cases, the query planner won't bother merging the results from two indexes on the same table. Instead, it will use one of the indexes and then scan all of the rows that match the filter for that index's column.

It's worth being careful to only add indexes that will be used by real queries. Having additional indexes on each column won't hurt read performance, but each index takes up storage space, and more indexes slow down writes because all of the indexes need to be updated when new rows are inserted into the table.

If we're going to have an index on multiple columns, which columns should we include and what order should we put them in?

## Index column order matters

The order of conditions in a query doesn't matter, but the order of columns in an index very much does. Columns that come earlier in the index should be more "selective": they should help the database narrow the result set as much as possible.

In Scour's case, the most selective column is the publish date, followed by the quality rating, followed by the language. I put an index on those columns in that order:

```sql
CREATE INDEX idx_items_published_quality_lang ON items (
    published,
    low_quality_probability,
    lang
);
```

...and found that SQLite was only using one of the columns. Running this query:

```sql
EXPLAIN QUERY PLAN
SELECT id, low_quality_probability
FROM items
WHERE published BETWEEN $1 AND $2
  AND low_quality_probability <= $3
  AND lang IN (
    SELECT lang
    FROM user_languages
    WHERE user_id = $4
  )
```

Produced this query plan:

```
QUERY PLAN
|--SEARCH items USING COVERING INDEX idx_items_published_quality_lang (published>? AND published<?)
```
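
If you want to poke at this behavior yourself, here is a minimal sketch you can paste into the `sqlite3` shell. The table definitions are assumptions pieced together from the columns described above (Scour's real schema surely has more to it), and the literal values are hypothetical stand-ins for the `$1`–`$4` parameters; only the shape of the resulting plan matters:

```sql
-- Hypothetical, minimal version of the schema described in this post.
-- Column names and types are assumptions; only the columns referenced
-- by the ranking query really matter here.
CREATE TABLE items (
    id                      INTEGER PRIMARY KEY,
    url                     TEXT NOT NULL,
    title                   TEXT,
    lang                    TEXT,
    published               INTEGER, -- Unix timestamp
    low_quality_probability REAL
);

CREATE TABLE user_languages (
    user_id INTEGER NOT NULL,
    lang    TEXT NOT NULL
);

-- The composite index from the section above.
CREATE INDEX idx_items_published_quality_lang ON items (
    published,
    low_quality_probability,
    lang
);

-- Ask the planner how it would run the ranking query. Literal values
-- stand in for the $1..$4 parameters so this can be run directly;
-- EXPLAIN QUERY PLAN produces a plan even with no rows in the tables.
EXPLAIN QUERY PLAN
SELECT id, low_quality_probability
FROM items
WHERE published BETWEEN 1727222400 AND 1759104000
  AND low_quality_probability <= 0.5
  AND lang IN (
    SELECT lang FROM user_languages WHERE user_id = 1
  );
```

You should see a `SEARCH items USING COVERING INDEX ...` line much like the one above (plus a line or two for the `user_languages` subquery). Note that only the `published>? AND published<?` constraints appear next to the index: the range condition on the first column is the only part of the index the planner reports using to narrow the search.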