Feb 21 2026
Recently, I got nerd-sniped by this exchange between Jeff Dean and someone trying to query 3 billion vectors. I was curious to see if I could implement the optimal map-reduce solution he alludes to in his reply.
A vector is a list (array) of floating point numbers with n dimensions, where n is the length of the list. The reason you might perform vector search is to find words or items that are semantically similar to each other, a common pattern in search, recommendations, and generative retrieval applications like Cursor, which heavily leverage embeddings.
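As a toy illustration (the vectors and names here are made up for this example, not taken from any real model), the dot product gives a similarity score between two embeddings:

```python
import numpy as np

# Three tiny 4-dimensional "embeddings" (illustrative values only)
cat = np.array([0.9, 0.1, 0.3, 0.5])
kitten = np.array([0.8, 0.2, 0.4, 0.5])
car = np.array([0.1, 0.9, 0.8, 0.0])

# Dot product as a similarity score: larger means more similar
print(cat @ kitten)  # higher than cat @ car
print(cat @ car)
```

With real embeddings the idea is the same, just with hundreds of dimensions instead of four.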
I started by writing an extremely naive implementation which made the following assumptions:
We have 3 billion searchable (document) vectors and ~1k query vectors (a number I made up)
Both of the vector sets are stored on disk in .npy format (a simple format for storing numpy arrays)
We'd like to compare each of the query vectors against the larger pool of document vectors and return the resulting similarity (dot product) for each of the vector combinations
3k total reference vectors to start (to see if we could initially run this amount before scaling)
The vectors are of dimensionality (n) 768, a common size for many models that allow for similarity-based embedding queries
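A quick back-of-envelope calculation (my own, assuming float64 as numpy uses by default) shows why the full 3-billion-vector problem can't simply be loaded into RAM:

```python
# Sizes implied by the assumptions above, at 8 bytes per float64 element
dims = 768
bytes_per_float = 8

doc_vectors = 3_000_000_000
query_vectors = 1_000

doc_bytes = doc_vectors * dims * bytes_per_float
print(f"document vectors: {doc_bytes / 1e12:.1f} TB")  # ~18.4 TB

query_bytes = query_vectors * dims * bytes_per_float
print(f"query vectors: {query_bytes / 1e6:.1f} MB")    # ~6.1 MB

# The full result matrix of dot products is itself enormous:
result_bytes = doc_vectors * query_vectors * bytes_per_float
print(f"results: {result_bytes / 1e12:.1f} TB")        # ~24.0 TB
```

So the document vectors alone are tens of terabytes, which is why starting with 3k vectors is the sane first step.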
```python
import numpy as np
from loguru import logger
import time

# start with 3_000 vectors to keep things small
total_vectors_num = 3_000
query_vectors_num = 1_000

def generate_random_vectors(num_vectors: int) -> np.ndarray:
    logger.info(f"Generating {num_vectors} vectors...")
    rng = np.random.default_rng()
    vectors = rng.random((num_vectors, 768))
    return vectors

def get_dot_products(vectors_file: np.ndarray, query_vectors: np.ndarray) -> list[float]:
    total_products_computed = 0
    dot_products = []
    for v in vectors_file:
        for qv in query_vectors:
            dot_product = v @ qv
            dot_products.append(dot_product)
            total_products_computed += 1
            if total_products_computed % 100_000 == 0:
                logger.info(f"Total vectors processed: {total_products_computed}")
    return dot_products

# Generate initial vectors and query vectors and write to disk
doc_vectors = generate_random_vectors(total_vectors_num)
query_vectors = generate_random_vectors(query_vectors_num)
np.save('vectors.npy', doc_vectors)

# Load vectors from disk
logger.info("Loading file from disk...")
vectors_file = np.load('vectors.npy')

start_time = time.time()
logger.info("Getting dot products...")
results = get_dot_products(vectors_file, query_vectors)
end_time = time.time()

logger.info(f"Execution time: {end_time - start_time:.4f} seconds")
logger.info(f"Number of dot products computed: {len(results)}")
```
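As a sanity check (my own aside, not part of the original snippet), the nested loop above computes exactly the entries of a single matrix product, which numpy can evaluate in one call:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.random((5, 768))      # tiny stand-ins for the document vectors
queries = rng.random((3, 768))   # ...and the query vectors

# Nested-loop version, mirroring get_dot_products above
looped = [d @ q for d in docs for q in queries]

# The same values as one matrix multiply: matmul[i, j] = docs[i] @ queries[j]
matmul = docs @ queries.T

assert np.allclose(looped, matmul.ravel())
```

This equivalence is also why the pure-Python loop is so slow: each pairwise dot product is dispatched individually instead of handed to one optimized BLAS call.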