Hierarchical Navigable Small World (HNSW) in PHP
Or: How to find a needle in a haystack without checking all the hay
In the previous article we saw how vectors can help find the right information simply by describing the concept we want to find.
We used Cosine Similarity to compare our request with all available documents, scanning them one by one until we found those with the highest similarity. Does this approach work? Yes. Is it fast? Sort of....
Imagine if a librarian, to find your book, had to read the titles of all 10 million volumes in a National Library. Even if they spent one millisecond per book, it would take hours. This is the problem of linear search ($O(N)$).
To solve this problem, we introduce a concept that will transform that search from hours to milliseconds: HNSW (Hierarchical Navigable Small World).
Small worlds and highways
Let's use an analogy to better understand the solution HNSW proposes: How do we find a specific point in a huge city if we come from very far away and don't know the direction?
Simple: in real life, we use a hierarchy of roads. That is, we start from a "broad" view of the map and subsequently "zoom in" towards the point of interest. Expressing this with an algorithm, we would:
Take the highway to get close to the region. Exit onto the state road to reach the city. Take the main streets for the neighborhood. Finally, use the side streets to find the house number.
... continue reading