Using Vectorize to build an unreasonably good search engine in 160 lines of code

The tl;dr is that search suddenly got really good and really easy to build because of AI.

For instance, this is the search experience I recently made for my side project website Braggoscope.

Braggoscope is my unofficial directory of BBC Radio 4’s show In Our Time. There are over 1,000 episodes on all kinds of topics, like the Greek Myths or the Evolution of Teeth or Romeo & Juliet. It’s a static site built on GitHub Pages.

I can search for “Jupiter” and the episode about Jupiter comes back.

But check it out! I can also search for “the biggest planet” and the same episode is at the top of the list. Semantic search like this used to be hard to build, and now we get it for free by indexing our documents as embeddings and storing them in a vector database.

We’ll walk through building this search engine right now.

What are embeddings? What’s a vector database?

Embedding models are an adjacent technology to large language models.

Using an embedding model, you can convert any string of text (a word, a phrase, a paragraph, a document) into a vector. Think of an embedding vector as a coordinate in semantic space. Just as an x,y vector is a coordinate in 2D space, an embedding vector is a coordinate in a much larger space, usually about 1,000 dimensions. Texts with similar meanings end up close together in that space, which is what makes semantic search possible.
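To make the "coordinate in semantic space" idea concrete, here is a minimal sketch of comparing vectors with cosine similarity, one common way to measure closeness between embeddings. The three-dimensional vectors are made up by hand for illustration; they are not output from any real embedding model, and the function name is my own.

```typescript
// Toy 3-dimensional "embeddings", invented for illustration only.
// Real embedding vectors come from a model and have far more dimensions.
const jupiter = [0.9, 0.1, 0.0];
const biggestPlanet = [0.8, 0.2, 0.1];
const romeoAndJuliet = [0.0, 0.1, 0.9];

// Cosine similarity: close to 1 when two vectors point the same way,
// close to 0 when they are unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The two planet-ish vectors score much higher than the unrelated pair.
console.log(cosineSimilarity(jupiter, biggestPlanet));
console.log(cosineSimilarity(jupiter, romeoAndJuliet));
```

With these hand-picked numbers, “Jupiter” and “the biggest planet” sit almost on top of each other while “Romeo & Juliet” is off in a different direction, which is the property a real embedding model is trained to produce.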

Indexing a set of documents is a matter of converting them all into vectors, using an embedding model, and storing them in a vector database.
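The indexing step can be sketched in a few lines. This is a toy, under loud assumptions: `toyEmbed` is a placeholder that just counts a handful of hand-picked words (so it captures no real semantics), `ToyVectorIndex` is an in-memory array standing in for a vector database, and the sample document text is invented. None of these names come from a real library.

```typescript
type Vector = number[];

interface IndexedDoc {
  id: string;
  vector: Vector;
}

// Placeholder for a real embedding model (an assumption for this sketch):
// one dimension per hand-picked word, counting how often it appears.
const VOCABULARY = ["planet", "jupiter", "teeth", "myths", "love"];

function toyEmbed(text: string): Vector {
  const words = text.toLowerCase().split(/\W+/);
  return VOCABULARY.map((term) => words.filter((w) => w === term).length);
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // all-zero vectors match nothing
}

// In-memory stand-in for a vector database.
class ToyVectorIndex {
  private docs: IndexedDoc[] = [];

  // Indexing: convert the document to a vector and store it under its id.
  add(id: string, text: string): void {
    this.docs.push({ id, vector: toyEmbed(text) });
  }

  // Querying: embed the query the same way, then rank stored vectors.
  query(text: string, topK: number): string[] {
    const q = toyEmbed(text);
    return this.docs
      .map((doc) => ({ id: doc.id, score: cosine(q, doc.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((r) => r.id);
  }
}

const toyIndex = new ToyVectorIndex();
toyIndex.add("jupiter", "Jupiter is the largest planet in the solar system");
toyIndex.add("teeth", "The evolution of teeth in vertebrates");
toyIndex.add("greek-myths", "The Greek myths and their heroes");

const results = toyIndex.query("the biggest planet", 1);
console.log(results); // the top match is the "jupiter" document's id
```

The shape is the important part: documents and queries go through the same embedding function, and search is a nearest-neighbour lookup over the stored vectors. Swapping the placeholder for a real embedding model and a real vector database keeps that shape and adds the semantics.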
