An LLM Query Understanding Service
We should all be cheating at search with LLMs. Indeed, I'm teaching a whole course on this in July.
With an LLM we can implement in days what previously took months. We can take apart a query like "brown leather sofa" into the important dimensions of intent: "color: brown, material: leather, category: couches", and so on. With this power, all search becomes structured search.
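For instance, the structured intent we'd want extracted from that query might look something like this (illustrative only; the exact fields depend on your catalog):

```python
# Illustrative target output: the structured intent an LLM could extract
# from the free-text query "brown leather sofa".
parsed_query = {
    "color": "brown",
    "material": "leather",
    "category": "couches",
}
```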
Even better, we can do all of this without calling out to OpenAI/Gemini/…. We can use small LLMs running in our own infrastructure, making things faster and cheaper.
I’m going to show you how. Let’s get started. Follow along in this repo.
The service - wrapping an open-source LLM
We’ll start by deploying a FastAPI app that calls an LLM.
The code below is just a dummy "hello world" app talking to an LLM. We send a chat message over JSON, the LLM generates a response, and we send it back.
Here’s the basic service:
```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from llm_query_understand.llm import LargeLanguageModel
```
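From there, a minimal sketch of the rest of the service might look like the following (imports repeated so the block stands alone). This assumes `LargeLanguageModel` exposes a simple `generate()` method; the actual class and its interface live in the linked repo:

```python
from time import perf_counter

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from llm_query_understand.llm import LargeLanguageModel

app = FastAPI()
llm = LargeLanguageModel()  # load the model once at startup, not per request

@app.post("/chat")
async def chat(request: Request):
    # Expect a JSON body like {"msg": "hello"}
    body = await request.json()
    prompt = body.get("msg")
    start = perf_counter()
    # generate() is an assumed interface; check the repo for the real one
    response = llm.generate(prompt)
    return JSONResponse(content={
        "response": response,
        "generation_time": perf_counter() - start,
    })
```

Note that the model is instantiated at module import time, so the expensive model load happens once rather than on every request; the endpoint also reports generation time, which will matter later when we start worrying about latency.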