We recently used Qwen3-Embedding-0.6B to embed millions of text documents while sustaining near-100% GPU utilization the whole way.
That’s usually the gold standard that machine learning engineers aim for… but here’s the twist: in the time it took to write this blog post, we found a way to make the same workload 3× faster, and it didn’t involve maxing out GPU utilization at all. That story’s for another post, but first, here’s the recipe that got us to near-100%.
The workload
Here at the Daft kitchen, the same order keeps coming in: “One fast, painless pipeline to get my documents into a vector database for retrieval!”
Heard.
We whipped up a sample workload that:
1. Reads millions of text documents from S3
2. Chunks them into sentences using spaCy
3. Computes embeddings with the state-of-the-art model Qwen3-Embedding-0.6B
4. Writes the results to turbopuffer
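Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the actual pipeline: it uses spaCy's rule-based `sentencizer` (rather than the `en_core_web_sm` model installed below) so it runs without a model download, and it wraps the `sentence-transformers` embedding call in a function so the Qwen model only loads when invoked.

```python
# Sketch of the chunk-then-embed steps (assumptions noted above).
import spacy

# Lightweight sentence splitter; the post's pipeline uses en_core_web_sm instead.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def chunk_into_sentences(text: str) -> list[str]:
    """Split a document into sentence-level chunks."""
    return [sent.text.strip() for sent in nlp(text).sents]

def embed(sentences: list[str]):
    """Embed sentence chunks with Qwen3-Embedding-0.6B (downloads the model on first use)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
    return model.encode(sentences)  # one embedding vector per sentence

chunks = chunk_into_sentences("Daft reads documents. spaCy chunks them. Qwen embeds them.")
print(chunks)  # → ['Daft reads documents.', 'spaCy chunks them.', 'Qwen embeds them.']
```

In the real workload these functions run distributed over millions of documents; the embedding batch size and GPU placement are what the rest of the post is about.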
Mise en place
Before starting, let’s install the required dependencies:
```shell
pip install "daft[ray]" turbopuffer torch sentence-transformers spacy accelerate transformers
python -m spacy download en_core_web_sm
```