Latest Tech News

Stay updated with the latest in technology, AI, cybersecurity, and more


We Hit 100% GPU Utilization–and Then Made It 3× Faster by Not Using It

We recently used Qwen3-Embedding-0.6B to embed millions of text documents while sustaining near-100% GPU utilization the whole way. That's usually the gold standard machine learning engineers aim for… but here's the twist: in the time it took to write this blog post, we found a way to make the same workload 3× faster, and it didn't involve maxing out GPU utilization at all. That story is for another post. First, here's the recipe that got us to near-100%.

The workload

Here at the Daft …