
So, you want to chunk really fast?


we’ve been working on chonkie, a chunking library for RAG pipelines, and at some point we started benchmarking it on wikipedia-scale datasets.

that’s when things started feeling… slow.

not unbearably slow, but slow enough that we started wondering: what’s the theoretical limit here? how fast can text chunking actually get if we throw out all the abstractions and go straight to the metal?

this post is about that rabbit hole, and how we ended up building memchunk.

what even is chunking?

if you’re building anything with LLMs and retrieval, you’ve probably dealt with this: you have a massive pile of text, and you need to split it into smaller pieces that fit into embedding models or context windows.

the naive approach is to split every N characters. but that’s dumb — you end up cutting sentences in half, and your retrieval quality tanks.

the smart approach is to split at semantic boundaries: periods, newlines, question marks. stuff that actually indicates “this thought is complete.”

"Hello world. How are you?" → ["Hello world.", " How are you?"]

why delimiters are enough
