Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.
But a lot of folks are saying: “No, you can just use Server-Sent Events (SSE) with Last-Event-ID to get a durable stream, it’s easy”. And yes, all of this is do-able. But I contest that it’s easy. So let’s walk through how to do it, and you can decide for yourself.
Catch up on the previous article and discussion here: https://news.ycombinator.com/item?id=47832720
The advanced chatbot features I want to walk through are:
Resumable streams — refresh the page mid-response and get the in-progress tokens back, instead of waiting for the full response to land in the database.
Cancellations — stopping the LLM mid-response when the user changes their mind, even though the connection is now allowed to drop and reconnect.
Multi-device — open the same conversation on a second device or browser, and have it pick up the in-flight response and any new prompts in realtime.
Each of these is do-able on SSE. Whether they’re easy is what we’re going to find out.
Tokens vs. the API responses
Tokens are the individual pieces of text that LLMs generate, but the actual responses you get back from LLM providers have a bunch more stuff in them. The responses have slightly different structures and formats across providers, but pretty much all follow a similar pattern.