Stop using natural language interfaces

Natural language is a wonderful interface, but just because we suddenly can doesn't mean we always should.

Natural language is a wonderful interface, but just because we suddenly can doesn't mean we always should. LLM inference is slow and expensive, often taking tens of seconds to complete. Natural language interfaces have orders of magnitude more latency than normal graphic user interfaces. This doesn't mean we shouldn't use LLMs, it just means we need to be smart about how we build interfaces around them.

The Latency Problem

There's a classic CS diagram visualizing latency numbers for various compute operations: nanoseconds to lock a mutex, microseconds to reference memory, milliseconds to read 1 MB from disk. LLM inference usually takes 10s of seconds to complete. Streaming responses help compensate, but it's slow.

Compare interacting with an LLM over multiple turns to filling in a checklist, selecting items from a pulldown menu, setting a value on a slider bar, stepping through a series of such interactions as you fill out a multi-field dialogue. Graphic user interfaces are fast, with responses taking milliseconds, not seconds. But. But: they're not smart, they're not responsive, they don't shape themselves to the conversation with the full benefits of semantic understanding.

This is a post about how to provide the best of both worlds: the clean affordances of structured user interfaces with the flexibility of natural language. Every part of the above interface was generated on the fly by an LLM.

Popup-MCP

This is a post about a tool I made called popup-mcp (MCP is a standardized tool-use interface for LLMs). I built it about 6 months ago and have been experimenting with it as a core part of my LLM interaction modality ever since. It's a big part of what has made me so fond of them, from such an early stage. Popup provides a single tool that when invoked spawns a popup with an arbitrary collection of GUI elements.

You can find popup here, along with instructions on how to use it. It's a local MCP tool that uses stdio, which means the process needs to run on the same computer as your LLM client. Popup supports structured GUIs made up of elements including multiple choice checkboxes, drop downs, sliders, and text boxes. These let LLMs render popups like the following:

The popup tool supports conditional visibility to allow for context-specific followup questions. Some elements start hidden, only becoming visible when conditions like 'checkbox clicked', 'slider value > 7', or 'checkbox A clicked && slider B < 7 && slider C > 8' become true. This lets LLMs construct complex and nuanced structures capturing not just their next stage of the conversation but where they think the conversation might go from there. Think of these as being a bit like conditional dialogue trees in CRPGs like Baldur's Gate or interview trees as used in consulting. The previous dialog, for example, expands as follows:

... continue reading