We need to augment our command line tools and design APIs so they can be better used by LLM Agents. The designs are inadequate for LLMs as they are now – especially if you're constrained by the tiny context windows available with local models.
Agent APIs
Like many developers, I’ve been dipping my toes into LLM agents. I’ve done my fair share of vibe coding, but I’ve also been playing around with using LLMs to automate reverse engineering tasks, mostly using mrexodia’s IDA Pro MCP, including extending it.
Developing an MCP interface is an interesting process. You need to walk a line: return too much information and you fill the context window, return too little and the LLM needs extra tool calls to get what it wants. We have a few APIs that are better than others, like get_global_variable_at, which takes an address, identifies the type, and returns the best string representation of the value based on that type. However, the function can fail, so we provide a second set of accessor methods (data_read_dword, data_read_word, read_memory_bytes, etc.). These accessor methods are fine, but they ignore type information, so we don’t want the LLM to use them first.
To mitigate this problem, we added some guidance into the docstrings:
    @jsonrpc
    @idaread
    def data_read_byte(
        address: Annotated[str, "Address to get 1 byte value from"],
    ) -> int:
        """
        Read the 1 byte value at the specified address.

        Only use this function if `get_global_variable_at` failed.
        """
        ea = parse_address(address)
        return ida_bytes.get_wide_byte(ea)
This seems to have mostly worked, but these sorts of problems exist for all the APIs. We have the nice convenience function, we also have the gnarlier but more complete function, and we want the LLM to use the convenience one first.
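One way to picture the "convenience first, raw accessor second" ordering is to move it out of the docstring and into a single entry point. The sketch below is not how the IDA Pro MCP does it. read_value, fake_typed_read, and fake_raw_read are hypothetical names, and the two fake readers only stand in for get_global_variable_at and data_read_byte so the example runs without IDA.

```python
from typing import Callable


def read_value(
    address: str,
    typed_read: Callable[[str], str],
    raw_read: Callable[[str], int],
) -> dict:
    """Try the type-aware reader first; fall back to the raw accessor."""
    try:
        value = typed_read(address)
    except Exception as exc:  # the typed reader is allowed to fail
        # Tell the caller the value is untyped and why, in one short note.
        return {
            "address": address,
            "value": hex(raw_read(address)),
            "source": "raw",
            "note": f"typed read failed: {exc}",
        }
    return {"address": address, "value": value, "source": "typed"}


# Hypothetical stand-ins for the real IDA-backed functions.
def fake_typed_read(address: str) -> str:
    if address == "0x1000":
        return "int counter = 7"
    raise ValueError("no global variable at address")


def fake_raw_read(address: str) -> int:
    return 0x41


typed = read_value("0x1000", fake_typed_read, fake_raw_read)
raw = read_value("0x2000", fake_typed_read, fake_raw_read)
```

With one entry point the model makes a single tool call and still learns from the "source" field whether it got a typed value or raw bytes. The trade-off is that the raw accessors are no longer directly selectable, which may not suit every workflow.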
I like to work with offline LLMs, which have much smaller context windows, so having better APIs matters a lot.
These problems exist for command line tools too. If you watch Claude Code, you’ll see that it often uses head -n100 to limit the results a priori. It also loses track of which directory it’s in, and it will frustratingly flail around trying to run commands in different directories until it finds the right one.
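Both of those habits hint at what an agent-friendly command runner might return. Here is a minimal sketch, assuming a hypothetical run_tool wrapper (not part of Claude Code or any existing tool) that caps the output itself and always reports the directory the command ran in. The 100-line default just mirrors the head -n100 habit.

```python
import os
import subprocess
import sys


def run_tool(argv, cwd=None, max_lines=100):
    """Run a command and return a compact, context-friendly result."""
    workdir = os.path.abspath(cwd or os.getcwd())
    proc = subprocess.run(argv, cwd=workdir, capture_output=True, text=True)
    lines = proc.stdout.splitlines()
    return {
        "cwd": workdir,                       # where the command actually ran
        "exit_code": proc.returncode,
        "total_lines": len(lines),            # how much output there really was
        "truncated": len(lines) > max_lines,  # so the caller knows to ask for more
        "stdout": "\n".join(lines[:max_lines]),
    }


# Demo: a command that prints 250 lines, capped at 100.
result = run_tool([sys.executable, "-c", "for i in range(250): print(i)"])
```

Because the result carries "total_lines" and "truncated", the model does not have to guess a limit up front, and "cwd" gives it a fixed point to recover from when it loses track of where it is.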
To keep Claude Code in line on my project, I’ve relied heavily on linters, build scripts, formatters, and git commit hooks. It’s pretty easy to get Claude Code to commit often by including it in your CLAUDE.md, but it often likes to ignore other commands like “make sure the build doesn’t fail” and “fix any failing tests”. All my projects have a .git/hooks/pre-commit script that enforces project standards. The hook works really well to keep things in line.
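For illustration, a hook along these lines could look like the sketch below. It is not my actual hook. The `make build` and `make test` commands are placeholders for whatever a project really uses, and run_checks is a made-up helper name. Git aborts the commit whenever the pre-commit script exits with a non-zero status, which is what gives the hook its teeth.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Placeholder commands (assumptions): swap in the project's real build,
# lint, and test steps.
CHECKS = [
    ["make", "build"],
    ["make", "test"],
]


def run_checks(checks):
    """Run each check in order; return the first non-zero exit code, or 0."""
    for argv in checks:
        label = " ".join(argv)
        print(f"pre-commit: running {label}", file=sys.stderr)
        try:
            code = subprocess.run(argv).returncode
        except FileNotFoundError:
            print(f"pre-commit: command not found: {argv[0]}", file=sys.stderr)
            return 127
        if code != 0:
            print(f"pre-commit: {label} failed, commit aborted", file=sys.stderr)
            return code
    return 0


# In the installed hook (.git/hooks/pre-commit, marked executable),
# the last line would be: sys.exit(run_checks(CHECKS))
```

The failure message goes to stderr, so the agent sees exactly which step blocked the commit and cannot quietly skip it.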