How to setup a local coding agent on macOS

I'd had my internet fail a few times recently leaving me stranded without a coding agent, and so when I saw the "Gemma 4 now runs 2x faster with MTP" Multi-Token Prediction update for Gemma 4 I decided to have a go at getting it running.

I wanted a local coding agent setup that:

was fast enough to actually use on my Mac

worked through an OpenAI compatible API (so I could use it in other tools)

and preferably could handle screenshots/images when needed, so I can feed it screenshots of what it has made.

And I did! This video is realtime. And shows the agent responding at a perfectly usable speed.

After a bit of testing the final setup I ended up with is:

llama.cpp built with Metal on macOS

Gemma 4 26B-A4B in GGUF format

A Q8 MTP draft model for speculative decoding

... continue reading