Skip to content
Tech News
← Back to articles

How to setup a local coding agent on macOS

read original more articles
Why This Matters

This guide demonstrates how to set up a high-performance local coding agent on macOS, leveraging advanced models and hardware acceleration. Such setups are crucial for developers seeking faster, more reliable AI tools that work offline and integrate seamlessly with their workflows, enhancing productivity and privacy.

Key Takeaways

I'd had my internet fail a few times recently leaving me stranded without a coding agent, and so when I saw the "Gemma 4 now runs 2x faster with MTP" Multi-Token Prediction update for Gemma 4 I decided to have a go at getting it running.

I wanted a local coding agent setup that:

was fast enough to actually use on my Mac

worked through an OpenAI compatible API (so I could use it in other tools)

and preferably could handle screenshots/images when needed, so I can feed it screenshots of what it has made.

And I did! This video is realtime. And shows the agent responding at a perfectly usable speed.

After a bit of testing the final setup I ended up with is:

llama.cpp built with Metal on macOS

Gemma 4 26B-A4B in GGUF format

A Q8 MTP draft model for speculative decoding

... continue reading