Chat
What can I do with 128 GB of unified RAM?
Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.
What should I tune first?
Try --no-mmap so the whole model is read into RAM up front instead of memory-mapped (slower first load, but steadier inference afterward), and raise the context size to 64K tokens or more.
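A rough back-of-envelope sketch of why 128 GB leaves headroom for both a ~120B-parameter model and a long context. The layer count, KV-head count, and head dimension below are illustrative placeholders, not the real architecture of any named model:

```python
def model_bytes(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk/in-RAM size of the quantized weights."""
    return params_billions * 1e9 * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_element: int = 2) -> int:
    """KV cache: one K and one V tensor per layer, each
    n_kv_heads * head_dim wide, one entry per context token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_element

# 120B parameters at 4-bit quantization: about 60 GB of weights.
weights = model_bytes(120, 4)

# Hypothetical architecture: 36 layers, 8 KV heads, head_dim 64,
# 64K context, fp16 cache entries -> about 4.8 GB of KV cache.
kv = kv_cache_bytes(36, 8, 64, 65536)

print(f"weights ~{weights / 1e9:.0f} GB, KV cache ~{kv / 1e9:.1f} GB")
```

Even with generous overhead for activations and the OS, the sum stays well under 128 GB, which is what makes the large-context settings above practical on unified memory.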
AMD's Lemonade is a fast, open-source local LLM server that targets both GPU and NPU, enabling deployment of large models on high-memory systems like these. It gives developers and enterprises a way to run sophisticated models locally, with tools that can be tailored to specific needs without relying on cloud services.