Chat
What can I do with 128 GB of unified RAM?
Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.
What should I tune first?
Try --no-mmap so the whole model is read into RAM up front instead of memory-mapped (slower first load, but steadier inference afterward), and raise the context size to 64K tokens or more.
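A rough back-of-envelope sketch of why 128 GB leaves headroom for both a ~120B-parameter model and a long context. The layer count, KV-head count, and head dimension below are illustrative placeholders, not the real architecture of any named model:

```python
def model_bytes(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk/in-RAM size of the quantized weights."""
    return params_billions * 1e9 * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_element: int = 2) -> int:
    """KV cache: one K and one V tensor per layer, each
    n_kv_heads * head_dim wide, one entry per context token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_element

# 120B parameters at 4-bit quantization: about 60 GB of weights.
weights = model_bytes(120, 4)

# Hypothetical architecture: 36 layers, 8 KV heads, head_dim 64,
# 64K context, fp16 cache entries -> about 4.8 GB of KV cache.
kv = kv_cache_bytes(36, 8, 64, 65536)

print(f"weights ~{weights / 1e9:.0f} GB, KV cache ~{kv / 1e9:.1f} GB")
```

Even with generous overhead for activations and the OS, the sum stays well under 128 GB, which is what makes the large-context settings above practical on unified memory.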
AMD's Lemonade is a fast, open-source local LLM server that targets both GPU and NPU, enabling deployment of large models on high-memory systems like these. It gives developers and enterprises a way to run sophisticated models locally, with tools that can be tailored to specific needs without relying on cloud services.