
PewDiePie goes all-in on self-hosting AI using modded GPUs, with plans to build his own model soon — YouTuber pits multiple chatbots against each other to find the best answers


PewDiePie has built a custom web UI for self-hosting AI models called "ChatOS," which runs on his custom PC with 2x RTX 4000 Ada cards and 8x modded RTX 4090s with 48 GB of VRAM each. Running open-weight models from Alibaba, Meta, and OpenAI, PewDiePie made a "council" of bots that voted on the best responses, then built "The Swarm" for data collection that will become the foundation of his own model, coming next month.
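The video doesn't show how the council works under the hood, but the idea is simple enough to sketch: several locally hosted models each answer a question, then each model votes on which answer is best. Below is a minimal, hypothetical version in Python, assuming each model is served behind vLLM's OpenAI-compatible API; the ports and model names are placeholders, not Felix's actual setup.

```python
# Sketch of a "council" of local models: each one answers, then each one
# votes for the best answer. Assumes every model sits behind vLLM's
# OpenAI-compatible API; ports and model names below are placeholders.
from collections import Counter
from openai import OpenAI

MODELS = {
    "llama-70b": OpenAI(base_url="http://localhost:8001/v1", api_key="none"),
    "gpt-oss-120b": OpenAI(base_url="http://localhost:8002/v1", api_key="none"),
}

def ask(client: OpenAI, model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def council(question: str) -> str:
    # Step 1: every model answers independently.
    answers = {name: ask(c, name, question) for name, c in MODELS.items()}
    ballot = "\n\n".join(f"[{name}]\n{text}" for name, text in answers.items())
    # Step 2: every model reads all the answers and casts a vote.
    votes = Counter()
    for name, client in MODELS.items():
        vote = ask(client, name, (
            f"Question: {question}\n\nCandidate answers:\n{ballot}\n\n"
            "Reply with only the bracketed name of the best answer."
        ))
        votes[vote.strip().strip("[]")] += 1
    winner, _ = votes.most_common(1)[0]
    # Fall back to the first answer if the vote text didn't parse cleanly.
    return answers.get(winner, next(iter(answers.values())))
```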

Once the poster boy for gaming on YouTube, Felix has settled into a semi-retired life in Japan with his wife, Marzia. While he no longer uploads as frequently, and his content has shifted from exaggerated, reaction-channel-style videos to family vlogs, his love for computing has reemerged. Felix was never known to be particularly tech-savvy, but he's been on quite the arc as of late — de-Googling his life, building his first gaming PC, and learning how to write code. His latest act is one of decentralization: self-hosting AI models and, eventually, building his own.

(Video: "STOP. Using AI Right now" via PewDiePie on YouTube)

In a new YouTube video, Felix explained how his "mini data center" is helping fuel medical research. He's donating compute from his 10-GPU system to Folding@home so scientists can use it to run protein folding simulations, and he's created a team so others can contribute with their own computers as well. It's a noble cause, but PewDiePie wanted to venture into unknown territory and explore the other obvious thing you can do with a lot of GPUs — running AI.

Felix's computer has 2x RTX 4000 Ada cards along with 8x modded RTX 4090s with 48 GB of VRAM each, bringing his total memory pool to roughly 256 GB, which is enough to run many of the largest models today. That's exactly what he did, starting out with Meta's LLaMA 70B, then jumping to OpenAI's GPT-OSS-120B, which he said ran surprisingly well and felt “just like ChatGPT but much faster.” This is also where he first showed off ChatOS, the web UI he custom-built to interact with models served through vLLM.
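vLLM is a widely used inference engine, and loading a large model across several GPUs takes only a few lines of its Python API. The snippet below is an illustrative sketch, not ChatOS itself; the model ID and parallelism degree are assumptions based on the hardware described.

```python
# Rough sketch of running a large open-weight model with vLLM,
# sharded across multiple GPUs via tensor parallelism.
from vllm import LLM, SamplingParams

# tensor_parallel_size splits each layer's weights across 8 GPUs;
# the model ID is illustrative, not necessarily what Felix used.
llm = LLM(model="openai/gpt-oss-120b", tensor_parallel_size=8)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain protein folding simply."], params)
print(outputs[0].outputs[0].text)
```

In practice, a front end like ChatOS would more likely talk to vLLM's OpenAI-compatible server (started with `vllm serve`) over HTTP, keeping the UI decoupled from the inference engine.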

(Image credit: PewDiePie on YouTube)

To truly “max out,” he tried Qwen3-235B, one of Alibaba’s newer models, which typically requires over 300 GB of VRAM at full precision. Felix managed to get it running by using quantization, which reduces the bit precision of the model’s weights layer by layer, compressing it with only a minor hit to output quality. This lets him handle context windows of up to 100,000 tokens — essentially the length of a textbook — something very rare for locally run LLMs.
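In vLLM terms, running a pre-quantized checkpoint with a long context window comes down to a couple of constructor arguments. This is a hedged sketch; the video doesn't specify the exact checkpoint or quantization scheme, so the AWQ model ID below is hypothetical.

```python
# Illustrative: load a pre-quantized (AWQ) checkpoint with a 100k-token
# context window, spread across 8 GPUs. The model ID is an assumption.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-AWQ",  # hypothetical quantized checkpoint
    quantization="awq",                # 4-bit weights instead of FP16
    tensor_parallel_size=8,
    max_model_len=100_000,             # long-context window from the video
)
# llm.generate(...) then works the same as with any other checkpoint.
```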

This is where Felix jokingly said the model had too much power: it wrote code in front of him so fast that it made him feel insecure about learning programming himself. But he turned that dread around and put it to use for his own plans. “The machine is making the machine,” quipped Pewds, since he was now asking it for code to add extra features to ChatOS.

(Image credit: PewDiePie on YouTube)

Felix demoed his web UI, adding search, audio, RAG, and memory to Qwen. As soon as the model gained access to the internet, its answers became predictably more accurate. He added RAG (retrieval-augmented generation), which lets the model pull relevant information into its context and ground its answers in it; combined with search, this enables a deep-research workflow where the AI looks up one thing and then branches out to find related info, mimicking how a human might use Google. But this wasn't the coolest part of his AI; that award goes to memory.
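The video doesn't detail ChatOS's RAG internals, but the core loop is standard: embed a pile of documents, retrieve the chunks most similar to the user's question, and prepend them to the prompt. Here's a bare-bones sketch, assuming the sentence-transformers library for embeddings (any embedding model would do):

```python
# Bare-bones retrieval-augmented generation: embed docs, retrieve the
# most relevant ones for a query, and stuff them into the model's prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model

docs = [
    "vLLM serves LLMs with paged attention for high throughput.",
    "Folding@home runs distributed protein folding simulations.",
    "AWQ quantization stores weights in 4 bits to shrink VRAM use.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity reduces to a dot product on normalized vectors.
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q)[::-1][:k]
    return [docs[i] for i in top]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Use this context to answer:\n{context}\n\nQuestion: {query}"

# The assembled prompt would then be sent to the locally served model.
print(build_prompt("Why does quantization save memory?"))
```

A deep-research mode layers a loop on top of this, feeding retrieved results back in as new search queries before composing a final answer.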
