Jamesob's guide to running SOTA LLMs locally

2026-07-03

Why This Matters

This guide highlights how consumers and industry professionals can run state-of-the-art large language models (LLMs) locally, emphasizing cost-effective hardware configurations and optimized setups. It underscores the growing accessibility of advanced AI models outside cloud environments, empowering users with greater control, privacy, and customization.

Key Takeaways

Custom hardware setups can reduce costs and improve performance for running SOTA LLMs locally.
PCIe4 switches enable faster GPU communication, enhancing model training and inference efficiency.
Ready-to-run Docker configurations simplify deploying advanced speech-to-text and language models on local machines.

jamesob's guide to running SOTA LLMs locally

Note: nothing in this README aside from the tables was written by AI.

Have $2k burning a hole in your pocket and want some local, state-of-the-art machine intelligence? How about $40k?

If Dario and Altman are giving you heartburn (they should be), read on to figure out how to run this new kind of computing locally.

In this repo you'll find:

the hardware I use to run SOTA locally, why I bought what I did, and little-known SECRETS for configuring it,

how I run speech-to-text (STT) locally,

ready-to-run Docker container configuration for models I think are good.

Contents

My setup
