Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA
(news.ycombinator.com)
1.
2.
What's in a GGUF, besides the weights – and what's still missing?
(news.ycombinator.com)
3.
Hugging Face Packages Weaponized With a Single File Tweak
(darkreading.com)
4.
Show HN: TRiP – a complete transformer engine in C built from scratch just by me
(news.ycombinator.com)
5.
TurboQuant KV Compression and SSD Expert Streaming for M5 Pro and iOS
(news.ycombinator.com)
6.
Unsloth Studio
(news.ycombinator.com)
7.
Unsloth Dynamic 2.0 GGUFs
(news.ycombinator.com)