1.
Steering interpretable language models with concept algebra
(news.ycombinator.com)
2.
Show HN: Steerling-8B, a language model that can explain any token it generates
(news.ycombinator.com)
3.
Guide Labs debuts a new kind of interpretable LLM
(techcrunch.com)