1. Monitoring LLM behavior: Drift, retries, and refusal patterns (venturebeat.com)
2. General scales unlock AI evaluation with explanatory and predictive power (feeds.nature.com)
3. Ask HN: How are people doing AI evals these days? (news.ycombinator.com)