Monitoring LLM behavior: Drift, retries, and refusal patterns
(venturebeat.com)
General scales unlock AI evaluation with explanatory and predictive power
(feeds.nature.com)