1.
2.
A Man Who Reads Books for a Living (One Every Two Days)
(news.ycombinator.com)
3.
I’ve Hired Hundreds of People — Here’s the Trait I Look For Before Anything Else
(feeds.feedburner.com)
4.
Even (very) noisy LLM evaluators are useful for improving AI agents
(news.ycombinator.com)
5.
The worst job interview I ever had
(news.ycombinator.com)
6.
7.
Charity – Categorical programming language (1998)
(news.ycombinator.com)
8.
Monitoring LLM behavior: Drift, retries, and refusal patterns
(venturebeat.com)
9.
10.
Evaluating large language models for accuracy incentivizes hallucinations
(feeds.nature.com)
11.
Duolingo was evaluating its workers’ AI use. Workers pushed back.
(feeds.feedburner.com)
12.
A Digital Compute-in-Memory Architecture for NFA Evaluation
(news.ycombinator.com)
13.
Smart people recognize each other – science proves it
(news.ycombinator.com)
14.
General scales unlock AI evaluation with explanatory and predictive power
(feeds.nature.com)
15.
16.
Show HN: Claude skill that evaluates B2B vendors by talking to their AI agents
(news.ycombinator.com)
17.
Our whole way of thinking about leadership is a century out of date
(feeds.feedburner.com)
18.
19.
Gemini 3.1 Pro
(news.ycombinator.com)
20.
C++26: std::is_within_lifetime
(news.ycombinator.com)
21.
Don't Trust the Salt: AI Summarization, Multilingual Safety, and LLM Guardrails
(news.ycombinator.com)
22.
Enterprises are measuring the wrong part of RAG
(venturebeat.com)
23.
Claude Code daily benchmarks for degradation tracking
(news.ycombinator.com)
24.
Claude Code Daily Benchmarks for Degradation Tracking
(news.ycombinator.com)
25.
Counterfactual evaluation for recommendation systems
(news.ycombinator.com)
26.
27.
28.
OpenEvolve: Teaching LLMs to Discover Algorithms Through Evolution
(news.ycombinator.com)
29.
Opinion | The High Cost of Learning
(feeds.content.dowjones.io)
30.
Saturn (YC S24) Is Hiring Senior AI Engineer
(news.ycombinator.com)