Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs
(news.ycombinator.com)
1.
6.
Is fun at work overrated?
(feeds.feedburner.com)
7.
I Won a Championship That Doesn't Exist
(news.ycombinator.com)
8.
9.
Monitoring LLM behavior: Drift, retries, and refusal patterns
(venturebeat.com)
10.
11.
Evaluating large language models for accuracy incentivizes hallucinations
(feeds.nature.com)
12.
13.
Duolingo was evaluating its workers’ AI use. Workers pushed back.
(feeds.feedburner.com)
14.
Show HN: Continual Learning with .md
(news.ycombinator.com)
16.
Wit, unker, Git: The lost medieval pronouns of English intimacy
(news.ycombinator.com)
17.
A Digital Compute-in-Memory Architecture for NFA Evaluation
(news.ycombinator.com)
18.
Smart people recognize each other – science proves it
(news.ycombinator.com)
19.
How to deal with a passive-aggressive colleague
(feeds.feedburner.com)
20.
General scales unlock AI evaluation with explanatory and predictive power
(feeds.nature.com)
21.
The story of Britain's oldest sweet, the Pontefract Cake (2019)
(news.ycombinator.com)
22.
24.
Chroma Context-1: Training a Self-Editing Search Agent
(news.ycombinator.com)
25.
Show HN: Claude skill that evaluates B2B vendors by talking to their AI agents
(news.ycombinator.com)
26.
Gerard of Cremona
(news.ycombinator.com)
27.
I built an AI receptionist for a mechanic shop
(news.ycombinator.com)
28.
Our whole way of thinking about leadership is a century out of date
(feeds.feedburner.com)
29.
30.
VisiCalc Reconstructed
(news.ycombinator.com)