People are just as bad as my LLMs

Published on: 2025-10-28 05:16:26

Last year I ran a fun little experiment where I asked a bunch of LLMs to rank 97 Hacker News users, based on their comment history, on whether they would be good candidates for the role of "software engineer at google" (yes yes, it sounds silly, I know; you can read part 1 and part 2, but they are long). In it, I had a persistent problem with bias. I had arranged the comments in an interleaved fashion, like this:

Person one: What makes you think that?

Person two: When I was a lad I remember stories of when...

Person one: Great post! I particularly like the things and stuff ...

The users aren't responding to each other; that's just how I arranged the comments in the prompt. I didn't give the model the users' names, for obvious reasons. Then the model says which user it prefers, and by running many pairwise comparisons we can come up with a ranking (similar to the way chess rankings work). However, I noticed an odd bias. Even though which user was named "Person one" in the prompt was random, the model ...
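For context, the ranking step works roughly like a chess rating: each pairwise verdict from the model nudges both users' scores up or down. Here is a minimal sketch of that idea using an Elo-style update; the function names, starting rating, and K-factor are my own assumptions for illustration, not the code from the original experiment.

```python
K = 32  # assumed K-factor; the original experiment may use different parameters

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one pairwise comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + K * (score_a - exp_a)
    new_b = rating_b + K * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Toy usage: every user starts at 1000, and each LLM verdict updates two ratings.
ratings = {user_id: 1000.0 for user_id in range(97)}

def record_comparison(user_a: int, user_b: int, winner: int) -> None:
    ratings[user_a], ratings[user_b] = update_elo(
        ratings[user_a], ratings[user_b], a_won=(winner == user_a)
    )
```

The final ordering depends only on the stream of pairwise verdicts, which is why any systematic preference tied to the "Person one" / "Person two" labels matters so much.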