On a breezy San Francisco afternoon last Saturday, I found myself at a nondescript coworking space filled with shoeless coders. Just over a hundred visitors had crowded into an office building in the Duboce Triangle neighborhood for a showdown that would pit teams armed with AI coding tools against teams made up only of humans (all were asked to ditch their shoes at the door). The hackathon was dubbed “Man vs. Machine,” and its goal was to test whether AI really does help people code faster—and better.

Roughly 37 groups were randomly assigned to be “human” or “AI-supported.” Later, an organizer told me that several people dropped out after being placed on the human team. A panel of judges would rank projects on four criteria: creativity, real-world usefulness, technical impressiveness, and execution. Only six teams would make it to the demo. The winning team would earn a $12,500 cash prize and API credits from OpenAI and Anthropic. Second place would get $2,500.

A group working on an AI tool for pianists to get performance feedback. Courtesy of Karina Bao and Sebastián Vogelmann

AI coding has been something of a lightning rod in Silicon Valley. While fears of an engineering apocalypse abound, a new study from METR—an AI research nonprofit that cohosted the hackathon—found that AI tools actually slowed experienced open source developers by 19 percent. The weekend hackathon was meant to take METR’s research a step further. While the study looked at experienced coders working on existing codebases, at this event some of the participants had very little coding experience, and everyone would be proposing new projects.

Many studies on developer productivity use metrics like the number of pull requests or lines of code written, says Joel Becker, a member of the technical staff at METR. But those numbers can be hard to interpret. Writing more code or sending off more pull requests isn’t always better. Similarly, even if a model scores 80 or 90 percent on a given benchmark, it’s not always clear what that means for its practical abilities. Becker bets the machine will win.

With 8 hours to submit a project, attendees jam away. Courtesy of Karina Bao and Sebastián Vogelmann

Organizers randomly select participants for the “machine” or “man” groups. Courtesy of Karina Bao and Sebastián Vogelmann

Crunch Time

In a Slack channel for the event, contestants pitched ideas to attract potential teammates: an AI tool for pianists to get performance feedback, an app to track what you’re reading, and a platform to help neighbors connect.

One contestant, Arushi Agastwar, is a student at Stanford studying AI ethics. She first started coding in eighth grade but has since taken a break to focus on evaluating AI’s impact on society. Agastwar was randomly assigned to the human team, and she decided to build a framework that evaluates sycophancy (like the agreeableness that plagued OpenAI’s GPT-4o) in AI models.

“I have a feeling that some of the ideas that are going to be coming out from the ‘man’ teams are going to be really profound, and I'm hopeful that the demo aspect is not the only thing that the judges will be impressed by,” Agastwar tells me. Her initial bet was that a “man” team, i.e., one not using AI, would win. But several hours into the hackathon, she wasn’t so sure she could finish by the 6:30 PM deadline.