AI outperforms law professors in Stanford Law study

A new study led by Stanford Law School Professor Julian Nyarko finds that law professors strongly prefer AI-generated answers to student questions over answers written by their fellow instructors, a finding that could reshape how legal education is delivered.

The study, titled “Law Professors Prefer AI Over Peer Answers,” was conducted with 16 law professors at U.S. law schools and tested whether large language models could serve as effective tutors for contract law courses. In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

“This study challenges important assumptions about AI’s role in legal education,” said Nyarko, who leads Stanford Law School’s Legal Innovation through Frontier Technology Lab, or liftlab. He co-authored the paper with colleagues from Yale, NYU, the University of Chicago, and other leading institutions. “We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity, not just factual recall.”

Can LLMs Reason?

The study is particularly notable because previous AI evaluations have focused primarily on subjects with clear right-or-wrong answers. Legal reasoning, by contrast, demands careful analysis of competing arguments and defensible conclusions.

“We were frankly surprised by the magnitude of the results,” Nyarko added. “These weren’t just simple questions with obvious answers. Many of them required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop their own analytical skills.”

Participants created 40 representative contract law questions that students might ask after class or during office hours, wrote their own answers, and then evaluated responses without knowing whether they came from AI or from other participating professors. The AI systems performed comparably to the best human instructor in the study.

Perhaps most striking: professors flagged AI responses as pedagogically harmful only 3.5% of the time, compared to 12% for peer-written answers.

“In most fields where AI gets tested, there’s a right answer. In law, there often isn’t,” said Sarath Sanga, co-author and professor at Yale Law School. “Two opposing arguments can both be good. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other’s arguments. In this case, the answer was yes.”

The research team took extensive precautions to ensure the study’s validity. They calibrated AI responses to match the length and structure of human answers, used multiple evaluation methods, and had professors assess whether responses might mislead or confuse students.
