Bridget McCormack is used to correcting judges’ work. As chief justice of the Michigan Supreme Court, she reviewed complaints that lower-court judges had failed to consider key evidence or to rule on certain aspects of a case.
In her current job, McCormack is working on a new kind of legal decision-maker. Like a judge, it would make mistakes. But unlike many judges, it wouldn’t be burdened by more casework than it had hours in the day. It could make sure to always show its work, check that each side agreed it understood all the facts, and ensure it ruled on each issue at play. And it wouldn’t be human — it’s made of neural networks.
McCormack leads the American Arbitration Association, which has developed an AI Arbitrator to help parties settle document-based disputes at low cost. Built on OpenAI’s models, the system walks parties in arbitration through their dispute and drafts a decision on who should win the case and why. It handles only cases decided entirely on documents, and a human stays in the loop at every stage, including the final step of issuing an award. But McCormack believes that even with these caveats, the process can make dispute resolution faster and more accessible, greasing the wheels of an overburdened legal system.
Generative AI frequently makes headlines for its failures in the courtroom. Last year, at least two federal judges had to issue mea culpas and adopt new policies after signing court orders containing made-up facts, thanks to the use of generative AI. Academics warn that AI’s legal interpretations are not as straightforward as they may seem, and can introduce false information or rely on sources that would never otherwise be legally admissible. Without careful oversight, AI tools have been shown to import or exacerbate human biases, and the public’s skepticism of the tools could further erode trust in the justice system.
Optimists like McCormack, meanwhile, see huge potential upsides for bringing speedier justice to the American legal system, even as they see an enduring role for human decision-makers. “Most small and medium businesses in the United States can’t afford legal help at all, and one dispute can put them under,” she says. “So imagine giving all of those businesses a way to resolve disputes and move forward with their business in a way that they could navigate, afford, and manage on their own.” She and others are wrestling with a difficult question: Can a new technology improve a flawed and limited justice system when it has flaws and limitations of its own?
How judges use AI today
While high-profile failures have garnered the most attention, courts are using AI in ways that mostly fly under the radar. In a review of AI use in the courts, Daniel Ho, faculty director at Stanford’s RegLab, and former research fellow Helena Lyng-Olsen found AI was already being used in the judicial system for both administrative and judicial tasks. Administrative court staff, for example, use AI to process and classify court filings, handle basic employee or customer support, or monitor social media keywords for threats to judicial staff. Judges or their staff might use generative AI tools for lower-risk use cases like asking a large language model (LLM) to organize a timeline of key events in a case, or to search across both text and video exhibits. But they also use them for higher-risk tasks, according to Ho and Lyng-Olsen, like relying on AI for translations or transcriptions, anticipating the potential outcome of a case, and asking an LLM for legal analysis or interpretation.
Some of the technology used in courts predates the modern generative AI era. For example, judges have been using algorithmic risk assessments for years to help evaluate whether to release a defendant before trial. These tools have long raised questions about whether algorithms can encode human bias. A 2016 ProPublica investigation revealed that not only were these algorithms not very good at predicting who would go on to commit violent crimes, they also disproportionately assessed Black defendants as high risk compared to white defendants, even when ProPublica controlled for other factors like criminal history and age. Newer LLM systems introduce entirely new concerns, particularly a propensity to make up information out of whole cloth, a phenomenon known as hallucination. Hallucinations have been documented in legal research tools like LexisNexis and Westlaw, which have integrated generative AI in an effort to help lawyers and judges find case law more efficiently.
Despite these risks, at least one prominent judge has promoted the use of LLMs: Judge Kevin Newsom, who sits on the 11th Circuit Court of Appeals. In 2024, Newsom issued a “modest proposal” in a concurring opinion, which he recognized “many will reflexively condemn as heresy.” Newsom’s pitch was for judges to consider that generative AI tools — when assessed alongside other sources — could help them analyze the ordinary meaning of words central to a case.
Newsom’s test case was a dispute that hinged partly on whether installing an in-ground trampoline could be considered “landscaping,” entitling it to coverage under an insurance policy. Newsom, a self-described textualist, wanted to understand the ordinary meaning of the word “landscaping.” He found the myriad dictionary definitions lackluster. Photos of the in-ground trampoline didn’t strike him as “particularly ‘landscaping’-y,” but this unscientific gut feeling bothered the jurist, whose entire philosophy is built on strict adherence to the meaning of words. Then, “in a fit of frustration,” Newsom said to his law clerk, “I wonder what ChatGPT thinks about all this.”