A few months ago, Apple hosted a two-day event featuring talks and publications on the latest advancements in natural language processing (NLP). Today, the company published a post with multiple highlights, alongside all of the studies presented. Here's the roundup.

The Workshop on Natural Language and Interactive Systems 2025 took place on May 15-16, and the talks and publications focused on three key research areas related to NLP:

- Spoken Language Interactive Systems
- LLM Training and Alignment
- Language Agents

During the event, multiple researchers from universities, institutes, labs, and research groups, including the Allen Institute for AI, Imperial College London, MIT, Harvard University, Stanford University, and Princeton University, presented their latest work. Some of these researchers also work in industry, at companies including Microsoft, Amazon, Sony, Google, Tencent, Cohere, and, of course, Apple.

Here are a few highlights of the talks, along with a link to the full list of videos and papers presented at the event.

Two of the studies were presented by Yarin Gal, an associate professor at the University of Oxford and Director of Research at the UK AI Security Institute.

The first, AI Model Collapse, explored how there is a limit to how much longer the web will serve as a viable source of data for LLM training, since increased use of these models means more model-generated content is being published online. Gal explained that while training LLMs on such synthetic data poses a collapse risk, degrading their knowledge and reasoning capabilities, the risk can be mitigated by developing new tools to distinguish AI-generated from human-generated content, along with better regulation and further study of how LLMs shape society.

His second study, Detecting LLM Hallucinations, proposes a novel approach to estimating how confident the LLM is as it generates different portions of an answer. In a nutshell, the idea is to have the model generate multiple answers, then cluster those answers by semantic meaning. This allows for a more precise measure of the certainty and accuracy of the answer, and the framework can be adapted to longer-form conversations (a toy sketch of this clustering idea appears below, after the LOOP example).

Another talk, presented by Apple Machine Learning researcher Kevin Chen, showcased an agent his team trained with a method called leave-one-out proximal policy optimization, or LOOP. The agent was trained to perform multi-step tasks based on prompts such as this one:

‘I went on a trip with friends to Maui recently. I have maintained a note of money I owe to others and others owe me from the trip in simple note. Make private venmo payments or requests accordingly. In the payments/requests, add a note, “For Maui trip”.’

During the first half of the talk, Chen showed that, since this task involved multiple frameworks and knowledge dependencies, an agent might not be able to accurately perform what's been requested. But with LOOP, which iteratively learns from its own past actions and is trained to maximize its reward as it observes itself, the request was completed with fewer errors and assumptions. Chen further explained that the model was trained on 24 different scenarios, but has limitations, such as not supporting multi-turn user interactions.
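To make the core trick behind LOOP concrete, here is a minimal sketch of the leave-one-out idea the method's name describes: several rollouts of the same task are sampled, each rollout's reward is baselined against the mean reward of the other rollouts, and the resulting advantages feed a standard PPO-style clipped update. This is an illustrative reconstruction under those assumptions, not Apple's implementation; the rollout sampling and policy ratios would come from the rest of a real training loop.

```python
import numpy as np

def loop_advantages(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out advantages: each rollout's reward minus the
    mean reward of the other K-1 rollouts for the same prompt."""
    K = rewards.shape[-1]
    total = rewards.sum(axis=-1, keepdims=True)
    baseline = (total - rewards) / (K - 1)  # mean reward of the others
    return rewards - baseline

def ppo_clip_loss(ratios: np.ndarray, advantages: np.ndarray, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate, here fed with LOO advantages."""
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1 - eps, 1 + eps) * advantages
    return -np.minimum(unclipped, clipped).mean()

# Example: 4 rollouts of the same task prompt, scored 0/1 on success.
rewards = np.array([[1.0, 0.0, 0.0, 1.0]])
adv = loop_advantages(rewards)
print(adv)  # [[ 0.667 -0.667 -0.667  0.667]]
print(ppo_clip_loss(np.ones_like(adv), adv))  # ~0.0 when the policy hasn't moved yet
```

Because successful rollouts are scored against their less successful siblings, the agent needs no separately learned value network, which keeps the training recipe simple.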
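Returning to Gal's hallucination work, the clustering step can be sketched in a few lines. The sketch below is a toy reconstruction: the `entails` stand-in is hypothetical and a real system would call a natural language inference model there, but the shape of the idea is the same, with the entropy over meaning clusters serving as the uncertainty signal (high entropy suggests a likely hallucination).

```python
import math
from collections import Counter

def entails(a: str, b: str) -> bool:
    """Stand-in for a bidirectional entailment check; a real system
    would query an NLI model here. This toy version just compares
    normalized strings."""
    return a.strip().lower() == b.strip().lower()

def cluster_by_meaning(answers: list[str]) -> list[int]:
    """Greedily assign each answer to the first cluster whose
    representative it mutually entails; otherwise open a new cluster."""
    reps, labels = [], []
    for ans in answers:
        for i, rep in enumerate(reps):
            if entails(ans, rep) and entails(rep, ans):
                labels.append(i)
                break
        else:
            reps.append(ans)
            labels.append(len(reps) - 1)
    return labels

def semantic_entropy(answers: list[str]) -> float:
    """Entropy over semantic clusters: low when most samples agree
    in meaning, high when they scatter (a hallucination signal)."""
    labels = cluster_by_meaning(answers)
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in counts.values())

samples = ["Paris", "paris", "Lyon", "Paris"]
print(semantic_entropy(samples))  # ~0.56: mostly consistent, fairly low uncertainty
```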
Another talk, by Apple Engineering Manager and Technical Leader Irina Belousova, showcased the benefits of speculative decoding, which offers a computationally cheaper way to generate answers with a small model that are as high-quality as those generated by large models.

In essence, the small model generates candidate answer sequences, which a large model then verifies. If the large model accepts them, the job is done. This uses less memory, runs faster, and requires fewer parameters than comparable setups (a simplified sketch of this draft-and-verify loop appears at the end of this post). What's more, this approach “simplifies deployment by removing the complexity of managing, aligning, and switching between multiple models during inference,” which means it requires a simpler infrastructure.

This particular study offers many technical details that are worth checking out. The presentation is just over 8 minutes long, but it offers very interesting insights.

Click here to check out the videos Apple highlighted, and to see the full list of studies from the event.
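As a rough illustration of the draft-and-verify loop described above, here is a minimal sketch of a simplified (greedy) speculative decoding step. The `draft_next` and `target_next` callables are hypothetical stand-ins for the small and large models, not any real API; production implementations also verify all drafted positions in one batched forward pass of the large model, which is where the speedup comes from.

```python
from typing import Callable

def speculative_step(
    prefix: list[int],
    draft_next: Callable[[list[int]], int],   # small "draft" model (assumed)
    target_next: Callable[[list[int]], int],  # large "target" model (assumed)
    k: int = 4,
) -> list[int]:
    """One simplified (greedy) speculative decoding round: the small
    model drafts k tokens, the large model verifies them, and the
    longest accepted prefix plus one corrected token is kept."""
    # 1. Draft: the cheap model proposes k tokens autoregressively.
    ctx = list(prefix)
    drafted = []
    for _ in range(k):
        token = draft_next(ctx)
        drafted.append(token)
        ctx.append(token)

    # 2. Verify: the large model checks each drafted token in turn.
    #    (Real systems score all k positions in a single batched
    #    forward pass rather than one call per token.)
    ctx = list(prefix)
    accepted = []
    for token in drafted:
        expected = target_next(ctx)
        if expected != token:
            accepted.append(expected)  # replace the first mismatch
            break
        accepted.append(token)
        ctx.append(token)
    return prefix + accepted

# Toy demo: both "models" just count upward, but the draft model
# goes wrong at position 3, so only two drafted tokens survive.
target_next = lambda ctx: len(ctx)
draft_next = lambda ctx: 99 if len(ctx) == 3 else len(ctx)
print(speculative_step([0], draft_next, target_next))  # [0, 1, 2, 3]
```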