Modern large language models (LLMs) might write beautiful sonnets and elegant code, but they lack even a rudimentary ability to learn from experience.
Researchers at Massachusetts Institute of Technology (MIT) have now devised a way for LLMs to keep improving by tweaking their own parameters in response to useful new information.
The work is a step toward building artificial intelligence models that learn continually, a long-standing goal of the field and something that will be crucial if machines are to ever more faithfully mimic human intelligence. In the meantime, it could give us chatbots and other AI tools that are better able to incorporate new information, including a user’s interests and preferences.
The MIT scheme, called Self-Adapting Language Models (SEAL), involves having an LLM learn to generate its own synthetic training data and update procedure based on the input it receives.
“The initial idea was to explore if tokens [units of text fed to LLMs and generated by them] could cause a powerful update to a model,” says Jyothish Pari, a PhD student at MIT involved with developing SEAL. Pari says the idea was to see if a model’s output could be used to train it.
Adam Zweiger, an MIT undergraduate researcher involved with building SEAL, adds that although newer models can “reason” their way to better solutions by performing more complex inference, the model itself does not benefit from this reasoning over the long term.
SEAL, by contrast, generates new insights and then folds them into its own weights, or parameters. Given a statement about the challenges faced by the Apollo space program, for instance, the model generated new passages that tried to describe the implications of the statement. The researchers compared this to the way a human student writes and reviews notes to aid their learning.
The system then updated the model using this data and tested how well the new model was able to answer a set of questions. Finally, the test results provided a reinforcement learning signal that helps guide the model toward updates that improve its overall abilities and help it carry on learning.
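The loop described above (generate notes, update the model, test it, and use the score as a reward) can be illustrated with a toy sketch. This is not the researchers' code: the ToyModel class, the function names, and the sample facts are all hypothetical, the "model" is just a set of stored sentences rather than a neural network, and keeping only the best-scoring candidate is a crude stand-in for the reinforcement learning signal.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for an LLM: its 'weights' are a set of known facts."""
    facts: frozenset = field(default_factory=frozenset)


def generate_self_edits(passage, n_candidates, rng):
    """Propose candidate sets of notes drawn from the passage.

    Stand-in for the model writing its own synthetic training data.
    """
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return [
        rng.sample(sentences, k=rng.randint(1, len(sentences)))
        for _ in range(n_candidates)
    ]


def apply_update(model, notes):
    """Stand-in for a fine-tuning step: return a new model that absorbed the notes."""
    return ToyModel(facts=model.facts | frozenset(notes))


def evaluate(model, questions):
    """Fraction of questions whose answer keyword appears in some stored fact."""
    hits = sum(
        any(answer in fact for fact in model.facts)
        for answer in questions.values()
    )
    return hits / len(questions)


def seal_like_step(model, passage, questions, n_candidates=8, seed=0):
    """Try several candidate updates and keep the one with the highest reward."""
    rng = random.Random(seed)
    best_model, best_reward = model, evaluate(model, questions)
    for notes in generate_self_edits(passage, n_candidates, rng):
        candidate = apply_update(model, notes)   # update a copy of the model
        reward = evaluate(candidate, questions)  # test the updated model
        if reward > best_reward:                 # reward guides which update survives
            best_model, best_reward = candidate, reward
    return best_model, best_reward
```

Calling seal_like_step with an empty ToyModel, a short passage, and a dictionary mapping questions to answer keywords returns the updated model together with its score, which can never be lower than the score the model started with.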
The researchers tested their approach on small and medium-size versions of two open source models, Meta’s Llama and Alibaba’s Qwen. They say that the approach ought to work for much larger frontier models too.
The researchers tested the SEAL approach on text as well as a benchmark called ARC that gauges an AI model’s ability to solve abstract reasoning problems. In both cases they saw that SEAL allowed the models to continue learning well beyond their initial training.