Wikimedia, the nonprofit behind Wikipedia and sister sites like Wikimedia Commons and Wikidata, just made it easier for AI models to tap into its massive knowledge base.
Wikimedia Deutschland, the organization’s German chapter, released a new resource called the Wikidata Embedding Project. It takes the roughly 120 million open data points stored in Wikidata and converts them into a format that’s simpler for large language models to actually use.
Even though Wikidata’s structured data is already machine-readable, it hasn’t been directly compatible with generative AI systems, which are built to work with natural language.
The new project translates Wikidata entries into vectors, which are basically numerical coordinates that show how different statements relate to each other.
Think of it like a map where closely linked terms like “dog” and “puppy” cluster together, while unrelated ones like “dog” and “bank account” are much farther apart. This helps AI systems understand terms in context and process them more effectively in natural language.
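To make the map analogy concrete, here is a minimal sketch of how closeness between vectors is commonly measured, using cosine similarity. The three-dimensional vectors are made-up toy values chosen for illustration; they are not output from the Wikidata Embedding Project, whose actual vectors and similarity measure are not described here.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, for illustration only).
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "bank account": [0.1, 0.05, 0.95],
}

sim_close = cosine_similarity(embeddings["dog"], embeddings["puppy"])
sim_far = cosine_similarity(embeddings["dog"], embeddings["bank account"])

# Related terms score near 1; unrelated terms score much lower.
print(f"dog vs. puppy: {sim_close:.2f}")
print(f"dog vs. bank account: {sim_far:.2f}")
```

With these toy values, "dog" and "puppy" score far higher than "dog" and "bank account", which is the clustering effect the map analogy describes.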
The project is designed to give AI models higher-quality information that leads to more reliable answers, Wikimedia Deutschland said in a press release. It said most AI systems currently rely on opaque datasets.
A secondary goal is to level the playing field. By making the vectorized Wikidata freely available, Wikimedia says it hopes smaller AI companies can compete with tech giants that have the resources to vectorize the data themselves.
“The launch of the embedding project shows that powerful AI does not have to be controlled by a handful of companies – it can be developed openly and collaboratively,” said Wikidata AI project manager Philippe Saadé in a statement.
Wikimedia Deutschland has been working on the project since September 2024 in collaboration with Jina AI, which built the embedding system that turns Wikidata entries into vectors, and IBM’s DataStax, which stores those vectors in its database.
The release landed just a day after Elon Musk took to X to announce he’s building a Wikipedia rival called Grokipedia.