At the Wikimedia Foundation, we believe that access to knowledge is a human right. Our mission is to ensure everyone, everywhere can access and share reliable information freely and openly on Wikipedia and other Wikimedia projects. Access to free and open knowledge, supported by the fundamental right to freedom of expression, empowers people to exercise many other rights enshrined in the Universal Declaration of Human Rights, including the rights to education, artistic expression, economic advancement, and political participation. Today, we are sharing a human rights impact assessment (HRIA) on artificial intelligence (AI) and machine learning (ML) that was carried out in 2024 to help the Foundation and Wikimedia volunteer communities better understand how these technologies may affect the exercise of human rights in our ecosystem.
Wikimedia projects, and Wikimedia volunteers everywhere, occupy a unique space in today’s online information ecosystem. This ecosystem is, however, rapidly evolving. The introduction and rapid advancement of emerging technologies such as large language models (LLMs) and other kinds of generative AI introduce opportunities as well as challenges related to the creation, access, and distribution of information. Generative AI is fundamentally changing how the public seeks, receives, and imparts information and ideas online, raising novel questions about the role and responsibility of the Foundation and Wikimedia volunteer communities in this ecosystem.
AI and ML are neither new to Wikimedia projects nor to the Wikimedia volunteers who make these projects possible. Since as early as 2010, both the Foundation and volunteer communities have developed numerous ML tools to support volunteers in contributing, editing, and curating the ever-growing volume of knowledge across the projects. Several of these tools have harnessed ML and AI to assist volunteers with frequently recurring tasks such as identifying vandalism or flagging when citations are needed. Most of the tools currently in use were developed before the introduction of generative AI. In the age of these emerging technologies, Wikimedia volunteers are contending with new questions:
What, if any, role should AI play in terms of the knowledge shared on Wikimedia projects?
Given the widespread use of generative AI on the internet, how can we protect and strengthen the accuracy and integrity of knowledge on the Wikimedia projects?
How can ML and AI tools help strengthen, not replace, what humans do best: creating, cultivating, and sharing free knowledge?
How can LLMs and AI tools be used to translate content into new languages, while preserving reliability and cultural nuance and context?
How should the volunteer communities’ policies evolve to account for such uses of these new technologies?
About the AI/ML Human Rights Impact Assessment (HRIA)
This HRIA is the latest outcome of the Foundation’s ongoing efforts to meet our commitment to protect and uphold the human rights of all those who interact with Wikimedia projects. The Foundation commissioned it to identify and analyze the impacts, opportunities, and risks emanating from the use of AI and ML technologies in the Wikimedia ecosystem. The report was written and compiled by Taraaz Research, a specialized research and advocacy organization working at the intersection of technology and human rights. In developing the report, Taraaz consulted Foundation staff, individual volunteers, volunteer affiliates, civil society organizations, and external subject matter experts, though the report does not represent the views or shared consensus of any of these groups. Instead, the report offers suggestions for further inquiry, policy, and technology investment based on the state of the Wikimedia projects and technology from October 2023 to August 2024 when the research was conducted. Furthermore, the findings in the report represent potential areas of risk and opportunity. The report does not identify any actual observed risks, harms, opportunities, or benefits that have resulted from the use of ML or AI technologies on Wikimedia projects.
What are the findings of this report?
This report considered risks emanating from three categories of issues relating to AI/ML on Wikimedia projects: tools developed in-house by Foundation staff to support the work of volunteer editors; generative AI (GenAI) and the additional human rights risks it could introduce in the Wikimedia context; and content on Wikimedia projects that may be used for external ML development.
It is important to note that the findings contained in this report reflect potential harms that could occur in the future. The report does not find that any such harms have occurred. Rather, it explains that these harms could occur if AI is employed or leveraged at scale in certain ways on Wikimedia projects without proper mitigations in place.
The report found that AI/ML tools developed by the Foundation to support volunteer editors have the potential to contribute positively to several human rights, such as freedom of expression and the right to education, among others. Nonetheless, certain risks stem from known limitations of AI/ML-enabled tools: for example, the possibility of perpetuating or amplifying existing gaps and biases in knowledge representation, or of incorrectly flagging or marking content for deletion. Such risks, if they were to materialize at scale, could have negative impacts on the human rights of Wikimedia volunteers.
Furthermore, the report considered in broad terms what new risks external GenAI tools could introduce to Wikimedia projects. The researchers determined that GenAI could increase the scale, speed, and sophistication of harmful content generation, including for disinformation campaigns and for attacks on individual Wikimedia volunteers or their communities. These tools could also automate the creation of misleading content across multiple languages simultaneously, making detection and moderation more challenging, and could generate large volumes of personalized, abusive content targeting specific individuals or communities. These risks, among others identified, could negatively affect the human rights of Wikimedia volunteers and even the general public if not properly mitigated.
Finally, the report considered the downstream risks of how content from Wikimedia projects is used in the training of LLMs. While the Wikimedia Foundation cannot control how freely and openly licensed content from the Wikimedia projects is used by the general public, we do have a duty to safeguard against risks to human rights that could result from downstream impacts. The researchers identified concerns that the outputs of LLMs partially trained on Wikimedia content could pose risks related to bias and representation, data quality and accuracy, privacy, and cultural sensitivity. As such, they recommended monitoring for these potential risks, while also finding that ongoing data-quality initiatives and equity-focused programs, which address content and representation gaps across language communities, already help mitigate them.
Within each of these focus areas, the report notes that the Foundation and Wikimedia volunteer communities have already implemented many strategies and processes to mitigate the identified risks, and it offers recommendations for additional mitigation measures. Given the prominence of Wikimedia projects in the online information ecosystem, it is critical that we consider new risks emerging from technologies evolving as rapidly as AI and ML. Importantly, the discussions and conclusions in this report allow us to contemplate such potential harms early and to plan how best to mitigate them proactively.
What does this HRIA report mean for the Wikimedia projects and volunteer communities?
Since we published our first HRIA in July 2022, the Foundation has been clear that implementing many of these reports’ recommendations requires the buy-in and collaboration of the global volunteer communities. It will take time to discuss this HRIA’s findings and recommendations with the volunteer communities in order to decide how best to work together on their implementation, but our actions will be more effective for having done so.
We are publishing this HRIA report to help the Foundation and volunteer communities explore and address the profound societal impacts that might come from the interaction of AI technologies and the Wikimedia projects in the coming years. Wikimedia communities around the world are already grappling with important decisions about how to establish clear policies for appropriate use of generative AI on the projects, or whether any such uses even exist. We hope that considering the risks and opportunities identified in this report will help guide community discussions and decisions to make sure that the projects can continue to contribute positively to the online information ecosystem and our global society.
How can Wikimedians learn more and give feedback?
We want to hear from you! What questions do you have? What are your thoughts on the risks and recommendations discussed in the report? What is your community already doing, or what would you like to do, to responsibly harness the benefits of AI and ML on Wikimedia projects?
Over the coming months, we will create opportunities to hear directly from you and your communities about the findings and recommendations of this report, as well as your perspectives on the opportunities and risks associated with AI and ML in the Wikimedia ecosystem. You can already leave your thoughts and comments on the HRIA’s Talk page or join us at one of the upcoming conversations on this topic.