
Updates to GitHub Copilot interaction data usage policy

Why This Matters

Under GitHub's updated Copilot data usage policy, interaction data from Copilot Free, Pro, and Pro+ users will be used by default to train and improve GitHub's AI models, starting April 24, unless those users opt out. Copilot Business and Copilot Enterprise users are not affected. GitHub says training on real-world interaction data will produce more accurate, context-aware suggestions. Users who do not want their data used for training can opt out in their privacy settings.

Key Takeaways

Today, we’re announcing an update on how GitHub will use data to deliver more intelligent, context-aware coding assistance. From April 24 onward, interaction data—specifically inputs, outputs, code snippets, and associated context—from Copilot Free, Pro, and Pro+ users will be used to train and improve our AI models unless they opt out. Copilot Business and Copilot Enterprise users are not affected by this update.

Not interested? Opt out in settings under “Privacy.” If you previously opted out of the setting allowing GitHub to collect this data for product improvements, your preference has been retained, and your data will not be used for training unless you opt in.

This approach aligns with established industry practices and will improve model performance for all users. By participating, you’ll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production.

Real-world data = smarter models

Our initial models were built using a mix of publicly available data and hand-crafted code samples. This past year, we’ve started incorporating interaction data from Microsoft employees and have seen meaningful improvements, including increased acceptance rates in multiple languages.

The improvements we’ve seen by incorporating Microsoft interaction data indicate we can improve model performance for a more diverse range of use cases by training on real-world interaction data. Should you decide to participate in this program, the interaction data we may collect and leverage includes:

Outputs accepted or modified by you

Inputs sent to GitHub Copilot, including code snippets shown to the model

Code context surrounding your cursor position

Comments and documentation you write
