It’s make or break time for AI labeling systems

We’re about to find out if the systems designed to make deepfakes and AI-generated content easy to spot are actually up to snuff. SynthID and C2PA Content Credentials, two distinct technologies for invisibly tagging image, video, and audio files with information about their origins, are getting their biggest expansion to date, and with it, the opportunity to turn the tide against unlabeled AI fakery that’s deceiving people online.

Yesterday during its I/O conference, Google announced that the ability to verify whether images carry SynthID markers (the invisible watermarking system that’s applied to content generated by Google AI models) is coming to Chrome and Search. That’s significant because Chrome and Google Search dominate the global market for web browsers and search engines, respectively, so AI verification tools are being put in front of far more eyeballs. It also streamlines the checking process; if you currently want to check an image for SynthID markers, you’re expected to upload it to the Gemini app.

Not only that, but Google’s verification interfaces will now also check if these files contain C2PA information: provenance metadata that’s embedded into content at the point of creation to tell us how it was made or manipulated, and whether AI tools were used during the process. Because a file might carry only one type of label, or neither, this C2PA adoption lets users check suspicious images from a single interface instead of jumping between the Gemini app and dedicated C2PA verification portals.

Now Google provides the best of both worlds. Image by The Verge

This is the sort of collaborative effort we’ve been waiting for. While the two systems work differently, both Google and the Content Authenticity Initiative (which exists to promote the C2PA standard) have made similar claims about what’s needed for them to work: for everyone to be on board. That means more AI models need to embed this data, and online platforms where AI fakery is most often shared need to clearly display that information. For the latter, having verification tools built into the web browser could serve as a workaround on websites that don’t check or present AI metadata to their users.

OpenAI is also getting involved with this expansion, announcing yesterday that it will now embed SynthID into images generated by ChatGPT, Codex, and the OpenAI API. The company already includes C2PA metadata in generated content, but I’ve found that this is often stripped out when posted to other platforms. OpenAI itself has also sought to temper expectations about C2PA, despite being a steering member of C2PA and now reaffirming its commitment to the standard. This is what OpenAI said on its C2PA help page before it was updated yesterday to include SynthID:

“Metadata like C2PA is not a silver bullet to address issues of provenance. It can easily be removed either accidentally or intentionally. For example, most social media platforms today remove metadata from uploaded images, and actions like taking a screenshot can also remove it. Therefore, an image lacking this metadata may or may not have been generated with ChatGPT or our API.”

For something that’s considered to be the very best of content authenticity tech, that sounds incredibly flimsy. Even Google describes C2PA as the industry standard, and it’s being pitched to global governments as a way to satisfy AI transparency and labeling requirements. But despite being increasingly adopted by AI, hardware, and software providers, I rarely see it successfully used to verify AI fakery in the wild. SynthID seems more robust by comparison because it can’t be easily stripped out. Even though its reach is far more limited than C2PA’s, I can recall several instances where fact-checkers and media agencies have cited its use in debunking deepfakes online.

C2PA and SynthID can work cooperatively to cast a wider safety net. This isn’t an industry that would benefit from a verification standards war, but Google has a clear opportunity here to prove that its system is more reliable and poach some of the spotlight that C2PA has clawed for itself. To prevent this from happening, C2PA needs to prove it can actually be used to demystify where the content we see online is coming from.

Such an opportunity has already presented itself: Google announced yesterday that Meta will start using C2PA metadata to tag images on Instagram that have been captured by a camera. Meta hasn’t responded to our questions about what this will look like or what cameras will be supported, though I presume it will involve labels that say something like “captured on Pixel 10,” akin to the “sent from my iPhone” notes applied to emails. This would effectively help Instagram users to differentiate “real” photos from convincing AI fakery, which plays into the future predicted by Instagram head Adam Mosseri regarding the need to move away “from assuming what we see is real by default.”
