Data Activation Thoughts

The landscape has been shifting in recent years. It’s a cliche to start texts like this these days, but being a cliche doesn’t make it any less true. In 2019, the folks at Andreessen Horowitz wrote this about data (in a piece titled The Empty Promise of Data Moats): “Instead of getting stronger, the defensible moat erodes as the data corpus grows and the competition races to catch up.” (I’ve experienced firsthand how hard it is to prove that some data has value.)

LLMs have shifted where value comes from. It’s no longer enough to simply have proprietary data; what matters now is how effectively you can make that data useful to these systems (and, by extension, to anything built on top of them). So, if traditional data moats are eroding, the new competitive edge lies in data activation. The pressing question becomes: how quickly can you connect your proprietary data to LLMs in ways that demonstrably improve their performance (before someone else figures out how to replicate your insights without your data)?

Before we continue, I want to offer a simple metaphor: LLMs can ingest the data. They’ll happily consume every row and column you throw at them. But (and this is important) without the right transformation, they can’t metabolize it. The nutritional value passes through unabsorbed. They’re missing the “enzymes”, I guess you could call them. Data activation is about providing those enzymes: converting raw information into a form the model can actually digest and turn into a capability.
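
To make the metaphor slightly more concrete, here is a minimal sketch of one very simple kind of "enzyme": turning a structured row into labeled text before it reaches a model. This is only an illustration under my own assumptions, not a method described in this article or in any specific paper. The function `row_to_text`, the `FIELD_LABELS` mapping, and the column names and values are all hypothetical.

```python
from typing import Mapping

# Hypothetical column descriptions (assumption: not taken from the article).
FIELD_LABELS = {
    "age": "age in years",
    "hba1c": "HbA1c (%)",
    "dx_code": "diagnosis code (ICD-10)",
}


def row_to_text(row: Mapping[str, object],
                labels: Mapping[str, str] = FIELD_LABELS) -> str:
    """Turn one structured record into a labeled, sentence-like string.

    Missing values (None) are skipped rather than emitted as 'None'.
    Columns without a label fall back to their raw column name.
    """
    parts = []
    for column, value in row.items():
        if value is None:
            continue
        parts.append(f"{labels.get(column, column)}: {value}")
    if not parts:
        return "Empty record."
    return "Record with " + "; ".join(parts) + "."


# Illustrative values only.
raw_row = {"age": 54, "hba1c": 8.1, "dx_code": "E11.9", "notes": None}
activated = row_to_text(raw_row)
print(activated)
```

The raw row and the activated string carry the same information; the only difference is that the second form spells out what each value means, which is the part a bare table leaves implicit. Real activation pipelines would likely go well beyond this kind of relabeling.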

Why this matters now (healthcare as case study)

Looking specifically at healthcare data, the opportunity is immense and, let’s face it, time-limited. According to OpenAI’s report from January 2026, more than 5% of all ChatGPT messages globally are healthcare-related, 25% of weekly active users ask health-related questions, and more than 40 million people turn to ChatGPT daily for healthcare guidance (!!!).

The big labs are clearly taking notice: within the span of a single week (January 2026), OpenAI launched “ChatGPT for Healthcare” (already rolling out to institutions like Cedars-Sinai, Memorial Sloan Kettering, and Stanford Medicine) and Anthropic announced “Claude for Healthcare” with HIPAA-ready infrastructure and native integrations to medical databases/ontologies (CMS Coverage Database, ICD-10, PubMed). To me, it looks like healthcare is now a primary battleground for frontier AI companies.

Yet OpenRouter’s numbers tell a different story: it reports that health remains “the most fragmented of the top categories”. What does this mean? According to OpenRouter, it signals both the domain’s complexity and the inadequacy of current general-purpose models.

One (potential) method for data activation

Recent research already seems to demonstrate that the bridge works: structured medical data can be turned into improvements in LLM reasoning.
