Intro - the Integration Problem
We talk all the time about what Kafka is, but not so much about why it is the way it is.
What better way to understand the why than to dive into the original motivation for creating Kafka?
Circa 2012, LinkedIn’s original intention with Kafka was to solve a data integration problem.
LinkedIn used site activity data (e.g. someone liked this, someone posted this) for many things: tracking fraud and abuse, matching jobs to users, training ML models, powering basic features of the website (e.g. who viewed your profile, the newsfeed), warehouse ingestion for offline analysis and reporting, and more.
The big takeaway is that many of these activity data feeds are not simply used for reporting; the website’s core functionality depends on them.
As such, they require very robust infrastructure.
Their old infrastructure was not robust.
It mainly consisted of two pipelines: