In modern computing, data exchange is foundational to everything from web browsing to microservices and IoT devices. The ability for different systems to represent, share, and interpret structured information drives our digital world. Yet no single perfect format has emerged to meet all needs. Instead, we've seen an evolution of data interchange formats, each addressing the specific challenges and technical requirements of its time.
This narrative traces three pivotal data formats: Extensible Markup Language (XML), JavaScript Object Notation (JSON), and Concise Binary Object Representation (CBOR). We explore their origins and motivations, examine their core design principles and inherent trade-offs, and follow their adoption trajectories within the evolving digital landscape. The journey begins with XML's focus on robust document structure, shifts to JSON's web-centric simplicity and performance, and advances to CBOR's binary efficiency for constrained devices. Understanding this evolution reveals not just technical specifications, but the underlying pressures driving innovation in data interchange formats.
Modern data interchange formats trace back not to the web, but to challenges in electronic publishing decades earlier. SGML provided the complex foundation that XML would later refine and adapt for the internet age.
Beginning in 1969, IBM researchers Charles Goldfarb, Ed Mosher, and Ray Lorie created Generalized Markup Language (GML) to overcome the limitations of proprietary typesetting systems. Their approach prioritized content structure over presentation. GML later evolved into Standard Generalized Markup Language (SGML), formalized as ISO 8879 in 1986.
SGML innovated through its meta-language approach, providing rules for creating custom markup languages. It allowed developers to define specific vocabularies (tag sets) and grammars (Document Type Definitions or DTDs) for different document types, creating machine-readable documents with exceptional longevity independent of processing technologies.
SGML gained traction in sectors managing complex documentation: government, military (CALS DTD), aerospace, legal publishing, and heavy industry. However, its 150+ page specification with numerous special cases complicated parser implementation, limiting broader adoption.
The web's emergence proved pivotal for markup languages. Tim Berners-Lee selected SGML as HTML's foundation due to its text-based, flexible, non-proprietary nature. Dan Connolly created the first HTML DTD in 1992. While HTML became ubiquitous, it drifted toward presentation over structure, with proliferating browser-specific extensions. SGML remained too complex for widespread web use, creating demand for a format that could bring SGML's structural capabilities to the internet in a more accessible form.
By the mid-1990s, the web needed more structured data exchange beyond HTML's presentational focus. In 1996, the W3C established an XML Working Group, chaired by Jon Bosak of Sun Microsystems, to create a simplified SGML subset suitable for internet use while maintaining extensibility and structure.
The W3C XML Working Group developed XML with clear design goals, formalized in the XML 1.0 Specification (W3C Recommendation, February 1998):
- Internet Usability: Straightforward use over the internet
- Broad Applicability: Support for diverse applications beyond browsers
- SGML Compatibility: XML documents should be conforming SGML documents
- Ease of Processing: Simple program development for XML processing
- Minimal Optional Features: Few or no optional features
- Human Readability: Legible and clear documents
- Rapid Design: Quick design process
- Formal and Concise Design: Formal specification amenable to standard parsing
- Ease of Creation: Simple document creation with basic tools
- Terseness is Minimally Important: Conciseness was not prioritized over clarity
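Several of these goals are easier to appreciate with a concrete document. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree` module. The `book` record and its element names are hypothetical, invented purely for illustration, and the helper functions `parse_book` and `is_well_formed` are not part of any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the element and attribute names are
# assumptions made for this example, not taken from any standard.
XML_DOC = """<book id="b1">
  <title>Example Title</title>
  <author>Jane Doe</author>
</book>"""


def parse_book(xml_text: str) -> dict:
    """Extract a few fields from the sample document.

    Illustrates "Ease of Processing": a handful of calls to a
    general-purpose parser is enough to read the structure.
    """
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),                # attribute on the root element
        "title": root.findtext("title"),     # text of the <title> child
        "author": root.findtext("author"),   # text of the <author> child
    }


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML.

    XML's strict rules (every start tag needs a matching end tag)
    let a parser reject malformed input outright.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


book = parse_book(XML_DOC)
print(book)
print(is_well_formed(XML_DOC))
print(is_well_formed("<book><title>Unclosed</book>"))
```

The sample is readable without any tooling, which reflects the human-readability goal. Each closing tag also repeats its element name, a small illustration of why terseness was treated as minimally important.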