This article summarizes the story of how SDN arose. So many research projects, papers, companies, and products grew out of SDN that it is impossible to include them all here. The foresight of NSF in the early 2000s, funding a generation of researchers at just the right time who worked closely with the rapidly growing hyperscalers, led quite literally to a revolution in how networks are built today.
The commercial success of SDN drove further interest among academic researchers. The NSF and other government agencies, especially the Defense Advanced Research Projects Agency (DARPA), sponsored further research on SDN platforms and use cases that continues to this day. The SDN research community broadened significantly, well beyond computer networking, to include researchers in the neighboring disciplines of programming languages, formal verification, distributed systems, algorithms, security and privacy, and more, all helping lay stronger foundations for future networks.
These two high-profile use cases—multi-tenant virtualization and wide-area traffic engineering—drew significant commercial attention to SDN. Indeed, NSF-funded research led directly to the creation of several successful SDN start-up companies, including Big Switch Networks (open source SDN controllers and management applications, acquired by Arista), Forward Networks (network verification products), Veriflow (network verification products, acquired by VMware), and Barefoot Networks (programmable switches, acquired by Intel), to name a few. SDN influenced the large networking vendors, with Cisco, Juniper, Arista, HP, and NEC all creating SDN products. Today, AMD, Nvidia, Intel, and Cisco all sell P4-programmable products, and in 2019 about a third of the papers appearing at ACM SIGCOMM were based on P4 or programmable forwarding.
The hyperscalers used SDN to realize two especially important use cases. First, within a single datacenter, cloud providers wanted to virtualize their networks to provide a separate virtual network for each enterprise customer (or “tenant”) with its own IP address space and networking policies. The start-up company Nicira, which emerged from the NSF-funded Ethane project, developed the Network Virtualization Platform (NVP)26 to meet this need. Nicira was later acquired by VMware and NVP became NSX. Nicira also created Open vSwitch (OVS),33 an open source virtual switch for Linux, with an OpenFlow interface. OVS grew rapidly and became the key to enabling network virtualization in datacenters around the world. Second, the hyperscalers wanted to control traffic flows across their new private wide-area networks and between their datacenters. Google adopted SDN to control how traffic is routed in its B4 backbone,23,39 using OpenFlow switches controlled by ONIX, the first distributed controller platform.27 When Google first described B4 at the Open Network Summit in 2012, it sparked a global surge in research and commercialization of SDN. There were so many papers at ACM SIGCOMM that a separate conference, Hot Topics in Software-Defined Networking (HotSDN, later SOSR), was formed.
SDN adoption by cloud hyperscalers. In parallel with the early academic research on SDN, large technology companies such as Microsoft, Google, Amazon, and Facebook began building large datacenters full of servers that hosted these companies’ popular Internet services and, increasingly, the services of enterprise customers. Datacenter owners grew frustrated with the cost and complexity of the commercially available networking equipment; a typical datacenter switch cost more than $20,000, and a hyperscaler needed about 10,000 switches per site, roughly $200 million in switch hardware alone. They decided they could build their own switch box for about $2,000 using off-the-shelf switching chips from companies such as Broadcom and Marvell, and then use their own armies of software developers to create optimized, tailored software using modern software practices. Reducing cost was welcome, but what they really wanted was control, and SDN gave them a quick path to get it.
Programmable Open Mobile Internet (POMI) Expedition: In 2008, the NSF POMI Expedition at Stanford expanded funding for SDN, including its use in mobile networks. POMI funded the early development of ONOS, an open source distributed controller,8 and the widely used Mininet network emulator for teaching SDN and for testing ideas before deploying them in real networks. POMI also funded the first explorations of programmable forwarding planes, setting the stage for the first fully programmable switch chip10 and the widely used P4 language.9
Future Internet Design (FIND): In 2007, NSF started the FIND program to support new Internet architectures that could be prototyped and evaluated on the GENI test bed. The FIND program and its successor, Future Internet Architecture (FIA) in 2010, expanded the community working on clean-slate network architectures and fostered alternative designs. The resulting ideas were bold and exciting, including better support for mobility, content delivery, user privacy, secure cloud computing, and more. NSF’s FIND and FIA programs fostered many clean-slate network designs with prototypes and real-world evaluation; many leveraged SDN and improved its foundations. As momentum for clean-slate networking research grew in the U.S., the rest of the world followed suit with similar efforts, such as the EU’s Future Internet Research and Experimentation (FIRE) program.
Global Environment for Network Innovation (GENI): NSF and researchers wanted to try out new Internet architectures on a nationwide, or global, platform. Computer virtualization was widely used to share a common physical infrastructure, so could we do the same for a network? In 2005, “Overcoming the Internet Impasse through Virtualization” proposed an approach.5 The next year, NSF created the GENI program, with the goal of creating a shared, programmable national infrastructure for researchers to experiment with alternative Internet architectures at scale. GENI funded early OpenFlow deployments on college campuses, sliced by FlowVisor35 to allow multiple experimental networks to run alongside each other on the same production network, each managed by its own experimental controller. This, in turn, led to a proliferation of new open source controllers (Beacon, POX, and Floodlight). GENI also led to a programmable virtualized backbone network platform,6 and an experimental OpenFlow backbone network in Internet2 connecting multiple universities. This led to OpenFlow-enabled switches from Cisco, HP, and NEC. GENI funded the purchase of OpenFlow whitebox switches from ODM manufacturers and the open source software to manage them. NSF funded the NetFPGA project, which enabled experimental OpenFlow switches in Internet2. NSF brought together a community of researchers driven by much more than the desire to create experimental test beds; many researchers came to realize that programmability and virtualization were, in fact, key capabilities needed for future networks.5,16
100×100 project: In 2003, the NSF launched the 100×100 project as part of its Information Technology Research program. The goal of the 100×100 project was to create communication architectures that could provide 100Mb/s networking for all 100 million American homes. The project brought together researchers from Carnegie Mellon, Stanford, Berkeley, and AT&T. One key aspect of the 100×100 project was the design of better ways to manage large networks. This research led to the 4D architecture for logically centralized network control of a distributed data plane21 (which itself built upon and generalized the routing control platform work at AT&T15), Ethane (a system for logically centralized control of access control in enterprise networks),11 and OpenFlow (an open interface for installing match-action rules in network switches),28 as well as the creation of the first open source network controller, NOX.22
Early NSF-funded SDN research. In 2001, a National Academies report, Looking Over the Fence at Networks: A Neighbor’s View of Networking Research,30 pointed to the perils of Internet ossification: an inability of networks to change to satisfy new needs. The report highlighted three dimensions of ossification: intellectual (backward compatibility limits creative ideas), infrastructure (it is hard to deploy new ideas into the infrastructure), and system (rigid architecture led to fragile, shoe-horned solutions). In an unprecedented move, the NSF set out to address Internet ossification by investing heavily over the next decade. NSF investments laid the groundwork for SDN. We describe NSF investments here, through the lens of the support we received in our own research groups. Importantly, these and other government-funded research programs fostered a community of researchers that together paved the way for commercial adoption of SDN in the years that followed.
As a result, SDN has revolutionized how networks are built and operated today—the public Internet, private networks in commercial companies, university and government networks, and all the way through to the cellular networks that interconnect our smartphones.
The first commercial deployments of SDN started around 2008, and its success can be traced back to two intertwined developments that reinforced each other. The first was academic research funded mostly by the U.S. National Science Foundation (NSF). The second was cloud companies starting to build enormous datacenters, which required a new kind of network to interconnect thousands of racks of servers. In a virtuous cycle, the adoption of SDN by the hyperscalers drove further academic research, which in turn led to important new innovations and several successful start-up companies.
All of this changed with software-defined networking (SDN), where network owners took control over how their networks behaved. The key ideas were simple. First, network devices should offer a common open interface directly to their packet-forwarding logic. This interface allows separate control software to install fine-grained rules that govern how a network device handles different kinds of packets: which packets to drop, where to forward the remaining packets, how to modify the packet headers, and so on. Second, a network should have logically centralized control, where the control software has network-wide visibility and direct control across the distributed collection of network devices. Rather than running on the network devices themselves, the software can run on a separate set of computers that monitor and control the devices of a single network in real time.
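To make these two ideas concrete, the sketch below models a switch as a prioritized table of match-action rules and a logically centralized controller that installs rules on every switch it manages. It is a minimal illustration in Python; the class and field names are invented for this example and do not correspond to any particular controller's API.

```python
class Rule:
    """A match-action rule: match on header fields, then apply an action."""
    def __init__(self, match, action, out_port=None, priority=0):
        self.match = match          # e.g. {"dst_ip": "10.0.0.2"}
        self.action = action        # "forward" or "drop"
        self.out_port = out_port
        self.priority = priority

class Switch:
    """Data plane: applies the highest-priority matching rule to each packet."""
    def __init__(self, name):
        self.name = name
        self.rules = []

    def install(self, rule):
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def process(self, packet):
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return (rule.action, rule.out_port)
        return ("drop", None)       # default: drop packets that match no rule

class Controller:
    """Logically centralized control plane with network-wide visibility."""
    def __init__(self):
        self.switches = {}

    def add_switch(self, switch):
        self.switches[switch.name] = switch

    def allow_host(self, dst_ip, port_map):
        # Install one rule per switch, steering traffic for dst_ip out the
        # port the controller has computed for that switch.
        for name, port in port_map.items():
            self.switches[name].install(
                Rule(match={"dst_ip": dst_ip}, action="forward",
                     out_port=port, priority=10))

# The controller, not the switches, decides how packets are handled.
ctrl = Controller()
s1, s2 = Switch("s1"), Switch("s2")
ctrl.add_switch(s1)
ctrl.add_switch(s2)
ctrl.allow_host("10.0.0.2", {"s1": 3, "s2": 1})
print(s1.process({"dst_ip": "10.0.0.2"}))   # ('forward', 3)
print(s1.process({"dst_ip": "10.0.0.9"}))   # ('drop', None)
```

The point of the sketch is the division of labor: the switches only evaluate rules, while all policy decisions live in software with a view of the whole network.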
The Internet underlies much of modern life, connecting billions of users via access networks across wide-area backbones to countless services running in datacenters. The commercial Internet grew quickly in the 1990s and early 2000s because it was relatively easy for network owners to connect interoperable equipment, such as routers, without relying on a central administrative authority. However, a small number of router vendors controlled both the hardware and the software on these devices, leaving network owners with limited control over how their networks behave. Adding new network capabilities required support from these vendors and a multi-year standardization process to ensure interoperability across vendors. The result was bloated router software with tens of millions of lines of code, networks that were remarkably difficult to manage, and a frustratingly slow pace of innovation.
SDN Grew First and Fastest in Datacenters
The first large-scale deployments of SDN took place in hyperscale datacenters, beginning around 2010. The story is best told by the hyperscaler companies themselves, and so we asked leaders at Google, Microsoft Azure, and Meta to tell their stories about why and how they adopted SDN. As you will see, they all started from the ideas and principles that came from the NSF-funded research, and each tailored SDN to suit its specific needs and culture.
SDN at Google, as recounted by Amin Vahdat
From its inception, Google infrastructure services focused on scaling distributed systems. Our services, including data processing, storage systems, and Web search, all ran across thousands of clustered servers, which were, in turn, connected across the planet by a private wide-area network (WAN). Existing Internet protocols were ideally suited to decentralized operation and control among many independent systems, and hence were optimized for autonomy rather than for the highest levels of performance or reliability. These protocols relied on pairwise message exchange, endpoint-based measurement of available capacity, and eventual consistency for network dynamics, including failure or expansion. Inspired by both our own distributed systems architecture and the work coming out of NSF-funded university research on optically reconfigurable topologies, datacenter architecture, and network virtualization,2,31 we observed that our network systems could be fundamentally more reliable, performant, and efficient through logically centralized control. Rather than running fully decentralized protocols on limited-capacity embedded switch/router CPUs or relying on thousands of endpoints to independently converge to their fair share of network bandwidth, we reasoned that centralized controllers with a blueprint for what the network should look like would deliver dramatically higher levels of reliability and scalability. Thus, we set out to systematically redesign our networks around the principles of SDN. We built B4,23,39 our private SDN WAN connecting all our datacenters to one another across the planet, initially to deliver 10x more bandwidth between sites than would be possible using conventional architectures. B4 leveraged two observations. First, we could use shallow-buffer switches for much less space, power, and cost than traditional WAN routers by performing dynamic traffic engineering to respond to network dynamics. Second, with a private WAN, we had the freedom to modify all end-host congestion-control protocols to optimize for the dynamics and latencies of our WAN. For our datacenter networks, we built Jupiter36 to scale out to many petabits per second of bisection bandwidth per datacenter. The key observation in Jupiter was the ability to leverage massive multipath to deliver scalable bandwidth with very rapid reaction to inevitable individual switch and link failures when operating with thousands of switches and hundreds of thousands of links per building, all under logically centralized control. Finally, we extended SDN all the way to the public Internet with Espresso40 by using real-time, end-host measurements to determine the performance available through individual egress points from Google servers communicating with users across the globe. In effect, each service front end, such as those for Web search, encapsulates traffic to a Google front-end server running at the edge of our network. Since we peer with many networks at many locations across the globe, we can dynamically determine which ingress/egress point will likely deliver the best performance and reliability for a given client population based on rapidly changing, real-time, end-host measurements. Across all our work in SDN, we landed on two perhaps unexpected benefits.
First, by logically separating the control plane from the data plane, we could build a replicated, server-based control system that could be upgraded in a fault-tolerant manner using make-before-break techniques. This allowed us to ship new features and functionality on a weekly basis to our global network. In the end, this rapid iteration was one of the biggest determinants of our speed. By moving every week rather than once every six to 12 months in maintenance windows, we could learn and improve our networks at a much faster rate. Second, our centralized view of network dynamics allowed us to respond to failures and congestion much more rapidly than standard decentralized protocols that relied on eventually consistent, and typically lagging, views of network state. Although initially focusing on cost and capability, we found our SDN designs delivered 10x higher levels of reliability compared to conventional designs, in part also based on our ability to iterate very rapidly in production. This improved reliability was perhaps, in the end, the most gratifying aspect of Google’s SDN journey.
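To give a flavor of the measurement-driven egress selection that Vahdat describes for Espresso, here is a minimal sketch of the idea. The peering points, prefixes, metrics, and scoring weights below are invented purely for illustration and are not Google's actual design.

```python
# Hypothetical example: pick the egress peering point for a client prefix
# using recent end-host measurements. All names and numbers are invented.

measurements = {
    # (egress_point, client_prefix) -> recent performance samples
    ("peer-A", "198.51.100.0/24"): {"rtt_ms": 23.0, "loss_pct": 0.1},
    ("peer-B", "198.51.100.0/24"): {"rtt_ms": 31.0, "loss_pct": 0.0},
    ("peer-C", "198.51.100.0/24"): {"rtt_ms": 19.0, "loss_pct": 2.5},
}

def score(sample, loss_penalty_ms=20.0):
    """Lower is better: latency plus a penalty per percent of packet loss."""
    return sample["rtt_ms"] + loss_penalty_ms * sample["loss_pct"]

def best_egress(prefix):
    candidates = {egress: s for (egress, p), s in measurements.items() if p == prefix}
    return min(candidates, key=lambda egress: score(candidates[egress]))

print(best_egress("198.51.100.0/24"))   # peer-A: best combined latency/loss score
```

As the measurements change, so does the chosen egress point, which is the essence of steering traffic from a logically centralized view rather than from static routing policy.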
SDN at Microsoft Azure, as recounted by Dave Maltz
The economics of a cloud service provider are largely dependent on SDN—we buy very large quantities of rather homogeneous server capacity at a low cost, and then we use techniques like SDN to carve that vast capacity up into slices that allow each tenant to specify exactly the network policies and behaviors they want. While each cloud service provider has innovated to turn the ideas of SDN into scalable and reliable systems that handle billions of changes a day, the foundational principles date back to the NSF-funded research. Beyond the intellectual ideas of SDN, the human relationships enabled by NSF funding have been just as critical. For example, by funding the 100×100 project, the NSF brought together people across disciplines who went on to collaborate on many other projects that each had huge economic and technical impact. The VL2 paper20 that laid out the principles now used in most cloud and private datacenter networks stemmed from conversations between router architects and network-management researchers initially brought together by the 100×100 project. The inspiration was asking, “What would it take to operate an entire datacenter using the ideas employed inside a single router?” Economic analysis performed jointly by economists and technologists led to decisions on the infrastructure and architecture in which we should invest. NSF funding functions not only to call attention to areas that need more research but also to nucleate groups of leaders who would otherwise not interact. The SDN concept of separating the network control logic from the forwarding plane has influenced technologies far from its origin. For example, the idea of disaggregated networking holds that network architects should be able to select the network operating system that controls their routers independently from the router hardware itself, and they should be able to customize that network operating system at will. The Software for Open Networking in the Cloud (SONiC) operating system is deployed on millions of ports across major cloud and Internet service providers, private datacenters, and industrial and retail companies. SONiC has enabled the rapid development and deployment of new ideas for AI, optical, enterprise, and network monitoring, and it traces its roots back to SDN.
SDN at Meta, as recounted by Omar Baldonado
The advent of SDN coincided exactly with the initial rapid expansion of Meta (then Facebook) and our infrastructure in the late 2000s and early 2010s. While the general networking industry had greatly expanded with the Web, we in Facebook’s networking team still required increased control and flexibility in our growing network. After studying some of the early successes in NSF-sponsored SDN research and early deployments, we started several of our own SDN-inspired programs.
Breadth of inspiration. At Meta, we built on the SDN concepts of a data-plane API and a logically centralized control plane, and we developed hybrid control planes that leveraged both central controllers and distributed protocols. We deployed these systems throughout our global network, including in our wide-area IP backbone (Express Backbone,13 central controllers coupled with distributed, OpenR-based routing); in our Edge networks, where we connect to the Internet (Edge Fabric,34 a BGP-based controller that controls egress traffic); and in our datacenter networks, where we have centralized controllers working with our own distributed protocol implementations.1
Improving reliability and operations. More generally, SDN inspired us to reimagine software systems throughout the network lifecycle. Central network management systems had been a mainstay of networking pre-SDN, but with SDN, we more tightly coupled our central configuration and monitoring systems (Robotron and fbflow) with automation and control for higher network reliability.
Open disaggregation. NSF’s GENI project and early commercial SDN adopters helped kick-start white-box switches. At Meta, we wanted to generalize this with an open source ecosystem, so we started the networking area within the Open Compute Project with several other hyperscaler companies to disaggregate the entire networking hardware and software stack. The OCP Networking Project fostered sharing hardware designs and hardware-related software. We now run our open source FBOSS software12 on switches that leverage OCP’s switch abstraction interface (SAI).
Networking for AI. In the last few years, we have continued to leverage the concepts of SDN as we build massive, highly performant networking clusters for AI workloads. For these AI clusters, we have leveraged and optimized the same FBOSS-based Ethernet switches and centralized and disaggregated software systems for the training and inference needs of our models. As we look ahead, we’re effectively building global, networked supercomputers for AI.38
Overall, NSF’s foresight in funding SDN has inspired many innovations and open collaborations in networking for Meta over the last 10 to 15 years. As we expand our infrastructure to build even larger clusters for personalized artificial general intelligence (AGI) and the AI needs of Meta’s family of applications, we will continue leveraging SDN-inspired concepts and systems.
Internet service providers (ISPs) and telecommunication companies also had a strong interest in SDN. AT&T played a large role in its definition, engaging in research and early deployments in the mid-2000s. We invited Albert Greenberg, who was at AT&T at the time, to tell the story.
SDN at AT&T, as recounted by Albert Greenberg
In the 1990s at AT&T Labs, the guiding principles of SDN were born. AT&T’s global WAN (AT&T’s common backbone) had to be designed, built, and managed cost-effectively as mission-critical infrastructure. Moreover, AT&T’s common backbone had to adapt to stunning rates of Internet traffic growth, while being resilient to a huge array of operational failure modes—failures at the switches, at the links, in peer networks, and in customer networks. In the midst of constant and global capacity build-outs, failures, and repairs, the network had to deliver consistent, predictable, and high-quality service. Consider the building blocks. IP network mechanisms are designed to be super-adaptive—just plug in a device, let it learn its neighbors and reachable destinations, and then the traffic flows. Easy. This extreme simplicity and lack of central control made IP networks amazingly successful, relegating all earlier high-scale technologies to the dustbin of history. That said, these are just building blocks. To deliver high-quality service, systems and tools were needed to program the network, on top of the building blocks. At AT&T Labs, many such systems and tools were created, which are still at the heart of high-scale network design and operations across the industry—high-scale traffic matrix-estimation methods; network simulators and code emulators; massive data pipelines to ingest, clean, and make decisions from network telemetry; and programmable configuration systems based on network-wide views and network-wide objectives.15,21 Sadly, SDN in these early days did not support direct expression of customer or operator intent. For example, SDN methods maintaining service integrity during emergency maintenance required orchestration with humans in the loop. As AT&T entered the cloud era, APIs for expressing and realizing intent improved, in particular in bridging hundreds of thousands of AT&T customers into virtualized networks in the cloud with no human touch via NetBond, leveraging the Intelligent Routing Service Control Point (IRSCP) and earlier SDN tools.15,29 NSF-funded academic collaborations also helped fuel SDN research, at AT&T Labs and across the SDN landscape. At AT&T Labs, tremendous innovation arose, matching engineers, researchers, and interns working towards graduate degrees. Interns earned industry dollars for a few months, along with invaluable insights and reputation, which turned into their PhD theses and careers. Ideas and prototypes were validated in the NSF-sponsored developments at Emulab and in university labs behind projects such as the 100×100 Clean Slate project, many of which made their way into production. Out of the 100×100 Clean Slate project, the 4D paper emerged,21 one of the seminal papers that sparked the SDN revolution. The missing link needed to fulfill the 4D vision was device programmability, which arrived with amazing commodity or white-box switching hardware from Broadcom and others and powerful commodity servers from Intel and others. On that substrate, open source and proprietary software emerged in a flood of innovation, which moved to the cloud in the 2000s. Like-minded engineers working on the AT&T project (Greenberg, Maltz) and the Emulab project (Patel) moved on to harvest that power for the cloud.
SDN insights embodied in the VL2 paper20 paved the way for Microsoft Azure to create and program vast numbers of high-scale virtual networks in seconds and operate them reliably with previously unimaginable power and adaptivity. At the same time, at AT&T, the SDN revolution flourished under John Donovan for cellular mobile and wide-area networks, delivered in partnership with companies like DriveNets and Affirmed Networks (later acquired by Microsoft).
Nicira was perhaps the startup that epitomized the SDN movement. It grew out of the NSF-funded 100×100 program and the Clean Slate Program at Stanford, based on the Ph.D. work of Martín Casado. Nicira developed ONIX, the first distributed control plane, used by Google in its infrastructure; OVS, the first OpenFlow-compliant software switch; and NVP (later NSX), the first network virtualization platform. We invited Teemu Koponen, a principal architect at Nicira, to tell the story.
Nicira: SDN and Network Virtualization, as recounted by Teemu Koponen
Virtualization is now foundational across servers, storage, and networking in datacenters, but these capabilities did not take hold simultaneously. Server virtualization was the first to gain traction, giving rise to the multi-tenant datacenter and initiating a fundamental shift in datacenter networking. A new access layer emerged, where most ports were virtual rather than physical, managed by a virtual switch residing within the hypervisor instead of the top-of-rack switch. Early hypervisors connected VMs directly to physical networks, tightly coupling them to the underlying topology and addressing. At this stage, the network itself was not yet virtualized—there were no virtual networks with independent service models, topologies, or addressing spaces that operated over shared physical infrastructure. This created significant operational challenges: Provisioning and migrating workloads required changes to the physical network, and scalability was inherently constrained. It was under these circumstances that the startup Nicira emerged in 2007, building on the NSF-funded Ethane research project11 at Stanford and Berkeley. Drawing from two key intellectual foundations, the company set out to bring network virtualization to enterprise environments. First, it embraced the principles of SDN by decoupling the control plane from the data plane. Second, it drew from research exploring network virtualization outside traditional enterprise networks (for example, Peterson et al.32). This convergence of academic insight and enterprise need led to two major innovations. The first was Open vSwitch (OVS), a production-grade OpenFlow software switch.33 As operational pressures pushed virtual switches to become the primary provider of network services for VMs, the physical datacenter network was primarily responsible for transporting IP-tunneled packets between hypervisors. This division of responsibilities called for a highly programmable virtual switch: Instead of acting as a static interface to the physical network, each virtual port had to implement the virtual network abstraction according to its current configuration. Nicira created OVS to meet this need, and it quickly became a foundational component enabling network virtualization in datacenters worldwide. The second innovation addressed the control plane: Network virtualization required a control plane capable of computing and coordinating rich, distributed state across all virtual switches—effectively establishing and maintaining the virtual network abstractions. While centralization offered the necessary flexibility,22 real-world demands for scalability and availability necessitated distribution. To resolve this tension, Nicira turned not to traditional distributed routing algorithms but to distributed systems principles, achieving a more favorable balance between flexibility and design complexity. This approach was realized in Onix,27 the first distributed, general-purpose controller platform for managing network resources. Onix influenced subsequent distributed controllers, which became a key element in the widespread commercial success of SDN. Nicira productized these innovations in its Network Virtualization Platform (NVP) for multi-tenant enterprise datacenters.26 In 2012, VMware acquired Nicira, and NVP became the foundation of VMware NSX, a widely deployed network virtualization platform.
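The division of labor Koponen describes, with virtual switches implementing per-tenant abstractions while the physical fabric only carries tunneled IP traffic between hypervisors, can be sketched in a few lines. The table layout, names, and encapsulation fields below are invented for illustration; this is not OVS or NVP code.

```python
# Controller-computed state: which hypervisor hosts each tenant VM.
vm_location = {
    ("tenant-blue", "10.0.0.2"): "hypervisor-2",
    ("tenant-red",  "10.0.0.2"): "hypervisor-3",   # same address, different tenant
}
hypervisor_ip = {"hypervisor-2": "192.0.2.12", "hypervisor-3": "192.0.2.13"}

def vswitch_forward(tenant, packet, local_hypervisor):
    """Encapsulate a tenant packet toward the hypervisor hosting its destination VM."""
    dst = vm_location.get((tenant, packet["dst_ip"]))
    if dst is None:
        return ("drop", None)                # unknown destination: keep tenants isolated
    if dst == local_hypervisor:
        return ("deliver-local", packet)     # destination VM is on this host
    return ("tunnel", {
        "outer_dst": hypervisor_ip[dst],     # all the physical fabric ever sees
        "tenant_id": tenant,                 # demultiplexing key at the far end
        "inner": packet,
    })

# Two tenants reuse the same address space without interfering.
print(vswitch_forward("tenant-blue", {"dst_ip": "10.0.0.2"}, "hypervisor-1"))
print(vswitch_forward("tenant-red",  {"dst_ip": "10.0.0.2"}, "hypervisor-1"))
```

The controller computes the location and tunnel tables; the virtual switches only apply them, which is why two tenants can use overlapping IP address spaces over the same physical network.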
During the early 2010s, the networking industry began to realize that SDN has many big advantages. It lifts complex protocols up and out of the switches into the control plane, where the control logic is written in a modern programming language. This made it possible to reason about the correctness of the protocols simply by examining the software controlling the network and the forwarding state maintained by the switches. For the first time, it became possible to formally verify the behavior of a complete network.
Researchers, startups, network equipment vendors, and hyperscalers have all taken advantage of SDN principles to develop new ways to verify network behavior. We invited Professor George Varghese, who has been deeply involved in network verification research, to give us his perspective.
From SDN to Network Verification, as recounted by George Varghese
Large enterprise networks have diverse components, such as routers, firewalls, and load balancers; even basic questions, such as reachability—who can talk to whom—are hard to answer. Errors can lead to outages or security failures. SDN provided a conceptual launch pad for a set of tools to compute reachability properties of complex networks. Besides separating data and control, SDN showed that—regardless of classical layers like Ethernet and IP—forwarding in routers could be abstracted as predicates, as rules on packet headers. The same insight, taken further to compute network-wide reachability, was used to start the field of network-specific formal verification of network reachability with Veriflow25 and Header Space Analysis (HSA).24 HSA began with a ternary algebra devised for analyzing sharing in SDNs that was generalized24 to answer data-plane reachability questions in distributed (non-SDN) IP networks. Veriflow was commercialized and later acquired by VMware. HSA was commercialized by Forward Networks, which still exists 10 years later and serves around 40 Fortune 500 customers. In the second generation, SDN again pointed the way to move beyond data-plane verification to control-plane verification to prove similar reachability guarantees but across changes in routing, generalizing from centralized SDN control to distributed IP routing, such as OSPF and BGP. Now the analysis must reason not just across all packet headers but also across all routing environments. Batfish17 began as a control-plane “simulator” and is used today by 75+ large companies. Its careful software structuring led to it becoming the “LLVM of network configurations” on which others have built tools. The first tool to formally reason across control-plane environments was Minesweeper,7 which showed that it sufficed to reason about the stable states of distributed routing algorithms like BGP, thus approaching the simplicity of centralized SDN control planes. An NSF Large proposal called “Network Design Automation” sought to generalize network verification to a broader set of tools for networks, akin to those used in electronic design automation for chips, fleshing out a vision articulated by McKeown in a SIGCOMM 2012 keynote. Among other artifacts, the grant produced a tool called Lightyear,37 which allowed control-plane verification to scale to large networks (beyond Minesweeper) by exploiting locality; it is used by Microsoft Azure. SDN helped the community create abstractions and tools to tackle the more complex control plane of legacy protocols. These efforts were all seeded and sustained by NSF. Veriflow and Batfish were both supported by multiple NSF grants and commercialized using an NSF SBIR. HSA was supported by the POMI Expedition Award, and Minesweeper and Lightyear by other NSF grants. Much other pioneering NSF-funded work in network verification is omitted here for lack of space; suffice it to say that, starting with intellectual roots in SDN and fueled by NSF funding, network verification is now a thriving subfield in industry and academia.
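A toy version of the data-plane reachability question Varghese describes can be written directly from the abstraction of forwarding rules as predicates on headers. The topology and rules below are invented, and the check follows one concrete destination address rather than reasoning symbolically over all headers as tools like HSA and Veriflow do; it is only meant to show the shape of the question.

```python
import ipaddress

# device -> list of (prefix, next hop); "deliver" means the packet has arrived.
fib = {
    "edge-1":   [("0.0.0.0/0", "firewall")],
    "firewall": [("10.1.2.0/24", "drop"), ("0.0.0.0/0", "core")],
    "core":     [("10.1.0.0/16", "edge-2"), ("0.0.0.0/0", "drop")],
    "edge-2":   [("10.1.0.0/16", "deliver")],
}

def next_hop(device, dst_ip):
    """Longest-prefix match over the device's rules."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, hop in fib[device]:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else "drop"

def reachable(src_device, dst_device, dst_ip, max_hops=8):
    """Can a packet for dst_ip injected at src_device reach dst_device?"""
    device = src_device
    for _ in range(max_hops):            # bound the walk; a loop indicates a bug
        hop = next_hop(device, dst_ip)
        if hop == "drop":
            return False
        if hop == "deliver":
            return device == dst_device
        device = hop
    return False

print(reachable("edge-1", "edge-2", "10.1.3.7"))   # True: edge-1 -> firewall -> core -> edge-2
print(reachable("edge-1", "edge-2", "10.1.2.7"))   # False: the firewall drops 10.1.2.0/24
```

Real verification tools answer this question for every possible header at once, and control-plane verifiers such as Batfish and Minesweeper additionally compute the forwarding tables from router configurations before asking it.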
A main benefit of SDN is that it hands control from the networking equipment vendors—who kept their systems closed and proprietary, and hence tended to evolve slowly—to software programmers, who could define the behavior for themselves, often in open source software. And indeed it happened: Today, most large networks are controlled by software written by those who own and operate the networks rather than by networking equipment vendors.
But what about the hardware? Switches, routers, firewalls, and network interface cards are all built from special-purpose ASICs—highly integrated, cost-effective, and super-fast. The problem was that the features and protocols that operated on packets (for example, forwarding, routing, firewalls, and security) were all baked into the hardware at the time the chip was designed, two to three years before it was deployed. What if the network owner and operator needed to change and evolve the behavior of their network, for example to add a new way to measure traffic or a new way to verify behavior? A group of researchers and entrepreneurs set out to make switches and NICs programmable by the user, to allow more rapid improvement and give the operator greater control. Not only did new programmable devices emerge, but so did a whole open source movement around the P4 programming language.
We invited Professor Nate Foster, who leads the P4 language ecosystem, to tell the story of how programmable forwarding planes came about.
P4 and Networking DSLs, as recounted by Nate Foster
If the first phase of SDN put network owners in control of the control plane, the second gave them control of the forwarding plane as well. The P4 language (“Programming Protocol-Independent Packet Processors”)9 provides network-specific constructs for describing how packets are parsed, processed in hardware and software pipelines, and forwarded—rather than accepting whatever behaviors are baked into a device. Today, P4 has grown into a vibrant open source ecosystem with support across a range of targets, including switch ASICs, smartNICs, and software switches. It is used both for programming P4-enabled targets and for validation of non-programmable targets.3 P4 built on an earlier wave of academic research on networking DSLs, much of it supported by federal funding. Frenetic adapted ideas from functional reactive programming (FRP) to SDN, enabling modular programming and high-level coordination of control and data planes.18 NetKAT extended Kleene Algebra with Tests (KAT) to provide a rigorous framework for specifying and reasoning about network-wide forwarding behavior.4 NetKAT offers a sound and complete equational theory, along with push-button verification based on symbolic automata. These languages elevated the level at which programmers operate, replacing low-level configuration knobs with network-wide, modular, and verifiable abstractions. The influence of these ideas can be seen in industry: Intent-based networking adopts the high-level specifications pioneered by early SDN DSLs, while NetKAT is used commercially to verify cloud-isolation policies. This progress reflects deep collaboration between networking and programming languages researchers, catalyzed by pivotal NSF investments over more than a decade. A 2011 award supported foundational work on Frenetic and NetKAT, as well as a 2013 summer school on network verification. Subsequent NSF programs, including Formal Methods in the Field (FMiTF), advanced the field further—supporting probabilistic extensions to NetKAT and the development of formal semantics and verification tools for P4.14
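The flavor of these language abstractions can be conveyed with a tiny Python model of NetKAT-style policies, where a policy maps a packet to the set of packets it may produce. This is only an illustrative model under simplifying assumptions (it omits NetKAT's dup construct, Kleene star, and its equational theory); it is not the NetKAT or P4 implementation.

```python
# Predicates keep or drop a packet; modifications rewrite a header field.
def test(field, value):
    return lambda pkt: [pkt] if pkt.get(field) == value else []

def assign(field, value):
    return lambda pkt: [dict(pkt, **{field: value})]

def seq(p, q):        # sequential composition: feed p's outputs into q
    return lambda pkt: [out for mid in p(pkt) for out in q(mid)]

def par(p, q):        # parallel composition: union of the two behaviors
    return lambda pkt: p(pkt) + q(pkt)

# "Send 10.0.0.2 out port 1 and 10.0.0.3 out port 2; drop everything else."
policy = par(seq(test("dst", "10.0.0.2"), assign("port", 1)),
             seq(test("dst", "10.0.0.3"), assign("port", 2)))

print(policy({"dst": "10.0.0.2", "port": 0}))   # [{'dst': '10.0.0.2', 'port': 1}]
print(policy({"dst": "10.0.0.9", "port": 0}))   # []  (no branch matches, so the packet is dropped)
```

Because policies compose algebraically, whole-network behavior can be built from small, reusable pieces and, in the real languages, checked against specifications rather than tested rule by rule.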
So far, we have focused on SDN in wireline networks running over electrical and optical cables in datacenters, enterprises, and long-haul WANs. SDN was originally defined with wireline networks in mind.
Yet, for cellular networks, the most widely used networks in the world, the need was even greater: Cellular networks have been held back for decades by closed, proprietary, and complex “standards” designed to allow equipment vendors to maintain a strong grip on the market. SDN provides an opportunity to open up these networks by introducing well-defined control APIs and interfaces and by moving control software to common operating systems running on commodity servers.
This story has only just begun, but it started thanks to NSF-funded research in the mid-2000s and was then boosted by DARPA-funded programs to support open source software for cellular infrastructure. We invited Guru Parulkar and Oğuz Sunay, both of whom developed open source cellular systems at the Open Networking Foundation and for the DARPA-funded Pronto project, to tell the story.