Thermodynamics rules future orbital data centers

“Space computing, the final frontier, has arrived,” Nvidia CEO Jensen Huang declared at the Nvidia GTC conference in March.

Indeed, the idea of data centers in orbit has gone from science fiction to a serious spending category. Elon Musk’s SpaceX has acquired xAI (also Musk’s) and is planning a constellation of space-based data centers. Google, not to be outdone, announced Project Suncatcher in partnership with Planet, planning to launch two satellites equipped with Google Tensor Processing Unit (TPU) AI chips by early 2027. Startup Starcloud has already filed a proposal with the Federal Communications Commission for an 88,000-satellite constellation for orbital data centers. As Starcloud’s filing suggests, these companies are all proposing fleets of satellites numbering in the thousands, each housing a rack or multiple racks of AI-grade GPUs, interconnected with each other through free-space optical links and communicating back to Earth via microwave links, either directly or through other satellites.

Proponents tout the many wonders of computing in space: abundant solar energy, free cooling, and freedom from Earth-based disturbances like earthquakes, floods, and protesters. But a sober look at the physics of space-based computing paints a much more nuanced picture.

Free cooling is perhaps the biggest misconception. Space is cold, but it also has no atmosphere. That means the best heat-removal mechanisms, conduction and convection, are off the table. The only option is radiation. To keep a chip from overheating in space, its heat must first be spread across a large, costly radiator surface and then radiated away.

Solar energy is abundant, but collecting it with functional solar panels that maintain perfect alignment toward the sun is a complex task requiring extensive attitude control systems. On top of that, ionizing radiation in space from cosmic rays and other sources poses a unique challenge, degrading the solar panels, the radiative coolers, and the chips themselves. Because regular maintenance in space is difficult, redundancy has to be built in at launch, and cost estimates have to account for efficiency degradation over time.

At ABI Research, where I work as an aerospace analyst, we did a rough total-cost-of-ownership comparison between a data center on Earth and one in space. It showed that the cost to launch and run a GPU in space for a year is at least an order of magnitude higher than the same feat in a terrestrial data center. Our model was simple, assuming an Nvidia H100 server rack launched with the requisite-size solar panel and radiator on a spacecraft akin to Starcloud’s pilot launch. We assumed SpaceX’s Starship was used at a highly optimistic launch cost per kilogram of US $44, and a terrestrial energy cost of $0.20 per kilowatt hour. This is a simple back-of-the-envelope calculation, but it does signal something real.

From our perspective, the cost of delivery and space hardening of the payload makes general-purpose space-based data centers difficult to justify economically today, despite the fact that data-center builders in many regions are scrambling for electric power. However, there are niche applications where the much higher costs of computing in space could be justified. Examples include preprocessing data from Earth-observation satellites, real-time detection and tracking of hypersonic missiles, and active collision avoidance in the increasingly crowded low Earth orbit. Even for these, though, contending with fundamental physics will still be a demanding challenge. And a technologically compelling one, too.

The Cooling Challenge in Space

Cooling is where physics separates the science from the fiction. The governing equation for radiative cooling, the only type of cooling available in space, is known as the Stefan-Boltzmann law. It states that the amount of power you can radiate is proportional to the area of the radiator times its temperature to the fourth power. For a space systems architect, the implications of this law are brutal. Because the radiator's temperature is capped by what the chips can tolerate, the only variable we can control in orbit is area. This restriction creates a geometric penalty, or a “physics tax,” for cooling in space: The more power you need to reject, the bigger the area of the radiator you need to bring along from Earth.
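
Written out, the law takes roughly the following form. The symbols are introduced here for illustration and are not defined in the article: P is the heat to reject, A the radiator area, T the radiator temperature, T_bg the background temperature, ε the surface emissivity (between 0 and 1), and σ the Stefan-Boltzmann constant, about 5.67 × 10⁻⁸ W·m⁻²·K⁻⁴.

```latex
P = \varepsilon \, \sigma \, A \left( T^{4} - T_{\mathrm{bg}}^{4} \right)
\qquad \Longrightarrow \qquad
A = \frac{P}{\varepsilon \, \sigma \left( T^{4} - T_{\mathrm{bg}}^{4} \right)}
```

With T held fixed, the required area grows in direct proportion to P, and it grows further as ε falls.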

To understand how big this baseline area is in practice, I used the Stefan-Boltzmann law to model the heat-rejection area needed to keep a single chip that draws 700 watts of power (such as the H100 GPU chip, an AI stalwart) at a constant 60 °C, usually considered the sweet spot for GPU longevity and stability. I further assumed that the radiator is perfectly facing deep space, at a chilly background temperature of 3 kelvins. By this calculation, a single chip would require 1.4 square meters of radiator surface.

To put this into perspective, consider that a common AI rack can hold approximately 32 GPUs (four H100 server boards). With CPUs, memory, and networking equipment, this rack would draw around 40 kilowatts of power. This single rack includes 2.5 terabytes of memory, enough capacity to serve over 20,000 concurrent users or run 16 simultaneous instances of Llama 3, an open-source AI model. But to cool this thermal load in a vacuum, that single rack would require an 80-square-meter radiator, roughly the size of a pickleball court. For an aggregate 100-megawatt data center, you’d need at least 2,500 of those radiators.

And that’s the best-case scenario. Additional problems are hidden in the low Earth orbit environment itself. Space exposes radiators and their coatings to a chemically hostile brew of ultraviolet light and atomic oxygen, quite the opposite of a clean-room environment. Over a LEO satellite’s typical 5-year lifespan, these elements degrade the radiator’s surface properties and lower its ability to shed heat.

Including this degradation in the model reveals that as the radiator degrades from a “fresh” state to an “end-of-life” state, the physics demands a further penalty. To maintain that same 60 °C operating temperature for the GPU chips, the required surface area jumps from about 1.4 square meters per chip to nearly 2.0 square meters. In other words, the physics tax rises by 40 percent.
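
The figures above can be cross-checked with a few lines of arithmetic. The sketch below is not the author's model. It computes the ideal blackbody flux at 60 °C, then back-calculates an "implied factor" from the quoted 1.4 square meters per chip. That factor is an assumption of this sketch, lumping together emissivity, any chip-to-radiator temperature drop, and other losses the article does not itemize. All function and variable names are illustrative.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiated_flux(temp_c, background_k=3.0, emissivity=1.0):
    """Net power radiated per square meter by a one-sided radiator."""
    temp_k = temp_c + 273.15
    return emissivity * SIGMA * (temp_k**4 - background_k**4)


def radiator_area(power_w, flux_w_per_m2):
    """Area needed to reject power_w at the given flux."""
    return power_w / flux_w_per_m2


# Ideal case: a perfect blackbody at 60 C facing a 3 K background.
ideal_flux = radiated_flux(60.0)               # about 698.5 W/m^2
ideal_area = radiator_area(700.0, ideal_flux)  # roughly 1.0 m^2 for 700 W

# Figures quoted in the article.
fresh_area_per_chip = 1.4  # m^2
eol_area_per_chip = 2.0    # m^2, end of life

# Back-calculated, NOT stated in the article: the effective flux and the
# lumped factor that would turn the ideal area into the quoted one.
fresh_flux = 700.0 / fresh_area_per_chip       # about 500 W/m^2
implied_factor = fresh_flux / ideal_flux       # about 0.72

# Rack and data-center scale, reusing the same effective flux.
rack_area = radiator_area(40_000.0, fresh_flux)  # about 80 m^2
racks_needed = 100e6 / 40e3                      # 2,500 racks

# Extra area demanded by coating degradation (the article rounds to 40%).
area_growth = eol_area_per_chip / fresh_area_per_chip - 1

print(f"ideal area per chip: {ideal_area:.2f} m^2")
print(f"implied lumped factor: {implied_factor:.2f}")
print(f"rack radiator: {rack_area:.0f} m^2, racks: {racks_needed:.0f}")
print(f"end-of-life area growth: {area_growth:.0%}")
```

The gap between the ideal area of roughly 1.0 square meter and the quoted 1.4 is what one would expect once real surface properties and thermal paths are included, though the article does not break that gap down.
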
Therefore, you must launch at least 40 percent more radiator mass, endure higher atmospheric drag, and sacrifice valuable launch volume just to survive the degradation of the thermal coating. This increase adds significantly to the launch cost and further erodes the economics of a space-based data center.

The Silicon Challenge in Space

Solving the heat problem is only part of the battle. The other significant challenge in low Earth orbit is ionizing radiation, which affects the computing hardware itself. Today’s satellites typically use radiation-hardened processors, which are very reliable but also much more expensive, and they perform poorly compared to commercial off-the-shelf processors. A standard rad-hard chip doesn’t have the processing power to run a modern large language model (LLM).

As a result, satellite operators aspiring to launch a data center have no choice but to make a risky compromise: to use hardware meant for terrestrial use. In order to achieve the necessary compute density, orbital data centers must use the same Nvidia H100s or Google TPUs found in terrestrial server farms. The problem is that these chips are “soft” targets in space. High-energy particles can flip bits in memory or cause “latch-ups” in logic that fry the circuit.

One possible option is to shield the computers from radiation with thick, absorbent panels. However, the shielding would add significantly to the already heavy satellites. The other option is to compensate for the radiation damage with redundancy. Indeed, edge computing architects are moving toward software-defined resilience, where instead of one perfectly hardened computer, operators fly a cluster of imperfect, commercial ones whose total cost could be as low as one-tenth to one-hundredth that of the rad-hard model.
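
One common way to get resilience from a cluster of imperfect nodes is majority voting: run the same calculation on several nodes, compare the answers, and treat any node that disagrees with the majority as suspect. The sketch below is a minimal illustration of that idea, not the flight software of any system named in this article. The names `majority_vote` and `checksum` and the simulated bit flip are hypothetical.

```python
from collections import Counter


def majority_vote(results):
    """Return (agreed_value, suspect_indices) for replicated results.

    Raises ValueError when no value has a strict majority, since the
    correct answer cannot then be determined by voting alone.
    """
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise ValueError("no majority among replicas")
    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects


def checksum(data):
    """Stand-in for the real workload: a simple modular sum."""
    return sum(data) % 65521


payload = [3, 1, 4, 1, 5, 9, 2, 6]
good = checksum(payload)

# Simulate a radiation-induced single-bit flip in node 1's answer.
corrupted = good ^ (1 << 4)

value, suspects = majority_vote([good, corrupted, good])
print(f"agreed value: {value}, nodes to reboot: {suspects}")
```

Note the cost built into this design: three nodes deliver one node's worth of useful results.
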
This redundant approach is used in many spacecraft, including Artemis II, which recently carried astronauts around the moon, as well as SpaceX’s flight computers and the Hewlett Packard Enterprise edge servers for the International Space Station. By running three (or more) instances of the same calculation on three different nodes and comparing the answers, the system can detect a corrupted processor. If a node fails, the “orchestrator” reboots it while the others continue the mission. While this ensures resiliency, it also means that some fraction of the compute capacity is dedicated to redundancy, further increasing the costs.

The Energy Challenge in Space

An often-touted advantage of space-based data centers is the seemingly unlimited supply of free, clean energy from the sun. Solar energy in orbit is indeed abundant, at 1,361 watts per square meter. Of course, capturing that free energy is made possible only by the very costly launching of large solar panels into orbit. And those solar panels also degrade over time due to radiation exposure, typically losing 1 to 3 percent efficiency per year.

Let’s say a solar array collects 1 MW of power to run an AI cluster. The laws of physics demand that the satellite must eventually radiate 1 MW of waste heat. Because the power a square meter of solar panel can generate (around 400 W/m²) and the heat a square meter of radiator can reject (around 450 W/m²) are nearly equivalent, every square meter of power generation demands approximately another square meter of cooling. The radiator needs to be a structural equal, not merely a passive coating on a surface used for something else.

As Elon Musk recently noted in Davos, the most efficient radiator is one that never sees the sun. By orienting the spacecraft so the solar panels face the sun and the radiators face the deep vacuum of space, efficiency skyrockets for both.
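
As a quick check on the area balance in this section, the sketch below works through the 1 MW example using the quoted power densities. It also estimates how much panel output remains after five years, which rests on two assumptions not stated in the article: that the 1 to 3 percent annual loss compounds year over year, and that the 5-year satellite lifespan mentioned earlier applies. All names are illustrative.

```python
def area_m2(power_w, flux_w_per_m2):
    """Surface area needed to collect or reject power_w at a given flux."""
    return power_w / flux_w_per_m2


def remaining_fraction(annual_loss, years):
    """Fraction of original output left, assuming compounding annual loss."""
    return (1.0 - annual_loss) ** years


POWER_W = 1_000_000  # the 1 MW example from the text

panel_area = area_m2(POWER_W, 400.0)     # 2,500 m^2 of solar panel
radiator_area = area_m2(POWER_W, 450.0)  # about 2,222 m^2 of radiator
ratio = radiator_area / panel_area       # about 0.89 m^2 cooling per m^2 solar

best_case = remaining_fraction(0.01, 5)   # about 95% of output left
worst_case = remaining_fraction(0.03, 5)  # about 86% of output left

# Panel oversizing at launch needed to still deliver 1 MW at end of life.
oversize_worst = 1.0 / worst_case         # about 1.16x

print(f"panel: {panel_area:.0f} m^2, radiator: {radiator_area:.0f} m^2")
print(f"radiator-to-panel ratio: {ratio:.2f}")
print(f"output after 5 years: {worst_case:.1%} to {best_case:.1%}")
print(f"worst-case panel oversizing: {oversize_worst:.2f}x")
```

Under these assumptions, the radiator ends up almost as large as the solar array itself, consistent with the "structural equal" point above.
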
But there’s a catch: Maintaining this perfect three-way alignment (panels to sun, radiator to the void, antennas to Earth) requires complex, high-torque attitude control systems. So this configuration means more payload and more computing power. Plus, these control systems are complex components with many failure modes, which is not optimal in a situation where maintenance is difficult.

The Killer Apps for Computing in Space

Given all these challenges of deploying massive radiators for satellites in the hostile environment of space, why build data centers in space at all? While training or inference on LLMs in space doesn’t seem economical today, there are other, very compelling applications for computing in space. Here are two: solving the downlink bottleneck from Earth-observation satellites and enabling collision-preventing maneuvers in the increasingly crowded low Earth orbit.

The latest Earth-observation satellites, equipped with hyperspectral and synthetic aperture radar sensors, are used for a range of important reconnaissance missions, such as battlefield intelligence, tracking the global shadow fleet of ships carrying contraband, and assessing earthquakes or infrastructure failures down to the millimeter. These systems can generate hundreds of terabytes of raw data per day that must be transmitted to Earth. However, the radio-frequency “pipes” used to downlink the data are congested, and the ground infrastructure cannot absorb the sheer volume of raw data.

Another immediate, mission-critical application for in-space computation is protecting the orbital environment. With over 17,000 satellites in orbit, the overwhelming majority of which are in low Earth orbit, avoiding collisions between these satellites is crucial. As NASA astrophysicist Donald Kessler pointed out back in 1978, a single space collision could cause a cascading effect that renders the entirety of LEO unusable.
According to SpaceX’s recent annual report, the Starlink constellation executes a collision avoidance maneuver every 2 minutes on average. Each maneuver already relies on onboard AI systems but still requires most of the processing to happen on the ground.
