
Content-defined chunking added to Bazel

Why This Matters

The addition of Content-Defined Chunking (CDC) to Bazel's build caching significantly reduces data transfer and storage by syncing only the changed parts of large build outputs. This matters most for projects with large, composite artifacts such as binaries, bundles, and archives, where a small edit previously forced the whole output to move through the cache; reusing unchanged chunks means faster builds and lower resource consumption. It is a step toward more granular, byte-level build caching.

Key Takeaways

The goal: move the changed bytes, not the whole output.

BuildBuddy's Remote Cache uses Content-Defined Chunking (CDC) to make large build outputs behave more incrementally. When a binary, bundle, package, or archive is mostly unchanged, BuildBuddy can reuse chunks it has already seen instead of re-uploading or re-downloading the entire file.

In our Bazel chunking implementation PR, we observed 40% less data uploaded and a 40% smaller disk cache when benchmarked on BuildBuddy's own repo. To enable client-side CDC with BuildBuddy, use Bazel 8.7 or 9.1+ and pass --experimental_remote_cache_chunking.
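Concretely, enabling this is a one-line addition to your .bazelrc (the cache endpoint below is a placeholder for your own):

```
# .bazelrc
# Requires Bazel 8.7 or 9.1+; the flag is experimental and may change.
build --experimental_remote_cache_chunking
build --remote_cache=grpcs://your.cache.endpoint   # placeholder endpoint
```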

The next frontier for build caching is not just skipping actions. It is skipping bytes.

Build caching has come a long way. Instead of rebuilding the world after every edit, Bazel and remote caching let teams reuse action outputs across machines and CI jobs. In practice, builds have moved from something closer to O(size of repo) toward O(size of change).

But "size of change" can be misleading. What really matters is the total size of the outputs of the actions transitively affected by the edit. A small source change can still ripple into many binaries, packages, bundles, and other large outputs, even when only a small part of each output actually changes.

That invalidation is expected. Build systems should rerun an action when its inputs change. The remote-cache problem is what happens next: the cache sees a new digest and moves the whole blob, even if that blob is mostly the same bytes as the previous version.
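The contrast between whole-blob digests and chunk-level reuse can be sketched with a toy chunker. This is an illustration of the general CDC idea only, not BuildBuddy's actual algorithm; real implementations use stronger rolling hashes such as Gear/FastCDC:

```python
import hashlib
import random

def cdc_chunks(data, mask=0x3F):
    """Split data at content-defined boundaries.

    A toy rolling hash over the last few bytes decides where chunks end,
    so boundaries depend on nearby content, not on absolute file offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if (h & mask) == mask:          # low bits all ones -> cut a boundary
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fixed_chunks(data, size=64):
    """Offset-based chunking, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared_count(old, new):
    """How many chunks of `new` were already present in `old`."""
    seen = {hashlib.sha256(c).digest() for c in old}
    return sum(1 for c in new if hashlib.sha256(c).digest() in seen)

random.seed(1)
original = bytes(random.randrange(256) for _ in range(4096))
modified = b"PATCH" + original      # a 5-byte insertion shifts every offset

n_chunks = len(cdc_chunks(original))
cdc_reused = shared_count(cdc_chunks(original), cdc_chunks(modified))
fixed_reused = shared_count(fixed_chunks(original), fixed_chunks(modified))
```

With offset-based chunks, the 5-byte insertion shifts every boundary and almost nothing is reused; with content-defined boundaries, everything after the insertion point lines up again, so only a chunk or two near the edit has to move.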

Linking, bundling, packaging, and archiving are where this shows up most often. They combine many transitive inputs into one output.

That makes them different from actions that operate on a small, direct set of files. A typical compile action might compile one source file using a smaller set of direct inputs. A transitive action, on the other hand, often consumes the accumulated outputs of many dependencies and produces one final binary, bundle, package, or archive.

In Bazel rules, this often shows up as a rule collecting files through a transitive depset and passing that accumulated set into a single action. For example, a simplified compile action might look like this:
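A minimal Starlark sketch of that pattern, with illustrative rule, attribute, and tool names (a generic reconstruction of the shape, not a specific real rule):

```starlark
def _compile_impl(ctx):
    # The direct sources, plus files accumulated transitively from deps.
    inputs = depset(
        direct = ctx.files.srcs,
        transitive = [dep[DefaultInfo].files for dep in ctx.attr.deps],
    )
    obj = ctx.actions.declare_file(ctx.label.name + ".o")
    ctx.actions.run(
        inputs = inputs,
        outputs = [obj],
        executable = ctx.executable._compiler,
        arguments = [src.path for src in ctx.files.srcs] + ["-o", obj.path],
    )
    return [DefaultInfo(files = depset([obj]))]

compile_rule = rule(
    implementation = _compile_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "deps": attr.label_list(),
        "_compiler": attr.label(
            default = "//tools:compiler",  # illustrative label
            executable = True,
            cfg = "exec",
        ),
    },
)
```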

... continue reading