If Git had a nemesis, it’d be large files.
Large files bloat Git’s storage, slow down git clone, and wreak havoc on Git forges.
In 2015, GitHub released Git LFS—a Git extension that hacked around problems with large files. But Git LFS added new complications and storage costs.
Meanwhile, the Git project has been quietly working on large files. And while LFS ain’t dead yet, the latest Git release shows the path towards a future where LFS is, finally, obsolete.
What you can do today: replace Git LFS with Git partial clone
Git LFS works by storing large files outside your repo.
When you clone a project via LFS, you get the repo’s history and small files, but skip large files. Instead, Git LFS downloads only the large files you need for your working copy.
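For a concrete picture, here is a minimal sketch of how a repo typically opts files into LFS. The *.png pattern and noise.png file name are stand-ins; git lfs install and git lfs track are the extension's standard commands:

$ git lfs install                       # one-time setup: adds the LFS hooks and filters
$ git lfs track '*.png'                 # writes a filter=lfs rule for *.png into .gitattributes
$ git add .gitattributes noise.png
$ git commit -m 'Track PNGs with LFS'
# From now on, commits store a small text pointer for noise.png;
# the real bytes live on the LFS server and are downloaded at checkout time.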
In 2017, the Git project introduced partial clones that provide the same benefits as Git LFS:
Partial clone allows us to avoid downloading [large binary assets] in advance during clone and fetch operations and thereby reduce download times and disk usage. – Partial Clone Design Notes, git-scm.com
Git’s partial clone and LFS both make for:
- Small checkouts – On clone, you get the latest copy of big files instead of every copy.
- Fast clones – Because you avoid downloading large files, each clone is fast.
- Quick setup – Unlike shallow clones, you get the entire history of the project, so you can get to work right away.
What is a partial clone?
A Git partial clone is a clone with a --filter.
For example, to avoid downloading files bigger than 100KB, you’d use:
git clone --filter=blob:limit=100k <repository-url>
Later, Git will lazily download any files over 100KB you need for your checkout.
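To see which objects a filtered clone skipped, you can ask git rev-list to report missing objects. A minimal sketch, assuming the same placeholder URL as above; the --missing=print option lists objects that the remote still holds:

$ git clone --filter=blob:limit=100k <repository-url>
$ cd <repository>
$ git rev-list --objects --all --missing=print | grep '^?'
# '?'-prefixed IDs are blobs the filter left on the server.
$ git checkout HEAD~1
# Checking out a revision that needs one of those blobs makes Git
# fetch it from the remote on demand.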
By default, if I git clone a repo with many revisions of a noisome 25 MB PNG file, then cloning is slow and the checkout is obnoxiously large:
$ time git clone https://github.com/thcipriani/noise-over-git
Cloning into '/tmp/noise-over-git'...
...
Receiving objects: 100% (153/153), 1.19 GiB

real	3m49.052s
Almost four minutes to check out a single 25MB file!
$ du --max-depth=0 --human-readable noise-over-git/.
1.3G	noise-over-git/.
^ 🤬
And 50 revisions of that single 25MB file eat 1.3GB of space.
But a partial clone side-steps these problems:
$ git config --global alias.pclone 'clone --filter=blob:limit=100k'
$ time git pclone https://github.com/thcipriani/noise-over-git
Cloning into '/tmp/noise-over-git'...
...
Receiving objects: 100% (1/1), 24.03 MiB

real	0m6.132s

$ du --max-depth=0 --human-readable noise-over-git/.
49M	noise-over-git/
^ 😻 (the same size as a git lfs checkout)
My filter made cloning 97% faster (3m 49s → 6s), and it reduced my checkout size by 96% (1.3GB → 49M)!
But there are still some caveats here.
If you run a command that needs data you filtered out, Git will need to make a trip to the server to get it. So, commands like git diff, git blame, and git checkout will require a trip to your Git host to run.
But, for large files, this is the same behavior as Git LFS.
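If you want to watch those round trips happen, Git's tracing shows the on-demand fetches. A sketch against the example repo above (noise.png is a hypothetical file name; GIT_TRACE is Git's standard debug variable):

$ cd noise-over-git
$ GIT_TRACE=1 git diff HEAD~5 -- noise.png
# diff needs the old version of the blob, so the trace should show
# Git spawning a fetch to grab the filtered-out object first.
$ GIT_TRACE=1 git blame noise.png
# Same idea: blame walks every historical version of the file.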