Git and libgit2

Git Large File System Overview

Published 2025-01-06.

This page is part of the git collection.

The following articles about Git Large File Storage (LFS) are meant to be read in order.

  1. Git Large File System Overview
  2. Git LFS Client Installation
  3. Git LFS Server URLs
  4. Git LFS Filename Patterns & Tracking
  5. Git LFS Client Configuration & Commands
  6. Working With Git LFS
  7. Evaluation Procedure For Git LFS Servers

7 articles are still in process.

Instructions for typing along are given for Ubuntu and WSL/Ubuntu. If you have a Mac, most of this information should be helpful.

Do Not Rush Into Git LFS

Getting Git LFS to work can be fraught with issues. Time and patience are required to achieve a working system. I hope that readers will benefit from the time I spent writing these articles.

It is much easier to create a new git repository that uses Git LFS than it is to convert an existing git repository to use Git LFS.

Wait until you are familiar with Git LFS before attempting to convert any repositories that you care about to Git LFS. Create repositories to practice on, as shown in these articles.

It is likely that you will encounter problems trying to get things to work. Hopefully, the solutions that I provide will help you solve them. Learn about the problems you will likely encounter and practice the solutions on the practice repositories before you try enhancing an important git repository with LFS.

Overview

Git Large File Storage (LFS) is a way of extending git to allow the versioning of large files, such as audio samples, videos, data, and graphics. Atlassian co-developed Git LFS with GitHub in 2015. Without Git LFS, the largest file that commercial git providers typically accept is about 100 MB; GitHub, for example, rejects pushes that contain files larger than 100 MB.

Git LFS extends git repositories. Without a git repository, LFS cannot be used.

Git and Git LFS are two separate projects. The Git LFS open-source project is hosted at GitHub and has the MIT license. Note that git itself has the more restrictive GNU GPL v2 license.

Like git, LFS requires a client and a server. When setting up Git LFS, you actually configure two servers (a regular git server and an LFS server) and one git client, which has an extension for LFS.

  1. Run Git LFS server on your laptop
  2. Version large files on S3
  3. Profit

From a user’s point of view, all of their files stored within a git repository are versioned, regardless of whether the files are big or small. However, the git server manages small files, while the Git LFS server manages large files. Within the git database, only small pointer files are versioned for large files; each pointer records the SHA-256 hash and size of the content held by the LFS server, not the content itself.

Git LFS is a system for managing and versioning large files in association with a Git repository. Instead of storing the large files within the Git repository as blobs, Git LFS stores special "pointer files" in the repository, while storing the actual file contents on a Git LFS server. The contents of the large file are downloaded automatically when needed, for example when a Git branch containing the large file is checked out.

Git LFS works by using a "smudge" filter to look up the large file contents based on the pointer file, and a "clean" filter to create a new version of the pointer file when the large file’s contents change. It also uses a pre-push hook to upload the large file contents to the Git LFS server whenever a commit containing a new large file version is about to be pushed to the corresponding Git server.
 – From git-lfs man page
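
To make the "pointer file" idea concrete, here is a minimal sketch that builds the pointer text for a file by hand. It assumes the version 1 pointer format (a version line, an oid line with a SHA-256 hash, and a size line) and GNU coreutils; make_lfs_pointer is a hypothetical helper name, not part of the git-lfs client.

```shell
# Sketch: print the Git LFS pointer text for a file (v1 pointer format).
# make_lfs_pointer is a hypothetical helper, not a git-lfs command.
make_lfs_pointer() {
  ptr_file="$1"
  ptr_oid=$(sha256sum "$ptr_file" | cut -d' ' -f1)   # SHA-256 of the full contents
  ptr_size=$(wc -c < "$ptr_file" | tr -d ' ')        # size in bytes
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$ptr_oid"
  printf 'size %s\n' "$ptr_size"
}

# Example: a 6-byte file stands in for a large asset
demo=$(mktemp)
printf 'hello\n' > "$demo"
make_lfs_pointer "$demo"
rm -f "$demo"
```

These three short lines are what git versions in place of the large file. If the Git LFS client is installed, git lfs pointer --file=PATH should print an equivalent pointer for comparison.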

Motivational Differences

There are motivational differences between how one might use regular git versus Git LFS.

A Git LFS server is more like a DNS server, which merely tells DNS clients how domain names map to IP addresses, than a git server, which both consumes and produces lots of traffic. Because a Git LFS server needs little computation and little bandwidth, it can be run on a laptop or an office server.

Back in 2017, 8 years ago, when LFS started to gain traction, the fact that it required a special server was a problem because those servers did not exist. In 2024, we have many good servers to choose from, and they are surprisingly lightweight processes. Because facts have changed, motivations have also changed over the years.

Git Is a Better Fit:

  1. For versioning a text file that changes often. Git compresses and versions text-based formats efficiently, affording multiple versions of large text files.
  2. When you find it necessary to spend a lot of time curating your revision log.

Git LFS Is a Better Fit:

  1. For binary formats that change often and/or need versioning.
  2. When a lightweight local server is required. Common operations like LFS cloning and fetching are quicker because you’re only downloading code and small pointers, not the large files themselves.
  3. When you have a second machine, or have high-bandwidth remote access to a server. My apartment utilizes fiber optic internet service, and I have Ethernet and Wi-Fi 6 coverage for mobile devices.

Unless you have a solid high-speed network setup, Git LFS is not going to provide significant benefit and, in fact, might actually provide lower productivity and be a source of unwanted aggravation.

Git LFS is computationally lightweight

My 6-year-old laptop runs Git LFS on WSL / Ubuntu on Windows 10 without any problem. I use git and Git LFS on my Songs projects, which store recorded music projects and music video projects.

Large assets can be remote from the LFS Server

Implementations

Git is a distributed versioning system, and so is Git LFS. Pricing for prepackaged LFS storage is unreasonably high from Bitbucket, GitHub, and GitLab. Running a local Git LFS server means you can store large assets wherever makes the most sense, including locally or on an S3 server.

With a few exceptions, Git LFS servers from Bitbucket, GitHub, and GitLab lack the ability to authenticate against other storage services. If you want to pay the lowest price possible for storage, but want to host on one of the aforementioned git providers, you will need to run your own instance of Git LFS. Happily, a Git LFS server is a lightweight process that does not generate much network traffic, so you could run one on your laptop or a small office server.

For example, you can point a local Git LFS server at large files on AWS S3 or S3 clones like DigitalOcean Spaces or Backblaze B2. You won't incur any extra costs from git providers like GitHub or Bitbucket for this, and this configuration is easy to set up.
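
On the client side, a repository is told where its LFS server lives through the lfs.url setting, which can be stored in a .lfsconfig file at the top of the working tree. The sketch below writes such a file; write_lfsconfig is a hypothetical helper and http://localhost:9999/ is a placeholder URL, so substitute the endpoint of whichever LFS server you run. How that server reaches S3 is configured on the server itself and is not shown here.

```shell
# Sketch: create DIR/.lfsconfig that points the LFS client at a self-hosted server.
# write_lfsconfig is a hypothetical helper; the URL used below is a placeholder.
write_lfsconfig() {
  printf '[lfs]\n\turl = %s\n' "$2" > "$1/.lfsconfig"
}

repo=$(mktemp -d)   # stands in for the top of a git working tree
write_lfsconfig "$repo" "http://localhost:9999/"
cat "$repo/.lfsconfig"
rm -rf "$repo"
```

With git installed, git config -f .lfsconfig lfs.url URL writes the same setting. Committing .lfsconfig lets every clone pick up the same LFS endpoint.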

Many LFS implementations exist; however, as is often the case with software, dead projects live on as zombies for many years. The following LFS implementations were the most interesting to me.

LFS Server Summary

Prices are shown in the next section.

  • Null servers only use locally accessible content, such as might be found on a server in your local area network. This is the simplest setup, and if that server is regularly maintained (including backups), this option can provide good performance at no extra cost and with much better security than provided by commercial PaaS vendors.
  • Bitbucket has a tiny free storage offering, so small as to be worthless (1 GB per user).
  • GitHub also has a tiny free storage offering, so small as to be worthless (1 GB per GitHub user). The Git LFS API for GitHub is documented here.
  • GitLab: All projects on GitLab.com have 10 GiB of free storage for their Git repository and Large File Storage (LFS). While much more generous than GitHub's free offering, this is still too small to be useful for many projects. GitLab’s storage pricing is crazy expensive.
  • JFrog Artifactory is a commercial server that provides Git LFS support, and many other features. Unless you are already a JFrog customer, it would not make sense to select this product for Git LFS.
  • GitBucket is an F/OSS project that works well. However, I would prefer to avoid relying on anything written in Scala for production.
  • Giftless is a pluggable F/OSS Git LFS server. It is written in Python, is very customizable, and is easy to extend. In particular, it supports local storage, and storage on AWS S3.
  • LFS Test Server is an example server that implements the Git LFS API. The product notes say that it is “intended to be used for testing the Git LFS client and is not in a production-ready state.” However, if run on an internal network, or on a laptop, this might be a viable option. LFS Test Server is written in Go, with pre-compiled binaries available for Mac, Windows, Linux, and FreeBSD.
  • Rudolfs is a high-performance, caching, F/OSS Git LFS server with an AWS S3 and local storage back-end.
  • Sonatype Nexus Repository is a commercial server that provides Git LFS support, and many other features. Only supports the Git LFS batch API. It does not offer an online storage option; users should self-host. Unless you are already a Sonatype customer, it would not make sense to select this product for Git LFS.

Storage Pricing

There is a huge disparity in pricing between the various storage providers. Wasabi, at $7/TB/month, is among the cheapest, while GitLab, at $6000/TB/month, is the most expensive. I do not believe that GitLab’s storage is 857 times better than Wasabi’s.

Data egress fees can become substantial costs. In fact, for large and active assets, data egress fees can be much greater than storage costs. GitHub has the highest data egress fees, 33 times more than the next most expensive provider. When egress fees are charged, there is no limit to the potential financial liability.

Data egress fees do not apply from a git provider (like GitHub) when storage is provided by a separate storage provider (like Wasabi).

In the following table, data egress fees for providers that do not charge for data egress are shown as $0.

Provider | Storage | Normalized Storage ($/TB/month) | Egress ($/GB) | Comment
AWS | $0.023 GB/month | $26 | $0.090 |
Azure | $0.15 GB/month | $15 | $0.087 | Premium pay-as-you-go pricing shown, other charges apply. Very complex pricing.
Backblaze | $6 TB/month | $6 | $0.010 | Minimum charge: $6/month.
Bitbucket | $0.100 GB/month | $100 | ? | Incomplete information shown online.
Cloudflare R2 | $0.015 GB/month | $15 | $0 | Very complex pricing.
DigitalOcean Spaces | $0.02 GB/month | $20 | $0.010 | Minimum charge: $5/month
GitHub | $5/data pack/month | $100 | $100.000 |
GitLab | $5/month: 10 GB storage and 20 GB egress | $6000 | $3.000 | Other charges apply, for example, $29 per user/month
Google Cloud | | $23 | $0.110 | Very complex pricing. Other charges apply.
Linode | $5/month: 250GB storage, 1 TB egress | $20 | $0.005 | The first 1 TB egress is included
Scaleway | $0.015 GB/month | $15 | $0.010 | 75 GB/month egress included
Wasabi | | $7 | $0 | Minimum charge: $6.99/month.
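
The arithmetic behind these comparisons can be sketched with a few awk one-liners. The figures are taken from the table as given; the helper names, the 1 TB = 1000 GB conversion, and the usage pattern (1 TB stored and downloaded in full five times in a month) are assumptions made only for illustration.

```shell
# per_tb PRICE: convert $/GB/month to $/TB/month (assumes 1 TB = 1000 GB)
per_tb() { awk -v p="$1" 'BEGIN { printf "%.2f\n", p * 1000 }'; }

# ratio A B: how many times larger A is than B, rounded to a whole number
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'; }

# monthly_cost TB_STORED PRICE_PER_TB GB_DOWNLOADED EGRESS_PER_GB
monthly_cost() {
  awk -v tb="$1" -v ptb="$2" -v gb="$3" -v pgb="$4" \
    'BEGIN { printf "%.2f\n", tb * ptb + gb * pgb }'
}

per_tb 0.02                    # DigitalOcean Spaces: prints 20.00
ratio 6000 7                   # GitLab vs. Wasabi: prints 857
monthly_cost 1 26 5000 0.090   # AWS, hypothetical usage: prints 476.00
```

In the last line, storage accounts for $26 of the total and egress for $450, which illustrates how egress can dominate the bill for large, active assets.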

Use the Latest Version of Git

I had a really difficult time making Git LFS work. A major contributing factor was that my versions of git on the client and server were both old, and the server version was older than the versions on the clients.

I strongly recommend that you upgrade git on every machine to the latest version before attempting to install Git LFS.

As of 2024-12-25, the latest stable version of git was v2.47.1.

You can check the version of git on your clients and the Git LFS server by running the following on each computer:

Shell
$ git --version
git version 2.47.1
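
If you have several machines to check, a small script can do the comparison for you. This is a minimal sketch that assumes GNU sort (for the -V version ordering); version_ge is a hypothetical helper, and 2.47.1 is simply the version this article settles on.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on GNU sort -V, which orders version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=2.47.1   # the version recommended in this article
if command -v git >/dev/null 2>&1; then
  installed=$(git --version | awk '{print $3}')   # third word of "git version X.Y.Z"
  if version_ge "$installed" "$required"; then
    echo "git $installed is new enough"
  else
    echo "git $installed is older than $required; upgrade first"
  fi
else
  echo "git is not installed"
fi
```

Run it on each client and on the server; any machine that reports an older version should be upgraded before you install Git LFS.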

Git Releases

I asked ChatGPT to summarize the enhancements to git made since v2.43 was released (November 20, 2023) which might affect Git LFS. The response has been edited for readability and relevance.

What changes affecting LFS have been introduced to git since version 2.43.0?

Since Git version 2.43.0, several updates have been introduced that impact Git LFS (Large File Storage) functionality.

Git LFS Enhancements

Version 3.6.0: This release includes support for multi-stage authentication with Git credential helpers (requires Git 2.46.0) and relative worktree paths (requires Git 2.48.0). It also introduces a new object transfer batch size configuration option, better path handling when installing on Windows, more POSIX-compliant hook scripts, and improved performance with sparse checkouts, partial clones, and Git remotes with large numbers of tags.

Git Core Updates

Version 2.44: Introduced faster pack generation with multi-pack reuse, enhancing performance during operations like push and pull.

Version 2.43:

Enhanced git repack to support multiple cruft packs and the ability to split repository contents by object filter, improving repository maintenance and storage efficiency.

According to ChatGPT, I should update git on the server (gojira) to at least v2.48.0. However, v2.47.1 is the current stable release. Support for relative worktree paths will have to wait for now.

Upgrading Git

StackOverflow provided the information on how to upgrade git on Ubuntu. As usual, these instructions also apply to WSL/Ubuntu. If you are typing along, do the following on your server and all your clients.

Add the git-core PPA to the apt sources.

Shell
$ yes | sudo add-apt-repository ppa:git-core/ppa
PPA publishes dbgsym, you may need to include 'main/debug' component
Repository: 'Types: deb
URIs: https://ppa.launchpadcontent.net/git-core/ppa/ubuntu/
Suites: noble
Components: main
'
Description:
The most current stable version of Git for Ubuntu.
For release candidates, go to https://launchpad.net/~git-core/+archive/candidate .
More info: https://launchpad.net/~git-core/+archive/ubuntu/ppa
Adding repository.
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Hit:2 https://dl.google.com/linux/chrome/deb stable InRelease
Get:4 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble InRelease [24.3 kB]
Get:5 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 Packages [2,840 B]
Hit:3 https://packagecloud.io/github/git-lfs/ubuntu noble InRelease
Get:6 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main i386 Packages [2,848 B]
Get:7 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main Translation-en [2,088 B]
Fetched 32.1 kB in 1s (35.7 kB/s)
Reading package lists... Done
N: Skipping acquire of configured file 'main/binary-i386/Packages' as repository 'https://dl.google.com/linux/chrome/deb stable InRelease' doesn't support architecture 'i386'

Now update the apt packages and upgrade git.

Shell
$ sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Hit:2 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble InRelease
Hit:3 https://packagecloud.io/github/git-lfs/ubuntu noble InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
2 packages can be upgraded. Run 'apt list --upgradable' to see them.
$ sudo apt list --upgradable
Listing... Done
git-man/noble,noble 1:2.47.1-0ppa1~ubuntu24.04.1 all [upgradable from: 1:2.43.0-1ubuntu7.1]
git/noble 1:2.47.1-0ppa1~ubuntu24.04.1 amd64 [upgradable from: 1:2.43.0-1ubuntu7.1]
$ yes | sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
Get more security updates through Ubuntu Pro with 'esm-apps' enabled:
  libcjson1 libavdevice60 ffmpeg libpostproc57 libavcodec60 libavutil58
  libswscale7 libswresample4 gh libavformat60 libavfilter9
Learn more about Ubuntu Pro at https://ubuntu.com/pro
The following packages will be upgraded:
  git git-man
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,967 kB of archives.
After this operation, 11.9 MB of additional disk space will be used.
Get:1 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 git amd64 1:2.47.1-0ppa1~ubuntu24.04.1 [6,775 kB]
Get:2 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 git-man all 1:2.47.1-0ppa1~ubuntu24.04.1 [2,192 kB]
Fetched 8,967 kB in 10s (881 kB/s)
(Reading database ... 486991 files and directories currently installed.)
Preparing to unpack .../git_1%3a2.47.1-0ppa1~ubuntu24.04.1_amd64.deb ...
Unpacking git (1:2.47.1-0ppa1~ubuntu24.04.1) over (1:2.43.0-1ubuntu7.1) ...
Preparing to unpack .../git-man_1%3a2.47.1-0ppa1~ubuntu24.04.1_all.deb ...
Unpacking git-man (1:2.47.1-0ppa1~ubuntu24.04.1) over (1:2.43.0-1ubuntu7.1) ...
Setting up git-man (1:2.47.1-0ppa1~ubuntu24.04.1) ...
Setting up git (1:2.47.1-0ppa1~ubuntu24.04.1) ...
Processing triggers for man-db (2.12.0-4build2) ...

Checking the installed version of git shows the desired result:

Shell
$ git --version
git version 2.47.1

Git add Is Slow With Large Files

The git add command creates a compressed snapshot of the working files that you have specified. When files are large, this can take a long time.

Git and Git LFS store the files that they manage differently. Git writes the contents of added files as compressed objects under .git/objects/ and records them in the staging area, which is actually the file .git/index. On the other hand, Git LFS saves added files to .git/lfs/objects/.
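
As a rough illustration of the LFS side, the sketch below computes where the client would keep a given object. It assumes the usual layout, in which the first two pairs of hex digits of the object's SHA-256 become nested directory names; lfs_object_path is a hypothetical helper, and the hash used is that of an empty file.

```shell
# lfs_object_path OID: print the path of an LFS object inside a repository.
# Assumes the layout .git/lfs/objects/<first 2 hex>/<next 2 hex>/<full oid>.
lfs_object_path() {
  obj_oid="$1"
  obj_first=$(printf '%s' "$obj_oid" | cut -c1-2)
  obj_second=$(printf '%s' "$obj_oid" | cut -c3-4)
  printf '.git/lfs/objects/%s/%s/%s\n' "$obj_first" "$obj_second" "$obj_oid"
}

# SHA-256 of an empty file, used only as a sample object id
lfs_object_path e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```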

Many types of large files are already compressed, for example zips, video files and many types of audio files. This means the git add command will waste time trying to further compress already compressed files. Eventually, a copy of all the files you added will reside in your local copy of the repository.

In contrast, the git commit command is always fast. It just creates a list of the snapshots that should be grouped into a commit.
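
The wasted effort on already compressed files is easy to see with a small experiment. This sketch uses gzip as a stand-in for git's zlib compression (both use DEFLATE) and random bytes as a stand-in for a compressed media file; compressed_size is a hypothetical helper, and the exact byte counts will vary from run to run.

```shell
# compressed_size FILE: number of bytes FILE occupies after gzip compression
compressed_size() { gzip -c "$1" | wc -c | tr -d ' '; }

work=$(mktemp -d)
# 1 MB of a repeated character: highly compressible, like many text formats
head -c 1000000 /dev/zero | tr '\0' 'a' > "$work/text.txt"
# 1 MB of random bytes: behaves like an already compressed video or zip
head -c 1000000 /dev/urandom > "$work/random.bin"

echo "text:   1000000 -> $(compressed_size "$work/text.txt") bytes"
echo "random: 1000000 -> $(compressed_size "$work/random.bin") bytes"
rm -rf "$work"
```

The repetitive file shrinks to a tiny fraction of its size, while the random file does not shrink at all, so the time spent compressing it buys nothing.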
