Git and libgit2

Git Large File System Overview

Published 2025-01-06. Last modified 2025-01-23.
Time to read: 12 minutes.

This page is part of the git collection.

I have published 8 articles about the Git large file system (LFS). They are meant to be read in order.

  1. Git Large File System Overview
  2. Git LFS Scripts
  3. Git LFS Client Installation
  4. Git LFS Server URLs
  5. Git-ls-files, Wildmatch Patterns and Permutation Scripts
  6. Git LFS Tracking, Migration and Un-Migration
  7. Git LFS Client Configuration & Commands
  8. Git LFS SSH Authentication
  9. Working With Git LFS
  10. Evaluation Procedure For Git LFS Servers
  11. Git LFS server tests:
    1. Null Git LFS Server

6 articles are still in process.

Instructions for typing along are given for Ubuntu and WSL/Ubuntu. If you have a Mac, most of this information should be helpful.

Overview

The Git Large File System (LFS) is a way of extending Git to allow the versioning of large files, such as audio samples, videos, data, and graphics. Without Git LFS, commercial Git providers limit the size of files that can be pushed; GitHub, for example, rejects files larger than 100 MB.

Git LFS extends Git repositories. Without a Git repository, Git LFS cannot be used.

Git LFS speeds up common operations like cloning projects with large files and fetching large files. This is because when a Git repository that has been enhanced with LFS is cloned, only small pointers are downloaded, not the large files themselves. The large files are fetched on demand and efficiently managed.

I call Git without Git LFS “Plain Old Git”.

Network Bandwidth

Moving large files around in bulk requires a lot of network bandwidth. Unless you have a solid high-speed network setup, Git LFS is not going to provide significant benefit and, in fact, might actually provide lower productivity and be a source of unwanted aggravation.

My apartment utilizes fiber optic internet service, and I have Ethernet and Wi-Fi 6 coverage for mobile devices. Git LFS has worked well for me with this setup.

History

In 2015, 10 years ago, Atlassian started co-developing Git LFS for their BitBucket product along with GitHub and GitLab.

Be sure to check the publication date of information you find on the interwebs regarding Git LFS. There was a huge flood of information when it was first announced. Version 1.0.0 was released on 2015-10-01.

Git LFS started to gain traction in 2017. In the early days, the requirement for specialized Git LFS servers was a problem because those servers were scarce, and those that were available had serious issues and limitations.

As recently as 2021, Git LFS was not ready for prime time.

In 2025, we have many good servers to choose from, and they are surprisingly lightweight processes. The technology has matured considerably in recent years.

Do not believe any advice unless it was recently written. 'Facts' that were cited in the early days have probably changed.

😎

My purpose in writing these articles is to provide current information and advice, backed up with solid references and working code. I am not there yet; this is a work in progress.

Licensing

Git and Git LFS are two separate projects. The Git LFS open-source project is hosted at GitHub and has the MIT license. In contrast, Plain Old Git has the more restrictive GNU GPL v2 license.

The more permissive licensing for Git LFS means that a second independent project to provide a programmable interface to Git LFS is unnecessary from a legal standpoint. There is no need for the equivalent of Plain Old Git’s libgit2. Phew! ... now we just need an API facade and some language bindings.

Distributed System Components

Like Plain Old Git, Git LFS requires a client and a server. When setting up a Git server with Git LFS, you actually configure two servers (a Plain Old Git server and an LFS server). Every Git user also needs their Plain Old Git client to be enhanced with a Git LFS extension.

– Use Any Git Server –
– Run Git LFS server on your laptop –
– Store large file versions wherever –

From a Git LFS user’s point of view, all of the files stored within a Git repository are versioned, regardless of whether the files are big or small. However, the Plain Old Git server manages small files, while the Git LFS server manages large files. Within the Plain Old Git database, only the pointers to large files on the Git LFS server are versioned, not the contents of the files. It is the responsibility of the Git LFS server to maintain version history for large files.
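
As a minimal sketch of how the two servers are wired together: a repository can record the location of its Git LFS server in a .lfsconfig file at the top of the working tree, independently of the URL of the Plain Old Git remote. The server address (http://localhost:9999/) and the function name below are placeholders made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: record a separate Git LFS server URL for a repository.
# The URL used here is a hypothetical placeholder, not a real server.
set -eu

write_lfsconfig() {
  local repo_dir="$1"
  local lfs_url="$2"
  # .lfsconfig uses Git's config file syntax, so `git config -f` can write it.
  git config -f "$repo_dir/.lfsconfig" lfs.url "$lfs_url"
}

demo_dir="$(mktemp -d)"   # throwaway directory standing in for a working tree
write_lfsconfig "$demo_dir" "http://localhost:9999/"
cat "$demo_dir/.lfsconfig"
```

The resulting file contains an [lfs] section with a single url entry. The Git remote is untouched, which is why any Git server can be paired with any Git LFS server.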

Git LFS is a system for managing and versioning large files in association with a Git repository. Instead of storing the large files within the Git repository as blobs, Git LFS stores special "pointer files" in the repository, while storing the actual file contents on a Git LFS server. The contents of the large file are downloaded automatically when needed, for example, when a Git branch containing the large file is checked out.

Git LFS works by using a "smudge" filter to look up the large file contents based on the pointer file and a "clean" filter to create a new version of the pointer file when the large file’s contents change. It also uses a pre-push hook to upload the large file contents to the Git LFS server whenever a commit containing a large new file version is about to be pushed to the corresponding Git server.
 – From git-lfs man page
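
To make the idea of a pointer file concrete, here is a rough imitation of the text that the "clean" filter stores in the Git repository in place of a large file. A pointer consists of a version line, the SHA-256 of the contents, and the size in bytes. This sketch is not the git-lfs implementation; the function name and the sample file are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the text of a Git LFS pointer for a given file.
# This only imitates the output of the "clean" filter; it does not call git-lfs.
set -eu

make_pointer() {
  local file="$1"
  local oid size
  oid="$(sha256sum "$file" | cut -d' ' -f1)"   # SHA-256 of the file contents
  size="$(wc -c < "$file" | tr -d ' ')"        # size in bytes
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

sample="$(mktemp)"   # hypothetical stand-in for a large file
printf 'pretend this is a large video\n' > "$sample"
make_pointer "$sample"
```

However large the original file is, the pointer stays three short lines long, which is why cloning a repository full of pointers is fast.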

Git LFS Resembles DNS

Git LFS behaves somewhat like a DNS server, which merely informs DNS clients about how domain names are mapped to IP addresses. In contrast, Plain Old Git servers both consume and produce lots of traffic. Because a Git LFS server does not require much computational resource and does not require much bandwidth, it can be run on a laptop or a remote office server.

Data does not travel through a Git LFS server; Git LFS servers do not act as data conduits. Instead, large data transfers occur directly between the Git client and LFS storage. The role of Git LFS servers is merely to control large data transfers.

Git LFS is computationally lightweight

My 6-year-old laptop runs Git LFS on WSL / Ubuntu on Windows 10 without any problem. I use Git LFS on my Songs projects, which store recorded music video projects.

Large assets can be remote from the Git LFS Server

Motivational Differences

There are motivational differences between how one might use Plain Old Git versus Git LFS.

Differing Origin Stories

Git was created by Linus Torvalds, a power user who performed a lot of demanding code-related administrative tasks. In contrast, the Git LFS project was cooperatively initiated by large-scale commercial vendors.

  • Git LFS is the vendors’ project
  • To solve their problems
  • And fulfill demands made by their customers.

Git LFS users outside that use case, like me, have to figure many things out for themselves. The only way to do that properly is to experiment extensively, carefully read a lot of documentation, and sometimes examine source code.

Even though a lot of documentation has been written about Git, the Git LFS documentation was written by and for commercial vendors. I have tried to fill in the gaps with this collection of articles.

Git Is a Better Fit

  1. For versioning text files that change often. Git compresses and versions text-based formats efficiently, affording multiple versions of large text files.
  2. When you find it necessary to spend a lot of time curating your revision log.

Git LFS Is a Better Fit

  1. For binary formats that change often and/or need versioning.
  2. When you have a local server for large files, or have high-bandwidth access to a remote server.

Do Not Rush Into Git LFS

Getting Git LFS to work can be fraught with issues. Time and patience are required to achieve a working system. I hope that readers will benefit from the time I spent writing these articles.

It is easier and less risky to create a new Git repository that uses Git LFS right away than it is to migrate an existing Git repository to use Git LFS and maintain the structure of the commits.

By default, Git LFS rewrites the Git repository history during the migration process; this preserves the structure while reducing the size of the Git repository. The repository is smaller after the migration rewrites the Git history because large files are moved from the Git repository to the associated Git LFS repository.

If a project requires a carefully groomed commit graph, the Git history must be rewritten. Rewriting the Git history requires everyone to re-clone the Git repository. If many people share a Git repository and the history is rewritten, chaos can result because some people will invariably continue to work on the now-obsolete old Git repository. The problem users are the ones who do not read memos. They will lose their work unless a recovery procedure is followed.

If maintaining the Git commit ordering is not important to you and your organization, then you will be happy to know that this incantation:

Shell
$ git lfs migrate import --no-rewrite

... preserves compatibility for all your other users. Note that with --no-rewrite, the command expects the paths of the files to convert as additional arguments. The price of this compatibility is that after the import, any copies of large files that were already in the Git repository history will remain there. If those files are very large, the repository will remain bloated for all time.

So you must choose one of the following when enhancing a Git repository with Git LFS (see Git LFS Tracking, Migration and Un-Migration):

  1. Benefit: smaller Git repository and consistent commit ordering.
    Risk: a potential for unhappy users who lose work while the Git upgrade is in progress.
  2. Benefit: Unlikely anyone will lose work during the upgrade.
    Potential issue: The Git repository forever remains bloated with large files that no longer have a purpose.
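
Before choosing, it helps to know how much bloat is at stake. The following sketch uses only Plain Old Git commands to list the largest blobs anywhere in a repository's history. The function name and the throwaway demo repository are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the largest blobs anywhere in a repository's history.
set -eu

# largest_blobs <repo dir> [count] prints "size<TAB>path", largest first
largest_blobs() {
  local repo_dir="$1"
  local count="${2:-10}"
  git -C "$repo_dir" rev-list --objects --all |
    git -C "$repo_dir" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { path = $0; sub(/^blob [0-9]+ /, "", path); print $2 "\t" path }' |
    sort -rn |
    awk -v n="$count" 'NR <= n'
}

# Demonstration on a throwaway repository
demo="$(mktemp -d)"
git -C "$demo" init -q
head -c 200000 /dev/urandom > "$demo/big.bin"   # stand-in for a large binary asset
printf 'hello\n' > "$demo/small.txt"
git -C "$demo" add .
git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m 'demo commit'
largest_blobs "$demo" 5
```

Blobs reported here stay in the Git repository after an import that does not rewrite history, so their total size is a rough measure of the permanent bloat described in the second option.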

Wait until you are familiar with Git LFS before attempting to convert any repositories that you care about to Git LFS. Create repositories to practice on, as shown in these articles.

It is likely that you will encounter problems trying to get things to work. Hopefully, the solutions that I provide will help you solve them. Learn about the problems you will likely encounter and practice the solutions on the practice repositories before you try enhancing an important Git repository with LFS.

Implementations

Git is a distributed versioning system, and so is Git LFS. Pricing for prepackaged LFS storage is unreasonably high from BitBucket, GitHub, and GitLab. Running a local Git LFS server means you can store large assets wherever makes the most sense, including local storage or on an S3 server.

With few exceptions, Git LFS servers from BitBucket, GitHub, and GitLab lack the ability to authenticate against other storage services. If you want to pay the lowest price possible for storage but want to host on one of the aforementioned Git providers, you will need to run your own instance of a Git LFS server. Happily, Git LFS servers are lightweight processes that do not generate much network traffic, so you could run one on your laptop or a small office server.

For example, you can point a local Git LFS server at large files on AWS S3 or S3 clones like DigitalOcean Spaces or Backblaze B2. You won't incur any extra costs from Git providers like GitHub or BitBucket for this, and this configuration is easy to set up.

Many LFS implementations exist; however, as is often the case with software, dead projects live on as zombies for many years. The following LFS implementations were the most interesting to me.

APIs

The Git LFS Batch API is documented here. It is implemented by most vendors and Git LFS storage solutions as the default HTTP-based Git LFS object transfer protocol.
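
As a rough illustration of what a client sends, this sketch builds the JSON body of a Batch API download request for a single object. The function name is made up, the sample values are the SHA-256 and size of an empty file, and the server URL in the comment is a hypothetical placeholder.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body of a Git LFS Batch API "download" request.
set -eu

# batch_request_body <oid> <size in bytes>
batch_request_body() {
  local oid="$1"
  local size="$2"
  printf '{"operation":"download","transfers":["basic"],"objects":[{"oid":"%s","size":%s}]}\n' "$oid" "$size"
}

# Sample values only: the OID of an empty file, and size 0
batch_request_body e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 0

# To send it to a server (the URL is a hypothetical placeholder):
#   batch_request_body "$oid" "$size" | curl -s -X POST \
#     -H 'Accept: application/vnd.git-lfs+json' \
#     -H 'Content-Type: application/vnd.git-lfs+json' \
#     --data-binary @- http://localhost:9999/objects/batch
```

The server's reply lists, for each object, the URL from which the client should fetch the contents. That URL can point at separate storage, which is how large data transfers bypass the Git LFS server itself.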

In contrast, the SSH-based Git LFS object transfer protocol, labeled the "SSH protocol proposal" in the linked page, was introduced with Git LFS v3.0.0. This protocol is only offered by a few vendors today, such as GitLab, who added it in their v17.2 release. I believe that GitHub and BitBucket also offer Git LFS over SSH. I will update this paragraph as I learn more.

LFS Server Summary

Prices are shown in the next section.

  • Null servers only use locally accessible content, such as might be found on a server in your local area network. This is the simplest setup, and if that server is regularly maintained (including backups), this option can provide good performance at no extra cost and subject to whatever security regime you decide to implement.
  • BitBucket has a tiny free storage offering, so small as to be almost useless (1 GB per user).
  • Git LFS S3 Proxy is a Cloudflare Pages site that acts as a Git LFS server backed by any S3-compatible service.
  • GitHub also has a tiny free storage offering, so small as to be almost useless (1 GB per GitHub user).
  • GitLab: All projects on GitLab.com have 10 GiB of free storage for their Git repository and Large File Storage (LFS). While much more generous than GitHub's free offering, this is still too small to be useful for many projects. GitLab’s storage pricing is crazy expensive.
  • JFrog Artifactory is a commercial server that provides Git LFS support, and many other features. Unless you are already a JFrog customer, it would not make sense to select this product for Git LFS.
  • Gitbucket is an F/OSS project that works well. However, I would prefer to avoid relying on anything written in Scala for production.
  • Giftless is a pluggable F/OSS Git LFS server. It is written in Python, is very customizable, and is easy to extend. In particular, it supports local storage, and storage on AWS S3.
  • LFS Test Server is an example server that implements the Git LFS API. The product notes say that it is “intended to be used for testing the Git LFS client and is not in a production-ready state.” However, if run on an internal network, or on a laptop, this might be a viable option. LFS Test Server is written in Go, with pre-compiled binaries available for Mac, Windows, Linux, and FreeBSD.
  • Rudolfs is a high-performance, caching, F/OSS Git LFS server with an AWS S3 and local storage back-end.
  • Sonatype Nexus Repository is a commercial server that provides Git LFS support, and many other features. Only supports the Git LFS batch API. It does not offer an online storage option; users should self-host. Unless you are already a Sonatype customer, it would not make sense to select this product for Git LFS.

Storage Pricing

There is a huge disparity in pricing between the various storage providers. Backblaze, at $6/TB/month, and Wasabi, at $7/TB/month, are the cheapest, while GitLab, at $6000/TB/month, is the most expensive. I do not believe that GitLab’s storage is 857 times better than Wasabi’s.

Data egress fees can become substantial costs. In fact, for large and active assets, data egress fees can be much greater than storage costs. GitHub has the highest data egress fees, 33 times more than the next most expensive provider.

When data egress fees are charged, there is no limit to the potential financial liability.

Data egress fees do not apply from a Git provider (like GitHub) when storage is provided by a separate storage provider (like Wasabi).

In the following table, data egress fees for providers that do not charge for data egress are shown as $0.

Provider | Storage price | Normalized storage ($/TB/month) | Egress ($/GB) | Comment
AWS | $0.023/GB/month | $26 | $0.090 |
Azure | $0.15/GB/month | $15 | $0.087 | Premium pay-as-you-go pricing shown; other charges apply. Very complex pricing.
Backblaze | $6/TB/month | $6 | $0.010 | Minimum charge: $6/month.
Bitbucket | $0.100/GB/month | $100 | ? | Incomplete information shown online.
Cloudflare R2 | $0.015/GB/month | $15 | $0 | Very complex pricing. 10 GB free storage, unlimited bandwidth to write up to 1 million objects and read up to 10 million objects.
DigitalOcean Spaces | $0.02/GB/month | $20 | $0.010 | Minimum charge: $5/month.
GitHub | $5/data pack/month | $100 | $100.000 |
GitLab | $5/month: 10 GB storage and 20 GB egress | $6000 | $3.000 | Other charges apply, for example, $29 per user/month.
Google Cloud | | $23 | $0.110 | Very complex pricing. Other charges apply.
Linode | $5/month: 250 GB storage, 1 TB egress | $20 | $0.005 | The first 1 TB egress is included.
Scaleway | $0.015/GB/month | $15 | $0.010 | 75 GB/month egress included.
Wasabi | | $7 | $0 | Minimum charge: $6.99/month.
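
To see how egress can dominate the bill, here is a small worked calculation using the DigitalOcean Spaces row of the table. The workload figures (1000 GB stored, 5000 GB downloaded in a month) are made up, and the sketch ignores minimum charges and any included allowances.

```shell
#!/usr/bin/env bash
# Sketch: monthly storage cost versus egress cost from per-GB prices.
set -eu

# monthly_cost <$/GB/month storage> <$/GB egress> <GB stored> <GB downloaded>
monthly_cost() {
  awk -v sp="$1" -v ep="$2" -v stored="$3" -v egress="$4" \
    'BEGIN { printf "storage=$%.2f egress=$%.2f\n", sp * stored, ep * egress }'
}

# DigitalOcean Spaces prices; the workload numbers are illustrative assumptions
monthly_cost 0.02 0.010 1000 5000   # prints: storage=$20.00 egress=$50.00

# Ratio between the most expensive and the Wasabi normalized storage price
awk 'BEGIN { printf "%.0f\n", 6000 / 7 }'   # prints: 857
```

With these assumed numbers, downloading the stored assets five times in a month costs two and a half times as much as storing them.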

Use the Latest Version of Git

At first, I had a really difficult time making Git LFS work. A major contributing factor was that my versions of Git on the client and server were both old, and the server version was older than the versions on the clients. Once I sorted that out, things got much easier.

I strongly recommend that you upgrade Git on every machine to the latest version before attempting to install Git LFS.

As of 2025-01-10, the latest stable version of Git was v2.47.1.

You can check the version of Git on your clients and the Git LFS server by running the following on each computer:

Shell
$ git --version
git version 2.47.1

Git Releases

I asked ChatGPT to summarize the enhancements to Git made since v2.43 was released (November 20, 2023) that might affect Git LFS. The response has been edited for readability and relevance.

What changes affecting LFS have been introduced to Git since version 2.43.0?

Since Git version 2.43.0, several updates have been introduced that impact Git LFS (Large File Storage) functionality.

Git LFS Enhancements

Version 3.6.0: This release includes support for multi-stage authentication with Git credential helpers (requires Git 2.46.0) and relative worktree paths (requires Git 2.48.0). It also introduces a new object transfer batch size configuration option, better path handling when installing on Windows, more POSIX-compliant hook scripts, and improved performance with sparse checkouts, partial clones, and Git remotes with large numbers of tags.

Git Core Updates

Version 2.44: Introduced faster pack generation with multi-pack reuse, enhancing performance during operations like push and pull.

Version 2.43: Enhanced Git repack to support multiple cruft packs and the ability to split repository contents by object filter, improving repository maintenance and storage efficiency.

According to ChatGPT, I should update Git on the server (gojira) to at least v2.48.0. However, v2.47.1 is the current stable release. Support for relative worktree paths will have to wait for now.
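
A quick way to check a machine against those minimum versions is to compare version strings with GNU sort -V, which is available on Ubuntu. This is only a sketch; the function name is made up, and the two thresholds are the ones quoted above.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed Git meets a minimum version.
set -eu

# version_ge A B succeeds when version A >= version B (relies on GNU `sort -V`)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="$(git --version | awk '{ print $3 }')"   # for example 2.47.1
for required in 2.46.0 2.48.0; do
  if version_ge "$installed" "$required"; then
    echo "Git $installed satisfies >= $required"
  else
    echo "Git $installed is older than $required"
  fi
done
```

Run it on the server and on every client, since a mismatch between them was the source of my early difficulties.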

Upgrading Git

StackOverflow provided the information on how to upgrade Git on Ubuntu. As usual, these instructions also apply to WSL/Ubuntu. If you are typing along, do the following on your server and all your clients.

Add the git-core PPA to the apt sources.

Shell
$ yes | sudo add-apt-repository ppa:git-core/ppa
PPA publishes dbgsym, you may need to include 'main/debug' component
Repository: 'Types: deb
URIs: https://ppa.launchpadcontent.net/git-core/ppa/ubuntu/
Suites: noble
Components: main
'
Description:
The most current stable version of Git for Ubuntu.
For release candidates, go to https://launchpad.net/~git-core/+archive/candidate .
More info: https://launchpad.net/~git-core/+archive/ubuntu/ppa
Adding repository.
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Hit:2 https://dl.google.com/linux/chrome/deb stable InRelease
Get:4 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble InRelease [24.3 kB]
Get:5 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 Packages [2,840 B]
Hit:3 https://packagecloud.io/github/git-lfs/ubuntu noble InRelease
Get:6 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main i386 Packages [2,848 B]
Get:7 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main Translation-en [2,088 B]
Fetched 32.1 kB in 1s (35.7 kB/s)
Reading package lists... Done
N: Skipping acquire of configured file 'main/binary-i386/Packages' as repository 'https://dl.google.com/linux/chrome/deb stable InRelease' doesn't support architecture 'i386'

Now update the apt packages and upgrade Git.

Shell
$ sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Hit:2 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble InRelease
Hit:3 https://packagecloud.io/github/git-lfs/ubuntu noble InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
2 packages can be upgraded. Run 'apt list --upgradable' to see them. 
$ sudo apt list --upgradable
Listing... Done
git-man/noble,noble 1:2.47.1-0ppa1~ubuntu24.04.1 all [upgradable from: 1:2.43.0-1ubuntu7.1]
git/noble 1:2.47.1-0ppa1~ubuntu24.04.1 amd64 [upgradable from: 1:2.43.0-1ubuntu7.1]
$ yes | sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
Get more security updates through Ubuntu Pro with 'esm-apps' enabled:
  libcjson1 libavdevice60 ffmpeg libpostproc57 libavcodec60 libavutil58 libswscale7 libswresample4 gh libavformat60 libavfilter9
Learn more about Ubuntu Pro at https://ubuntu.com/pro
The following packages will be upgraded:
  git git-man
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,967 kB of archives.
After this operation, 11.9 MB of additional disk space will be used.
Get:1 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 git amd64 1:2.47.1-0ppa1~ubuntu24.04.1 [6,775 kB]
Get:2 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 git-man all 1:2.47.1-0ppa1~ubuntu24.04.1 [2,192 kB]
Fetched 8,967 kB in 10s (881 kB/s)
(Reading database ... 486991 files and directories currently installed.)
Preparing to unpack .../git_1%3a2.47.1-0ppa1~ubuntu24.04.1_amd64.deb ...
Unpacking git (1:2.47.1-0ppa1~ubuntu24.04.1) over (1:2.43.0-1ubuntu7.1) ...
Preparing to unpack .../git-man_1%3a2.47.1-0ppa1~ubuntu24.04.1_all.deb ...
Unpacking git-man (1:2.47.1-0ppa1~ubuntu24.04.1) over (1:2.43.0-1ubuntu7.1) ...
Setting up git-man (1:2.47.1-0ppa1~ubuntu24.04.1) ...
Setting up git (1:2.47.1-0ppa1~ubuntu24.04.1) ...
Processing triggers for man-db (2.12.0-4build2) ...

Checking the installed version of Git shows the desired result:

Shell
$ git --version
git version 2.47.1 

Git add Without LFS Is Slow With Large Files

Plain Old Git lacks the ability to recognize compressed files. Git LFS addresses that problem.

Without Git LFS, the git add command creates a compressed snapshot of the working files that you have specified. When files are large, this can take a long time.

Git and Git LFS store the files that they manage differently. Git writes added files as compressed objects under the .git/objects/ directory and records them in the staging area, which is actually the file .git/index. On the other hand, Git LFS saves added files to the .git/lfs/objects/ directory.
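
As far as I can tell, Git LFS files each object under two levels of subdirectories named after the first four hexadecimal characters of its SHA-256 OID. The following sketch computes that path; the function name is made up, and the layout shown is an assumption based on what can be observed in a local .git/lfs/objects/ directory.

```shell
#!/usr/bin/env bash
# Sketch: where a Git LFS object with a given OID would be stored locally.
# The two-level directory layout is an assumption based on observation.
set -eu

lfs_object_path() {
  local oid="$1"
  local first second
  first="$(printf '%s' "$oid" | cut -c1-2)"    # characters 1-2 of the OID
  second="$(printf '%s' "$oid" | cut -c3-4)"   # characters 3-4 of the OID
  printf '.git/lfs/objects/%s/%s/%s\n' "$first" "$second" "$oid"
}

# Sample value only: the SHA-256 of an empty file
lfs_object_path e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```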

Many types of large files are already compressed, for example zips, video files and many types of audio files. Without Git LFS, the git add command will waste time trying to further compress already compressed files. Again, Git LFS is designed for this task.
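
The wasted effort is easy to demonstrate. Git compresses objects with zlib, which uses the same DEFLATE algorithm as gzip, so gzip serves here as a rough stand-in. In this sketch, random bytes stand in for already-compressed media; the file names and function name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare how well DEFLATE (via gzip) shrinks text versus incompressible data.
set -eu

work="$(mktemp -d)"

# 1 MiB of repetitive text, similar to source code or logs
yes 'the quick brown fox jumps over the lazy dog' | head -c 1048576 > "$work/text.txt"

# 1 MiB of random bytes, standing in for already-compressed media such as video
head -c 1048576 /dev/urandom > "$work/media.bin"

# compressed_size <file> prints the gzip-compressed size in bytes
compressed_size() {
  gzip -c "$1" | wc -c | tr -d ' '
}

for f in "$work/text.txt" "$work/media.bin"; do
  echo "$(basename "$f"): $(wc -c < "$f" | tr -d ' ') bytes -> $(compressed_size "$f") bytes compressed"
done
```

The text file shrinks to a small fraction of its size, while the random file does not shrink at all. Compressing media files therefore costs time and saves no space.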

In contrast, the git commit command is always fast. It just creates a list of the snapshots that should be grouped into a commit.

