Published 2025-01-06.
Last modified 2025-12-25.
Time to read: 25 minutes.
This article is part of the git collection:
- Git Large File System Overview
- Git LFS Client Installation
- Git LFS Server URLs
- Git-ls-files, Wildmatch Patterns and Permutation Scripts
- Git LFS Tracking, Migration and Un-Migration
- Git LFS Client Configuration & Commands
- Working With Git LFS
Instructions for typing along are given for Ubuntu and WSL/Ubuntu. If you have a Mac, the compiled Go programs provided on GitHub should install easily, and most of the textual information should be helpful.
Overview
Git Large File Storage (LFS) is a way of extending Git to allow the versioning of large files, such as audio samples, videos, data, and graphics. Without Git LFS, the largest file that the major commercial Git providers will accept is 100 MB.
Git LFS extends Git repositories. Without a Git repository, Git LFS cannot be used.
Git LFS speeds up common operations like cloning projects with large files and fetching large files. When a Git repository that has been enhanced with LFS is cloned, only small pointer files are downloaded, not the large files themselves. The large files are fetched on demand and efficiently managed.
I call Git without Git LFS “Plain Old Git”.
Network Bandwidth
Moving large files around in bulk requires a lot of network bandwidth. Unless you have a solid high-speed network setup, Git LFS is not going to provide significant benefit and, in fact, might actually provide lower productivity and be a source of unwanted aggravation.
My apartment utilizes fiber optic internet service, and I have Ethernet and Wi-Fi 6 coverage for mobile devices. Git LFS has worked well for me with this setup.
History
In 2015, 10 years ago, Atlassian started co-developing Git LFS for their BitBucket product along with GitHub and GitLab.
Be sure to check the publication date of information you find on the interwebs regarding Git LFS. There was a huge flood of information when it was first announced. Version 1.0.0 was released on 2015-10-01.
Git LFS started to gain traction in 2017. In the early days, the requirement for specialized Git LFS servers was a problem because those servers were scarce, and those that were available had serious issues and limitations.
As recently as 2021, Git LFS was not ready for prime time.
In 2025, we have many servers to choose from, and they are surprisingly lightweight processes. The technology has matured considerably in recent years. However, after examining them closely, I found that the only implementation that follows proper software standards (such as proper error handling) is the GitLab implementation.
Do not believe any advice unless it was recently written. 'Facts' that were cited in the early days have probably changed.
My purpose in writing these articles is to provide current information and advice, backed up with solid references and working code.
Licensing
Git and Git LFS are two separate projects. The Git LFS open-source project is hosted at GitHub and has the MIT license. In contrast, Plain Old Git has the more restrictive GNU LGPL v2.1 license.
The more permissive licensing for Git LFS means that a second independent project to provide a programmable interface to Git LFS is unnecessary from a legal standpoint. There is no need for the equivalent of Plain Old Git’s libgit2.
Phew!
... now we just need an API facade and some language bindings.
Distributed System Components
Like Plain Old Git, Git LFS requires a client and a server. When setting up a Git server with Git LFS, you actually configure two servers (a Plain Old Git server and an LFS server). Every Git user also needs their Plain Old Git client to be enhanced with a Git LFS extension.
– Run Git LFS server on your laptop –
– Store large file versions wherever –
From a Git LFS user’s point of view, all of the files stored within a Git repository are versioned, regardless of whether the files are big or small. However, the Plain Old Git server manages small files, while the Git LFS server manages large files. Within the Plain Old Git database, only the pointers to large files on the Git LFS server are versioned, not the contents of the files. It is the responsibility of the Git LFS server to maintain version history for large files.
Git LFS works by using a "smudge" filter to look up the large file contents based on the pointer file and a "clean" filter to create a new version of the pointer file when the large file’s contents change. It also uses a pre-push hook to upload the large file contents to the Git LFS server whenever a commit containing a large new file version is about to be pushed to the corresponding Git server.
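To make this concrete, here is what the moving parts look like on disk. Running `git lfs track "*.mp4"` records the clean/smudge filter assignments in `.gitattributes`:

```
*.mp4 filter=lfs diff=lfs merge=lfs -text
```

Each tracked file is then committed as a tiny pointer file like the following (the oid and size shown are invented for illustration):

```
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
```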
These filters and hooks are documented in the git-lfs man page.
Gateway, Not a Proxy
Standard Git LFS servers (e.g., GitLab, Artifactory, Bitbucket, or custom
implementations like the git-lfs-s3 Ruby gem) act as a gateway
that does not process or stream any file data. The Git LFS standard specifies
how to use presigned URLs to let clients upload and download directly to and
from storage backends like S3. Presigned URLs are temporary, signed
S3-compatible links for secure and private transmission.
This architecture separates metadata handling from data transfer, which greatly reduces server load; almost any small process can act as a Git LFS server. This flexibility and simplicity reduce cost and latency. The batch API exchanges only metadata: object pointers and authentication information.
Request Flow
In a typical S3-backed Git LFS setup:
- Git LFS client requests upload/download via LFS batch API to the platform’s LFS server.
- Git LFS server authenticates and generates presigned S3 URL.
- Git LFS client transfers data directly to/from S3 using the URL, without any data flowing through the LFS server.
- Git server updates metadata (pointers in Git repo).
This is efficient and scalable, unlike a true proxy (e.g., the Cloudflare
Worker-based git-lfs-s3-proxy, which does relay data).
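To make the handshake concrete, here is a sketch of a batch request issued with curl. The repository URL and oid are placeholders; the endpoint path, headers, and body shape follow the Git LFS Batch API specification:

```shell
$ curl -X POST \
    -H "Accept: application/vnd.git-lfs+json" \
    -H "Content-Type: application/vnd.git-lfs+json" \
    -d '{
          "operation": "download",
          "transfers": ["basic"],
          "objects": [ { "oid": "4d7a2146...", "size": 12345 } ]
        }' \
    https://git.example.com/org/repo.git/info/lfs/objects/batch
```

The response contains presigned `href` values like the ones shown in the flow below.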
Even if you use SSH for standard Git operations, Git LFS almost always issues its own separate requests over HTTPS. This means your network must allow HTTPS traffic to the LFS server even if Git is working over SSH.
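If the LFS server lives at a different URL than the Git remote, the client can be pointed at it explicitly. A committed `.lfsconfig` file is the usual mechanism; the host name below is a placeholder:

```shell
$ git config -f .lfsconfig lfs.url "https://lfs.example.com/org/repo"
$ git add .lfsconfig    # share the setting with everyone who clones
```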
Git LFS can be computationally lightweight
My 6-year-old laptop ran a custom Git LFS server on WSL / Ubuntu on Windows 10 without any problem. I use Git LFS on my Songs projects, which store recorded music video projects.
Large assets can be in data storage that is physically remote from the Git LFS Server.
Grok Told Me This
I had a good session with Grok.
Here is the typical handshake / protocol flow for Git LFS when using a standard implementation
(e.g., GitHub, GitLab, Bitbucket, Gitea, or a custom LFS server like git-lfs-s3)
backed by an S3-compatible storage (AWS S3, MinIO, R2, etc.).
This flow uses the modern Batch API (introduced in Git LFS v1.1+ ~2016 and now universal). The LFS server never streams the large binary data itself — it only acts as an authorization & metadata control plane, issuing presigned URLs for direct client ↔ S3 communication.
Download Flow (git pull / fetch / clone — getting objects to your laptop)

1. Git client detects LFS pointers: during `git fetch`, `git pull`, or `git clone`, Git sees pointer files in the repository (small text files with `version`, `oid` = SHA-256 hash, and `size`).
2. Git invokes the git-lfs client (smudge filter process or an explicit `git lfs fetch`) → git-lfs collects all missing object IDs (SHA-256 oids) that need to be downloaded.
3. git-lfs client → LFS server: POST /objects/batch (the key “handshake” request)
   - HTTP POST to the LFS endpoint (usually `/info/lfs/objects/batch` or similar)
   - Body: JSON with `"operation": "download"`, a list of object specs (`{oid, size}`), and transfer type hints
   - Headers: `Accept: application/vnd.git-lfs+json`
   - Authentication: usually a Bearer token / Basic auth / SSH key forwarded from Git credentials (same as git push/pull)
4. LFS server authenticates the request
   - Verifies user/repo access (via Git server integration)
   - Checks which objects already exist in storage (by oid)
5. LFS server → git-lfs client: 200 OK batch response
   - JSON response with `"transfer": "basic"` (or other)
   - For each requested object, the response carries a presigned GET URL (usually expiring in 5–60 minutes), with an optional `header` map:

   ```json
   {
     "oid": "...",
     "size": 123456789,
     "actions": {
       "download": {
         "href": "https://s3.amazonaws.com/my-bucket/.../sha256:...?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=...&X-Amz-Expires=900&X-Amz-Signature=...",
         "header": { "Accept": "application/octet-stream" }
       }
     }
   }
   ```

6. git-lfs client → S3-compatible backend: HTTP GET presigned URL (direct!)
   - Client downloads the raw binary data directly from S3
   - No data flows through the LFS server
   - Uses a plain HTTP GET (with query params for signature validation)
   - Progress is reported in the terminal
7. git-lfs client stores the object locally
   - Saves to `.git/lfs/objects/xx/xx/...` (content-addressable)
   - Verifies that the SHA-256 matches the oid
   - The Git smudge filter replaces the pointer with the real file content during checkout
Upload Flow (git push — sending new large files from your laptop)
The sequence is very similar but reversed:

1. git-lfs client detects new/changed LFS objects during `git push` (after Git has prepared the commit locally)
2. git-lfs client → LFS server: POST /objects/batch
   - Body: JSON with `"operation": "upload"` and a list of new objects
3. LFS server authenticates + checks
   - May check if an object already exists (skipping the upload if so — deduplication by oid)
4. LFS server → git-lfs client: 200 OK batch response, containing a presigned PUT URL for each object (optional custom headers included):

   ```json
   {
     "actions": {
       "upload": {
         "href": "https://s3.amazonaws.com/my-bucket/...?...&X-Amz-Algorithm=...&X-Amz-Signature=...",
         "header": {
           "Content-Type": "application/octet-stream",
           "x-amz-meta-foo": "bar"
         }
       }
     }
   }
   ```

5. git-lfs client → S3: HTTP PUT presigned URL (direct upload)
   - Streams the binary file directly from the laptop to S3
   - Sends any required headers (e.g., Content-Type)
   - S3 validates the signature and stores the object by oid path
6. git-lfs client → LFS server: POST /objects (optional verify step in some implementations)
   - Tells the server “upload completed successfully” (used by some servers for tracking/logging)
7. git push completes
   - Only small pointer files are pushed through the Git protocol
   - The large data transfer happened directly laptop ↔ S3
Summary Table – Who Talks to Whom
| Step | Participant | Talks to | Transfers Data? | Protocol/Method |
|---|---|---|---|---|
| 1–2 | Git client + git-lfs | — (local detection) | No | Local |
| 3–5 (handshake) | git-lfs client | LFS server (/batch) | No (metadata) | HTTP POST JSON |
| 6 (data movement) | git-lfs client | S3-compatible storage | Yes (big!) | HTTP GET/PUT + presigned |
| 7 | git-lfs client | LFS server (optional verify) | No | HTTP POST |
Key architectural beauty: The LFS server handles authentication, authorization, deduplication, and URL signing — but zero bytes of actual large file data ever touch the LFS server. This is why it’s extremely scalable and cheap when backed by object storage like S3.
Motivational Differences
There are motivational differences between how one might use Plain Old Git versus Git LFS.
Differing Origin Stories
Git was created by Linus Torvalds, a power user who performed a lot of demanding code-related administrative tasks. In contrast, the Git LFS project was cooperatively initiated by large-scale commercial vendors.
- Git LFS is the vendors’ project
- To solve their problems
- And fulfill demands made by their customers.
Git LFS users outside that use case, like me, have to figure many things out for themselves. The only way to do that properly is to experiment extensively, carefully read a lot of documentation, and examine source code.
Even though a lot of documentation has been written about Git, the Git LFS documentation was written by and for commercial vendors. I have tried to fill in gaps with this collection of articles.
How Many Copies Of Large Files on Clients?
Gemini and I had a good discussion.
when a user checks out a large file using git lfs, how many copies of the large file exist on their computer? under which circumstances might they have more than one copy when using git lfs?
In a standard Git LFS setup on a workstation, every tracked large file exists at least twice on your local disk.
This is a fundamental part of its design, and it is the primary reason why it is often unsuitable for real-time media workstations with limited SSD space.
1. The Two-Copy Reality
When you perform a git checkout, Git LFS populates your folder in two distinct locations:
- The LFS Cache (The “Secret” Copy): Stored inside `.git/lfs/objects/`. This is the “source of truth” for the specific version you downloaded. It is named by its SHA-256 hash (e.g., `ad1b8...`).
- The Working Directory (The “Real” Copy): This is the file you actually see and open in your DAW or video editor (e.g., `shot_01.mov`).
Why does it do this? Git LFS follows Git’s internal philosophy: the repository’s database (the cache) must remain separate from the user’s workspace. When you switch branches, Git LFS deletes the file in your working directory and copies a different version from the .git/lfs cache.
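You can see both copies on disk. A hypothetical session (the file name and hash are invented for illustration):

```console
$ git lfs ls-files
ad1b8e7f12 * shot_01.mov

$ find .git/lfs/objects -type f
.git/lfs/objects/ad/1b/ad1b8e7f12...    # cache copy (“source of truth”)

$ ls -lh shot_01.mov                    # working-directory copy
```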
2. Circumstances for More (or Fewer) Copies
While two copies is the default, your workstation might end up with more—or slightly fewer—depending on these specific technical scenarios:
A. More than two: Resolving Merge Conflicts
If you have a merge conflict on a 250MB video file, you may temporarily have three or more copies:
- The version from the `main` branch.
- The version from your `feature` branch.
- The common ancestor (if you are running a 3-way diff).
- The “resolved” version you are currently writing.
B. More than two: Multiple Branches in Cache
By default, Git LFS does not delete old files from the `.git/lfs/objects` folder when you switch branches.
- If you have 10 versions of a 250MB file across your history and you have checked them all out at some point, you will have 10 copies in the cache plus 1 copy in your working directory.
- Fix: You must manually run `git lfs prune` to delete the versions that aren’t currently checked out, as shown below.
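A cautious way to prune is to preview the deletions first; these flags are part of the standard git-lfs CLI:

```shell
$ git lfs prune --dry-run --verbose   # show what would be deleted
$ git lfs prune                       # delete unreferenced cache copies
```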
C. Only one copy: Using git lfs dedup (The Exception)
On specific filesystems that support Reflinks (APFS on Mac, ReFS on Windows, or Btrfs/XFS on Linux), you can run a manual command:
git lfs dedup
This command attempts to “de-duplicate” the two copies by telling the OS to point both the cache and the working directory to the same physical blocks on the SSD.
- The Catch: This is not automatic during a standard `git checkout`. Every time you switch branches, Git LFS will revert to making a full copy, and you would have to run `dedup` again to reclaim the space.
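If I recall the CLI correctly, git-lfs can also report whether your filesystem supports deduplication at all; verify the flag against your installed version:

```shell
$ git lfs dedup --test   # reports whether the filesystem supports reflinks
$ git lfs dedup          # share blocks between the cache and working copy
```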
Summary of the Space Penalty
For your 250MB video files, a standard Git LFS workflow effectively treats them as 500MB of consumed space on your drive.
| Tool | Copies on Disk | Workflow Impact |
|---|---|---|
| Git LFS | 2+ | Significant SSD bloat; slow checkouts due to “smudge” copy time. |
| XVC / DVC | 1 | Uses hardlinks/reflinks automatically; checkout is nearly instant. |
but the copy in .git/objects/ is just a small reference (smudge) file
To be precise about the “two copies” mechanic:
1. The Pointer (Smudge) File
There is a small text-based pointer file (the “smudge” target) that lives in the Git object database (`.git/objects/`).
- Size: ~100 bytes.
- Content: The SHA-256 hash and the file size.
- Role: This is what Git commits, pushes, and pulls.
2. The Local LFS Cache (The First Full Copy)
When you git pull, Git LFS downloads the actual 250MB binary from the server.
It stores this full file in:
.git/lfs/objects/[hash-prefix]/[full-hash]
- Size: 250MB (The actual binary).
- Role: This is your local “Source of Truth.” Git LFS keeps this here so that if you switch branches and then come back, it doesn’t have to download the 250MB from the internet again.
3. The Working Directory (The Second Full Copy)
When you run git checkout, the Git LFS smudge filter is triggered. It
reads the 100-byte pointer from Git, looks up the 250MB binary in your
.git/lfs/objects/ cache, and copies it into your project folder.
- Size: 250MB.
- Role: This is the file your DAW or video editor actually opens.
When do you have more than 2 copies?
These scenarios increase the duplication:
- Multiple Branches: If you have 5 different edits of the same 250MB video file, and you have checked them out at different times, you will have 5 full copies in `.git/lfs/objects/` and 1 copy in your working directory. Total: 1.5 GB for one active file.
- Staging Area (The “Index”): During a `git add`, Git LFS runs the clean filter. It creates a temporary copy while hashing the file to move it into LFS storage.
- Cross-Drive Projects: If your `.git` folder is on a different drive (or partition) than your working files, the OS is forced to perform a full byte-for-byte copy during checkout.
The “Immutable” Optimization You’re Looking For
Because your large files are immutable, you correctly identify that we should only need one copy.
- Git LFS fails here because its default behavior is `cp` (copy).
- DVC/XVC/Dud succeed here because they use `ln` (hard link) or reflink.
If you were to stay with Git LFS, you would have to manually run git lfs dedup
after every checkout to turn those two 250MB copies into a single 250MB physical
allocation via reflinks. This is why a tool like XVC is better for you; it
makes that “one copy” behavior the automated default.
is it true that all git lfs client-side traffic must flow through the git client? how might that be a bottleneck for git lfs signed urls?
Actually, it is not true that all Git LFS traffic flows through the Git client. Understanding this distinction is key to diagnosing the performance issues you’re seeing on your workstation.
1. The “Control Plane” vs. “Data Plane”
Git LFS separates the negotiation from the transfer.
- Control Plane (Git Client): When you run `git pull`, the Git client talks to the Git server (like GitHub) to exchange those small 100-byte pointer files. Git itself handles the versioning logic.
- Data Plane (LFS Client): Once Git encounters a pointer, it triggers the Git LFS agent (a separate binary). This agent talks directly to the LFS server (often an S3 bucket or Artifactory) using the Batch API.
The data transfer happens directly between the git-lfs binary and the
storage backend. It does not go through the main git process.
2. How Signed URLs Bypass the Server
Modern LFS implementations (like GitHub or GitLab) use Signed URLs specifically to avoid bottlenecks.
- Request: Your LFS client asks the LFS server: “I need the bytes for SHA-256: `ad1b8...`.”
- Redirect: The server doesn’t send the data. It sends back a Temporary Signed URL (e.g., an AWS S3 URL with an authentication token).
- Direct Download: Your LFS client then makes a standard `HTTP GET` request directly to S3.
The data never touches the Git server. It flows from S3 directly to your
git-lfs process and then to your SSD.
3. The Real Bottleneck: The “Smudge” Pipeline
The bottleneck you’re likely experiencing isn’t the network traffic—it’s the OS-level IPC (Inter-Process Communication).
Even though the data doesn’t flow through the Git server, it must flow through a Pipe if you use the default “Smudge” filter:
- The Pipe Bottleneck: Git expects to receive the file contents on `STDOUT` from the smudge filter to write it to your working directory. For a 250MB video, `git-lfs` is effectively “piping” those 250MB to `git`.
- Sequential Processing: By default, Git runs this process one file at a time. If you have 100 video files, Git starts the LFS agent, feeds it a pointer, waits for the 250MB download/pipe, closes it, and repeats. This is a massive serial bottleneck, especially on Windows where process creation is expensive.
4. How to Bypass the Bottleneck
Because you are on a high-performance workstation, you should stop relying on the “automatic” smudge during clone/pull.
The “Systems Programmer” approach to LFS speed:
- Skip Smudging: `git lfs install --skip-smudge`. This tells Git to only download the 100-byte pointers. Your `git pull` will be near-instant.
- Bulk Pull: Run `git lfs pull` manually afterward; a consolidated example follows this list.
  - Why? Unlike the smudge filter, `git lfs pull` uses the Batch API to download multiple files in parallel (usually 3–10 at a time). It writes directly to the `.git/lfs/objects` folder and then to your workspace, bypassing the `git` pipe entirely.
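Putting the two steps together (the repository URL is a placeholder):

```shell
$ git lfs install --skip-smudge                     # configure once per machine
$ git clone https://git.example.com/org/repo.git    # fast: pointers only
$ cd repo
$ git lfs pull                                      # parallel bulk download
```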
Summary for your 250MB files:
The sluggishness you feel is the Git LFS agent being forced to talk through a
single-threaded pipe to the Git process. By skipping the smudge and using xvc
or git lfs pull, you move to a parallel, direct-to-disk architecture.
Would you like me to show you the Git config to increase the
lfs.concurrenttransfers limit so your 250MB downloads actually saturate your
network link?
yes
To increase the speed of your 250MB video downloads and bypass the sequential “smudge” bottleneck, you can tune your Git LFS configuration for high-concurrency parallel transfers.
1. Increase Concurrent Transfers
By default, Git LFS downloads only 3 to 8 files at once (depending on your version). On a high-bandwidth workstation, you can safely increase this to saturate your link.
Run this command to set the limit globally:
git config --global lfs.concurrenttransfers 64
- The Logic: This tells the LFS agent to initiate 64 simultaneous HTTP requests to your storage backend (e.g., S3 via Signed URLs). For 250MB files, this ensures that while one file is finishing, dozens of others are already mid-stream.
- Note: Some servers may rate-limit you if this is set too high (e.g., above 100), but 64 is generally a “sweet spot” for modern systems.
2. The “Fast Path” Workflow
To actually see the benefit of this concurrency, you must avoid the git
checkout smudge filter (which is serial) and use the Batch API via lfs
pull.
The High-Performance Routine:
- Configure LFS to skip the smudge filter:
git config --global filter.lfs.smudge "git-lfs smudge --skip -- %f"
This ensures git pull only downloads the tiny pointers, making your Git
operations instant.
- Pull Data in Parallel:
When you actually need the video files for your DAW:
git lfs pull
Because lfs pull is a standalone command, it uses the concurrenttransfers
setting to download your video assets in parallel directly to your disk.
3. Verification
You can verify your current active settings by running:
git lfs env | grep ConcurrentTransfers
Why this fixes the Signed URL bottleneck:
When using Signed URLs, the LFS server hands you a list of 100+ URLs at once.
- The Default Way: Git LFS downloads URL #1, finishes it, then starts URL #2. Your 10Gbps link sits at 5% utilization.
- The Concurrent Way: Git LFS fires off requests for URLs #1 through #64 simultaneously. Your network card finally has enough work to do, and your 250MB files arrive in “waves” rather than a slow trickle.
Would you like me to show you how to set up a “transfer agent” in Go if you find that the standard Git LFS client is still too slow for your specific workstation needs?
yes
To gain ultimate control over your 250MB transfers and bypass the standard “smudge” bottleneck, you can write a Custom Transfer Agent in Go.
The Git LFS client supports a JSON-based protocol over stdin/stdout.
When Git LFS needs to move data, it spawns your Go binary and sends it specific
commands. This allows you to implement your own logic (e.g., direct S3 multipart
uploads, peer-to-peer syncing, or custom encryption) while Git LFS handles the
metadata.
1. The Protocol Flow
Git LFS communicates with your agent in three stages:
- Initiation: Exchange capabilities (version check).
- Commands: `download`, `upload`, or `terminate`.
- Progress/Results: Your agent reports back as bytes move.
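The messages are newline-delimited JSON on stdin/stdout. As an illustration, the first exchange looks roughly like this (field values vary by operation; check the custom-transfer documentation shipped with git-lfs):

```json
{ "event": "init", "operation": "download", "remote": "origin", "concurrent": true, "concurrenttransfers": 3 }
```

The agent acknowledges initiation by writing an empty JSON object (`{}`) followed by a newline to stdout, as the skeleton below does.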
2. Go Implementation Skeleton
A basic Go agent needs to handle the JSON stream. Here is the architectural skeleton:
```go
package main

import (
	"bufio"
	"encoding/json"
	"os"
)

// Message models the newline-delimited JSON messages that git-lfs
// exchanges with custom transfer agents over stdin/stdout.
type Message struct {
	Event  string `json:"event"`
	Oid    string `json:"oid,omitempty"`
	Size   int64  `json:"size,omitempty"`
	Path   string `json:"path,omitempty"`
	Action string `json:"action,omitempty"`
}

func main() {
	reader := bufio.NewReader(os.Stdin)

	// 1. Initiation: read the init message and respond with {}
	_, _ = reader.ReadString('\n')
	os.Stdout.WriteString("{}\n")

	// 2. Main loop: handle transfer requests
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			break // stdin closed; git-lfs is finished with this agent
		}
		var msg Message
		if err := json.Unmarshal([]byte(line), &msg); err != nil {
			continue // ignore malformed lines
		}
		if msg.Event == "terminate" {
			break
		}
		if msg.Event == "download" {
			// Perform your high-speed download logic here, optionally
			// emitting progress events as bytes move.

			// Report completion, telling git-lfs where the file landed.
			resp, _ := json.Marshal(Message{Event: "complete", Oid: msg.Oid, Path: "/local/path/to/file"})
			os.Stdout.WriteString(string(resp) + "\n")
		}
	}
}
```
3. Configuration
Once your Go binary (e.g., my-lfs-agent) is compiled and in your path, you
tell Git LFS to use it:
```shell
# Define the agent path
git config lfs.customtransfer.myagent.path "my-lfs-agent"

# Set it as a standalone agent (bypasses the standard LFS API server)
git config lfs.standalonetransferagent myagent

# Optional: disable concurrency if you want to handle it inside your Go code
git config lfs.customtransfer.myagent.concurrent false
```
Why this is the “Pro” Move for your Workstation
- No Smudge Pipe: By returning the `Path` in the `complete` message, Git LFS moves your file directly into the cache. You are no longer streaming 250MB through a standard pipe.
- Internal Concurrency: You can use goroutines and `context` to manage your own worker pools, ensuring your 10Gbps link is fully utilized without Git managing the threads.
- Signed URL Optimization: If your backend uses Signed URLs, your Go agent can pre-fetch or refresh them dynamically, which the standard LFS client occasionally struggles with during long-running transfers.
Existing implementations to study:
- charmbracelet/git-lfs-transfer: A Go implementation of the SSH-based transfer protocol.
- nicolas-graves/lfs-s3: A specialized Go agent for S3 backends.
Do Not Rush Into Git LFS
Getting Git LFS to work can be fraught with issues. Time and patience are required to achieve an acceptable working system for particular circumstances. I hope that readers will benefit from the time I spent writing these articles.
It is easier and less risky to create a new Git repository that uses Git LFS right away than it is to migrate an existing Git repository to use Git LFS and maintain the structure of the commits.
By default, Git LFS rewrites the Git repository history during the migration process; this preserves the structure while reducing the size of the Git repository. The repository is smaller after the migration rewrites the Git history because large files are moved from the Git repository to the associated Git LFS repository.
If a project requires a carefully groomed commit graph, the Git history must be rewritten. Rewriting the Git history requires everyone to re-clone the Git repository. If many people share a Git repository and the history is rewritten, chaos can result because some people will invariably continue to work on the now-obsolete old Git repository. The problem users are the ones who do not read memos. They will lose their work unless a recovery procedure is followed.
If maintaining the Git commit ordering is not important to you and your organization, then you will be happy to know that this incantation:
$ git lfs migrate import --no-rewrite
... preserves compatibility for all your other users. However, the price of this compatibility is that after the import, any copies of large files that were in the Git repository will remain there. If those files were very large, the repository would remain bloated for all time.
So you must choose one of the following when enhancing a Git repository with Git LFS (see Git LFS Tracking, Migration and Un-Migration):
- Migrate with history rewriting (the default):
Benefit: smaller Git repository and consistent commit ordering.
Risk: a potential for unhappy users who lose work while the Git upgrade is in process.
- Migrate with `--no-rewrite`:
Benefit: it is unlikely anyone will lose work during the upgrade.
Potential issue: the Git repository forever remains bloated with large files that no longer have a purpose.
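As a sketch of both choices (the pattern and file names are examples; practice on a throwaway clone first):

```shell
# Choice 1: rewrite all history so *.mp4 files become LFS pointers everywhere
$ git lfs migrate import --include="*.mp4" --everything

# Choice 2: leave existing history intact; track the pattern, then convert
# the named files in a new commit
$ git lfs track "*.mp4"
$ git lfs migrate import --no-rewrite big1.mp4 big2.mp4
```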
Wait until you are familiar with Git LFS before attempting to convert any repositories that you care about to Git LFS. Create repositories to practice on, as shown in these articles.
It is likely that you will encounter problems trying to get things to work. Hopefully, the solutions that I provide will help you solve them. Learn about the problems you will likely encounter and practice the solutions on the practice repositories before you try enhancing an important Git repository with LFS.
Implementations
Git is a distributed versioning system, and so is Git LFS. Pricing for prepackaged LFS storage is unreasonably high from BitBucket, GitHub, and GitLab. Running a local Git LFS server means you can store large assets wherever makes the most sense, including local storage or on an S3 server.
With few exceptions, Git LFS servers from BitBucket, GitHub, and GitLab lack the ability to authenticate against other storage services. If you want to pay the lowest price possible for storage but want to host on one of the aforementioned Git providers, you will need to run your own instance of a Git LFS server. Happily, Git LFS servers are lightweight processes that do not generate much network traffic, so you could run one on your laptop or a small office server.
For example, you can point a local Git LFS server at large files on AWS S3 or S3 clones like DigitalOcean Spaces or Backblaze B2. You won't incur any extra costs from Git providers like GitHub or BitBucket for this, and this configuration is easy to set up.
Many LFS implementations exist; however, as is often the case with software, dead projects live on as zombies for many years. The following LFS implementations were the most interesting to me.
APIs
The Git LFS Batch API is documented here. It is implemented by most vendors and Git LFS storage solutions as the default HTTP-based Git LFS object transfer protocol.
In contrast, the SSH-based Git LFS object transfer protocol, labeled the "SSH protocol proposal" in the linked page, was introduced with Git LFS v3.0.0. This protocol is only offered by a few vendors today, such as GitLab, who added it in their v17.2 release. I believe that GitHub and BitBucket also offer Git LFS over SSH.
LFS Server Summary
Prices are shown in the next section.
I carefully examined the source code of every open source Git LFS server mentioned in this article. The only one that had proper error handling and other standard quality measures was the GitLab implementation. All the others only had “happy path” coding. The commercial products (GitHub, JFrog and Sonatype) are not open source and so I was unable to examine the source code.
- Null servers only use locally accessible content, such as might be found on a server in your local area network. This is the simplest setup, and if that server is regularly maintained (including backups), this option can provide good performance at no extra cost and subject to whatever security regime you decide to implement.
- BitBucket has a tiny free storage offering, so small as to be almost useless (1 GB).
- Git LFS S3 Proxy is a Cloudflare Pages site that acts as a Git LFS server backed by any S3-compatible service. This project is different from all others mentioned. It is extremely simple and maintainable, uses Cloudflare's inexpensive global edge network, and is dramatically cheaper than GitHub and GitLab LFS storage. From a security point of view, this architecture is the most insecure of all. Every request exposes full credentials. There is a very high security risk if the URL is logged, shared, copied to a clipboard, appears in git reflog, CI logs, browser history, etc. This project is a brilliant hack, but exposes the user to unnecessary risk.
- GitHub also has a tiny free storage offering, so small as to be almost useless (1 GB per GitHub user).
- GitLab: All projects on GitLab.com have 10 GiB of free storage for their Git repository and Large File Storage (LFS). While much more generous than GitHub's free offering, this is still too small to be useful for many projects. GitLab’s storage pricing is crazy expensive. This appears to be the best available open-source Git LFS implementation.
- JFrog Artifactory is a commercial server that provides Git LFS support, and many other features. Unless you are already a JFrog customer, it would not make sense to select this product for Git LFS.
- Gitbucket is a F/OSS project. However, I would be reluctant to rely on anything written in Scala for production.
- Giftless is a pluggable F/OSS Git LFS server. It is written in Python and claims to be easy to extend. In particular, it supports local storage and storage on AWS S3. Unfortunately, it is an example of happy-path coding and is incomplete.
- LFS Test Server is an example server that implements the Git LFS API. The product notes say that it is “intended to be used for testing the Git LFS client and is not in a production-ready state.” However, if run on an internal network, or on a laptop, this might be a viable option. LFS Test Server is written in Go, with pre-compiled binaries available for Mac, Windows, Linux, and FreeBSD. This is a reasonable starting point for a custom Git LFS server, but lacks proper error handling.
- Rudolfs claims to be a high-performance, caching, F/OSS Git LFS server with an AWS S3 back-end (which does not work) and a local storage back-end. It does not have proper error checking and only considers the happy path. I do not recommend using this project.
- Sonatype Nexus Repository is a commercial server that provides Git LFS support, and many other features. It only supports the Git LFS batch API. It does not offer an online storage option; users should self-host. Unless you are already a Sonatype customer, it would not make sense to select this product for Git LFS.
Storage Pricing
There is a huge disparity in pricing between the various S3-compatible storage providers. Wasabi, at $7/TB/month, is the cheapest, while GitLab, at $6000/TB/month, is the most expensive. I do not believe that GitLab’s storage is 857 times better than Wasabi’s.
Data egress fees can become substantial costs. In fact, for large and active assets, data egress fees can be much greater than storage costs. GitHub has the highest data egress fees, 33 times more than the next most expensive provider.
When data egress fees are charged, there is no limit to the potential financial liability.
Data egress fees do not apply from a Git provider (like GitHub) when storage is provided by a separate storage provider (like Wasabi).
In the following table, data egress fees for providers that do not charge for data egress are shown as $0.
| Storage | Price | Normalized Storage ($/TB/month) | Egress ($/GB) | Comment |
|---|---|---|---|---|
| AWS | $0.023/GB/month | $26 | $0.090 | |
| Azure | $0.15/GB/month | $15 | $0.087 | Premium pay-as-you-go pricing shown; other charges apply. Very complex pricing. |
| Backblaze | $6/TB/month | $6 | $0.010 | Minimum charge: $6/month. |
| Bitbucket | $0.100/GB/month | $100 | ? | Incomplete information shown online. |
| Cloudflare R2 | $0.015/GB/month | $15 | $0 | Very complex pricing. 10 GB free storage, unlimited bandwidth to write up to 1 million objects and read up to 10 million objects. |
| DigitalOcean Spaces | $0.02/GB/month | $20 | $0.010 | Minimum charge: $5/month. |
| GitHub | $5/data pack/month | $100 | $100.000 | |
| GitLab | $5/month: 10 GB storage and 20 GB egress | $6000 | $3.000 | Other charges apply, for example, $29 per user/month. |
| Google Cloud | | $23 | $0.110 | Very complex pricing. Other charges apply. |
| Linode | $5/month: 250GB storage, 1 TB egress | $20 | $0.005 | The first 1 TB egress is included. |
| Scaleway | $0.015/GB/month | $15 | $0.010 | 75 GB/month egress included. |
| Wasabi | | $7 | $0 | Minimum charge: $6.99/month. |
Use the Latest Version of Git
At first, I had a really difficult time making Git LFS work. A major contributing factor was that my versions of Git on the client and server were both old, and the server version was older than the versions on the clients. Once I sorted that out, things got much easier.
As of 2025-12-08, the latest stable version of Git was v2.52.0.
You can check the version of Git on your clients and the Git LFS server by running the following on each computer:
```console
$ git -v
git version 2.47.1
```
Git Releases
I asked ChatGPT to summarize the enhancements to Git made since v2.43 was released (November 20, 2023) that might affect Git LFS. The response has been edited for readability and relevance.
Good question. There have indeed been multiple changes in Git since version 2.43.0 that can affect Git LFS usage — either directly or indirectly. Below is a summary of the most relevant changes (core-Git plus Git LFS itself) since 2.43.0.
✅ Git-Side Changes (since 2.43.0) that can impact Git LFS workflows
-
New “reftable” reference backend (Git 2.45.0)
- Git 2.45 introduced experimental support for a new reference-storage backend called “reftable” as an alternative to the traditional “files” (loose refs + packed-refs) backend. ([The GitHub Blog][1])
- That means repositories (including those using LFS) can now use `git init --ref-format=reftable` or `git clone --ref-format=reftable` to get potentially more scalable reference storage — especially useful for repos with many branches/tags. ([about.gitlab.com][2])
-
Performance improvements for pack generation (Git 2.44.0)
- Git 2.44 added “faster pack generation with multi-pack reuse,” which improves performance of operations like push/pull. ([Mike Slinn][3])
- This can speed up repository operations even when there are many large files tracked by LFS (since pack-generation affects object transfer and repository maintenance).
-
Reversion of a change to `attr.tree` default behaviour (Git 2.46.0)
- In 2.43.0, Git changed how attribute lookup worked for bare repositories (using the HEAD tree by default for `.gitattributes` in such cases). That had implications for LFS, since `.gitattributes` often defines which files are tracked by LFS. ([about.gitlab.com][4])
- In 2.46.0 the change was partially reverted because it caused performance regressions for clone/pull/diff with large or deep trees. ([about.gitlab.com][4])
- In practice, this reversion helps avoid unexpected attribute-resolution issues in bare repositories, which might otherwise have disrupted LFS tracking.
- In 2.43.0, Git changed how attribute lookup worked for bare repositories (using HEAD tree by default for
-
Build-system, memory-leak, and internal quality improvements (Git 2.47 / 2.48)
- Starting around 2.47 / 2.48, Git migrated to a new build system (the Meson build system), improving maintainability across platforms. ([about.gitlab.com][5])
- In 2.48, Git was declared “memory-leak-free” (as exercised by the test suite). ([about.gitlab.com][5])
- While not LFS-specific, these improvements reduce risk of subtle memory or behavior bugs when working with large repos — which is often the case when LFS is in use.
-
Safety/hardening changes that affected LFS workflows in Git for Windows 2.45.1
- According to the release notes for Git for Windows 2.45.1, a “defense-in-depth” change caused a regression when cloning repos enabled with Git LFS: clones would fail because an active `post-checkout` hook (installed by `git lfs install`) would be disallowed by default. ([Google Groups][6])
- The workaround was to run `git lfs pull` after clone, or adjust the environment to disable the protection (`GIT_CLONE_PROTECTION_ACTIVE=false`). ([GitHub][7])
- This is a concrete example of a Git core change interfering with LFS usage — something to watch out for if you rely on hooks or complex clone workflows.
- According to the release notes for Git for Windows 2.45.1, a “defense-in-depth” change caused a regression when cloning repos enabled with Git LFS: clones would fail because an active
🛠 Git LFS–Side Updates (since Git 2.43.0) relevant to newer Git versions
Although the core of LFS is separate from Git, the combination of updated Git + updated LFS can produce better behaviour. Some relevant LFS-project changes:
- LFS v3.6.0 added support for “multi-stage authentication with Git credential helpers” (when used with Git ≥ 2.46.0) and support for relative worktree paths (when Git ≥ 2.48.0). It also added a new “object transfer batch size” config option, improved Windows path handling at install time, more POSIX-compliant hook scripts, and performance improvements in contexts like sparse checkouts, partial clones, and Git remotes with many tags. ([Mike Slinn][3])
- LFS v3.7.0 (the latest major LFS release as of 2025) introduced several enhancements: options to force objects to be re-downloaded, output of object URLs + HTTP metadata as JSON (useful for automation), a configurable in-memory cache of file-path pattern matches (speeding up LFS migration of large repos), better compatibility on Windows (the same `.netrc` handling as Git and curl), tolerant retry logic after HTTP 429 responses, support for symlinks to Git’s object storage during LFS migrations, and fewer TLS-verification issues on macOS with custom CA certificates. ([GitHub][8])
⚠️ Known Issues / Gotchas because of Git ↔ LFS Interaction
- The safety hardening in Git 2.45.1 for Windows caused some Git LFS workflows to fail (as noted above) — clones would succeed but checkout would fail if a `post-checkout` hook was present. The official remedy was to run `git lfs pull`. ([Google Groups][6])
- Switching the reference backend to “reftable” might have unknown side-effects for tooling or hosting services that expect the “files” backend — since this is still experimental/optional (though increasingly recommended) with Git 2.45+.
🎯 What This Means for You (as a Git + LFS User)
- If you maintain a repo with many branches/tags (or expect to scale), consider experimenting with the “reftable” backend — especially if you also use LFS. It can potentially improve performance and ref-storage scalability.
- Upgrading to recent Git (≥ 2.48) and recent LFS (≥ 3.6) will give you better performance, more robust authentication, and improved cross-platform behavior (notably Windows/macOS).
- Be aware of the Git for Windows 2.45.1 issue: if your workflows rely on hooks (e.g. `post-checkout` from `git lfs install`), the default safety restrictions can block checkout on fresh clones — and you may need to call `git lfs pull` manually (or adjust the environment).
- For large / complex repos (many refs, many LFS objects, sparse/partial clones), the combination of Git performance improvements + LFS enhancements should make Git + LFS much more efficient/stable than before 2.43.0.
Upgrading Git
StackOverflow provided the information on how to upgrade Ubuntu. As usual, these instructions also apply to WSL/Ubuntu. If you are typing along, do the following on your server and all your clients.
Add the git-core PPA to the apt sources.
```console
$ yes | sudo add-apt-repository ppa:git-core/ppa
PPA publishes dbgsym, you may need to include 'main/debug' component
Repository: 'Types: deb
URIs: https://ppa.launchpadcontent.net/git-core/ppa/ubuntu/
Suites: noble
Components: main
'
Description:
The most current stable version of Git for Ubuntu.

For release candidates, go to https://launchpad.net/~git-core/+archive/candidate .
More info: https://launchpad.net/~git-core/+archive/ubuntu/ppa
Adding repository.
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Hit:2 https://dl.google.com/linux/chrome/deb stable InRelease
Get:4 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble InRelease [24.3 kB]
Get:5 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 Packages [2,840 B]
Hit:3 https://packagecloud.io/github/git-lfs/ubuntu noble InRelease
Get:6 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main i386 Packages [2,848 B]
Get:7 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main Translation-en [2,088 B]
Fetched 32.1 kB in 1s (35.7 kB/s)
Reading package lists... Done
N: Skipping acquire of configured file 'main/binary-i386/Packages' as repository 'https://dl.google.com/linux/chrome/deb stable InRelease' doesn't support architecture 'i386'
```
Now update the apt packages and upgrade Git.
```console
$ sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Hit:2 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble InRelease
Hit:3 https://packagecloud.io/github/git-lfs/ubuntu noble InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
2 packages can be upgraded. Run 'apt list --upgradable' to see them.

$ sudo apt list --upgradable
Listing... Done
git-man/noble,noble 1:2.47.1-0ppa1~ubuntu24.04.1 all [upgradable from: 1:2.43.0-1ubuntu7.1]
git/noble 1:2.47.1-0ppa1~ubuntu24.04.1 amd64 [upgradable from: 1:2.43.0-1ubuntu7.1]

$ yes | sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
Get more security updates through Ubuntu Pro with 'esm-apps' enabled:
  libcjson1 libavdevice60 ffmpeg libpostproc57 libavcodec60 libavutil58
  libswscale7 libswresample4 gh libavformat60 libavfilter9
Learn more about Ubuntu Pro at https://ubuntu.com/pro
The following packages will be upgraded:
  git git-man
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,967 kB of archives.
After this operation, 11.9 MB of additional disk space will be used.
Get:1 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 git amd64 1:2.47.1-0ppa1~ubuntu24.04.1 [6,775 kB]
Get:2 https://ppa.launchpadcontent.net/git-core/ppa/ubuntu noble/main amd64 git-man all 1:2.47.1-0ppa1~ubuntu24.04.1 [2,192 kB]
Fetched 8,967 kB in 10s (881 kB/s)
(Reading database ... 486991 files and directories currently installed.)
Preparing to unpack .../git_1%3a2.47.1-0ppa1~ubuntu24.04.1_amd64.deb ...
Unpacking git (1:2.47.1-0ppa1~ubuntu24.04.1) over (1:2.43.0-1ubuntu7.1) ...
Preparing to unpack .../git-man_1%3a2.47.1-0ppa1~ubuntu24.04.1_all.deb ...
Unpacking git-man (1:2.47.1-0ppa1~ubuntu24.04.1) over (1:2.43.0-1ubuntu7.1) ...
Setting up git-man (1:2.47.1-0ppa1~ubuntu24.04.1) ...
Setting up git (1:2.47.1-0ppa1~ubuntu24.04.1) ...
Processing triggers for man-db (2.12.0-4build2) ...
```
Checking the installed version of Git shows the desired result:
```console
$ git --version
git version 2.47.1
```
Git add Without LFS Is Slow With Large Files
Plain Old Git lacks the ability to recognize compressed files. Git LFS addresses that problem.
Without Git LFS, the git add command creates a compressed snapshot
of the working files that you have specified.
When files are large, this can take a long time.
Git and Git LFS store the files that they manage differently.
Git writes compressed snapshots of added files into its object database (`.git/objects/`) and records them in the staging area (the file `.git/index`).
On the other hand, Git LFS saves added files to the `.git/lfs/objects/` directory.
Many types of large files are already compressed, for example zips, video files, and many types of audio files.
Without Git LFS, the git add command will waste time trying to further compress already-compressed files.
Git LFS avoids this wasted effort: it stores tracked files as-is, without attempting further compression.
In contrast, the git commit command is always fast.
It just creates a list of the snapshots that should be grouped into a commit.
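A minimal sketch of putting this into practice: tell Git LFS to handle the already-compressed formats mentioned above, so `git add` stays fast (the patterns and file name are examples):

```shell
$ git lfs track "*.zip" "*.mp4" "*.flac"   # writes patterns to .gitattributes
$ git add .gitattributes
$ git add big-sample.flac                  # stored by LFS; no recompression
```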
References
- git-lfs.com
- Git LFS by Atlassian
- GitLab - Git LFS
- Handling Large Files with LFS
- Man pages
- The online documentation; read about the documented limitations
- A Developer’s Guide to Git LFS
- Set Up a Git LFS Repository by JFrog