Published 2025-01-06.
Last modified 2025-02-06.
Time to read: 7 minutes.
I have published 8 articles about Git Large File Storage (LFS). They are meant to be read in order.
- Git Large File System Overview
- Git LFS Scripts
- Git LFS Client Installation
- Git LFS Server URLs
- Git-ls-files, Wildmatch Patterns and Permutation Scripts
- Git LFS Tracking, Migration and Un-Migration
- Git LFS Client Configuration & Commands
- Git LFS SSH Authentication
- Working With Git LFS
- Evaluation Procedure For Git LFS Servers
- Git LFS server tests:
6 articles are still in process.
Instructions for typing along are given for Ubuntu and WSL/Ubuntu. If you have a Mac, most of this information should be helpful.
This Article Probably Contains Errors
This article is incomplete and may contain errors. It has been published to allow collaboration with fact-checkers. It may not make sense to review every scenario. Do not rely on this information yet.
I evaluated several Git LFS servers. Scripts and procedures common to each evaluation are described in this article. This article concludes with a brief outline of the procedure followed for each of the Git LFS servers that I chose to evaluate.
The results of evaluating each server are documented in follow-on articles.
If you evaluate another Git LFS implementation using the scripts provided in this article, please send me detailed results. My coordinates are on the front page of this website.
Many websites show examples of how to work with Git LFS by tracking small files.
That is quicker and easier than testing with files over 100 MB, which is slow and consumes a lot of storage space.
However, testing with small files avoids most of the practical details that an evaluation is intended to surface.
You should also see what happens when your LFS storage exceeds 1 GB,
and what happens when you attempt to commit more than 2 GB of content in a single commit.
If you follow along with the instructions I give in the evaluations, large files will actually be used,
non-trivial amounts of LFS storage will be consumed, and practical issues may arise.
This is the motivation for performing evaluations, after all.
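If you want to experiment with the over-100 MB code paths before downloading the real test data, you can generate a suitably large file locally. This sketch uses GNU `head`, which is standard on Ubuntu; the file name is arbitrary.

```shell
# Create a 150 MB file of random bytes; this crosses the usual 100 MB Git LFS threshold.
# With GNU head, 150M means 150 * 1024 * 1024 bytes.
head -c 150M /dev/urandom > big_test_file.bin

# Confirm the size in human-readable form.
ls -lh big_test_file.bin
```

Random data has the nice property of being incompressible, so it also defeats Git's delta compression and makes add/commit/push timings more realistic than text files of the same size.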

Git LFS Evaluation Procedure
Each of the next several articles is dedicated to the evaluation of a specific Git LFS server. Each Git LFS server is evaluated in several scenarios. The high-level process for each evaluation scenario is as follows:
- The Git LFS server is installed.
- Potential Git LFS server configuration is considered, and any relevant activities are performed.
- Each evaluation is performed in a series of steps. The `step` script sequences the stages of each Git LFS evaluation scenario. It uses test data provided by the `git_lfs_test_data` script and stored in `$work/git/git_lfs_test_data/`.
- For the null server tests, the `new_bare_repo` script is used.
- For most of the Git LFS servers tested, the `create_lfs_eval_repo` script is used to create the evaluation repository as a subdirectory of `~/git_lfs_repos` on `gojira`.
Step Script
The `step` script accepts the scenario number and the step number.
Each of the steps is shown below, along with a description of the actions performed for that step.
- The Git LFS server URL is saved into an appropriately crafted `.lfsconfig` file for the evaluation scenario. The `track` script specifies the wildmatch patterns that determine the files to be stored in Git LFS. An initial set of files is copied into the repository, and CRCs are computed for each file and stored.
- The initial set of files is added to the Git and Git LFS repositories, and the time that took is saved. The files are committed, and the time that took is saved. The files are pushed, and the time that took is saved. The client-side and server-side Git repositories are examined. Where are the files stored? How big are they? Etc.
- Big and small files are updated, deleted and renamed. CRCs are computed for all files and saved. The CRCs of files that did not change are examined to make sure they did not change. The client-side and server-side Git repositories are examined again.
- The repository is cloned to another machine with a properly configured Git client. CRCs are computed for all files and compared to the CRCs computed on the first client machine.
- File changes for step 3 are made on the second Git client machine, committed, and pushed. CRCs are computed for all files.
- On the first client machine, the changes are pulled. The CRCs of all files are compared between the two clients.
- The `untrack` and `unmigrate` commands are used to move all files from Git LFS back to Git. Any issues are noted, and the Git LFS repository is examined. CRCs are computed and compared to the previous values.
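For orientation, the `.lfsconfig` file written in the first step is an INI-style Git config file whose `lfs.url` key names the server endpoint. The URL below is a hypothetical illustration, not one of the actual evaluation endpoints:

```ini
# .lfsconfig at the repository root, committed so every clone uses the same server
[lfs]
	url = ssh://mslinn@gojira/home/mslinn/lfs_eval/scenarios/scenario_4.git
```

The same value can also be written from the command line with `git config -f .lfsconfig lfs.url <URL>`.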
Organization
I defined 3 environment variables to make moving between directories easier and less confusing for readers.
These environment variables take advantage of the organizational similarity of the Git LFS evaluation setup
between server and clients.
The values shown below were created by the setup scripts; the server-side script, `setup_git_lfs_eval_server`, appears later in this article.

```shell
LFS_EVAL=/home/mslinn/lfs_eval
LFS_DATA=$LFS_EVAL/eval_data
LFS_SCENARIOS=$LFS_EVAL/scenarios
```
The clients stored evaluation data files and client-side copies of the repositories in `$work/git/lfs_eval/`.
I defined 3 environment variables to match the server-side environment variables.

```shell
LFS_EVAL=/mnt/f/work/git/lfs_eval
LFS_DATA=$LFS_EVAL/eval_data
LFS_SCENARIOS=$LFS_EVAL/scenarios
```
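These assignments only last for the current shell session. One way to make them permanent, assuming `bash` and the client-side paths above, is to append the exports to `~/.bashrc`:

```shell
# Append the client-side variables to ~/.bashrc so new shells inherit them.
# Adjust the paths to match your own machine.
cat >> ~/.bashrc <<'EOF'
export LFS_EVAL=/mnt/f/work/git/lfs_eval
export LFS_DATA=$LFS_EVAL/eval_data
export LFS_SCENARIOS=$LFS_EVAL/scenarios
EOF

# Load them into the current shell without starting a new one.
source ~/.bashrc
```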
```
$LFS_DATA/
  step1/
    .checksums
    README.md
    pdf1.pdf
    video1.m4v
    video2.mov
    video3.avi
    video4.ogg
    zip1.zip
    zip2.zip
  step2/
    .checksums
    README.md
    pdf1.pdf
    video2.mov
    video3.avi
    zip1.zip
  step3/
    .checksums
    README.md
```

```
$LFS_SCENARIOS/
  1/
    .git/
    .checksums
    README.md
    ...
  2/
    .git/
    .checksums
    README.md
    ...
  17/
    .git/
    .checksums
    README.md
    ...
```
To make all the Git repositories on server `gojira`:

```console
mslinn@gojira ~ $ setup_git_lfs_eval_server
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_1.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_2.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_3.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_4.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_5.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_6.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_7.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_8.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_9.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_10.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_11.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_12.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_13.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_14.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_15.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_16.git/
Initialized empty shared Git repository in /home/mslinn/lfs_eval/scenarios/scenario_17.git/
```

Test Data
We need test data to ensure that Git LFS is functioning correctly. A minimum of 2 GB of binary files, each larger than 100 MB, is required to properly test Git LFS.
The test files are organized so Git and Git LFS operations can be tested. Large files need to be added, removed and updated. Old versions need to be retrievable.
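As a concrete example of the "old versions" requirement: plain Git commands can restore a prior revision of a tracked file, and with Git LFS the checkout should transparently fetch the old object from the LFS server. The file name below is illustrative.

```shell
# List the commits that touched a large file.
git log --oneline -- pdf1.pdf

# Restore the version from the previous commit into the working tree.
git checkout HEAD~1 -- pdf1.pdf

# Verify its contents, then put the current version back.
cksum pdf1.pdf
git checkout HEAD -- pdf1.pdf
```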
For convenience, I wrote a script called `git_lfs_test_data` that downloads large files from several sources.
The files are organized into subdirectories;
each subdirectory contains the files that are added to the commit for each step of the test procedure.
```shell
#!/bin/bash

# download url_dir filename saved_name
function download {
  URL_DIR="$1"
  FNAME="$2"
  SAVED_FNAME="$3"
  THIS_DIR="$( basename "$PWD" )"
  if [ -f "$SAVED_FNAME" ]; then
    echo "$SAVED_FNAME has already been downloaded to $THIS_DIR"
    return
  fi
  echo "Downloading $FNAME as $THIS_DIR/$SAVED_FNAME"
  curl \
    -o "$SAVED_FNAME" \
    --progress-bar \
    --retry 5 \
    --retry-all-errors \
    "$URL_DIR/$FNAME"
}

# Files for step 1
mkdir -p step1 && cd step1 || exit 1
cat <<EOF >.gitignore
.cksum_output
EOF
echo "This is README.md for step 1" > README.md
download https://download.blender.org/peach/bigbuckbunny_movies BigBuckBunny_640x360.m4v video1.m4v
download https://download.blender.org/peach/bigbuckbunny_movies big_buck_bunny_480p_h264.mov video2.mov
download https://download.blender.org/peach/bigbuckbunny_movies big_buck_bunny_480p_stereo.avi video3.avi
download https://download.blender.org/peach/bigbuckbunny_movies big_buck_bunny_720p_stereo.ogg video4.ogg
download https://mattmahoney.net/dc enwik9.zip zip1.zip
download https://www.gutenberg.org/cache/epub/feeds rdf-files.tar.zip zip2.zip
download https://files.testfile.org/PDF 100MB-TESTFILE.ORG.pdf pdf1.pdf

# Files for step 2
cd - && mkdir -p step2 && cd step2 || exit 1
echo "This is README.md for step 2" > README.md
download http://ipv4.download.thinkbroadband.com 200MB.zip zip1.zip
download https://download.blender.org/peach/bigbuckbunny_movies big_buck_bunny_720p_h264.mov video2.mov
download https://download.blender.org/peach/bigbuckbunny_movies big_buck_bunny_720p_stereo.avi video3.avi
download https://files.testfile.org/PDF 200MB-TESTFILE.ORG.pdf pdf1.pdf

# Files for step 3
cd - && mkdir -p step3 && cd step3 || exit 1
echo "This is README.md for step 3" > README.md

printf "\nDownload complete.\n"
printf "Here are lists of the initial set of downloaded files and sizes,\n"
printf "ordered by name. The last line in each listing shows the total\n"
printf "size of the files.\n\n"
printf "Some files might be deleted by each step; those are not shown here.\n\n"
printf "\nStep 1:\n"
cd ../step1 || exit 1
du -ah | sed s^./^^
printf "\nStep 2:\n"
cd ../step2 || exit 1
du -ah | sed s^./^^
printf "\nStep 3:\n"
cd ../step3 || exit 1
du -ah | sed s^./^^
```
Installation instructions are provided in Git LFS Scripts.
I also wrote a script called `checksums` that computes CRCs for each file
and compares the CRCs against expected values.
Discrepancies are reported.
```shell
#!/bin/bash

CHECKSUM_FILE=".checksums"

function help {
  echo "$( basename "$0" ) - Report names, CRC and size of changed files.

Syntax: $( basename "$0" ) [DIRECTORY]

where DIRECTORY can be relative or absolute.

The first time this script runs, it creates a file called $CHECKSUM_FILE
in the current directory. Each time it runs thereafter, any files whose
contents change are reported."
  exit
}

# Directory to scan
if [ "$1" ]; then DIR="$1"; else DIR="."; fi

# Function to parse command-line options
parse_options() {
  unset DEBUG
  while getopts "h" opt; do
    case $opt in
      *) help ;;
    esac
  done
}

# Function to compute checksums and write them to a file
compute_checksums() {
  local output_file="$1"
  find "$DIR" \( -path './.git*' -prune -o -path "./$CHECKSUM_FILE" -prune \) -o -type f -print0 | \
    xargs -0 cksum | \
    sort >"$output_file"
}

# Function to get the human-readable file size using find
get_file_size() {
  local file="$1"
  # Using find to get the file size in bytes, then convert to human-readable format
  size_in_bytes=$(find "$file" -printf "%s")
  human_size=$(numfmt --to=iec "$size_in_bytes")
  echo "$human_size"
}

# Function to compare checksums with the saved values
compare_checksums() {
  if [[ ! -f "$CHECKSUM_FILE" ]]; then
    echo "Checksum file not found."
    exit 1
  fi

  TMP_FILE=$(mktemp /tmp/cksum_compare_XXXXXX.txt)
  # Ensure the temporary file gets deleted when the script finishes or exits
  trap 'rm -f "$TMP_FILE"' EXIT
  compute_checksums "$TMP_FILE"

  # Compare checksums directly
  while read -r line; do
    # Extract checksum, byte count and file name from the saved cksum output
    old_crc=$(echo "$line" | awk '{print $1}')
    file_path=$(echo "$line" | awk '{print $3}')

    # Skip if it's the checksum file itself
    if [[ "$file_path" == "$CHECKSUM_FILE" ]]; then continue; fi

    # Find the corresponding line in the current checksum output
    new_crc=$(grep -E "(^|\s)$file_path($|\s)" "$TMP_FILE" | awk '{print $1}')

    # If the checksums don't match, report the change.
    # The old size comes from the saved record; the new size from the file system.
    if [[ "$old_crc" != "$new_crc" ]]; then
      old_size=$(numfmt --to=iec "$(echo "$line" | awk '{print $2}')")
      new_size=$(get_file_size "$file_path")
      echo -e "${YELLOW}$file_path CRC was: $old_crc, $old_size; CRC is now: $new_crc, $new_size${NORMAL}"
    fi
  done <"$CHECKSUM_FILE"
}

NORMAL=$(tput sgr0)
GREEN=$(tput setaf 2)
YELLOW=$(tput setaf 3)

# Parse command-line options
parse_options "$@"

# If the checksum file exists, compare the checksums, then replace it with the current checksums
if [[ -f "$CHECKSUM_FILE" ]]; then
  compare_checksums
fi
compute_checksums "$CHECKSUM_FILE"
```
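For context on what `checksums` stores: `cksum` prints a POSIX CRC, a byte count, and the file name for each file, and those are exactly the fields the comparison logic parses.

```shell
# cksum output format: CRC  size_in_bytes  file_name
printf 'hello\n' > demo.txt
cksum demo.txt   # prints three whitespace-separated fields
rm demo.txt
```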
About The Test Data
The script downloads 11 files, ranging in size from 103 MB to 398 MB.
The downloaded files have filetypes that include `avi`, `m4v`, `mov`, `ogg`, `pdf`, and `zip`.
These files are suitably licensed for testing purposes.
Big Buck Bunny is a set of CC BY 3.0 licensed videos that are free to share and adapt. These videos are renderings of a single original video, provided in a variety of formats and resolutions. The original video is a comedy about a fat rabbit taking revenge on three irritating rodents. The script downloads some Big Buck Bunny videos for use as large file content for evaluating Git LFS servers.

The script also downloads one large `zip` file from each of mattmahoney.net, gutenberg.org, and thinkbroadband.com.
Large PDFs are downloaded from testfile.org.
The downloaded files are placed in subdirectories called `step1/`, `step2/` and `step3/`.
Files in a later subdirectory replace previously committed files with the same names.
Doing so reveals any problems that might exist when revising a large file.
Using git_lfs_test_data
You should create an environment variable called `work`
that points to where you want Git LFS testing to be done.
Also create a subdirectory called `$work/git/`.
This is where the Git LFS testing will be performed.
It takes several minutes for the `git_lfs_test_data` script to download the test files.
They should be saved in `$work/git/git_lfs_test_data/`.
The test scripts copy these files to your Git LFS test directory as required.
Here is an example of how to use the `git_lfs_test_data` script to download the test data.
I stored the files in a new directory called `$work/git/git_lfs_test_data`
on my local server `gojira`.
The follow-on articles reference this directory, accessed via `ssh`.
Scroll to the end of the output to see the test data that we now have available to work with.
```console
mslinn@gojira ~ $ mkdir "$work/git/git_lfs_test_data"

mslinn@gojira ~ $ cd "$work/git/git_lfs_test_data"

mslinn@gojira ~ $ git_lfs_test_data
Downloading BigBuckBunny_640x360.m4v as step1/video1.m4v
##################################################### 100.0%
Downloading big_buck_bunny_480p_h264.mov as step1/video2.mov
##################################################### 100.0%
Downloading big_buck_bunny_480p_stereo.avi as step1/video3.avi
##################################################### 100.0%
Downloading big_buck_bunny_720p_stereo.ogg as step1/video4.ogg
##################################################### 100.0%
Downloading enwik9.zip as step1/zip1.zip
##################################################### 100.0%
Downloading rdf-files.tar.zip as step1/zip2.zip
##################################################### 100.0%
Downloading 100MB-TESTFILE.ORG.pdf as step1/pdf1.pdf
##################################################### 100.0%
Downloading 200MB.zip as step2/zip1.zip
##################################################### 100.0%
Downloading big_buck_bunny_720p_h264.mov as step2/video2.mov
##################################################### 100.0%
Downloading big_buck_bunny_720p_stereo.avi as step2/video3.avi
##################################################### 100.0%
Downloading 200MB-TESTFILE.ORG.pdf as step2/pdf1.pdf
##################################################### 100.0%

Download complete.
Here are lists of downloaded files and sizes for each step,
ordered by name. The last line in each listing shows the total
size of the files.

Step 1:
103M  pdf1.pdf
116M  video1.m4v
238M  video2.mov
150M  video3.avi
188M  video4.ogg
308M  zip1.zip
154M  zip2.zip
1.3G  .

Step 2:
205M  pdf1.pdf
398M  video2.mov
272M  video3.avi
200M  zip1.zip
1.1G  .
```
The filetypes of these large files are: `avi`, `m4v`, `mov`, `ogg`, `pdf`, and `zip`.
Evaluation Repository Creation Script
I wrote the following script to prepare a consistent testbed for non-null Git LFS servers. The script creates a private git repository in your user account on GitHub. The commands in the script were discussed in Git LFS Tracking, Migration and Un-Migration.
The script relies on the GitHub CLI. If you want to use this script, install the GitHub CLI first, and be sure to initialize it before running the script. GitHub CLI initialization allows you to run authenticated remote Git commands from a console.
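Initialization looks like this; the `gh auth refresh` line is only needed if you want the script's `-f` option to be able to delete an existing repository, as noted in the script's own comments:

```console
$ gh auth login
$ gh auth status
$ gh auth refresh -s delete_repo
```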
```shell
#!/bin/bash

# Run this script on the client that should hold the original copy of the
# git repository used to evaluate all Git LFS servers.
# Initializes the new repo, but does not set origin.url or lfs.url.
#
# The new git repository is placed in "$work/git/original_repo",
# and is copied to gojira:lfs_eval/original_repos/normal_repo.

function help {
  if [ "$1" ]; then printf "$1\n\n"; fi
  echo "$(basename "$0") creates a standard git repository for testing Git LFS implementations.

This script creates a new Git repository, and an empty clone of the new repository on GitHub.
The local copy is then populated with test data.

This script uses test data from $work/git/git_lfs_test_data/, which must exist.
See https://mslinn.com/git/5600-git-lfs-evaluation.html#git_lfs_test_data

Syntax: $(basename "$0") SCENARIO_NUMBER

Where:
  SCENARIO_NUMBER can range from 3 through 9, inclusive.
  Scenarios 1 and 2 exercise bare git repositories, created by newBareRepo.
"
  exit
}

# @param $1 index origin 1
function lfs_url {
  urls=(
    "/mnt/server_repos/scenario1.lfs"
    "gojira:/eval_repos/scenario2.lfs"
  )
  echo "${urls[$1]}"
}

function doit {
  # Create the directory for the new git repository
  SCENARIO="scenario$1"
  REPO_DIR="$work/git/$SCENARIO"
  echo "Creating '$REPO_DIR'"
  mkdir "$REPO_DIR"
  mkdir "$REPO_DIR.lfs"
  cd "$REPO_DIR" || exit

  initialize "$1"

  THIS_DIR="$(basename "$PWD")"
  JSON="$( gh repo list --json name )"
  EXISTS="$( jq -c --arg CWD "$THIS_DIR" '.[] | select(.name==$CWD)' <<< "$JSON" )"
  if [ "$EXISTS" ]; then
    if [ "$FORCE" ]; then
      echo "Recreating the '$THIS_DIR' repository on GitHub"
      # From https://cli.github.com/manual/gh_repo_delete:
      # Deletion requires authorization with the delete_repo scope.
      # To authorize, run gh auth refresh -s delete_repo
      gh repo delete "$THIS_DIR"
    else
      echo "Error: a repository called '$THIS_DIR' already exists in your GitHub account and the -f option was not specified."
      exit 5
    fi
  fi

  # Create a private git repository from the current directory
  gh repo create --private --source=. --remote=origin

  populate "$1"
  echo "All done."
}

function initialize {
  echo "Initializing the repository '$REPO_DIR' on this computer."
  # Make the current directory into an empty git repository
  git init
  # Ensure the git lfs extension filters are defined, and the repo's pre-commit hooks are installed.
  git lfs install
}

function populate {
  echo "Creating README.md"
  printf "# Scenario $1\nThis is a normal file.\n" > README.md
  echo "Copying $work/git/git_lfs_test_data/"
  rsync -at --progress "$work/git/git_lfs_test_data/" ./
}

unset FORCE
while getopts "fh" opt; do
  case $opt in
    f ) export FORCE=true ;;
    h ) help ;;
    [?]) help ;;
  esac
done
shift "$(($OPTIND-1))"

if [ -z "$1" ]; then help "Error: Please provide the scenario number."; fi
if (( "$1" < 3 )); then help "Error: Scenarios 1 and 2 are for bare git repositories; use newBareRepo instead ('$1' was provided)."; fi
if (( "$1" > 9 )); then help "Error: Invalid scenario number; must be less than 10 ('$1' was provided)."; fi
if [ -z "$work" ]; then help "Error: the \"\$work\" environment variable is undefined."; fi
if [ -d "$work/$1" ]; then
  echo "Error: the directory '$work/$1' already exists."
  exit 2
fi
if [ ! -d "$work/git/git_lfs_test_data/" ]; then
  help "Error: '$work/git/git_lfs_test_data/' does not exist."
fi

doit "$1"
```
Installation instructions are provided in Git LFS Scripts.
You will see this script in action when the Git LFS servers are evaluated in the follow-on articles.
Scenarios Considered
The following scenarios were considered for this evaluation.
Reviewers: I am most interested in your feedback about
the value, relevance and feasibility of the scenarios.
Please let me know of additional scenarios that might be of interest, along with suggested use cases.
No Git LFS Server
| Git Server | Scenario | Git LFS Protocol | Use Cases |
|---|---|---|---|
| None (bare repo) | 1 | local | Not functional. |
| None (bare repo) | 2 | SSH | |
| GitHub | 3 | local | |
| GitHub | 4 | SSH | |
| GitHub | 5 | `http` | |
Git LFS Test Server
This server provides its own web server, which supports `http` and `https`.
No other protocols are supported.

| Git Server | Scenario | Git LFS Protocol | Use Cases |
|---|---|---|---|
| None (bare repo) | 6 | `http` | |
| GitHub | 7 | `http` | |
Giftless
| Git Server | Scenario | Git LFS Protocol | Use Cases |
|---|---|---|---|
| None (bare repo) | 8 | local | |
| None (bare repo) | 9 | SSH | |
| GitHub | 10 | local | |
| GitHub | 11 | SSH | |
| GitHub | 12 | `http` | |
Rudolfs
| Git Server | Scenario | Git LFS Protocol | Use Cases |
|---|---|---|---|
| None (bare repo) | 13 | local | |
| None (bare repo) | 14 | SSH | |
| GitHub | 15 | local | |
| GitHub | 16 | SSH | |
| GitHub | 17 | `http` | |