Git and libgit2

Updating Trees of Git Repositories

Published 2023-03-11. Last modified 2025-09-12.
Time to read: 2 minutes.

This page is part of the git collection.

I need to keep several hundred git repositories up to date. I have a directory tree of website repos, and a directory tree of code repos. Updating these trees was tedious until I wrote the initial version of the update script back in 2008.

Environment Variables

/etc/environment is a system-wide configuration file that is read every time a user logs in (on most Linux distributions by the pam_env module rather than by a shell). It is owned by root, so you need root privileges, typically via sudo, to modify it.

The /etc/environment file in all of my systems defines two environment variables:

sites
    Points to the root of the website directory tree
work
    Points to the root of the code project tree

/etc/environment
export sites=/var/www
export work=/var/work

Now $sites and $work will be defined for all users every time they log in.
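
A quick sanity check can confirm that both variables are set and point at real directories before anything relies on them. The following is a minimal sketch, not part of the original setup; the function name check_root is hypothetical.

```shell
# Hypothetical helper (not part of the original setup).
# $1 is a variable name, $2 is its current value.
# Prints name=value and returns 0 when the value is an existing directory;
# otherwise prints a short explanation and returns 1.
check_root() {
  if [ -z "$2" ]; then
    echo "$1 is not set"
    return 1
  elif [ ! -d "$2" ]; then
    echo "$1 points to $2, which is not a directory"
    return 1
  fi
  echo "$1=$2"
}
```

Running check_root sites "$sites" and check_root work "$work" after logging in should print both assignments if the file was picked up.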

In addition, I define subordinate environment variables for each project in a file called $work/.evars:

$work/.evars
export cadenzaHome=$work/cadenzaHome
export cadenzaCode=$cadenzaHome/cadenzaCode
export cadenzaDependencies=$cadenzaCode/cadenzaDependencies
export awslib_scala=$cadenzaDependencies/awslib_scala
export shoppingcart=$cadenzaDependencies/shoppingcart
export clients=$work/clients
export django=$work/django
export msp=$sites/www.mslinn.com
... 

$work/.evars is sourced by ~/.bashrc:

~/.bashrc
source $work/.evars
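
If $work is unset or the file is missing on a particular machine, a guarded version gives one clear warning. This is a minimal sketch under that assumption; the function name load_evars is hypothetical and is not part of the original ~/.bashrc.

```shell
# Hypothetical guard for ~/.bashrc (the name load_evars is an assumption).
# Sources the given file, or $work/.evars by default, only when it is readable.
load_evars() {
  evars_file="${1:-${work:-}/.evars}"
  if [ -r "$evars_file" ]; then
    . "$evars_file"   # '.' is the portable spelling of 'source'
  else
    echo "Warning: $evars_file is not readable; project variables not set" >&2
    return 1
  fi
}
```

In ~/.bashrc, the plain source line would then be replaced by a call to load_evars with no arguments.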

Switching Directories

The above environment variables allow me to easily move to a git project directory without having to remember where it resides on the computer that I am currently using:

Shell
$ cd $clients
$ pwd
/var/work/clients

Updating Git Directory Trees

I first wrote a Bash version of a command I called update. Years later, I wrote a multithreaded Ruby version that runs orders of magnitude faster for large directory trees. I also called this version update; note that it requires a properly set up Ruby development environment.

The sites and work environment variables are used by the update scripts. Here is the Bash version:

#!/bin/bash

# Update all git directories below current directory or specified directory
# Skips directories that contain a file called .ignore
# See https://stackoverflow.com/a/61207488/553865

if [ "$( curl -sL -w "%{http_code}\n" https://www.github.com -o /dev/null )" != 200 ]; then
  echo "Cannot connect to GitHub"
  exit 2
fi

HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'

export PATH=${PATH/':./:'/:}
export PATH=${PATH/':./bin:'/:}
#echo "$PATH"

if [ -z "$1" ]; then
  ROOTS="$sites $work"
else
  ROOTS="$*"
fi

echo "Updating $ROOTS"
DIRS="$( find -L $ROOTS -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print \) )"

echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
for d in $DIRS; do
  cd "$d" > /dev/null || exit 2
  echo -e "\n${HIGHLIGHT}Updating `pwd`$NORMAL"
  git pull
  cd - > /dev/null || exit 3
done
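
The for loop above relies on word splitting, so a repository path that contains a space would be split into several words. Here is a minimal sketch of one way around that. It assumes the same selection rule as the script (prune directories that contain .ignore, print directories that contain .git) and assumes that paths do not contain newlines. The function names find_repos and pull_repos are hypothetical, -exec is used in place of -execdir, and git -C replaces the cd calls.

```shell
# List git repositories below the given roots, one per line.
# Directories containing a .ignore file are pruned, along with everything below them.
find_repos() {
  find -L "$@" -type d \
    \( -exec test -e {}/.ignore \; -prune \) -o \
    \( -exec test -d {}/.git \; -prune -print \)
}

# Pull every repository found. Reading whole lines keeps paths with spaces intact.
pull_repos() {
  find_repos "$@" | while IFS= read -r repo; do
    printf '\nUpdating %s\n' "$repo"
    # stdin is redirected so that git cannot consume the list of paths
    git -C "$repo" pull </dev/null
  done
}
```

Calling pull_repos "$sites" "$work" would then behave much like the loop above, minus the connectivity check and the colored output.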

The Ruby version of update is waaaay faster than the Bash version! 💕💕💕

#!/usr/bin/env ruby

# Multithreaded Ruby script to update all git directories below specified roots.
# Uses a maximum of 75% of available processors
# Skips directories containing a .ignore file.
# Does not prescan to build a stack; instead, scans and spawns threads on the fly.
# Uses 'git -C <dir> pull' to avoid thread-unsafe Dir.chdir.
# Ensures each repo is only updated once, even if reached by multiple paths.

require 'etc'
require 'set'

# Limit the number of concurrent git pulls
MAX_THREADS = [1, (Etc.nprocessors * 0.75).to_i].max

def help
  puts <<~HELP
    Usage: #{File.basename($PROGRAM_NAME)} [DIRECTORY ...]

    Recursively updates all git repositories under the specified DIRECTORY roots.
    If no directories are given, uses the environment variables 'sites' and 'work' as roots.
    Skips directories containing a .ignore file.
    Runs 'git pull' in each repository, using up to #{MAX_THREADS} concurrent threads.

    Options:
      -h, --help    Show this help message and exit

    Example:
      #{File.basename($PROGRAM_NAME)} $sites $work
  HELP
  exit 0
end

help if ARGV.include?('-h') || ARGV.include?('--help')

# Set color codes for highlighting output
HIGHLIGHT = "\e[01;34m".freeze
NORMAL    = "\e[00m".freeze

# Determine root directories to scan: use ENV['sites'] and ENV['work'] if no args given, else use args
roots = if ARGV.empty?
          [ENV.fetch('sites', nil), ENV.fetch('work', nil)].compact.flat_map(&:split)
        else
          ARGV
        end

puts "Updating #{roots.join(' ')}"
puts "#{HIGHLIGHT}Scanning #{Dir.pwd}#{NORMAL}"

@semaphore = Mutex.new
@active_threads = []
visited = Set.new
visited_mutex = Mutex.new

def scan_and_update(dir, visited, visited_mutex)
  return unless File.directory?(dir)
  return if File.exist?(File.join(dir, '.ignore'))

  real_dir = begin
    File.realpath(dir)
  rescue StandardError
    dir
  end

  visited_mutex.synchronize do # Visit each directory only once
    return if visited.include?(real_dir)

    visited.add(real_dir)
  end

  if File.exist?(File.join(dir, '.git'))
    wait_for_thread_and_update(dir)
    return
  end

  begin # Recurse into subdirectories
    Dir.children(dir).each do |entry|
      path = File.join(dir, entry)
      scan_and_update(path, visited, visited_mutex) if File.directory?(path)
    end
  rescue SystemCallError => e
    puts e.message
  end
end

def wait_for_thread_and_update(dir)
  loop do # Wait for a thread slot to be available
    @semaphore.synchronize do
      @active_threads.select!(&:alive?) # Clean up finished threads
      if @active_threads.count < MAX_THREADS
        thread = Thread.new do
          puts "\n#{HIGHLIGHT}Updating #{dir}#{NORMAL}"
          system('git', '-C', dir, 'pull')
        end
        @active_threads << thread
        return
      end
    end
    sleep 0.1
  end
end

roots.each do |root|
  scan_and_update(File.expand_path(root), visited, visited_mutex)
end

@active_threads.each(&:join) # Wait for all threads to finish
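
The idea of a bounded number of concurrent pulls can also be approximated in shell. The following is a rough sketch, not the method used by either script above. It assumes that find and xargs support -print0, -0, and -P (true of the GNU and BSD versions, but not required by POSIX) and that getconf reports _NPROCESSORS_ONLN. The function names max_jobs and parallel_pull are hypothetical.

```shell
# Print the job limit: 75% of the online processors, but never less than 1.
max_jobs() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  limit=$(( n * 3 / 4 ))
  [ "$limit" -ge 1 ] || limit=1
  echo "$limit"
}

# Find repositories below the given roots and run 'git pull'
# in up to max_jobs of them at the same time.
parallel_pull() {
  find -L "$@" -type d \
    \( -exec test -e {}/.ignore \; -prune \) -o \
    \( -exec test -d {}/.git \; -prune -print0 \) |
    xargs -0 -P "$(max_jobs)" -I{} git -C {} pull
}
```

As with the Ruby version, output from concurrent pulls may interleave.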

Here is the help message for the Ruby version:

Shell
$ update -h
Usage: update [DIRECTORY ...]

Recursively updates all git repositories under the specified DIRECTORY roots.
If no directories are given, uses the environment variables 'sites' and 'work' as roots.
Skips directories containing a .ignore file.
Runs 'git pull' in each repository, using up to 18 concurrent threads.

Options:
  -h, --help    Show this help message and exit

Example:
  $ update $sites $work

Most of the time I want to update everything in both directory trees, so for that no arguments are required:

Shell
$ update
Updating /var/www /var/work
Updating /var/work/cadenzaHome/cadenzaCode/cadenzaDependencies/awslib_scala
Already up to date.
Updating /var/work/cadenzaHome/cadenzaCode/cadenzaDependencies/shoppingcart
Already up to date.
...

It is also possible to specify the roots of one or more directory trees of git repositories:

Shell
$ update /path/to/another/tree $my_gems $my_plugins
Updating /path/to/another/tree /mnt/f/work/ruby/my_gems /mnt/f/work/jekyll/my_plugins 
😁

I hope you find the update scripts to be as useful as I have!
