Published 2023-03-11.
Last modified 2025-09-12.
I need to keep several hundred git repositories up-to-date.
I have a directory tree of website repos, and a directory tree of code repos.
Updating these trees was tedious until I wrote the initial version of the update script back in 2008.
Environment Variables
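/etc/environment is a system-wide configuration file that is read every time a user logs in.
On most Linux distributions it is parsed by the pam_env PAM module rather than sourced by a shell.
It is owned by root, so you must use sudo to modify it, which requires that your account be a member of an administrative group such as admin or sudo.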
The /etc/environment file on all of my systems defines two environment variables:
sites - Points to the root of the website directory tree
work - Points to the root of the code project tree
export sites=/var/www
export work=/var/work
Now $sites and $work will be defined for all users every time they log in.
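Because the update scripts fall back to these two variables when no arguments are given, it can be worth confirming after login that they are set and point at real directories. The following is a minimal sketch, not part of the original scripts; the function name check_roots is hypothetical, and it assumes the printenv command is available.

```shell
# check_roots NAME...: report whether each named environment variable is
# exported and points to an existing directory.
# Hypothetical helper; the body runs in a subshell (note the parentheses),
# so its working variables do not leak into the calling shell.
check_roots() (
  rc=0
  for name in "$@"; do
    # printenv exits non-zero when the variable is not in the environment
    if value=$(printenv "$name") && [ -d "$value" ]; then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is unset or is not a directory\n' "$name" >&2
      rc=1
    fi
  done
  exit "$rc"
)
```

For example, check_roots sites work prints one NAME=value line per valid variable and returns a non-zero status if either one is missing.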
In addition, I define subordinate environment variables for each project in a file called $work/.evars:
export cadenzaHome=$work/cadenzaHome
export cadenzaCode=$cadenzaHome/cadenzaCode
export cadenzaDependencies=$cadenzaCode/cadenzaDependencies
export awslib_scala=$cadenzaDependencies/awslib_scala
export shoppingcart=$cadenzaDependencies/shoppingcart
export clients=$work/clients
export django=$work/django
export msp=$sites/www.mslinn.com
...
$work/.evars is included by ~/.bashrc:
source $work/.evars
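On a machine where $work is not yet defined, or where the .evars file has not been created, a plain source line prints an error at every login. A slightly more defensive variant might look like the sketch below; the helper name source_if_readable is hypothetical and not something the original setup uses.

```shell
# Hypothetical helper for ~/.bashrc: source a file only when it is readable,
# so a missing $work/.evars does not produce an error at every login.
source_if_readable() {
  if [ -r "$1" ]; then
    # shellcheck disable=SC1090
    . "$1"
  fi
}

# ${work:-} expands to an empty string if $work was not set at login,
# in which case the path is not readable and nothing is sourced.
source_if_readable "${work:-}/.evars"
```

The file is sourced in the current shell, not a subshell, so the exported project variables remain available afterwards.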
Switching Directories
The above environment variables allow me to easily move to a git project directory without having to remember where it resides on the computer that I am currently using:
$ cd $clients
$ pwd
/var/work/clients
Updating Git Directory Trees
I first wrote a Bash version of a command that I called update.
Years later, I wrote a multithreaded Ruby version that runs orders of magnitude faster for large directory trees.
I also called this version update; note that it requires a properly set up Ruby development environment.
The sites and work environment variables are used by the update scripts.
#!/bin/bash

# Update all git directories below current directory or specified directory
# Skips directories that contain a file called .ignore
# See https://stackoverflow.com/a/61207488/553865

if [ "$( curl -sL -w "%{http_code}\n" https://www.github.com -o /dev/null )" != 200 ]; then
  echo "Cannot connect to GitHub"
  exit 2
fi

HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'

export PATH=${PATH/':./:'/:}
export PATH=${PATH/':./bin:'/:}
#echo "$PATH"

if [ -z "$1" ]; then
  ROOTS="$sites $work"
else
  ROOTS="$@"
fi

echo "Updating $ROOTS"
DIRS="$( find -L $ROOTS -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print \) )"

echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
for d in $DIRS; do
  cd "$d" > /dev/null || exit 2
  echo -e "\n${HIGHLIGHT}Updating `pwd`$NORMAL"
  git pull
  cd - > /dev/null || exit 3
done
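The Bash script collects the find results in a string and splits it on whitespace, so a repository whose path contains a space would be split into several words. The sketch below shows one possible way to walk the same trees line by line instead. The function names list_repos and update_repos are hypothetical, and the sketch uses -exec in place of -execdir, which is a swapped-in choice and not the original approach. Paths containing newlines are still not handled.

```shell
# list_repos ROOT...: print each git working tree found below the given
# roots, one per line. Directories containing a .ignore file are pruned,
# and find does not descend into a repository once it has been found.
list_repos() {
  find -L "$@" -type d \
    \( -exec test -e {}/.ignore \; -prune \) -o \
    \( -exec test -d {}/.git \; -prune -print \)
}

# update_repos ROOT...: run 'git pull' in each repository, reading one
# path per line so that spaces in directory names survive intact.
update_repos() {
  list_repos "$@" | while IFS= read -r dir; do
    printf '\nUpdating %s\n' "$dir"
    git -C "$dir" pull
  done
}
```

Using git -C also removes the need to cd into each repository and back out again.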
The Ruby version of update is waaaay faster than the Bash version! 💕💕💕
#!/usr/bin/env ruby

# Multithreaded Ruby script to update all git directories below specified roots.
# Uses a maximum of 75% of available processors
# Skips directories containing a .ignore file.
# Does not prescan to build a stack; instead, scans and spawns threads on the fly.
# Uses 'git -C <dir> pull' to avoid thread-unsafe Dir.chdir.
# Ensures each repo is only updated once, even if reached by multiple paths.

require 'net/http'
require 'etc'
require 'set'

# Limit the number of concurrent git pulls
MAX_THREADS = [1, (Etc.nprocessors * 0.75).to_i].max

def help
  puts <<~HELP
    Usage: #{File.basename($PROGRAM_NAME)} [DIRECTORY ...]

    Recursively updates all git repositories under the specified DIRECTORY roots.
    If no directories are given, uses the environment variables 'sites' and 'work' as roots.
    Skips directories containing a .ignore file.
    Runs 'git pull' in each repository, using up to #{MAX_THREADS} concurrent threads.

    Options:
      -h, --help    Show this help message and exit

    Example:
      #{File.basename($PROGRAM_NAME)} $sites $work
  HELP
  exit 0
end

help if ARGV.include?('-h') || ARGV.include?('--help')

# Set color codes for highlighting output
HIGHLIGHT = "\e[01;34m".freeze
NORMAL = "\e[00m".freeze

# Determine root directories to scan: use ENV['sites'] and ENV['work'] if no args given, else use args
roots = if ARGV.empty?
          [ENV.fetch('sites', nil), ENV.fetch('work', nil)].compact.flat_map(&:split)
        else
          ARGV
        end

puts "Updating #{roots.join(' ')}"
puts "#{HIGHLIGHT}Scanning #{Dir.pwd}#{NORMAL}"

@semaphore = Mutex.new
@active_threads = []
visited = Set.new
visited_mutex = Mutex.new

def scan_and_update(dir, visited, visited_mutex)
  return unless File.directory?(dir)
  return if File.exist?(File.join(dir, '.ignore'))

  real_dir = begin
    File.realpath(dir)
  rescue StandardError
    dir
  end

  visited_mutex.synchronize do
    # Visit each directory only once
    return if visited.include?(real_dir)

    visited.add(real_dir)
  end

  if File.exist?(File.join(dir, '.git'))
    wait_for_thread_and_update(dir)
    return
  end

  begin
    # Recurse into subdirectories
    Dir.children(dir).each do |entry|
      path = File.join(dir, entry)
      scan_and_update(path, visited, visited_mutex) if File.directory?(path)
    end
  rescue SystemCallError => e
    puts e.message
  end
end

def wait_for_thread_and_update(dir)
  loop do
    # Wait for a thread slot to be available
    @semaphore.synchronize do
      @active_threads.select!(&:alive?) # Clean up finished threads
      if @active_threads.count < MAX_THREADS
        thread = Thread.new do
          puts "\n#{HIGHLIGHT}Updating #{dir}#{NORMAL}"
          system('git', '-C', dir, 'pull')
        end
        @active_threads << thread
        return
      end
    end
    sleep 0.1
  end
end

roots.each do |root|
  scan_and_update(File.expand_path(root), visited, visited_mutex)
end

@active_threads.each(&:join) # Wait for all threads to finish
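For readers without a Ruby environment, the idea of capping the number of simultaneous pulls can be approximated in shell. The sketch below is an illustration under stated assumptions and not a port of the Ruby script: the function name parallel_pull is hypothetical, it relies on the -print0 option of find and the -0 and -P options of xargs (available in the GNU and BSD versions, but not required by POSIX), and it does not deduplicate repositories reached through more than one path.

```shell
# parallel_pull MAX_JOBS ROOT...: run 'git pull' in every repository below
# the given roots, with at most MAX_JOBS pulls running at the same time.
# Hypothetical helper; the body runs in a subshell (note the parentheses),
# so max_jobs does not leak into the calling shell.
parallel_pull() (
  max_jobs=$1
  shift
  # Prune directories containing .ignore; emit each repository root
  # NUL-terminated so that any character in a path is preserved.
  find -L "$@" -type d \
    \( -exec test -e {}/.ignore \; -prune \) -o \
    \( -exec test -d {}/.git \; -prune -print0 \) |
    xargs -0 -P "$max_jobs" -I{} git -C {} pull
)
```

A call such as parallel_pull 4 "$sites" "$work" would keep up to four pulls in flight. Output from concurrent pulls is interleaved, and xargs returns a non-zero status if any pull fails.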
Here is the help message for the Ruby version:
$ update -h
Usage: update [DIRECTORY ...]

Recursively updates all git repositories under the specified DIRECTORY roots.
If no directories are given, uses the environment variables 'sites' and 'work' as roots.
Skips directories containing a .ignore file.
Runs 'git pull' in each repository, using up to 18 concurrent threads.

Options:
  -h, --help    Show this help message and exit

Example:
  $ update $sites $work
Most of the time I want to update everything in both directory trees, so for that no arguments are required:
$ update
Updating /var/www /var/work

Updating /var/work/cadenzaHome/cadenzaCode/cadenzaDependencies/awslib_scala
Already up to date.

Updating /var/work/cadenzaHome/cadenzaCode/cadenzaDependencies/shoppingcart
Already up to date.
...
It is also possible to specify the roots of one or more directory trees of git repositories:
$ update /path/to/another/tree $my_gems $my_plugins
Updating /path/to/another/tree /mnt/f/work/ruby/my_gems /mnt/f/work/jekyll/my_plugins
I hope you find the update scripts to be as useful as I have!