Git and libgit2

Git Directory Tree Operations

Published 2021-04-10. Last modified 2023-06-01.
Time to read: 5 minutes.

This page is part of the git collection.

I have several trees of git repositories, grouped into subdirectories. The total number of repositories is in the hundreds. Here is a sanitized depiction of one of my git directory trees:

Directory tree
├── cadenzaHome
│   ├── cadenzaAssets
│   ├── cadenzaCode
│   │   ├── cadenzaClient
│   │   ├── cadenzaCourseCode
│   │   ├── cadenzaDependencies
│   │   ├── cadenzaLibs
│   │   ├── cadenzaServer
│   │   ├── cadenzaServerNext
│   │   └── cadenzaSupport
│   ├── cadenzaCreative
│   │   └── cadenzaCreativeTemplates
│   ├── cadenzaCreativeBackup
│   └── cadenzaCurriculum
├── django
│   ├── django
│   ├── django-oscar
│   ├── frobshop
│   ├── main
│   └── oscar
├── jekyll
│   ├── jekyllTemplate
│   └── jekyll-flexible-include-plugin

Some git repos are forks, and I defined upstream git remotes for them, in addition to the usual origin remote.

Two Use Cases

This article discusses the two use cases for git_tree, a Ruby gem I wrote to help me work efficiently with all those projects.

Directories containing a file called .ignore are ignored.
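A minimal sketch of how such a scan could work, assuming that a repository is any directory containing a .git directory and that repositories are not nested inside one another. The helper name find_git_repos is hypothetical and is not taken from the git_tree source:

```ruby
require 'find'

# Hypothetical helper, not taken from the git_tree source.
# Returns the directories under root that contain a .git directory,
# skipping any directory (and everything below it) that holds a file called .ignore.
def find_git_repos(root)
  repos = []
  Find.find(root) do |path|
    next unless File.directory?(path)

    if File.exist?(File.join(path, '.ignore'))
      Find.prune # do not descend into an ignored directory
    elsif File.directory?(File.join(path, '.git'))
      repos << path
      Find.prune # assumption: repositories are not nested
    end
  end
  repos.sort
end
```

Find.prune stops the traversal from descending any further, so an ignored directory costs almost nothing to skip, however many repositories it contains.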

The source code for git_tree is in this GitHub repo. The gem is published on RubyGems.org.

Use Case: Dependent Gem Maintenance

One of my directory trees holds Jekyll plugins, packaged as 25 gems. They depend on one another, and must be built in a particular order. Sometimes an operation must be performed on all of the plugins, after which they must all be rebuilt.

Most operations do not require that the projects be processed in any particular order; however, the build process must be invoked on the dependencies first. It is quite tedious to do this 25 times, over and over.
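One way to derive a valid build order from a table of dependencies is a topological sort. The sketch below uses Ruby's standard TSort module. The project names and their dependencies are invented for illustration, and nothing here is taken from git_tree, which processes directories in the order you supply:

```ruby
require 'tsort'

# Hypothetical dependency table: each key is a project, each value lists
# the projects that must be built before it.
class BuildOrder
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort calls this to enumerate every project.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # TSort calls this to enumerate the dependencies of one project.
  def tsort_each_child(project, &block)
    @dependencies.fetch(project, []).each(&block)
  end
end

dependencies = {
  'plugin_b' => %w[support],
  'plugin_a' => %w[support logger],
  'support'  => %w[logger],
  'logger'   => []
}

# Dependencies appear before the projects that need them.
# A circular dependency raises TSort::Cyclic.
build_order = BuildOrder.new(dependencies).tsort
puts build_order
```

The resulting list could be passed to a tool that builds the projects one after another in that order.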

Several years ago I wrote a bash script to perform this task, but as its requirements became more complex, the bash script proved difficult to maintain. This use case is now fulfilled by the git-tree-exec command provided by the git_tree gem.

Use Case: Replicating Trees of Git Repositories

Whenever I set up an operating system for a new development computer, one of the tedious tasks that must be performed is to replicate the directory trees of git repositories.

It is a bad idea to attempt to copy an entire git repository between computers, because the .git directories within them can be quite large. So large, in fact, that copying might take much more time than re-cloning.

The reason is that copying the entire git repo actually means copying the same information twice: first the .git hidden directory, complete with all the history for the project, and then again for the files in the currently checked out branch. Git repos store the entire development history of the project in their .git directories, so as they accumulate history they eventually become much larger than the code that is checked out at any given time.
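To see how the two parts compare for a given repository, you could measure them. The following is a rough sketch: repo_sizes is a hypothetical helper that simply sums file sizes, so it ignores filesystem block overhead:

```ruby
require 'find'

# Hypothetical helper for illustration.
# Returns [bytes stored under .git, bytes in the rest of the working tree].
def repo_sizes(repo)
  git_prefix = File.join(repo, '.git') + File::SEPARATOR
  git_bytes = 0
  work_bytes = 0
  Find.find(repo) do |path|
    next unless File.file?(path)

    if path.start_with?(git_prefix)
      git_bytes += File.size(path)
    else
      work_bytes += File.size(path)
    end
  end
  [git_bytes, work_bytes]
end
```

For a long-lived project, the first number would typically dominate, which matches the observation above.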

One morning I found myself facing the boring task of doing this manually once again. Instead, I wrote a bash script that scanned a git directory tree and wrote out another bash script that clones the repos in the tree. Any additional remote references are replicated.

Two years later, I decided to add new features to the script. Bash is great for short scripts, but it is not conducive to debugging or structured programming. I rewrote the bash script in Ruby, using the rugged gem. Much better!

This use case is fulfilled by the git-tree-replicate and git-tree-evars commands provided by the git_tree gem.

Installing Git_tree

To install git_tree:

  1. Set up Ruby.
  2. Working With Git Repos Using Ruby’s Rugged Gem explains that rugged needs to be built from source so the ssh library is included. Your system will need to have cmake and associated fiddly bits installed in order for the build to succeed. On Ubuntu, type:
    Shell
    $ yes | sudo apt install cmake libgit2-dev libssh2-1-dev pkg-config
  3. Now you can install git_tree, which installs rugged with ssh support:
    Shell
    $ gem install git_tree
    Thanks for installing git_tree!
    Successfully installed git_tree-0.2.1
    Parsing documentation for git_tree-0.2.1
    Done installing documentation for git_tree after 0 seconds
    1 gem installed

To register the new commands, either log out and log back in, or open a new console.

You should now have three new shell commands: git-tree-exec, git-tree-replicate and git-tree-evars.

The git-tree-replicate and git-tree-evars commands each require only one parameter: an environment variable reference that points to the top-level directory to examine. The environment variable reference must be enclosed in single quotes to prevent expansion by the shell.
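Without the single quotes, the shell would replace $work with its value before the command starts, so the command would never see the variable name. A minimal sketch of how a program might resolve such a reference itself; expand_env_reference is a hypothetical helper, not part of git_tree:

```ruby
# Hypothetical helper for illustration.
# Replaces each $name in arg with the value of that environment variable.
# Raises ArgumentError if a referenced variable is not defined.
def expand_env_reference(arg, env = ENV)
  arg.gsub(/\$(\w+)/) do
    name = Regexp.last_match(1)
    env.fetch(name) { raise ArgumentError, "Environment variable #{name} is not defined" }
  end
end

# The second argument stands in for the real environment in this example.
puts expand_env_reference('$work/django', { 'work' => '/mnt/c/work' })
# prints /mnt/c/work/django
```

Because the program receives the literal text $work, it can use the variable name as well as the value, for example when writing paths relative to $work.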

Using git-tree-exec

The command requires two parameters. The first parameter indicates the directory or directories to process; the second is the command to run in each of them. Three forms are accepted for the first parameter:

  1. A directory name, which may be relative or absolute.
  2. An environment variable reference.
  3. A list of directory names, which may be relative or absolute, and may contain environment variables.

Example 1

For all projects listed, update Gemfile.lock and install a local copy of the gem.

Use this format when the order in which projects are processed matters.

No output is displayed for the 3 commands chained together with && until they have all completed.

Shell
$ git-tree-exec '
  $jekyll_plugin_logger
  $jekyll_draft
  $jekyll_plugin_support
  $jekyll_all_collections
  $jekyll_plugin_template
  $jekyll_flexible_include_plugin
  $jekyll_href
  $jekyll_img
  $jekyll_outline
  $jekyll_plugin_template
  $jekyll_pre
  $jekyll_quote
  ' 'bundle && bundle update && bundle exec rake install'

Similarly, to release a new set of related plugins, I provide the same command as above, but for the program to execute I pass rake release.

Shell
$ git-tree-exec '
  $jekyll_plugin_logger
  $jekyll_draft
  $jekyll_plugin_support
  $jekyll_all_collections
  $jekyll_plugin_template
  $jekyll_flexible_include_plugin
  $jekyll_href
  $jekyll_img
  $jekyll_outline
  $jekyll_plugin_template
  $jekyll_pre
  $jekyll_quote
  ' 'bundle exec rake release'
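The behavior described in Example 1, running one command line in each directory in the order given and showing nothing until it completes, can be approximated as follows. This is a sketch under stated assumptions: run_in_each is a hypothetical helper and does not reflect how git-tree-exec is actually implemented:

```ruby
require 'open3'

# Hypothetical helper, not taken from the git_tree source.
# Runs command (a shell command line) in each directory, in the order given.
# stdout and stderr are captured together, so nothing is shown until
# the command has finished in that directory.
# Returns an array of [directory, succeeded, output] triples.
def run_in_each(directories, command)
  directories.map do |dir|
    output, status = Open3.capture2e(command, chdir: dir)
    [dir, status.success?, output]
  end
end
```

A caller could print each captured output in turn and stop at the first failure, which matters when later projects depend on earlier ones.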

Example 2

This example shows how to display the version of projects that create gems under the directory pointed to by $my_plugins.

An executable script is required on the PATH, so git-tree-exec can invoke it as it loops through the subdirectories. I call this script version, and it is written in bash, although the language used is not significant:

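#!/bin/bash

# Print "<project> v<version>" if the current directory is a gem project.
# Without globstar enabled, ** behaves like *, so this finds lib/<gem>/version.rb
x="$( ls lib/**/version.rb 2> /dev/null )"
if [ -f "$x" ]; then
  # Reduce the line containing VERSION = '...' to just the version number
  v="$(
    grep '=' "$x" | \
    sed -e s/.freeze// | \
    tr -d 'VERSION =\"' | \
    tr -d \'
  )"
  echo "$(basename "$PWD") v$v"
fi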

Invoke the version script for each project under $my_plugins/ as shown below. In general, it is a good idea to enclose variable name references within double quotes, unless you are sure that there will never be a space in a file path.

Shell
$ git-tree-exec "$my_plugins" version
jekyll_all_collections v0.3.3
jekyll_archive_create v1.0.2
jekyll_archive_display v1.0.1
jekyll_auto_redirect v0.1.0
jekyll_basename_dirname v1.0.3
jekyll_begin_end v1.0.1
jekyll_bootstrap5_tabs v1.1.2
jekyll_context_inspector v1.0.1
jekyll_download_link v1.0.1
jekyll_draft v1.1.2
jekyll_flexible_include_plugin v2.0.20
jekyll_from_to_until v1.0.3
jekyll_href v1.2.5
jekyll_img v0.1.5
jekyll_nth v1.1.0
jekyll_outline v1.2.0
jekyll_pdf v0.1.0
jekyll_plugin_logger v2.1.1
jekyll_plugin_support v0.7.0
jekyll_plugin_template v0.3.0
jekyll_pre v1.4.1
jekyll_quote v0.4.0
jekyll_random_hex v1.0.0
jekyll_reading_time v1.0.0
jekyll_revision v0.1.0
jekyll_run v1.0.1
jekyll_site_inspector v1.0.0
jekyll_sort_natural v1.0.0
jekyll_time_since v0.1.3 
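Since the language of the script is not significant, here is a rough Ruby counterpart. It is a sketch that assumes the version is declared as VERSION = '...' in a version.rb file somewhere under lib/. The helper names gem_version and version_label are hypothetical:

```ruby
# Hypothetical Ruby counterpart to the bash script; not part of git_tree.
# Returns the version string declared in lib/**/version.rb, or nil if none is found.
def gem_version(project_dir)
  version_file = Dir.glob(File.join(project_dir, 'lib', '**', 'version.rb')).min
  return nil unless version_file

  match = File.read(version_file).match(/VERSION\s*=\s*['"]([^'"]+)['"]/)
  match && match[1]
end

# Formats the same "name vX.Y.Z" line that the bash script prints.
def version_label(project_dir)
  version = gem_version(project_dir)
  version && "#{File.basename(File.expand_path(project_dir))} v#{version}"
end

# When run inside a gem project, print its name and version.
label = version_label(Dir.pwd)
puts label if label
```

Matching the quoted value with a regular expression avoids deleting individual characters, so a version containing letters, such as a prerelease suffix, would survive intact.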

Example 3

List the projects under the directory pointed to by $my_plugins that have a demo/ subdirectory.

Shell
$ git-tree-exec "$my_plugins" \
  'if [ -d demo ]; then realpath demo; fi'
/mnt/c/work/jekyll/my_plugins/jekyll-hello/demo
/mnt/c/work/jekyll/my_plugins/jekyll_all_collections/demo
/mnt/c/work/jekyll/my_plugins/jekyll_archive_create/demo
/mnt/c/work/jekyll/my_plugins/jekyll_download_link/demo
/mnt/c/work/jekyll/my_plugins/jekyll_draft/demo
/mnt/c/work/jekyll/my_plugins/jekyll_flexible_include_plugin/demo
/mnt/c/work/jekyll/my_plugins/jekyll_from_to_until/demo
/mnt/c/work/jekyll/my_plugins/jekyll_href/demo
/mnt/c/work/jekyll/my_plugins/jekyll_img/demo
/mnt/c/work/jekyll/my_plugins/jekyll_outline/demo
/mnt/c/work/jekyll/my_plugins/jekyll_pdf/demo
/mnt/c/work/jekyll/my_plugins/jekyll_plugin_support/demo
/mnt/c/work/jekyll/my_plugins/jekyll_plugin_template/demo
/mnt/c/work/jekyll/my_plugins/jekyll_pre/demo
/mnt/c/work/jekyll/my_plugins/jekyll_quote/demo
/mnt/c/work/jekyll/my_plugins/jekyll_revision/demo
/mnt/c/work/jekyll/my_plugins/jekyll_time_since/demo 

Using git-tree-replicate

The following creates a script in the current directory called work.sh and makes it executable. The script replicates the desired portions of the directory tree of git repos under $work:

Shell
$ git-tree-replicate '$work' > work.sh

$ chmod a+x work.sh

When git-tree-replicate completes, copy the generated script to the target machine and run it from the new top-level directory. The following example copies the script to your home directory on machine2 while preserving the execute bit, then runs it:

Shell
$ scp -p work.sh machine2:
work.sh                          100%   12KB   1.3MB/s   00:00 

$ ssh machine2
Welcome to Ubuntu 23.04 (GNU/Linux 6.2.0-20-generic x86_64)

* Documentation:  https://help.ubuntu.com
* Management:     https://landscape.canonical.com
* Support:        https://ubuntu.com/advantage

0 updates can be applied immediately.

Last login: Tue May 23 10:11:14 2023 from 192.168.1.102 

$ # Install cmake and rugged on this machine if you have not already done so:
# yes | sudo apt install cmake libgit2-dev libssh2-1-dev pkg-config
# gem install git_tree

$ cd $work

$ ~/work.sh

Here is the output generated for the directory tree shown at the top of this article:

Shell
$ git-tree-replicate '$work'
if [ ! -d "cadenzaHome/cadenzaCreative/cadenzaCreativeTemplates/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCreative'
  pushd 'cadenzaHome/cadenzaCreative' > /dev/null
  git clone 'git@github.com:mslinn/cadenzaCreativeTemplates.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaAssets/.git" ]; then
  mkdir -p 'cadenzaHome'
  pushd 'cadenzaHome' > /dev/null
  git clone 'git@github.com:mslinn/cadenzaAssets.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaSupport/dottyTemplate/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode/cadenzaSupport'
  pushd 'cadenzaHome/cadenzaCode/cadenzaSupport' > /dev/null
  git clone 'git@github.com:mslinn/dottyTemplate.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaLibs/scalacourses-play-utils/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode/cadenzaLibs'
  pushd 'cadenzaHome/cadenzaCode/cadenzaLibs' > /dev/null
  git clone 'git@github.com:mslinn/scalacourses-play-utils.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaLibs/scalacourses-utils/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode/cadenzaLibs'
  pushd 'cadenzaHome/cadenzaCode/cadenzaLibs' > /dev/null
  git clone 'git@github.com:mslinn/scalacourses-utils.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaserver/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode'
  pushd 'cadenzaHome/cadenzaCode' > /dev/null
  git clone 'git@bitbucket.org:mslinn/cadenzaserver.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaCourseCode/ScalaCourses.com/group_scalaCore/course_scala_intro_code/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode/cadenzaCourseCode/ScalaCourses.com/group_scalaCore'
  pushd 'cadenzaHome/cadenzaCode/cadenzaCourseCode/ScalaCourses.com/group_scalaCore' > /dev/null
  git clone 'ssh://git@bitbucket.org/mslinn/course_scala_intro_code.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaCourseCode/ScalaCourses.com/group_scalaCore/course_scala_intermediate_code/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode/cadenzaCourseCode/ScalaCourses.com/group_scalaCore'
  pushd 'cadenzaHome/cadenzaCode/cadenzaCourseCode/ScalaCourses.com/group_scalaCore' > /dev/null
  git clone 'git@bitbucket.org:mslinn/course_scala_intermediate_code.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCode/cadenzaClient/.git" ]; then
  mkdir -p 'cadenzaHome/cadenzaCode'
  pushd 'cadenzaHome/cadenzaCode' > /dev/null
  git clone 'git@github.com:mslinn/cadenzaClient.git'
  popd > /dev/null
fi

if [ ! -d "cadenzaHome/cadenzaCurriculum/.git" ]; then
  mkdir -p 'cadenzaHome'
  pushd 'cadenzaHome' > /dev/null
  git clone 'git@github.com:mslinn/cadenzaCurriculum.git'
  popd > /dev/null
fi

if [ ! -d "jekyll/jekyllTemplate/.git" ]; then
  mkdir -p 'jekyll'
  pushd 'jekyll' > /dev/null
  git clone 'git@github.com:mslinn/jekyllTemplate.git'
  popd > /dev/null
fi

if [ ! -d "django/django-oscar/.git" ]; then
  mkdir -p 'django'
  pushd 'django' > /dev/null
  git clone 'git@github.com:mslinn/django-oscar.git'
  cd "django-oscar"
  git remote add upstream 'git@github.com:django-oscar/django-oscar.git'
  popd > /dev/null
fi

if [ ! -d "django/frobshop/.git" ]; then
  mkdir -p 'django'
  pushd 'django' > /dev/null
  git clone 'git@github.com:mslinn/frobshop.git'
  popd > /dev/null
fi

if [ ! -d "django/django/.git" ]; then
  mkdir -p 'django'
  pushd 'django' > /dev/null
  git clone 'git@github.com:mslinn/django.git'
  cd "django"
  git remote add upstream 'git@github.com:django/django.git'
  popd > /dev/null
fi

if [ ! -d "jekyll/jekyll-flexible-include-plugin/.git" ]; then
  mkdir -p 'jekyll'
  pushd 'jekyll' > /dev/null
  git clone 'git@github.com:mslinn/jekyll-flexible-include-plugin.git'
  cd "jekyll-flexible-include-plugin"
  git remote add upstream 'https://idiomdrottning.org/jekyll-include-absolute-plugin'
  popd > /dev/null
fi

if [ ! -d "jekyll/jekyllTemplate/.git" ]; then
  mkdir -p 'jekyll'
  pushd 'jekyll' > /dev/null
  git clone 'git@github.com:mslinn/jekyllTemplate.git'
  popd > /dev/null
fi 

As you can see, the generated script checks to see if a git repo has already been cloned, and does not attempt to clone it again if so.
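The shape of each stanza is regular enough to generate from three pieces of information: the repository's path relative to the top-level directory, the URL of origin, and any additional remotes. The sketch below reproduces that shape. clone_stanza is a hypothetical helper rather than the code git-tree-replicate uses, and it assumes the cloned directory has the same name as the last path component:

```ruby
# Hypothetical helper, not taken from the git_tree source.
# Builds a guarded clone stanza for one repository.
#   repo_path     - path of the repo relative to the top-level directory
#   origin_url    - URL to clone from
#   extra_remotes - hash of remote name => URL to add after cloning
def clone_stanza(repo_path, origin_url, extra_remotes = {})
  parent = File.dirname(repo_path)
  name = File.basename(repo_path)
  lines = [
    %(if [ ! -d "#{repo_path}/.git" ]; then),
    "  mkdir -p '#{parent}'",
    "  pushd '#{parent}' > /dev/null",
    "  git clone '#{origin_url}'"
  ]
  unless extra_remotes.empty?
    lines << %(  cd "#{name}")
    extra_remotes.each do |remote, url|
      lines << "  git remote add #{remote} '#{url}'"
    end
  end
  lines << '  popd > /dev/null' << 'fi'
  lines.join("\n")
end

puts clone_stanza(
  'django/django-oscar',
  'git@github.com:mslinn/django-oscar.git',
  { 'upstream' => 'git@github.com:django-oscar/django-oscar.git' }
)
```

The guard on the first line is what makes the generated script safe to run more than once.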

Using git-tree-evars

The git-tree-evars command should be run on the target computer. The command requires only one parameter: an environment variable reference, pointing to the top-level directory to replicate. The environment variable reference must be contained within single quotes to prevent expansion by the shell.

The following appends the generated script to the file called .evars in the $work directory. The script defines environment variables that point to each git repo under $work:

Shell
$ git-tree-evars '$work' >> $work/.evars

Generated Script from git-tree-evars

Following is a sample of environment variable definitions. You are expected to edit it to suit.

Generated script
export work=/mnt/c/work
export ancientWarmth=$work/ancientWarmth/ancientWarmth
export ancientWarmthBackend=$work/ancientWarmth/ancientWarmthBackend
export braintreeTutorial=$work/ancientWarmth/braintreeTutorial
export survey_analytics=$work/ancientWarmth/survey-analytics
export survey_creator=$work/ancientWarmth/survey-creator
export django=$work/django/django
export frobshop=$work/django/frobshop
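In the sample above, each variable is named after the last component of the repository path, with characters that are not valid in a shell variable name, such as the hyphen in survey-analytics, replaced by underscores. A minimal sketch of that mapping; evar_definition is a hypothetical helper, and the rule is inferred from the sample rather than taken from the git_tree source:

```ruby
# Hypothetical helper; the naming rule is inferred from the sample output.
#   root_var      - name of the top-level environment variable, e.g. "work"
#   relative_path - path of the repo relative to that directory
def evar_definition(root_var, relative_path)
  name = File.basename(relative_path).gsub(/\W/, '_')
  "export #{name}=$#{root_var}/#{relative_path}"
end

puts evar_definition('work', 'ancientWarmth/survey-analytics')
# prints export survey_analytics=$work/ancientWarmth/survey-analytics
```

A directory name that starts with a digit would still yield an invalid variable name under this rule, which is one reason to review the generated definitions before using them.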

The environment variable definitions are meant to be saved into a file that is sourced when you log in. While you could place them in a file like ~/.bashrc, my preference is to instead place them in $work/.evars, and add the following to ~/.bashrc:

~/.bashrc snippet
source $work/.evars

Thus, each time you log in, the environment variable definitions will have been re-established. You can therefore change directory to any of the cloned projects, like this:

Shell
$ cd $git_root
$ cd $my_project

Massaging the Generated Script

Let's create a file to work with before saving it:

Shell
$ git-tree-evars '$work' > evars.sh

Remove all paths with the string cadenza from evars.sh, and save as evars_no_cadenza.sh:

Shell
$ cat evars.sh | grep -v cadenza > evars_no_cadenza.sh

Remove all paths with the string cadenza, and the definition for work. Save as evars_no_cadenza_work.sh:

Shell
$ cat evars.sh | \
  grep -v 'cadenza' | \
  grep -v 'work=' | \
  sort > \
  evars_no_cadenza_work.sh

Append the contents of evars_no_cadenza_work.sh to $work/.evars:

Shell
$ cat evars_no_cadenza_work.sh >> $work/.evars