Mike Slinn

Working With Git Repos Using Ruby's Rugged Gem

Published 2023-03-11. Last modified 2023-03-23.
Time to read: 6 minutes.

This page is part of the git collection, categorized under Git, Open Source, Ruby.

This article builds on the preceding Introduction to libgit2 article. It demonstrates how to work with the Ruby wrapper around libgit2, called rugged.

Installing Rugged

The rugged gem needs to be built from source. However, the gem has dependencies that need to be installed first. Here are the official installation instructions. On Ubuntu / WSL, install the dependencies like this:

Shell
$ yes | sudo apt install libgit2-dev cmake pkg-config

Now you can pull down any Ruby project that uses rugged, run bundle install within the project directory, and the project should be set up.

Otherwise, provided that you have installed the prerequisites, but you want to play with rugged in irb, you could manually install rugged on Linux/WSL like this:

Shell
$ gem install rugged

The above will take a while. Do something else to pass the time. Learn to juggle. Practise your guitar repertoire. Write poetry. Make love, not war. Go for a walk.

Eventually, you will see something like:

Installing rugged 1.6.2 with native extensions
Bundle complete! 55 Gemfile dependencies, 148 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

Exploring Rugged With irb

Just for fun, let’s look up the commit for the annotated tag called v1.5.1 in the git repository for rugged itself using the rugged Ruby gem. Digital navel gazing is always popular amongst software authors.

Before running the following code, I had previously cloned the rugged git repository into /mnt/c/work/ruby/rugged. This code opens the git repository so it can be queried and manipulated.

Shell
$ irb
irb(main):001:0> require 'rugged'
=> true 
irb(main):003:0> repo = Rugged::Repository.new('/mnt/c/work/ruby/rugged')
=> #<Rugged::Repository:74460 {path: "/mnt/c/work/ruby/rugged/.git/"}>

We can use a typical Ruby idiom to obtain the commit for the annotated tag called v1.5.1. The following code iterates through all of the git repository’s annotated tags, returning the first one that has the desired name.

Git’s data structures are designed for efficient iteration. At most a few hundred annotated tags would normally be expected in any git repository, so the following iteration is quick and efficient.

irb continued
irb(main):004:0> tag = repo.tags.find { |tag| tag.name == 'v1.5.1' }
=>
  #<Rugged::Tag:81940 {name: "v1.5.1", target: #<Rugged::Commit:81960 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81980 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}>
  <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5>
  <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be>
  <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a>
  <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab>
  <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6>
  <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4>
  <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d>
  , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1", "9d5978bba108785feb5626a4f01bc860791985aa"]}>}> 

Now we can obtain the commit for the annotated tag.

irb continued
irb(main):004:0> commit = tag&.target
=>
  #<Rugged::Commit:81360 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81380 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}>
  <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5>
  <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be>
  <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a>
  <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab>
  <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6>
  <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4>
  <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d>
  , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1",
    "9d5978bba108785feb5626a4f01bc860791985aa"]}> 

Ruby’s safe navigation operator, &., prevents a NoMethodError (Ruby’s counterpart to a null pointer exception) if no matching tag was found. Because we used the safe navigation operator, the value of the variable commit will be either the Rugged::Commit for the annotated tag called v1.5.1, or nil if the desired tag was not found.
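The behavior of &. can be seen without a git repository at all. In this small sketch, FakeTag is a hypothetical stand-in for Rugged::Tag, and the target strings are placeholders rather than real commits.

```ruby
# FakeTag is a hypothetical stand-in for Rugged::Tag; only name and target matter here.
FakeTag = Struct.new(:name, :target)

# Same two steps as the irb session: find the tag, then safely ask for its target.
def target_of(tags, wanted_name)
  tag = tags.find { |candidate| candidate.name == wanted_name }
  tag&.target # nil when no tag matched, instead of raising NoMethodError
end

demo_tags = [FakeTag.new('v1.5.0', 'commit-a'), FakeTag.new('v1.5.1', 'commit-b')]

puts target_of(demo_tags, 'v1.5.1')         # the matching tag's target
puts target_of(demo_tags, 'v9.9.9').inspect # no match, so nil

begin
  # Without the safe navigation operator, a missing tag raises.
  demo_tags.find { |candidate| candidate.name == 'v9.9.9' }.target
rescue NoMethodError => e
  puts "Without &. this raised #{e.class}"
end
```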

Both of the above operations can be combined into one statement.

irb continued
irb(main):004:0> repo.tags.find { |tag| tag.name == 'v1.5.1' }&.target
=>
  #<Rugged::Commit:81360 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81380 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}>
  <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5>
  <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be>
  <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a>
  <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab>
  <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6>
  <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4>
  <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d>
  , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1",
    "9d5978bba108785feb5626a4f01bc860791985aa"]}> 

There are other ways to obtain the desired annotated tag using rugged. The Swiss Army knife of the git CLI, git-rev-parse, can do this task, as can its rugged equivalent, Rugged::Repository.rev_parse. We will use that method later in this article.

Read File at a Ref

Following are two examples that show how to read a file’s contents at a given ref.

Reading the Hard Way

This approach has limitations on how the ref can be expressed, and several steps are required. This example is good for understanding how to work with libgit2 and rugged. In the next example, I show a shorter and more capable way of accomplishing the same thing, using rugged’s version of git’s Swiss Army knife, Rugged::Repository.rev_parse.

First, we need to open the previously cloned repo using rugged.

Shell
$ irb
irb(main):001:0> require 'rugged'
=> true 

irb(main):002:0> repo = Rugged::Repository.new('.')
=> #<Rugged::Repository:64660 {path: "/var/work/.git/"}> 

Now that the variable repo is initialized, we can look up the reference in the git repository using rugged. Here I use HEAD, but any valid ref could be used, including tag names and SHAs. However, incantations such as HEAD~2 are not supported. We will overcome that limitation in the next example.

irb continued
irb(main):003:0> reference_symbolic = repo.ref 'HEAD'
=> #<Rugged::Reference:2222640 {name: "HEAD", target: #<Rugged::Reference:2222660 {name: "refs/heads/master", target: #<Rugged::Commit:2222680 {message:... 

irb(main):004:0> reference_symbolic.type
=> :symbolic 

irb(main):005:0> reference_symbolic.name
=> "HEAD" 

irb(main):006:0> reference_symbolic.target_id
=> "refs/heads/master" 

The reference_symbolic variable above is a Rugged::Reference. More specifically, it is a symbolic reference. The target_id property of a symbolic reference returns the fully qualified reference, refs/heads/master.

There are two types of Rugged::References: symbolic and direct. You need a direct Rugged::Reference in order to perform many operations. Symbolic references can be dereferenced into direct references by obtaining the target property.

irb continued
irb(main):007:0> reference_direct = reference_symbolic.target
=> #<Rugged::Reference:2247160 {name: "refs/heads/master", target: #<Rugged::Commit:2247180 {message: "-\n", tree: #<Rugged::Tree:2247200 {oid: 8d4e34ae... 

irb(main):008:0> reference_direct.name
=> "refs/heads/master" 

irb(main):009:0> reference_direct.type
=> :direct 

For direct references, the target property returns the Rugged::Commit, and the target_id property returns the SHA of the commit.

irb continued
irb(main):010:0> reference_direct.target_id
=> "ef3d5c5eb6a4ccbcde5f2b5c01891baf4c478fcb" 

irb(main):011:0> commit = reference_direct.target
=>
#<Rugged::Commit:2234520 {message: "-\n", tree: #<Rugged::Tree:2234540 {oid: 8d4e34ae64135eb024b337420adeac27effeec2e}>
... 

irb(main):011:0> commit.type
=> :commit 

irb(main):093:0> commit.oid
=> "ef3d5c5eb6a4ccbcde5f2b5c01891baf4c478fcb" 

Note that commit.oid (object id) is the same as reference_direct.target_id.

Now that we have the commit for the desired snapshot that contains the file of interest, we can obtain the directory tree, and then get the entry for the file.

irb continued
irb(main):0013:0> tree = commit.tree
=>
#<Rugged::Tree:1229460 {oid: 8d4e34ae64135eb024b337420adeac27effeec2e}>
... 

irb(main):014:0> entry = tree.get_entry 'index.html'
=> {:name=>"index.html", :oid=>"5520174cb429b79df468b7b50153702102cd98e0", :filemode=>33188, :type=>:blob} 

We need the SHA of the file entry. This will allow us to read the file object, of type Rugged::OdbObject, from the git repository. This is actually a blob.

irb continued
irb(main):015:0> sha = entry[:oid]
=> "5520174cb429b79df468b7b50153702102cd98e0" 

irb(main):016:0> object = repo.read sha
=> #<Rugged::OdbObject:0x00007fbcbe07f800> 

irb(main):017:0> object.type
=> :blob 

Finally, we can read the contents of the blob/file by accessing its data property.

irb continued
irb(main):018:0> content = object.data
=> "---\ndescription: \"blah blah...\"" 
😁

That was interesting, but there is a more direct way!

A Better Way to Read a File

This example is simpler than the previous example, plus it works with any valid revision syntax, such as HEAD~2.

We start this example exactly the same as before:

Shell
$ irb
irb(main):001:0> require 'rugged'
=> true 

irb(main):002:0> repo = Rugged::Repository.new('.')
=> #<Rugged::Repository:64660 {path: "/var/work/.git/"}> 

This example is simpler and more flexible because it uses Rugged::Repository.rev_parse instead of Rugged::Repository.ref. Just like the git-rev-parse command, the rev_parse method supports an enhanced ref syntax called revision syntax.

This enhanced syntax enables requests like: “Get me README.md as it was on the master branch on October 15”, which can be written as: master@{'Oct 15'}:README.md.

irb continued
irb(main):003:0> blob = repo.rev_parse "master@{'Oct 15'}:README.md"
=> #<Rugged::Blob:0x00007fbcbe094ef8 @owner=#<Rugged::Repository:62140 {path: "/var/sitesUbuntu/www.mslinn.com/.git/"}>> 

irb(main):005:0> puts blob.content
... content of README.md appears here ... 
😁

Done!

BTW, rev_parse returns a Rugged::Commit instead of a Rugged::Blob if the filename is not specified. You would then need to obtain the blob from the commit, like this:

irb continued
irb(main):003:0> commit = repo.rev_parse "master@{'Oct 15'}"
=>
#<Rugged::Commit:2545160 {message: "-\n", tree: #<Rugged::Tree:2545180 {oid: 17afab2ed624809a8f52ce5537b737b99d8238d1}> 

irb(main):004:0> blob = repo.blob_at(commit.oid, 'README.md')
=> #<Rugged::Blob:0x00007fbcbe094ef8 @owner=#<Rugged::Repository:62140 {path: "/var/sitesUbuntu/www.mslinn.com/.git/"}>> 

irb(main):005:0> puts blob.content
... content of README.md appears here ... 

Diff-Centric Dump of Changes

This code example utilizes the hierarchy of terms defined in the Terminology section of the Introduction to libgit2 article.

Following is a simple Ruby program that uses rugged v1.5.1 to dump out all the changes to a repository between HEAD and the previous commit. This is a diff-centric approach; all changes in every file associated with a diff are dumped.

Diff-centric Ruby/rugged program
require 'rugged'

repo = Rugged::Repository.new('.')
head = repo.head

target = head.target # Rugged::Commit
parent = head.target.parents.first # Rugged::Commit
puts "head.target.oid: #{target.oid}"
puts "parents.first.oid: #{parent.oid}"
puts "commit message: #{parent.message}"

head.target.diff(head.target.parents.first).each_patch do |patch|
  delta = patch.delta # Rugged::Diff::Delta
  new_path = delta.new_file[:path]
  patch.each_hunk do |hunk|
    hunk.each_line do |line|
      sign = line.addition? ? '+' : '-'
      lineno = line.new_lineno.to_s.rjust(4, ' ')
      puts "#{sign} #{new_path}:#{lineno} #{line.content}"
    end
  end
end

I ran the above program on git itself. Here is the output as of 2023-03-07:

head.target.oid: d15644fe0226af7ffc874572d968598564a230dd
parents.first.oid: ef7d4f53c2fd9e8186d093dea6d45a91ce57110e
commit message: Git 2.40-rc1\n\nSigned-off-by: Junio C Hamano <gitster@pobox.com>\n
- range-diff.c: 383 \tconst char *color_new = diff_get_color_opt(diffopt, DIFF_FILE_NEW);\n
- range-diff.c: 384 \tconst char *color_commit = diff_get_color_opt(diffopt, DIFF_COMMIT);\n
- range-diff.c: 385 \tconst char *color;\n
- range-diff.c:  -1 \tint abbrev = diffopt->abbrev;\n
+ range-diff.c:  -1 \tchar abbrev = diffopt->abbrev;\n
- range-diff.c: 387 \n
- range-diff.c: 388 \tif (abbrev < 0)\n
- range-diff.c: 389 \t\tabbrev = DEFAULT_ABBREV;\n

A Disappointing Example

One of my pet peeves about open source software is that the documentation is often substandard and out-of-date. Rugged suffers from this issue.

Few complete examples of how to accomplish a task are available, and most of them do not work anymore because the library has evolved so much. Maintaining documentation is not something that most open-source software developers care about. That is a shame, because without good documentation few people are able to use such a library.

Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.

Following is a revised and annotated example of a program that can be found in the rugged documentation. Unfortunately, it is not properly explained, and thus what exactly it does can be elusive. I spent the time to figure it out. Hopefully this explanation will be helpful to you, dear reader.

To aid your learning experience I first use irb to present the concepts, then I present a similar working program.

Irb Exploration

First, let’s make a new directory at /tmp/test, then initialize the directory as a new bare git repository by calling Rugged::Repository.init_at as shown below. The contents are then listed.

irb equivalent
$ mkdir /tmp/test
$ cd /tmp/test
$ irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> repo = Rugged::Repository.init_at('.', :bare)
=> #<Rugged::Repository:67820 {path: "/tmp/test/"}>
irb(main):003:0> puts Dir['./*'].each { |x| puts x }
./HEAD
./config
./description
./hooks
./info
./objects
./refs
=> nil

Right now we have an empty bare repository. We can see that no files have been stored as blob objects in the repo so far because nothing has been hashed yet:

irb equivalent, continued
irb(main):004:0> puts Dir['./objects/**/*'].each { |x| puts x }
./objects
./objects/info
./objects/pack
=> nil 

Normally when working with the git CLI, one would:

  1. Create a file
  2. Git add it to the staging area (also known as the index)
  3. Commit the index

Instead, the example skips the file creation process without any mention of doing so.

So that this code example has at least a semi-plausible motivation, imagine you work for AntiGit (a mythical GitHub competitor) – and you are designing the git repository server architecture. In that case, you might write code that utilizes a bare git repository. Otherwise, you probably do not want to work with a bare git repository.

... Anyway, in this example, an arbitrary string is hashed and stored into the git database, just to demonstrate that this can be done. The programmer, knowing that they were working with a bare git repository, pretended that the hash represented the contents of a file called newfile.txt. The last command displays the files in the objects/ directory again:

irb equivalent, continued
irb(main):005:0> oid = repo.write("This blob will be written to the git stage as newfile.txt.", :blob)
=> "fad0c411fb5b035b58be3116f03c8bf76a1ae760" 
irb(main):006:0> puts Dir['./objects/**/*'].each { |x| puts x }
./objects/fa
./objects/fa/d0c411fb5b035b58be3116f03c8bf76a1ae760
./objects/info
./objects/pack
=> nil

Now we see that a blob object was written to the object database. However, it has not been entered into the index. That would never happen when using git add, because the ‘porcelain’ command line provides atomic operations. Instead, when using the low-level plumbing API, you must perform all the low-level steps, one by one.
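The object id and the path under objects/ are not arbitrary. For a repository that uses SHA-1 (the default), git hashes a short header followed by the content, then stores the loose object in a directory named after the first two hex digits. The sketch below reproduces that calculation with Ruby's standard library only; git_blob_oid and loose_object_path are hypothetical helper names, not rugged methods.

```ruby
require 'digest'

# Compute the id git assigns to a blob: SHA-1 of "blob <byte count>\0<content>".
# Assumes a SHA-1 repository; git_blob_oid is a hypothetical helper name.
def git_blob_oid(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# A loose object lives at objects/<first two hex digits>/<remaining 38 digits>.
def loose_object_path(oid)
  File.join('objects', oid[0, 2], oid[2..-1])
end

oid = git_blob_oid('This blob will be written to the git stage as newfile.txt.')
puts oid
puts loose_object_path(oid)
```

This is why the listing shows a two-character directory containing a 38-character file name.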

This is how to add the blob to the index (the index variable used below must first be assigned with index = repo.index):

irb equivalent, continued
irb(main):007:0> index.add path: 'newfile.txt', oid: oid, mode: 0o0100644
=> "fad0c411fb5b035b58be3116f03c8bf76a1ae760" 
irb(main):008:0> curr_tree = repo.index.write_tree(repo)
=> "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

Now we can commit to the local git repository:

irb equivalent, continued
irb(main):09:0> author = {
  :email=>"mslinn@mslinn.com",
  :time=>Time.now,
  :name=>"Mike Slinn"
}
=> {:email=>"mslinn@mslinn.com", :time=>2023-03-19 20:34:05.588255505 -0400, :name=>"Mike Slinn"} 
irb(main):008:0> new_commit = Rugged::Commit.create repo, author: author, message: 'This is a commit message.', parents: [], tree: curr_tree, update_ref: 'HEAD'
=> "7d3e484d4f136a9f1298f794dc43b0ebabc91d57"

We still have not pushed to a remote ... maybe I’ll write about that topic one day.

Similar Ruby Program

In summary, the following program creates files directly in the git stage and commits them. The ‘files’ are never physically created, so the working tree stays empty, much as if this were a bare repository. This becomes problematic if you attempt to mix the effects of the Ruby code with git commands. In general, there is little benefit in working this way, unless you work for or compete against GitHub.

require 'rugged'
require 'tmpdir'

author = { email: 'email@email.com', time: Time.now, name: 'username' }

Dir.mktmpdir do |temp_dir| # This directory will be deleted on exit
  Dir.chdir(temp_dir)
  puts "Working in #{temp_dir}"

  repo = Rugged::Repository.init_at temp_dir
  index = repo.index

  # Stage a file (newfile.txt) that does not actually exist.
  oid = repo.write("This blob will be written to the git stage as newfile.txt.", :blob)
  index.add path: 'newfile.txt', oid: oid, mode: 0o0100644
  curr_tree = index.write_tree(repo)
  new_commit = Rugged::Commit.create repo,
      author:     author,
      message:    "This is a commit message.",
      committer:  author,
      parents:    repo.empty? ? [] : [repo.head.target].compact,
      tree:       curr_tree,
      update_ref: 'HEAD'
  puts "Commit #{new_commit[0..5]} created.\n\n"

  # The git CLI shows newfile.txt as a deleted file:
  #   $ git status
  #   On branch master
  #   Changes to be committed:
  #     (use "git restore --staged <file>..." to unstage)
  #           deleted:    newfile.txt
  puts `git status`

  files = Dir["*"]
  if files.reject { |x| x == '.git' }.empty?
    puts "No physical files were created."
  else
    puts "Physical files are:\n  #{files.join("\n  ")}"
  end
end
Typical output
Working in /tmp/d20230311-1208322-ovp16x
Commit fab118b9 created.

On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    newfile.txt

No physical files were created.