Published 2023-03-11.
Last modified 2023-06-05.
Time to read: 7 minutes.
This article builds on the preceding Introduction to libgit2 article.
It demonstrates how to work with rugged, the Ruby wrapper around libgit2.
Installing Rugged
The rugged gem needs to be built from source.
However, the gem has dependencies that need to be installed first.
The official installation instructions neglect to mention the prerequisite libssh2-1-dev library for ssh support.
On Ubuntu / WSL, install all the dependencies like this:
$ yes | sudo apt install cmake libgit2-dev libssh2-1-dev pkg-config
It is a good idea to check that you do not have a copy of rugged installed before continuing,
because by default the version that is installed does not include ssh support.
$ gem info rugged
*** LOCAL GEMS ***
rugged (1.6.3, 1.6.2)
    Authors: Scott Chacon, Vicent Marti
    Homepage: https://github.com/libgit2/rugged
    License: MIT
    Installed at (1.6.3): /home/mslinn/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0
                 (1.6.2): /home/mslinn/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0
    Rugged is a Ruby binding to the libgit2 linkable library
The above example shows 2 versions of rugged installed.
Ensure that rugged was built with ssh support:
$ ruby -e 'require "rugged"; puts Rugged.features'
threads
https
ssh
If you do not see ssh as one of the features, you should delete all installed versions of rugged:
$ gem uninstall rugged
Select gem to uninstall:
 1. rugged-1.6.2
 2. rugged-1.6.3
 3. All versions
> 3
Successfully uninstalled rugged-1.6.2
You have requested to uninstall the gem:
        rugged-1.6.3
jekyll_flexible_include-2.0.19 depends on rugged (>= 0)
If you remove this gem, these dependencies will not be met.
Continue with Uninstall? [yN] y
Successfully uninstalled rugged-1.6.3
To include ssh support by default for rugged for all projects built using bundle install or bundle update, type:
$ bundle config set --global build.rugged --with-ssh
The above persists the global configuration for bundle in ~/.bundle/config:
---
BUNDLE_BUILD__RUGGED: "--with-ssh"
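As a quick sanity check, a script could confirm that a Bundler configuration file carries the build flag. The following is a minimal sketch, not part of Bundler's API: the helper name rugged_ssh_configured? is hypothetical, and it assumes the file has the simple key/value shape shown above.

```ruby
require 'yaml'

# Hypothetical helper: returns true when the YAML text of a Bundler config
# file asks for rugged to be built with the --with-ssh flag.
def rugged_ssh_configured?(yaml_text)
  settings = YAML.safe_load(yaml_text) || {}
  settings.fetch('BUNDLE_BUILD__RUGGED', '').split.include?('--with-ssh')
end

sample = <<~CONFIG
  ---
  BUNDLE_BUILD__RUGGED: "--with-ssh"
CONFIG

puts rugged_ssh_configured?(sample)      # the configuration shown above
puts rugged_ssh_configured?("---\n{}\n") # a file with no build settings
```

To check a real file, the text could be obtained with File.read(File.expand_path('~/.bundle/config')).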
Now you can pull down any Ruby project that uses rugged, run bundle install within the project directory, and the project should be set up.
If you are writing a program or gem that uses rugged, you should also set a local configuration for bundle within your project.
This ensures that your users also obtain rugged with the ssh feature.
$ bundle config set --local build.rugged --with-ssh
The above persists the bundle configuration setting for your project, in .bundle/config.
Be sure to check in this file as part of your project:
---
BUNDLE_BUILD__RUGGED: "--with-ssh"
Otherwise, provided that you have installed the prerequisite system libraries,
if you want to play with rugged in irb, you could manually install rugged like this:
$ gem install rugged -- --with-ssh
The above will take a while. Do something else to pass the time. Learn to juggle. Practise your guitar repertoire. Write poetry. Make love, not war. Go for a walk.
Eventually, you will see something like:
Installing rugged 1.6.3 with native extensions
Bundle complete! 55 Gemfile dependencies, 148 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
Ensure that rugged was built with ssh support:
$ ruby -e 'require "rugged"; puts Rugged.features'
threads
https
ssh
It is probably a good idea to put some Ruby code like the following into every Rugged project:
abort "Error: Rugged was not built with ssh support" \
  unless Rugged.features.include? :ssh
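The same guard can be generalized to several features at once. The sketch below is an illustration only: missing_features and REQUIRED_FEATURES are hypothetical names, and the feature list is passed in as a plain array so the logic can be tried without rugged installed. In a real project you would pass Rugged.features instead of the stand-in array.

```ruby
# Features this hypothetical project needs from the rugged build.
REQUIRED_FEATURES = %i[https ssh].freeze

# Returns the required features that are absent from the available list.
def missing_features(available, required = REQUIRED_FEATURES)
  required - available
end

# Stand-in for Rugged.features on a build that lacks ssh support.
available = %i[threads https]
missing = missing_features(available)
warn "Rugged was built without: #{missing.join(', ')}" unless missing.empty?
```

Reporting every missing feature in one message saves a rebuild cycle compared with failing on the first one.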
Exploring Rugged With Irb
Read and Write Global Configuration
The OS user's global git configuration is available as a hash by calling Rugged::Config.global.
$ irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> config = Rugged::Config.global
=> #<Rugged::Config:0x00007f4b2173a078 @owner=nil>
irb(main):003:0> config.each_key { |x| puts x }
alias.co
branch.master.remote
branch.master.merge
core.filemode
core.autocrlf
core.safecrlf
core.excludesfile
core.pager
color.status
color.branch
color.ui
gui.trustmtime
push.default
push.autosetupremote
push.autosetupremote
user.name
user.email
rebase.autostash
diff.exif.textconv
diff.compactionheuristic
diff.colormoved
init.defaultbranch
pull.rebase
fetch.prune
Let's use the config hash we obtained above to set the default meta information for your diff output
to blue foreground, black background, and bold text.
irb(main):004:0> config['color.diff.meta'] = \
'blue black bold'
The above is equivalent to invoking:
$ git config --global \
color.diff.meta "blue black bold"
The above makes a persistent change because it updates ~/.gitconfig, where user global state is stored.
See Chapter 8.1 Customizing Git - Git Configuration in the Pro Git book for more information.
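It may help to see how a dotted key such as color.diff.meta maps onto the section layout of ~/.gitconfig. The sketch below is a simplified illustration of git's documented file format, not rugged's implementation; gitconfig_stanza is a hypothetical helper, and it ignores the quoting and escaping of special characters.

```ruby
# Hypothetical helper: renders the stanza that a dotted configuration key
# and its value would occupy in a git config file.
# The first component is the section, the last is the variable name, and
# anything in between is the subsection.
def gitconfig_stanza(key, value)
  parts = key.split('.')
  section = parts.first
  variable = parts.last
  subsection = parts[1...-1].join('.')
  header = subsection.empty? ? "[#{section}]" : "[#{section} \"#{subsection}\"]"
  "#{header}\n\t#{variable} = #{value}\n"
end

puts gitconfig_stanza('color.diff.meta', 'blue black bold')
puts gitconfig_stanza('user.name', 'Mike Slinn')
```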
Working With An Existing Repository
Before running the following code, I cloned the rugged git repository into
/mnt/
.
The following code opens that git repository so it can be queried and manipulated.
irb(main):006:0> repo = Rugged::Repository.new \
  '/mnt/c/work/git/rugged'
=> #<Rugged::Repository:74460 {path: "/mnt/c/work/git/rugged/.git/"}>
Reading Project Configuration
We can easily discover the keys for the hash that contains the configuration of this git repository.
Note that I used curly braces instead of do/end;
the only reason this was not written on one line is the need to display the information on small mobile devices.
Normally, this would be written on just one line.
irb(main):003:0> repo.config.each_key { |x| puts x }
alias.lol
alias.co
branch.master.remote
branch.master.merge
core.filemode
core.autocrlf
core.safecrlf
core.excludesfile
core.pager
color.status
color.branch
color.ui
gui.trustmtime
push.default
push.autosetupremote
push.autosetupremote
user.name
user.email
rebase.autostash
diff.exif.textconv
diff.compactionheuristic
diff.colormoved
init.defaultbranch
pull.rebase
fetch.prune
core.repositoryformatversion
core.filemode
core.bare
core.logallrefupdates
core.ignorecase
remote.origin.url
remote.origin.fetch
branch.master.remote
branch.master.merge
Now let's discover the value of the user.name and user.email configuration settings for this git repository:
irb(main):007:0> repo.config['user.name']
=> "Mike Slinn"
irb(main):008:0> repo.config['user.email']
=> "mslinn@mslinn.com"
As described in The Git Pager, we can turn off the pager for this git repository like this:
irb(main):009:0> repo.config['core.pager']
=> "less -F"
irb(main):010:0> repo.config['core.pager'] = 'cat'
=> "cat"
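The key listing for the repository appears to contain the global keys followed by the repository's own keys, so some keys show up twice. Git's documented precedence is that a repository-level value overrides a global one. The sketch below models that layering with plain hashes; the values are made up for illustration and no rugged calls are involved.

```ruby
# Made-up values standing in for ~/.gitconfig and the repository's .git/config.
global_config = { 'core.filemode' => 'true', 'core.pager' => 'less -F' }
local_config  = { 'core.filemode' => 'false', 'core.bare' => 'false' }

# The more specific level wins, which Hash#merge models directly:
# keys present in both hashes take their value from local_config.
effective = global_config.merge(local_config)

effective.each { |key, value| puts "#{key} = #{value}" }
```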
Finding the Commit for a Tag
Just for fun, let’s look up the commit for the annotated tag called v1.5.1 in the git repository for rugged itself using the rugged Ruby gem.
Digital navel gazing is always popular amongst software authors.
We can use a typical Ruby idiom to obtain the commit for the annotated tag called v1.5.1.
The following code iterates through all of the git repository’s annotated tags, returning the first one that has the desired name.
Git’s data structures are designed for efficient iteration.
At most a few hundred annotated tags would normally be expected in any git repository, so the following iteration is quick and efficient.
irb(main):011:0> tag = repo.tags.find { |tag| tag.name == 'v1.5.1' } => #<Rugged::Tag:81940 {name: "v1.5.1", target: #<Rugged::Commit:81960 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81980 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}> <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5> <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be> <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a> <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab> <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6> <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4> <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d> , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1", "9d5978bba108785feb5626a4f01bc860791985aa"]}>}>
Now we can obtain the commit for the annotated tag.
irb(main):012:0> commit = tag&.target => #<Rugged::Commit:81360 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81380 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}> <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5> <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be> <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a> <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab> <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6> <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4> <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d> , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1", "9d5978bba108785feb5626a4f01bc860791985aa"]}>
Ruby’s safe navigation operator, &., prevents a NoMethodError on nil if no matching tag was found.
Because we used the safe navigation operator, the value of the variable commit will either be the Rugged::Commit for the annotated tag called v1.5.1,
or nil if the desired tag was not found.
Both of the above operations can be combined into one statement.
irb(main):013:0> repo.tags.find { |tag| tag.name == 'v1.5.1' }&.target => #<Rugged::Commit:81360 {message: "Merge pull request #945 from libgit2/cmn/bump-libgit2-15\n\nUpdate to v1.5.1", tree: #<Rugged::Tree:81380 {oid: 91fdc2d8b85409686fdb35e4bc380d48164355c3}> <".gitignore" 429849f5370009d62c4293b417edffd8651c6cf5> <"CHANGELOG.md" 700f9627e0b8297896fbfe9d259b124e384825be> <"Gemfile" b6880a0a33e0666d938e49281563fa761b3fd98a> <"LICENSE" 9efa1d8c7baaef8538d6318fde8f560a142594ab> <"README.md" 5484bce591ee114e1f43904a94647b30a71f0dd6> <"Rakefile" 5319630e221c60adc5c1ba9ea1c5b2dc34e9b9e4> <"lib" 8bcbbe21243733f495f852e3fce48b01e332668d> , parents: ["22122185dcf117866c68f34f5bbf50acbbb082e1", "9d5978bba108785feb5626a4f01bc860791985aa"]}>
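The behaviour of the find-then-dereference idiom, including the nil case, can be tried without a repository. The sketch below uses a Struct as a stand-in for Rugged::Tag; FakeTag, target_for and the string targets are hypothetical and exist only for illustration.

```ruby
# Stand-in for Rugged::Tag; only the two methods used by the lookup
# (name and target) are modelled.
FakeTag = Struct.new(:name, :target)

tags = [
  FakeTag.new('v1.5.0', 'commit for v1.5.0'),
  FakeTag.new('v1.5.1', 'commit for v1.5.1')
]

# Returns the target of the first tag with the given name, or nil when
# no tag matches (the safe navigation operator skips the call on nil).
def target_for(tags, name)
  tags.find { |tag| tag.name == name }&.target
end

p target_for(tags, 'v1.5.1')
p target_for(tags, 'v9.9.9')
```

With a real repository, repo.tags would take the place of the array.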
There are other ways to obtain the desired annotated tag using rugged.
The Swiss Army knife of the git CLI, git-rev-parse, can do this task, as can its rugged equivalent, Rugged::Repository.rev_parse.
We will use that method later in this article.
Read File at a Ref
Following are two examples that show how to read a file’s contents at a given ref.
Reading the Hard Way
This approach has limitations on how the ref can be expressed, and several steps are required.
This example is good for understanding how to work with libgit2 and rugged.
In the next example, I show a shorter and more capable way of accomplishing the same thing,
using rugged’s version of git’s Swiss Army knife,
Rugged::
.
First, we need to open the previously cloned repo using rugged.
$ irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> repo = Rugged::Repository.new('.')
=> #<Rugged::Repository:64660 {path: "/var/work/.git/"}>
Now that the variable repo is initialized, we can look up the reference in the git repository using rugged.
Here I use HEAD, but any valid ref could be used, including tag names and SHAs; however, incantations such as HEAD~2 are not supported.
We will overcome that limitation in the next example.
irb(main):003:0> reference_symbolic = repo.ref 'HEAD'
=> #<Rugged::Reference:2222640 {name: "HEAD", target: #<Rugged::Reference:2222660 {name: "refs/heads/master", target: #<Rugged::Commit:2222680 {message:...
irb(main):004:0> reference_symbolic.type
=> :symbolic
irb(main):005:0> reference_symbolic.name
=> "HEAD"
irb(main):006:0> reference_symbolic.target_id
=> "refs/heads/master"
The reference_symbolic
variable above is a Rugged::
.
More specifically, it is a symbolic reference.
The target_id property of a symbolic reference returns the fully qualified reference,
refs/
.
There are two types of Rugged::Reference: symbolic and direct.
You need a direct Rugged::Reference in order to perform many operations.
Symbolic references can be dereferenced into direct references by obtaining the target property.
irb(main):007:0> reference_direct = reference_symbolic.target
=> #<Rugged::Reference:2247160 {name: "refs/heads/master", target: #<Rugged::Commit:2247180 {message: "-\n", tree: #<Rugged::Tree:2247200 {oid: 8d4e34ae...
irb(main):008:0> reference_direct.name
=> "refs/heads/master"
irb(main):009:0> reference_direct.type
=> :direct
For direct references, the target property returns the Rugged::Commit, and the target_id property returns the SHA of the commit.
irb(main):010:0> reference_direct.target_id
=> "ef3d5c5eb6a4ccbcde5f2b5c01891baf4c478fcb"
irb(main):011:0> commit = reference_direct.target
=> #<Rugged::Commit:2234520 {message: "-\n", tree: #<Rugged::Tree:2234540 {oid: 8d4e34ae64135eb024b337420adeac27effeec2e}> ...
irb(main):011:0> commit.type
=> :commit
irb(main):093:0> commit.oid
=> "ef3d5c5eb6a4ccbcde5f2b5c01891baf4c478fcb"
Note that commit.oid (object id) is the same as reference_direct.target_id.
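The symbolic-to-direct step can be pictured with a toy model. The sketch below is not how rugged or libgit2 stores references internally; it is a plain-Ruby illustration that borrows git's on-disk convention of writing a symbolic reference as "ref: " followed by the name it points to. REFS and resolve are hypothetical names.

```ruby
# A toy reference store. A value starting with "ref: " is symbolic and
# names another reference; anything else is treated as a commit SHA.
REFS = {
  'HEAD' => 'ref: refs/heads/master',
  'refs/heads/master' => 'ef3d5c5eb6a4ccbcde5f2b5c01891baf4c478fcb'
}.freeze

# Follows symbolic references until a direct one is reached, then returns
# the name of the direct reference together with the SHA it points at.
# The limit guards against references that point at each other.
def resolve(refs, name, limit = 5)
  limit.times do
    value = refs.fetch(name)
    return [name, value] unless value.start_with?('ref: ')

    name = value.delete_prefix('ref: ')
  end
  raise ArgumentError, 'too many levels of symbolic references'
end

p resolve(REFS, 'HEAD')
```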
Now that we have the commit for the desired snapshot that contains the file of interest, we can obtain the directory tree, and then get the entry for the file.
irb(main):0013:0> tree = commit.tree
=> #<Rugged::Tree:1229460 {oid: 8d4e34ae64135eb024b337420adeac27effeec2e}> ...
irb(main):014:0> entry = tree.get_entry 'index.html'
=> {:name=>"index.html", :oid=>"5520174cb429b79df468b7b50153702102cd98e0", :filemode=>33188, :type=>:blob}
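The :filemode value in the entry is printed in decimal, which hides its meaning. Converted to octal, it is the familiar git mode for a regular, non-executable file, the same value written as 0o0100644 later in this article. The sketch below uses only core Ruby; the MODES table is my own summary of the common git file modes, not something rugged provides.

```ruby
# Common git tree-entry modes, keyed by their octal value.
MODES = {
  0o040000 => 'directory (tree)',
  0o100644 => 'regular file',
  0o100755 => 'executable file',
  0o120000 => 'symbolic link',
  0o160000 => 'submodule (gitlink)'
}.freeze

filemode = 33188 # the :filemode value from the tree entry above

puts filemode.to_s(8) # the same number written in octal
puts MODES[filemode]
```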
We need the SHA of the file entry.
This will allow us to read the file object, of type Rugged::OdbObject, from the git repository.
This is actually a blob.
irb(main):015:0> sha = entry[:oid]
=> "5520174cb429b79df468b7b50153702102cd98e0"
irb(main):016:0> object = repo.read sha
=> #<Rugged::OdbObject:0x00007fbcbe07f800>
irb(main):017:0> object.type
=> :blob
Finally, we can read the contents of the blob/file by accessing its data property.
irb(main):018:0> content = object.data
=> "---\ndescription: \"blah blah...\""
That was interesting, but there is a more direct way!
A Better Way to Read a File
This example is simpler than the previous one, and it works with any valid ref, including revision syntax such as HEAD~2.
We start this example exactly the same as before:
$ irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> repo = Rugged::Repository.new('.')
=> #<Rugged::Repository:64660 {path: "/var/work/.git/"}>
This example is simpler and more flexible because it uses Rugged::Repository.rev_parse instead of Rugged::Repository.ref.
As with the git-rev-parse command, the rev_parse method supports an enhanced ref syntax called revision syntax.
This enhanced syntax enables requests like:
“Get me README.md as it was on the master branch on October 15”,
which can be written as: master@{'Oct 15'}:README.md.
irb(main):003:0> blob = repo.rev_parse "master@{'Oct 15'}:README.md"
=> #<Rugged::Blob:0x00007fbcbe094ef8 @owner=#<Rugged::Repository:62140 {path: "/var/sitesUbuntu/www.mslinn.com/.git/"}>>
irb(main):005:0> puts blob.content
... content of README.md appears here ...
Done!
BTW, rev_parse returns a Rugged::Commit instead of a Rugged::Blob if the filename is not specified.
You would then need to obtain the blob from the commit, like this:
irb(main):003:0> commit = repo.rev_parse "master@{'Oct 15'}"
=> #<Rugged::Commit:2545160 {message: "-\n", tree: #<Rugged::Tree:2545180 {oid: 17afab2ed624809a8f52ce5537b737b99d8238d1}>
irb(main):004:0> blob = repo.blob_at(commit.oid, 'README.md')
=> #<Rugged::Blob:0x00007fbcbe094ef8 @owner=#<Rugged::Repository:62140 {path: "/var/sitesUbuntu/www.mslinn.com/.git/"}>>
irb(main):005:0> puts blob.content
... content of README.md appears here ...
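Since the result type depends on whether a path follows the revision, a program that accepts arbitrary specs may want to separate the two parts first. The following is a rough sketch under stated assumptions: split_revision is a hypothetical helper, it only handles the rev:path form used above, and it does not attempt the other colon forms that git's revision syntax allows (for example a spec that begins with a colon).

```ruby
# Hypothetical helper: splits "REV:PATH" into [rev, path].
# Colons inside braces (such as a time of day in @{...}) are ignored.
# Returns [spec, nil] when the spec names no path.
def split_revision(spec)
  depth = 0
  spec.each_char.with_index do |char, i|
    case char
    when '{' then depth += 1
    when '}' then depth -= 1
    when ':'
      return [spec[0...i], spec[(i + 1)..]] if depth.zero?
    end
  end
  [spec, nil]
end

p split_revision("master@{'Oct 15'}:README.md")
p split_revision('HEAD~2')
```

When the second element is nil, the caller knows to expect a commit rather than a blob.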
Diff-Centric Dump of Changes
This code example utilizes the hierarchy of terms defined in the Terminology section of the Introduction to libgit2 article.
Following is a simple Ruby program that uses rugged v1.5.1 to dump out all the changes to a repository between HEAD and the previous commit.
This is a diff-centric approach; all changes in every file associated with a diff are dumped.
require 'rugged'

repo = Rugged::Repository.new('.')
head = repo.head
target = head.target # Rugged::Commit
parent = head.target.parents.first # Rugged::Commit
puts "head.target.oid: #{target.oid}"
puts "parents.first.oid: #{parent.oid}"
puts "commit message: #{parent.message}"
head.target.diff(head.target.parents.first).each_patch do |patch|
  delta = patch.delta # Rugged::Diff::Delta
  new_path = delta.new_file[:path]
  patch.each_hunk do |hunk|
    hunk.each_line do |line|
      sign = line.addition? ? '+' : '-'
      lineno = line.new_lineno.to_s.rjust(4, ' ')
      puts "#{sign} #{new_path}:#{lineno} #{line.content}"
    end
  end
end
I ran the above program on git itself.
Here is the output as of 2023-03-07:
head.target.oid: d15644fe0226af7ffc874572d968598564a230dd
parents.first.oid: ef7d4f53c2fd9e8186d093dea6d45a91ce57110e
commit message: Git 2.40-rc1\n\nSigned-off-by: Junio C Hamano <gitster@pobox.com>\n
- range-diff.c: 383 \tconst char *color_new = diff_get_color_opt(diffopt, DIFF_FILE_NEW);\n
- range-diff.c: 384 \tconst char *color_commit = diff_get_color_opt(diffopt, DIFF_COMMIT);\n
- range-diff.c: 385 \tconst char *color;\n
- range-diff.c: -1 \tint abbrev = diffopt->abbrev;\n
+ range-diff.c: -1 \tchar abbrev = diffopt->abbrev;\n
- range-diff.c: 387 \n
- range-diff.c: 388 \tif (abbrev < 0)\n
- range-diff.c: 389 \t\tabbrev = DEFAULT_ABBREV;\n
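In the output above, lines 383 to 385 carry a '-' sign even though they have valid new line numbers, which suggests they are unchanged context lines; the program only distinguishes additions from everything else. The sketch below shows one way to tell the three cases apart and to pick a line number that is valid for each. It runs against a Struct stand-in, not real diff objects: FakeLine, format_line and the sample values are hypothetical, and I am assuming that Rugged::Diff::Line offers deletion? and old_lineno alongside the addition? and new_lineno calls used above, so verify that against your installed version.

```ruby
# Stand-in for Rugged::Diff::Line; only the fields used below are modelled.
FakeLine = Struct.new(:origin, :old_lineno, :new_lineno, :content) do
  def addition?
    origin == :addition
  end

  def deletion?
    origin == :deletion
  end
end

# Formats one diff line with '+', '-' or ' ' and a line number that is
# valid for that kind of line (deleted lines only exist in the old file).
def format_line(path, line)
  if line.addition?
    sign = '+'
    lineno = line.new_lineno
  elsif line.deletion?
    sign = '-'
    lineno = line.old_lineno
  else
    sign = ' '
    lineno = line.new_lineno
  end
  "#{sign} #{path}:#{lineno.to_s.rjust(4)} #{line.content}"
end

lines = [
  FakeLine.new(:context, 385, 385, "\tconst char *color;"),
  FakeLine.new(:deletion, 386, -1, "\tint abbrev = diffopt->abbrev;"),
  FakeLine.new(:addition, -1, 386, "\tchar abbrev = diffopt->abbrev;")
]
lines.each { |line| puts format_line('range-diff.c', line) }
```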
A Disappointing Example
One of my pet peeves about open source software is that the documentation is often substandard and out-of-date.
Rugged suffers from this issue.
Few complete examples of how to accomplish a task are available, and most of them do not work anymore because the library has evolved so much. Maintaining documentation is not something that most open-source software developers care about. That is a shame, because without good documentation few people are able to use such a library.
The following is a revised and annotated example of a program that can be found in the rugged documentation.
Unfortunately, it is not properly explained, and thus what exactly it does can be elusive.
I spent the time to figure it out.
Hopefully this explanation will be helpful to you, dear reader.
To aid your learning experience I first use irb to present the concepts, then I present a similar working program.
Irb Exploration
First, let's make a new directory in /tmp/test, then initialize the directory as a new bare git repository
by calling Rugged::Repository.init_at as shown below.
The contents are then listed.
$ mkdir /tmp/test
$ cd /tmp/test
$ irb
irb(main):001:0> require 'rugged'
=> true
irb(main):002:0> repo = Rugged::Repository.init_at('.', :bare)
=> #<Rugged::Repository:67820 {path: "/tmp/test/"}>
irb(main):003:0> puts Dir['./*'].each { |x| puts x }
./HEAD
./config
./description
./hooks
./info
./objects
./refs
=> nil
Right now we have an empty bare repository. We can see that no files have been stored as blob objects in the repo so far because nothing has been hashed yet:
irb(main):004:0> puts Dir['./objects/**/*'].each { |x| puts x }
./objects
./objects/info
./objects/pack
=> nil
Normally when working with the git CLI, one would:
- Create a file
- git add it to the staging area (also known as the index)
- Commit the index
Instead, the example skips the file creation process without any mention of doing so.
So that this code example has at least a semi-plausible motivation,
imagine you work for AntiGit (a mythical GitHub competitor), and you are designing the git repository server architecture.
In that case, you might write code that utilizes a bare git repository.
Otherwise, you probably do not want to work with a bare git repository.
... Anyway, in this example, an arbitrary string is hashed and stored into the git database, just to demonstrate that this can be done.
The programmer, knowing that they were working with a bare git repository,
pretended that the hash represented the contents of a file called newfile.txt.
The last command displays the files in the objects/ directory again:
irb(main):005:0> oid = repo.write("This blob will be written to the git stage as newfile.txt.", :blob)
=> "fad0c411fb5b035b58be3116f03c8bf76a1ae760"
irb(main):006:0> puts Dir['./objects/**/*'].each { |x| puts x }
./objects/fa
./objects/fa/d0c411fb5b035b58be3116f03c8bf76a1ae760
./objects/info
./objects/pack
=> nil
Now we see that a blob object was written to the object database.
However, it has not been entered into the index.
That would never happen when using git add, because the ‘porcelain’ command line provides atomic operations.
Instead, when using the low-level plumbing API, you must perform all the low-level steps, one by one.
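The directory listing shows the new object stored under a two-character directory followed by the rest of its id. The sketch below illustrates where such a name comes from, using only Ruby's standard library and assuming a repository that uses SHA-1 object ids. blob_oid and loose_object_path are hypothetical helper names, and the sample content is git's well-known "test content" example, not the string written above.

```ruby
require 'digest/sha1'

# Computes the SHA-1 object id git assigns to a blob with this content:
# the hash covers a "blob <size>\0" header followed by the raw bytes.
def blob_oid(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# Path of the loose object file, relative to the git directory: the first
# two hex digits name the directory, the remainder names the file.
def loose_object_path(oid)
  File.join('objects', oid[0, 2], oid[2..])
end

oid = blob_oid("test content\n")
puts oid
puts loose_object_path(oid)
```

This only explains the naming; the file itself holds the zlib-compressed header and content, which is why repo.read is used to get the data back.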
This is how to add the blob to the index:
irb(main):007:0> index.add path: 'newfile.txt', oid: oid, mode: 0o0100644
=> "fad0c411fb5b035b58be3116f03c8bf76a1ae760"
irb(main):008:0> curr_tree = repo.index.write_tree(repo)
=> "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
Now we can commit to the local git repository:
irb(main):09:0> author = { :email=>"mslinn@mslinn.com", :time=>Time.now, :name=>"Mike Slinn" }
=> {:email=>"mslinn@mslinn.com", :time=>2023-03-19 20:34:05.588255505 -0400, :name=>"Mike Slinn"}
irb(main):008:0> new_commit = Rugged::Commit.create repo, author: author, message: 'This is a commit message.', parents: [], tree: curr_tree, update_ref: 'HEAD'
=> "7d3e484d4f136a9f1298f794dc43b0ebabc91d57"
We still have not pushed to a remote ... maybe I’ll write about that topic one day.
Similar Ruby Program
In summary, the following program creates files directly in the git stage and commits them.
The ‘files’ are never physically created, thus this example does not utilize a git working tree; it is a bare repository.
This becomes problematic if you attempt to mix the effects of the Ruby code with git commands.
In general, there is little benefit in working this way, unless you work for or compete against GitHub.
require 'rugged'
require 'tmpdir'

author = {
  email: 'email@email.com',
  time: Time.now,
  name: 'username'
}

Dir.mktmpdir do |temp_dir| # This directory will be deleted on exit
  Dir.chdir(temp_dir)
  puts "Working in #{temp_dir}"

  repo = Rugged::Repository.init_at temp_dir
  index = repo.index

  # Stage a file (newfile.txt) that does not actually exist.
  oid = repo.write("This blob will be written to the git stage as newfile.txt.", :blob)
  index.add path: 'newfile.txt', oid: oid, mode: 0o0100644
  curr_tree = index.write_tree(repo)
  new_commit = Rugged::Commit.create repo,
    author: author,
    message: "This is a commit message.",
    committer: author,
    parents: repo.empty? ? [] : [repo.head.target].compact,
    tree: curr_tree,
    update_ref: 'HEAD'
  puts "Commit #{new_commit[0..5]} created.\n\n"

  # The git CLI shows newfile.txt as a deleted file:
  # $ git status
  # On branch master
  # Changes to be committed:
  #   (use "git restore --staged <file>..." to unstage)
  #   deleted: newfile.txt
  puts `git status`

  files = Dir["*"]
  if files.reject { |x| x == '.git' }.empty?
    puts "No physical files were created."
  else
    puts "Physical files are:\n #{files.join(" \n")}"
  end
end
Working in /tmp/d20230311-1208322-ovp16x
Commit fab118b9 created.
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
  deleted: newfile.txt
No physical files were created.