Git and libgit2
Mike Slinn

Low-Level Git Concepts

Published 2023-03-13. Last modified 2023-04-25.

This article discusses low- to high-level git concepts: hashes, refs, terms and revision parameters.

If you are new to git, the following easy-to-read trilogy provides a nice explanation:

  1. A curious tale.
  2. Curious git.
  3. Types of git objects brings you right into this document.

Low- to High-Level User Interfaces

This section was paraphrased, updated and enhanced from the Git Internals - Plumbing and Porcelain chapter of the Pro Git book, written by Scott Chacon and Ben Straub and published by Apress. The book is licensed under the Creative Commons Attribution Non-Commercial Share Alike 3.0 license.

Git was initially a toolkit for building a version control system, rather than a complete, user-friendly one. From the beginning, its low-level subcommands were designed to be chained together UNIX-style, or called from scripts. The low-level git subcommands are referred to as plumbing subcommands.

Over time, more user-friendly git subcommands were added; continuing with the plumbing metaphor, these are called porcelain subcommands. There were 43 main porcelain subcommands when this article was last updated. Type git help -a to see these subcommands, along with many other categories of subcommands.

Shell
$ git help -a

Fundamental Concepts

Git currently uses SHA-1 to identify all types of objects it stores (commits, trees, blobs and annotated tags). Git has symbolic names for branches and tags, to spare you the awkwardness of having to use long alphanumeric identifiers. These symbolic names are called refs, which is short for references. While most refs refer to commits, tags are a special kind of ref that can refer to any of the four object types.

Git v2.29 introduced experimental support for SHA-256 for object names and content. This required a new repository format. There is no interoperability between SHA-1 and SHA-256 repositories yet, and no major Git provider currently supports SHA-256-enabled repositories.
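To make the idea of hash-based object names concrete, here is a minimal Ruby sketch that computes the name git would give a blob. It assumes the usual blob framing, a header of the form "blob <size>" followed by a NUL byte and then the content. The method name blob_object_id and its algorithm parameter are made up for this illustration; they are not part of git or Rugged.

```ruby
require 'digest'

# Hypothetical helper: computes the object name of a blob.
# Git hashes a header ("blob <byte size>\0") followed by the raw content.
def blob_object_id(content, algorithm: :sha1)
  bytes = content.b # work on raw bytes, not characters
  payload = "blob #{bytes.bytesize}\0".b + bytes
  case algorithm
  when :sha1 then Digest::SHA1.hexdigest(payload)
  when :sha256 then Digest::SHA256.hexdigest(payload)
  else raise ArgumentError, "unsupported algorithm: #{algorithm}"
  end
end

puts blob_object_id("hello\n")                     # 40 hex digits (SHA-1)
puts blob_object_id("hello\n", algorithm: :sha256) # 64 hex digits (SHA-256)
```

The SHA-1 value should match what git hash-object reports for the same content in a SHA-1 repository.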

A revision is anything which may be resolved to some kind of object stored in a Git object database, using Git’s DSL.

Git implements a DSL that can be used by combining ref names, SHA-1 names and operators. This is documented in the gitrevisions man page, which is dedicated to specifying revisions and ranges for Git. This is a difficult document to read. I have rewritten and paraphrased the material in the remainder of this article.

Low-level Files and Directories

Within the .git/ directory of a git project, many entries are possible:

Shell
$ tree .git -FL 1
.git
├── COMMIT_EDITMSG
├── FETCH_HEAD
├── HEAD
├── ORIG_HEAD
├── branches/
├── config
├── description
├── hooks/
├── index
├── info/
├── logs/
├── objects/
├── packed-refs
└── refs/ 

Only 4 files and subdirectories are important for this discussion:

HEAD
This file is created by git init, and normally points to the current branch. HEAD is a special reference. By definition, it always refers to the currently checked out commit. However, this is not usually a direct pointer; instead, it is a symbolic reference, which means that it points to a branch whose tip commit is currently checked out.
index
This file contains the git staging area, also referred to by older documentation as the cache. This is the data that is committed when you run git commit. In general, when you commit, you commit the index.
objects/
This subdirectory contains the git project’s object database. Loose objects are spread across up to 256 subdirectories, each named after the first two hex digits of the object names it holds; packed objects are stored under objects/pack/.
refs/
This subdirectory stores refs as small text files. Each file contains the name of an object in the object database, and the files are grouped by kind: branches (heads/), tags (tags/), and remote-tracking branches (remotes/).
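The layout of the objects/ subdirectory can be illustrated with a small Ruby sketch that writes a blob as a loose object and reads it back. It assumes the common loose-object format: the header and content are compressed with zlib and stored in a file whose directory is the first two hex digits of the SHA-1 name. The helper names are hypothetical, and the sketch works in a scratch directory rather than a real repository.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'
require 'zlib'

# Path of a loose object: the first two hex digits name the subdirectory,
# the remaining digits name the file.
def loose_object_path(git_dir, oid)
  File.join(git_dir, 'objects', oid[0, 2], oid[2..])
end

# Writes content as a loose blob and returns its SHA-1 object name.
def write_loose_blob(git_dir, content)
  payload = "blob #{content.bytesize}\0".b + content.b
  oid = Digest::SHA1.hexdigest(payload)
  path = loose_object_path(git_dir, oid)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, Zlib::Deflate.deflate(payload))
  oid
end

# Reads a loose object back and returns [type, content].
def read_loose_object(git_dir, oid)
  raw = Zlib::Inflate.inflate(File.binread(loose_object_path(git_dir, oid)))
  header, content = raw.split("\0", 2)
  type, _size = header.split(' ', 2)
  [type, content]
end

Dir.mktmpdir do |dir|
  git_dir = File.join(dir, '.git') # scratch directory, not a real repository
  oid = write_loose_blob(git_dir, "hello\n")
  puts loose_object_path(git_dir, oid).sub(dir, '')
  p read_loose_object(git_dir, oid)
end
```

Real repositories also keep objects in packfiles, which this sketch does not read.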

References

Every branch has a head: a reference that points to the last commit made on that branch. HEAD, in uppercase, normally points to the head of the current branch. If the current branch is master, then HEAD points to the head of the master branch.

The reference called HEAD is equivalent to writing something of the form heads/<current_branch>. If the current branch is master you might write heads/master. You could also be more precise by using the fully qualified form: refs/heads/master.

The value for HEAD is persisted in .git/HEAD:

Shell
$ cat .git/HEAD
ref: refs/heads/master 

The above defines HEAD as refs/heads/master. This means the current branch is master. When a git repository has these contents in that file, writing HEAD is equivalent to writing heads/master or refs/heads/master.

Shell
$ git show-ref -s HEAD
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 
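The way HEAD resolves to a commit can be sketched in a few lines of Ruby. This is only an illustration: resolve_ref is a hypothetical helper, it reads loose ref files only (refs stored in .git/packed-refs are ignored), and it runs against a scratch directory that imitates the files shown above.

```ruby
require 'fileutils'
require 'tmpdir'

# Follows symbolic refs ("ref: <target>") starting at the given ref until it
# reaches a file that contains an object name. Returns nil if a file is
# missing or the chain is longer than max_depth.
def resolve_ref(git_dir, ref = 'HEAD', max_depth = 5)
  max_depth.times do
    path = File.join(git_dir, ref)
    return nil unless File.file?(path)

    value = File.read(path).strip
    return value unless value.start_with?('ref: ')

    ref = value.delete_prefix('ref: ')
  end
  nil
end

Dir.mktmpdir do |dir|
  git_dir = File.join(dir, '.git') # scratch layout, not a real repository
  FileUtils.mkdir_p(File.join(git_dir, 'refs', 'heads'))
  File.write(File.join(git_dir, 'HEAD'), "ref: refs/heads/master\n")
  File.write(File.join(git_dir, 'refs', 'heads', 'master'),
             "fcd6335681f917421ef3522bc9704c4800467aa0\n")
  puts resolve_ref(git_dir) # prints the object name stored in refs/heads/master
end
```

When HEAD is detached, .git/HEAD contains an object name directly, and the loop returns it on the first pass.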

Refnames

A refname is a symbolic reference name. Examples include: master, heads/master, refs/heads/master and refs/remotes/origin/master. The shorter refnames are convenient to write, while using longer refnames avoids ambiguity.

The refname master typically means the commit object referenced by refs/heads/master, defined in the file .git/refs/heads/master. However, the meaning of the short version of a refname might be ambiguous, depending on context.

For example, if a git repository has both the refnames heads/master (defined in the file .git/refs/heads/master) and refs/remotes/origin/master (defined in the file .git/refs/remotes/origin/master), you can explicitly write heads/master or refs/remotes/origin/master to be precise. Note that both references have the same SHA in the output below, which makes sense if the local repository and the remote are up to date with each other.

Shell
$ cat .git/refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ cat .git/refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/remotes/origin/master 

$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0 

Refname Disambiguation Rules

When ambiguous, a refname is disambiguated by the contents of the first file found below:

  1. .git/<refname>
    These unqualified refnames are usually only useful for the following:
    | Refname | Defined in | Description |
    | --- | --- | --- |
    | HEAD | .git/HEAD | Names the commit on which you based the changes in the working tree. |
    | FETCH_HEAD | .git/FETCH_HEAD | Records the branch which you fetched from a remote repository with your last git fetch invocation. |
    | ORIG_HEAD | .git/ORIG_HEAD | This file and the refname are created by commands that move HEAD in a drastic way, such as git am, git merge, git rebase, and git reset. The purpose of this file and refname is to record the position of the HEAD before their operation, so that you can easily change the tip of the branch back to the state before you ran them. |
    | MERGE_HEAD | .git/MERGE_HEAD | This file and refname record the commit(s) which you are merging into your branch when you run git merge. |
    | CHERRY_PICK_HEAD | .git/CHERRY_PICK_HEAD | This file and refname record the commit which you are cherry-picking when you run git cherry-pick. |
  2. .git/refs/<refname>
    or the refs/<refname> entry in the .git/packed-refs file
  3. .git/refs/tags/<refname>
    or the refs/tags/<refname> entry in the .git/packed-refs file
  4. .git/refs/heads/<refname>
    or the refs/heads/<refname> entry in the .git/packed-refs file
  5. .git/refs/remotes/<refname>
    or the refs/remotes/<refname> entry in the .git/packed-refs file
  6. .git/refs/remotes/<refname>/HEAD
    or the refs/remotes/<refname>/HEAD entry in the .git/packed-refs file
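The lookup order above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not git's implementation: the helper names are hypothetical, packed-refs is treated as a single text file with one "<object name> <refname>" pair per line, and a loose ref file is preferred over a packed entry with the same name. The object names in the demonstration are placeholders.

```ruby
require 'fileutils'
require 'tmpdir'

# Candidate full names for a short refname, in the order listed above.
def candidate_refs(name)
  [name, "refs/#{name}", "refs/tags/#{name}", "refs/heads/#{name}",
   "refs/remotes/#{name}", "refs/remotes/#{name}/HEAD"]
end

# Parses .git/packed-refs into a hash of full refname => object name.
# Comment lines (#) and peeled-tag lines (^) are skipped.
def packed_refs(git_dir)
  path = File.join(git_dir, 'packed-refs')
  return {} unless File.file?(path)

  File.readlines(path, chomp: true).each_with_object({}) do |line, refs|
    next if line.empty? || line.start_with?('#', '^')

    oid, refname = line.split(' ', 2)
    refs[refname] = oid
  end
end

# Returns [full_refname, value] for the first candidate found, or nil.
def disambiguate(git_dir, name)
  packed = packed_refs(git_dir)
  candidate_refs(name).each do |full|
    loose = File.join(git_dir, full)
    return [full, File.read(loose).strip] if File.file?(loose)
    return [full, packed[full]] if packed.key?(full)
  end
  nil
end

Dir.mktmpdir do |dir|
  git_dir = File.join(dir, '.git') # scratch layout, not a real repository
  FileUtils.mkdir_p(File.join(git_dir, 'refs', 'heads'))
  # A branch and a packed tag share the short name "release".
  File.write(File.join(git_dir, 'refs', 'heads', 'release'), "#{'a' * 40}\n")
  File.write(File.join(git_dir, 'packed-refs'), "#{'b' * 40} refs/tags/release\n")
  p disambiguate(git_dir, 'release') # the tag wins: rule 3 comes before rule 4
end
```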

Reference Logs

The history of the previous values of references is stored in .git/logs:

Shell
$ 

The above directory tree shows where the history of HEAD is stored: in .git/logs/HEAD.

Each git branch has its own history, under .git/logs/refs/heads. Internally, git refers to branches as heads. This might seem confusing; you will get used to it.

Each remote HEAD has its history stored under .git/logs/refs/remotes/<remote_name>/HEAD.

Each remote branch has its history stored under .git/logs/refs/remotes/<remote_name>/<branch_name>.

Access the reference log by appending an index in braces, preceded by an @ character, to a reference. For example, the previous value of HEAD can be written like this: HEAD@{1}.
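As a rough illustration of how an index such as HEAD@{1} maps onto a log file, the following Ruby sketch reads entries from .git/logs. It assumes the usual reflog line layout (old object name, new object name, committer details, then a tab and a message) and that the newest entry is the last line of the file. ReflogEntry and the helper methods are hypothetical names, and the sample log contents are made up.

```ruby
require 'fileutils'
require 'tmpdir'

ReflogEntry = Struct.new(:old_oid, :new_oid, :committer, :message)

# Parses one reflog line: "<old> <new> <committer and time>\t<message>".
def parse_reflog_line(line)
  meta, message = line.chomp.split("\t", 2)
  old_oid, new_oid, committer = meta.split(' ', 3)
  ReflogEntry.new(old_oid, new_oid, committer, message)
end

# Returns the entry for <ref>@{n}; n = 0 is the most recent entry,
# which is the last line of the log file. Returns nil when out of range.
def reflog_entry(git_dir, ref, n)
  path = File.join(git_dir, 'logs', ref)
  return nil unless File.file?(path)

  lines = File.readlines(path)
  return nil if n.negative? || n >= lines.length

  parse_reflog_line(lines[-1 - n])
end

Dir.mktmpdir do |dir|
  git_dir = File.join(dir, '.git') # scratch layout with made-up entries
  FileUtils.mkdir_p(File.join(git_dir, 'logs'))
  who = 'Jane Doe <jane@example.com> 1678700000 -0500'
  File.write(File.join(git_dir, 'logs', 'HEAD'), <<~LOG)
    #{'0' * 40} #{'a' * 40} #{who}\tcommit (initial): first
    #{'a' * 40} #{'b' * 40} #{who}\tcommit: second
  LOG
  puts reflog_entry(git_dir, 'HEAD', 1).new_oid # value of HEAD@{1}
  puts reflog_entry(git_dir, 'HEAD', 0).message # message of the latest entry
end
```

Date-based forms such as HEAD@{yesterday} would need the timestamp in the committer field, which this sketch leaves unparsed.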

Terms

The Git Glossary defines many terms, and StackOverflow clarifies them. I have expanded on some definitions:

Commit-ish
The SHA of a commit, or an annotated tag that points at a commit. All commit-ish references are also tree-ish.
Tree-ish
Any identifier that points to a subdirectory tree. Git refers to directories as trees and tree objects. The general form is: <rev>:<path>.
To break this down: first comes an optional revision prefix, then a colon (:), then the path of the blob or tree within that revision.
For example: HEAD:README, :README, master:path/to/file, and master:README.

If the prefix is not provided, as in :README, the path is looked up in the index instead of in a commit.
Working tree
The directory tree of physical files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.

Unless submodules or worktrees are in play, the parent directory of the .git/ directory contains the working tree. Bare repositories have no working tree.

The .git/ directory is physically contained within the working tree, but is not logically part of it.
Repository
A repository is a collection of refs, together with an object database containing all objects which are reachable from the refs, possibly accompanied by metadata from one or more porcelains. A repository can share an object database with other repositories via an alternates mechanism. The repository proper does not include the index or the working tree; it mostly consists of the commits.
index
Also known as the staging area and the cache. A collection of files with status information, whose contents are stored as objects. The index is a stored version of the working tree. The index can also contain two or three versions of a working tree, for merging.

Revision Parameters

The meaning of revision parameters depends on the git command they are used with. A revision parameter might denote:

  • A specific commit; the type of revision parameter could be specified as:
    • A SHA or ref.
    • Output from git describe: a tag, optionally followed by a dash and a number of commits, followed by a dash, a g, then an abbreviated object name.
  • For commands, such as git-log, which walk the revision graph, revision parameters denote all commits which are reachable from that commit. The range of revisions can also be explicitly specified.
  • Some Git commands, such as git-cat-file, git-push, git-show, and git-show-ref, accept revision parameters which denote types of objects other than commits. For example, these commands can accept objects such as blobs (files) or trees (directories of files).
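The git describe format mentioned in the list above (a tag, a dash, a commit count, a dash, the letter g, and an abbreviated object name) can be taken apart with a regular expression. The following Ruby sketch is only an illustration; parse_describe and DESCRIBE_PATTERN are hypothetical names, and suffixes such as -dirty are not handled.

```ruby
# Matches git describe output of the form <tag>-<count>-g<abbrev>.
# The tag part is greedy, so tags that contain dashes are kept whole.
DESCRIBE_PATTERN = /\A(?<tag>.+)-(?<count>\d+)-g(?<abbrev>[0-9a-f]+)\z/

# Returns a hash with the three parts, or nil when the text does not have
# that shape (for example, when it is just a tag name).
def parse_describe(text)
  match = DESCRIBE_PATTERN.match(text.strip)
  return nil unless match

  { tag: match[:tag], count: match[:count].to_i, abbrev: match[:abbrev] }
end

p parse_describe('v1.5.1-3-gabf8efa')
p parse_describe('v1.5.1')
```

A tag whose own name happens to end in something like -2-gabc would be misread by this pattern, which is one reason to let git resolve such strings itself.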

The syntax for revision parameters is easily confused with the syntax for a parent commit. Revision parameters look like reference^{type}, whereas the parent of HEAD is written as HEAD^.

To be more specific, revision parameters are written with the following components:

  1. A reference.
  2. The character ^.
  3. An object type name enclosed in braces, for example: {commit} or {tree}.

^0 is a shorthand for ^{commit}.

The object is recursively dereferenced until an object of the desired type is found.
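Because the two caret forms are easy to mix up, here is a small Ruby sketch that classifies the trailing ^ suffix of a revision string. It only inspects the text; nothing is looked up in a repository. The method name classify_caret and the symbols it returns are inventions for this example.

```ruby
# Classifies the trailing caret suffix of a revision string.
#   rev^{}       peel tags until a non-tag object is found
#   rev^{/text}  search commit messages reachable from rev
#   rev^{type}   dereference until an object of that type is found
#   rev^0        shorthand for rev^{commit}
#   rev^, rev^n  the first or nth parent commit
def classify_caret(rev)
  case rev
  when /\A(.+)\^\{\}\z/       then { base: $1, kind: :peel_tags }
  when /\A(.+)\^\{\/(.+)\}\z/ then { base: $1, kind: :message_search, pattern: $2 }
  when /\A(.+)\^\{(\w+)\}\z/  then { base: $1, kind: :peel, type: $2 }
  when /\A(.+)\^0\z/          then { base: $1, kind: :peel, type: 'commit' }
  when /\A(.+)\^(\d*)\z/      then { base: $1, kind: :parent, number: $2.empty? ? 1 : $2.to_i }
  else { base: rev, kind: :plain }
  end
end

['master^{tree}', 'v1.5.1^0', 'v1.5.1^{}', 'HEAD^', 'HEAD^2', 'HEAD^{/bump}', 'HEAD'].each do |rev|
  puts "#{rev} => #{classify_caret(rev)}"
end
```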

In the following example, master^{tree} returns the tree object associated with ref master.

Shell
$ 
$

Revision Syntax Examples

The following table shows examples of revision syntax in the left column, and the returned class from Rugged::Repository.rev_parse in the right column.

| Incantation | Returned Class |
| --- | --- |
| abf8efadc8 | Rugged::Commit |
| abf8efadc8:README.md | Rugged::Blob |
| abf8efadc8^ | Rugged::Commit |
| abf8efadc8^{tree} | Rugged::Tree |
| @ | Rugged::Commit |
| HEAD | Rugged::Commit |
| HEAD~3 | Rugged::Commit |
| HEAD^ | Rugged::Commit |
| HEAD^{tree} | Rugged::Tree |
| master^{tree} | Rugged::Tree |
| HEAD:README.md | Rugged::Blob |
| master:README.md | Rugged::Blob |
| HEAD@{0} | Rugged::Commit |
| HEAD@{yesterday} | Rugged::Commit |
| HEAD@{2 months ago} | Rugged::Commit |
| HEAD@{1 month 2 weeks 3 days ago} | Rugged::Commit |
| HEAD@{'Oct 15, 2021'} | Rugged::Commit |
| HEAD@{'2021-10-15'}^{tree} | Rugged::Tree |
| HEAD@{'2021-10-15'}:README.md | Rugged::Blob |
| master@{yesterday} | Rugged::Commit |
| master@{'2021-10-15'}:README.md | Rugged::Blob |
| @{'2021-10-15'}:README.md | Rugged::Blob |
| @{last week}:README.md | Rugged::Blob |
| @{last month}:README.md | Rugged::Blob |
| @{last year}:README.md | Rugged::Blob |
| @{'2021-10-15 12:34'}:README.md | Rugged::Blob |
| @{0} | Rugged::Commit |
| v1.5.1 | Rugged::Commit |
| v1.5.1^0 | Rugged::Commit |
| v1.5.1^{} | Rugged::Commit |
| :/bump | Rugged::Commit |
| HEAD^{/bump} | Rugged::Commit |

The above table was produced by the following program and my flexible_include Jekyll plugin:

#!/usr/bin/env ruby

require 'rainbow/refinement'
require 'rugged'

class GitRevisionException < StandardError; end

using Rainbow

EXPRESSIONS = [
  'abf8efadc8',
  'abf8efadc8:README.md',
  'abf8efadc8^',
  'abf8efadc8^{tree}',
  '@',
  'HEAD',
  'HEAD~3',
  'HEAD^',
  'HEAD^{tree}',
  'master^{tree}',
  'HEAD:README.md',
  'master:README.md',
  'HEAD@{0}',
  'HEAD@{yesterday}',
  'HEAD@{2 months ago}',
  'HEAD@{1 month 2 weeks 3 days ago}',
  "HEAD@{'Oct 15, 2021'}",
  "HEAD@{'2021-10-15'}^{tree}",
  "HEAD@{'2021-10-15'}:README.md",
  'master@{yesterday}',
  "master@{'2021-10-15'}:README.md",
  "@{'2021-10-15'}:README.md",
  "@{last week}:README.md",
  "@{last month}:README.md",
  "@{last year}:README.md",
  "@{'2021-10-15 12:34'}:README.md",
  '@{0}',
  'v1.5.1',
  'v1.5.1^0',
  'v1.5.1^{}',
  ':/bump',
  'HEAD^{/bump}'
].freeze

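# Renders one table row: the revision string and the class Rugged resolves it to.
def do_one(rev_str)
  rev = rev_str.strip # work on a copy so the EXPRESSIONS entries are not mutated
  return nil if rev.empty?

  begin
    result = @repo.rev_parse(rev).class
    td = "<td>#{result}</td>"
  rescue StandardError => e
    # Show the error message in place of the class name
    td = "<td class='error' style='padding: 1px 3px;'>#{e.message}</td>"
  end
  "  <tr class='code'><td>#{rev}</td> #{td}</tr>"
end

# Replaces $VAR, ${VAR} and %VAR% references with values from ENV.
# Undefined variables are replaced with an empty string.
def expand_env(str)
  str.gsub(/\$([a-zA-Z_][a-zA-Z0-9_]*)|\${\g<1>}|%\g<1>%/) do
    ENV.fetch(Regexp.last_match(1), nil)
  end
end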

begin
  git_dir = expand_env '$rugged'
  abort 'Error: the $rugged environment variable is not defined'.red if git_dir.empty?
  @repo = Rugged::Repository.new git_dir
  puts <<~END_OUTPUT
    <table class="condensed noborder table">
      <tr><th>Incantation</th> <th>Returned Class</th></tr>
    #{EXPRESSIONS.map { |x| do_one x }.compact.join("\n")}
    </table>
  END_OUTPUT
rescue StandardError => e
  raise GitRevisionException, "#{e.class}: #{e.full_message}".red, []
end

If you want to be able to run this program, you first need to install its dependencies, rainbow and rugged:

Shell
$ gem install rainbow rugged
