Git and libgit2

Low-Level Git Concepts

Published 2023-03-13. Last modified 2023-04-25.
Time to read: 6 minutes.

This page is part of the git collection.

This article discusses low- to high-level Git concepts: hashes, refs, terms and revision parameters.

If you are new to Git, the following easy-to-read trilogy provides a nice explanation:

  1. A curious tale.
  2. Curious Git.
  3. Types of Git objects, which leads directly into this article.

Low- to High-Level User Interfaces

This section was paraphrased, updated and enhanced from the Git Internals - Plumbing and Porcelain chapter of the Pro Git book, written by Scott Chacon and Ben Straub and published by Apress. The book is licensed under the Creative Commons Attribution Non-Commercial Share Alike 3.0 license.

Git was initially a toolkit for building a version control system rather than a user-friendly version control system in its own right. From the beginning, its low-level subcommands were designed to be chained together UNIX-style, or called from scripts. The low-level Git subcommands are referred to as plumbing subcommands.

Over time, more user-friendly git subcommands were added; continuing with the plumbing metaphor, the more user-friendly git subcommands are called porcelain commands. Type git help -a to see these subcommands, along with many other categories of subcommands. The listing below, captured when this article was last updated, shows 44 main porcelain subcommands.

Shell
$ git help -a
See 'git help <command>' to read about a specific subcommand

Main Porcelain Commands
   add                    Add file contents to the index
   am                     Apply a series of patches from a mailbox
   archive                Create an archive of files from a named tree
   bisect                 Use binary search to find the commit that introduced a bug
   branch                 List, create, or delete branches
   bundle                 Move objects and refs by archive
   checkout               Switch branches or restore working tree files
   cherry-pick            Apply the changes introduced by some existing commits
   citool                 Graphical alternative to git-commit
   clean                  Remove untracked files from the working tree
   clone                  Clone a repository into a new directory
   commit                 Record changes to the repository
   describe               Give an object a human readable name based on an available ref
   diff                   Show changes between commits, commit and working tree, etc
   fetch                  Download objects and refs from another repository
   format-patch           Prepare patches for e-mail submission
   gc                     Cleanup unnecessary files and optimize the local repository
   gitk                   The Git repository browser
   grep                   Print lines matching a pattern
   gui                    A portable graphical interface to Git
   init                   Create an empty Git repository or reinitialize an existing one
   log                    Show commit logs
   maintenance            Run tasks to optimize Git repository data
   merge                  Join two or more development histories together
   mv                     Move or rename a file, a directory, or a symlink
   notes                  Add or inspect object notes
   pull                   Fetch from and integrate with another repository or a local branch
   push                   Update remote refs along with associated objects
   range-diff             Compare two commit ranges (e.g. two versions of a branch)
   rebase                 Reapply commits on top of another base tip
   reset                  Reset current HEAD to the specified state
   restore                Restore working tree files
   revert                 Revert some existing commits
   rm                     Remove files from the working tree and from the index
   scalar                 A tool for managing large Git repositories
   shortlog               Summarize 'git log' output
   show                   Show various types of objects
   sparse-checkout        Reduce your working tree to a subset of tracked files
   stash                  Stash the changes in a dirty working directory away
   status                 Show the working tree status
   submodule              Initialize, update or inspect submodules
   switch                 Switch branches
   tag                    Create, list, delete or verify a tag object signed with GPG
   worktree               Manage multiple working trees

Ancillary Commands / Manipulators
   config                 Get and set repository or global options
   fast-export            Git data exporter
   fast-import            Backend for fast Git data importers
   filter-branch          Rewrite branches
   mergetool              Run merge conflict resolution tools to resolve merge conflicts
   pack-refs              Pack heads and tags for efficient repository access
   prune                  Prune all unreachable objects from the object database
   reflog                 Manage reflog information
   remote                 Manage set of tracked repositories
   repack                 Pack unpacked objects in a repository
   replace                Create, list, delete refs to replace objects

Ancillary Commands / Interrogators
   annotate               Annotate file lines with commit information
   blame                  Show what revision and author last modified each line of a file
   bugreport              Collect information for user to file a bug report
   count-objects          Count unpacked number of objects and their disk consumption
   diagnose               Generate a zip archive of diagnostic information
   difftool               Show changes using common diff tools
   fsck                   Verifies the connectivity and validity of the objects in the database
   gitweb                 Git web interface (web frontend to Git repositories)
   help                   Display help information about Git
   instaweb               Instantly browse your working repository in gitweb
   merge-tree             Perform merge without touching index or working tree
   rerere                 Reuse recorded resolution of conflicted merges
   show-branch            Show branches and their commits
   verify-commit          Check the GPG signature of commits
   verify-tag             Check the GPG signature of tags
   version                Display version information about Git
   whatchanged            Show logs with difference each commit introduces

Interacting with Others
   archimport             Import a GNU Arch repository into Git
   cvsexportcommit        Export a single commit to a CVS checkout
   cvsimport              Salvage your data out of another SCM people love to hate
   cvsserver              A CVS server emulator for Git
   imap-send              Send a collection of patches from stdin to an IMAP folder
   p4                     Import from and submit to Perforce repositories
   quiltimport            Applies a quilt patchset onto the current branch
   request-pull           Generates a summary of pending changes
   send-email             Send a collection of patches as emails
   svn                    Bidirectional operation between a Subversion repository and Git

Low-level Commands / Manipulators
   apply                  Apply a patch to files and/or to the index
   checkout-index         Copy files from the index to the working tree
   commit-graph           Write and verify Git commit-graph files
   commit-tree            Create a new commit object
   hash-object            Compute object ID and optionally creates a blob from a file
   index-pack             Build pack index file for an existing packed archive
   merge-file             Run a three-way file merge
   merge-index            Run a merge for files needing merging
   mktag                  Creates a tag object with extra validation
   mktree                 Build a tree-object from ls-tree formatted text
   multi-pack-index       Write and verify multi-pack-indexes
   pack-objects           Create a packed archive of objects
   prune-packed           Remove extra objects that are already in pack files
   read-tree              Reads tree information into the index
   symbolic-ref           Read, modify and delete symbolic refs
   unpack-objects         Unpack objects from a packed archive
   update-index           Register file contents in the working tree to the index
   update-ref             Update the object name stored in a ref safely
   write-tree             Create a tree object from the current index

Low-level Commands / Interrogators
   cat-file               Provide content or type and size information for repository objects
   cherry                 Find commits yet to be applied to upstream
   diff-files             Compares files in the working tree and the index
   diff-index             Compare a tree to the working tree or index
   diff-tree              Compares the content and mode of blobs found via two tree objects
   for-each-ref           Output information on each ref
   for-each-repo          Run a Git command on a list of repositories
   get-tar-commit-id      Extract commit ID from an archive created using git-archive
   ls-files               Show information about files in the index and the working tree
   ls-remote              List references in a remote repository
   ls-tree                List the contents of a tree object
   merge-base             Find as good common ancestors as possible for a merge
   name-rev               Find symbolic names for given revs
   pack-redundant         Find redundant pack files
   rev-list               Lists commit objects in reverse chronological order
   rev-parse              Pick out and massage parameters
   show-index             Show packed archive index
   show-ref               List references in a local repository
   unpack-file            Creates a temporary file with a blob's contents
   var                    Show a Git logical variable
   verify-pack            Validate packed Git archive files

Low-level Commands / Syncing Repositories
   daemon                 A really simple server for Git repositories
   fetch-pack             Receive missing objects from another repository
   http-backend           Server side implementation of Git over HTTP
   send-pack              Push objects over Git protocol to another repository
   update-server-info     Update auxiliary info file to help dumb servers

Low-level Commands / Internal Helpers
   check-attr             Display gitattributes information
   check-ignore           Debug gitignore / exclude files
   check-mailmap          Show canonical names and email addresses of contacts
   check-ref-format       Ensures that a reference name is well formed
   column                 Display data in columns
   credential             Retrieve and store user credentials
   credential-cache       Helper to temporarily store passwords in memory
   credential-store       Helper to store credentials on disk
   fmt-merge-msg          Produce a merge commit message
   hook                   Run git hooks
   interpret-trailers     Add or parse structured information in commit messages
   mailinfo               Extracts patch and authorship from a single e-mail message
   mailsplit              Simple UNIX mbox splitter program
   merge-one-file         The standard helper program to use with git-merge-index
   patch-id               Compute unique ID for a patch
   sh-i18n                Git's i18n setup code for shell scripts
   sh-setup               Common Git shell script setup code
   stripspace             Remove unnecessary whitespace

User-facing repository, command and file interfaces
   attributes             Defining attributes per path
   cli                    Git command-line interface and conventions
   hooks                  Hooks used by Git
   ignore                 Specifies intentionally untracked files to ignore
   mailmap                Map author/committer names and/or E-Mail addresses
   modules                Defining submodule properties
   repository-layout      Git Repository Layout
   revisions              Specifying revisions and ranges for Git

Developer-facing file formats, protocols and other interfaces
   format-bundle          The bundle file format
   format-chunk           Chunk-based file formats
   format-commit-graph    Git commit-graph format
   format-index           Git index format
   format-pack            Git pack format
   format-signature       Git cryptographic signature formats
   protocol-capabilities  Protocol v0 and v1 capabilities
   protocol-common        Things common to various protocols
   protocol-http          Git HTTP-based protocols
   protocol-pack          How packs are transferred over-the-wire
   protocol-v2            Git Wire Protocol, Version 2

External commands
   fame
   filter-repo
   gui
   lfs
   remote-keybase
   tree-evars
   tree-exec
   tree-replicate

Command aliases
   br                     branch
   ci                     commit
   co                     checkout
   dc                     diff --cached
   df                     diff
   dif                    diff --word-diff=color --ignore-space-at-eol
   ign                    ls-files -o -i --exclude-standard
   lg                     log -p
   lol                    log --graph --decorate --pretty=oneline --abbrev-commit
   lola                   log --graph --decorate --pretty=oneline --abbrev-commit --all
   ls                     ls-files
   pwd                    !pwd
   st                     status

hub custom commands
   alias api browse ci-status compare create delete fork gist issue pr pull-request release sync

Fundamental Concepts

Git currently uses SHA-1 hashes to identify every type of object it stores: commits, trees, blobs and annotated tags. Git has symbolic names for branches and tags, to spare you the awkwardness of typing long hexadecimal identifiers. These symbolic names are called refs, which is short for references. Most refs refer to commits, but tags are a special kind of ref that can refer to any of the four object types.

Git v2.29 introduced experimental support for SHA-256 object names, which requires a new repository format. SHA-1 and SHA-256 repositories cannot yet interoperate, and at the time of writing no major Git hosting provider supported SHA-256 repositories.
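To make object names concrete, here is a minimal Ruby sketch that computes a blob's object name the way git hash-object does: it hashes the header blob <size>, a NUL byte, and then the raw content. The helper name blob_id is hypothetical and the sketch only handles blobs; it relies solely on Ruby's standard digest library for both SHA-1 and SHA-256.

```ruby
require 'digest'

# Hypothetical helper: compute the object name Git would give a blob.
# Git hashes the header "blob <size in bytes>\0" followed by the raw content.
def blob_id(content, algorithm: :sha1)
  data = "blob #{content.bytesize}\0".b + content.b
  case algorithm
  when :sha1   then Digest::SHA1.hexdigest(data)   # 40 hex digits
  when :sha256 then Digest::SHA256.hexdigest(data) # 64 hex digits
  else raise ArgumentError, "unknown algorithm: #{algorithm}"
  end
end

# Same as: echo 'hello world' | git hash-object --stdin
puts blob_id("hello world\n") # → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
puts blob_id("hello world\n", algorithm: :sha256)
```

The SHA-256 value should match what git hash-object prints inside a repository created with git init --object-format=sha256.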

A revision is anything that can be resolved, using Git's revision syntax, to an object stored in a Git object database.

Git implements a domain-specific language (DSL) for revisions that combines ref names, SHA-1 object names and operators. It is documented in the gitrevisions man page, which is dedicated to specifying revisions and ranges for Git. That man page is difficult to read, so I have rewritten and paraphrased its material in the remainder of this article.

Low-level Files and Directories

Within the .git/ directory of a Git project, many entries are possible:

Shell
$ tree .git -FL 1
.git
├── COMMIT_EDITMSG
├── FETCH_HEAD
├── HEAD
├── ORIG_HEAD
├── branches/
├── config
├── description
├── hooks/
├── index
├── info/
├── logs/
├── objects/
├── packed-refs
└── refs/ 

Only 4 files and subdirectories are important for this discussion:

HEAD
This file is created when the repository is initialized, and normally points to the current branch. HEAD is a special reference. By definition, it always points to the currently checked out commit. However, this is not usually a direct pointer; instead, it is a symbolic reference, which means that it points to a branch whose tip commit is currently checked out. When HEAD is detached, .git/HEAD contains a commit SHA instead.
index
This file contains the Git staging area, also referred to by older documentation as the cache. This is the data that is committed when you run git commit. In general, when you commit, you commit the index.
objects/
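This subdirectory contains the Git project's object database. Loose objects are stored in up to 256 subdirectories, each named after the first two hexadecimal digits of an object's SHA, with the remaining digits forming the file name; packed objects are stored in objects/pack/.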
refs/
This subdirectory stores refs: small text files that each contain the SHA of the object they point to. Branches are stored under refs/heads/, tags under refs/tags/, and remote-tracking branches under refs/remotes/.
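The layout of objects/ can be sketched in Ruby as follows. This assumes the object is stored loose; objects that have been packed into objects/pack/ are not handled. The helper names are hypothetical; the sketch relies only on the fact that a loose object file is the zlib-compressed text <type> <size>, a NUL byte, and the content.

```ruby
require 'zlib'

# Hypothetical helper: path of a loose object, relative to the .git/ directory.
# The first two hex digits name the subdirectory; the rest name the file.
def loose_object_path(sha)
  File.join('objects', sha[0, 2], sha[2..])
end

# Hypothetical helper: decode the bytes of a loose object file.
# After inflating, the data is "<type> <size>\0<content>".
def parse_loose_object(compressed)
  raw = Zlib::Inflate.inflate(compressed)
  header, content = raw.split("\0", 2)
  type, size = header.split(' ')
  raise "size mismatch: header says #{size}" unless content.bytesize == size.to_i

  [type, content]
end

puts loose_object_path('3b18e512dba79e4c8300dd08aeb37f8e728b8dad')
# Usage against a real repository, from the top of the working tree:
#   path = File.join('.git', loose_object_path(sha))
#   type, content = parse_loose_object(File.binread(path))
```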

References

Every branch has a head: the ref refs/heads/<branch>, which points to the last commit made on that branch. HEAD, in capitals, points to the head of the current branch; if master is checked out, HEAD refers to the head of the master branch.

When master is the current branch, writing HEAD is equivalent to writing heads/master. You could also be more precise by using the fully qualified form: refs/heads/master.

The value for HEAD is persisted in .git/HEAD:

Shell
$ cat .git/HEAD
ref: refs/heads/master 

The above defines HEAD as refs/heads/master, which means the current branch is master. When .git/HEAD has these contents, writing HEAD is equivalent to writing heads/master or refs/heads/master:

Shell
$ git rev-parse HEAD
fcd6335681f917421ef3522bc9704c4800467aa0

$ git rev-parse heads/master
fcd6335681f917421ef3522bc9704c4800467aa0

$ git rev-parse refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0
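The two possible shapes of .git/HEAD, a symbolic ref or a detached SHA, can be told apart with a few lines of Ruby. This is a sketch: parse_head is a hypothetical helper, and it does not follow the symbolic ref any further.

```ruby
# Hypothetical helper: interpret the contents of .git/HEAD.
# A symbolic ref looks like "ref: refs/heads/master"; a detached HEAD holds a SHA.
def parse_head(contents)
  line = contents.strip
  if line.start_with?('ref: ')
    { symbolic: true, target: line.delete_prefix('ref: ') }
  elsif line.match?(/\A\h{40}(\h{24})?\z/) # 40 hex digits (SHA-1) or 64 (SHA-256)
    { symbolic: false, target: line }
  else
    raise ArgumentError, "unrecognized HEAD contents: #{line.inspect}"
  end
end

p parse_head("ref: refs/heads/master\n")
# Reading the real file from the top of a working tree:
#   p parse_head(File.read('.git/HEAD'))
```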

Refnames

A refname is a symbolic reference name. Examples include: master, heads/master, refs/heads/master and refs/remotes/origin/master. The shorter refnames are convenient to write, while using longer refnames avoids ambiguity.

The refname master typically means the commit object referenced by refs/heads/master, defined in the file .git/refs/heads/master (or by an entry in .git/packed-refs). However, the meaning of the short version of a refname might be ambiguous, depending on context.

For example, if a Git repository has both the refnames heads/master (defined in the file .git/refs/heads/master) and refs/remotes/origin/master (defined in the file .git/refs/remotes/origin/master), git show-ref master matches both, so write heads/master or refs/remotes/origin/master to be precise. In the example below both refs hold the same SHA, which is expected when the local master branch is in sync with origin/master.

Shell
$ cat .git/refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ cat .git/refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/remotes/origin/master 

$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 

$ git show-ref -s refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0 
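Refs are not always stored as individual files: git pack-refs (also run by git gc) moves them into the single file .git/packed-refs, which lists one <sha> <refname> pair per line. The following Ruby sketch, with hypothetical helper names, parses that file and reads a ref, giving a loose file precedence over a packed entry. It does not follow symbolic refs such as refs/remotes/origin/HEAD, whose file contains a ref: line rather than a SHA.

```ruby
# Hypothetical helper: parse the text of .git/packed-refs into { refname => sha }.
# Lines starting with '#' are headers; lines starting with '^' hold the peeled
# (dereferenced) object of the annotated tag on the previous line, skipped here.
def parse_packed_refs(text)
  text.each_line.with_object({}) do |line, refs|
    line = line.chomp
    next if line.empty? || line.start_with?('#', '^')

    sha, refname = line.split(' ', 2)
    refs[refname] = sha
  end
end

# Hypothetical helper: a loose file under git_dir wins over a packed-refs entry.
def read_ref(git_dir, refname)
  loose = File.join(git_dir, refname)
  return File.read(loose).strip if File.file?(loose)

  packed = File.join(git_dir, 'packed-refs')
  return nil unless File.file?(packed)

  parse_packed_refs(File.read(packed))[refname]
end

# Usage, from the top of a working tree:
#   puts read_ref('.git', 'refs/heads/master')
```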

Refname Disambiguation Rules

When a refname is ambiguous, Git uses the first of the following locations that exists. Each of the refs/... locations may be stored either as a file under .git/refs/ or as an entry in the single file .git/packed-refs.

  1. .git/<refname>
    These unqualified refnames are usually only useful for the following:
    • HEAD (.git/HEAD) names the commit on which you based the changes in the working tree.
    • FETCH_HEAD (.git/FETCH_HEAD) records the branch which you fetched from a remote repository with your last git fetch invocation.
    • ORIG_HEAD (.git/ORIG_HEAD) is created by commands that move HEAD in a drastic way, such as git am, git merge, git rebase, and git reset. It records the position of HEAD before the operation, so that you can easily change the tip of the branch back to the state before you ran them.
    • MERGE_HEAD (.git/MERGE_HEAD) records the commit(s) which you are merging into your branch when you run git merge.
    • CHERRY_PICK_HEAD (.git/CHERRY_PICK_HEAD) records the commit which you are cherry-picking when you run git cherry-pick.
  2. .git/refs/<refname>
  3. .git/refs/tags/<refname>
  4. .git/refs/heads/<refname>
  5. .git/refs/remotes/<refname>
  6. .git/refs/remotes/<refname>/HEAD
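The six rules above amount to trying each candidate name in order and taking the first one that exists. Here is a minimal Ruby sketch, assuming a caller-supplied exists check (for example one that tests loose files under .git/ and entries in .git/packed-refs); disambiguate is a hypothetical name, not a Git or Rugged API.

```ruby
# Hypothetical helper: apply the refname disambiguation rules in order.
# +exists+ is any callable that answers whether a fully qualified name exists.
def disambiguate(name, exists)
  candidates = [
    name,                       # 1. .git/<refname>, e.g. HEAD or FETCH_HEAD
    "refs/#{name}",             # 2.
    "refs/tags/#{name}",        # 3. tags win over branches with the same name
    "refs/heads/#{name}",       # 4.
    "refs/remotes/#{name}",     # 5.
    "refs/remotes/#{name}/HEAD" # 6.
  ]
  candidates.find { |candidate| exists.call(candidate) }
end

known = %w[HEAD refs/heads/master refs/remotes/origin/master refs/remotes/origin/HEAD]
exists = ->(refname) { known.include?(refname) }
p disambiguate('master', exists)        # "refs/heads/master"
p disambiguate('origin/master', exists) # "refs/remotes/origin/master"
p disambiguate('origin', exists)        # "refs/remotes/origin/HEAD"
```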

Reference Logs

The history of the previous values of references is stored in .git/logs:

Shell
$ tree --noreport .git/logs
├── HEAD
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   └── origin
    │       ├── HEAD
    │       └── master
    └── stash 

The above directory tree shows where the history of HEAD is stored: in .git/logs/HEAD.

Each Git branch has its own history, under .git/logs/refs/heads. Internally, Git refers to branches as heads. This might seem confusing; you will get used to it.

Each remote HEAD has its history stored under .git/logs/refs/remotes/<remote_name>/HEAD.

Each remote branch has its history stored under .git/logs/refs/remotes/<remote_name>/<branch_name>.

Access the reference log by appending @{n} to a reference name, where n counts back from the most recent entry. For example, the previous value of HEAD is written HEAD@{1}, and its current value is HEAD@{0}.
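Each reflog file is plain text with one line per update, which makes HEAD@{n} straightforward to model. The following Ruby sketch uses hypothetical helper names and assumes the usual reflog line layout shown in its comments; real Git also accepts dates inside the braces, as in HEAD@{yesterday}, which this sketch does not handle.

```ruby
# Hypothetical helpers for reading a reflog such as .git/logs/HEAD.
# Each line has the form:
#   <old-sha> <new-sha> <name> <<email>> <unix-time> <tz>\t<message>
# New entries are appended, so the newest entry is the last line.
ReflogEntry = Struct.new(:old_sha, :new_sha, :who, :time, :message)

REFLOG_LINE = /\A(?<old>\h+) (?<new>\h+) (?<who>.*>) (?<time>\d+) (?<tz>[+-]\d{4})\z/

def parse_reflog(text)
  text.each_line.map do |line|
    info, message = line.chomp.split("\t", 2)
    m = REFLOG_LINE.match(info) or raise ArgumentError, "bad reflog line: #{line.inspect}"
    ReflogEntry.new(m[:old], m[:new], m[:who], Time.at(m[:time].to_i), message.to_s)
  end
end

# <ref>@{n}: n counts back from the newest entry, so @{0} is the current value
# and @{1} is the value before the most recent update.
def reflog_at(entries, n)
  entry = entries.reverse[n] or raise IndexError, "only #{entries.size} reflog entries"
  entry.new_sha
end

# Usage, from the top of a working tree:
#   entries = parse_reflog(File.read('.git/logs/HEAD'))
#   puts reflog_at(entries, 1) # roughly what git rev-parse HEAD@{1} prints
```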

Terms

The Git Glossary defines many terms, and StackOverflow clarifies them. I have expanded on some definitions:

Commit-ish
A commit object, or an object that can be recursively dereferenced to a commit, such as an annotated tag that points at a commit. All commit-ish references are also tree-ish.
Tree-ish
A tree object, or any identifier that can be recursively dereferenced to one, such as a commit or a tag that points at a commit. Git refers to directories as trees and tree objects. The related form <rev>:<path> names the blob or tree found at <path> within <rev>.
To break this down, there is first an optional revision, terminated by a colon (:), followed by the path of the blob or tree.
For example: HEAD:README, :README, master:path/to/file, and master:README.

If the revision is omitted, as in :README, the path is looked up in the index (stage 0) rather than in HEAD.
Working tree
The directory tree of physical files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.

Unless submodules or worktrees are in play, the parent directory of the .git/ directory contains the working tree. Bare repositories have no working tree.

The .git/ directory is physically contained within the working tree, but is not logically part of it.
Repository
A repository is a collection of refs, together with an object database containing all objects which are reachable from the refs, possibly accompanied by metadata from one or more porcelains. A repository can share an object database with other repositories via an alternates mechanism. The repository proper does not include the index or the working tree; it mostly consists of the commits.
index
Also known as the staging area and the cache. A collection of files with status information, whose contents are stored as objects. The index is a stored version of the working tree. The index can also contain two or three versions of a working tree, for merging.
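Splitting a <rev>:<path> expression is slightly subtle, because the revision part may itself contain a colon inside braces, as in @{'2021-10-15 12:34'}:README.md. This Ruby sketch uses a hypothetical helper, split_rev_path, that only splits on a colon outside braces. It treats :/<text> as a whole revision (a commit message search) and does not handle the :<n>:<path> form that selects a merge stage in the index.

```ruby
# Hypothetical helper: split "<rev>:<path>" into [rev, path].
# rev is nil when omitted (the path is then looked up in the index);
# path is nil when the expression names a revision only.
def split_rev_path(spec)
  return [spec, nil] if spec.start_with?(':/') # :/<text> searches commit messages

  depth = 0
  spec.each_char.with_index do |ch, i|
    case ch
    when '{' then depth += 1
    when '}' then depth -= 1
    when ':'
      next unless depth.zero? # a colon inside @{...} is part of the revision

      rev = spec[0...i]
      return [rev.empty? ? nil : rev, spec[(i + 1)..]]
    end
  end
  [spec, nil] # no separator: the whole string is a revision
end

p split_rev_path("@{'2021-10-15 12:34'}:README.md")
```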

Revision Parameters

The meaning of revision parameters depends on the Git command they are used with. A revision parameter might denote:

  • A specific commit; the type of revision parameter could be specified as:
    • A SHA or ref.
    • Output from git describe: a tag, optionally followed by a dash and a number of commits, followed by a dash, a g, then an abbreviated object name.
  • For commands, such as git-log, which walk the revision graph, revision parameters denote all commits which are reachable from that commit. The range of revisions can also be explicitly specified.
  • Some Git commands, such as git-cat-file, git-push, git-show, and git-show-ref, accept revision parameters which denote types of objects other than commits. For example, these commands can accept objects such as blobs (files) or trees (directories of files).
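Output from git describe can be split back into its parts with a regular expression. In this Ruby sketch, parse_describe is a hypothetical helper; the tag part is matched greedily so that tags containing dashes, such as v2.0-rc1, survive intact. The -dirty suffix produced by git describe --dirty is not handled.

```ruby
# Hypothetical helper: split git describe output into its parts.
# Form: <tag>-<number of commits>-g<abbreviated object name>;
# when HEAD is exactly at a tag, git describe prints just the tag.
DESCRIBE = /\A(?<tag>.+)-(?<count>\d+)-g(?<abbrev>\h+)\z/

def parse_describe(output)
  text = output.strip
  if (m = DESCRIBE.match(text))
    { tag: m[:tag], commits: m[:count].to_i, abbrev: m[:abbrev] }
  else
    { tag: text, commits: 0, abbrev: nil } # exactly at the tag
  end
end

p parse_describe("v1.5.1-3-gabf8efa\n")
```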

The syntax for dereferencing a revision to a particular object type is easily confused with the syntax for a parent commit. Dereferencing looks like reference^{type}, whereas the parent of HEAD is written as HEAD^.

To be more specific, a dereferencing revision parameter is written with the following components:

  1. A reference.
  2. The character ^.
  3. An object type name enclosed in braces, for example: {commit} or {tree}.

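^0 is a shorthand for ^{commit}. An empty pair of braces, as in v1.5.1^{}, dereferences tags until a non-tag object is found.

The object is recursively dereferenced until an object of the desired type is found; Git reports an error if that is impossible, for example when asking for the commit of a blob.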
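Recursive dereferencing can be modelled with a toy object database. In this Ruby sketch the object ids are made-up placeholders and peel is a hypothetical helper: a tag dereferences to the object it points at, and a commit dereferences to its root tree. Passing nil as the wanted type mimics <rev>^{}.

```ruby
# Toy object database; the ids are made-up placeholders, not real SHAs.
OBJECTS = {
  'tag1'    => { type: :tag,    object: 'commit1' },
  'commit1' => { type: :commit, tree: 'tree1' },
  'tree1'   => { type: :tree },
  'blob1'   => { type: :blob }
}.freeze

# Hypothetical helper mimicking <rev>^{<type>}: follow pointers until an object
# of the wanted type is reached. wanted = nil mimics <rev>^{} (peel tags only).
def peel(id, wanted)
  loop do
    obj = OBJECTS.fetch(id)
    done = wanted.nil? ? (obj[:type] != :tag) : (obj[:type] == wanted)
    return id if done

    case obj[:type]
    when :tag    then id = obj[:object] # a tag dereferences to its target
    when :commit then id = obj[:tree]   # a commit dereferences to its root tree
    else raise ArgumentError, "cannot peel #{obj[:type]} #{id} to #{wanted}"
    end
  end
end

p peel('tag1', :tree) # "tree1"
```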

In the following example, master^{tree} returns the tree object associated with ref master.

Shell
$ git cat-file -p master
tree 4a2efcfb37f6c809384e9b050c8076349c22d74e
parent b0f0db4cc5d5eeadb3a50282a5bb3c560c533ef0
author Mike Slinn <mslinn@mslinn.com> 1711887277 -0400
committer Mike Slinn <mslinn@mslinn.com> 1711887277 -0400
-

$ git cat-file -p master^{tree} | head
040000 tree eb81966ec081881a6e3d668ce07c089f1b506b18	.bundle
100644 blob 1b97443a9c5dcb10015b55abe0f7b8e142f1bbfe	.gitignore
100644 blob 68389f23bf4f4006577672914d3e5bd0b5ea65bd	.markdownlint.yaml
100644 blob e15734f7749e67a0a0b3bb762507da6546fceb9e	.rubocop.yml
100644 blob 4e0ef479723860d16a332347a691d422a0ef2770	.shellcheckrc
040000 tree 292e479d0f8468fc41b96ee4607491decc82274c	.vscode
100644 blob 779480bf02fc0102a240caa65523fc6c435954bf	404.html
100644 blob 1fe1aa295d166663285b54cb94954bfe19da152c	670nm.html
100644 blob 02862b254846b5669596de4d2795d023ebc87c7c	BingSiteAuth.xml
100644 blob 5c9809276c1de34187c3fb847859c34708b7397f	Gemfile
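Each line of a tree listing like the one above has the form <mode> <type> <sha>, a tab, then the entry name; mode 040000 marks a subdirectory and 100644 an ordinary file. A short Ruby sketch, with a hypothetical parse_tree_line helper, that splits such lines into fields:

```ruby
# Hypothetical helper: parse one line of git cat-file -p <tree> output.
# Splitting on the tab first keeps entry names containing spaces intact.
TreeEntry = Struct.new(:mode, :type, :sha, :name)

def parse_tree_line(line)
  info, name = line.chomp.split("\t", 2)
  mode, type, sha = info.split(' ')
  TreeEntry.new(mode, type, sha, name)
end

listing = <<~LISTING
  040000 tree eb81966ec081881a6e3d668ce07c089f1b506b18\t.bundle
  100644 blob 1b97443a9c5dcb10015b55abe0f7b8e142f1bbfe\t.gitignore
LISTING
listing.each_line do |line|
  entry = parse_tree_line(line)
  puts "#{entry.type.ljust(4)} #{entry.name}"
end
```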

Revision Syntax Examples

The following table shows examples of revision syntax in the left column, and the class of the object returned by Rugged::Repository#rev_parse in the right column.

Incantation                        Returned Class
abf8efadc8                         Rugged::Commit
abf8efadc8:README.md               Rugged::Blob
abf8efadc8^                        Rugged::Commit
abf8efadc8^{tree}                  Rugged::Tree
@                                  Rugged::Commit
HEAD                               Rugged::Commit
HEAD~3                             Rugged::Commit
HEAD^                              Rugged::Commit
HEAD^{tree}                        Rugged::Tree
master^{tree}                      Rugged::Tree
HEAD:README.md                     Rugged::Blob
master:README.md                   Rugged::Blob
HEAD@{0}                           Rugged::Commit
HEAD@{yesterday}                   Rugged::Commit
HEAD@{2 months ago}                Rugged::Commit
HEAD@{1 month 2 weeks 3 days ago}  Rugged::Commit
HEAD@{'Oct 15, 2021'}              Rugged::Commit
HEAD@{'2021-10-15'}^{tree}         Rugged::Tree
HEAD@{'2021-10-15'}:README.md      Rugged::Blob
master@{yesterday}                 Rugged::Commit
master@{'2021-10-15'}:README.md    Rugged::Blob
@{'2021-10-15'}:README.md          Rugged::Blob
@{last week}:README.md             Rugged::Blob
@{last month}:README.md            Rugged::Blob
@{last year}:README.md             Rugged::Blob
@{'2021-10-15 12:34'}:README.md    Rugged::Blob
@{0}                               Rugged::Commit
v1.5.1                             Rugged::Commit
v1.5.1^0                           Rugged::Commit
v1.5.1^{}                          Rugged::Commit
:/bump                             Rugged::Commit
HEAD^{/bump}                       Rugged::Commit

The above table was produced by the following program and my flexible_include Jekyll plugin:

#!/usr/bin/env ruby
# Prints an HTML table showing the Rugged class that each revision expression
# resolves to. Set the $rugged environment variable to the repository path.

require 'cgi'
require 'rainbow/refinement'
require 'rugged'

class GitRevisionException < StandardError; end

using Rainbow

EXPRESSIONS = [
  'abf8efadc8',
  'abf8efadc8:README.md',
  'abf8efadc8^',
  'abf8efadc8^{tree}',
  '@',
  'HEAD',
  'HEAD~3',
  'HEAD^',
  'HEAD^{tree}',
  'master^{tree}',
  'HEAD:README.md',
  'master:README.md',
  'HEAD@{0}',
  'HEAD@{yesterday}',
  'HEAD@{2 months ago}',
  'HEAD@{1 month 2 weeks 3 days ago}',
  "HEAD@{'Oct 15, 2021'}",
  "HEAD@{'2021-10-15'}^{tree}",
  "HEAD@{'2021-10-15'}:README.md",
  'master@{yesterday}',
  "master@{'2021-10-15'}:README.md",
  "@{'2021-10-15'}:README.md",
  "@{last week}:README.md",
  "@{last month}:README.md",
  "@{last year}:README.md",
  "@{'2021-10-15 12:34'}:README.md",
  '@{0}',
  'v1.5.1',
  'v1.5.1^0',
  'v1.5.1^{}',
  ':/bump',
  'HEAD^{/bump}'
].freeze

# Returns one HTML table row for rev_str, or nil if rev_str is blank
def do_one(repo, rev_str)
  rev_str = rev_str.strip # work on a copy instead of modifying the EXPRESSIONS entries
  return nil if rev_str.empty?

  begin
    td = "<td>#{repo.rev_parse(rev_str).class}</td>"
  rescue StandardError => e
    # Report the error in the table instead of aborting the whole run
    td = "<td class='error' style='padding: 1px 3px;'>#{CGI.escapeHTML(e.message)}</td>"
  end
  "  <tr class='code'><td>#{CGI.escapeHTML(rev_str)}</td> #{td}</tr>"
end

begin
  git_dir = ENV.fetch('rugged', '')
  abort 'Error: the $rugged environment variable is not defined'.red if git_dir.empty?
  repo = Rugged::Repository.new(git_dir)
  puts <<~END_OUTPUT
    <table class="condensed noborder table">
      <tr><th>Incantation</th> <th>Returned Class</th></tr>
    #{EXPRESSIONS.map { |x| do_one(repo, x) }.compact.join("\n")}
    </table>
  END_OUTPUT
rescue StandardError => e
  raise GitRevisionException, "#{e.class}: #{e.full_message}".red, []
end

If you want to run this program, you first need to install its dependencies, rainbow and rugged:

Shell
$ gem install rainbow rugged
