Published 2023-03-13.
Last modified 2023-04-25.
Time to read: 6 minutes.
This article discusses low- to high-level git
concepts:
hashes, refs, terms and revision parameters.
If you are new to git, the following easy-to-read trilogy provides a nice explanation:
- A curious tale.
- Curious git.
- Types of git objects brings you right into this document.
Low- to High-Level User Interfaces
This section was paraphrased, updated and enhanced from the Git Internals - Plumbing and Porcelain chapter of the Pro Git book, written by Scott Chacon and Ben Straub and published by Apress. The book is licensed under the Creative Commons Attribution Non-Commercial Share Alike 3.0 license.
Git was initially a toolkit for building a version control system, rather than a full user-friendly VCS.
From the beginning, its low-level subcommands were designed to be chained together UNIX-style,
or called from scripts.
The low-level git
subcommands are referred to as plumbing subcommands.
Starting in 2005, more user-friendly git subcommands were added;
continuing with the plumbing metaphor,
the more user-friendly git subcommands are called porcelain commands.
There were 43 main porcelain subcommands when this article was last updated.
Type git help -a
to see these subcommands,
along with many other categories of subcommands.
$ git help -a
Fundamental Concepts
Git currently uses SHA-1 to identify all types of objects it stores (commits, trees, blobs and annotated tags). Git has symbolic names for branches and tags, to spare you the awkwardness of having to use long alphanumeric identifiers. Collectively, these symbolic names are called refs, which is short for references. While most refs refer to commits, tags are a special kind of ref that can refer to any of the four object types.
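To make the idea of an object name concrete, here is a minimal Ruby sketch of how a blob's SHA-1 name can be derived: the hash covers a short header (the word blob, the content size in bytes, and a NUL byte) followed by the raw content. The method name git_blob_sha1 is my own choice for illustration; it is not part of git or of any library.

```ruby
require 'digest/sha1'

# Compute the name git would give a blob holding `content`:
# SHA-1 over "blob <size in bytes>\0" followed by the raw bytes.
def git_blob_sha1(content)
  bytes = content.b # work on raw bytes, not characters
  header = "blob #{bytes.bytesize}\0".b
  Digest::SHA1.hexdigest(header + bytes)
end

puts git_blob_sha1("test content\n")
# prints d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The result should match what git hash-object --stdin reports for the same input. Commits, trees and annotated tags are named the same way, with a different word in the header.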
Git v2.29 introduced experimental support for SHA-256 object names and content.
This required a new repository format.
There is no interoperability between SHA-1 and SHA-256 repositories yet.
No major Git provider currently supports SHA-256 repositories.
A revision is anything which may be resolved to some kind of object stored in a Git object database, using Git’s DSL.
Git implements a DSL that can be used by combining ref names, SHA-1 names and operators.
This is documented in the
gitrevisions
man page, which is dedicated to specifying revisions and ranges for Git.
This is a difficult document to read.
I have rewritten and paraphrased the material in the remainder of this article.
Low-level Files and Directories
Within the .git/
directory of a git
project, many entries are possible:
$ tree .git -FL 1
.git
├── COMMIT_EDITMSG
├── FETCH_HEAD
├── HEAD
├── ORIG_HEAD
├── branches/
├── config
├── description
├── hooks/
├── index
├── info/
├── logs/
├── objects/
├── packed-refs
└── refs/
Only 4 files and subdirectories are important for this discussion:
HEAD
- This file is created when the repository is initialized, and points to the current branch. HEAD is a special reference. By definition, it always points to the currently checked out commit. However, this is not usually a direct pointer; instead, it is a symbolic reference, which means that it points to a branch whose tip commit is currently checked out.
index
- This file contains the git staging area, also referred to by older documentation as the cache. This is the data that is committed when you run git commit. In general, when you commit, you commit the index.
objects/
- This subdirectory contains the git project's object database. The objects directory has up to 256 subdirectories, named 00 through ff after the first two hexadecimal digits of each object name, which contain the actual database files.
refs/
- This subdirectory stores string representations of pointers to objects stored in the object database, such as commits, branches, tags, and remotes.
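As an illustration of how the objects/ subdirectories are laid out, the following Ruby sketch maps a full SHA-1 name to the path where a loose (unpacked) object would live. The method name loose_object_path is hypothetical, and the sketch assumes the object has not been moved into a packfile.

```ruby
# Build the path of a loose object from its 40-character SHA-1 name.
# The first two hex digits select the subdirectory; the remaining 38
# form the file name.
def loose_object_path(sha, git_dir = '.git')
  raise ArgumentError, "not a full SHA-1: #{sha}" unless sha.match?(/\A[0-9a-f]{40}\z/)

  File.join(git_dir, 'objects', sha[0, 2], sha[2..-1])
end

puts loose_object_path('fcd6335681f917421ef3522bc9704c4800467aa0')
# prints .git/objects/fc/d6335681f917421ef3522bc9704c4800467aa0
```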
References
Every branch has a head: a reference that points to the last commit made on that branch.
If the default branch is master
,
then the default head is the head of the master
branch.
The reference called HEAD
is equivalent to writing
something of the form HEAD/
.
If the current branch is master
you might write HEAD/
.
You could also be more precise by using the fully qualified form:
refs/
.
The value for HEAD
is persisted in .git/HEAD
:
$ cat .git/HEAD
ref: refs/heads/master
The above defines HEAD
as refs/
.
This means the current branch is master
.
When a git repository has these contents in that file,
writing HEAD
is equivalent to heads/
and
refs/
.
$ git show-ref -s HEAD
fcd6335681f917421ef3522bc9704c4800467aa0
$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0
$ git show-ref -s refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0
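The contents of .git/HEAD shown above can be interpreted with a few lines of Ruby. This is only a sketch: parse_head is a name I made up, and it assumes HEAD holds either a line beginning with ref: (a symbolic reference) or a bare 40-character SHA-1 (a direct pointer, often called a detached HEAD).

```ruby
# Interpret the text stored in a .git/HEAD file.
# Returns a Hash describing what HEAD points at.
def parse_head(text)
  line = text.strip
  if line.start_with?('ref: ')
    # Symbolic reference: HEAD names a branch.
    { type: :symbolic, target: line.sub(/\Aref: /, '').strip }
  elsif line.match?(/\A[0-9a-f]{40}\z/)
    # Direct pointer: HEAD names a commit.
    { type: :detached, target: line }
  else
    raise ArgumentError, "unrecognized HEAD contents: #{line.inspect}"
  end
end

head = parse_head("ref: refs/heads/master\n")
puts "#{head[:type]} -> #{head[:target]}"
# prints symbolic -> refs/heads/master
```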
Refnames
A refname is a symbolic reference name.
Examples include:
master
, heads/
,
refs/
and
refs/
.
The shorter refnames are convenient to write,
while using longer refnames avoids ambiguity.
The refname master
typically means the commit object referenced by
refs/
,
defined in the file .git/
.
However, the meaning of the short version of a refname might be ambiguous, depending on context.
For example, if a git
repository has both the refnames
heads/
(defined in the file .git/
) and
refs/
(defined in the file .git/
),
you can explicitly write heads/
or
refs/
to be precise.
Note that the SHAs of each reference are the same,
which would make sense if these repositories were both up-to-date sibling clones.
$ cat .git/refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0
$ cat .git/refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0
$ git show-ref master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/heads/master
fcd6335681f917421ef3522bc9704c4800467aa0 refs/remotes/origin/master
$ git show-ref -s heads/master
fcd6335681f917421ef3522bc9704c4800467aa0
$ git show-ref -s refs/remotes/origin/master
fcd6335681f917421ef3522bc9704c4800467aa0
Refname Disambiguation Rules
When ambiguous, a refname is disambiguated by the contents of the first file found below:
-
.git/<refname>
These unqualified refnames are usually only useful for the following:
Refname | Defined in | Description |
---|---|---|
HEAD | .git/HEAD | Names the commit on which you based the changes in the working tree. |
FETCH_HEAD | .git/FETCH_HEAD | Records the branch which you fetched from a remote repository with your last git fetch invocation. |
ORIG_HEAD | .git/ORIG_HEAD | This file and the refname are created by commands that move HEAD in a drastic way, such as git am, git merge, git rebase, and git reset. The purpose of this file and refname is to record the position of the HEAD before their operation, so that you can easily change the tip of the branch back to the state before you ran them. |
MERGE_HEAD | .git/MERGE_HEAD | This file and refname record the commit(s) which you are merging into your branch when you run git merge. |
CHERRY_PICK_HEAD | .git/CHERRY_PICK_HEAD | This file and refname record the commit which you are cherry-picking when you run git cherry-pick. |
-
.git/refs/<refname>
.git/packed-refs/<refname>
-
.git/refs/tags/<refname>
.git/packed-refs/tags/<refname>
-
.git/refs/heads/<refname>
.git/packed-refs/heads/<refname>
-
.git/refs/remotes/<refname>
.git/packed-refs/remotes/<refname>
-
.git/refs/remotes/<refname>/HEAD
.git/packed-refs/remotes/<refname>/HEAD
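The lookup order above can be sketched in Ruby as a list of candidate names tried in turn. This is a simplified model, not git's implementation: resolve_refname and LOOKUP_ORDER are names I invented, and known_refs is a hypothetical Hash standing in for whatever is found under .git/ and in packed-refs.

```ruby
# Candidate names, in the order of the rules above.
# %s stands for the refname exactly as the user wrote it.
LOOKUP_ORDER = [
  '%s',
  'refs/%s',
  'refs/tags/%s',
  'refs/heads/%s',
  'refs/remotes/%s',
  'refs/remotes/%s/HEAD'
].freeze

# known_refs: hypothetical Hash of full refname => SHA.
# Returns the first full refname that exists, or nil if none does.
def resolve_refname(name, known_refs)
  LOOKUP_ORDER.each do |pattern|
    candidate = format(pattern, name)
    return candidate if known_refs.key?(candidate)
  end
  nil
end

sha = 'fcd6335681f917421ef3522bc9704c4800467aa0'
refs = {
  'HEAD' => sha,
  'refs/heads/master' => sha,
  'refs/remotes/origin/master' => sha
}
puts resolve_refname('master', refs)        # prints refs/heads/master
puts resolve_refname('origin/master', refs) # prints refs/remotes/origin/master
```

Because refs/tags/ is tried before refs/heads/, a tag and a branch that share a short name resolve to the tag in this model, which is one reason to write the longer refname when precision matters.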
Reference Logs
The history of the previous values of references is stored in .git/
:
$
The above directory tree shows where the history of HEAD
is stored:
in .git/
.
Each git
branch has its own history,
under .git/
.
Internally, git
refers to branches as heads.
This might seem confusing; you will get used to it.
Each remote HEAD
has its history stored under
.git/
.
Each remote branch has its history stored under
.git/
.
Access the reference log by appending an index to a reference; the index is enclosed in braces and preceded by an @ character.
For example, the previous HEAD
can be written like this: HEAD@{1}
.
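To show how an index such as HEAD@{1} relates to the stored history, here is a Ruby sketch that reads reflog text and looks up the nth previous value. It assumes the usual on-disk layout of one update per line (old SHA, new SHA, committer identity with timestamp, a tab, then a message), with the newest entry last. The names ReflogEntry, parse_reflog and reflog_at, and every value in SAMPLE_REFLOG, are made up for illustration.

```ruby
# One reflog line: "<old-sha> <new-sha> <who> <timestamp> <tz>\t<message>"
ReflogEntry = Struct.new(:old_sha, :new_sha, :message)

def parse_reflog(text)
  text.each_line.map do |line|
    header, message = line.chomp.split("\t", 2)
    old_sha, new_sha = header.split(' ', 3)
    ReflogEntry.new(old_sha, new_sha, message)
  end
end

# ref@{n}: the value the ref had n updates ago.
# The newest entry is the last line, so count back from the end.
def reflog_at(entries, n)
  entry = entries[-1 - n]
  entry && entry.new_sha
end

# Fabricated sample data; real SHAs would not look like this.
SAMPLE_REFLOG = [
  "#{'0' * 40} #{'a' * 40} A U Thor <author@example.com> 1678700000 +0000\tcommit (initial): first",
  "#{'a' * 40} #{'b' * 40} A U Thor <author@example.com> 1678700100 +0000\tcommit: second",
  "#{'b' * 40} #{'c' * 40} A U Thor <author@example.com> 1678700200 +0000\tcommit: third"
].join("\n")

entries = parse_reflog(SAMPLE_REFLOG)
puts reflog_at(entries, 0) # the current value: forty c characters
puts reflog_at(entries, 1) # the previous value: forty b characters
```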
Terms
The Git Glossary defines many terms, and StackOverflow clarifies them. I have expanded on some definitions:
- Commit-ish
- The SHA of a commit, or an annotated tag that points at a commit. All commit-ish references are also tree-ish.
- Tree-ish
- Any identifier that points to a directory tree. Git refers to directories as trees and tree objects. The general form is <rev>:<path>: an optional prefix naming a revision, then a colon (:), then the path of the blob or tree. For example: HEAD:README, :README, master:path/to/file, and master:README. If the prefix is omitted, as in :README, the path is looked up in the index rather than in a commit.
- Working tree
- The directory tree of physical files. The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed. Unless submodules or worktrees are in play, the parent directory of the .git/ directory contains the working tree. Bare repositories have no working tree. The .git/ directory is physically contained within the working tree, but is not logically part of it.
- Repository
- A repository is a collection of refs, together with an object database containing all objects which are reachable from the refs, possibly accompanied by metadata from one or more porcelains. A repository can share an object database with other repositories via an alternates mechanism. The repository proper does not include the index or the working tree; it mostly consists of the commits.
- index
- Also known as the staging area and the cache. A collection of files with status information, whose contents are stored as objects. The index is a stored version of the working tree. The index can also contain two or three versions of a working tree, for merging.
Revision Parameters
The meaning of revision parameters depends on the git command they are used with. A revision parameter might denote:
- A specific commit; the revision parameter could be specified as:
  - A SHA or ref.
  - Output from git describe: a tag, optionally followed by a dash and a number of commits, followed by a dash, a g, then an abbreviated object name.
- For commands such as git-log, which walk the revision graph, revision parameters denote all commits which are reachable from that commit. The range of revisions can also be explicitly specified.
- Some Git commands, such as git-cat-file, git-push, git-show, and git-show-ref, accept revision parameters which denote types of objects other than commits. For example, these commands can accept objects such as blobs (files) or trees (directories of files).
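The git describe form mentioned in the list above (a tag, a dash, a number of commits, a dash, a g, then an abbreviated object name) is regular enough to take apart with a pattern. The Ruby sketch below does so; parse_describe and DESCRIBE_PATTERN are hypothetical names, and the sample string v1.5.1-3-gabf8efa is invented for the example.

```ruby
# <tag>-<number of commits>-g<abbreviated object name>
DESCRIBE_PATTERN = /\A(?<tag>.+)-(?<count>\d+)-g(?<sha>[0-9a-f]+)\z/.freeze

# Split git-describe style text into its parts.
# Text with no "-<n>-g<sha>" suffix is treated as a bare tag.
def parse_describe(text)
  text = text.strip
  match = DESCRIBE_PATTERN.match(text)
  return { tag: text, commits_since: 0, sha: nil } unless match

  { tag: match[:tag], commits_since: match[:count].to_i, sha: match[:sha] }
end

parts = parse_describe('v1.5.1-3-gabf8efa')
puts "#{parts[:tag]} + #{parts[:commits_since]} commits, at #{parts[:sha]}"
# prints v1.5.1 + 3 commits, at abf8efa
```

The greedy tag group lets tag names that contain dashes survive, since only the final -<n>-g<sha> suffix is split off.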
The syntax for revision parameters is easily confused with the syntax for a parent commit.
Revision parameters look like reference^{type}
,
whereas the parent of HEAD
is written as HEAD^
.
To be more specific, revision parameters are written with the following components:
- A reference.
- The character ^.
- An object type name enclosed in braces, for example: {commit} or {tree}.
^0
is a shorthand for ^{commit}
.
The object is recursively dereferenced until an object of the desired type is found.
In the following example,
master^{tree}
returns the tree object associated with ref master
.
$
$
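The reference^{type} form described above can be distinguished from the parent syntax mechanically. The following Ruby sketch splits such an expression into its reference and requested type, treating ^0 as ^{commit}. It only recognizes the four object type names and ^0; other forms that appear later in this article, such as ^{} and ^{/text}, are deliberately left out. The names parse_peel and PEEL_PATTERN are my own.

```ruby
# <reference>^{<type>} or <reference>^0
PEEL_PATTERN = /\A(?<ref>.+?)\^(?:\{(?<type>commit|tree|blob|tag)\}|(?<zero>0))\z/.freeze

# Returns { ref:, type: } for a recognized expression,
# or nil for anything else (for example the parent syntax HEAD^).
def parse_peel(rev)
  match = PEEL_PATTERN.match(rev)
  return nil unless match

  { ref: match[:ref], type: match[:zero] ? 'commit' : match[:type] }
end

%w[master^{tree} v1.5.1^0 HEAD^].each do |rev|
  parsed = parse_peel(rev)
  puts parsed ? "#{rev}: #{parsed[:ref]} as #{parsed[:type]}" : "#{rev}: not a typed expression"
end
```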
Revision Syntax Examples
The following table shows examples of revision syntax in the left column,
and the returned class from Rugged::Repository.rev_parse
in the right column.
Incantation | Returned Class |
---|---|
abf8efadc8 | Rugged::Commit |
abf8efadc8:README.md | Rugged::Blob |
abf8efadc8^ | Rugged::Commit |
abf8efadc8^{tree} | Rugged::Tree |
@ | Rugged::Commit |
HEAD | Rugged::Commit |
HEAD~3 | Rugged::Commit |
HEAD^ | Rugged::Commit |
HEAD^{tree} | Rugged::Tree |
master^{tree} | Rugged::Tree |
HEAD:README.md | Rugged::Blob |
master:README.md | Rugged::Blob |
HEAD@{0} | Rugged::Commit |
HEAD@{yesterday} | Rugged::Commit |
HEAD@{2 months ago} | Rugged::Commit |
HEAD@{1 month 2 weeks 3 days ago} | Rugged::Commit |
HEAD@{'Oct 15, 2021'} | Rugged::Commit |
HEAD@{'2021-10-15'}^{tree} | Rugged::Tree |
HEAD@{'2021-10-15'}:README.md | Rugged::Blob |
master@{yesterday} | Rugged::Commit |
master@{'2021-10-15'}:README.md | Rugged::Blob |
@{'2021-10-15'}:README.md | Rugged::Blob |
@{last week}:README.md | Rugged::Blob |
@{last month}:README.md | Rugged::Blob |
@{last year}:README.md | Rugged::Blob |
@{'2021-10-15 12:34'}:README.md | Rugged::Blob |
@{0} | Rugged::Commit |
v1.5.1 | Rugged::Commit |
v1.5.1^0 | Rugged::Commit |
v1.5.1^{} | Rugged::Commit |
:/bump | Rugged::Commit |
HEAD^{/bump} | Rugged::Commit |
The above table was produced by the following program and my
flexible_include
Jekyll plugin:
#!/usr/bin/env ruby

require 'rainbow/refinement'
require 'rugged'

class GitRevisionException < StandardError; end

using Rainbow

EXPRESSIONS = [
  'abf8efadc8', 'abf8efadc8:README.md', 'abf8efadc8^', 'abf8efadc8^{tree}',
  '@', 'HEAD', 'HEAD~3', 'HEAD^', 'HEAD^{tree}', 'master^{tree}',
  'HEAD:README.md', 'master:README.md',
  'HEAD@{0}', 'HEAD@{yesterday}', 'HEAD@{2 months ago}',
  'HEAD@{1 month 2 weeks 3 days ago}',
  "HEAD@{'Oct 15, 2021'}", "HEAD@{'2021-10-15'}^{tree}",
  "HEAD@{'2021-10-15'}:README.md",
  'master@{yesterday}', "master@{'2021-10-15'}:README.md",
  "@{'2021-10-15'}:README.md", "@{last week}:README.md",
  "@{last month}:README.md", "@{last year}:README.md",
  "@{'2021-10-15 12:34'}:README.md",
  '@{0}', 'v1.5.1', 'v1.5.1^0', 'v1.5.1^{}', ':/bump', 'HEAD^{/bump}'
].freeze

def do_one(rev_str)
  rev_str.strip!
  return nil if rev_str.strip.empty?

  begin
    result = @repo.rev_parse(rev_str).class
    td = "<td>#{result}</td>"
  rescue StandardError => e
    td = "<td class='error' style='padding: 1px 3px;'>#{e.message}</td>"
  end
  " <tr class='code'><td>#{rev_str}</td> #{td}</tr>"
end

def expand_env(str)
  str.gsub(/\$([a-zA-Z_][a-zA-Z0-9_]*)|\${\g<1>}|%\g<1>%/) do
    ENV.fetch(Regexp.last_match(1), nil)
  end
end

begin
  git_dir = expand_env '$rugged'
  abort 'Error: the $rugged environment variable is not defined'.red if git_dir.empty?

  @repo = Rugged::Repository.new git_dir
  puts <<~END_OUTPUT
    <table class="condensed noborder table">
    <tr><th>Incantation</th> <th>Returned Class</th></tr>
    #{EXPRESSIONS.map { |x| do_one x }.compact.join("\n")}
    </table>
  END_OUTPUT
rescue StandardError => e
  raise GitRevisionException, "#{e.class}: #{e.full_message}".red, []
end
If you want to be able to run this program,
you first need to install its dependencies,
rainbow and rugged:
$ gem install rainbow rugged
References
- StackOverflow answer by kostix.
- gitrevisions(7) man page.
- Understanding the Fundamentals of Git by Rachit Tayal.