Git-ls-files and Wildmatch

Published 2023-03-13. Last modified 2025-01-18.
Time to read: 7 minutes.

This page is part of the git collection.

This article is generally useful to git users, and is especially useful for Git Large File Storage (LFS) users. I have incorporated information specific to Git LFS throughout this article; however, if you just use git and do not use Git LFS, all of this information will be useful to you.

I have published 7 articles about Git Large File Storage (LFS). They are meant to be read in order.

  1. Git Large File System Overview
  2. Git LFS Client Installation
  3. Git LFS Server URLs
  4. Git-ls-files and Wildmatch
  5. Git LFS Filename Patterns & Tracking
  6. Git LFS Client Configuration & Commands
  7. Working With Git LFS
  8. Evaluation Procedure For Git LFS Servers
  9. Git LFS server tests:
    1. Null Git LFS Server

6 articles are still in process.

Instructions for typing along are given for Ubuntu and WSL/Ubuntu. If you have a Mac, most of this information should be helpful.

Git-Ls-Files

Git ls-files can list all files managed by git, or all files managed by git whose name matches a pattern. The specification for the filenames that are to be listed is called a wildmatch.

By default, git ls-files only matches wildmatch patterns against files in the git index. New files in the git working directory that have not yet been added to the git index via git add will not be listed unless the -o (--others) option is used.

Git ls-files is often used to refine wildmatches for the Git Large File Storage git-lfs-track command.
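The following is a minimal sketch of that behavior, not a definitive test. It assumes git is installed, and it works in a throwaway repository created with mktemp so that your own repositories are not touched. The file name notes.txt is hypothetical.

```shell
#!/bin/bash
# Sketch: a new file is listed by git ls-files only after git add.
# The scratch repository and the file name notes.txt are hypothetical.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

touch notes.txt
before="$(git ls-files)"       # Empty: notes.txt is not in the index yet
others="$(git ls-files -o)"    # -o (--others) lists untracked files

git add notes.txt
after="$(git ls-files)"        # notes.txt is now in the index

echo "before git add: '$before'"
echo "untracked:      '$others'"
echo "after git add:  '$after'"
```

The file only needs to be added to the index; no commit is required for git ls-files to see it.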

Help Information

Here is the help information for git-ls-files.

Output of 'git help ls-files'
GIT-LS-FILES(1)               Git Manual               GIT-LS-FILES(1)
NAME
git‐ls‐files - Show information about files in the index and the working tree
SYNOPSIS
git ls-files [-z] [-t] [-v] [-f] [-c|--cached] [-d|--deleted] [-o|--others] [-i|--ignored] [-s|--stage] [-u|--unmerged] [-k|--killed] [-m|--modified] [--resolve-undo] [--directory [--no-empty-directory]] [--eol] [--deduplicate] [-x <pattern>|--exclude=<pattern>] [-X <file>|--exclude-from=<file>] [--exclude-per-directory=<file>] [--exclude-standard] [--error-unmatch] [--with-tree=<tree-ish>] [--full-name] [--recurse-submodules] [--abbrev[=<n>]] [--format=<format>] [--] [<file>...]
DESCRIPTION
This merges the file listing in the index with the actual working directory list, and shows different combinations of the two.
One or more of the options below may be used to determine the files shown, and each file may be printed multiple times if there are multiple entries in the index or multiple statuses are applicable for the relevant file selection options.
OPTIONS
-c, --cached Show all files cached in Git’s index, i.e. all tracked files. (This is the default if no -c/-s/-d/-o/-u/-k/-m/--resolve-undo options are specified.)
-d, --deleted Show files with an unstaged deletion
-m, --modified Show files with an unstaged modification (note that an unstaged deletion also counts as an unstaged modification)
-o, --others Show other (i.e. untracked) files in the output
-i, --ignored Show only ignored files in the output. Must be used with either an explicit -c or -o. When showing files in the index (i.e. when used with -c), print only those files matching an exclude pattern. When showing "other" files (i.e. when used with -o), show only those matched by an exclude pattern. Standard ignore rules are not automatically activated, therefore at least one of the --exclude* options is required.
-s, --stage Show staged contents' mode bits, object name and stage number in the output.
--directory If a whole directory is classified as "other", show just its name (with a trailing slash) and not its whole contents. Has no effect without -o/--others.
--no-empty-directory Do not list empty directories. Has no effect without --directory.
-u, --unmerged Show information about unmerged files in the output, but do not show any other tracked files (forces --stage, overrides --cached).
-k, --killed Show untracked files on the filesystem that need to be removed due to file/directory conflicts for tracked files to be able to be written to the filesystem.
--resolve-undo Show files having resolve-undo information in the index together with their resolve-undo information. (resolve-undo information is what is used to implement "git checkout -m $PATH", i.e. to recreate merge conflicts that were accidentally resolved)
-z \0 line termination on output and do not quote filenames. See OUTPUT below for more information.
--deduplicate When only filenames are shown, suppress duplicates that may come from having multiple stages during a merge, or giving --deleted and --modified option at the same time. When any of the -t, --unmerged, or --stage option is in use, this option has no effect.
-x <pattern>, --exclude=<pattern> Skip untracked files matching pattern. Note that pattern is a shell wildcard pattern. See EXCLUDE PATTERNS below for more information.
-X <file>, --exclude-from=<file> Read exclude patterns from <file>; 1 per line.
--exclude-per-directory=<file> Read additional exclude patterns that apply only to the directory and its subdirectories in <file>. Deprecated; use --exclude-standard instead.
--exclude-standard Add the standard Git exclusions: .git/info/exclude, .gitignore in each directory, and the user’s global exclusion file.
--error-unmatch If any <file> does not appear in the index, treat this as an error (return 1).
--with-tree=<tree-ish> When using --error-unmatch to expand the user supplied <file> (i.e. path pattern) arguments to paths, pretend that paths which were removed in the index since the named <tree-ish> are still present. Using this option with -s or -u options does not make any sense.
-t Show status tags together with filenames. Note that for scripting purposes, git‐status(1) --porcelain and git‐diff‐files(1) --name-status are almost always superior alternatives, and users should look at git‐status(1) --short or git‐diff(1) --name-status for more user-friendly alternatives.
This option provides a reason for showing each filename, in the form of a status tag (which is followed by a space and then the filename). The status tags are all single characters from the following list:
H tracked file that is not either unmerged or skip-worktree
S tracked file that is skip-worktree
M tracked file that is unmerged
R tracked file with unstaged removal/deletion
C tracked file with unstaged modification/change
K untracked paths which are part of file/directory conflicts which prevent checking out tracked files
? untracked file
U file with resolve-undo information
-v Similar to -t, but use lowercase letters for files that are marked as assume unchanged (see git‐update‐index(1)).
-f Similar to -t, but use lowercase letters for files that are marked as fsmonitor valid (see git‐update‐index(1)).
--full-name When run from a subdirectory, the command usually outputs paths relative to the current directory. This option forces paths to be output relative to the project top directory.
--recurse-submodules Recursively calls ls-files on each active submodule in the repository. Currently there is only support for the --cached and --stage modes.
--abbrev[=<n>] Instead of showing the full 40-byte hexadecimal object lines, show the shortest prefix that is at least <n> hexdigits long that uniquely refers the object. Non default number of digits can be specified with --abbrev=<n>.
--debug After each line that describes a file, add more data about its cache entry. This is intended to show as much information as possible for manual inspection; the exact format may change at any time.
--eol Show <eolinfo> and <eolattr> of files. <eolinfo> is the file content identification used by Git when the "text" attribute is "auto" (or not set and core.autocrlf is not false). <eolinfo> is either "-text", "none", "lf", "crlf", "mixed" or "".
"" means the file is not a regular file, it is not in the index or not accessible in the working tree.
<eolattr> is the attribute that is used when checking out or committing, it is either "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf". Since Git 2.10 "text=auto eol=lf" and "text=auto eol=crlf" are supported.
Both the <eolinfo> in the index ("i/<eolinfo>") and in the working tree ("w/<eolinfo>") are shown for regular files, followed by the ("attr/<eolattr>").
--sparse If the index is sparse, show the sparse directories without expanding to the contained files. Sparse directories will be shown with a trailing slash, such as "x/" for a sparse directory "x".
--format=<format> A string that interpolates %(fieldname) from the result being shown. It also interpolates %% to %, and %xx where xx are hex digits interpolates to character with hex code xx; for example %00 interpolates to \0 (NUL), %09 to \t (TAB) and %0a to \n (LF). --format cannot be combined with -s, -o, -k, -t, --resolve-undo and --eol.
-- Do not interpret any more arguments as options.
<file> Files to show. If no files are given all files which match the other specified criteria are shown.
OUTPUT
git ls-files just outputs the filenames unless --stage is specified in which case it outputs:
[<tag> ]<mode> <object> <stage> <file>
git ls-files --eol will show i/<eolinfo><SPACES>w/<eolinfo><SPACES>attr/<eolattr><SPACE*><TAB><file>
git ls-files --unmerged and git ls-files --stage can be used to examine detailed information on unmerged paths.
For an unmerged path, instead of recording a single mode/SHA-1 pair, the index records up to three such pairs; one from tree O in stage 1, A in stage 2, and B in stage 3. This information can be used by the user (or the porcelain) to see what should eventually be recorded at the path. (see git‐read‐tree(1) for more information on state)
Without the -z option, pathnames with "unusual" characters are quoted as explained for the configuration variable core.quotePath (see git‐config(1)). Using -z the filename is output verbatim and the line is terminated by a NUL byte.
It is possible to print in a custom format by using the --format option, which is able to interpolate different fields using a %(fieldname) notation. For example, if you only care about the "objectname" and "path" fields, you can execute with a specific "--format" like
git ls-files --format='%(objectname) %(path)'
FIELD NAMES
The way each path is shown can be customized by using the --format=<format> option, where the %(fieldname) in the <format> string for various aspects of the index entry are interpolated. The following "fieldname" are understood:
objectmode The mode of the file which is recorded in the index.
objectname The name of the file which is recorded in the index.
stage The stage of the file which is recorded in the index.
eolinfo:index, eolinfo:worktree The <eolinfo> (see the description of the --eol option) of the contents in the index or in the worktree for the path.
eolattr The <eolattr> (see the description of the --eol option) that applies to the path.
path The pathname of the file which is recorded in the index.
EXCLUDE PATTERNS
git ls-files can use a list of "exclude patterns" when traversing the directory tree and finding files to show when the flags --others or --ignored are specified. gitignore(5) specifies the format of exclude patterns.
Generally, you should just use --exclude-standard, but for historical reasons the exclude patterns can be specified from the following places, in order:
1. The command-line flag --exclude=<pattern> specifies a single pattern. Patterns are ordered in the same order they appear in the command line.
2. The command-line flag --exclude-from=<file> specifies a file containing a list of patterns. Patterns are ordered in the same order they appear in the file.
3. The command-line flag --exclude-per-directory=<name> specifies a name of the file in each directory git ls-files examines, normally .gitignore. Files in deeper directories take precedence. Patterns are ordered in the same order they appear in the files.
A pattern specified on the command line with --exclude or read from the file specified with --exclude-from is relative to the top of the directory tree. A pattern read from a file specified by --exclude-per-directory is relative to the directory that the pattern file appears in.
SEE ALSO
git‐read‐tree(1), gitignore(5)
GIT
Part of the git(1) suite
Git 2.40.1 05/18/2023 GIT-LS-FILES(1)

Wildmatch

Git ls-files can list all files whose name matches a pattern. The specification for the filenames that are to be listed is called a wildmatch.

The same wildmatch syntax is used in .gitignore files, with the exception that git-ls-files does not support negative patterns.

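The Testing Wildmatch Patterns section, later in this article, describes a way for you to determine how matching actually works for your specific situation. The following rules are taken from the gitignore documentation: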

  • A blank line matches no files, so it can serve as a separator for readability.

  • A line starting with # serves as a comment. Put a backslash ("\") in front of the first hash for patterns that begin with a hash.

  • Trailing spaces are ignored unless they are quoted with backslash ("\").

  • An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined. Put a backslash ("\") in front of the first "!" for patterns that begin with a literal "!", for example, "\!important!.txt".

  • The slash "/" is used as the directory separator. Separators may occur at the beginning, middle or end of the .gitignore search pattern.

  • If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. Otherwise the pattern may also match at any level below the .gitignore level.

  • If there is a separator at the end of the pattern then the pattern will only match directories, otherwise the pattern can match both files and directories.

  • For example, a pattern doc/frotz/ matches doc/frotz directory, but not a/doc/frotz directory; however frotz/ matches frotz and a/frotz that is a directory (all paths are relative from the .gitignore file).

  • An asterisk "*" matches anything except a slash. The character "?" matches any one character except "/". The range notation, e.g. [a-zA-Z], can be used to match one of the characters in a range. See fnmatch(3) and the FNM_PATHNAME flag for a more detailed description.

Two consecutive asterisks ("**") in patterns matched against full pathname may have special meaning:

  • A leading "**" followed by a slash means match in all directories. For example, "**/foo" matches file or directory "foo" anywhere, the same as pattern "foo". "**/foo/bar" matches file or directory "bar" anywhere that is directly under directory "foo".

  • A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.

  • A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.

  • Other consecutive asterisks are considered regular asterisks and will match according to the previous rules.
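The "a/**/b" rule above can be checked with git check-ignore, which reports the paths that an ignore pattern matches. This is only a sketch: it assumes git is installed, it runs in a throwaway repository, and the directory and file names are hypothetical.

```shell
#!/bin/bash
# Sketch: check which paths the gitignore pattern a/**/b matches.
# All directory and file names here are hypothetical.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

mkdir -p a/x/y other
touch a/b a/x/b a/x/y/b other/b

# The pattern contains a slash in the middle, so it is anchored
# to the directory that holds this .gitignore file.
printf '%s\n' 'a/**/b' > .gitignore

# git check-ignore prints each given path that is ignored
ignored="$(git check-ignore a/b a/x/b a/x/y/b other/b)"
echo "$ignored"
```

In this sketch the three paths under a/ are reported, and other/b is not, because the pattern is anchored at a/.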

Configuration

The optional configuration variable core.excludesFile indicates a path to a file containing patterns of file names to exclude, similar to $GIT_DIR/info/exclude. Patterns in the exclude file are used in addition to those in $GIT_DIR/info/exclude.
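Here is a small sketch of core.excludesFile in action. To avoid changing your global settings, it sets the variable only in the local configuration of a throwaway repository. The pattern *.tmp and the file names are hypothetical, and the exclude file is created outside the repository with mktemp.

```shell
#!/bin/bash
# Sketch: point core.excludesFile at a file of patterns and see the effect.
# The pattern *.tmp and the file names are hypothetical.
repo="$(mktemp -d)"
excludes="$(mktemp)"            # Exclude file, kept outside the repository
cd "$repo" || exit 1
git init -q .

printf '%s\n' '*.tmp' > "$excludes"
git config core.excludesFile "$excludes"   # Local to this repository only

touch scratch.tmp keep.txt

# Paths matched by an exclude pattern
ignored="$(git check-ignore scratch.tmp keep.txt)"

# Untracked files that remain after the standard exclusions are applied
untracked="$(git ls-files -o --exclude-standard)"

echo "ignored:   $ignored"
echo "untracked: $untracked"
```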

Upper Case FAT Filenames

I record many videos with my digital mirrorless cameras, such as the Sony ZV E1 and the Lumix S5-II. These video files can be quite large. I want to store them in Git LFS.

I get many of these video files by removing the SD card from a camera, placing the SD card in a memory card reader, and copying the files to a computer.

Most commercially available memory cards use the FAT32 file system. The FAT file system stores "8.3" filenames as uppercase (8 characters for the file name, 3 characters for the file type). The uppercase nature of these file names is preserved by macOS and Windows when copying files from memory cards. This is why many of my video files are named entirely in uppercase. Audio files recorded on audio recorders (like a Zoom H4N Pro) are also stored on memory cards and have uppercase filenames.

However, if I tether the camera to a computer with a USB cable, then video files recorded in this way will be named entirely in lowercase.

It is therefore important to specify wildmatch patterns for media files in upper- and lowercase.

File Name Case Sensitivity

By default, Git is case-sensitive, but the underlying file systems (like NTFS on Windows and HFS+ on macOS) can be case-insensitive.

Wildmatch patterns are case-sensitive. Some media file names are in uppercase because they come from devices that use the FAT file system. That means you should define two patterns for media files: an uppercase pattern and a lowercase pattern. It is a good habit to define both at the same time.

You could disable case sensitivity for the current repository; however, doing so might introduce problems.

Shell
$ git config core.ignorecase true

You can disable file name case sensitivity for all repositories for this OS user account, but again, doing so may invite problems:

Shell
$ git config --global core.ignorecase true

Instead of disabling file name case sensitivity, I suggest you declare two versions of every wildmatch pattern for media files: one in uppercase and one in lowercase.

For media (video and audio files), specify patterns in lowercase and uppercase:

Shell
$ git ls-files "*.mp4" "*.MP4"
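The sketch below shows why both patterns are needed. It assumes git is installed and uses a throwaway repository with two hypothetical files, clip1.mp4 and CLIP2.MP4. As an aside, git pathspecs also accept the icase magic, written ":(icase)", which requests a case-insensitive match for a single pattern; I show it here only as an alternative you may want to experiment with.

```shell
#!/bin/bash
# Sketch: lowercase and uppercase patterns match different files.
# The file names clip1.mp4 and CLIP2.MP4 are hypothetical.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

touch clip1.mp4 CLIP2.MP4
git add clip1.mp4 CLIP2.MP4

lower_only="$(git ls-files "*.mp4")"            # Lowercase pattern only
both="$(git ls-files "*.mp4" "*.MP4")"          # Both patterns
icase="$(git ls-files ":(icase)*.mp4")"         # Case-insensitive pathspec magic

echo "lowercase pattern: $lower_only"
echo "both patterns:     $both"
echo "icase magic:       $icase"
```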

Specifying Subdirectories and Filename Prefixes

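Whether * matches across directory separators depends on how the pattern is interpreted. By default, git ls-files treats each argument as a pathspec in which * and ? can also match a slash, so the pattern "*.zip" can list ZIP files in subdirectories as well as in the current directory. With the glob pathspec magic, as in ":(glob)*.zip", * does not match a slash, and the pattern just applies to the ZIP files in the same directory that the pattern was evaluated for.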

Shell
$ git ls-files "*.zip"
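How far a single * reaches depends on how git interprets the pathspec, so it is worth checking on your own machine. The following sketch compares a plain pathspec with the glob pathspec magic, ":(glob)", which the git glossary documents as matching with wildcards that do not cross a slash. It assumes git is installed; the repository is a throwaway one and the file names are hypothetical.

```shell
#!/bin/bash
# Sketch: compare a plain pathspec with the :(glob) pathspec magic.
# The directory and file names are hypothetical.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

mkdir -p dir1/dir2
touch test1.zip dir1/test2.zip dir1/dir2/test3.zip
git add test1.zip dir1/test2.zip dir1/dir2/test3.zip

plain="$(git ls-files "*.zip")"            # Plain pathspec: * may match a slash
glob="$(git ls-files ":(glob)*.zip")"      # Glob magic: * stops at a slash

echo "plain pathspec:"; echo "$plain"
echo "glob pathspec:";  echo "$glob"
```

If you rely on one behavior or the other, stating it explicitly with ":(glob)" makes the intent clear to whoever reads the command later.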

You can restrict the listing to specific directories. For example, the following specifies that only files managed by git in the big/ directory should be listed:

Shell
$ git ls-files "big/*"

You can further restrict the files that will be listed by adding filetypes. The following only lists the files in the big/ directory with a zip filetype:

Shell
$ git ls-files "big/*.zip"

Similarly, filename prefixes can be used in a wildmatch. Furthermore, you can specify more than one wildmatch pattern on a command line. The following lists all the files in the xray directory, and only the zip files in the zulu directory that begin with alpha:

Shell
$ git ls-files "xray/*" "zulu/alpha*.zip"

Recursive Matching

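With git ls-files, the **/ pattern requires at least one level of subdirectory for a match. This differs from the gitignore rule quoted earlier, under which "a/**/b" also matches "a/b"; for pathspecs, that rule applies only when the glob pathspec magic is used, as in ":(glob)a/**/*.zip".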

In the following example, a file at path a/b/c/d/blah.zip would match, but blah.zip would not match.

Shell
$ git ls-files "a/**/*.zip"
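Here is a sketch you can use to see how "a/**/*.zip" behaves, both as a plain pathspec and with the glob pathspec magic, ":(glob)". It assumes git is installed and runs in a throwaway repository. The file a/top.zip is a hypothetical addition that sits directly inside a/, with no intermediate subdirectory.

```shell
#!/bin/bash
# Sketch: how a/**/*.zip matches as a plain pathspec and with :(glob).
# The directory and file names are hypothetical.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .

mkdir -p a/b/c/d
touch blah.zip a/top.zip a/b/c/d/blah.zip
git add blah.zip a/top.zip a/b/c/d/blah.zip

plain="$(git ls-files "a/**/*.zip")"          # Needs a subdirectory below a/
glob="$(git ls-files ":(glob)a/**/*.zip")"    # /**/ may match zero directories

echo "plain pathspec:"; echo "$plain"
echo "glob pathspec:";  echo "$glob"
```

In both cases the top-level blah.zip is not listed, because the pattern starts with a/.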

Matching A Pattern Everywhere

If you want a match to apply to every directory in the git repository, write the wildmatch specification twice, once for the current directory, and once for all subdirectories.

Shell
$ git ls-files "*.zip" "**/*.zip"

For media files, also specify the pattern(s) in uppercase:

Shell
$ git ls-files "*.mp4" "*.MP4" "**/*.mp4" "**/*.MP4"

As you can see, dealing with media files requires a long, redundant command that is difficult to type without making an error. The wildmatch permutation script addresses that problem.

Testing Wildmatch Patterns

By default, git ls-files only matches wildmatch patterns against files in the Git index. New files in the Git working directory that have not yet been added to the Git index via git add will not be listed unless the -o (--others) option is used.

To test patterns using git ls-files:

Commit and push your work before messing around with wildmatch patterns. You do not want your experimentation to mess up your repository!

Shell
$ git add -A
$ git commit -m 'A commit message'
$ git push
$ git status # Ensure nothing got missed
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Make empty test files. Touch is a good command for this purpose.

Shell
$ mkdir -p dir1/dir2
$ touch test1.zip dir1/test2.zip dir1/dir2/test3.zip

Add the test files to the Git index. Do not commit these files, and definitely do not push them to a remote. The Git client just needs to be made aware of these files; it is unnecessary to actually version them for testing.

Shell
$ git add test1.zip dir1/test2.zip dir1/dir2/test3.zip

Experiment with patterns using git ls-files.

Shell
$ git ls-files "*.zip" "**/*.zip"
dir1/dir2/test3.zip
dir1/test2.zip
test1.zip 

Discard all changes made while testing. Once you have refined the wildmatch patterns, you no longer need or want the test files.

Shell
$ git reset --hard HEAD
HEAD is now at f1acdab - 
$ git clean -fd

See the git reset and git clean man pages for more information.
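If you would rather not experiment inside a real repository at all, the same steps can be run in a throwaway repository. The following is a sketch under that assumption: the scratch repository is created with mktemp, an empty initial commit stands in for your committed work, and the user name and email passed to git commit are placeholders.

```shell
#!/bin/bash
# Sketch: the test procedure above, run in a throwaway repository.
# The identity passed to git commit is a placeholder.
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init -q .
git -c user.name="Test User" -c user.email="test@example.com" \
  commit -q --allow-empty -m "Initial commit"

# Make empty test files and add them to the index without committing
mkdir -p dir1/dir2
touch test1.zip dir1/test2.zip dir1/dir2/test3.zip
git add test1.zip dir1/test2.zip dir1/dir2/test3.zip

# Experiment with patterns
listing="$(git ls-files "*.zip" "**/*.zip")"
echo "$listing"

# Discard all changes made while testing
git reset -q --hard HEAD
git clean -fdq
remaining="$(git ls-files)"
```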

Wildmatch Permutation Script

Here is a handy little script, called wp, that can help you with all that repetitive typing.

wp
#!/bin/bash

function help {
  echo "$( basename "$0" ) (wildmatch permute)
Permute a token into a more general git ignore / git lfs pattern.
See https://mslinn.com/git/5300-git-lfs-patterns-tracking.html

Syntax:
$ $( basename "$0" ) TOKEN ...

The above returns:
'*.token' '*.TOKEN' '**/*.token' '**/*.TOKEN'

Example of actual usage (notice the backticks):
$ git ls-files \`$( basename "$0" ) mp4\`

The above is equivalent to typing:
$ git ls-files '*.mp4' '*.MP4' '**/*.mp4' '**/*.MP4'

If you type an uppercase token, the output is the same:

$ git ls-files \`$( basename "$0" ) MP4\`

If you do not like typing backticks, you can use this syntax instead:
$ git ls-files \$($( basename "$0" ) mp4)

You can specify more than one token:
$ git ls-files \`$( basename "$0" ) mp4 mp3\`

The above is equivalent to typing:
$ git ls-files '*.mp4' '*.MP4' '**/*.mp4' '**/*.MP4' '*.mp3' '*.MP3' '**/*.mp3' '**/*.MP3'
"
  exit 1
}

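# Show the help message if no tokens were provided
if [ -z "$1" ]; then help; fi

# Print the four permutations of one token:
# lowercase and uppercase, for the current directory and for subdirectories
function doit {
  LC="${1,,}" # Lowercase version of the token (requires bash 4 or later)
  UC="${1^^}" # Uppercase version of the token (requires bash 4 or later)
  echo "*.$LC *.$UC **/*.$LC **/*.$UC"
}

# Process every token given on the command line
for X in "$@"; do
  doit "$X"
done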

After downloading the above bash script into a directory on your PATH, for example /usr/local/bin, remember to make the script executable:

Shell
$ chmod a+x /usr/local/bin/wp

Here is the help message:

Shell
$ wp
wp (wildmatch permute)
Permute a token into a git ignore / git lfs pattern.
See https://mslinn.com/git/5300-git-lfs-patterns-tracking.html

Syntax:
$ wp TOKEN

The above returns:
"*.token" "*.TOKEN" "**/*.token" "**/*.TOKEN"

Example of actual usage (notice the backticks):
$ git ls-files `wp mp4`

The above is equivalent to typing:
$ git ls-files "*.mp4" "*.MP4" "**/*.mp4" "**/*.MP4"

If you type an uppercase token, the output is the same:

$ git ls-files `wp MP4`

If you do not like typing backticks, you can use this syntax instead:
$ git ls-files $(wp mp4) 

Using the wp Script

The following example lists all files added to the Git repository with an mp4 or MP4 filetype. Notice the backticks (`).

Shell
mslinn@bear $ git ls-files `wp mp4`
video1.mp4 video2.mp4 

The above is equivalent to typing:

Shell
mslinn@bear $ git ls-files "*.mp4" "*.MP4" "**/*.mp4" "**/*.MP4"
video1.mp4 video2.mp4 

If you do not like backticks, you can use the following $(equivalent syntax):

Shell
mslinn@bear $ git ls-files $(wp mp4)
video1.mp4 video2.mp4 
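One thing to be aware of: when backticks or $( ) are used without quotes, the shell applies its own filename expansion to the result before git sees the patterns. If a pattern such as *.mp4 happens to match files in the current directory, git receives those file names instead of the pattern. The sketch below shows one way to avoid that, by turning off filename expansion with set -f while the command runs. The permute function is a hypothetical stand-in for wp so that the example is self-contained, and show_args stands in for git ls-files.

```shell
#!/bin/sh
# Hypothetical stand-in for wp: prints the four patterns for each token
permute() {
  for token in "$@"; do
    lc="$(printf '%s' "$token" | tr '[:upper:]' '[:lower:]')"
    uc="$(printf '%s' "$token" | tr '[:lower:]' '[:upper:]')"
    printf '%s\n' "*.$lc" "*.$uc" "**/*.$lc" "**/*.$uc"
  done
}

# Stand-in for git ls-files: prints each argument it receives in brackets
show_args() { printf '[%s]\n' "$@"; }

set -f                                   # Turn off filename expansion
listing="$(show_args $(permute mp4))"    # Word splitting still happens
set +f                                   # Turn filename expansion back on

echo "$listing"
```

With the real script, the same idea would look like: set -f; git ls-files $(wp mp4); set +f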
