Mike Slinn

Partial Clone of Upstream

Published 2023-03-30.
Time to read: 2 minutes.


I wanted to make changes to a subdirectory of a large git project, but I did not want to have the entire project stored on my device. The git sparse checkout feature allowed me to do that.

The project I wanted to work on was Sinatra-ActiveRecord, and I wanted to play with its sample project for SQLite. The sample project was very small (too small to be useful, actually!), so it made no sense to fill my computing device with files that were not needed.

I started by making an empty git repo.

Shell
$ mkdir sinatra-activerecord-sqlite

$ cd sinatra-activerecord-sqlite

$ git init

I wanted to eventually create two git remotes:

  • upstream – pointing to the original git repo, sinatra-activerecord/sinatra-activerecord.
  • origin – pointing to a new repo in my GitHub account that will contain the complete original repo's contents and history, plus my changes. This repo will be called mslinn/sinatra-activerecord-sqlite.

When you git push from a sparse checkout to origin, the content of the entire original repo is copied to the new repository, along with any changes that you made. This works because sparse checkout only limits which files are checked out into the working tree of this local repo instance; the complete history is still fetched into the .git directory.

The integrity of the entire original repo is therefore maintained. If someone else clones the new repository without performing the sparse checkout procedure, they will receive all of the contents of the original repo.
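The behavior described above can be tried out without touching GitHub. The following is a minimal sketch, assuming local stand-ins for both remotes: the directory names upstream_repo, origin.git, sparse, and everyone_else, and the two sample files, are made up for this demo. It uses the same sparse checkout commands that are explained step by step in the rest of this article.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the upstream project: one small example
# directory plus another directory we do not want checked out.
git init -q upstream_repo
git -C upstream_repo symbolic-ref HEAD refs/heads/master
mkdir -p upstream_repo/example/sqlite upstream_repo/lib
echo "app" > upstream_repo/example/sqlite/app.rb
echo "lib" > upstream_repo/lib/big.rb
git -C upstream_repo add .
git -C upstream_repo -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "initial"

# Bare repository standing in for the new origin repo.
git init -q --bare origin.git

# Sparse working repo, built the same way as in this article.
git init -q sparse
cd sparse
git remote add --no-tags -t master -f upstream "$work/upstream_repo"
git config core.sparseCheckout true
mkdir -p .git/info
echo "/example/sqlite" >> .git/info/sparse-checkout
git pull -q upstream master

# Push the sparse repo to the stand-in origin.
git remote add origin "$work/origin.git"
git push -q origin HEAD:refs/heads/master

# An ordinary clone, made without any sparse checkout settings.
cd "$work"
git clone -q -b master origin.git everyone_else
find everyone_else -path everyone_else/.git -prune -o -type f -print | sort
```

The final find lists files from both example/sqlite and lib in the ordinary clone, even though the sparse working tree never contained lib.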

This is how I defined the upstream remote:

Shell
$ git remote add --no-tags -t master -f upstream \
  https://github.com/sinatra-activerecord/sinatra-activerecord.git
Updating upstream
  remote: Enumerating objects: 1450, done.
  remote: Counting objects: 100% (232/232), done.
  remote: Compressing objects: 100% (125/125), done.
  remote: Total 1450 (delta 82), reused 204 (delta 74), pack-reused 1218
  Receiving objects: 100% (1450/1450), 229.40 KiB | 2.76 MiB/s, done.
  Resolving deltas: 100% (543/543), done.
  From https://github.com/sinatra-activerecord/sinatra-activerecord
   * [new branch]      master                                 -> upstream/master 

The git project that is being fetched has a lot of tags. In the above command, I used the --no-tags option to suppress the downloading of all tags. The -t master option further restricted the fetch, so only the master branch was retrieved.
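One way to confirm what those two options did is to inspect the remote's configuration afterwards. The sketch below is an illustration under assumptions, not output from my original session: it builds a throwaway local repository in place of the real GitHub URL so that it runs without network access, and the names upstream_repo, local_repo, feature, and v1.0 are made up for the demo.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with two branches and one tag.
git init -q upstream_repo
git -C upstream_repo symbolic-ref HEAD refs/heads/master
git -C upstream_repo -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial"
git -C upstream_repo branch feature
git -C upstream_repo tag v1.0

# Same options as in the article, pointed at the local stand-in.
git init -q local_repo
cd local_repo
git remote add --no-tags -t master -f upstream "$work/upstream_repo"

# -t master narrows the fetch refspec to a single branch.
git config --get remote.upstream.fetch

# --no-tags is recorded as the remote's tagOpt setting.
git config --get remote.upstream.tagOpt

# Only upstream/master was fetched, and no tags came along.
git branch -r
git tag -l
```

Because both settings are stored in the repo's configuration, later git fetch upstream runs keep honoring them.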

Now for the magic incantations that enable and define this repo’s sparse checkout:

shell continued
$ git config core.sparseCheckout true

$ echo "/example/sqlite" >> .git/info/sparse-checkout
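Git 2.25 and later also provide a git sparse-checkout command that writes the same .git/info/sparse-checkout file and enables core.sparseCheckout for you. The following is a minimal sketch of that alternative, not the method I used above. It assumes a throwaway local repository (upstream_repo and full_clone are made-up names) and applies the setting to an ordinary full clone. In recent Git versions this command defaults to cone mode, which takes directory names rather than patterns and also keeps any files that sit directly in the repository root.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with a wanted and an unwanted directory.
git init -q upstream_repo
git -C upstream_repo symbolic-ref HEAD refs/heads/master
mkdir -p upstream_repo/example/sqlite upstream_repo/lib
echo "app" > upstream_repo/example/sqlite/app.rb
echo "lib" > upstream_repo/lib/big.rb
git -C upstream_repo add .
git -C upstream_repo -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "initial"

# Start from a normal clone, where every file is checked out.
git clone -q upstream_repo full_clone
cd full_clone

# Restrict the working tree to one directory.
git sparse-checkout set example/sqlite

# Show the active sparse checkout entries and the resulting top level.
git sparse-checkout list
ls
```

Running git sparse-checkout disable later restores the full working tree.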

It is now possible to check out just the contents of the /example/sqlite directory from the upstream remote. The earlier fetch already downloaded every object, so this pull transfers nothing new:

shell continued
$ git pull upstream master
remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
From https://github.com/sinatra-activerecord/sinatra-activerecord
  * branch            master     -> FETCH_HEAD 

Here are the files and directories that I just sparsely cloned from the repo:

shell continued
$ tree
.
└── example
    └── sqlite
        ├── Gemfile
        ├── README.md
        ├── Rakefile
        ├── app.rb
        ├── bin
        │   └── rake
        ├── config
        │   └── database.yml
        ├── config.ru
        └── db
            ├── development.sqlite3
            ├── migrate
            │   ├── 20140415201712_create_users.rb
            │   └── 20140415204542_create_posts.rb
            ├── schema.rb
            ├── seeds.rb
            ├── structure.sql
            └── test.sqlite3

6 directories, 14 files 
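To see which tracked files a sparse checkout skipped, git ls-files -t prints a status letter before each path: H for entries present in the working tree and S for entries marked skip-worktree. The sketch below assumes a tiny local stand-in for the upstream repo (upstream_repo, sparse, and the two files are made up for the demo) rather than the real project.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with a wanted and an unwanted directory.
git init -q upstream_repo
git -C upstream_repo symbolic-ref HEAD refs/heads/master
mkdir -p upstream_repo/example/sqlite upstream_repo/lib
echo "app" > upstream_repo/example/sqlite/app.rb
echo "lib" > upstream_repo/lib/big.rb
git -C upstream_repo add .
git -C upstream_repo -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "initial"

# Sparse repo created with the commands from this article.
git init -q sparse
cd sparse
git remote add --no-tags -t master -f upstream "$work/upstream_repo"
git config core.sparseCheckout true
mkdir -p .git/info
echo "/example/sqlite" >> .git/info/sparse-checkout
git pull -q upstream master

# Every tracked path is in the index; S marks the ones left out
# of the working tree by the sparse checkout.
git ls-files -t
```

The listing includes lib/big.rb with an S in front of it, which shows that the file is still tracked even though it is absent from the working tree.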

Next, I used the GitHub CLI to create a repo in my GitHub account to hold the complete repo, along with my modifications.

shell continued
$ gh repo create --public --source=. --remote=origin
✓ Created repository mslinn/sinatra-activerecord-sqlite on GitHub
✓ Added remote git@github.com:mslinn/sinatra-activerecord-sqlite.git 
😁

The above gh repo create command automatically names the repo from the current directory name.

I do this so often that I defined two Bash aliases in ~/.bash_aliases:

~/.bash_aliases
alias gh_new_private='gh repo create --private --source=. --remote=origin'
alias gh_new_public='gh repo create --public --source=. --remote=origin'
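The two aliases differ only in the visibility flag, so a single shell function could cover both. This is a hypothetical variant, not something from my own setup: the name gh_new and its default of private are assumptions, and it uses only the gh repo create options already shown above.

```shell
# Hypothetical helper: create a GitHub repo from the current directory.
# Usage: gh_new [public|private]   (defaults to private)
gh_new() {
  visibility="${1:-private}"
  case "$visibility" in
    public|private) ;;
    *)
      echo "usage: gh_new [public|private]" >&2
      return 2
      ;;
  esac
  gh repo create "--$visibility" --source=. --remote=origin
}
```

For example, gh_new public behaves like the gh_new_public alias, while any other argument prints the usage line and returns a non-zero status without calling gh.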