git Notes

    Chances are that you may need to know some git when using fastai - for example if you want to contribute to the project, or you want to undo some change in your code tree. This document has a variety of useful recipes that might be of help in your work.

    While this guide is mostly suitable for creating PRs for any github project, it includes several steps specific to the fastai project repositories.

    The following instructions use USERNAME as a github username placeholder. The easiest way to follow this guide is to copy-n-paste the whole section into a file, replace USERNAME with your real username and then follow the steps.

    All the examples in this guide are written for working with the fastai repository. If you’d like to contribute to other fastai-project repositories, just replace fastai with that other repository name in the instructions below.

    Also don’t get confused between the fastai github username, the fastai repository, and the fastai module directory, where the python code resides. The following url shows all three, in the order they have been mentioned:

    Below you will find detailed steps towards creating a PR.

    1a. First time

    If you made the fork of the desired repository already, proceed to section 1b.

    If it’s your first time, you just need to make a fork of the original repository.

    1b. Subsequent times

    If you make a PR right after you made a fork of the original repository, the two repositories are aligned and you can easily create a PR. As time passes, the original repository starts diverging from your fork, so when you work on your PRs you need to keep the master branch of your fork in sync with the original repository.

    You can tell the state of your fork by going to https://github.com/USERNAME/fastai and seeing something like:

    1. This branch is 331 commits behind fastai:master.

    So, let’s synchronize the two:

    1. Place yourself in the master branch of the forked repository. Go back to a repository you checked out earlier and switch to the master branch:

      1. cd fastai
      2. git checkout master
    2. Sync the forked repository with the original repository:

      1. git fetch upstream
      2. git checkout master
      3. git merge --no-edit upstream/master
      4. git push

      Now you can branch off this synced master branch.

      Validate that your fork is in sync with the original repository by going to https://github.com/USERNAME/fastai and checking that it says:

      1. This branch is even with fastai:master.

      Now you can work on a new PR.
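
The sync steps above can be wrapped in a small shell function. This is only a minimal sketch: the name sync_fork_master is made up for illustration, and it assumes the remotes are called origin (your fork) and upstream (the original repository), as in the rest of this guide.

```shell
# Hypothetical helper: bring the fork's master in line with upstream/master.
# Assumes remotes "origin" (your fork) and "upstream" (the original repo).
sync_fork_master() {
    git fetch upstream &&
    git checkout master &&
    git merge --no-edit upstream/master &&
    git push origin master
}
```

Run it from inside your checkout. Each step only runs if the previous one succeeded, so a failed merge will not be pushed.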

    Step 2. Create a Branch

    It’s very important that you always work inside a branch. If you make any commits into the master branch, you will not be able to make more than one PR at the same time, and you will not be able to synchronize your forked master branch with the original without doing a reset. If you made a mistake and committed to the master branch, it’s not the end of the world, it’s just that you made your life more complicated. This guide will explain how to deal with this situation.

    1. Create a branch with any name you want, for example new-feature-branch, and switch to it. Then set this branch’s upstream, so that you could do git push and other git commands without needing to pass any more arguments.

      1. git checkout -b new-feature-branch
      2. git push --set-upstream origin new-feature-branch

    Step 6. Push Your Changes

    1. When you’re happy with the results, commit the new code:

      1. git commit -a

      -a automatically stages and commits changes to all files that git already tracks.

      If you created new files, first tell git to track them:

      1. git add newfile1 newdir2 ...

      and then commit.

    2. Finally, push the changes into the branch of your fork:

      1. git push

    Step 8. Passing CI Tests

    Once your PR has been submitted, you will see on github that various tests run on CI servers to validate your PR.

    How to Keep Your Feature Branch Up-to-date

    Normally you don’t need to worry about updating your feature branch to synchronize with the fastai code base (upstream master). The only time you must perform the update is when the same code you have been working on has undergone changes in the master. So when you submit a PR, github will tell you that there is a merge conflict.

    You could update your feature branch directly, but it’s best to update the master branch of your fork, first.

    • Step 1: sync your forked master branch:

      1. cd fastai
      2. git fetch upstream
      3. git checkout master
      4. git merge --no-edit upstream/master
      5. git push --set-upstream origin master
    • Step 2: update your feature branch my-cool-feature:

      1. git checkout my-cool-feature
      2. git merge origin/master
    • Step 3: resolve any conflicts resulting from the merge (using your editor or a special merge tool), then git add the files that had conflicts and git commit to conclude the merge.

    • Step 4: push to github the updates to your branch:

      1. git push

      If your PR is already open, github will automatically update its status showing the new commits and the conflict shouldn’t be there any more if you followed the steps above.

    How To Reset Your Forked Master Branch

    If you haven’t been careful to create a branch and committed to the master branch of your forked repository, you will no longer be able to sync it with the original repository without resetting it. And when you later create a branch, it will have issues during the PR, since it will be made against a diverged origin.

    Of course, the brute-force approach is to go to github, delete your fork and fork again. This deletes any work you have done on this fork, including any branches, and there is no way to recover that data, so be very careful if you decide to do that.

    A much safer approach is to reset the HEAD of your forked master with the HEAD of the original repository:

    If you haven’t set up the upstream remote, do it now:

    1. git remote add upstream git@github.com:fastai/REPONAME.git

    and then do the reset:

    1. git fetch upstream
    2. git update-ref refs/heads/master refs/remotes/upstream/master
    3. git checkout master
    4. git stash
    5. git reset --hard upstream/master
    6. git push origin master --force

    Where am I?

    Now that you have the original repository, the forked repository and its branches, how do you know which repository and which branch you are currently in?

    • Which repository am I in?

      1. git config --get remote.origin.url | sed 's|^.*//||; s/.*@//; s/[^:/]\+[:/]//; s/\.git$//'

      e.g.: stas00/fastai

    • Which branch am I on?

      1. git branch | sed -n '/* /s///p'

      e.g.: new-feature-branch7

    • Combined:

      1. echo $(git config --get remote.origin.url | sed 's|^.*//||; s/.*@//; s/[^:/]\+[:/]//; s/\.git$//')/$(git branch | sed -n '/* /s///p')

      e.g.: stas00/fastai/new-feature-branch7
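
The one-liners above can also be kept as shell functions. This is a sketch rather than the guide's own tooling: the names repo_slug and where_am_i are hypothetical, the sed expression is rewritten without the GNU-only \+ so it also runs on other sed implementations, and the branch lookup uses git symbolic-ref instead of parsing git branch output.

```shell
# Hypothetical helper: reduce a remote URL to "user/repo".
# Handles the git@host:user/repo.git and https://host/user/repo.git forms.
repo_slug() {
    printf '%s\n' "$1" |
        sed 's|^.*//||; s|.*@||; s|^[^:/][^:/]*[:/]||; s|\.git$||'
}

# Hypothetical helper: print "user/repo/branch" for the current checkout.
# Fails on a detached HEAD, where there is no current branch.
where_am_i() {
    branch=$(git symbolic-ref --short HEAD) || return 1
    echo "$(repo_slug "$(git config --get remote.origin.url)")/$branch"
}
```

For example, repo_slug git@github.com:stas00/fastai.git prints stas00/fastai.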

    But constantly asking the system to tell you where you are is not very efficient. Why not make it automatic and integrate this into your bash prompt (assuming that you use bash)?

    bash-git-prompt

    Enter bash-git-prompt, which not only tells you which virtual environment you are in and which username, repo, branch you’re on, but it also provides very useful visual indications on the state of your git checkout - how many files have changed, how many commits are waiting to be pushed, whether there are any upstream changes, and much more.

    I currently work on 4 different fastai project repositories and 4 corresponding forks, and several branches in all of them, so I was very lost until I started using this tool. To give you a visual of various prompts I have as of this writing:

    1. (pytorch-dev) /fastai/ci-experiments [fastai/fastai:ci-experiments6]>
    2. (pytorch-dev) /fastai/linkcheck [fastai/fastai:master]>
    3. (pytorch-dev) /stas00/fork [stas00/fastai:master3]>
    4. (pytorch-dev) /fastai/wip [fastai/fastai:master|+2?10·3]>

    The numbers after the branch are modified/untracked/stashed counts. The leading (pytorch-dev) is the currently activated conda env name.

    If you’re not using bash or fish shell, search for forks of this idea for other shells.

    Github Shortcuts

    • show commits by author: ?author=github_username

      You can filter commits by author in the commit view by appending param ?author=github_username.

      For example, the link https://github.com/fastai/fastai/commits/master?author=jph00 shows a list of jph00's commits to the fastai repository.

    • show commits by range: master@{time}..master

      You can create a compare view in GitHub by using the URL github.com/user/repo/compare/{range}. Range can be two SHAs like sha1...sha2 or two branch names like master...my-branch. Range is also smart enough to take time into consideration.

      For example, you can filter a list of commits since yesterday by using a format like master@{1.day.ago}...master. The link https://github.com/fastai/fastai/compare/master@{1.day.ago}...master, for example, gets all commits since yesterday for the fastai repository.

    • show .diff & .patch

      Add .diff or .patch to the URLs of compare view, pull request or commit page to get the diff or patch in text format.

      For example, the link https://github.com/fastai/fastai/compare/master@{1.day.ago}...master.patch gets the patch for all the commits since yesterday in the fastai repository.

    • line linking

      In any file view, when you click one line or multiple lines by pressing SHIFT, the URL will change to reflect your selections. You can tell others to look at a specific line of code, or a specific chunk of code, using just that link.

    • delete a fork

      1. Go to github.com/USERNAME/FORKED-REPO-NAME/
      2. Hit Settings
      3. Scroll down and hit [Delete this repository]

      Replace USERNAME with your github username, and FORKED-REPO-NAME with the repository name.

    Revisions

    relative refs

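    1. master^ = the first parent of master
    2. master^^ = the first parent of the first parent of master (its grandparent)
    3. master~<num> = the commit <num> generations back, following first parents (master~2 is the same as master^^)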
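
To see how these refs resolve, here is a small self-contained sketch that builds a throwaway repository in a temporary directory. The commit messages c1..c3 are made up for the demo, and HEAD is used instead of master so that it does not depend on the default branch name.

```shell
# Throwaway repository with three empty commits (hypothetical messages c1..c3).
demo=$(mktemp -d)
cd "$demo"
git init -q .
for n in 1 2 3; do
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "c$n"
done

git log -1 --format=%s HEAD     # c3
git log -1 --format=%s HEAD^    # c2, the first parent
git log -1 --format=%s HEAD^^   # c1, the first parent's first parent
git log -1 --format=%s HEAD~2   # c1, same commit as HEAD^^
```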

    add

    1. git add [folder/file]

    remove

    1. git rm [folder/file]

    remove a file from the repository only, e.g. remove database.yml that is already checked in but leave the local copy untouched. This is especially handy for removing ignored files that were already pushed, without removing the local copies.

    1. git rm --cached database.yml
    1. git status
    1. git status -s

    push

    1. git push

    dry-run (do everything except actually sending the data)

    1. git push --dry-run

    but it doesn’t show anything useful - see commands below for visual hints of what will happen

    show which files have changed and view the diff compared to the remote master branch HEAD

    1. git diff --stat --patch origin/master

    list of files to be pushed

    1. git diff --stat --cached [remote/branch]

    show code diff of the files to be pushed

    1. git diff [remote repo/branch]

    show full file paths of the files that will change

    1. git diff --numstat [remote repo/branch]

    commit

    1. git commit -a

    -a is crucial, as without it you need to git add every file that has changed!

    There is also git add -A, but be careful using it, as it also stages any untracked files, which is probably not what you want most of the time. Better forget about this option.

    authentication

    cache auth

    1. git config --global credential.helper cache

    adjust caching time

    1. git config --global credential.helper 'cache --timeout=36000'

    update

    1. git pull

    git pull is shorthand for

    1. git fetch
    2. git merge FETCH_HEAD

    display the incoming/outgoing changes before pull/push

    1. git log ^master origin/master
    2. git log master ^origin/master
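
If you only need the number of incoming and outgoing commits, git rev-list can count both sides at once. A minimal sketch, assuming the remote is called origin; the function name ahead_behind is hypothetical.

```shell
# Hypothetical helper: print "<outgoing> <incoming>" commit counts for local
# branch $1 (default: master) relative to its origin/ counterpart.
# Run "git fetch" first so that origin/$1 is current.
ahead_behind() {
    branch=${1:-master}
    git rev-list --left-right --count "$branch...origin/$branch" | tr '\t' ' '
}
```

The first number counts commits that only exist locally (what git push would send), the second counts commits that only exist on origin (what git pull would bring in).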

    search/replace

    How to safely and efficiently search/replace files in git repo using CLI. The operation must not touch anything under .git/

    1. find . -type d -name ".git" -prune -o -type f -exec perl -pi -e 's|OLDSTR|NEWSTR|g' {} \;

    but it touch(1)es all files which slows down git-side

    so we want to do it on files that actually contain the old pattern

    1. grep --exclude-dir=.git -lIr "OLDSTR" . | xargs -n1 perl -pi -e 's|OLDSTR|NEWSTR|g'

    git GUI

    git

    1. git gui

    gitk

    1. gitk --all

    contributors

    show a list of contributors ordered by number of commits. Similar to the contributors view of GitHub.
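
One way to produce such a list is git shortlog with -s (summary only) and -n (sort by number of commits). A minimal sketch; the function name top_contributors is hypothetical.

```shell
# Hypothetical helper: commit counts per author for revision $1 (default HEAD),
# highest first. Naming the revision explicitly matters in scripts: without it,
# git shortlog reads a log from standard input when stdin is not a terminal.
top_contributors() {
    git shortlog -sn "${1:-HEAD}"
}
```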

    search git history

    to find all commits where commit message contains given word, use

    1. git log --grep=word_to_search_for

    to search all of git history for a string

    1. git log -Sword_to_search_for

    this will find any commit that added or removed the string word_to_search_for. Here are a few extra options:

    • -p: will show the diffs. If you provide a file (-p file), it will generate a patch for you.
    • -G: looks for differences whose added or removed line matches the given regexp, as opposed to -S, which “looks for differences that introduce or remove an instance of string”.
    • --all: searches over all branches and tags; alternatively, use --branches[=<pattern>] or --tags[=<pattern>]

    search and exclude certain paths from the results:

    exclude subfolder foo

    1. git log -- . ":(exclude)foo"

    exclude several subfolders

    1. git log -- . ":(exclude)foo" ":(exclude)bar"

    exclude specific elements in that subfolder

    1. git log -- . ":(exclude)foo/bar/file"

    exclude any given file in that subfolder

    1. git log -- . ":(exclude)foo/*file"
    2. git log -- . ":(exclude,glob)foo/*file"

    make exclude case insensitive

    1. git log -- . ":(exclude,icase)FOO"

    which branch contains a specified sha key

    1. git branch --contains SHA

    choose a commit rev from one branch (e.g. PR) and merge it into the current checkout

    1. git show <commit> # check that this is the right rev
    2. git cherry-pick <commit> # merge it into the current checkout
    3. git push

    to merge a range of commits:

    1. git cherry-pick <commit1>..<commitN>

    cherry picking parts of a commit (only sections/hunks and not whole files)

    1. git cherry-pick -n <commit> # get your patch, but don't commit (-n = --no-commit)
    2. git reset # unstage the changes from the cherry-picked commit
    3. git add -p # make all your choices (add the changes you do want)
    4. git commit # make the commit!

    similar to the above 4 commands - interactive picking (-p == --patch)

    1. git checkout -p <commit>

    and if only changes for specific files are wanted:

    1. git checkout -p <commit> -- path/to/file_a path/to/file_b

    cherry-pick another git repo (can use sha1 instead of FETCH_HEAD)

    1. git fetch <remote-git-url> <branch> && git cherry-pick FETCH_HEAD

    abort the started cherry-pick process, which will revert to the previous state

    1. git cherry-pick --abort

    checkout

    checkout a specific commit

    1. git checkout <sha1-or-short-hash>

    check out a specific branch

    1. git clone https://github.com/vidartf/nbdime -b optimize-diff2

    overwrite local changes

    If you want to remove all local changes from your working copy, simply stash them:

    1. git stash push --keep-index

    or if it’s important you can name it

    1. git stash push -m "your message here"

    to merge the local changes saved with ‘git stash push’ after ‘git pull’

    1. git stash pop

    if the merge fails, the entry is not removed from the stash.

    once the merge conflict has been resolved manually, you need to call manually:

    1. git stash drop

    If you don’t need them anymore, you now can drop that stash:

    1. git stash drop

    to override all local changes (this does not require an identity):

    1. git reset --hard
    2. git pull

    or:

    1. git checkout -t -f remote/branch
    2. git pull

    Discard local changes for a specific file

    1. git checkout dirs-or-files
    2. git pull

    maintain current local commits by creating a branch from master before resetting

    1. git checkout master
    2. git branch new-branch-to-save-current-commits
    3. git fetch --all
    4. git reset --hard origin/master

    pull from upstream and accept all changes blindly

    1. git pull -X theirs

    list existing stashes

    1. git stash list

    view stashes:

    latest

    1. git stash show -p

    specific stash

    1. git stash show -p stash@{0}

    show the contents of each stash with one command

    1. git show $(git stash list | cut -d":" -f 1)

    diff against a specific stash

    1. git diff stash@{0}

    diff against a specific stash’s filename

    1. git diff stash@{0} my/file.ipynb

    diff 2 stashes:

      check out nbdime - diffing and merging of Jupyter Notebooks https://nbdime.readthedocs.io/en/stable/

      branches

      git branch removal (when not checkout’ed inside the branch that’s about to be removed)

      1. git branch -d branch_name

      branch delete via github - after the branch has been merged into the upstream master, you can delete the branch in your fork at github.com

      1. go to https://github.com/stas00/fastai/branches

      or go to https://github.com/stas00/fastai/ and click [NN branches] above the [New pull request] button

      2. hit the trash button next to the branch to remove

      list branches that are merged or not yet merged to current branch. It’s a useful check before any merging happens

      1. git branch --merged
      2. git branch --no-merged

      switch back to last branch (like cd -)

      1. git checkout -

      @{-1} is a way to refer to the last branch you were on, and ‘-‘ is shorthand for @{-1}. Commands such as git branch --track mybranch @{-1}, git merge @{-1}, and git rev-parse --symbolic-full-name @{-1} work as expected.

      or:

      1. git difftool -d master branch_name

      to find the diff from the common ancestor of the two branches, use ... instead of ..:

      1. git diff --stat --color master...branch_name

      to compare just specific files

      1. git diff branch1 branch2 -- myfile1.js myfile2.js

      to compare a sub-directory or specific files across different commits

      1. git diff <rev1>..<rev2> -- dir1 file2

      compare two branches in different repos (e.g. original and github fork)

      given 2 checkouts /path/to/repoA and /path/to/repoB

      1. cd /path/to/repoA
      2. GIT_ALTERNATE_OBJECT_DIRECTORIES=/path/to/repoB/.git/objects git diff $(git --git-dir=/path/to/repoB/.git rev-parse --verify HEAD) HEAD

      another way using GUI with meld (apt install meld)

      1. meld /f1/br/stas00/master/ /f1/br/fastai/master

      find the best common ancestor between two branches, usually the branching point:

      1. git merge-base master origin/branch_name

      same, but returns a short rev instead of the long one

      1. git rev-parse --short $(git merge-base master origin/branch_name)

      alternative (doesn’t always work):

      1. git merge-base --fork-point master origin/branch_name

      note that ‘git merge-base’ returns no output once that branch has been merged to master.

      diff between the branching point and the HEAD of the branch

      1. git diff $(git merge-base --fork-point master origin/branch_name)..origin/branch_name

      commits between the branching point and the HEAD of the branch

      1. git log --oneline $(git merge-base --fork-point master origin/branch_name)..origin/branch_name

      find branches the commit is on

      1. git branch --contains <commit>

      find when a commit was merged into one or more branches (git when-merged is not part of core git, it comes from the separate git-when-merged tool).

      1. git when-merged [OPTIONS] COMMIT [BRANCH...]

      some good docs on branching strategies: https://nvie.com/posts/a-successful-git-branching-model/

      reverting/resetting/undoing

      lots of scenarios here: https://blog.github.com/2015-06-08-how-to-undo-almost-anything-with-git/

      revert the last commit

      1. git revert HEAD

      revert everything from the HEAD back to the commit hash 0766c053

      1. git revert --no-commit 0766c053..HEAD
      2. git commit

      this will revert everything from the HEAD back to the commit hash, meaning it will recreate that commit state in the working tree as if every commit since had been walked back. You can then commit the current tree, and it will create a brand new commit essentially equivalent to the commit you “reverted” to.

      (the --no-commit flag lets git revert all the commits at once- otherwise you’ll be prompted for a message for each commit in the range, littering your history with unnecessary new commits.)

      this is a safe and easy way to rollback to a previous state. No history is destroyed, so it can be used for commits that have already been made public.
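
The two-step rollback above can be sketched as a function. The name rollback_to is hypothetical, and the sketch assumes a clean working tree and a range without merge commits (those need the -m flag discussed next).

```shell
# Hypothetical helper: add one new commit that restores the tree of revision $1.
# Needs a clean working tree; history is kept, nothing is rewritten.
rollback_to() {
    git revert --no-commit "$1..HEAD" &&
    git commit -m "Roll back to $1"
}
```

After rollback_to 0766c053 the working tree matches that commit, and the history gains exactly one new commit on top of the existing ones.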

      if merge happened earlier, revert could fail and ask for a specific parent branch via -m flag to specify which mainline to use

      for details: https://stackoverflow.com/questions/5970889/why-does-git-revert-complain-about-a-missing-m-option

      revert your repository to a specific revision

      1. git checkout <rev>

      revert only parts of your repository to a specific revision

      1. git checkout <rev> -- dir1 dir2 file1 file2

      Reset branch’s HEAD to a given commit hash

      If somehow the HEAD of the branch got messed up and moved to some place in master, e.g. when someone merged it into master by mistake, here is how to reset it back. In this example we will use the branch imagenette-noise-lb.

      1. find the last commit that was supposed to be the HEAD, e.g.: https://github.com/fastai/fastai/commit/3ac14751101ff3997bc6b3e26f612d1b6d0ac9ea

        Either use this to help find the right commit:

        1. git log origin/imagenette-noise-lb

        or using github’s branch browse of a given tag ( in this example).

      2. and now reset the branch’s HEAD to it:

        1. git checkout imagenette-noise-lb
        2. git reset --hard 3ac1475
        3. git push --force origin imagenette-noise-lb

      ignore

      to temporarily ignore changes in a certain file, run:

      1. git update-index --assume-unchanged <file>

      track changes again:

      1. git update-index --no-assume-unchanged <file>
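
It is easy to forget which files were flagged this way. A minimal sketch for listing them, relying on git ls-files -v, which prints a lowercase status letter for assume-unchanged files; the function name list_assume_unchanged is hypothetical.

```shell
# Hypothetical helper: list files currently flagged with --assume-unchanged.
# "git ls-files -v" marks such files with a lowercase status letter.
list_assume_unchanged() {
    git ls-files -v | grep '^[a-z]' | cut -c3-
}
```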

      trace and debug

      check which config comes from where

      1. git config --list --show-origin

      display git attributes for a specific path

      1. git check-attr -a dev_nb/001b_fit.ipynb

      more here: https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git

      trace

      1. GIT_TRACE=1 git pull origin master

      very verbose

      1. set -x; GIT_TRACE=2 GIT_CURL_VERBOSE=2 GIT_TRACE_PERFORMANCE=2 GIT_TRACE_PACK_ACCESS=2 GIT_TRACE_PACKET=2 GIT_TRACE_PACKFILE=2 GIT_TRACE_SETUP=2 GIT_TRACE_SHALLOW=2 git pull origin master -v -v; set +x

      different options:

      1. GIT_TRACE for general traces,
      2. GIT_TRACE_PACK_ACCESS for tracing of packfile access,
      3. GIT_TRACE_PACKET for packet-level tracing for network operations,
      4. GIT_TRACE_PERFORMANCE for logging the performance data,
      5. GIT_TRACE_SETUP for information about discovering the repository and environment it's interacting with,
      6. GIT_MERGE_VERBOSITY for debugging recursive merge strategy (values: 0-5),
      7. GIT_CURL_VERBOSE for logging all curl messages (equivalent to curl -v),
      8. GIT_TRACE_SHALLOW for debugging fetching/cloning of shallow repositories.
      9. possible values can include:
      10. true, 1 or 2 to write to stderr,
      11. an absolute path starting with / to trace output to the specified file.

      status and information

      short form log of events

      1. git log --oneline

      show a graph of the tree, showing the branch structure of merges

      1. git log --graph --decorate --pretty=oneline --abbrev-commit

      add --all to show all branches

      show all the commits in a branch that are not in HEAD. e.g. show all commits that are in master but not merged into the current feature branch yet.

      1. git log ..master

      overriding git configuration

      1. git -c http.proxy=someproxy clone https://github.com/user/repo.git
      2. git -c user.email=you@example.com -c user.name='Your Name'

      override git diff:

      1. git diff --no-ext-diff

      no such option exists for merge drivers.

      fixing things

      to fix a bad merge: https://stackoverflow.com/questions/307828/how-do-you-fix-a-bad-merge-and-replay-your-good-commits-onto-a-fixed-merge

      “fatal: Unknown index entry format 61740000”.

      when your index is broken you can normally delete the index file and reset it.

      1. rm -f .git/index
      2. git reset

      or you clone the repo again.

      merge strategies

      tell git not to merge certain files (i.e. keep the local version) by defining merge filter ‘ours’.

      https://stackoverflow.com/a/5895890/9201239

      1) add to .gitattributes:

      1. database.xml merge=ours

      2) set git merge driver to do nothing but return success

      1. git config merge.ours.name '"always keep ours" merge driver'
      2. git config merge.ours.driver 'touch %A'
      3. git config merge.ours.driver true

      working and updating the local checkout with upstream changes

      1. clone the remote repository
      2. git checkout -b my_new_feature
      3. ..work and commit some stuff
      4. git rebase master
      5. ..work and commit some stuff
      6. git rebase master
      7. ..finish the feature, commit
      8. git rebase master
      9. git checkout master
      10. git merge --squash my_new_feature
      11. git commit -m "added my_new_feature"
      12. git branch -D my_new_feature

      Aliases

      best to add manually with editor, but can use CLI

      1. .gitconfig
      2. [alias]

      e.g.

      1. git config --global alias.co checkout
      2. git config --global alias.br branch
      3. git config --global alias.ci commit
      4. git config --global alias.st status

      unstage a file (equivalent of: git reset HEAD -- fileA):

      1. git config --global alias.unstage 'reset HEAD --'

      see last commit

      1. git config --global alias.last 'log -1 HEAD'

      use ! for non-git sub-commands in aliases, e.g.:
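
A minimal sketch of such an alias. The alias name authors is made up for illustration; the point is only that the leading ! hands the rest of the line to the shell.

```shell
# Hypothetical alias "authors": the leading ! makes git run the rest through
# the shell, so pipes and non-git programs such as sort can be used.
git config --global alias.authors '!git log --format=%an | sort -u'
```

After this, git authors prints each commit author of the current branch once.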

      Miscellaneous Recipes

      • - visual teaching with exercises