Version Control with Git and GitHub

What Version Control is and why you should care

Version control stores a complete history of the codebase, including information about what changed, when and by whom. This has many advantages:

Git, GitHub and related concepts

Standard Git operations

Git is a command-line tool or, to be precise, a collection of such tools, which we describe in this section. It is worth noting, however, that various graphical interfaces give you access to most of the same functions. You can find many of those, many free, here. GitKraken in particular looks quite nice. We start with a brief list of the various "verbs":

Some of these are more advanced. We will now examine them in more detail. This section is broken into parts:

Starting a repository

There are two main ways to start a repository. You can turn any directory on your computer into a repository via git init:

git init

This will turn the current directory into a repository. The other way to start a repository is to git clone an existing repository. For instance the following line will clone the GitHub repository hosting this file into a directory called tutorial-git:

git clone https://github.com/Hanover-CS/tutorial-version-control.git tutorial-git

This will download the entire history of the repository into that folder, and also set up a remote link (named origin by default) from our local clone to the GitHub repository. This is typically the easiest way to start: create a repository on GitHub first, then clone it to your computer.
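As a minimal sketch of both ways to start, the following script creates a repository with git init and then clones it. It assumes a scratch directory from mktemp, and a local path stands in for the GitHub URL; the identity values and the variable names src and copy are placeholders for illustration only.

```shell
# Way 1: turn a (scratch) directory into a repository.
src="$(mktemp -d)"
cd "$src"
git init -q
# Placeholder identity so the commit below works on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial Commit"

# Way 2: clone an existing repository (a local path stands in for a URL).
copy="$(mktemp -d)/tutorial-git"
git clone -q "$src" "$copy"

# The clone carries the history and a remote named origin pointing at the source.
clone_commits="$(git -C "$copy" rev-list --count HEAD)"
origin_url="$(git -C "$copy" remote get-url origin)"
echo "commits in clone: $clone_commits, origin: $origin_url"
```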

Reviewing repository state

A couple of tools help you determine your current repository state.

Status

First off is git status.

$ git status
On branch gh-pages
Your branch is up-to-date with 'origin/gh-pages'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    modified:   README.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   README.md

This tells us some key information:

There is also a "short" version of the output:

$ git status -s
MM README.md

The first M says that there are changes in this file that are staged, the second says that there are also changes that are not staged yet.
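To see where an MM status comes from, here is a small sketch that reproduces it in a scratch repository. The directory, the identity values and the file contents are assumptions made for the example.

```shell
# Scratch repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "first version" > README.md
git add README.md
git commit -q -m "Initial Commit"

# One change that we stage ...
echo "a staged change" >> README.md
git add README.md
# ... and a further change that we leave unstaged.
echo "an unstaged change" >> README.md

short_status="$(git status -s)"
echo "$short_status"   # prints: MM README.md
```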

Log

git log shows us the history of recent commits. It has lots of options that you can find out by doing git log --help or by reading the documentation. A particularly nice option is the "oneline" logs:

$ git log --oneline
3fcd19b Create gh-pages branch via GitHub
8e8ec9e Creating git sections
31b4fa1 Create gh-pages branch via GitHub
ad68152 Start on state section
35362d7 Initial Commit

This simply tells us, in reverse chronological order (from the most recent commit to the oldest), each commit's abbreviated hash and its message.

You can do a lot more with log, like restrict it to only showing commits that changed a particular file:

$ git log --oneline -- README.md
8e8ec9e Creating git sections
ad68152 Start on state section
35362d7 Initial Commit
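The path filter can be tried out with a sketch like the one below. It assumes a scratch repository with a second, hypothetical file called notes.txt, and it uses --format=%s to print only the commit messages so that the result does not depend on the hashes.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"

# Three commits; only the first and the last touch README.md.
echo "one" > README.md
git add README.md
git commit -q -m "Initial Commit"
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "two" >> README.md
git add README.md
git commit -q -m "Update readme"

all_commits="$(git rev-list --count HEAD)"
# Everything after the -- is treated as a path, not as a branch or commit.
readme_subjects="$(git log --format=%s -- README.md)"
echo "$all_commits commits in total; those touching README.md:"
echo "$readme_subjects"
```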

Diff

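git diff lets us see the differences between two states of the repository. With no arguments it compares the working directory against the staging area; you can also give it commits to compare. Its output might look something like this: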

$ git diff
diff --git a/README.md b/README.md
index 953c5ed..ab7f849 100644
--- a/README.md
+++ b/README.md
@@ -138,6 +138,30 @@ MM README.md
 this is the stuff that was in the file before

-this line was there before and is now removed. The minus in front tells us that.
+this is a new line we are adding.
+This too. And the empty line right below. The plus tells us these are new.
+
 this line existed before

A lot of this information is not for human consumption, but you can see that it describes what was added and where in the file it was added: all the lines starting with a plus are new, all the ones starting with a minus are removed, and the others were there before. Most graphical programs have a nice color-rich way of showing these differences.
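Which differences git diff reports depends on what it is asked to compare. The sketch below, run in a scratch repository with made-up file contents, contrasts plain git diff (working directory against the staging area) with git diff --staged (staging area against the last commit).

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "line one" > README.md
git add README.md
git commit -q -m "Initial Commit"

echo "staged line" >> README.md
git add README.md                 # this change is now in the staging area
echo "unstaged line" >> README.md # this one exists only in the working directory

unstaged_diff="$(git diff)"           # working directory vs staging area
staged_diff="$(git diff --staged)"    # staging area vs last commit
echo "$unstaged_diff"
echo "$staged_diff"
```

Each addition shows up with a leading plus in only one of the two outputs.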

Making Commits

Add

You can stage files for commit via git add:

$ git add README.md

This line adds all the changes that were made in README.md to the staging area, to be committed.

If you want to add all the current changes from all files in the current directory and below, you can use a dot to refer to the current directory:

$ git add .

Finally, if you want to add only parts of a file, you can do an interactive add, which brings up a mini interface. You will want to read the documentation on how that works.

$ git add -i

Many GUI clients will offer you a way to do that directly. We will be looking at how we can do this with GitKraken in class.
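Interactive adds are hard to show in a script, but the effect of staging selectively can be sketched with whole files. The example below assumes a scratch repository and a hypothetical second file, notes.txt; only README.md is staged, and git diff --name-only is used to list what ended up where.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "docs" > README.md
echo "notes" > notes.txt
git add .
git commit -q -m "Initial Commit"

# Change both files, but stage only one of them.
echo "more docs" >> README.md
echo "more notes" >> notes.txt
git add README.md

staged="$(git diff --staged --name-only)"   # files in the staging area
unstaged="$(git diff --name-only)"          # files changed but not staged
echo "staged: $staged, not staged: $unstaged"
```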

Commit

To commit the staged changes, we use git commit:

$ git commit -m "Enter message here"

You can also add the staged changes onto the last commit, instead of creating a new one. Very useful if you forgot to include a file:

$ git commit --amend

Be advised that this replaces the last commit you made with a new one. If you have already pushed/synced that commit to the remote repository, bad things can happen, because your history no longer matches the one on the remote. Never make changes to commits that have been synced to a remote!
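The "forgot to include a file" case might look like the following sketch. It assumes a scratch repository and a hypothetical forgotten file main.txt, and uses --no-edit to keep the original message. Comparing the hash before and after shows that the amended commit is a new commit, which is why amending after a push causes trouble.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "docs" > README.md
echo "code" > main.txt

# Oops: we only staged README.md before committing.
git add README.md
git commit -q -m "Add project files"
before="$(git rev-parse HEAD)"

# Stage the forgotten file and fold it into the last commit.
git add main.txt
git commit -q --amend --no-edit
after="$(git rev-parse HEAD)"

commit_count="$(git rev-list --count HEAD)"
echo "still $commit_count commit, but the hash changed: $before -> $after"
```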

Reset

Resetting allows you to back out of a commit you made that you did not mean to make. It is in general a dangerous operation, and you should only use it if you are sure you know what you are doing. There are three kinds of resets:

$ git reset --mixed

This is the "mixed" reset, which is the default. This will "unstage" all files that were staged for commit. It does not change or delete the files, it simply no longer marks them as ready for commit. We say that it changes the index, but it does not change the working directory.

$ git reset --soft HEAD~

This is a "soft" reset. It does not change your index or your working directory. Files that were staged remain staged, and files that have changes keep those changes. But it does change the commit that is at the HEAD, to instead point to the commit called HEAD~, which is a notation for the commit before HEAD. Essentially this undoes the last commit. It does not lose the changes that the last commit contained. It simply leaves them in the staging area, ready to be committed again. It would be as if you had not run git commit yet.
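The difference between the soft and the mixed reset can be observed through git status -s. The sketch below assumes a scratch repository with two commits to README.md. After the soft reset the undone change is still staged (M in the first column); after the mixed reset it is only in the working directory (M in the second column).

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "first version" > README.md
git add README.md
git commit -q -m "Initial Commit"
echo "second version" >> README.md
git add README.md
git commit -q -m "Update readme"

# Soft reset: move HEAD back one commit, keep index and working directory.
git reset -q --soft HEAD~
soft_status="$(git status -s)"
echo "after --soft:  '$soft_status'"

# Mixed reset (the default): also reset the index, keep the working directory.
git reset -q --mixed
mixed_status="$(git status -s)"
echo "after --mixed: '$mixed_status'"
```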

$ git reset --hard

This is a destructive step. It discards everything in the index and every uncommitted change to tracked files in the working directory, and sets both to match the HEAD (or another commit if you specify that at the end of the line). You will lose all the changes you have made. On rare occasions, this is the right thing to do. But it should be avoided. It is however useful if you want to "rewind" the current branch to a previous commit, because you realized the later commits do not belong in it. The documentation offers the following example:

$ git branch experiment
$ git reset --hard HEAD~3
$ git checkout experiment

So what happens here is that we realized that our last 3 commits should have gone in a new branch, called experiment, instead of our current branch. The first line creates that new branch based off the current commits. The second line resets our current branch to go back 3 commits. The third line then switches us to the experiment branch to continue working on that.
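To check that no commits are lost in this maneuver, the three commands can be run in a scratch repository. The sketch below assumes four made-up commits and reads the name of the starting branch from Git, since the default branch name varies between installations.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
for n in 1 2 3 4; do
  echo "change $n" >> README.md
  git add README.md
  git commit -q -m "Commit $n"
done
start_branch="$(git symbolic-ref --short HEAD)"

git branch experiment            # experiment points at all 4 commits
git reset -q --hard HEAD~3       # rewind the current branch by 3 commits
git checkout -q experiment       # continue working on experiment

start_commits="$(git rev-list --count "$start_branch")"
experiment_commits="$(git rev-list --count experiment)"
echo "$start_branch: $start_commits commit(s), experiment: $experiment_commits"
```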

Managing branches

There are a number of different operations on branches. A branch is nothing more than a chain of commits, marked by a pointer to the newest commit in the branch. As far as Git is concerned the only information in the branch is its name and that pointer to the newest commit.

Creating a branch

We create a new branch via the git branch command:

$ git branch test

This creates a branch based off the current commit, and calls it test. You could then use git checkout to switch to it. We could create a new branch and switch to it all at once with the command:

$ git checkout -b test

You can have the branch based off a different commit by specifying that commit as an additional argument.
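As a small sketch of basing a branch on an earlier commit, the following creates three commits in a scratch repository and then starts one branch two commits back and another one commit back. The branch name test2 is made up for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "change $n" >> README.md
  git add README.md
  git commit -q -m "Commit $n"
done

# Create test two commits behind the current one, without switching to it.
git branch test HEAD~2
# Create test2 one commit behind and switch to it in a single step.
git checkout -q -b test2 HEAD~1

test_commits="$(git rev-list --count test)"
test2_commits="$(git rev-list --count test2)"
current="$(git symbolic-ref --short HEAD)"
echo "test: $test_commits, test2: $test2_commits, now on: $current"
```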

Switching to a branch

We use git checkout to switch to a branch:

$ git checkout test

Viewing the different branches

We can get information about the existing branches via the -v switch in git branch:

$ git branch -v
* gh-pages 45e13b6 [ahead 1] Update readme
  master   35362d7 Initial Commit
  test     45e13b6 Update readme

The asterisk marks the current branch. We see the name of the branch, the latest commit in it, the message of that commit, and in brackets the possible relation of the branch with a remote branch.

There is also a -vv option that shows us more about the relation to remote branches, along with the names of those branches.

$ git branch -vv
* gh-pages 45e13b6 [origin/gh-pages: ahead 1] Update readme
  master   35362d7 [origin/master: gone] Initial Commit
  test     45e13b6 Update readme

Deleting a branch

You can delete a branch via the -d switch:

$ git branch -d test

This does NOT actually delete any commits; it just removes the pointer to the commit that used to be at the front of the test branch. Commits may however be deleted later, when Git does its self-cleaning, which is a form of garbage collection: any commits that are not reachable in some form starting from one of the branches are subject to deletion. Deleting a branch that has not been merged into another branch may put a number of commits in this situation, which is why -d refuses to delete such a branch. The -D switch forces the deletion.
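Git guards against losing commits this way: the -d switch refuses to delete a branch whose commits have not been merged, while the capital -D switch deletes it regardless. A sketch in a scratch repository, with made-up commits:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "base" > README.md
git add README.md
git commit -q -m "Initial Commit"
start_branch="$(git symbolic-ref --short HEAD)"

# A commit that exists only on the test branch.
git checkout -q -b test
echo "experiment" >> README.md
git add README.md
git commit -q -m "Unmerged work"
git checkout -q "$start_branch"

# -d refuses because test is not merged into the current branch.
if git branch -d test 2>/dev/null; then
  safe_delete="deleted"
else
  safe_delete="refused"
fi
echo "git branch -d: $safe_delete"

# -D removes the pointer anyway; the commit becomes unreachable.
git branch -q -D test
remaining="$(git branch --list test)"
```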

Merging

There are a number of situations when merging is called for, where you have two branches that have diverged in some way and you want to bring them back together. There are two common situations:

This is a complicated situation, with many possible solutions, each with its tradeoff. We will discuss them extensively in the advanced sections.

Working with remotes

Working with remote repositories is a key element of Git, which focuses on "distributed version control". Here we discuss these key elements.

A remote repository is represented by a special part of your repository that contains a copy of another repository. You can fetch information from that other repository and store it in the "remote repository", which really behaves like a branch in your local repository. You can then decide how best to merge those changes with the work you have been doing in your local repository.

Viewing the remotes

You can see the remote repositories you have access to via git remote:

$ git remote -v
origin  https://github.com/Hanover-CS/tutorial-version-control.git (fetch)
origin  https://github.com/Hanover-CS/tutorial-version-control.git (push)

You see here in our case that there is a remote called origin, which points to a GitHub repository. You will notice that there are two different addresses, one for fetching/pulling and one for pushing. While the addresses for those two are typically the same, in certain workflows they are not.
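One workflow where the two addresses differ is fetching from a shared repository while pushing to a personal copy. The sketch below only edits the configuration of a scratch repository and never contacts a server; both example.com URLs are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical addresses: fetch from a shared project, push to a personal fork.
git remote add origin https://example.com/team/project.git
git remote set-url --push origin https://example.com/me/project.git

fetch_url="$(git remote get-url origin)"
push_url="$(git remote get-url --push origin)"
git remote -v
```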

You can also view the branches from the remote that you have decided to work with locally, via something like:

$ git remote show origin
* remote origin
  Fetch URL: https://github.com/Hanover-CS/tutorial-version-control.git
  Push  URL: https://github.com/Hanover-CS/tutorial-version-control.git
  HEAD branch: gh-pages
  Remote branch:
    gh-pages tracked
  Local branches configured for 'git pull':
    gh-pages merges with remote gh-pages
    master   merges with remote master
  Local ref configured for 'git push':
    gh-pages pushes to gh-pages (fast-forwardable)

This gives us information about which branches are set up to work with git pull and which are set up to work with git push, along with some other information.

The branches can be accessed with something like origin/gh-pages and in many ways behave like normal branches. We can for instance review the work in such a branch with:

$ git log origin/gh-pages --oneline

Tracking a remote branch

If you want to start tracking a new remote branch, you can do so with:

$ git checkout --track origin/newbranchName

This will create a new branch called newbranchName and set it to track the remote branch, so we can easily push and pull.
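Tracking can be tried without a server by cloning a local repository. In the sketch below, the directories server and local are stand-ins for GitHub and your own machine. After the checkout, the new local branch reports origin/newbranchName as its upstream.

```shell
base="$(mktemp -d)"

# A stand-in "server" repository with an extra branch.
git init -q "$base/server"
cd "$base/server"
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "readme" > README.md
git add README.md
git commit -q -m "Initial Commit"
git branch newbranchName               # created without switching to it

# Our clone sees it as origin/newbranchName, but has no local branch yet.
git clone -q "$base/server" "$base/local"
cd "$base/local"
git checkout -q --track origin/newbranchName

current="$(git symbolic-ref --short HEAD)"
upstream="$(git rev-parse --abbrev-ref "newbranchName@{upstream}")"
echo "on $current, tracking $upstream"
```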

Pushing and Pulling

To push your commits to the remote branch, use git push. You can optionally specify the repository (if you don't, Git uses the one that the current branch is tracking) and a specific branch to push:

$ git push origin master

This will only work if the remote branch isn't ahead of the local branch. If someone has pushed some changes to the remote in the meantime, you will be asked to fetch those changes first and incorporate them into your work, before pushing.

In order to bring new changes in, you have to "fetch" the remote branch:

$ git fetch origin

This will update the "local" remote branch with any changes from the server, but it does not yet merge these changes into your local branch. In other words, this updates origin/gh-pages but not our gh-pages branch. That requires a further merge. What you want most of the time is a git pull:
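The gap between fetching and pulling can be made visible by counting commits. The following sketch uses a local directory as a stand-in for the server; all names and contents are assumptions. After the fetch only origin's copy of the branch has the new commit; the pull then brings the local branch up to date.

```shell
base="$(mktemp -d)"
git init -q "$base/server"
cd "$base/server"
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "readme" > README.md
git add README.md
git commit -q -m "Initial Commit"

git clone -q "$base/server" "$base/local"

# A new commit appears on the server after we cloned.
echo "more" >> README.md
git add README.md
git commit -q -m "Server change"

cd "$base/local"
branch="$(git symbolic-ref --short HEAD)"

git fetch -q origin
local_after_fetch="$(git rev-list --count "$branch")"
remote_after_fetch="$(git rev-list --count "origin/$branch")"
echo "after fetch: local has $local_after_fetch, origin/$branch has $remote_after_fetch"

git pull -q origin
local_after_pull="$(git rev-list --count "$branch")"
echo "after pull:  local has $local_after_pull"
```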

$ git pull origin

This fetches and then merges. Oftentimes what you want instead is a rebase:

$ git pull --rebase

This is particularly useful: say that you created some commits, and in the meantime someone pushed their changes to the remote repository. Your changes have nothing to do with theirs, so if you had pulled their changes first before creating your commits you would have had a nice linear structure to the commits. But instead you now have two diverging lines of commits. What "rebasing" will do is fetch the remote changes, then "replay" your changes as if they had happened after those remote changes. This way you can maintain the linear structure of the commits. Rebasing is an important technique, and we will discuss it further in the advanced sections.
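The scenario just described can be replayed locally. In this sketch a directory called server stands in for the remote, and the two unrelated changes touch hypothetical files theirs.txt and ours.txt. After git pull --rebase the history is a straight line with our commit on top and no merge commit.

```shell
base="$(mktemp -d)"
git init -q "$base/server"
cd "$base/server"
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"
echo "readme" > README.md
git add README.md
git commit -q -m "Initial Commit"

git clone -q "$base/server" "$base/local"

# Someone else adds a commit on the server ...
echo "their work" > theirs.txt
git add theirs.txt
git commit -q -m "Server change"

# ... while we commit unrelated work in our clone.
cd "$base/local"
git config user.name "Example User"
git config user.email "user@example.com"
echo "our work" > ours.txt
git add ours.txt
git commit -q -m "Local change"

# Fetch the server commit, then replay our commit on top of it.
git pull -q --rebase

history="$(git log --format=%s)"
merges="$(git rev-list --merges --count HEAD)"
echo "$history"
echo "merge commits: $merges"
```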

Standard GitHub utilities

On GitHub you mostly review commits and create and discuss issues.

You should create a GitHub account and treat it as part of your resume. Setting up an account is fairly straightforward and we will not cover it further.

You will also likely need to generate SSH keys, as described here.

GitHub has extensive documentation. We will only cover some basics here.

When you look at a repository for a project in GitHub, the main page contains a number of different tabs:

Reviewing code and commits in GitHub

The Code tab allows you to navigate your codebase. By default it shows what happens in the current branch, but you can change that. There are a number of actions you can take from the Code tab:

GitHub Issues

The Issues tab allows us to work with issues and set tasks. To begin with you see a list of open issues, and you can filter and sort the list in various ways, as well as globally operate on multiple issues.

GitHub Projects

TODO

More advanced processes

TODO