Collaborating on a project hosted on GitHub
Abstract
Git is a powerful version control system that allows you to record, access, and restore the history of projects.
Once remotes are set up on the internet or on another network, Git also becomes a mighty collaboration tool.
In this workshop, we will use the popular online Git repository hosting site GitHub to practice a collaboration workflow typical of many research teams.
Software requirements
1 - Properly configured Git
You can download Git here if you are on Windows and here if you use macOS or Linux.
These minimum configurations should be set properly:
- your user name,
- your email address,
- your preferred text editor,
- line-ending handling (core.autocrlf) appropriate for your operating system.
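The four settings above correspond to the following commands. This is a minimal sketch. The name, email address, and editor (nano) are placeholders to replace with your own. Set core.autocrlf to input on macOS or Linux, and to true on Windows.

```shell
# Placeholders: replace with your own name and email address
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Preferred text editor (nano is only an example)
git config --global core.editor "nano"

# Line endings: "input" on macOS/Linux; use "true" instead on Windows
git config --global core.autocrlf input

# Review the resulting global configuration
git config --global --list
```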
2 - GitHub account
- If you don't already have one, create a free GitHub account.
- Create a personal access token or, if you would rather not type your username and token every time, set up SSH for your account.
Prerequisites
Basic knowledge of Git:
- familiarity with the concept of staging area,
- experience with staging and committing,
- basic knowledge of branches.
Collaborating on a project hosted on GitHub
Here is our scenario today:
You just started a collaboration with a research team outside UBC for one of your thesis chapters.
You have a first meeting on Zoom and during that meeting, one of your new collaborators tells you casually:
I just added you to our repo. Feel free to push.
But… horror… while you have used Git a little bit before, you have no idea how to collaborate with them this way.
Today's workshop will guide you through this so that you know how to handle collaborations through GitHub without panic (or embarrassment 😉). At the bottom of this page, you will also find two extra sections for other common collaboration scenarios, although we won't cover these today.
We will use a toy repository to practice. It is hosted at https://github.com/prosoitos/git_workshop_collab.
1. Clone the repository
First of all, you need to clone the repository, that is, get a copy of it on your local machine. A repository is a project put under version control: it includes the history of all versioned files.
cd /suitable/location/
git clone https://github.com/prosoitos/git_workshop_collab.git
# enter your GitHub username and token when prompted
# alternatively, if you have set SSH for GitHub
git clone git@github.com:prosoitos/git_workshop_collab.git
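Git accepts a plain directory path wherever it expects a repository address, so you can rehearse cloning without network access. The sketch below assumes only a working Git installation. It builds a throwaway repository in a temporary directory and makes a bare copy of it. That copy, the hypothetical git_workshop_collab.git, stands in for GitHub. The sketch then clones it the same way you cloned from GitHub.

```shell
set -e
# Throwaway identity so the demo commit works without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# A tiny project with one commit (hypothetical content)
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
echo "This is the shared project" > README
git add README
git commit -q -m "Initial commit"
cd "$WORK"

# A bare copy (history only, no working files) plays the part of GitHub
git clone -q --bare seed git_workshop_collab.git

# Clone it as you would clone from GitHub, with a path instead of a URL
git clone -q "$WORK/git_workshop_collab.git"
cd git_workshop_collab
ls
git remote -v
```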
2. Inspect your new repository
Let's explore this new repo to understand its structure and see what your new collaborators have done before inviting you to the project.
If you run ls, you should see that the repo you just cloned was added to your current working directory.
Now, enter it:
cd git_workshop_collab
Project files
Have a look at the files that your collaborators have created. You can use ways you are familiar with such as Windows Explorer, Finder, etc., or you can use the command line:
ls
Since this is a Git repository, it contains a .git subdirectory. In Unix-like systems, dot files and dot directories are hidden. To see hidden files and directories, you need to add the -a (all) flag:
ls -a
Project history
It might be useful to have a look at the history of this project to get a feel for your collaborators' workflow.
For this, you can use git log followed by various flags to customize the output and make it more readable.
For instance:
git log --graph --oneline --all
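The --graph option is easier to appreciate on a history that branches. This hypothetical practice repository (all names are illustrative) gets one commit on main. It then gets one commit on a feature branch and one more on main. Finally, it runs the same git log command:

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"

# Diverge: one commit on a feature branch...
git checkout -q -b feature
echo "feature work" >> file.txt
git commit -q -am "Work on feature"

# ...and one on main
git checkout -q main
echo "other work" > other.txt
git add other.txt
git commit -q -m "Work on main"

# --all includes every branch; --graph draws how they relate
git log --graph --oneline --all
```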
To view the various flags for the git log command, you can type:
man git-log
# If you are on Windows, use instead:
git help log
Project remote
Your new repository is still closely associated with the repository on GitHub. In fact, the repo on GitHub is now the remote for your repo and that remote was automatically named origin by Git.
What are remotes, really?
Remotes are copies of a project that reside outside it and are connected to it so that data can be synced back and forth. "Outside" can be anywhere, including on an external drive, or even on the same machine. If you want your remotes to serve as backups, you want them outside your machine. And if you want your remotes to allow for collaboration, you want them on a network your collaborators have access to. One option, of course, is the internet.
A project can have several remotes. Each one is located by an address (or by a path, if it is local).
A number of online Git repository managers have become popular remote hosting sites. These include GitHub, GitLab, and Bitbucket.
You can list the remotes of a project with:
git remote
Here, our project has only one remote called origin.
To also see the address of the remote, add the -v (verbose) flag:
git remote -v
The GitHub repository now serves as a syncing hub between your local repository (that you just cloned) and that of each of your collaborators.
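Since remotes can live anywhere, one way to see several of them at once is to add a second remote on your own machine, for instance a backup on an external drive. In this sketch, temporary directories stand in for both GitHub (github.git) and the drive (backup.git). These names are hypothetical.

```shell
set -e
# Throwaway identity so the demo commit works without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the GitHub repository, with one commit
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Shared data" > data.txt
git add data.txt
git commit -q -m "Add data"
cd "$WORK"
git clone -q --bare seed github.git

# Your clone: its first remote is "origin"
git clone -q github.git project
cd project

# A second remote on the same machine, e.g. a backup on an external drive
git init -q --bare "$WORK/backup.git"
git remote add backup "$WORK/backup.git"
git push -q backup main

git remote      # lists backup and origin
git remote -v   # each name with its fetch and push address
```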
Managing remotes
You can rename a remote with:
git remote rename <old-remote-name> <new-remote-name>
You can delete a remote with:
git remote remove <remote-name>
You can change the url of a remote with:
git remote set-url <remote-name> <new-url> [<old-url>]
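You can try these three commands on a throwaway repository. In this sketch, the remote names and the old-location.git and new-location.git paths are hypothetical. The sketch checks the new address with git remote get-url, which requires Git 2.7 or later. Note that git remote remove only deletes the connection (and the matching remote-tracking branches) from your local repository. The repository at that address is left untouched.

```shell
set -e
WORK=$(mktemp -d)
cd "$WORK"

# Two empty bare repositories standing in for an old and a new address
git init -q --bare old-location.git
git init -q --bare new-location.git

git init -q project
cd project

git remote add backup "$WORK/old-location.git"
git remote rename backup usb-backup                      # backup -> usb-backup
git remote set-url usb-backup "$WORK/new-location.git"   # point it elsewhere
URL_AFTER=$(git remote get-url usb-backup)
echo "usb-backup now points to: $URL_AFTER"

git remote remove usb-backup
git remote   # prints nothing: the project has no remotes left
```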
3. Keep the project up to date
As you work on this collaboration, you will have to download changes made by your collaborators to the project in order to keep your local copy up to date.
To download new changes (new commits) from the remote, you have two options: git fetch and git pull.
Fetching changes
Fetching downloads any data you don't already have from your remote into your local clone.
git fetch <remote-name>
The branches on the remote are now accessible locally as <remote-name>/<branch> (these are called remote-tracking branches). You can inspect them or merge them into your local branches.
Example: To fetch from your new GitHub remote, you would run:
git fetch origin
Merging changes
After fetching the changes, you need to incorporate (merge) them into your local branch. This is done with:
git merge FETCH_HEAD
FETCH_HEAD is a short-lived reference, stored in .git/FETCH_HEAD, that Git rewrites on every fetch. It records the tips of the branches that were just fetched. It also marks the one corresponding to the upstream of your current branch as the branch to merge. git merge FETCH_HEAD therefore merges that remote branch into your current branch.
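To watch fetch and merge happen without touching the real project, you can simulate two collaborators on one machine. In this sketch (all names are hypothetical), a local bare repository, github.git, stands in for GitHub. Two clones, you and colleague, play the collaborators. After git fetch, origin/main has moved but your main has not. git merge FETCH_HEAD then brings the new commit in:

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the GitHub repository, seeded with one commit
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Shared notes" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed github.git

# Two collaborators clone the same remote
git clone -q github.git you
git clone -q github.git colleague

# Your colleague commits and pushes
cd "$WORK/colleague"
echo "Results from the colleague" >> notes.txt
git commit -q -am "Add colleague's results"
git push -q

# You fetch: origin/main moves, your main does not (yet)
cd "$WORK/you"
git fetch origin
git log --oneline main..origin/main   # the commit waiting to be merged

# Merging FETCH_HEAD brings that commit into your current branch
git merge FETCH_HEAD
```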
Pulling changes
Pulling does both of these at once: it fetches the data and merges the changes into your current branch.
git pull <remote-name> <branch>
Example
git pull origin main
When you clone a repository, the branch that gets checked out (here main) is set to track its remote counterpart (origin/main), so in our case, you can simply run:
git pull
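To check which remote branch a local branch tracks, run git branch -vv: it shows the tracked branch in square brackets. This tracking information is what lets git pull work without arguments. The sketch below uses the same kind of offline setup, with hypothetical names throughout:

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Offline stand-in for GitHub, cloned by you and by a colleague
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Shared notes" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed github.git
git clone -q github.git you
git clone -q github.git colleague

# Cloning set your main to track origin/main: look for "[origin/main]"
git -C "$WORK/you" branch -vv

# Your colleague pushes a new commit...
cd "$WORK/colleague"
echo "More notes" >> notes.txt
git commit -q -am "Add more notes"
git push -q

# ...and a plain git pull fetches it and merges it into your main
cd "$WORK/you"
git pull -q
```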
4. Work on the project
Now, it's time to start working on the project.
Once you have made changes to the project (e.g. you edited or added some files, staged the changes, and created new commits), you will have to upload those changes to the remote.
Review of Git basics: staging and committing
After you make changes to a project, you first need to stage them with:
git add <some-files-or-sections-of-files>
Then you create a commit:
git commit -m "<some-message-describing-the-commit>"
For an introduction to Git, you can have a look at our Git intro course, in particular the section explaining the functioning of Git.
Uploading commits to the remote is called pushing and is done with:
git push <remote-name> <branch-name>
To push your branch main to the remote origin:
git push origin main
Again, because the cloning process associates your local branch with its remote equivalent, you can simply run:
git push
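You can rehearse the whole cycle (stage, commit, push) offline, with a local bare repository standing in for GitHub. In this sketch (hypothetical file and directory names), git status -sb shows that your branch is one commit ahead of origin/main before the push. After the push, it is level with it:

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Offline stand-in for GitHub, and your clone of it
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Shared notes" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed github.git
git clone -q github.git you

cd "$WORK/you"
echo "My analysis" > analysis.txt
git add analysis.txt                # stage
git commit -q -m "Add analysis"     # commit

git status -sb   # "## main...origin/main [ahead 1]": one commit not pushed yet
git push -q      # same as: git push origin main
git status -sb   # "## main...origin/main": nothing left to push
```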
If the project changed upstream since you last synced (i.e. if one of your collaborators pushed changes of their own onto the remote), Git will reject your push. You first have to pull (or fetch and merge) their changes as we saw above.
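The sketch below reproduces this situation offline (hypothetical names; a local bare repository stands in for GitHub). Your colleague pushes first and your own push is rejected. Fetching and merging their commit then lets your push go through. The --no-edit flag simply accepts Git's default merge message instead of opening an editor.

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Offline stand-in for GitHub, cloned by you and by a colleague
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Shared notes" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed github.git
git clone -q github.git you
git clone -q github.git colleague

# Your colleague pushes first
cd "$WORK/colleague"
echo "Colleague's part" > colleague.txt
git add colleague.txt
git commit -q -m "Add colleague's part"
git push -q

# Meanwhile, you commit on your side without syncing
cd "$WORK/you"
echo "My part" > mine.txt
git add mine.txt
git commit -q -m "Add my part"

# The remote has a commit you don't have, so this push is refused
if git push -q 2>/dev/null; then
  echo "Unexpected: the push was accepted" >&2
  exit 1
fi
echo "Push rejected: integrate the remote changes first"

# Fetch and merge their commit, then push again
git fetch -q origin
git merge -q --no-edit FETCH_HEAD
git push -q
```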
Now… what if your local changes and the changes that your colleagues pushed to GitHub conflict with each other?
In that case, the merge stops and Git gives you an error message. You have to resolve the conflict before the merge can be finalized.
Resolving conflicts
Many tools (e.g. Emacs, Vim, various GUIs) can help you resolve conflicts, but you don't actually need any of them: they just make the process nicer with keybindings and syntax highlighting.
To resolve a conflict in any text editor, open each file in which conflicts occur and look for sections like the one below. When a merge is interrupted, Git tells you which files contain conflicts, and git status lists them as "both modified".
<<<<<<< HEAD
Version of this section of the file on the current branch
=======
Alternative version of the same section of the file
>>>>>>> alternative version
The <<<<<<< HEAD, =======, and >>>>>>> lines are markers added by Git to identify the alternative versions at the location of a particular conflict.
You have to decide which version you want to keep (or write yet another version), remove the 3 lines with the markers, and remove the line(s) with the version(s) you do not want to keep.
Once you have done this for all conflicts, save the file. And once you have done this for all files in which conflicts occurred, stage those files. You can now create a commit to finalize the merge.
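To practice, you can provoke a conflict on purpose. This offline sketch uses hypothetical names, with a local bare repository standing in for GitHub. You and a colleague edit the same line, and the merge stops. You then resolve the conflict by rewriting the file, staging it, and committing:

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Offline stand-in for GitHub, cloned by you and by a colleague
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "The original sentence" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed github.git
git clone -q github.git you
git clone -q github.git colleague

# Your colleague rewrites the sentence and pushes
cd "$WORK/colleague"
echo "The colleague's version of the sentence" > notes.txt
git commit -q -am "Reword the sentence"
git push -q

# You rewrite the same line differently and commit
cd "$WORK/you"
echo "My version of the sentence" > notes.txt
git commit -q -am "Reword the sentence my way"

# Fetch and try to merge: Git stops and reports a conflict in notes.txt
git fetch -q origin
git merge FETCH_HEAD || true
git status --short   # "UU notes.txt": unmerged, changed on both sides
cat notes.txt        # shows the <<<<<<<, ======= and >>>>>>> markers

# Resolve: write the version to keep (markers removed), stage, and commit
echo "A sentence combining both versions" > notes.txt
git add notes.txt
git commit -q -m "Merge remote changes, resolve conflict in notes.txt"
git push -q
```

In a real conflict you would edit the file by hand in your text editor. Overwriting it with echo just keeps the sketch non-interactive.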
Extra 1
You create a project and want others to contribute to it
Another common scenario is this: you are the one creating a project and inviting colleagues to collaborate on it. This first extra section covers how this goes.
Let's quickly create a project:
cd /location/of/new/project
mkdir myproject
cd myproject
echo "This is our great project" > README
This is the content of our project:
ls -a
. .. README
Then, let's put it under version control with Git:
git init
You can see that this is now a Git repository:
ls -a
. .. .git README
Let's create a first commit:
git add README
git commit -m "Initial commit: add README"
Now, you need to create a remote on GitHub.
First, you need to create a new GitHub repository.
Create an empty repository on GitHub
Go to https://github.com, log in, and go to your home page (https://github.com/<user>).
From there, select the Repositories tab, then click the green New button.
Enter the name you want for your repository, without spaces. It can be the same name you have for your project on your computer (it would be sensible and make things less confusing), but it doesn't have to be.
You can make your repository public or private. Choose the private option if your research contains sensitive data or you do not want to share your project with the world. If you want to develop open source projects, of course, you want to make them public.
You now have an empty repository on GitHub, but it is not yet connected to your local repository.
Add the new GitHub repository as a remote
Click the green Code drop-down button, select SSH (if you have set up SSH for your GitHub account) or HTTPS (if you haven't), and copy the address.
Then, go back to your command line, cd inside your project if you aren't already there, and add your remote.
You add a remote with:
git remote add <remote-name> <remote-address>
<remote-name> is only a convenience name that identifies that remote. You can choose any name, but since Git automatically calls the remote origin when you clone a repository, it is common practice to use origin as the name for the first remote.
<remote-address> is the address of your remote in the HTTPS form or, if you have set up SSH for your GitHub account, the SSH form.
Example (using an SSH address):
git remote add origin git@github.com:<user>/<repo>.git
In our case:
git remote add origin git@github.com:<user>/myproject.git
Example (using an HTTPS address):
git remote add origin https://github.com/<user>/<repo>.git
In our case:
git remote add origin https://github.com/<user>/myproject.git
(Type git remote add origin, then paste the address you have just copied on GitHub.)
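Adding the remote only records its address: the GitHub repository stays empty until you push to it. A common way to make this first push is git push -u origin main. The -u flag (short for --set-upstream) also makes your local main track origin/main, so that a plain git push or git pull works afterwards. This sketch rehearses it offline, with a local bare repository standing in for the empty GitHub one. It assumes your branch is called main (older Git versions name it master by default).

```shell
set -e
# Throwaway identity so the demo commit works without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the empty repository created on GitHub
git init -q --bare myproject-on-github.git

# The local project, as created above
mkdir myproject
cd myproject
echo "This is our great project" > README
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called main
git add README
git commit -q -m "Initial commit: add README"

# Connect to the remote, then push and set up tracking in one go
git remote add origin "$WORK/myproject-on-github.git"
git push -q -u origin main

git branch -vv   # main now shows [origin/main]
```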
Finally, if you want to grant your collaborators write access to the project, you need to add them to it. You don't have to give them write access: we will see later how one can contribute to a project without it. But in a serious collaboration, you might want to make the process easier by letting them edit the project directly.
Invite collaborators to a GitHub repository
- Go to your GitHub project page
- Click on the Settings tab
- Click on the Manage access section on the left-hand side (you will be prompted for your GitHub password)
- Click the green Invite a collaborator button
- Invite your collaborators using their GitHub username, email address, or full name
Extra 2
You want to contribute to a project for which you don't have write access
Finally, another common scenario is very similar to the first one (you are invited to collaborate on an existing project), with the difference that you are not granted write access to the project on GitHub. This extra section covers the workflow in this case.
If you do not have write access to a remote, you cannot push to it and you need to submit a pull request (PR). Here is a summary of the workflow:
- Fork the project
- Clone your fork on your machine (this automatically sets your fork as a remote of your new local project, and that remote is automatically called origin)
- Add a second remote, this one pointing to the initial project. Usually, people call that remote upstream
- Pull from upstream to make sure that your contributions are made on an up-to-date version of the project
- Create and check out a new branch
- Make and commit your changes on that branch
- Push that branch to your fork (i.e. origin; remember that you do not have write access on upstream)
- Go to the original project's GitHub page and open a pull request from your fork. Note that after you have pushed your branch to origin, GitHub will offer to do so.
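You can rehearse the steps above offline, but only approximately. The Fork button on GitHub makes a server-side copy of the project under your account, and the sketch imitates this with git clone --bare. The names original.git and fork.git and the add-usage branch are hypothetical. The final step (opening the pull request) can only happen in the GitHub web interface.

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# The original project, to which you have no write access
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Project description" > README
git add README
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed original.git

# 1. Fork: a copy of the project under your own account
git clone -q --bare original.git fork.git

# 2. Clone your fork: Git names this remote "origin"
git clone -q fork.git contribution
cd contribution

# 3. Add the original project as a second remote called "upstream"
git remote add upstream "$WORK/original.git"

# 4. Start from the latest version of the original project
git pull -q upstream main

# 5-6. Create a branch, then make and commit your changes on it
git checkout -q -b add-usage
echo "Usage: see the documentation" >> README
git commit -q -am "Add usage section"

# 7. Push the branch to your fork (origin), since you cannot push to upstream
git push -q -u origin add-usage

# 8. Opening the pull request happens in the GitHub web interface
```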
Fork the repository
First, go to GitHub and fork the project by clicking on the Fork button in the top right corner.
Clone your fork
Then, navigate to the directory in which you want to clone the project and clone your fork.
Add the initial project as upstream
git remote add upstream <address-of-initial-project>
From there on, you can:
- Pull from upstream (the repository to which you do not have write access and to which you want to contribute). This allows you to keep your fork up to date.
- Push to and pull from origin (your fork, to which you have read and write access).
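Keeping your fork current is a two-step round trip: pull from upstream into your local clone, then push the result to origin. In this offline sketch (hypothetical names), a maintainer adds a commit to the original project after you cloned your fork. The last commands carry that commit over to your local main and then to your fork:

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# The original project (upstream) and your fork of it, as bare repositories
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Project description" > README
git add README
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed original.git
git clone -q --bare original.git fork.git

# Your local clone of the fork, with the original project added as upstream
git clone -q fork.git contribution
git -C contribution remote add upstream "$WORK/original.git"

# Meanwhile, a maintainer adds a commit to the original project
git clone -q original.git maintainer
cd maintainer
echo "Installation instructions" >> README
git commit -q -am "Add installation instructions"
git push -q

# Bring your local main up to date with upstream, then update your fork
cd "$WORK/contribution"
git checkout -q main
git pull -q upstream main
git push -q origin main
```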
Pull request
You are now ready to submit pull requests: push your development branch to your fork, then go to the original project on GitHub and open a pull request from there (GitHub will offer to do so at this point).
The maintainer of the original project may accept or decline your PR. They may also make comments and ask you to make changes. If so, make the changes and push additional commits to the same branch: they are added to the open PR automatically.
Once the maintainer merges the PR, you can delete the branch on your fork and pull from upstream to update your local clone with the newly accepted changes.
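You can sketch the clean-up offline as well, with hypothetical names throughout. A plain git merge into the original project simulates the maintainer's side, standing in for clicking the merge button on GitHub. Your side then updates main from upstream, deletes the branch on your fork with git push origin --delete, and deletes it locally with git branch -d.

```shell
set -e
# Throwaway identity so the demo commits work without any Git configuration
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
WORK=$(mktemp -d)
cd "$WORK"

# Setup: the original project, your fork, and your clone with upstream added
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo "Project description" > README
git add README
git commit -q -m "Initial commit"
cd "$WORK"
git clone -q --bare seed original.git
git clone -q --bare original.git fork.git
git clone -q fork.git contribution
git -C contribution remote add upstream "$WORK/original.git"

# Your contribution, pushed to a branch on your fork
cd "$WORK/contribution"
git checkout -q -b add-usage
echo "Usage: see the documentation" >> README
git commit -q -am "Add usage section"
git push -q origin add-usage

# Simulated maintainer: merges the branch from your fork into the original
cd "$WORK"
git clone -q original.git maintainer
cd maintainer
git fetch -q "$WORK/fork.git" add-usage
git merge -q --no-ff --no-edit FETCH_HEAD
git push -q

# Back on your side: update main, then delete the merged branch everywhere
cd "$WORK/contribution"
git checkout -q main
git pull -q upstream main
git push -q origin --delete add-usage   # the branch on your fork
git branch -d add-usage                 # your local branch
```

git branch -d refuses to delete a branch that Git does not see as merged. If the maintainer squashed or rebased your commits when merging, that check can fail even though your work is in. In that case you would need git branch -D instead.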