Exploring Orphaned Branches to Understand Git's Internals
by Christoph Schiessl on DevOps and Git
You have to understand Git's internal data structure before you can understand what orphaned branches are and how to use them. So, let's start by examining this data structure. This is easier than it sounds: there are only two "rules" you have to know.
- All commits in a Git repository have a list of references to their parent commits. Ordinary commits have precisely one, whereas commits created with git merge have multiple parents. The number of parent commits is theoretically unlimited.
- Cycles in child-parent relationships between commits are forbidden. In other words, a commit's ancestors must never include the commit itself.
If you take a minute to reflect on these two rules, you'll realize that there has to be at least one commit with zero parents. It's logically impossible to simultaneously observe both rules without having at least one of these so-called root commits.
What we have so far is a description of the data structure behind Git's commit history, known in computer science as a directed acyclic graph. Git's history-related features can be explained in terms of this graph because they are built on top of it. Git's command-line interface, as well as third-party GUIs, are just tools to make working with the underlying graph easier.
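To make the two rules concrete, here is a minimal Python sketch of such a graph. It is a toy model, not how Git stores commits: the single-letter commit ids, the `commits` dictionary, and the helper names `root_commits` and `has_cycle` are all made up for illustration.

```python
# Hypothetical model: each commit id maps to the list of its parent ids.
commits = {
    "a": [],          # root commit: zero parents
    "b": ["a"],       # ordinary commit: one parent
    "c": ["b"],
    "d": ["b"],
    "e": ["c", "d"],  # merge commit: two parents
}


def root_commits(graph):
    """Return all commits that have no parents."""
    return sorted(cid for cid, parents in graph.items() if not parents)


def has_cycle(graph):
    """Depth-first search reporting whether any commit is its own ancestor."""
    done = set()         # commits whose ancestors were fully explored
    in_progress = set()  # commits on the current search path

    def visit(cid):
        if cid in done:
            return False
        if cid in in_progress:
            return True  # we came back to a commit we are still exploring
        in_progress.add(cid)
        found = any(visit(parent) for parent in graph[cid])
        in_progress.discard(cid)
        done.add(cid)
        return found

    return any(visit(cid) for cid in graph)


print(root_commits(commits))  # ['a']
print(has_cycle(commits))     # False
```

If you try a graph that breaks the second rule, such as `{"x": ["y"], "y": ["x"]}`, `has_cycle` returns `True` and `root_commits` returns an empty list, which mirrors the argument that an acyclic history needs at least one root commit.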
Branches and History
Branches are nothing more than named references pointing to specific commits in your repository's graph. The most important feature of branches is that they provide an entry point into your repository's history.
Fine, so how exactly does repository history relate to the graph? Well, it all comes down to the child-parent relationships between commits. For example, if you execute the command git log master, it gives you a list of commits in reverse chronological order. The most recent commit (i.e., the tip of the master branch) is followed by its parents, which in turn are followed by their parents, and so on. Because the graph is acyclic and a repository contains a finite number of commits, any given commit has a finite number of ancestors: If you traverse a commit's chain of ancestors, you have to hit a root commit eventually. Therefore, git log's output is guaranteed to be finite.
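The traversal described above can be sketched in a few lines of Python. This is a simplified illustration rather than Git's actual algorithm: the `commits` and `branches` dictionaries and the `history` function are hypothetical, and the sketch walks the parents breadth-first instead of sorting by commit date the way git log does by default.

```python
from collections import deque

# Hypothetical commit graph: commit id -> list of parent ids.
commits = {
    "a": [],          # root commit
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "e": ["c", "d"],  # merge commit
}

# Branches are just names pointing at specific commits.
branches = {"master": "e", "topic": "d"}


def history(graph, refs, branch):
    """List the branch tip followed by all of its ancestors, each only once."""
    queue = deque([refs[branch]])
    seen = set()
    result = []
    while queue:
        cid = queue.popleft()
        if cid in seen:
            continue  # reachable through more than one path, e.g. via a merge
        seen.add(cid)
        result.append(cid)
        queue.extend(graph[cid])  # continue with this commit's parents
    return result


print(history(commits, branches, "master"))  # ['e', 'c', 'd', 'b', 'a']
print(history(commits, branches, "topic"))   # ['d', 'b', 'a']
```

The loop has to terminate because every commit is visited at most once and the graph holds a finite number of commits. In both calls, the last commit listed is the root commit.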
Orphaned Branches
The crucial point of this article is the following: There's nothing in the definition of the directed acyclic graph forbidding the existence of multiple root commits. As a matter of fact, that's precisely what makes orphaned branches possible.
Most repositories have a history resembling the following:
                  i---j---k   <== branch 'your-wip-feature'
                 /
a---b---c---d---h---l         <== branch 'master'
     \         /
      e---f---g               <== branch 'your-completed-feature'
As you can see, most commits have precisely one parent commit. The only exceptions are a (root commit with zero parents) and h (merge commit with two parents).
With orphaned branches, you can completely separate your branches because they start from different root commits. Therefore, you can build repositories like this:
a---b---c---d <== branch 'master'
1---2---3---4 <== branch 'independent-history'
Creating additional root commits or orphaned branches can be accomplished with git checkout, together with its --orphan option. Quoting Git's documentation:
The first commit made on this new branch will have no parents, and it will be the root of a new history, totally disconnected from all the other branches and commits.
In other words, the first commit you make after git checkout --orphan becomes a new root commit, which serves as the starting point of your new branch.
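To see what "totally disconnected" means in terms of the graph, the following Python sketch models the two-history repository shown earlier. It is an illustration only: the `commits` and `branches` dictionaries and the helpers `ancestors` and `roots_of` are hypothetical names, not part of Git.

```python
# Hypothetical graph mirroring the two-history repository: id -> parent ids.
commits = {
    "a": [], "b": ["a"], "c": ["b"], "d": ["c"],  # history of 'master'
    "1": [], "2": ["1"], "3": ["2"], "4": ["3"],  # history of the orphaned branch
}
branches = {"master": "d", "independent-history": "4"}


def ancestors(graph, tip):
    """Return the tip plus every commit reachable through parent links."""
    reachable = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid not in reachable:
            reachable.add(cid)
            stack.extend(graph[cid])
    return reachable


def roots_of(graph, tip):
    """Return the root commits that the given tip descends from."""
    return sorted(cid for cid in ancestors(graph, tip) if not graph[cid])


master = ancestors(commits, branches["master"])
independent = ancestors(commits, branches["independent-history"])

print(roots_of(commits, branches["master"]))               # ['a']
print(roots_of(commits, branches["independent-history"]))  # ['1']
print(master & independent)                                # set()
```

Each branch leads back to its own root commit, and the two sets of ancestors have no commit in common. That empty intersection is the defining property of an orphaned branch.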
Conclusion
The usefulness of orphaned branches is limited to rare occasions. For instance, GitHub Pages can publish a site from a branch named gh-pages. This branch is usually "orphaned" and, therefore, completely separate from your repository's prior history.
Thank you for reading, and have a nice day!