Git submodule tutorial – from zero to hero

Git allows a repository to include other Git repositories as submodules. However, the submodule mechanism complicates several workflows that are usually straightforward, like committing or checking out changes. This Git submodule tutorial explains the concept of submodules and helps you navigate the most important workflows. All steps and commands are demonstrated by example, using a typical use case for submodules: a client/server architecture.

Introduction

Git submodules allow you to include a Git repository within another Git repository, by reference. A common use case is shared code. Suppose you write an application using the client-server architecture. The client serializes data which is deserialized by the server. Both client and server need access to the data models. However, you don’t want to duplicate the model code in both repositories. With Git submodules, you can create a separate repository named common that contains the models and other common code, and reference it from your client and server repositories.

Beware of increased complexity

While Git submodules provide a powerful mechanism to structure your repositories and avoid code duplication, there are caveats! Submodules make checking out and committing code more complex and error-prone.

In this article I’ll try to make the concepts and commands as clear as possible, and also provide a few tips specific to the Git interface of JetBrains IDEs (e.g. PyCharm).

Anatomy of submodules

Let’s consider the following graphical illustration of the example I presented in the Introduction section.

[Figure: Git submodule example scenario]

The two “parent” repositories named client and server both reference the repository named common as a submodule. The .gitmodules file states which submodules the parent repository has. For each submodule, it contains the relative “mount point” path in the parent repository (which is src/common in this case), and the Git remote URL, which Git needs in order to know where to clone the submodule from. Git will fully clone the submodule, so that the content of the folder src/common is a regular Git repository (with its own .git entry, history, etc.). In current Git versions, that .git entry is a small file that points into the parent repository’s .git/modules directory, rather than a full subfolder. Note: the .gitmodules file is a simple text file, committed to each parent repository as a regular file, even though its name starts with “.git”.

While the .gitmodules file contains information regarding which submodules there are, and where to clone them from, it does not contain the specific commit of the submodule that should be checked out by the parent repository. This information is part of the regular commits of the parent repository. From the perspective of the parent repository, Git pretends that the mount point path, such as src/common, is a virtual file whose content is the commit hash of the submodule that should be checked out. Whenever the commit currently checked out in the src/common repository is different from the one contained in the virtual src/common file, Git will detect this as a difference. Git allows you to commit this difference, as a new commit, to the parent repository.
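To make the “virtual file” idea more tangible, here is a minimal sketch you can run in a scratch directory. It is an illustration under assumptions, not part of the original setup: the local repositories common and client, the file models.txt and the demo identity are all made up for the example, and the protocol.file.allow setting is only there because the “remote” is a local path.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny stand-in for the common repository.
git init -q common
echo 'model v1' > common/models.txt
git -C common add models.txt
git -C common commit -q -m 'Add models'

# A parent repository that references it at src/common.
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/common" src/common
git commit -q -m 'Add common as submodule'

cat .gitmodules                        # path and URL, but no commit hash
entry=$(git ls-tree HEAD src/common)   # the "virtual file" stored in the parent commit
echo "$entry"                          # mode 160000, type "commit", then the hash
recorded=$(git rev-parse HEAD:src/common)
checked_out=$(git -C src/common rev-parse HEAD)
echo "recorded in parent: $recorded"
echo "checked out:        $checked_out"
```

The ls-tree entry is what the parent repository actually commits for src/common: a single commit hash, not the files inside the submodule.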

Working with submodules

Let’s examine the three main tasks which are either new, or have changed, because you now use submodules. These are creating a submodule, adopting a submodule, and changing and committing code.

Creating a submodule (for the first time)

The following steps illustrate how to create a submodule, reference it from a parent repository, and commit and push the changes. The steps are oriented towards the client/server/common example I introduced above. If you are doing this for the very first time, I recommend you practice these steps on a toy project. Also practice the other steps (adopting submodules, etc., see below).

  • Discuss with your teammates whether using submodules actually makes sense. There might be better alternatives for managing dependencies, e.g. tools like Gradle (Java), pip (Python) or npm (JavaScript). If you have access to private registries for such tools, you could push your versioned common module there and let these tools handle the dependencies, rather than using Git submodules.
  • If you still want to use Git submodules, think of a sensible directory structure, as shown in the image above. Git submodules are similar to Linux mount points. When using a file manager (e.g. your IDE) the transition between the parent repository (e.g. server) and the submodule (common) is seamless and unnoticed! When the submodule is integrated into the parent repositories, the directory structure needs to “make sense”, in all parent repositories. Beware that some programming languages have rules regarding the file contents and the file’s directory structure. For instance, a Java source file must be in folder a/b/c if its content starts with package a.b.c;.
  • Create a bare Git repository for the common submodule. Repositories you create on platforms like GitHub or GitLab are such bare repositories. If you’re testing things locally, just create an empty directory on your file system and use git init --bare to initialize it.
  • Fill the common repository, so that it has at least a master branch with one commit. Note: if you’re not starting from scratch but want to move existing files from a parent repository into the common submodule, then read this article.
  • Perform the following steps for each parent repository (client and server) respectively:
    • Choose a relative path where to include the common submodule, e.g. src/common – in this example I chose this path to be the same for every parent repository, but there might be cases where different paths are more appropriate.
    • In a terminal window, switch to the parent Git repository’s root and type
      git submodule add <common repository .git URL> <path where to init submodule>
      e.g. git submodule add https://host.com/common.git src/common
      This command does two things: it creates the .gitmodules file described above in the repository root (or adds an entry to it, if the file already exists). It then initializes the common submodule, creating the src/common directory and checking out the common code (the latest commit on the remote’s default branch, which is master in this example).
    • Document that this repository is using a Git submodule, e.g. in your README.md file (see section adopting a submodule for the commands your teammates need to use)
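    • Commit and push the changes. Note that git submodule add already stages both the .gitmodules file and the new src/common entry, so review them with git status before you commit.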
  • Tell all your teammates that they need to use Git submodules from now on. Shout it at them, send them emails, Slack messages, update your internal wiki, etc. Be prepared that there will always be someone who did not get the memo and will complain that “Nothing works no more!!11”…
    • Note: Many Git GUI-based tools automatically detect submodules and offer to initialize them for you, at least when cloning a repository for the first time. However, you should not rely on this behavior and skip the documentation. Some of your teammates might just use the shell, or a GUI client which does not have this functionality.
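The creation steps above can be practiced locally with a sketch like the following. It is a toy setup under assumptions: common.git, seed, client and models.txt are hypothetical names, the bare repository lives on the local file system instead of a hosting platform, and the final push of the parent is left out because the toy client has no remote.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository for the common code (stands in for GitHub/GitLab).
git init -q --bare common.git

# Fill it with at least one commit, using a temporary clone.
git clone -q common.git seed
cd seed
echo 'model v1' > models.txt
git add models.txt
git commit -q -m 'Add models'
git push -q origin HEAD      # pushes the current (default) branch
cd ..

# The parent repository; repeat this part for server.
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/common.git" src/common

staged=$(git diff --cached --name-only)
echo "$staged"               # lists .gitmodules and src/common
git commit -q -m 'Add common as submodule at src/common'
# In a real project, "git push" would follow here.
```

With a hosted repository you would replace the local path with the HTTPS or SSH URL and drop the protocol.file.allow option.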

Adopting a submodule (done by your teammates)

Your teammates (or other external developers) need to perform these steps when updating their existing parent repository, or when cloning the parent repository for the first time:

  • In a terminal window, change to the root of the parent Git repository.
  • Pull the latest changes. You should now see the .gitmodules file, but the sub-directory src/common will still be empty.
  • Type git submodule init which initializes all submodules not initialized yet (in this case: the common repository is initialized)
  • Type git submodule update which fills the src/common directory. It pulls the code of the common repository, and checks out the specific commit referenced by the virtual src/common file. This commit may very well be an older commit, i.e., it may differ from the HEAD of the common repository’s master branch.
    • To reiterate: on your file system, the common submodule is actually a normal Git repository. Once you change your working directory to src/common, the normal Git commands apply, like git status.
    • Note: by default Git checks out submodules, like common, in detached HEAD mode. This is problematic once you want to commit changes – more details in the next section.
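  • From now on, keep your submodules in sync whenever you pull. By default, a plain git pull fetches new submodule commits but does not check them out: the virtual src/common file changes, while the src/common directory stays at the old commit. Run git submodule update after pulling, or use git pull --recurse-submodules (or set the submodule.recurse configuration option to true) to have Git do this for you. I recommend that you verify this behavior for your installed Git version. Git evolves over time, and the behavior of older versions may be different.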
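The adoption steps can be tried out with the following sketch. It first builds a toy common and client repository (hypothetical names, local paths, demo identity) and then plays the role of a teammate who clones the parent. The protocol.file.allow option is an artifact of using local paths and should not be needed with a hosted remote.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a common repository and a parent that references it.
git init -q common
echo 'model v1' > common/models.txt
git -C common add models.txt
git -C common commit -q -m 'Add models'
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/common" src/common
git commit -q -m 'Add common as submodule'

# The teammate's view: a plain clone of the parent repository.
cd "$work"
git clone -q client teammate
cd teammate
before=$(ls -A src/common)   # the directory exists but has no content yet
echo "before init: '$before'"

git submodule init
git -c protocol.file.allow=always submodule update
ls src/common                # now contains models.txt

# The submodule is checked out in detached HEAD mode.
head_state=$(git -C src/common symbolic-ref -q HEAD || echo detached)
echo "submodule HEAD: $head_state"
```

The two commands can be combined into git submodule update --init, and a fresh clone can fetch everything in one go with git clone --recurse-submodules.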

For users of JetBrains’ IDEs

Make sure to close and re-open the project after performing the above steps. You should see both the parent and submodule repositories in the Settings → Version Control window.

JetBrains IDEs show an indicator for the currently checked out commit (or branch name) in the bottom right corner. This indicator is context-aware: it changes depending on which file you have opened. If the file is from a submodule, it indicates the current commit hash (or branch name / tag) of that submodule.

Changing and committing code

In our client-server scenario, you will typically open the parent Git repositories in your IDE, not the submodule. One of the caveats of Git submodules is that all the files you see in the file manager (or IDE) will appear as if they belonged to the parent Git repository, even though some of them belong to submodules. If you change files of a submodule, git status (run in the parent repository) won’t list them individually. The command’s output will only show you that the virtual src/common file has changed, but not which specific files changed in the common repository.

To avoid overlooking such changes, I recommend that you either use the command
git submodule foreach git status
or use a graphical tool that is aware of multiple repositories at once. The JetBrains IDEs are an example. You can configure the commit and log window to graphically group the detected changed files by repository. This makes it easier to identify which repository is affected by each file change. The image sequence below explains how to activate this group by repository feature.
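The difference between the two views can be reproduced with a small sketch. The repositories common and client and the file models.txt are hypothetical, and --porcelain / --short are only used here to keep the output compact.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a common repository and a parent that references it.
git init -q common
echo 'model v1' > common/models.txt
git -C common add models.txt
git -C common commit -q -m 'Add models'
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/common" src/common
git commit -q -m 'Add common as submodule'

# Change a file that belongs to the submodule.
echo 'model v2' >> src/common/models.txt

# The parent only reports the submodule path as modified...
parent_view=$(git status --porcelain)
echo "parent:    $parent_view"

# ...while running status inside each submodule names the actual file.
sub_view=$(git submodule foreach 'git status --short')
echo "submodule: $sub_view"
```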

Let’s assume you did change files of a submodule. To commit the changes, follow these steps:

  1. Make sure you are using the correct submodule commit to base your new commit on. Ideally, this should never be a problem. But it could be, if you manually switched branches on the parent repository in the past, and forgot to update the Git submodules in the process.
    • Type git diff <submodule directory>, e.g. git diff src/common
      In the output, the line starting with -Subproject commit shows the commit hash that the parent repository references, and the line starting with +Subproject commit shows the commit hash you have actually checked out. The latter coincides with the output of git submodule status.
    • If the output is something like
      -Subproject commit d56e8bdb2beb830e19f6740ddad1e18f4837736b
      +Subproject commit d56e8bdb2beb830e19f6740ddad1e18f4837736b-dirty

      then you are good to go. This is just a pseudo-change. Proceed with step 2.
    • If the commit hashes of the output differ, this means that your submodule did not have the commit checked out that was expected by the parent repository. Thus, your changes are not applied to the right set of files. Use the commands git stash (while you are in the submodule’s root directory) to stash your changes, followed by git submodule update (run in the parent repository directory) to get your submodule to the right commit, and then git stash pop (in the submodule directory) to re-apply your changes. Verify that your changes still look correct.
  2. Verify that your submodule is not in detached HEAD mode. Git would let you commit there, but the new commit would not belong to any branch, which makes it easy to lose and awkward to push. JetBrains IDEs will indicate this with a small yellow warning triangle in the Git branch indicator in the bottom right corner. If you use the shell, switch to the submodule’s root directory and type git symbolic-ref HEAD. If the output is fatal: ref HEAD is not a symbolic ref then you are in detached HEAD mode, otherwise the output indicates the branch you are on. If you are in detached HEAD mode, you have two options:
    • Examine the submodule’s Git log: maybe an existing branch (e.g. master) already points to the same commit hash as the one you have checked out. In this case, just switch to that branch with git checkout <branchname>
    • Otherwise, create a new (temporary or permanent) branch and switch to it, e.g. with git checkout -b temp
  3. Commit the changes of the submodule. This produces a new commit hash.
  4. If you created a temporary branch in the previous step, you may want to get rid of it. But first, make sure your new commit is still accessible in the submodule, e.g. by tagging it.
  5. The command git diff src/common (run in the parent repository root directory) will now indicate two different commit hashes. The second line should list the commit hash you just produced in step 3. Stage and commit the changed src/common file. Your parent repository now references the new, updated commit of the submodule.
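  6. Push the commits of the submodule first, then those of the parent repository, so that the parent never references a submodule commit that is missing on the remote. Alternatively, git push --recurse-submodules=on-demand (run in the parent repository) pushes the affected submodule commits for you.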
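The steps above can be sketched end to end as follows. This is a toy run under assumptions: common, client, models.txt and the branch name temp are hypothetical, git checkout --detach merely simulates the detached HEAD state that git submodule update leaves behind, and the parent is not pushed because the toy client has no remote.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a common repository and a parent that references it.
git init -q common
echo 'model v1' > common/models.txt
git -C common add models.txt
git -C common commit -q -m 'Add models'
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/common" src/common
git commit -q -m 'Add common as submodule'

# Simulate the usual starting point: detached HEAD plus a local change.
git -C src/common checkout -q --detach
echo 'model v2' >> src/common/models.txt

# Step 1: the submodule must sit on the commit the parent references.
[ "$(git rev-parse HEAD:src/common)" = "$(git -C src/common rev-parse HEAD)" ]
git --no-pager diff src/common   # same hash twice, the second one with -dirty

# Step 2: leave detached HEAD by creating a branch.
cd src/common
if ! git symbolic-ref -q HEAD >/dev/null; then
    git checkout -q -b temp
fi

# Step 3: commit inside the submodule.
git commit -q -am 'Update models'
new_commit=$(git rev-parse HEAD)

# Step 6 (submodule part): publish the submodule commit before the parent.
git push -q origin temp
cd ../..

# Step 5: the parent now sees two different hashes; record the new one.
git --no-pager diff src/common
git add src/common
git commit -q -m 'Bump common to updated models'
recorded=$(git rev-parse HEAD:src/common)
echo "parent now references $recorded"
```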

If everything worked, now is a good time to also update your other parent repositories. For instance, if you had changed your server repository with the above instructions, you would now:

  • Open your client repository,
  • git fetch the changes of your common submodule,
  • Check out the new submodule commit you created in step 3 above,
  • Commit and push the changes the above commands caused for the virtual src/common file of the client project.
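As a rough sketch of these four steps, assume a toy client repository whose submodule still points at an older commit, while the common repository has already received a newer one. All names (common, client, models.txt) are hypothetical, and the final push is omitted because the toy client has no remote.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a common repository and a client that references its first commit.
git init -q common
echo 'model v1' > common/models.txt
git -C common add models.txt
git -C common commit -q -m 'Add models'
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/common" src/common
git commit -q -m 'Add common as submodule'

# Meanwhile, common receives a new commit (e.g. made via the server project).
echo 'model v2' >> "$work/common/models.txt"
git -C "$work/common" commit -q -am 'Update models'
new_commit=$(git -C "$work/common" rev-parse HEAD)

# In the client repository: fetch inside the submodule and check out the new commit.
git -C src/common fetch -q origin
git -C src/common checkout -q "$new_commit"

# The virtual src/common file now differs; commit that change in the parent.
git --no-pager diff src/common
git add src/common
git commit -q -m 'Bump common to updated models'
recorded=$(git rev-parse HEAD:src/common)
echo "client now references $recorded"
```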

Hints and tricks

Before I conclude, here are a few tips when working with Git submodules:

  • Whenever you manually switch branches on the parent repository, make sure to run git submodule update after each switch, so that your submodules do not get out of sync.
  • If you use feature branches in your parent repositories, consider also creating corresponding feature branches in the submodule(s). Otherwise you may lose track of your changes. Once you are ready to merge the feature, first merge it in the submodule(s), followed by the parent repositories.
  • Use git submodule update --checkout to reset a submodule to the commit recorded in the parent repository. This gets rid of a commit mismatch reported for the virtual submodule file/directory (src/common in the above example). The command forces Git to check out the referenced commit in detached HEAD mode.
  • If you changed the .gitmodules file, e.g. because you migrated repositories to a different remote URL, then all teammates need to run git submodule sync followed by git submodule update to ensure that the updates are applied properly. This applies even to your own machine: if you migrated one specific (locally cloned) parent repository to a different remote URL, you also have to run the above commands in all other parent repositories (client, server, etc.) on your own machine!
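The last tip can be illustrated with a small sketch in which the toy common repository “migrates” by being moved to a new local path. The names common, common-moved and client are hypothetical, and editing .gitmodules via git config -f is just one convenient way to change the URL.

```shell
set -e
work=$(mktemp -d)   # throwaway scratch directory (hypothetical layout)
cd "$work"
# Demo-only identity so commits work regardless of your Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Setup: a common repository and a parent that references it.
git init -q common
echo 'model v1' > common/models.txt
git -C common add models.txt
git -C common commit -q -m 'Add models'
git init -q client
cd client
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/common" src/common
git commit -q -m 'Add common as submodule'

# The common repository moves to a new location (stands in for a new remote URL).
mv "$work/common" "$work/common-moved"
git config -f .gitmodules submodule.src/common.url "$work/common-moved"
git commit -q -am 'Point common submodule at its new URL'

# .gitmodules is updated, but the local configuration still has the old URL.
echo "before sync: $(git config submodule.src/common.url)"

# Copy the new URL into .git/config and into the submodule's origin remote.
git submodule sync
git -c protocol.file.allow=always submodule update
echo "after sync:  $(git config submodule.src/common.url)"

# Fetching inside the submodule now talks to the new location.
git -C src/common fetch -q origin
```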

Conclusion

With the tips given in this article, you should be able to navigate the depths and pitfalls of Git submodules. Nevertheless, submodules make many Git workflows more complicated. If possible, consider other dependency mechanisms instead.
