Git Submodules: Managing Repositories Inside Repositories

Submodules allow you to include and track external repositories inside your project.

Git submodules allow you to include one Git repository as a subdirectory inside another Git repository while keeping the history of both repositories completely separate. This is essential when your project depends on external libraries, shared components, or services that have their own independent version control. Submodules let you lock these dependencies to specific commits, giving you precise control over which version of the external code your project uses.

Without submodules, you would have to manually copy external code into your repository, losing the ability to track updates, or use package managers that may not give you the same level of version control. Submodules solve this by treating the external repository as a reference rather than copying its files directly. The parent repository stores only a reference to the specific commit of the submodule, not the files themselves. To understand submodules properly, it is helpful to be familiar with Git core concepts, branching fundamentals, and working with remote repositories.

Git submodules in simple terms:
Parent Repository (my-project)
        │
        ├── src/
        ├── tests/
        └── libs/
            └── external-library/  ← Submodule (points to another repo)
                    │
                    └── (Separate Git history, tracked independently)

Parent repo only stores: "Use commit abc123 from external-library repo"

What Are Git Submodules

A Git submodule is a reference to another Git repository embedded as a subdirectory in your main repository. Unlike a regular directory, the files inside the submodule directory are not tracked by the parent repository. Instead, the parent stores a single special entry (often called a gitlink) that points to a specific commit in the external repository. When someone clones the parent repository, they must explicitly initialize and update the submodules to fetch the actual files.

Submodules are useful when you want to include a library or component that is developed independently from your main project. You can update the submodule to a newer version when you are ready, and the parent repository records exactly which version you are using. This gives you reproducibility: anyone cloning your repository at a specific commit will get the exact same version of the submodule.

  • Independent History: The submodule has its own commit history, branches, and remotes, completely separate from the parent repository.
  • Version Pinning: The parent repository records a specific commit hash of the submodule, not a branch name.
  • Explicit Updates: Updating a submodule to a newer version requires an explicit command and creates a new commit in the parent repository.
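  • Recursive Operations: Several Git commands accept a --recurse-submodules option (for example clone, pull, fetch, checkout, and push), and git submodule update accepts --recursive to descend into nested submodules.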
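
To see what the parent actually records, you can inspect the tree entry at the submodule path. The following is a minimal sketch rather than a definitive recipe: it builds two throwaway repositories in a temporary directory, and the names external-lib and my-project, the demo identity, and the local path used as the submodule URL are all assumptions made for illustration. The protocol.file.allow override is there only because recent Git versions refuse local-path submodule clones by default.

```shell
#!/bin/sh
# Throwaway demo: every repository lives in a temporary directory.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A local stand-in for the external library (hypothetical).
git init -q external-lib
git -C external-lib commit -q --allow-empty -m "Initial library commit"

# The parent project that embeds it.
git init -q my-project
cd my-project
git -c protocol.file.allow=always submodule --quiet add "$work/external-lib" libs/external-lib
git commit -q -m "Add external-lib as a submodule"

# The parent's tree holds one entry for the submodule path:
# mode 160000, type "commit", and the pinned commit hash.
entry=$(git ls-tree HEAD libs/external-lib)
pinned=$(git -C libs/external-lib rev-parse HEAD)
printf '%s\n' "$entry"
printf 'pinned commit: %s\n' "$pinned"
```

The hash in the tree entry is the same commit that is checked out inside libs/external-lib, which is what version pinning means in practice.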

Why Git Submodules Matter

Submodules solve the problem of dependency management in Git. They allow you to include external code while maintaining control over exactly which version you use and keeping the history of that external code separate from your own.

  • Reproducible Builds: The parent repository records the exact commit of each submodule, ensuring that everyone uses the same version of dependencies.
  • Independent Versioning: Submodules evolve on their own schedule. Pulling in a bug fix changes only the recorded commit in the parent, so the library's commits never mix into your main project history.
  • No Code Duplication: Instead of copying external code into your repository, submodules reference the original source, keeping the external files out of the parent repository's history and avoiding duplication.
  • Separation of Concerns: Submodule code remains in its own repository with its own issues, pull requests, and release cycles.
  • Flexible Workflows: You can contribute back to submodule repositories directly from within your parent project checkout.

Adding a Submodule

To add a submodule to your repository, use the git submodule add command followed by the URL of the external repository and optionally the local path where it should be placed.

# Add a submodule
git submodule add https://github.com/library/external-lib.git libs/external-lib

# Git will:
# 1. Clone the external repository into libs/external-lib
# 2. Add a .gitmodules file to track submodule information
# 3. Stage the submodule reference (the commit hash) for commit

# Commit the submodule
git commit -m "Add external-lib as a submodule"

After adding a submodule, Git creates a .gitmodules file in the root of your repository. This file stores the mapping between the local path and the remote URL of each submodule. It should be committed to your repository so that others can clone the submodules.

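Example .gitmodules file (the branch line appears only if the submodule was added with -b main or a branch was configured afterward):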
[submodule "libs/external-lib"]
    path = libs/external-lib
    url = https://github.com/library/external-lib.git
    branch = main
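
Because .gitmodules uses the same syntax as other Git configuration files, git config can read it directly. The sketch below recreates the example file in a temporary directory (an assumption for the demo; in a real project you would run the git config commands from the repository root) and queries it.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Recreate the example .gitmodules file for the demo.
cat > .gitmodules <<'EOF'
[submodule "libs/external-lib"]
    path = libs/external-lib
    url = https://github.com/library/external-lib.git
    branch = main
EOF

# Read individual settings; the key is submodule.<name>.<setting>.
url=$(git config -f .gitmodules --get submodule.libs/external-lib.url)
branch=$(git config -f .gitmodules --get submodule.libs/external-lib.branch)
printf 'url=%s branch=%s\n' "$url" "$branch"

# List the path of every submodule declared in the file.
paths=$(git config -f .gitmodules --get-regexp '^submodule\..*\.path$')
printf '%s\n' "$paths"
```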

Cloning a Repository with Submodules

When you clone a repository that contains submodules, the submodule directories are created but remain empty by default. You must initialize and update the submodules to fetch their actual content.

# Clone the parent repository
git clone https://github.com/username/my-project.git
cd my-project

# Initialize and update submodules (two-step process)
git submodule init      # Registers submodules from .gitmodules
git submodule update    # Fetches and checks out the specific commits

# Or do both in one command
git submodule update --init --recursive

# Clone and init submodules in one step
# (--recursive is an older alias for --recurse-submodules and still works)
git clone --recurse-submodules https://github.com/username/my-project.git
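
The difference between an uninitialized and an initialized submodule shows up in git submodule status. The following sketch assumes throwaway local repositories (external-lib, my-project, and fresh-clone are hypothetical names) and a local path as the submodule URL, which is why it passes protocol.file.allow=always.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A library with one file, and a parent project that embeds it.
git init -q external-lib
echo "library code" > external-lib/lib.txt
git -C external-lib add lib.txt
git -C external-lib commit -q -m "Initial library commit"

git init -q my-project
git -C my-project -c protocol.file.allow=always submodule --quiet add "$work/external-lib" libs/external-lib
git -C my-project commit -q -m "Add external-lib as a submodule"

# A plain clone creates libs/external-lib but leaves it empty.
git clone -q my-project fresh-clone
cd fresh-clone
before=$(git submodule status)   # a leading "-" marks an uninitialized submodule
printf '%s\n' "$before"

# Initialize and fetch in one step.
git -c protocol.file.allow=always submodule --quiet update --init --recursive
after=$(git submodule status)    # a leading space marks a submodule at the recorded commit
printf '%s\n' "$after"
ls libs/external-lib
```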

Updating Submodules

By default, submodules do not update automatically when you pull changes in the parent repository (unless you pull with --recurse-submodules or have set submodule.recurse). You must explicitly update them to the commit recorded in the parent repository.

# After pulling changes in the parent repo
git pull

# Update submodules to the commits recorded in the parent
git submodule update --recursive

# Update submodules to the latest commit on their tracked branch
git submodule update --remote

# Update a specific submodule
git submodule update --remote libs/external-lib
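
A --remote update only moves the submodule's checkout; the parent still records the old commit until you stage and commit the change. The sketch below walks through that sequence with throwaway local repositories (all names are assumptions for the demo, and the tracked branch is whatever your Git version names the initial branch).

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q external-lib
git -C external-lib commit -q --allow-empty -m "Initial library commit"
branch=$(git -C external-lib symbolic-ref --short HEAD)

# Add the submodule and record which branch --remote should follow.
git init -q my-project
git -C my-project -c protocol.file.allow=always submodule --quiet add -b "$branch" "$work/external-lib" libs/external-lib
git -C my-project commit -q -m "Add external-lib as a submodule"

# The library gains a new commit upstream.
git -C external-lib commit -q --allow-empty -m "Fix a bug in the library"

cd my-project
git -c protocol.file.allow=always submodule --quiet update --remote libs/external-lib

# A leading "+" means the checkout no longer matches the recorded commit.
status=$(git submodule status)
printf '%s\n' "$status"

# Show which library commits the pointer would move across.
log=$(git diff --submodule=log)
printf '%s\n' "$log"

# Record the new commit in the parent.
git add libs/external-lib
git commit -q -m "Update external-lib to latest version"
recorded=$(git rev-parse HEAD:libs/external-lib)
upstream=$(git -C "$work/external-lib" rev-parse HEAD)
```

Reviewing the git diff --submodule output before committing is one way to keep submodule bumps deliberate.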

Working Inside Submodules

You can navigate into a submodule directory and work with it as if it were a standalone repository. This allows you to make changes to the submodule, commit them, and push them to its remote repository.

# Navigate into the submodule
cd libs/external-lib

# Work normally (checkout branches, make changes, commit)
git checkout main
git pull origin main
# ... make changes ...
git add .
git commit -m "Add new feature to library"
git push origin main

# Go back to parent repository
cd ../..

# The parent repo sees that the submodule has changed
git status
# modified: libs/external-lib (new commits)

# Stage and commit the submodule reference update
git add libs/external-lib
git commit -m "Update external-lib to latest version"

Removing a Submodule

Removing a submodule requires several steps to completely clean up all references.

# Step 1: Unregister the submodule (clears its section in .git/config and empties its working tree)
git submodule deinit -f libs/external-lib

# Step 2: Remove the submodule from the index and the working tree
# (git rm also deletes its section from .gitmodules and stages that change)
git rm -f libs/external-lib

# Step 3: Delete the cached clone under .git/modules
rm -rf .git/modules/libs/external-lib

# Step 4: Commit the changes
git commit -m "Remove external-lib submodule"
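
To confirm that a removal left nothing behind, the index, .gitmodules, and .git/modules can each be checked. The following is a minimal sketch under stated assumptions: it uses throwaway local repositories with hypothetical names, removes the submodule with git submodule deinit and git rm, and then runs the checks.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q external-lib
git -C external-lib commit -q --allow-empty -m "Initial library commit"
git init -q my-project
cd my-project
git -c protocol.file.allow=always submodule --quiet add "$work/external-lib" libs/external-lib
git commit -q -m "Add external-lib as a submodule"

# Remove the submodule.
git submodule --quiet deinit -f libs/external-lib
git rm -q -f libs/external-lib
rm -rf .git/modules/libs/external-lib
git commit -q -m "Remove external-lib submodule"

# Check 1: no gitlink entries (mode 160000) remain in the index.
gitlinks=$(git ls-files --stage | grep '^160000' || true)

# Check 2: .gitmodules no longer declares any submodule.
declared=$(git config -f .gitmodules --get-regexp '^submodule\.' 2>/dev/null || true)

# Check 3: the cached clone and the working directory are gone.
if [ -e libs/external-lib ] || [ -d .git/modules/libs/external-lib ]; then
    leftovers=yes
else
    leftovers=no
fi
printf 'gitlinks=[%s] declared=[%s] leftovers=%s\n' "$gitlinks" "$declared" "$leftovers"
```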

Common Submodule Commands

Command                          Purpose
git submodule add <url> <path>   Add a new submodule
git submodule init               Initialize submodules from .gitmodules
git submodule update             Fetch and checkout the recorded commits
git submodule update --remote    Update submodules to latest commit on tracked branch
git submodule foreach <command>  Run a command in each submodule
git submodule status             Show the current commit of each submodule
git submodule sync               Update submodule URLs from .gitmodules

Using foreach to run commands in all submodules:
# Pull latest changes in all submodules
git submodule foreach 'git pull origin main'

# Checkout a branch in all submodules
git submodule foreach 'git checkout develop'

# Show status of all submodules
git submodule foreach 'git status'
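
Inside a foreach command, Git exposes a few variables, including $name, $sm_path, and $sha1 (the commit recorded in the parent). A short sketch, assuming a throwaway parent with a single submodule at libs/external-lib; the repository names are hypothetical:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q external-lib
git -C external-lib commit -q --allow-empty -m "Initial library commit"
git init -q my-project
cd my-project
git -c protocol.file.allow=always submodule --quiet add "$work/external-lib" libs/external-lib
git commit -q -m "Add external-lib as a submodule"

# Single quotes keep $sm_path and $sha1 from being expanded by the outer shell,
# so they are filled in by git submodule foreach for each submodule.
# --quiet suppresses the "Entering ..." line printed before each submodule.
summary=$(git submodule --quiet foreach 'echo "$sm_path is pinned to $sha1"')
printf '%s\n' "$summary"

expected="libs/external-lib is pinned to $(git rev-parse HEAD:libs/external-lib)"
```

A non-zero exit status in any submodule stops the loop; appending `|| :` to the command is the usual way to keep going when a failure in one submodule is acceptable.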

Submodules vs Alternatives

Approach          How It Works                                          Best For
Git Submodules    References external repository at a specific commit  External libraries with independent versioning, shared components across multiple projects
Git Subtrees      Copies external repository files into your repo       When you want to modify external code locally, or need to share code with teams that cannot access the external repo
Package Managers  Use language-specific tools (npm, Composer, pip)      Most common approach for dependencies; easier for developers but less Git integration
Manual Copy       Copy files manually into the repository               One-off inclusion where updates are not expected

Common Submodule Mistakes to Avoid

  • Forgetting --recurse-submodules When Cloning: Cloning without --recurse-submodules (or its older alias --recursive) leaves submodule directories empty. Use the option when cloning, or run git submodule update --init --recursive afterward.
  • Committing Submodule Changes Without Pushing: The parent repo records a commit hash. If you commit a new submodule hash without pushing that submodule commit, others cannot update.
  • Working on a Detached HEAD: By default, submodules are checked out at a specific commit, not on a branch. This puts you in detached HEAD state. Navigate into the submodule and checkout a branch if you plan to make changes.
  • Not Updating Submodules After Pull: When you pull changes in the parent repo that update submodule references, you must run git submodule update to actually update the submodule files.
  • Expecting Submodules to Follow a Branch: Submodules track commits, not branches. The branch setting in .gitmodules is only consulted by git submodule update --remote; a plain git submodule update always checks out the recorded commit.
  • Nested Submodules: Submodules can contain their own submodules, but this quickly becomes complex to manage. Avoid deep nesting when possible.
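
The unpushed-commit mistake in the list above can be caught before it reaches anyone else: git push --recurse-submodules=check refuses to push the parent while a referenced submodule commit is missing from the submodule's remote. The sketch below is a local simulation only; the bare repositories external-lib.git and my-project.git stand in for hosted remotes and are assumptions for the demo.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare repositories that play the role of remotes.
git init -q lib-src
git -C lib-src commit -q --allow-empty -m "Initial library commit"
git clone -q --bare lib-src external-lib.git
git init -q --bare my-project.git

git init -q my-project
cd my-project
git remote add origin "$work/my-project.git"
git -c protocol.file.allow=always submodule --quiet add "$work/external-lib.git" libs/external-lib
git commit -q -m "Add external-lib as a submodule"

# Commit inside the submodule but do not push it yet.
git -C libs/external-lib commit -q --allow-empty -m "Add new feature to library"
git add libs/external-lib
git commit -q -m "Update external-lib to latest version"

# The check aborts the push because the submodule commit exists only locally.
if git push -q --recurse-submodules=check origin HEAD 2>/dev/null; then
    first_push=allowed
else
    first_push=refused
fi
printf 'first push: %s\n' "$first_push"

# Publish the submodule commit, then push the parent again.
git -C libs/external-lib push -q origin HEAD
git push -q --recurse-submodules=check origin HEAD
second_push=ok
printf 'second push: %s\n' "$second_push"
```

Setting push.recurseSubmodules to check in your Git configuration applies the same safeguard to every push.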

Submodules Best Practices

  • Use Stable Commit References: Point submodules to stable releases or specific commit hashes, not moving branches, to ensure reproducible builds.
  • Document Submodule Workflows: Add a section to your README explaining how to clone, update, and work with submodules.
  • Use Relative URLs When Possible: For submodules hosted on the same server, use relative URLs in .gitmodules so that different protocols (ssh/https) work automatically.
  • Update Submodules Explicitly: Always update submodules deliberately and commit the new hash in the parent repo. Do not automate submodule updates in CI without review.
  • Keep Submodule Clones Shallow: For large submodules, pass --depth 1 to git submodule update --init, or clone the parent with --recurse-submodules --shallow-submodules, to save time and disk space.
  • Prefer Subtrees for Simple Dependencies: If you do not need independent submodule history, consider using git subtree instead, which is simpler for some use cases.

Frequently Asked Questions

  1. What is the difference between a submodule and a regular directory?
    A regular directory is tracked directly by the parent repository. A submodule is a reference to an external repository; the parent repository stores only a commit hash, not the actual files. Submodules have their own independent Git history, branches, and remotes.
  2. Why do submodule directories show as modified after pulling?
    When you pull changes in the parent repository, the submodule reference (commit hash) may be updated. However, the actual submodule files are not automatically updated. You must run git submodule update to sync the files to the new commit.
  3. Can I have a submodule that points to a branch instead of a commit?
    Submodules always record a specific commit hash. However, you can configure a submodule to track a branch using the branch setting in .gitmodules, then use git submodule update --remote to update to the latest commit on that branch.
  4. How do I change the URL of a submodule?
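    Edit the url value in .gitmodules, then run git submodule sync to copy the new URL into your local configuration and the submodule's remote. Commit the change to .gitmodules so others receive it. Newer Git versions also provide git submodule set-url <path> <new-url>, which updates .gitmodules and syncs in one step.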
  5. What is the difference between submodules and subtrees?
    Submodules store a reference to an external repository. Subtrees copy the external repository's files directly into your repository. Subtrees are simpler for users but duplicate history. Submodules keep history separate but require extra steps for cloning and updating.
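
The URL change described in question 4 can be sketched as follows. This is an illustration under stated assumptions, not the only way to do it: the repositories are throwaway local ones, and moved-lib.git is a hypothetical new home for the library.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q external-lib
git -C external-lib commit -q --allow-empty -m "Initial library commit"
# The library's new location (a bare copy for the demo).
git clone -q --bare external-lib moved-lib.git

git init -q my-project
cd my-project
git -c protocol.file.allow=always submodule --quiet add "$work/external-lib" libs/external-lib
git commit -q -m "Add external-lib as a submodule"

# 1. Change the URL in .gitmodules.
git config -f .gitmodules submodule.libs/external-lib.url "$work/moved-lib.git"

# 2. Copy the new URL into .git/config and the submodule's origin remote.
git submodule --quiet sync libs/external-lib

# 3. Commit the .gitmodules change so collaborators pick it up.
git add .gitmodules
git commit -q -m "Point external-lib at its new location"

new_url=$(git -C libs/external-lib remote get-url origin)
printf 'submodule origin is now: %s\n' "$new_url"
```

Collaborators who already have the submodule checked out need to run git submodule sync themselves after pulling the updated .gitmodules.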