Docker image tagging: best practices in a CI pipeline

This article explains why you should not tag your own Docker images with only the “latest” tag. I discuss alternative best practices, categorizing them into stable vs. unstable version tags. I also fully deconstruct a Docker image tag into its basic components to improve your understanding of Docker image names and tags.

Introduction

When you build a Docker image in a CI pipeline, you need to assign one or more tags to the built image. If you use the Docker build engine (and not something else, such as podman), you would run a command such as “docker build -t mytag .” to build the image, followed by “docker push mytag” to push the image to some image registry.

The problem: when you start learning Docker, many tutorials on the Internet tell you to let the image tag end with “latest”. Unfortunately, this approach has numerous problems. In this article, I explain strategies you can use for tagging your images, specifically for the last section of the tag, which specifies the image version.

Components of a Docker image tag

Before we dive into the problems of using latest, we first need to understand the components of an image tag. Image tags come in many shapes and forms. Typical examples are:

  • postgres
  • postgres:12.2
  • gcr.io/kaniko-project/executor:v1.6.0

The general form of an image tag is this: [registry.host[:someport]/] [somenamespace/] image-name [:version-tag]

The square brackets [] indicate optional sections.

Let’s look at the image tag components:

  • registry.host[:someport] specifies the server address, e.g. gcr.io for the third example above. The port can be omitted if the registry is accessible via the default HTTPS port (443). If the host name and port are omitted, the Docker Hub server, “docker.io”, is used, as documented here.
  • somenamespace is typically the name(s) of the project or organization that published the image. If you use the GitLab image registry, this would be the GitLab group(s) and project name, e.g. “<gitlab-group-name>/<gitlab-project-name>” or “<gitlab-group-name>/<gitlab-subgroup-name>/<gitlab-project-name>”.
    • For Docker Hub images, the namespace is omitted (e.g. for postgres:12.2) if the image is an official image (docs); Docker then uses the implicit namespace “library”, i.e. docker.io/library/postgres:12.2. Otherwise, the namespace is the name of the user or organization who pushed the image, e.g. bitnami for “bitnami/jenkins-exporter”.
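  • image-name is simply the name of the image, which you choose at will.
  • version-tag can either be omitted (in which case Docker assumes “latest”), or you can set it to a string of your choice, e.g. “1.0”. A tag may contain at most 128 characters, consisting of letters, digits, underscores, periods and dashes, and must not start with a period or a dash. More details below.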
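To make the decomposition concrete, here is a simplified Python sketch that splits an image tag into these components and applies the Docker Hub defaults. It mimics the heuristic the Docker client uses (the first path segment only counts as a registry host if it contains a “.” or “:”, or is “localhost”), but it is not a full implementation of the reference grammar and does not validate its input. The ImageRef and parse_image_ref names are made up for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Defaults applied when parts of the reference are omitted (Docker Hub conventions).
DEFAULT_REGISTRY = "docker.io"
DEFAULT_TAG = "latest"


@dataclass
class ImageRef:
    registry: str          # e.g. "gcr.io" or "localhost:5000"
    repository: str        # namespace(s) + image name, e.g. "kaniko-project/executor"
    tag: Optional[str]     # version tag, or None if only a digest is given
    digest: Optional[str]  # e.g. "sha256:...", or None


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, tag and digest."""
    # An optional digest ("@sha256:...") follows the name; split it off first.
    name, _, digest = ref.partition("@")

    # The first path segment is only treated as a registry host if it looks
    # like one: it contains a "." or a ":" (port), or it is "localhost".
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DEFAULT_REGISTRY, name

    # Any remaining ":" separates the repository from the version tag
    # (a port ":" belonged to the registry part, which is already removed).
    repository, colon, tag = remainder.rpartition(":")
    if not colon:
        repository, tag = remainder, ""

    # Official Docker Hub images live in the implicit "library" namespace.
    if registry == DEFAULT_REGISTRY and "/" not in repository:
        repository = "library/" + repository

    # Without tag and digest, Docker falls back to "latest".
    if not tag and not digest:
        tag = DEFAULT_TAG
    return ImageRef(registry, repository, tag or None, digest or None)
```

For example, parse_image_ref("postgres:12.2") returns ImageRef(registry='docker.io', repository='library/postgres', tag='12.2', digest=None).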

Image meta-data and the registry address

A few of the above components don’t actually make it into the meta-data of an image. With meta-data I mean the output of docker image inspect <some-image:tag>: apart from the locally maintained RepoTags and RepoDigests fields, the output does not contain the registry host. Under the hood, a client (such as the docker CLI, or Skopeo) simply uses the registry.host part as the destination for the HTTP requests it makes to download the image. For instance, if you run “docker pull gcr.io/kaniko-project/executor:v1.6.0”, the Docker client first performs an HTTP GET https://gcr.io/v2/kaniko-project/executor/manifests/v1.6.0 request, which returns a JSON document (the image manifest) describing the blobs of the compressed image layers of that image, followed by more GET requests to download the layer blobs. See also here and here for more information.
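The requests described above can be sketched in Python with the standard library. The helper names are hypothetical, and the sketch only builds the request objects without sending them: in practice, many registries (including Docker Hub) first answer with HTTP 401 and require a bearer token, which is not covered here. Note that Docker Hub’s registry API is served from registry-1.docker.io rather than docker.io.

```python
import urllib.request

# Media types a client typically accepts for the manifest request
# (Docker v2 schema 2 and OCI, single-image and multi-arch variants).
MANIFEST_MEDIA_TYPES = [
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
]


def registry_api_host(registry: str) -> str:
    # Docker Hub's registry API lives on registry-1.docker.io, not docker.io.
    return "registry-1.docker.io" if registry == "docker.io" else registry


def manifest_request(registry: str, repository: str, reference: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for an image manifest.

    `reference` is either a version tag (e.g. "v1.6.0") or a digest ("sha256:...").
    """
    url = f"https://{registry_api_host(registry)}/v2/{repository}/manifests/{reference}"
    return urllib.request.Request(
        url, headers={"Accept": ", ".join(MANIFEST_MEDIA_TYPES)}, method="GET"
    )


def blob_url(registry: str, repository: str, digest: str) -> str:
    # Layer blobs listed in the manifest are downloaded by their digest.
    return f"https://{registry_api_host(registry)}/v2/{repository}/blobs/{digest}"
```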

Why using latest is bad

When you only use latest for the version part of your tag (and don’t additionally tag the image with other tags), you may run into the following problems:

  • If you re-execute an older CI job (or if you run the same CI job in multiple testing / feature Git branches), the CI jobs keep overwriting the latest tag, so “latest” loses its meaning. Was it meant to refer to the latest stable version, or also to the bleeding edge of some feature branch? Your production environment will most likely become unstable if you configure it to use the latest tag of your image.
  • It becomes impossible to deliberately use an older version of your image in some of your deployments, because no tag points to it anymore.

Candidates for version element

The following table illustrates which elements you can put into the version part of your Docker image tag. The examples are given for GitLab CI/CD, but other CI/CD systems will typically have similar environment variables. Note that you can also combine/concatenate them!

Version element | GitLab CI/CD predefined environment variable
Git tag (note: only available in “tag pipelines”, i.e. pipelines started for commits that are actually tagged in Git!) | CI_COMMIT_TAG
Full Git commit hash (SHA-1, 40 hex characters) | CI_COMMIT_SHA
Shortened Git commit hash (first 8 characters) | CI_COMMIT_SHORT_SHA
Git branch name (note: not always available, e.g. missing in a tag pipeline; may contain characters such as “/” that are not allowed in tags) | CI_COMMIT_BRANCH
Semantic version string parsed from code files | N/A: you need to implement it yourself
Date + timestamp (ISO 8601; its “:” characters are not allowed in tags) | CI_JOB_STARTED_AT
A unique build number | CI_JOB_ID
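Since the table leaves semantic versions to you, here is a minimal Python sketch, assuming (hypothetically) that the version lives in the “version” field of a package.json-style file. It also shows a pitfall: the “+” that semantic versioning uses for build metadata is not allowed in Docker tags, so the sketch replaces it with “_”, which is an arbitrary choice rather than a standard.

```python
import json
import re

# Simplified semantic-version pattern: MAJOR.MINOR.PATCH with optional
# "-prerelease" and "+buildmetadata" suffixes.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)
# Docker tag grammar: a word character first, then up to 127 of [A-Za-z0-9_.-].
DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def version_from_package_json(text: str) -> str:
    """Read the "version" field of a package.json-style file and validate it."""
    version = json.loads(text)["version"]
    if not SEMVER.fullmatch(version):
        raise ValueError(f"not a semantic version: {version!r}")
    return version


def semver_to_docker_tag(version: str) -> str:
    # "+" (build metadata) is not a valid tag character; "_" is an arbitrary substitute.
    tag = version.replace("+", "_")
    if not DOCKER_TAG.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag
```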

Need for semantic versioning

If you are building a purely internally-operated service that doesn’t have any (external) customers, or does not allow customers to download specific versions of your software, you probably don’t need semantic version numbers. Semantic version numbers only make sense if you have clients / customers who want or expect nice-looking version numbers. A purely internal service will work fine just using a date+timestamp instead of a version.
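For the date+timestamp variant, the value of CI_JOB_STARTED_AT (an ISO 8601 string such as 2021-07-26T14:03:09Z) cannot be used as-is, because “:” is not a valid tag character. A minimal Python sketch that converts it into a compact, sortable form; the YYYYMMDD-HHMMSS format and the fallback to the current time are arbitrary choices for this example.

```python
from __future__ import annotations

import os
from datetime import datetime, timezone
from typing import Optional


def timestamp_tag(started_at: Optional[str] = None) -> str:
    """Turn an ISO 8601 timestamp into a tag-safe "YYYYMMDD-HHMMSS" string (UTC)."""
    # Prefer an explicit argument, then GitLab's CI_JOB_STARTED_AT variable.
    started_at = started_at or os.environ.get("CI_JOB_STARTED_AT")
    if started_at:
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11
        # on, so replace it with an explicit UTC offset first.
        moment = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    else:
        # Fallback for local runs outside of CI: use the current time.
        moment = datetime.now(timezone.utc)
    # ":" is not allowed in Docker tags, hence the compact format.
    return moment.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
```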

Stable vs. unstable version tags

A stable version tag is one whose name doesn’t change over time – but whenever you pull that version tag (e.g. with docker pull before calling docker run), you might get a different image! In other words: the “stability” refers to the name, not the actual image target.

An unstable version tag is the opposite: it is uniquely created for a specific image build and, by convention, never re-pointed. Whenever you call docker run for that unstable version tag, you get the same image. But since that specific image build becomes out of date quickly, you have to call docker run with a different version tag over time to stay up-to-date.

I also sometimes compare Docker image tags to Git tags and Git branches: stable Docker version tags correspond to Git branches, because they are both mutable pointers. Unstable Docker version tags then correspond to (immutable) Git tags.
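The branch/tag analogy can be illustrated with a toy, in-memory model of a registry (the ToyRegistry class is purely hypothetical): stable tags move to the newest build, while unstable tags refuse to be re-pointed. Keep in mind that real registries usually let you overwrite any tag; immutability of unstable tags is a convention, although some registries offer a setting to enforce it.

```python
from __future__ import annotations


class ToyRegistry:
    """In-memory stand-in for an image registry: tag name -> image digest."""

    def __init__(self, stable_tags: set[str]) -> None:
        self.stable_tags = stable_tags  # mutable pointers, like Git branches
        self.tags: dict[str, str] = {}

    def push(self, tag: str, digest: str) -> None:
        current = self.tags.get(tag)
        if tag not in self.stable_tags and current is not None and current != digest:
            # Unstable tags behave like Git tags: once set, never re-pointed.
            raise ValueError(f"unstable tag {tag!r} already points to {current}")
        # Stable tags simply move to the newest build, like a Git branch.
        self.tags[tag] = digest

    def resolve(self, tag: str) -> str:
        # What a client would get when pulling this tag right now.
        return self.tags[tag]


if __name__ == "__main__":
    registry = ToyRegistry(stable_tags={"staging"})
    registry.push("staging", "sha256:aaa")
    registry.push("staging-20210726-1a2b3c4d", "sha256:aaa")
    registry.push("staging", "sha256:bbb")  # a rebuild moves the stable tag
    print(registry.resolve("staging"))  # sha256:bbb
    print(registry.resolve("staging-20210726-1a2b3c4d"))  # sha256:aaa
```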

The main advantage of stable version tags: as the user (or the script) who calls docker run, you don’t need to change the tag on each call, because the one you use is stable.

The main disadvantage of stable version tags: on distributed and self-healing platforms, such as Kubernetes, the time at which a node pulls the image dictates the exact image build running on that node. This is a problem, because you may end up with different pods (on different nodes) running different image builds!

The advantages and disadvantages of unstable tags are simply the opposite of those of stable tags. You have very precise control over which build runs at the point when docker run is called (and you no longer have problems with platforms like Kubernetes). But you need a mechanism to make the caller of docker run aware of the correct version tag.

Conclusion: tagging recommendation

Given the advantages and disadvantages of stable and unstable tags, we can formulate a recommendation strategy:

Case 1: you use a platform such as Kubernetes: use unique, unstable version tags (that include, e.g., the Git commit hash or other elements of the above table), and roll them out to the cluster by updating the image reference of your deployments, as I explained e.g. in this Docker deployment article.

The elements you place in your version tag depend on the approach you use for building the image. For instance, if your builds are only triggered by Git pushes, using the Git commit SHA as a discriminator may already be enough. However, if you (additionally) build images on a regular schedule for specific branches, e.g. to always include the latest security patches, your version tag should include at least a date+timestamp, and maybe a unique build number. Otherwise (e.g. if you only used the Git commit SHA), the same version tag might be reused for multiple builds created on different days.
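Here is a Python sketch of such a unique version tag for GitLab CI/CD, combining the Git tag or branch name, the job start time, the short commit hash and the job ID. The function names and the exact layout of the tag are assumptions for illustration. The slugify() helper is modeled on what GitLab’s predefined CI_COMMIT_REF_SLUG variable does (lowercase, everything except 0-9 and a-z replaced with “-”, at most 63 characters), because branch names such as feature/login contain characters that are not valid in tags.

```python
from __future__ import annotations

import re
from typing import Mapping


def slugify(ref_name: str, max_len: int = 63) -> str:
    """Lowercase, replace everything except a-z and 0-9 with "-", trim dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", ref_name.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def unique_version_tag(env: Mapping[str, str]) -> str:
    """Build e.g. "main-20210726140309-1a2b3c4d-4711" from GitLab CI variables."""
    # Tag pipelines have CI_COMMIT_TAG but no CI_COMMIT_BRANCH, and vice versa.
    ref_name = env.get("CI_COMMIT_TAG") or env.get("CI_COMMIT_BRANCH") or "unknown-ref"
    # "2021-07-26T14:03:09Z" -> "20210726140309" (":" is not allowed in tags);
    # assumes the "Z"-suffixed UTC format that GitLab uses.
    timestamp = re.sub(r"[^0-9]", "", env["CI_JOB_STARTED_AT"])
    parts = [slugify(ref_name) or "ref", timestamp, env["CI_COMMIT_SHORT_SHA"], env["CI_JOB_ID"]]
    tag = "-".join(parts)
    # Docker tags are limited to 128 characters; with a 63-character slug
    # this only fails for unusually long variable values.
    if len(tag) > 128:
        raise ValueError(f"tag too long: {tag!r}")
    return tag
```

In a CI job, you would call unique_version_tag(os.environ).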

Case 2: you don’t use platforms such as Kubernetes: use stable version tags. You can have one or more stable tags, depending on whether you have multiple deployments/environments, e.g. production vs. staging. Make sure that each stable tag is only updated by the CI jobs of its respective Git branch: for instance, the CI job running for the staging branch should only build and push the Docker image with the staging tag.
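The rule “each branch only updates its own stable tag” can be expressed as a small lookup. In this Python sketch, the branch-to-tag mapping (main → production, staging → staging) is hypothetical; you would adapt it to your own environments and feed it the value of CI_COMMIT_BRANCH.

```python
from __future__ import annotations

from typing import Mapping, Optional

# Hypothetical mapping: which Git branch may move which stable tag.
STABLE_TAG_FOR_BRANCH: Mapping[str, str] = {"main": "production", "staging": "staging"}


def stable_tag_to_push(
    branch: Optional[str], mapping: Mapping[str, str] = STABLE_TAG_FOR_BRANCH
) -> Optional[str]:
    """Return the stable tag this pipeline may update, or None if it may update none."""
    if not branch:
        # e.g. tag pipelines, where CI_COMMIT_BRANCH is not set
        return None
    # Feature branches are not in the mapping and therefore never touch stable tags.
    return mapping.get(branch)
```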

You can also combine both approaches. That is, you can tag your images with both stable and unstable version tags. This makes sense in situations where you don’t know how or where your image will be used.
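Tagging one build with several tags simply means running docker tag and docker push once per tag. The Python sketch that follows only builds the argument lists (you could pass each one to subprocess.run(cmd, check=True)); the image path and tag values in the usage part are hypothetical. In GitLab CI/CD, the image path would typically come from the predefined CI_REGISTRY_IMAGE variable.

```python
from __future__ import annotations


def push_commands(image: str, local_ref: str, tags: list[str]) -> list[list[str]]:
    """Return `docker tag` and `docker push` argument lists for each tag of one build."""
    commands: list[list[str]] = []
    for tag in tags:
        target = f"{image}:{tag}"
        commands.append(["docker", "tag", local_ref, target])  # add another name
        commands.append(["docker", "push", target])  # upload under that name
    return commands


if __name__ == "__main__":
    # Hypothetical values: one stable and one unstable tag for the same build.
    for cmd in push_commands(
        "registry.example.com/group/app",
        "app:local-build",
        ["staging", "staging-20210726140309-1a2b3c4d"],
    ):
        print(" ".join(cmd))  # or: subprocess.run(cmd, check=True)
```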

Finally, you should be aware that you can also force the container runtime (such as Docker) to pull a specific build, even for images that come with a stable tag, by using the image’s immutable digest. For instance,
docker pull ubuntu@sha256:82becede498899ec668628e7cb0ad87b6e1c371cb8a1e597d83a47fac21d6af3
downloads a very specific Ubuntu 20.04.2 LTS image, built on 2021-07-26.

The background information is documented here. This is handy if you want to use an image that only offers stable tags in Kubernetes or similar environments. The @sha256:... part acts like an unstable tag. Its only disadvantage is that the hash does not provide any information about the image (such as the application’s semantic version, or a build date), which is why it makes sense to explicitly create unstable version tags that contain this kind of information.
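If you want to pin an image to a digest programmatically, e.g. after looking the digest up in the RepoDigests field of docker image inspect, a minimal Python sketch could look like this. The pin_to_digest name is hypothetical; it only accepts sha256 digests and assumes the input reference does not already contain one.

```python
import re

# A sha256 digest as used in "image@sha256:<64 hex characters>".
DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pin_to_digest(image: str, digest: str) -> str:
    """Replace any version tag of `image` with an immutable digest reference."""
    if not DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    # Drop an existing ":tag", but keep a ":port" that belongs to the registry
    # host (such a colon always appears before the last "/").
    last_slash = image.rfind("/")
    colon = image.rfind(":")
    if colon > last_slash:
        image = image[:colon]
    return f"{image}@{digest}"
```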

2 thoughts on “Docker image tagging: best practices in a CI pipeline”

  1. Do you think it’s relevant to push an image via Pull Request?

    I would like to be able to individually test the images of each PR…

    • Yes, this is something you would typically do if you run some sort of integration or End2End test in a PR pipeline, which requires Docker images so that you can (temporarily) start the services under test.
