Docker optimization guide: optimize build speed in CI pipelines

This article offers several tips for improving the build speed of Docker images in CI pipelines. I explain several caching tricks offered by BuildKit, an alternative image build engine. I also explain how the .dockerignore file and extra arguments to package managers such as apt can speed up your image builds.

Docker optimization guide series

This article is part of a multi-part series on working with Docker in an optimized way:

  • Optimize Docker development speed
  • Optimize Docker image build speed in CI (this article)
  • Optimize Docker image size
  • Optimize Docker image security

Introduction

As further explained in my previous CI/CD Basics article, CI/CD is the process of fully automating tasks such as building, testing and deploying your software. Instead of running these tasks manually on a developer’s laptop, you have them executed on a server whenever a developer pushes code.

If one of the steps in a CI/CD pipeline is to package your application as a Docker image (and push it to an image registry), you may have noticed long build times. In this article, I present several tips for tweaking the docker build command so that builds finish faster, which shortens the feedback time for your developers.

Use multi-stage builds

Multi-stage builds allow you to define multiple Docker images (which build upon each other) within the same Dockerfile. This speeds up image building in scenarios where only some steps of the build are slow (e.g. compiling a large third-party component) and these steps don’t need to be repeated every time. With multi-stage builds, you can extract these slow steps into a separate helper base image #1, which is used by your (frequently rebuilt) run-time image #2. This speeds up the build of image #2, because a cached version of image #1 can be used. The official manual offers an excellent introduction to multi-stage builds.
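As a minimal sketch of this idea, the shell snippet below writes a two-stage Dockerfile into a scratch directory. The directory name multistage-demo, the stage name builder, the base images, the third_party/ and app/ folders and the make commands are all assumptions chosen for illustration, not taken from a real project.

```shell
# Hypothetical example: write a two-stage Dockerfile into a scratch directory.
mkdir -p multistage-demo
cat > multistage-demo/Dockerfile <<'EOF'
# Image #1 ("builder"): the slow step, e.g. compiling a third-party component.
FROM gcc:13 AS builder
COPY third_party/ /src/
RUN make -C /src && make -C /src install DESTDIR=/out

# Image #2: the frequently rebuilt run-time image. It only copies the result.
FROM debian:bookworm-slim
COPY --from=builder /out/ /
COPY app/ /app/
CMD ["/app/run.sh"]
EOF
```

Assuming the third_party/ and app/ folders exist in that directory, the image would be built with "docker build multistage-demo". As long as third_party/ is unchanged, the layers of the builder stage can be served from the cache and only the second stage is rebuilt.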

Note: another optimization technique is caching, presented in the next section. When you use multi-stage builds, you also need to push the earlier stages (image #1) for the caching to work. See this article for details.

Use BuildKit with proper caching configuration

BuildKit is an alternative backend for building Docker images from your Dockerfile. Since Docker v19.03 it is included in the docker CLI, but (as of January 2022) it is still not the default backend on Linux. Consequently, you need to activate it explicitly in your (Linux-based) CI pipeline. Note that since Docker v20.10, BuildKit is used by default in Docker Desktop on Windows and macOS.

The main advantages of BuildKit are:

  • It runs independent build steps in parallel (which speeds up e.g. multi-stage builds)
  • It supports build secrets (see docs), which prevents secrets from ending up in the final image. Instead, secrets are only made available in the container/image temporarily during the build process. I will go into further detail in a future article of this series.
  • It can store individual image layers in the registry and download them from there, instead of relying only on the local layer cache (docs). This massively improves build speed when you have multiple CI pipeline runners. When runner #1 builds and pushes the image (and is configured to push cache metadata along with the image), runner #2 can determine for each layer whether it really needs to rebuild it: runner #2 no longer consults only its local cache, but also retrieves the cache metadata from the registry and, on a cache hit, simply downloads the already-built layer(s) from the registry.
  • It supports mounting cache directories (docs) from the host into the build container for specific RUN statements
    (e.g. “RUN --mount=type=cache,target=/some/path,id=some-id <command>”). The cache is managed by the BuildKit daemon. This is useful for temporarily mounting the cache directories of package managers such as pip, npm or apt/yum, so that they are shared between consecutive image builds.
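To illustrate the last point, here is a sketch of such a cache mount for pip, written as a shell snippet that creates the Dockerfile. The directory name cache-mount-demo, the python:3.12-slim base image, the requirements.txt file and the cache id pip-cache are assumptions. /root/.cache/pip is pip's default cache location when it runs as root on Linux, and your image may configure this differently.

```shell
# Hypothetical example: a Dockerfile that uses a BuildKit cache mount for pip.
mkdir -p cache-mount-demo
cat > cache-mount-demo/Dockerfile <<'EOF'
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# The cache directory is mounted only while this RUN step executes, so
# downloaded packages can be reused by later builds without being stored
# in the image itself.
RUN --mount=type=cache,target=/root/.cache/pip,id=pip-cache \
    pip install -r requirements.txt
COPY . .
EOF
```

Cache mounts are a BuildKit feature, so a Dockerfile like this only builds when BuildKit is enabled.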

The general steps to use the caching features are as follows:

  1. To enable BuildKit in Linux-based CI pipelines such as GitLab, set the DOCKER_BUILDKIT environment variable to 1 (master switch).
  2. To enable layer caching (which speeds up builds when you have multiple CI runners), run these two commands:
    • docker build --pull --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from $IMAGE_TAG \
      -f Dockerfile -t $IMAGE_TAG .
      docker push $IMAGE_TAG
    • Notes:
      • The --pull argument is optional. It causes BuildKit to download the most recent base images (specified in the FROM statement of your Dockerfile)
      • In this example we are using the same image tag ($IMAGE_TAG) for caching and the “real” image (used by your clients). This can cause problems if your image tag is different in every CI run (e.g. if the tag includes the Git commit hash), because the tag passed to --cache-from then does not exist in the registry yet. In this case, I recommend that you use a stable image tag for the caching image. You need to re-tag the built image and push it twice: once for the cache image, and once for the “real” image. Example:
        docker build --pull --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from $IMAGE_CACHE_TAG \
        -f Dockerfile -t $IMAGE_CACHE_TAG .
        docker tag $IMAGE_CACHE_TAG $IMAGE_TAG
        docker push $IMAGE_TAG
        docker push $IMAGE_CACHE_TAG
      • If you use multi-stage builds, you also need to push the cache images of earlier stages explicitly, and use the --cache-from argument multiple times, for the caching to work. See this article for details.
  3. To use the temporarily-mounted cache directories (to speed up re-building layers that use package managers such as apt or pip), you need to adapt your Dockerfile, as I’ve explained here in this article series.
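Steps 1 and 2, including the note on multi-stage builds, could be combined into a small CI script along the following lines. This is a sketch under assumptions: the function name build_with_registry_cache, the stage name builder and the variable BUILDER_CACHE_TAG are hypothetical, while IMAGE_TAG and IMAGE_CACHE_TAG have the same meaning as in the examples above.

```shell
# Sketch of a CI build step with BuildKit inline caching for a two-stage
# Dockerfile. Expects IMAGE_TAG, IMAGE_CACHE_TAG and BUILDER_CACHE_TAG
# (a hypothetical name) to be set by the pipeline.
build_with_registry_cache() {
    # Step 1: master switch for BuildKit on Linux.
    export DOCKER_BUILDKIT=1

    # Build and push the earlier stage (assumed to be named "builder") so
    # that its layers are available as a cache source on other runners.
    docker build --target builder \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from "$BUILDER_CACHE_TAG" \
        -t "$BUILDER_CACHE_TAG" . || return 1
    docker push "$BUILDER_CACHE_TAG" || return 1

    # Build the final image, using both cache images as cache sources.
    docker build --pull \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from "$BUILDER_CACHE_TAG" \
        --cache-from "$IMAGE_CACHE_TAG" \
        -t "$IMAGE_CACHE_TAG" . || return 1

    # Push once under the stable cache tag and once under the "real" tag.
    docker tag "$IMAGE_CACHE_TAG" "$IMAGE_TAG" || return 1
    docker push "$IMAGE_CACHE_TAG" || return 1
    docker push "$IMAGE_TAG"
}
```

In a pipeline you would set the three variables and then call build_with_registry_cache from the directory that contains the Dockerfile. On the very first run the cache tags do not exist in the registry yet, so that build simply runs without a cache hit.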

Use .dockerignore files

The .dockerignore file lets you specify files that the Docker build process should not copy from the host into the build context. My previous article presents the concept in detail here. This greatly speeds up local Docker image building (because it omits folders such as “.git”), but your CI pipelines can also benefit if your repository contains large files or folders that don’t need to be included in the image.
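A .dockerignore file for a typical repository might look like the one created by the snippet below. The entries are purely illustrative assumptions (only “.git” is mentioned above), and the directory name dockerignore-demo is hypothetical. In a real project the file lives in the root of the build context.

```shell
# Hypothetical example: create a .dockerignore file in a scratch directory.
mkdir -p dockerignore-demo
cat > dockerignore-demo/.dockerignore <<'EOF'
# Version-control data and editor settings that the image never needs
.git
.gitignore
.idea
# Dependencies and build output that are recreated inside the image
node_modules
dist
# Large files that only matter outside the image
docs/videos
*.log
EOF
```

Each line is a pattern relative to the root of the build context, and lines starting with # are comments.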

Limit installed packages

Some package managers install additional “recommended” components along with the components you explicitly specified. This slows down the installation and often consumes disk space unnecessarily. You can avoid this as follows for some package managers:

  • For Debian/Ubuntu: apt-get install -y --no-install-recommends <list of package names to install>
  • For RPM-based systems such as RHEL or CentOS: dnf -y install --setopt=install_weak_deps=False <list of package names to install>
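In a Dockerfile, the Debian/Ubuntu variant could be used as in the following sketch. The directory name apt-demo, the base image and the two package names are assumptions for illustration. Removing /var/lib/apt/lists afterwards is a common companion step that keeps the package index out of the layer; it affects image size rather than build speed.

```shell
# Hypothetical example: a Dockerfile that installs packages without extras.
mkdir -p apt-demo
cat > apt-demo/Dockerfile <<'EOF'
FROM debian:bookworm-slim
# Install only the named packages, without "recommended" extras.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*
EOF
```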
