Docker optimization guide: the 5 best tips to optimize Docker development speed

This article presents 5 tips that improve the speed of building Docker images locally. It focuses on two problems: iterating on your unfinished Dockerfile (where you are still figuring out configuration options and which packages to install), and iterating on your code (assuming a finished Dockerfile). I discuss tricks such as using BuildKit, the .dockerignore file, and tweaking the RUN statement order in your Dockerfile.

Docker optimization guide series

This article is part of a multi-part series on working with Docker in an optimized way:

– Optimize Docker development speed (this article)
– Optimize Docker image build speed in CI
– Optimize Docker image size
– Optimize Docker image security

Introduction

Docker has become an established tool for many software developers, not only to share and deploy software, but also to develop it. During development (on your development workstation), you will typically face two kinds of challenges that slow you down:

  1. Iterating on your Dockerfile: you are working on a still-unfinished Dockerfile. You frequently change statements in the Dockerfile, but re-building the image takes very long -> you want to reduce the build time, to iterate faster.
  2. Iterating on your code: you have a finished Dockerfile, but are still working on the code. Re-building the image on each code change takes very long -> here, too, you want to reduce the build time.

Let’s take a look at solution approaches for each problem separately. Note that you can (and should) combine them!

Dockerfile iteration

Approach 1: Splitting RUN statements

Due to Docker’s image caching feature (explained below in section Use a smart RUN statement order), you can improve your Dockerfile iteration speed by deliberately creating “too many” layers during the iteration phase (and aggregating them at the very end):

  1. Build a Docker image that contains those statements that are already known to work. In the most extreme case, this image is just an alias for your chosen base image, e.g. your Dockerfile might only contain a statement such as FROM python:3.8-slim.
  2. Start a temporary container from that image. Inside this container, iteratively run the different commands you believe are necessary for the final image. For each command that worked (returning exit code 0), add a corresponding RUN statement to your Dockerfile.
  3. From time to time (after adding a few commands), stop the temporary container, rebuild the image, and restart the temporary container. Thanks to Docker’s layer caching, those RUN statements whose layers are already cached won’t be re-built. A shell sketch of this workflow follows below.
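
For illustration, the workflow might look like this on the shell (the image tag and the installed package are placeholder assumptions):

docker build -t myapp-dev .           # build the known-good part of the image
docker run -it --rm myapp-dev bash    # start a temporary container

# Inside the container: try out the next command and check its exit code
apt-get update && apt-get install -y libpq-dev
echo $?   # 0 -> add "RUN apt-get update && apt-get install -y libpq-dev" to the Dockerfile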

Once your Docker image works as expected, you finalize your Dockerfile: you reduce the number of layers by aggregating the RUN statements into fewer RUN statements that each span multiple lines. See the example below:

Before:

FROM ruby:3.1.0-buster
WORKDIR /app
RUN git clone https://example.com/project.git
# each RUN starts a fresh shell, so use WORKDIR instead of "RUN cd":
WORKDIR project
RUN bundle install
RUN rake db:migrate

After:

FROM ruby:3.1.0-buster
# Bash strict mode for all subsequent RUN statements (see note below):
SHELL ["/bin/bash", "-eo", "pipefail", "-c"]
WORKDIR /app
RUN git clone https://example.com/project.git && \
    cd project && \
    bundle install && \
    rake db:migrate

Bash strict mode ensures that the image build fails as soon as any command in a large RUN statement fails (including commands in a pipe). Note that a standalone RUN set -eo pipefail has no effect, because each RUN statement starts a fresh shell; instead, use the SHELL instruction as shown above, so that every subsequent RUN statement is executed with bash -eo pipefail. See here for background information.

Approach 2: Use BuildKit’s cache mount

You can use BuildKit’s RUN --mount=type=cache feature (as documented here) to speed up re-building layers that involve package managers such as apt, pip or npm, e.g. lines like RUN apt-get install -y ....

The basic process is as follows:

  • Make sure you use the BuildKit engine to build Docker images: Docker Desktop (Windows / macOS) uses BuildKit by default, and so does Docker Engine on Linux since version 23.0. On older Linux installations, set the DOCKER_BUILDKIT environment variable to “1” to enable BuildKit.
  • Follow these instructions to use the cache-mounting option: it explains the concrete arguments you should use for Go packages, and apt.
    • For NPM, use something like this: RUN --mount=type=cache,target=/root/.npm,id=npm npm ci
    • For pip, use something like this: RUN --mount=type=cache,target=/root/.cache,id=pip pip install -r requirements.txt
    • Note: the above examples assume that you are installing packages as the root user. If this is not the case, change the /root/ path prefix to the home directory of the user you run the commands with.
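
Putting this together, here is a minimal sketch of a Dockerfile that uses a pip cache mount (the project layout, file names and image tag are illustrative assumptions):

# syntax=docker/dockerfile:1
FROM python:3.9
WORKDIR /app
COPY requirements.txt .
# The cache mount persists pip's download cache across builds, so unchanged
# packages are taken from the cache instead of being downloaded again:
RUN --mount=type=cache,target=/root/.cache,id=pip pip install -r requirements.txt
COPY src src
CMD ["python", "my_app.py"]

With BuildKit enabled, build it as usual, e.g. DOCKER_BUILDKIT=1 docker build -t myapp .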

Code iteration

Approach 1: Use container-based development mode of your IDE

IDEs such as Visual Studio Code let you develop directly within running Docker containers. For VS Code, this feature is called Dev Containers and is documented here. The basic idea is that the code files live on your host and are mounted into a Docker container. Inside the container, not only are all the dependencies installed (as instructed by your Dockerfile), but VS Code additionally installs parts of itself into that container, e.g. the language parsing features, a debugger, or various other VS Code extensions. VS Code thus runs both on the host (just the UI, basically as a thin client) and in the container. Any code you change on the host is instantly available inside the container, thanks to the bind mount VS Code has set up.
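
Conceptually, the bind mount the IDE sets up is similar to what you could do manually (the image tag and mount path are placeholder assumptions):

# Mount the current source directory into the container, so code changes on
# the host are instantly visible inside:
docker run -it --rm -v "$(pwd)":/app -w /app myapp-dev bash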

Other IDEs, such as the JetBrains IntelliJ-based IDEs (e.g. PyCharm), also offer Docker(-Compose)-based development features.

You can find more details about container-based development environments in my blog post here.

Approach 2: Use a smart RUN statement order

One of the advantages of Docker is its layer-caching feature. In a nutshell, each statement in the Dockerfile creates a new image layer. When you repeatedly run docker build ..., the Docker daemon reuses as many cached layers as possible. Docker automatically determines whether it can reuse a layer, as explained in these official docs. Once a layer (or its respective Dockerfile statement) is considered to be modified (and Docker can no longer use its cached version), all image layers above the first modified one have to be re-built to accommodate the changes.

The general idea to avoid frequent re-builds of the entire layer stack is to design the Dockerfile in such a way that statements/layers that change rarely come first in the stack, and layers that change frequently come last.

Consider the following example:

You are developing a containerized Python application. In your Dockerfile you have the following statements:

FROM python:3.9
WORKDIR /app
COPY src src
RUN pip install -r src/requirements.txt
CMD ["python", "my_app.py"]

The problem here is that frequent code changes invalidate the cached layer for line #3 (COPY src src). Thus, the layers built for lines 3-5 must be rebuilt on each code change, and the pip install command is rather slow. You can improve this example as follows:

FROM python:3.9
WORKDIR /app
COPY ./src/requirements.txt .
RUN pip install -r requirements.txt
COPY src src
CMD ["python", "my_app.py"]

Now, pip install only needs to run when the requirements.txt file changes. Otherwise, the cached layer is used.

Approach 3: Use a .dockerignore file

The first step the build engine performs when you run docker build ... is to copy the files the build process should have access to from the host into the so-called Docker build context. The ADD and COPY statements in your Dockerfile only use the files and folders of that build context as their source, not the files or folders of the host.

This copy step is often slow when the host’s file system contains many (large) files or folders that you don’t actually need in the image, such as the “.git” directory or other large files and folders.

The .dockerignore file lets you exclude files that should not be copied into the build context, which speeds up building. See here for the official docs. In spirit, it is very similar to the .gitignore file.

What you put in there depends on your project (and the programming languages used), but here is a general recommendation (a sample file follows the list):

  • Meta-data from your VCS, e.g. the “.git” or “.svn” folder
  • Meta-data from your operating system, e.g. “**/.DS_Store”
  • Any other large files (e.g. database backup dumps, raw data used for automated tests, etc.) that exist on the host or in the repo. If you store large files via Git LFS, it makes sense to consult your .gitattributes file for lines that contain “filter=lfs” – some of these entries may also belong in the .dockerignore file.
  • Documentation
  • Tests (assuming that you don’t run them as part of building the image, which would be an anti-pattern anyway)
  • Any caches or installed packages that will be re-built or re-installed within the container anyway, e.g. the node_modules folder for JavaScript applications using npm etc. – this is highly specific to the programming language or package manager you use, so you should Google for “dockerignore <language>” to find the best entries. Some examples are available here.
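
For instance, a sample .dockerignore for a Node.js or Python project might look like this (the concrete entries are assumptions you should adapt to your project):

# VCS and OS metadata
.git
.svn
**/.DS_Store
# dependencies that are re-installed inside the container anyway
node_modules
# documentation and tests
docs
tests
# large local artifacts, e.g. database dumps
*.dump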

Conclusion

Optimizing your development workflow is very important. While memes such as this one are funny and often true, you don’t really want the “compiler” (Docker, in this case) to force many micro-breaks upon you. You can’t really do anything meaningful during these breaks, and attempting to work on several tasks in parallel is mentally exhausting and error-prone, at least for some of us. Learning and applying the above tips will make you more efficient and avoid idle time. In the follow-up article I discuss more optimization tips tailored to your CI pipeline, some of which you can also apply to your local development workflow.
