This article presents five tips to speed up building Docker images locally. It focuses on two problems: iterating on your unfinished Dockerfile (where you are still figuring out configuration options and which packages to install), and iterating on your code (assuming a finished Dockerfile). I discuss tricks such as using BuildKit, the .dockerignore file, or tweaking the RUN statement order in your Dockerfile.

Docker optimization guide series
This article is part of a multi-part series on working with Docker in an optimized way:
– Optimize Docker development speed (this article)
– Optimize Docker image build speed in CI
– Optimize Docker image size
– Optimize Docker image security
Docker has become an established tool for many software developers, not only to share and deploy software, but also to develop it. During development (on your development workstation), you will typically face two kinds of challenges that slow you down:
- Iterating on your Dockerfile: you are working on a still-unfinished Dockerfile, frequently changing its statements, but re-building the image takes very long -> you want to reduce the build time to iterate faster.
- Iterating on your code: you have a finished Dockerfile, but are still working on the code. Re-building the image on each code change takes very long -> here, too, you want to reduce the build time.
Let’s take a look at solution approaches for each problem separately. Note that you can (and should) combine them!
Approach 1: Splitting RUN statements
Due to Docker’s image caching feature (explained below in the section "Use a smart RUN statement order"), you can improve your Dockerfile iteration speed by deliberately creating “too many” layers during the iteration phase (and aggregating them at the very end):
- Build a Docker image that contains those statements that are already known to work. In the most extreme case, this image is just an alias for your chosen base image, i.e. your Dockerfile might only contain a single FROM statement.
- Start a temporary container of that image. Iteratively run the different commands you believe are necessary for the final image inside this container. For each command that worked (returning exit code 0), create a corresponding RUN command line in your Dockerfile.
- From time to time (after adding a few commands), stop the temporary container, rebuild the image, and restart the temporary container. Thanks to Docker’s layer caching, those RUN statements whose layers were already cached won’t be re-built.
Once your Docker image works as expected, you finalize your Dockerfile: you reduce the number of layers / RUN commands by aggregating them into fewer RUN statements that each span multiple lines. See the example below:
Before (iteration phase):

```dockerfile
FROM ruby:3.1.0-buster
WORKDIR /app
RUN git clone https://some.project.git
RUN cd project
RUN bundle install
RUN rake db:migrate
```
After (finalized):

```dockerfile
FROM ruby:3.1.0-buster
WORKDIR /app
SHELL ["/bin/bash", "-c"]
RUN set -eo pipefail && \
    git clone https://some.project.git && \
    cd project && \
    bundle install && \
    rake db:migrate
```
To avoid the image build succeeding even though some command within a large RUN statement failed, use Bash strict mode: prepend set -eo pipefail to each multi-line RUN statement. Note that shell options set in one RUN statement do not carry over to subsequent RUN statements, so a standalone RUN set -eo pipefail line has no effect; also, pipefail requires Bash, so switch the shell via SHELL ["/bin/bash", "-c"]. See here for background information.
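The difference the pipefail option makes can be demonstrated with a small Bash sketch, runnable outside of Docker:

```shell
#!/bin/bash
# A pipeline's default exit status is that of its LAST command,
# so the failure of 'false' is silently masked:
no_pipefail=$(false | true; echo $?)
echo "without pipefail: $no_pipefail"   # prints: without pipefail: 0

# With pipefail, the pipeline fails if ANY of its commands fails:
with_pipefail=$(set -o pipefail; false | true; echo $?)
echo "with pipefail: $with_pipefail"    # prints: with pipefail: 1
```

Inside a Dockerfile, the same mechanics apply to each RUN statement: combined with -e, a failing command (even in the middle of a pipe) aborts the whole RUN statement, so the build fails loudly instead of producing a broken image.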
Approach 2: Use BuildKit’s cache mount
You can use BuildKit’s RUN --mount=type=cache feature (as documented here) to speed up lines such as RUN apt-get install -y .... This speeds up re-building layers that involve package managers, such as apt, npm, or pip.
The basic process is as follows:
- Make sure to use the BuildKit engine for building Docker images: on Docker Desktop (Windows / macOS), BuildKit is used by default. On Linux, BuildKit is the default since Docker Engine 23.0; on older versions, set the DOCKER_BUILDKIT environment variable to “1” to enable it.
- Follow these instructions to use the cache-mounting option: it explains the concrete arguments you should use for Go packages, and apt.
- For NPM, use something like this: RUN --mount=type=cache,target=/root/.npm,id=npm npm ci
- For pip, use something like this: RUN --mount=type=cache,target=/root/.cache,id=pip pip install -r requirements.txt
- Note: the above examples assume that you are installing packages as the root user. If this is not the case, you need to change the /root/ path prefix to the home directory of the user you are running the commands with.
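Putting the pieces together, a minimal sketch of a Dockerfile using a pip cache mount could look like the following (the base image, paths, and id are illustrative assumptions; they match a root user and pip’s default cache location):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9
WORKDIR /app
COPY requirements.txt .
# The cache mount persists /root/.cache across builds, so even when this
# layer must be re-built, pip can reuse previously downloaded packages.
RUN --mount=type=cache,target=/root/.cache,id=pip \
    pip install -r requirements.txt
COPY src src
```

Note that the cache mount is only available while that single RUN statement executes; its contents do not become part of the final image.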
So much for iterating on your Dockerfile. The following approaches address the second problem: iterating on your code.

Approach 1: Use the container-based development mode of your IDE
IDEs such as Visual Studio Code let you develop directly within running Docker containers. For VS Code, this feature is documented here and is called Dev Containers. The basic idea is that code files exist on your host and are mounted into a Docker container. Inside the container, not only are all the dependencies installed (as instructed in your Dockerfile), but VS Code additionally installs parts of itself into that container, e.g. the language parsing features, a debugger, or various other VS Code extensions. VS Code thus runs both on the host (just the UI, basically as a thin client) and in the container. Any code you change on the host is instantly available inside the container, thanks to the bind mount VS Code has set up.
Other IDEs, such as the JetBrains IntelliJ-based IDEs (e.g. PyCharm), also offer Docker(-compose)-based development features.
You can find more details about container-based development environments in my blog post here.
Approach 2: Use a smart RUN statement order
One of the advantages of Docker is its layer-caching feature. In a nutshell, each statement in the Dockerfile creates a new image layer. When you repeatedly run docker build ..., the Docker daemon reuses as many cached layers as possible. Docker automatically determines whether it can reuse a layer, as explained in these official docs. Once a layer (or its respective Dockerfile statement) is considered to be modified (so that Docker can no longer use its cached version), all image layers above the first modified one have to be re-built to accommodate the changes.
The general idea for avoiding frequent re-builds of the entire layer stack is to design the Dockerfile such that statements/layers that change rarely come first in the stack, and layers that change frequently come last.
Consider the following example:
You are developing a containerized Python application. In your Dockerfile you have the following statements:

```dockerfile
FROM python:3.9
WORKDIR /app
COPY src src
RUN pip install -r src/requirements.txt
CMD ["python", "my_app.py"]
```
The problem here is that frequent code changes invalidate the cached layer for line #3 (COPY src src). Thus, the layers built for lines 3-5 must be rebuilt on each code change, and the pip install command is rather slow. You can improve this example as follows:
```dockerfile
FROM python:3.9
WORKDIR /app
COPY ./src/requirements.txt .
RUN pip install -r requirements.txt
COPY src src
CMD ["python", "my_app.py"]
```
Now, pip install only needs to run when the requirements.txt file changes. Otherwise, the cached layer is used.
Approach 3: Use a .dockerignore file
The first step the build engine performs when you run docker build ... is to copy the files the build process should have access to from the host into the so-called Docker build context. The COPY statements in your Dockerfile actually only use the files and folders of that build context as source, not those of the host.
This copy process is often slow if there are many (large) files/folders on the host’s file system which you don’t want copied anyway, such as the “.git” directory or other large files/folders.
The .dockerignore file lets you exclude files that shall not be copied into the build context, which speeds up building. See here for the official docs. In spirit, it is very similar to the .gitignore file.
What you put in there depends on your project (and the used programming languages), but here is a general recommendation:
- Meta-data from your VCS, e.g. the “.git” or “.svn” folder
- Meta-data from your operating system, e.g. “**/.DS_Store”
- Any other large files (e.g. database backup dumps, raw data used for automated tests, etc.) that exist on the host or in the repo. If you store large files via Git LFS, it makes sense to consult your .gitattributes file for lines that contain “filter=lfs” – maybe some of these entries should also go into the .dockerignore file.
- Tests (assuming that you don’t run them as part of building the image, which would be an anti-pattern anyway)
- Any caches or installed packages that will be re-built or re-installed within the container anyway, e.g. the node_modules folder created by npm – this is highly specific to the used programming language or package manager, so you should Google for “dockerignore <language>” etc. to find the best entries. Some examples are available here.
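As a starting point, a .dockerignore following the recommendations above might look like this (the concrete entries are illustrative assumptions – adapt them to your project and languages):

```
# VCS meta-data
.git
.svn
# operating system meta-data
**/.DS_Store
# large files, e.g. database backup dumps
*.sql.gz
# tests (not needed for building the image)
tests/
# packages that are re-installed inside the container anyway
node_modules
**/__pycache__
```

Like .gitignore, the file supports comments, wildcard patterns, and the `**` syntax for matching at any directory depth.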
Optimizing your development workflow is very important. While memes such as this one are funny and often true, you don’t really want the “compiler” (Docker, in this case) forcing many micro-breaks upon you. You can’t really do anything meaningful during these breaks. Attempting to do several tasks in parallel is mentally exhausting and error-prone, at least for some of us. Learning and applying the above tips will make you more efficient, avoiding idle time. In the follow-up article I will discuss more optimization tips tailored for your CI pipeline, some of which you can also apply to your local development workflow.