This article introduces several tricks that you can apply at build time to reduce the size of your Docker images, including the use of a small base image, multi-stage builds, consolidation of RUN statements, avoiding separate chown commands, and using the tool docker-slim.
Docker has become a commodity to package and distribute (backend) software. So much so, that the Docker image format has been standardized by the OCI (Open Container Initiative) in a specification called image-spec. You can use Docker or other tools (such as buildah or kaniko) to build images, and the resulting image can be loaded by any compatible container run-time. Throughout the rest of this article, I’ll use the “Docker image” terminology, but I really mean OCI-compliant images.
One challenging aspect is to keep the size of Docker images small. The core advantage of small Docker images is cost reduction: small images consume less disk space, can be downloaded (and extracted) faster, and thus reduce the container start time.
Let’s take a look at different approaches to reduce the Docker image size.
Choose a suitable base image
The base image is the image referenced in the FROM statement at the beginning of your Dockerfile. The base image decides which Linux distro you are using, and which packages come preinstalled. When you choose the base image, carefully examine the available images (e.g. on Docker Hub) and any available “slim” image tags for these images:
- Some distributions, such as alpine, are very size-optimized by default (~3 MB). However, alpine has the caveat that it uses musl instead of the much more common glibc C-library. This often causes compatibility problems, as e.g. explained here.
- Many Linux distribution images, e.g. ubuntu or debian, are also rather small (less than 40 MB) by default. Some distributions, such as debian, have a slim tag, which cuts down the image size (e.g. from ~40 MB to ~20 MB in case of Debian).
- Programming-language-specific images (e.g. node, etc.) often have rather large default images (several hundred MBs). But there is a slim variant, in which optional packages (such as compilers) are removed, and which is therefore much smaller (often 10x smaller).
- There are other stripped-down Linux distro images with focus on security, such as distroless, but they come with several caveats, e.g. making it harder to debug running containers, and having a rather steep learning curve to customize them.
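As a minimal sketch of the points above, switching to a slim variant is usually a one-line change in the FROM statement. The tag names used here (bookworm and bookworm-slim) are assumptions for illustration; check the image's page on Docker Hub for the tags that actually exist for the distribution and version you need.

```dockerfile
# Full Debian base image (assumed tag), shown only for comparison:
# FROM debian:bookworm

# Slim variant of the same release (assumed tag), usually noticeably smaller
FROM debian:bookworm-slim

# Placeholder start command, for illustration only
CMD ["bash"]
```

Building both variants and comparing them with docker images is a quick way to see how much the slim tag saves for your case.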
Multi-stage builds
Often a Docker image becomes large simply because your app's dependencies need to be downloaded and compiled during the build. The image size increases due to files & folders that you only need to build/install your application, but not to run it. Examples include:
- You need to install a package manager (e.g. apt-get install python3-pip) to get the dependencies.
  - You don’t need the package manager (such as pip) to run your application!
- The package manager might cache downloaded dependencies (see below for how to avoid it).
  - You don’t need that cache to run your application!
- You need to compile native code (e.g. written in C/C++) with compiler tool-chains, e.g. to be able to install Python extensions that include native modules.
  - You don’t need the compiler to run your application!
With multi-stage builds, you can split your build process into two (or more) separate images:
- A “build image” into which you install all the packages and compilers, and do the compilation (disk space is not a concern here)
- A “run image” into which you only copy your application code, as well as other (compiled) libraries, which you copy from the build image into the run image
To learn more about multi-stage builds, refer to the corresponding section in my Optimize Docker image build speed in CI article.
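The split into a build image and a run image could look roughly like the following sketch for a Python application. The file names (requirements.txt, src/main.py), the virtual environment path /opt/venv and the image tags are assumptions chosen for illustration, not something prescribed by Docker or by this article.

```dockerfile
# Build image: full Python image with compilers, size is not a concern here
FROM python:3.12 AS build
WORKDIR /app
# requirements.txt is a hypothetical dependency list of your application
COPY requirements.txt .
# Install all dependencies into an isolated virtual environment
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Run image: slim variant without compilers or build tools
FROM python:3.12-slim
WORKDIR /app
# Copy only the installed (and possibly compiled) dependencies from the build image
COPY --from=build /opt/venv /opt/venv
# Copy the application code (hypothetical src/ folder)
COPY src/ ./src/
# Put the virtual environment first on the PATH
ENV PATH="/opt/venv/bin:$PATH"
CMD ["python", "src/main.py"]
```

Both stages use the same Python version on purpose, so that the copied virtual environment still points to a matching interpreter in the run image.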
Consolidate RUN commands
Whenever you only need certain files temporarily (that is, you download/install them, use them, then delete them again), you should build a single RUN command that performs all these steps, instead of having multiple RUN commands.
The following is an inefficient example, where you temporarily download source code just to compile it, and finally delete the source code again (because you don’t need it to run your application):
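    FROM debian:latest
    WORKDIR /app
    # Illustrative only: assumes git and make are available in the image
    RUN git clone https://some.project.git project
    # A "cd" does not carry over to the next RUN, so it is chained with make
    RUN cd project && make
    RUN mv project/binary /usr/bin/
    RUN rm -rf project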
This is inefficient, because each RUN command adds a new layer to the image, which contains the changed files and folders. These layers are “additive”, using an overlay-type file system. Deleting files in a later layer only marks these files as deleted; no disk space is reclaimed!
Instead, this alternative is better:
    FROM debian:latest
    WORKDIR /app
    # Illustrative only: assumes git and make are available in the image
    RUN git clone https://some.project.git project && \
        cd project && \
        make && \
        mv ./binary /usr/bin/ && \
        cd .. && rm -rf project
In the above example, the source code does not end up in the image at all, because it was already deleted before the RUN command completed.
You can apply the same trick when you need to install packages (e.g. with apt, etc.) that you only need temporarily, e.g. to download or compile software. In a single RUN statement, use the package manager to install the packages, then use them, then uninstall them again.
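The install, use and uninstall sequence could be sketched as follows for a Debian-based image. The download URL and the target path /usr/local/bin/tool are placeholders invented for this example, so replace them with the software you actually need.

```dockerfile
FROM debian:bookworm-slim
# Everything happens in one RUN statement, so curl never ends up in a layer
RUN apt-get update && \
    # curl is only needed temporarily; ca-certificates is needed for HTTPS
    apt-get install -y --no-install-recommends ca-certificates curl && \
    # Placeholder URL: download some binary that your application needs
    curl -fsSL -o /usr/local/bin/tool https://example.com/tool && \
    chmod +x /usr/local/bin/tool && \
    # Uninstall curl (and packages that were only pulled in for it) again
    apt-get purge -y --auto-remove curl && \
    rm -rf /var/lib/apt/lists/*
```

Because the purge runs in the same RUN statement as the install, the resulting layer only contains the downloaded binary and the certificates.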
Squash image layers
docker-squash is a Python-based tool (installed via pip) that squashes the last N layers of an image into a single layer. The term “squash” has the same meaning as in Git squash. Squashing is useful to reduce your Docker image size, in case there are layers in which you created many (large) files or folders, which you then deleted in a newer layer. After squashing the affected layers, the deleted files and folders are not part of the squashed layer anymore.
docker-squash saves the resulting image locally. Note: docker-squash cannot squash arbitrary layers in the stack, only the last N layers (where you can specify N, and N can also be “all” layers). In other words: in a 7 layer image, you cannot squash only layers #3 to #5.
See the docker-squash page for installation and usage instructions. To figure out which layers to squash, you can use the docker history <imagename> command (which lists the layers and their size), or look at the efficiency score of each layer using dive, which is a tool to explore Docker images, layer by layer.
Save space when installing dependencies
It is common to use package managers to install third-party software components that your app depends on. Examples of package managers are yum (for Linux packages), or programming-language-specific ones, such as pip for Python.
There are two approaches to save space with package managers:
- Instruct the package manager to install as few additional dependencies as possible. Often, the repositories used by package managers have links between dependencies, of the form “A requires B to work” (strong link), or “A profits from also having B installed” (weak link). Some package managers have a switch, which you need to set explicitly, that disables the latter link type.
- Disable (or clean) the cache of the package manager. By default, package managers have a cache, for good reason: it speeds up repeated package installations. For instance, if you run “pip install requests”, the requests package will be present on your system twice (as space-inefficient copy, not as efficient symbolic link): in the final destination (e.g. the site-packages directory of Python), and in a separate pip cache folder. If you need to install the same package (requests) again, the package manager may skip downloading it from the Internet, but use the locally cached version instead. However, when building Docker images, we don’t want this cache to exist, because it unnecessarily blows up the image size.
Saving space works differently for each package manager. Here are a few examples:
- For Debian/Ubuntu: apt-get install -y --no-install-recommends <list of package names to install> && <optional: do something with packages> && apt-get clean && rm -rf /var/lib/apt/lists/*
  - This ensures that no recommended packages are installed, and that the cache is cleared at the end.
- For RPM-based systems, like RHEL: dnf -y install --setopt=install_weak_deps=False <list of package names to install> && <optional: do something with packages> && dnf clean all
  - The “dnf clean all” command ensures that all caches are deleted.
- For Python (pip): pip install --no-cache-dir <list of package names to install>
  - The --no-cache-dir argument ensures that no cache is created to begin with.
- For Node.js (npm): npm ci && npm cache clean --force
  - The “npm ci” command (docs) is more efficient and clean than “npm install”. The second command cleans out the NPM cache.
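Put into a Dockerfile, the Debian/Ubuntu and Python variants from the list above might look like this sketch. The base image tag and the installed packages (curl, requests) are arbitrary examples. The leading apt-get update is an addition that is normally required, because official base images ship without package lists.

```dockerfile
# Assumed base image: an official Python image based on Debian
FROM python:3.12-slim

# Install a system package without recommended extras, then drop the caches
# in the same RUN statement so that they never become part of a layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install a Python package without writing it to the pip cache
RUN pip install --no-cache-dir requests
```

The cleanup has to be part of the same RUN statement as the installation. A separate RUN rm -rf ... in a later layer would not reclaim any space.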
Avoid superfluous chowns
Whenever some statement in the Dockerfile modifies a file in the build container in any way (including just meta-data changes), a whole new copy of that file is stored in the new layer. This is also true for changes in file ownership or permissions. Recursive chowns can therefore result in very large images, because Docker duplicates every affected file. So instead of:
    COPY code .
    RUN chown -R youruser code
you should do:
    COPY --chown=youruser code .
This will perform the chown as part of the COPY, ensuring that only one instance of the files is created.
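For context, here is a slightly fuller sketch of the same idea, including the creation of the user. The user name appuser, the base image tag and the code/ folder are assumptions made up for this example.

```dockerfile
FROM debian:bookworm-slim
# Create an unprivileged user (hypothetical name) with a home directory
RUN useradd --create-home appuser
WORKDIR /home/appuser/app
# Copy the files and set owner and group in the same step,
# so no extra layer with duplicated files is created
COPY --chown=appuser:appuser code/ ./
# Run the container as the unprivileged user
USER appuser
```

The user must exist before the COPY statement when you refer to it by name, which is why the useradd comes first.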
Use .dockerignore files
The .dockerignore file lets you specify files & folders that the Docker build process should not copy from the host into the build context. My previous article presented the concept in detail, see here. This not only speeds up Docker image building (because the build context is populated faster), but can also make your image smaller: it keeps you from accidentally copying large files or folders into the image that you don’t need to run your application. Examples are data files (e.g. big CSV files with raw data required only for automated tests), which someone incorrectly placed in the “src” folder whose entire content is copied into the image via a COPY statement in your Dockerfile.
Use docker-slim
docker-slim is a tool that reduces your image size by starting a temporary container of your image, and figuring out (via static and dynamic analyses) which files are really used by your application in that container. docker-slim then builds a new single-layer image that contains only those files and folders that were actually used. This makes docker-slim the most effective approach of all those listed here, but it does not come without caveats!
Advantages:
- Results in very small images, which are often smaller than alpine-based images!
- High image security (distroless-like), because all other tools (e.g. the shell, curl, package manager, etc.) are removed, unless your application needs them.
- Less need for optimized Dockerfiles: you can ignore most other tricks of this article, e.g. use large base images and have many (unoptimized) RUN commands. docker-slim will figure out the rest!
Disadvantages:
- docker-slim may throw away too many files, e.g. because your application uses lazy loading/calling. This can make the resulting Docker image unusable in production. Even worse, these errors might only show up after the slimmed-down image has been in use for a while, because the missing files are only needed in certain (sometimes hard-to-replicate) edge cases. For instance, if you developed a multi-language application (and most users use English by default), docker-slim might discard translation files, which is only noticed by a small set of your users (using those languages) after some time.
To handle this disadvantage, you need to tweak docker-slim via two mechanisms:
- Explicit “preserve path”-lists, which contain paths to files and folders that docker-slim should definitely keep (even though they were not used during the tests in the temporary container)
- Dynamic probes, which are HTTP requests or shell command calls that docker-slim makes to your application, to force it to (lazy-) load dependencies, e.g. native libraries
The official manual provides further details about how to install and use docker-slim. Here you can find a GitLab-based demo repository where I build a non-optimized Docker image first, then slim it down with docker-slim, perform a smoke test on the slim image, and finally compare the slim and non-slim images. The pipeline includes scripts that generate a list of files that docker-slim removed from the slim image, which you should inspect manually. If you spot any files that your application might need, add them to the preserved-paths-list.
Conclusion
Unfortunately, there is no “silver bullet” for getting small Docker images. Either you do not use the “catch-all” tool docker-slim, but then you need to spend time implementing the other tips listed in this article. Or you do use docker-slim, but then you need to invest time to write automated system-level tests (which identify problems in the slimmed-down container) and tweak the docker-slim probes or “preserve path”-lists.