Backup Docker volumes (and restore them) – done right

This article explains how to use “tar” correctly to back up Docker volumes and restore them. I explain why two top-ranked tutorials do a poor job, by taking them apart. Finally, I give hints for creating backups of Docker volumes in production.

Introduction

Making a local backup of a Docker volume and restoring it is a common task if you operate container-based software with Docker engine directly (not using Kubernetes). Unfortunately, most search results you find on the internet for a query such as “Docker volume backup” yield incorrect and dangerous tutorials, whose makers have either not understood how the underlying tools work, or have not bothered explaining some obscure command line parameters which you need to adapt. In this piece, I will explain why these tutorials are wrong, and how to back up and restore Docker volumes correctly.

How to do it wrong – official Docker tutorial

Case study #1 is the official Docker tutorial. Following their commands to the letter does work, but adapting them to your (real-world) use case, e.g. a MySQL container, will fail.

At the time of writing this article, the tutorial does this:

  • Create a new container named dbstore that creates and mounts an anonymous volume (if you do not know what this means, look for “anonymous” here) to /dbdata in the container. The ultimate goal is to back up and restore the content of that anonymous volume. The full command in the tutorial is:
    docker run -v /dbdata --name dbstore ubuntu /bin/bash
  • Create a temporary container using the ubuntu:latest image, used to back up the volume. That temporary container is given two volume mounts: the one from the dbstore container (using --volumes-from dbstore) and a bind mount that maps the working directory of the host to /backup in the container. In the container, they run “tar cvf /backup/backup.tar /dbdata“. The full command is:
    docker run --rm --volumes-from dbstore -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
  • To restore the backup, they run another temporary ubuntu container with the same kind of volume mounts as the previous temporary ubuntu container (this time taking the volume from a second container, dbstore2). Inside the container, they run multiple commands, so they have to change the start command (CMD) to bash -c "..." to make this work. The full command is:
    docker run --rm --volumes-from dbstore2 -v $(pwd):/backup ubuntu bash -c "cd /dbdata && tar xvf /backup/backup.tar --strip 1"

Let’s analyze the reasons why following that tutorial is not a good idea:

  1. The tutorial assumes that your volume is mounted to some path on the root level in the container, e.g. /dbdata in the tutorial. In practice, most volumes are mounted into a deeply nested path, e.g. /var/lib/mysql for a MySQL container. So if you were to follow the Docker tutorial and simply replaced /dbdata with /var/lib/mysql, the directory structure in the created backup archive starts with the relative path “var“. The reason is that tar stores the paths as you pass them on the command line and only removes the leading “/” from absolute paths, so /var/lib/mysql is stored as var/lib/mysql. When restoring the backup, the tutorial uses --strip 1 which tells tar to remove only the very first segment of the paths in the tar file. For the scenario of the tutorial this works fine: dbdata is stripped, and cd /dbdata made sure the working directory (the extraction destination) is set correctly. But for a MySQL backup, only “var” is stripped, so tar extracts a “lib” folder into /var/lib/mysql, and you end up with your backup data being restored to /var/lib/mysql/lib/mysql/… which won’t work. Since tar does not throw any errors, it looks as if the restore process had worked, even though it did not.
  2. The tutorial glosses over the fact that you should first delete the existing data prior to restoring a backup. Otherwise, the restore process will only overwrite existing data, but leave other (obsolete) data in place. While this is not an issue if the target volume has just been created (and is therefore empty), the situation is different if the target volume already contained data, and the goal of the restore operation is to revert the volume to a previous state. Here, not deleting the old data first means that (after the restore process finished) your volume contains a mixture of data prior and after the point of time of restore, which can confuse the software reading the restored data.
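To make reason #1 tangible, here is a minimal sketch that reproduces the path arithmetic with plain tar in a throwaway directory, without Docker. The names fakeroot and ibdata1 are made up for illustration, and archiving the relative path var/lib/mysql via -C is my stand-in for what tar stores after dropping the leading “/” of /var/lib/mysql.

```bash
#!/usr/bin/env bash
set -eu

# Throwaway sandbox; "fakeroot" plays the role of the container's "/".
work="$(mktemp -d)"
mkdir -p "$work/fakeroot/var/lib/mysql"
echo "db file" > "$work/fakeroot/var/lib/mysql/ibdata1"

# Archive var/lib/mysql. The member names start with "var/", just like the
# names tar stores for /var/lib/mysql after removing the leading "/".
tar -cf "$work/backup.tar" -C "$work/fakeroot" var/lib/mysql
tar -tf "$work/backup.tar"

# Restore the way the Docker tutorial does: cd into the target, strip 1 segment.
wrong="$work/wrong/var/lib/mysql"
mkdir -p "$wrong"
(cd "$wrong" && tar -xf "$work/backup.tar" --strip-components=1)

# Restore with one stripped segment per path level (var, lib, mysql).
right="$work/right/var/lib/mysql"
mkdir -p "$right"
(cd "$right" && tar -xf "$work/backup.tar" --strip-components=3)

# Show where the restored file ended up in both cases.
find "$work/wrong" "$work/right" -type f
```

In the first restore the file lands in …/var/lib/mysql/lib/mysql/ibdata1, in the second one in …/var/lib/mysql/ibdata1, and tar reports no error in either case.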

How to do it wrong – HowToGeek tutorial

This tutorial is another one that pops up at the very top of the search result page of a big search engine. It basically adapts the above tutorial of the Docker folks to a realistic scenario, where you want to back up the volume of a MySQL container. By default, the MySQL engine running in the official mysql image reads and writes data to /var/lib/mysql.

At the time of writing this article, the tutorial does this:

  • Create a new container named mysql with a volume named mysql_data bound to /var/lib/mysql. The full command of the tutorial is:
    docker run -d --name mysql -v mysql_data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=mysql mysql:8
  • To create the backup: see step 2 of the other Docker tutorial above: this tutorial does exactly the same, just using different mount paths. Full command:
    docker run --rm --volumes-from mysql -v $PWD:/backup-dir ubuntu tar cvf /backup-dir/mysql-backup.tar /var/lib/mysql
  • To restore the backup: see step 3 above. Full command:
    docker run --rm --volumes-from mysql -v $PWD:/backup-dir bash -c "cd /var/lib/mysql && tar xvf /backup-dir/mysql-backup.tar"

The HowToGeek tutorial has two problems:

  1. Following the tutorial to the letter does not work. The underlying reason is similar to reason #1 from above: the directory structure of the produced mysql-backup.tar file starts with “var“. In the restore process, the author does not use the --strip argument for tar at all, which means that the resulting restored directory structure of your backup ends up in /var/lib/mysql/var/lib/mysql where it won’t do any good.
  2. Same as reason #2 from above.
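If you archive a nested mount path and extract from inside that same path, the number of segments to strip equals the number of segments in the path. The following sketch computes that number. The function name strip_count is hypothetical (it is not part of tar or Docker), and it assumes a normalized absolute path without doubled slashes.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: count the segments of an absolute mount path.
# The result is the value for tar's --strip / --strip-components option when
# the archive was created from that path and is extracted inside that path.
strip_count() {
  local path="${1#/}"   # drop the leading slash, as tar does for member names
  path="${path%/}"      # tolerate one trailing slash
  local IFS='/'
  local -a parts
  read -r -a parts <<< "$path"
  echo "${#parts[@]}"
}

strip_count /dbdata          # prints 1
strip_count /var/lib/mysql   # prints 3
strip_count /var/www/html/   # prints 3
```

This is why --strip 1 happens to work for /dbdata, while a backup taken from /var/lib/mysql needs a value of 3.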

Also, both tutorials decided that wasting disk space is somehow a good idea, by using tar without any compression. No further questions, your honor.

How to correctly back up Docker volumes and restore them

In short, I recommend the following commands:

# Define the name of your volume, on macOS/Linux
VOLUME="replace with the name of your volume"
# Define the name of your volume, on Windows (PowerShell)
$VOLUME="replace with the name of your volume"

# Backup:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu tar cvzf /backup-dir/backup.tar.gz /data
# Restore:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu bash -c "rm -rf /data/{*,.*}; cd /data && tar xvzf /backup-dir/backup.tar.gz --strip 1"

Note: do not be alarmed by the first two lines of the output of the restore command:

rm: refusing to remove '.' or '..' directory: skipping '/data/.'
rm: refusing to remove '.' or '..' directory: skipping '/data/..'

This output is generated by rm -rf /data/{*,.*} which deletes all files in the volume, including files starting with a dot. It is basically a shorthand for running two commands: rm -rf /data/* (which deletes all files and folders except for those starting with a dot) and rm -rf /data/.* (only deletes files starting with a dot). Unfortunately, this means that the rm command also tries to delete the (pseudo) files named “.” and “..” (the first two lines you see whenever you run ls -la), which fails, because they are not “real” files and thus cannot be deleted.
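If the two warning lines bother you, one possible alternative is to let find do the deletion. This is only a sketch, run against a throwaway directory instead of a real volume, and it assumes that find (with -mindepth and -delete) is available in the image you use.

```bash
#!/usr/bin/env bash
set -eu

# Throwaway directory standing in for the mounted volume (/data).
data="$(mktemp -d)"
mkdir -p "$data/subdir"
touch "$data/visible.txt" "$data/.hidden" "$data/subdir/nested.txt"

# Delete everything below $data, dotfiles included, but keep $data itself.
# -mindepth 1 excludes the starting directory, so "." and ".." never come up.
find "$data" -mindepth 1 -delete

# List what is left (including dotfiles); the directory should now be empty.
ls -A "$data"
```

Inside the restore container this would correspond to find /data -mindepth 1 -delete in place of the rm call.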

Here are a few pointers regarding my solution, and why this works better than the discussed tutorials:

  • We explicitly encode our knowledge about the volume to be backed up or restored into the backup & restore commands (particularly: the target path the volume is mounted to in the container). This is safer than relying on --volumes-from which hides the actual volumes (and their target paths) from you.
  • We add the z flag to tar, to compress the archive with gzip, which avoids wasting space.
  • We first delete the existing data in the volume, prior to restoring the tar.gz archive. Since the rm command exits with code 1 (because it cannot delete the pseudo files), we need to use a semicolon instead of && between the rm and the tar xvzf command, because && would have stopped executing after the first failing command.
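The effect of deleting the old data first can be reproduced without Docker. The following sketch uses a throwaway directory in place of the volume, with made-up file names (table_a, table_b). Whether rm actually returns a non-zero exit code here depends on the shell and rm version in use, which is why the sketch does not rely on it.

```bash
#!/usr/bin/env bash
set -eu

work="$(mktemp -d)"
mkdir -p "$work/data"

# State at backup time: a single file.
echo "v1" > "$work/data/table_a"
tar -czf "$work/backup.tar.gz" -C "$work" data

# Later the "volume" changes: table_a is modified, table_b is new.
echo "v2" > "$work/data/table_a"
echo "new" > "$work/data/table_b"

# Restore WITHOUT deleting first: table_a is reverted, but table_b survives.
tar -xzf "$work/backup.tar.gz" -C "$work"
naive_listing="$(ls "$work/data")"
echo "$naive_listing"

# Restore WITH deleting first, chained with ";" as in the restore command,
# so that tar still runs even if rm reports a non-zero exit status.
bash -c "rm -rf '$work'/data/{*,.*}; tar -xzf '$work/backup.tar.gz' -C '$work'"
ls "$work/data"
```

After the first restore the directory contains both table_a and table_b, a mixture of old and new state. After the second restore only table_a (with its backed-up content) remains.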

I leave it as an exercise to the reader whether the following restore command would also have worked:

docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu bash -c "rm -rf /data/{*,.*}; tar xvzf /backup-dir/backup.tar.gz"

Making backups in production

The above scenario is actually just a “toy example”, for a “one-off” backup. Running the above command regularly (e.g. in a cron job) would produce a full backup each time, which would consume too much space over time – even when using compression. There are better ways to do regular backups, using incremental and differential storage techniques. Also, you should store the backup in a remote location, not on the same file system where your data itself is stored.

Fortunately, there are dedicated tools which support these kinds of compressed, incremental remote backups. Battle-tested tools include:

While there are also higher-level tools that manage these tools, e.g. borgmatic or autorestic, I recommend that you keep it simple and rather use your chosen backup tool directly, learning about its intricacies. High-level tools add even more complexity, of which there is already plenty!

Conclusion

Although the task of making a backup of a Docker volume (or restoring it) just using tar seems simple, this is a great example of hidden complexity, where the devil is in the details. It is a signal to all DevOps folks out there: blindly following tutorials does not always work – you still need to understand the details.

Whenever possible, I live by the motto “if I don’t know why something works, I won’t know how to fix it once it fails”. Therefore, investing time into studying the tools further is time well spent.

This scenario also demonstrated that just because you do not see an error message, it does not mean that the executed command worked correctly.
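One way to avoid trusting a silent command is to check the result explicitly. The sketch below is one possible approach, not a complete verification strategy: it extracts an archive into a scratch directory and compares it with the source using diff -r. The directory and file names are made up, and it assumes that diff is installed.

```bash
#!/usr/bin/env bash
set -eu

work="$(mktemp -d)"
mkdir -p "$work/data/sub" "$work/verify"
echo "payload" > "$work/data/sub/file.txt"
echo "hidden"  > "$work/data/.config"

# Create the archive like the backup command does (member names start with "data/").
tar -czf "$work/backup.tar.gz" -C "$work" data

# Extract into a scratch directory and compare it with the source.
# diff -r exits non-zero if any file is missing, extra, or different.
tar -xzf "$work/backup.tar.gz" -C "$work/verify" --strip-components=1
if diff -r "$work/data" "$work/verify"; then
  verify_result="backup matches source"
else
  verify_result="backup differs from source"
fi
echo "$verify_result"
```

A wrong --strip value would show up here as “Only in …” lines from diff, instead of going unnoticed.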

Did you run into similar issues of this kind? Let me know in the comments. 

8 thoughts on “Backup Docker volumes (and restore them) – done right”

  1. hi,

    Thanks for the heads up on issues with backup and restore of Docker volumes using the command line. (Interestingly, the backup and restore extension in Docker Desktop works fine, but it is not much use on a remote server.)

    I was still finding problems restoring the mysql data volume from the backup until I changed the strip value to 3 from 1 in your script — as it was prepending /www/html. Have I missed something…

    Here is my bash script which is basically yours with a few variable declarations added to make it more convenient.

    ##############################################################
    #!/bin/bash

    currentdir="$(basename $PWD)"
    echo $currentdir

    ## docker automatically prepends the current directory to the volume names
    WP_container=$currentdir"_wordpress_1"
    Mysql_container=$currentdir"_db_1"

    WP_rootdir=/var/www/html
    Mysql_rootdir=/var/lib/mysql

    WP_volume=$currentdir"_wp_data"
    Mysql_volume=$currentdir"_db_data"

    echo "Restoring WP Vol - $WP_volume"

    #sleep 3

    docker run --rm -v "$WP_volume:$WP_rootdir/" -v "${PWD}:/backup" ubuntu bash -c "rm -rf $WP_rootdir/{*,.*}; cd $WP_rootdir && tar xvzf /backup/$WP_volume-backup.tar.gz --strip 3"

    sleep 5

    #echo "Restoring Mysql Vol - $Mysql_volume"

    sleep 3

    docker run --rm -v "$Mysql_volume:$Mysql_rootdir/" -v "${PWD}:/backup" ubuntu bash -c "rm -rf $Mysql_rootdir/{*,.*}; cd $Mysql_rootdir && tar xvzf /backup/$Mysql_volume-backup.tar.gz --strip 3"

    • The reason why --strip 1 does not work is that it assumes that the volume inside the container is mounted to the root level (in my example it is mounted to “/data”). In your script, you mount it to a more deeply-nested folder, and for each folder level you need to increase the --strip value by one.

  2. Hi. I think you should point out in the article that your solution also suffers from the relative path issue, unless you adjust --strip as needed. But anyway, since tar uses the current working directory of the container, why not just skip the `cd` command altogether during restoring phase?

    • Yes, you are right. I removed the phrase 'You can of course exchange “/data” for any other destination in the container, e.g. “/var/lib/mysql”' from the article. Thanks.

  3. hey Marius!
    what are your thoughts on running this backup while a PostgreSQL database is running?
    should the database be stopped before running it or do you think it is ok? any risks or collateral effects?

  4. this is so useful and true, I came to similar conclusions and found the so-called instructions that are popularly indexed by SEO to be sadly wanting.

    DR is not a nice to have and making reliable backups for Docker, Swarm or any other orchestration be it K8s, K3s or equivalent has to be doable and the example instructions non opaque.

    Thanks for posting this and for putting the options ( that work ) down

