This article explains how to use “tar” correctly to back up Docker volumes and restore them. I explain why two top-ranked tutorials do a poor job, by taking them apart. Finally, I give hints for creating backups of Docker volumes in production.
Introduction
Making a local backup of a Docker volume and restoring it is a common task if you operate container-based software with Docker engine directly (not using Kubernetes). Unfortunately, most search results you find on the internet for a query such as “Docker volume backup” yield incorrect and dangerous tutorials, whose makers have either not understood how the underlying tools work, or have not bothered explaining some obscure command line parameters which you need to adapt. In this piece, I will explain why these tutorials are wrong, and how to actually correctly back up and restore Docker volumes.
How to do it wrong – official Docker tutorial
Case study #1 is the official Docker tutorial. Following their commands to the letter does work, but adapting it to your (real-world) use case, e.g. a MySQL container, will fail.
At the time of writing this article, the tutorial does this:
- Create a new container named dbstore that creates and mounts an anonymous volume (if you do not know what this means, look for “anonymous” here) to /dbdata in the container. The ultimate goal is to back up and restore the content of that anonymous volume. The full command in the tutorial is:
docker run -v /dbdata --name dbstore ubuntu /bin/bash
- Create a temporary container using the ubuntu:latest image, used to back up the volume. That temporary container is given two volume mounts: the one from the dbstore container (using --volumes-from dbstore) and a bind mount that maps the working directory of the host to /backup in the container. In the container, they run “tar cvf /backup/backup.tar /dbdata”. The full command is:
docker run --rm --volumes-from dbstore -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
- To restore the backup, they run another temporary ubuntu container with the same volume mounts as used by the previous temporary ubuntu container. Inside the container, they run multiple commands, so they have to change the start command (CMD) to bash -c "..." to make this work. The full command is:
docker run --rm --volumes-from dbstore2 -v $(pwd):/backup ubuntu bash -c "cd /dbdata && tar xvf /backup/backup.tar --strip 1"
Let’s analyze the reasons why following that tutorial is not a good idea:
- The tutorial assumes that your volume is mounted to some path on the root level in the container, e.g. /dbdata in the tutorial. In practice, most volumes are mounted into a deeply nested path, e.g. /var/lib/mysql for a MySQL container. So if you were to follow the Docker tutorial and simply replaced /dbdata with /var/lib/mysql, the directory structure in the created backup archive starts with the relative path “var”. The reason is that tar stores each path the way it was given on the command line, only removing the leading “/” (root) from absolute paths. The archive therefore contains var/lib/mysql/…, i.e. paths relative to the root directory. When restoring the backup, the tutorial uses --strip 1 (an abbreviation of GNU tar's --strip-components=1), which tells tar to remove only the very first segment of the paths in the tar file. For the scenario of the tutorial this works fine: dbdata is stripped, and cd /dbdata made sure the working directory (the extraction destination) is set correctly. But for a MySQL backup, tar will try to extract a “lib” folder into /var/lib/mysql, so you end up with your backup data being restored to /var/lib/mysql/lib/…, which won’t work. If you adapted that tutorial to MySQL, tar would not throw any errors, so it looks as if the restore process had worked, even though it did not.
- The tutorial glosses over the fact that you should first delete the existing data prior to restoring a backup. Otherwise, the restore process will only overwrite existing data, but leave other (obsolete) data in place. While this is not an issue if the target volume has just been created (and is therefore empty), the situation is different if the target volume already contained data, and the goal of the restore operation is to revert the volume to a previous state. Here, not deleting the old data first means that (after the restore process finished) your volume contains a mixture of data from before and after the point in time at which the backup was taken, which can confuse the software reading the restored data.
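The first problem can be reproduced without Docker. The following is a minimal sketch, assuming a scratch directory created with mktemp (not part of the tutorial) that stands in for the container's root file system, and a made-up payload file named ibdata1. It archives var/lib/mysql, which is how tar stores /var/lib/mysql after dropping the leading slash, and then restores with one stripped path segment, as the tutorial does.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sandbox that stands in for the container's root directory.
root="$(mktemp -d)"
mkdir -p "$root/var/lib/mysql"
echo "payload" > "$root/var/lib/mysql/ibdata1"

# Back up the nested directory. The relative path var/lib/mysql mimics what
# tar stores for /var/lib/mysql once the leading "/" has been removed.
tar -cf "$root/backup.tar" -C "$root" var/lib/mysql

# The member names in the archive keep the full nesting.
members="$(tar -tf "$root/backup.tar")"
echo "$members"

# Restore the tutorial way: cd into the target, strip exactly one segment.
target="$root/restored/var/lib/mysql"
mkdir -p "$target"
(cd "$target" && tar -xf "$root/backup.tar" --strip-components=1)

# tar reports no error, but the file ends up below lib/mysql inside the target.
misplaced="$target/lib/mysql/ibdata1"
ls "$misplaced"
```

The restore finishes with exit code 0, yet the target directory itself contains only a lib folder instead of the database files.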
How to do it wrong – HowToGeek tutorial
This tutorial is another one that pops up at the very top of the search result page of a big search engine. It basically adapts the above tutorial of the Docker folks to a realistic scenario, where you want to back up the volume of a MySQL container. By default, the MySQL engine running in the official mysql image reads and writes data to /var/lib/mysql.
At the time of writing this article, the tutorial does this:
- Create a new container named mysql with a volume named mysql_data bound to /var/lib/mysql. The full command of the tutorial is:
docker run -d --name mysql -v mysql_data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=mysql mysql:8
- To create the backup: see step 2 of the official Docker tutorial above. This tutorial does exactly the same, just using different mount paths. Full command:
docker run --rm --volumes-from mysql -v $PWD:/backup-dir ubuntu tar cvf /backup-dir/mysql-backup.tar /var/lib/mysql
- To restore the backup: see step 3 above. Full command:
docker run --rm --volumes-from mysql -v $PWD:/backup-dir bash -c "cd /var/lib/mysql && tar xvf /backup-dir/mysql-backup.tar"
The HowToGeek tutorial has two problems:
- Following the tutorial to the letter does not work. The underlying reason is similar to reason #1 from above: the directory structure of the produced mysql-backup.tar file starts with “var”. In the restore process, the author does not use the --strip argument for tar at all, which means that the restored directory structure of your backup ends up in /var/lib/mysql/var/lib/mysql, where it won’t do any good.
- Same as reason #2 from above.
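Reason #2, which both tutorials share, can also be tried without Docker. This is a small sketch under stated assumptions: the volume directory and the file names table_a and table_b are made up, and the archive is created with relative paths (via -C) only to keep the example short.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical directory that plays the role of the mounted volume.
work="$(mktemp -d)"
volume="$work/volume"
mkdir -p "$volume"

# State at backup time: a single file.
echo "v1" > "$volume/table_a"
tar -czf "$work/backup.tar.gz" -C "$volume" .

# After the backup, the application changes one file and adds another one.
echo "v2" > "$volume/table_a"
echo "new" > "$volume/table_b"

# Restore without wiping the volume first, as both tutorials do.
tar -xzf "$work/backup.tar.gz" -C "$volume"

# table_a is reverted, but table_b (unknown at backup time) is still there.
restored="$(cat "$volume/table_a")"
leftover="$(ls "$volume" | xargs)"
echo "table_a contains: $restored"
echo "files in volume:  $leftover"
```

The volume now holds a file that never existed in the backed-up state, which is exactly the mixture the restore was supposed to avoid.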
Also, both tutorials decided that wasting disk space is somehow a good idea, by using tar without any compression. No further questions, your honor.
How to correctly back up Docker volumes and restore them
In short, I recommend the following commands:
# Define the name of your volume, on macOS/Linux
VOLUME="replace with the name of your volume"
# Define the name of your volume, on Windows (PowerShell)
$VOLUME="replace with the name of your volume"
# Backup:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu tar cvzf /backup-dir/backup.tar.gz /data
# Restore:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu bash -c "rm -rf /data/{*,.*}; cd /data && tar xvzf /backup-dir/backup.tar.gz --strip 1"
Note: do not be alarmed by the first two lines of the output of the restore command:
rm: refusing to remove '.' or '..' directory: skipping '/data/.'
rm: refusing to remove '.' or '..' directory: skipping '/data/..'
This output is generated by rm -rf /data/{*,.*}, which deletes all files in the volume, including files starting with a dot. It is basically a shorthand for running two commands: rm -rf /data/* (which deletes all files and folders except for those starting with a dot) and rm -rf /data/.* (which only deletes files and folders starting with a dot). Unfortunately, this means that the rm command also tries to delete the (pseudo) files named “.” and “..” (the first two lines you see whenever you run ls -la), which fails, because they are not “real” files and thus cannot be deleted.
Here are a few pointers regarding my solution, and why this works better than the discussed tutorials:
- We explicitly encode our knowledge about the volume to be backed up or restored into the backup & restore commands (particularly: the name of the volume and the path it is mounted to in the container). This is safer than relying on --volumes-from, which hides the actual volumes (and their target paths) from you.
- We add the z flag to tar, to compress the archive with gzip, which avoids wasting space.
- We first delete the existing data in the volume, prior to restoring the tar.gz archive. Since the rm command exits with code 1 (because it cannot delete the pseudo files), we need to use a semicolon instead of && between the rm and the tar xvzf command, because && would have stopped executing after the first failing command.
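The cleanup step can be tried outside of Docker as well. The sketch below uses a hypothetical scratch directory in place of /data and shows find with -mindepth 1 -delete as one possible alternative to the rm command (it is not the command used in this article). It removes dotfiles too, leaves the directory itself in place and exits with status 0, so it could be chained with &&.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory that stands in for /data in the container.
data="$(mktemp -d)"
mkdir -p "$data/sub/dir"
touch "$data/visible" "$data/.hidden" "$data/sub/dir/file"

# Delete everything below $data, but not $data itself.
# -mindepth 1 skips the starting directory; -delete removes depth-first.
find "$data" -mindepth 1 -delete
wipe_status=$?

# Count what is left, including dotfiles (-A lists them, minus "." and "..").
remaining="$(ls -A "$data" | wc -l | tr -d ' ')"
echo "exit status: $wipe_status, entries left: $remaining"
```

Inside the restore container this would presumably read find /data -mindepth 1 -delete && cd /data && tar xvzf /backup-dir/backup.tar.gz --strip 1. That variant is an assumption of this sketch and should be tested before relying on it.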
I leave it as an exercise to the reader whether the following restore command would also have worked:
docker run --rm -v "${VOLUME}:/data" -v "${PWD}:/backup-dir" ubuntu bash -c "rm -rf /data/{*,.*}; tar xvzf /backup-dir/backup.tar.gz"
Making backups in production
The above scenario is actually just a “toy example”, for a “one-off” backup. Running the above command regularly (e.g. in a cron job) would produce a full backup each time, which would consume too much space over time – even when using compression. There are better ways to do regular backups, using incremental and differential storage techniques. Also, you should store the backup in a remote location, not on the same file system where your data itself is stored.
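To illustrate what “incremental” means, here is a minimal sketch that assumes GNU tar (as shipped in the ubuntu image) and its --listed-incremental option. The directory names, file names and the snapshot file state.snar are made up for the example, and it is only meant to show the idea, not to replace a dedicated backup tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical directories: "src" plays the volume, "out" the backup location.
src="$(mktemp -d)"
out="$(mktemp -d)"
echo "one" > "$src/a.txt"

# Level 0 (full) backup. GNU tar records file metadata in the snapshot file.
tar --listed-incremental="$out/state.snar" -czf "$out/level0.tar.gz" -C "$src" .

# Something changes between two backup runs.
echo "two" > "$src/b.txt"

# Level 1 backup: with the same snapshot file, only changes since level 0
# are stored in the new archive.
tar --listed-incremental="$out/state.snar" -czf "$out/level1.tar.gz" -C "$src" .

# List what the second archive contains.
level1_members="$(tar -tzf "$out/level1.tar.gz")"
echo "$level1_members"
```

The second archive contains the new file but not the unchanged one, which is why a series of incremental backups needs far less space than a series of full backups.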
Fortunately, there are dedicated tools which support these kinds of compressed, incremental remote backups. Battle-tested tools include:
- Duplicity, which is also dockerized as wernight/duplicity (non-official, but well-established)
- Restic, dockerized as restic/restic (official)
- Borg, dockerized as horaceworblehat/borg-server (non-official, rather new)
While there are also higher-level tools that manage these tools, e.g. borgmatic or autorestic, I recommend that you keep it simple and rather use your chosen backup tool directly, learning about its intricacies. High-level tools add even more complexity, of which there is already plenty!
Conclusion
Although the task of making a backup of a Docker volume (or restoring it) just using tar seems simple, this is a great example of hidden complexity, where the devil is in the details. It is a signal to all DevOps folks out there: blindly following tutorials does not always work – you still need to understand the details.
Whenever possible, I live by the motto “if I don’t know why something works, I won’t know how to fix it once it fails”. Therefore, investing time into studying the tools further is time well spent.
This scenario also demonstrated that just because you do not see an error message, it does not mean that the executed command worked correctly.
Did you run into similar issues of this kind? Let me know in the comments.
hi,
Thanks for the heads up on issues with backup and restore of Docker volumes using the command line. (Interestingly the backup and restore extension in Docker Desktop works fine but not much use on remote server. )
I was still finding problems restoring the mysql data volume from the backup until I changed the strip value to 3 from 1 in your script — as it was prepending /www/html. Have I missed something…
Here is my bash script which is basically yours with a few variable declarations added to make it more convenient.
##############################################################
#!/bin/bash
currentdir="$(basename "$PWD")"
echo "$currentdir"
## docker automatically prepends the current directory to the volume names
WP_container="${currentdir}_wordpress_1"
Mysql_container="${currentdir}_db_1"
WP_rootdir=/var/www/html
Mysql_rootdir=/var/lib/mysql
WP_volume="${currentdir}_wp_data"
Mysql_volume="${currentdir}_db_data"
echo "Restoring WP Vol - $WP_volume"
#sleep 3
docker run --rm -v "$WP_volume:$WP_rootdir/" -v "${PWD}:/backup" ubuntu bash -c "rm -rf $WP_rootdir/{*,.*}; cd $WP_rootdir && tar xvzf /backup/$WP_volume-backup.tar.gz --strip 3"
sleep 5
#echo "Restoring Mysql Vol - $Mysql_volume"
sleep 3
docker run --rm -v "$Mysql_volume:$Mysql_rootdir/" -v "${PWD}:/backup" ubuntu bash -c "rm -rf $Mysql_rootdir/{*,.*}; cd $Mysql_rootdir && tar xvzf /backup/$Mysql_volume-backup.tar.gz --strip 3"
The reason why --strip 1 does not work is that it assumes that the volume is mounted to a folder on the root level inside the container (in my example it is mounted to “/data”). In your script, you mount the volumes to more deeply nested folders (/var/www/html and /var/lib/mysql), and for each additional folder level you need to increase the --strip value by one.
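The relationship between mount path and strip value can be sketched in a few lines of bash. This assumes that the backup was created by passing the absolute mount path to tar, as in the commands of this article. The variable names are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Path at which the volume was mounted when the backup was created (example value).
mount_path="/var/www/html"

# Split the path on "/" after dropping the leading slash. Each segment is one
# directory level that tar stored in front of the actual volume content.
IFS='/' read -r -a segments <<< "${mount_path#/}"
strip_count="${#segments[@]}"

echo "tar xvzf backup.tar.gz --strip-components=${strip_count}"
```

For /var/www/html and /var/lib/mysql this yields 3, for /data it yields 1.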
Hi Marius,
The mystery is resolved.
Merci mille fois
Nick
Hi. I think you should point out in the article that your solution also suffers from the relative path issue, unless you adjust --strip as needed. But anyway, since tar uses the current working directory of the container, why not just skip the `cd` command altogether during restoring phase?
Yes, you are right. I removed the phrase “You can of course exchange "/data" for any other destination in the container, e.g. "/var/lib/mysql"” from the article. Thanks.
hey Marius!
what are your thoughts on running this backup while a PostgreSQL database is running?
should the database be stopped before running it or do you think it is ok? any risks or collateral effects?
Hey, as a tip, this is a question often asked, so searching for it on the internet will quickly give you an answer. See e.g. https://stackoverflow.com/questions/3380515/can-i-just-backup-postgress-directory-while-its-running
this is so useful and true, I came to similar conclusions and found the so called instructions that are popularly indexed by SEO to be sadly wanting.
DR is not a nice to have and making reliable backups for Docker, Swarm or any other orchestration be it K8s, K3s or equivalent has to be doable and the example instructions non opaque.
Thanks for posting this and for putting the options ( that work ) down
I came up with the following useful options from tar:
- “-C” to substitute the cd commands
- “--one-file-system” to avoid backing up other mount points in the directory
- “--recursive-unlink” to avoid the rm and substitute the contents of the tar archive directly
backup:
```
docker run --rm --volumes-from open-webui -v ~/backup:/backup ubuntu tar czvf /backup/open-webui.tar.gz --one-file-system -C /app/backend/data .
```
restore:
```
docker stop open-webui || true
docker run --rm --volumes-from open-webui -v ~/backup:/backup ubuntu tar xzvf /backup/open-webui.tar.gz --recursive-unlink -C /app/backend/data
docker start open-webui
```
Thanks for mentioning this. However, careful testing needs to be done, as --recursive-unlink may not behave the way you expect it to. See also https://stackoverflow.com/questions/7933680/remove-directory-structure-when-extracting-tar-archive
That SO post you are citing is from 13 years ago and was not marked as solved. If you use the tar command with the “-C” option, it works as expected. I updated the SO post (probably getting me the gravedigger badge and a lot of anger from the mods).
I recommend you try it yourself if it works for you. Would be also happy to know if it is indeed not working as expected.
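For anyone who wants to try the “-C” part of this suggestion without Docker, here is a minimal sketch. The directory layout and file names are made up, and --recursive-unlink is deliberately left out, since its behaviour is the part under discussion.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
# Hypothetical, deeply nested "mount path" of the source volume.
source_dir="$work/app/backend/data"
# The restore target sits at a different depth on purpose.
target_dir="$work/restore"
mkdir -p "$source_dir/uploads" "$target_dir"
echo "hello" > "$source_dir/uploads/file.txt"

# -C changes into the directory first; "." archives its content with member
# names like ./uploads/file.txt, independent of the mount path.
tar -czf "$work/backup.tar.gz" -C "$source_dir" .
tar -tzf "$work/backup.tar.gz"

# Restore without cd and without --strip.
tar -xzf "$work/backup.tar.gz" -C "$target_dir"
restored_content="$(cat "$target_dir/uploads/file.txt")"
echo "$restored_content"
```

Because the archive only contains paths relative to the volume root, the restore works regardless of how deeply the volume is mounted.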