Docker data volumes are designed to solve one of the deep paradoxes of containers, which is this: For the very same reasons that containers make apps highly portable — and, by extension, create more nimble data centers — they also make it hard to store data persistently. That’s because, by design, containerized apps are ephemeral. Once you shut down a container, everything inside it disappears. That makes your data center more flexible and secure, since it lets you spin up apps rapidly based on clean images. But it also means that data stored inside your containers disappears by default.
How do you resolve this paradox? There are actually several ways. You could jerry-rig a system for loading data into a container each time it is spun up (via SSH, for example), then exporting it somehow, but that’s messy. You could also turn to traditional distributed storage systems, like NFS, which you can access directly over the network. But that won’t work well if you have a complicated (software-defined) networking situation (and you probably do in a large data center). You’d think someone would have solved the Docker container storage challenge in a more elegant way by now — and someone has! Docker data volumes provide a much cleaner, more straightforward way to provide persistent data storage for containers.
That’s what I’ll cover here. Keep reading for instructions on setting up and deploying Docker data volumes (followed by brief notes on storing data persistently directly on the host).
Creating a Docker Data Volume
To use a data volume in Docker, you first need to create a container to host the volume. This is pretty basic. Just use a command like:
docker create -v /some/directory --name mydatacontainer debian
This command tells Docker to create a new container named mydatacontainer (via the --name flag) based on the Debian Docker image. (You could use any of Docker’s other OS images here, too.) Meanwhile, the -v flag in the command above sets up a data volume mounted at the directory /some/directory inside the container.
To repeat: That means the data is stored at /some/directory inside the container called mydatacontainer — not at /some/directory on your host system.
The beauty of this, of course, is that we can now write data to /some/directory inside this container, and it will stay there for as long as mydatacontainer and its volume exist, even if the container itself is stopped.
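If you want to double-check that the volume was actually created, docker inspect will show the container’s mounts, including where Docker is keeping the data on the host behind the scenes (the container name here is just the one from the example above):

docker inspect --format '{{ json .Mounts }}' mydatacontainer

You should see a single mount whose destination is /some/directory.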
Using a Data Volume in Docker
So that’s all good and well. But how do you actually get apps to use the new data volume you created?
Pretty easily. The next and final step is just to start another container, using the --volumes-from flag to tell Docker that this new container should store data in the data volume we created in the first container.
Our command would look something like this:
docker run -it --volumes-from mydatacontainer debian
Now, any data this new debian-based container writes to /some/directory will be saved in the volume hosted by mydatacontainer.
And it will stay there even if you stop or remove that container, which makes this a persistent data storage solution. (Of course, if you remove mydatacontainer along with its volume, with docker rm -v mydatacontainer, for example, then you’ll lose the data inside, too.)
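If you’d like to see the persistence for yourself, here is a minimal round trip using throwaway containers based on the same debian image; the file name is just an example:

docker run --rm --volumes-from mydatacontainer debian sh -c 'echo hello > /some/directory/test.txt'
docker run --rm --volumes-from mydatacontainer debian cat /some/directory/test.txt

The second command prints hello even though the first container is already gone, because the file lives in the volume rather than in either container’s writable layer.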
You can have as many data volumes as you want, by the way. Just pass multiple -v flags when you create a data container, or multiple --volumes-from flags when you run the container that will access the volumes.
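A sketch of what that might look like, using placeholder names and paths (multidatacontainer and anotherdatacontainer are purely hypothetical):

docker create -v /data/app -v /data/logs --name multidatacontainer debian
docker run -it --volumes-from multidatacontainer --volumes-from anotherdatacontainer debian

The first command hosts two volumes in a single data container; the second mounts the volumes from both data containers into the new app container.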
Data Storage on the Host Instead of a Container?
You may be thinking, “What if I want to store my data directly on the host instead of inside another container?”
There’s good news. You can do that, too. We won’t use data storage volumes for this, though. Instead, we’ll run a command like:
docker run -v /host/dir:/container/dir -it image
This starts a new container based on the image image and maps the directory /host/dir on the host system to the directory /container/dir inside the container. That means that any data that is written by the container to /container/dir will also appear inside /host/dir on the host, and vice versa.
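Here is a quick way to watch that mapping in action, using a throwaway host directory and the debian image purely as an example:

mkdir -p /tmp/host-dir
docker run --rm -v /tmp/host-dir:/container/dir debian sh -c 'echo hello > /container/dir/hello.txt'
cat /tmp/host-dir/hello.txt

The file written to /container/dir/hello.txt inside the container shows up at /tmp/host-dir/hello.txt on the host.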
There you have it. You can now have your container data and eat it, too. Or something like that.
About the Author
Hemant Jain is the founder and owner of Rapidera Technologies, a full-service software development shop. He and his team focus a lot on modern software delivery techniques and tools. Prior to Rapidera, he managed large-scale enterprise development projects at Autodesk and Deloitte.
Managing Container Data Using Docker Data Volumes is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.