Docker Storage
By default all files created inside a container are stored on a writable container layer. This means that:
- The data doesn’t persist when that container no longer exists, and it can be difficult to get the data out of the container if another process needs it.
- A container’s writable layer is tightly coupled to the host machine where the container is running. You can’t easily move the data somewhere else.
- Writing into a container’s writable layer requires a storage driver to manage the filesystem. The storage driver provides a union filesystem, using the Linux kernel. This extra abstraction reduces performance as compared to using data volumes, which write directly to the host filesystem.
Docker has two options for containers to store files on the host machine, so that the files are persisted even after the container stops: volumes, and bind mounts.
Keep reading for more information about persisting data or taking advantage of in-memory files.
Choose the right type of mount
No matter which type of mount you choose to use, the data looks the same from within the container. It is exposed as either a directory or an individual file in the container’s filesystem.
An easy way to visualize the difference among volumes, bind mounts, and tmpfs mounts is to think about where the data lives on the Docker host.
- Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
- Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
- tmpfs mounts are stored in the host system's memory only, and are never written to the host system's filesystem.
More details about mount types
Volumes: Created and managed by Docker. You can create a volume explicitly using the docker volume create command, or Docker can create a volume during container or service creation. When you create a volume, it is stored within a directory on the Docker host. When you mount the volume into a container, this directory is what is mounted into the container. This is similar to the way that bind mounts work, except that volumes are managed by Docker and are isolated from the core functionality of the host machine.
A given volume can be mounted into multiple containers simultaneously. When no running container is using a volume, the volume is still available to Docker and is not removed automatically. You can remove unused volumes using docker volume prune. When you mount a volume, it may be named or anonymous. Anonymous volumes are not given an explicit name when they are first mounted into a container, so Docker gives them a random name that is guaranteed to be unique within a given Docker host. Besides the name, named and anonymous volumes behave in the same ways.
Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities.
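As a quick sketch of the named/anonymous distinction (the volume and container names here are just examples):
$ sudo docker volume create mydata                      # explicitly create a named volume
$ sudo docker run -d --name app1 -v mydata:/data nginx  # mount the named volume at /data
$ sudo docker run -d --name app2 -v /data nginx         # no volume name given: Docker creates an anonymous volume
$ sudo docker volume ls                                 # the anonymous volume shows up with a random 64-character name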
Bind mounts: Available since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full path on the host machine. The file or directory does not need to exist on the Docker host already. It is created on demand if it does not yet exist. Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available. If you are developing new Docker applications, consider using named volumes instead. You can’t use Docker CLI commands to directly manage bind mounts.
Bind mounts allow access to sensitive files
One side effect of using bind mounts, for better or for worse, is that you can change the host filesystem via processes running in a container, including creating, modifying, or deleting important system files or directories. This is a powerful ability which can have security implications, including impacting non-Docker processes on the host system.
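A small illustration of both the power and the usual mitigation; the host directory here is just an example, and the :ro suffix mounts it read-only so the container cannot modify it:
$ sudo docker run --rm -v /tmp/hostdir:/data ubuntu touch /data/made-by-container   # the new file appears in /tmp/hostdir on the host
$ sudo docker run --rm -v /tmp/hostdir:/data:ro ubuntu touch /data/blocked          # fails, because the mount is read-only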
Bind mounts and volumes can both be mounted into containers using the -v or --volume flag, but the syntax for each is slightly different. For tmpfs mounts, you can use the --tmpfs flag. We recommend using the --mount flag for both containers and services, for bind mounts, volumes, or tmpfs mounts, as the syntax is more clear.
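For example, the same named volume can be attached with either flag; the --mount form spells out each field explicitly (the names here are illustrative):
$ sudo docker run -d --name web-a -v myvol:/app nginx
$ sudo docker run -d --name web-b --mount type=volume,source=myvol,target=/app nginx
$ sudo docker run -d --name web-c --mount type=tmpfs,destination=/app/cache nginx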
Good use cases for volumes
Volumes are the preferred way to persist data in Docker containers and services. Some use cases for volumes include:
- Sharing data among multiple running containers. If you don't explicitly create it, a volume is created the first time it is mounted into a container. When that container stops or is removed, the volume still exists. Multiple containers can mount the same volume simultaneously, either read-write or read-only. Volumes are only removed when you explicitly remove them.
- When the Docker host is not guaranteed to have a given directory or file structure. Volumes help you decouple the configuration of the Docker host from the container runtime.
- When you want to store your container's data on a remote host or a cloud provider, rather than locally.
- When you need to back up, restore, or migrate data from one Docker host to another. You can stop containers using the volume, then back up the volume's directory (such as /var/lib/docker/volumes/<volume-name>). See the backup sketch after this list.
- When your application requires high-performance I/O on Docker Desktop. Volumes are stored in the Linux VM rather than the host, which means that the reads and writes have much lower latency and higher throughput.
- When your application requires fully native file system behavior on Docker Desktop. For example, a database engine requires precise control over disk flushing to guarantee transaction durability. Volumes are stored in the Linux VM and can make these guarantees, whereas bind mounts are remoted to macOS or Windows, where the file systems behave slightly differently.
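A common way to handle the backup case without touching /var/lib/docker directly is to mount the volume into a throwaway container and archive it. A minimal sketch, with illustrative names:
$ sudo docker run --rm -v myvol:/source:ro -v "$(pwd)":/backup ubuntu tar czf /backup/myvol-backup.tar.gz -C /source .
$ ls
myvol-backup.tar.gz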
Good use cases for bind mounts
In general, you should use volumes where possible. Bind mounts are appropriate for the following types of use case:
- Sharing configuration files from the host machine to containers. This is how Docker provides DNS resolution to containers by default, by mounting /etc/resolv.conf from the host machine into each container.
- Sharing source code or build artifacts between a development environment on the Docker host and a container. For instance, you may mount a Maven target/ directory into a container, and each time you build the Maven project on the Docker host, the container gets access to the rebuilt artifacts (see the sketch after this list). If you use Docker for development this way, your production Dockerfile would copy the production-ready artifacts directly into the image, rather than relying on a bind mount.
- When the file or directory structure of the Docker host is guaranteed to be consistent with the bind mounts the containers require.
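A minimal sketch of the build-artifact case, assuming a Maven project directory on the host and nginx serving the artifacts (the paths, port, and container name are illustrative):
$ sudo docker run -d --name devweb -p 8081:80 -v "$(pwd)/target":/usr/share/nginx/html:ro nginx
$ mvn package    # rebuild on the host; the container sees the refreshed target/ contents immediately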
Good use cases for tmpfs mounts
tmpfs mounts are best used for cases when you do not want the data to persist either on the host machine or within the container. This may be for security reasons or to protect the performance of the container when your application needs to write a large volume of non-persistent state data.
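A quick sketch (the container name and mount point are illustrative):
$ sudo docker run -d --name tmpdemo --tmpfs /app/cache nginx
$ sudo docker exec tmpdemo df -h /app/cache    # reports a tmpfs filesystem backed by host memory
The data under /app/cache disappears as soon as the container stops.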
Tips for using bind mounts or volumes
If you use either bind mounts or volumes, keep the following in mind:
If you mount an empty volume into a directory in the container in which files or directories exist, these files or directories are propagated (copied) into the volume. Similarly, if you start a container and specify a volume which does not already exist, an empty volume is created for you. This is a good way to pre-populate data that another container needs.
If you mount a bind mount or non-empty volume into a directory in the container in which some files or directories exist, these files or directories are obscured by the mount, just as if you saved files into /mnt on a Linux host and then mounted a USB drive into /mnt. The contents of /mnt would be obscured by the contents of the USB drive until the USB drive were unmounted. The obscured files are not removed or altered, but are not accessible while the bind mount or volume is mounted.
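To see the propagation rule from the first tip in action, mount an empty named volume over a directory that already has content in the image (the names here are illustrative):
$ sudo docker run -d --name ngx-prepop -v webdata:/usr/share/nginx/html nginx
$ sudo docker run --rm -v webdata:/data ubuntu ls /data    # the image's default index.html has been copied into the new volume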
Image Storage
Let's imagine we want to pull a Docker image from a registry, like so:
$ sudo docker pull nginx
When you run this command, Docker will attempt to pull the nginx image from Docker Hub, which is a bit like GitHub but for Docker images. On Docker Hub, you can see the descriptions of Docker images and take a look at their Dockerfiles, which contain the instructions that tell Docker how to build the image from source.
Once the command completes, you should have the nginx image on your local machine, being managed by your local Docker engine.
We can verify this is the case by listing the local images:
$ sudo docker images
You should see something like this:
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
nginx latest 5328fdfe9b8e 5 months ago 133.9 MB
Now, if we want to launch an nginx container, the process is very fast because we already have the nginx image stored locally.
We can launch it like so:
$ sudo docker run --name web1 -d -p 8080:80 nginx
This command maps port 80 of the container to port 8080 of the host machine. After it has run, you can connect to <host IP address>:8080 to verify that nginx responds (make sure your security group or firewall allows inbound traffic on port 8080).
But what's going on in the background, as far as this container's file system is concerned? To understand that, we need to look at the copy-on-write mechanism.
The Copy-on-Write Mechanism
When we launch an image, the Docker engine does not make a full copy of the already stored image. Instead, it uses something called the copy-on-write mechanism. This is a standard UNIX pattern that provides a single shared copy of some data, until the data is modified.
To do this, changes between the image and the running container are tracked. Just before any write operation is performed in the running container, a copy of the file that would be modified is placed on the writeable layer of the container, and that is where the write operation takes place. Hence the name, "copy-on-write".
If this wasn't happening, each time you launched an image, a full copy of the filesystem would have to be made. This would add time to the startup process and would end up using a lot of disk space.
Because of the copy-on-write mechanism, running containers can take less than 0.1 seconds to start up, and can occupy less than 1MB on disk. Compare this to Virtual Machines (VMs), which can take minutes and can occupy gigabytes of disk space, and you can see why Docker has seen such fast adoption.
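You can see this for yourself with the -s flag on docker ps:
$ sudo docker ps -s
The SIZE column reports only what each container has written to its own writable layer (typically just a few kilobytes for an idle nginx container), while the "virtual" size includes the read-only image layers shared by every container started from that image.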
But how is the copy-on-write mechanism implemented? To understand that, we need to take a look at the Union File System.
The union file system is implemented by a pluggable storage driver. Docker has the benefit of being a complete product (the "batteries included" model) while also providing pluggability in case you want to swap components out, and the storage driver is one of those pluggable pieces.
To see what storage driver your Docker engine is using, run:
$ sudo docker info
If you're using the Docker default storage driver, you should see something like this:
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version:
init version:
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.11.0-1022-aws
Operating System: Ubuntu 20.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.832GiB
Name: ip-172-31-82-169
ID: UGCB:UAS4:6JZB:AKLS:NGNE:HPT5:HBTQ:ZOH2:552N:BZCN:M3LO:B2NJ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Notice the Storage Driver: overlay2 line in this output. That means we're using the stock overlay2 driver.
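If you're curious where overlay2 keeps a container's layers, docker inspect exposes the directories it stitches together. A quick sketch, assuming the web1 container from earlier is still around:
$ sudo docker inspect --format '{{ json .GraphDriver.Data }}' web1
This prints the LowerDir (the read-only image layers), the UpperDir (the container's writable layer), and the MergedDir (the unified view the container actually sees), all living under /var/lib/docker/overlay2.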
That's all we'll say about storage drivers for now, as there's way too much to cover in this post. If you want to know more, the official docs are a good place to start.
Let's look at the way Docker works with app generated data.
Volumes
A volume is a directory mounted inside a container that exists outside of the union file system. Volumes are created via a Dockerfile or the Docker CLI tool. A volume can map to an existing directory on the host machine, or to a remote location such as an NFS share.
The directory a volume maps to exists independently from any containers that mount it. This means you can create containers, write to volumes, and then destroy the containers again, without fear of losing any app data.
Volumes are great when you need to share data (or state) between containers, by mounting the same volume in multiple containers. Though take note: it's important to implement locks or some other concurrent write access protection.
They're also great when you want to share data between containers and the host machines, for example accessing source code.
Another common use of volumes is when you're dealing with large files, such as logs or databases. That's because writing to a volume is faster than writing to the union file system, which relies on the (IO-expensive) copy-on-write mechanism.
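For instance, the official mysql image keeps its data under /var/lib/mysql, so pointing a named volume at that path keeps database writes off the union file system (the volume name, container name, and password here are placeholders):
$ sudo docker volume create dbdata
$ sudo docker run -d --name mydb -e MYSQL_ROOT_PASSWORD=secret -v dbdata:/var/lib/mysql mysql:8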
To demonstrate the power of volumes and how to use them, let's look at two scenarios.
RUNNING A CONTAINER WITH A VOLUME FLAG
Launch a container with -v, the volume flag (if the web1 container from earlier is still running, stop it first or choose a different host port, since it already occupies 8080):
$ sudo docker run -d -v /code -p 8080:80 --name mynginx nginx
This creates an anonymous volume with an automatically generated name (which we will look at shortly) on the host machine and mounts it at the /code directory in the container.
You can see the volume has been created and mounted with this command:
$ sudo docker inspect mynginx
You should see a long JSON-like output like this:
"Mounts": [
{
"Name": "12f6b6d488484c65bedcda8300166d76e6879a496ce2d0742ab23981621c8b1a",
"Source": "/var/lib/docker/volumes/12f6b6d488484c65bedcda8300166d76e6879a496ce2d0742ab23981621c8b1a/_data",
"Destination": "/code",
"Driver": "local",
"Mode": "",
"RW": true
},
],
"Image": "nginx",
"Volumes": {
"/code": {},
"/var/cache/nginx": {}
},
This output confirms the creation of the volume at the Docker engine level as well as the mapping to the container's /code directory. Also take note of /var/lib/docker/volumes/12f6[...]/_data, which is the volume's path on the host. We will use this path to access our data on the host machine.
Okay, next, grab a shell inside the container:
$ sudo docker exec -it mynginx /bin/bash
Check that the /code directory exists:
$ ls
bin boot code dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
Change to the /code directory:
$ cd code
Write something to a test file:
$ echo Hello > myfile
And exit the container:
$ exit
Cool. So we just wrote some data to a file inside the volume mounted in our container. Let's look inside that directory on the host machine we saw in the docker inspect output above to see if we can find the data we wrote.
Log in as the superuser, so you can access the files under Docker's lib directory:
$ sudo -i
Now, change to the directory listed in the previous docker inspect output:
$ cd /var/lib/docker/volumes/12f6b6d488484c65bedcda8300166d76e6879a496ce2d0742ab23981621c8b1a/_data
Check the contents of the directory:
$ ls
myfile
Bingo! That's the file we created inside the container.
You can even run cat myfile if you want to check that the contents are the same. You could also modify the contents here and then grab a shell inside the container to check that it has been updated there.
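For example, still in the volume's _data directory on the host (and still as root), something like this shows the round trip; the appended text is just an example:
$ echo "updated from the host" >> myfile
$ docker exec mynginx cat /code/myfile
Hello
updated from the host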
CREATE ENGINE LEVEL VOLUMES AND STORAGE FOR TRANSIENT CONTAINERS
Since Docker 1.9, it is possible to create and manage volumes directly, independently of any container, using the docker volume subcommands.
You can create a volume like this:
$ sudo docker volume create vol1
$ sudo docker volume ls
DRIVER    VOLUME NAME
local     vol1
$ sudo docker volume inspect vol1
[
{
"CreatedAt": "2022-02-28T05:11:07Z",
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/vol1/_data",
"Name": "vol1",
"Options": {},
"Scope": "local"
}
]
Now, let's run a little test. From this location, launch an Ubuntu container with vol1 mounted at /logs (we add --name voltest here purely so we can refer to the container in the next step):
root@dockerboss:/var/lib/docker/volumes/vol1/_data# docker run -itd -v vol1:/logs --name voltest ubuntu
What's happening here? The vol1 volume is mounted at /logs inside the container, so anything the container writes under /logs actually lands in /var/lib/docker/volumes/vol1/_data on the host. Write an example file from inside the container:
root@dockerboss:/var/lib/docker/volumes/vol1/_data# docker exec voltest bash -c 'echo test > /logs/app.log'
So, let's see if we can find that file on our host machine:
root@dockerboss:/var/lib/docker/volumes/vol1/_data# ls
app.log
You can then read this file, write to it, and so on. And everything you do will be reflected inside the container. And vice versa.
Conclusion
In this post on Docker storage, we saw:
How Docker images are stored locally by the Docker engine
How the copy-on-write mechanism and the union file system optimize storage and start up time for Docker containers
The variety of storage drivers compatible with Docker
How volumes provide shared persistent data for Docker containers
Install Docker
Purpose | Command |
---|---|
Install Docker | yum install docker -y |
Enable Docker Service | systemctl enable docker |
Start Docker Service | systemctl start docker |
Check Docker service | systemctl status docker |
Add user to group | usermod -a -G docker ec2-user |
Check Docker Version | docker version |
Check Docker Info | docker info |
Login to DockerHub | docker login |
Logout of DockerHub | docker logout |
Docker Images
Purpose | Command |
---|---|
List images | docker images |
Pull image from DockerHub | docker pull <img> |
Rename image | docker image tag <img> <new-name> |
Push images to DockerHub | docker push <img name> |
Remove image | docker rmi <img> |
Remove unused images | docker image prune -a |
Remove all images | docker rmi $(docker images -q) |
Docker Containers
Containers are isolated execution environments.
Purpose | Command |
---|---|
List containers | docker ps -a |
Create a container from an image | docker run -it -d -p 8080:80 --name service1 httpd |
Rename container | docker rename <container> <new name> |
Stop container | docker stop <container> |
Start container | docker start <container> |
Restart container | docker restart <container> |
Create a container with a link | docker run -it -d --name service2 --link service1 ubuntu |
Execute container | docker exec -it <container> bash |
Create an image from a container | docker commit <container> <new-image-name> |
Remove container | docker rm <container> |
Removes all stopped containers | docker container prune |
Kill all running containers | docker kill $(docker ps -q) |
Docker Storage
Docker has two options for containers to store files on the host machine, so that the files are persisted even after the container stops: volumes, and bind mounts.
Purpose | Command |
---|---|
Create a volume | docker volume create <vol name> |
List volumes | docker volume ls |
Inspect a volume | docker volume inspect <vol name> |
Remove a volume | docker volume rm <vol name> |
Start a container with a new volume | docker run -d --name service1 --mount source=vol1,target=/app nginx |
Start a container with an existing volume | docker run -it -d --name service2 -v /home/ec2-user/storage:/testfile ubuntu |
Remove unused volumes | docker volume prune |
Docker File
Create a Dockerfile:
vim Dockerfile
FROM ubuntu
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update -y
RUN apt install -y apache2
ADD . /var/www/html
ENTRYPOINT apachectl -D FOREGROUND
ENV name DevOps
Create an index.html file:
vim index.html
<html>
<title> Hello from CodingDojo </title>
<body> Hello world! </body>
</html>
Purpose | Command |
---|---|
Build an image from dockerfile | docker build -t dockerfile . |
Create a container form the image | docker run -d -p 8080:80 dockerfile |
To access the container, browse to http://publicIP:8080
Docker Swarm
Docker Swarm is a container orchestration tool: it lets you manage multiple containers deployed across multiple host machines. One of its key benefits is the high availability it offers for applications.
Purpose | Command |
---|---|
Initialize a swarm | docker swarm init |
Join a node to a swarm | docker swarm join --token <token> HOST:PORT |
Leaves the swarm | docker swarm leave --force |
List nodes in the swarm | docker node ls |
Promote node to manager | docker node promote <node> |
Demote node from manager | docker node demote <node> |
Creates a service | docker service create alpine ping 8.8.8.8 |
List services | docker service ls |
Lists the tasks that are running | docker service ps <service name> |
Updates a service | docker service update <service ID> --replicas 3 |
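As a quick sketch of how those service commands fit together, assuming a swarm has already been initialized on this node with docker swarm init (the service name is illustrative):
$ docker service create --name web --replicas 3 -p 8080:80 nginx
$ docker service ls                 # shows web running with 3/3 replicas
$ docker service scale web=5        # equivalent to: docker service update web --replicas 5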