What is Docker?
Docker is all about making it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package. By doing so, thanks to the container, the developer can rest assured that the application will run on any other Linux machine regardless of any customized settings that machine might have that could differ from the machine used for writing and testing the code.
In a way, Docker is a bit like a virtual machine. But unlike a virtual machine, rather than creating a whole virtual operating system, Docker allows applications to use the same Linux kernel as the system that they're running on and only requires applications be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.
And importantly, Docker is open source. This means that anyone can contribute to Docker and extend it to meet their own needs if they need additional features that aren't available out of the box.
Who is Docker for?
Docker is a tool that is designed to benefit both developers and system administrators, making it a part of many DevOps (developers + operations) toolchains. For developers, it means that they can focus on writing code without worrying about the system that it will ultimately be running on. It also allows them to get a head start by using one of thousands of programs already designed to run in a Docker container as a part of their application. For operations staff, Docker gives flexibility and potentially reduces the number of systems needed because of its small footprint and lower overhead.
What can Docker can do for you?
Docker solves many of the same problem that a VM solves, plus some other that VMs could solve if they didn’t were so resource intensive. Here are some of the things that Docker can deal with:
- Isolating an application dependencies
- Creating an application image and replicating it
- Creating ready to start applications that are easily distributable
- Allowing easy and fast scalation of instances
- Testing out applications and disposing them afterwards
The idea behind Docker is to create portable lightweight containers for software applicationsthat can be run on any machine with Docker installed, regardless of the underlying OS, akin to the cargo containers used on ships. Pretty ambitious, and they’re succeeding.
What does Docker do exactly?
In this section I will not be explaining what technologies Docker uses to do what it does, or what specific commands are available, that’s on the last section, here I’ll explain the resources and abstractions that Docker offers.
The two most important entities in Docker are images and containers. Aside from those, linksand volumes are also important. Let’s start with images.
Images
Images on Docker are like the snapshot of a virtual machine, but way more lightweight, way way way more lightweight (more on the next section).
There are several ways to create an image on Docker, most of them rely on creating an new image based on an already existing image, and since there are public images to pretty much everything you need, including for all the major linux distributions, it’s not likely that you will not find one that suit your needs. If you however feel the need to build and image from scratch, there are ways.
To create an image you take one image and modify it to create a child image. This can be done either through a file that specifies a base image and the modifications that are to be done, or live by “running” an image, modifying it and committing it. There are advantages to each method, but generally you’ll want to use a file to specify the changes.
Images have an unique ID, and an unique human-readable name and tag pair. Images can be called, for example, ubuntu:latest, ubuntu:precise, django:1.6, django:1.7, etc.
Containers
Now onto containers. From images you can create containers, this is the equivalent of creating a VM from a snapshot, but way more lightweight. Containers are the ones that run stuff.
Let use an example, you could download an image of ubuntu (there is a public repository of images called the docker registry), modify it by installing Gunicorn and your Django app with all its dependencies, and then create a container from that image that runs your app when it starts.
Containers, like VMs, are isolated (with one little caveat that I’ll discuss later). They also have an unique ID and a unique human-readable name. It’s necessary for containers to expose services, so Docker allows you to expose specific ports of a container.
Containers have one big difference that separate them from VMs, they are designed to run a single process, they don’t simulate well a complete environment (if that’s what you need check out LXC). You may be tempted to run a runit or supervisord instance and get several processes up,but it’s really not necessary (in my humble opinion).
The whole single process vs multiple processes is somewhat of an outstanding debate. You should know that the Docker designers heavily promote the "one process per container approach", and that the only case where you really have no other option but to run more than one process is to run something like ssh, to access the container while it is running for debugging purposes, however the command docker exec solves that problem.
Containers are desing thus, to run an application, not a machine. You may use containers as VMs, but as we will see, you would lose a lot of flexibility, since Docker gives tools to separate your application from your data, allowing you to update your running code/systems rapidly and easily without putting at risk your data.
Volumes
Volumes are how you persist data beyond the lifespan of your container. They are spaces inside the container that store data outside of it, allowing you to destroy/rebuild/change/exile-to-oblivion your container, keeping your data intact. Docker allows you to define what parts are yourapplication and what parts are your data, and gives you the tools to keep themseparated. One of the biggest changes in mindset that one must make when working with Docker is that containers should be ephemeral and disposable.
Volumes are specific to each container, you can create several containers from a single image and define different volumes for each. Volumes are stored in the filesystem of the host running Docker. Whatever is not a volume is stored in other type of filesystem, but more on that later.
Volumes can also be used to share data between containers, I recommend reading the volumes documentation for details.
Links
Links are another very important part of Docker.
Whenever a container is started, a random private IP is assigned to it, other containers can use this IP address to communicate with it. This is important for 2 reasons: first it provides a way for containers to talk to each other, second containers share a local network; I had a problem once when I started two elasticsearch containers for two clients on the same machine, but left the cluster name to the default setting, the two elasticsearch servers promptly made an unsolicited cluster
To allow intercontainer communication Docker allows you to reference other existing containers when spinning up a new one, those referenced containers receive an alias (that you specify) inside the container you just created. We say that the two containers are linked.
So if my DB container is already running, I can create my webserver container and reference the DB container upon creation, giving it an alias, dbapp for example. When inside my newly created webserver container I can use the hostname dbapp to communicate with my DB container at any time.
Docker takes it one step further, requiring you to state which ports a container will advertise to other containers when it is linked. When new containers link to a container that advertises ports, Docker will generate environment variables inside the new containers with information about the exposed ports.
Portability of Docker images
There is one caveat when creating images. Docker allows you specify volumes and ports in an image. Containers created from that image inherit those settings. However, Docker doesn’t allow you to specify anything on an image that is not portable.
You can define exposed ports, but only those ports that are exposed to other containers when links are created, you can’t specify ports exposed to the host, since you don't know wich ports will be available on the hosts that might use that image.
You can’t define links on an image either. Making a link requires you to reference another container by name, and you can't know beforehand how will the containers be named on every host that might use the image.
There are some other things that you can't do that are out of the scope of this post, but the point is that images must be completely portable, Docker doesn’t allow otherwise.
So those are the primary moving parts, you create images, use those to create containers that expose ports and have volumes if needed, and connect several containers together with links. How can this all work with little to no overhead?
How does Docker do what it needs to be done?
Two words: cgroups and union filesystems. Docker uses cgroups to provide container isolation, and union filesystem to store the images and make containers ephemeral.
Cgroups
This is a Linux kernel feature that makes two things possible:
- Limit resource utilization (RAM, CPU) for Linux process groups
- Make PID, UTS, IPC, Network, User and mount namespaces for process groups
The keyword here is namespace. A PID namespace, for example, permits processes in it to use PIDs isolated and independent of the main PID namespace, so you could have your own init process with a PID of 1 within a PID namespace. Analogous for all the other namespaces. You can then use cgroups to create an environment where processes can be executed isolated from the rest of your OS, but the key here is that the processes on this environment use your already loaded and running kernel, so the overhead is pretty much the same as running another process. Chroot is to cgroups what I am to The Hulk, Bane and Venom combined.
Union filesystems
An union filesystem allows a layered accumulation of changes through an union mount. In an union filesystem several filesystems can be mounted on top of each other, the result is a layered collection of changes. Each filesystem mounted represents a collection of changes to the previous filesystem, like a diff.
When you download an image, modify it, and store your new version, you’ve just made a new union filesystem to be mounted on top of the initial layers that conformed your base image. This makes Docker images very light, for example: your DB, Nginx and Syslog images can all share the same Ubuntu base, each one storing only the changes from this base that they need to function.
Tiny and small step-by-step
So, step 1. Install Docker.
The Docker cmd utilities need root permissions to work. You may include your user in the docker group to avoid having to sudo everything.
Step two, lets download an image from the public registry using the following command:
$> docker pull ubuntu:latest
ubuntu:latest: The image you are pulling has been verified
3b363fd9d7da: Pull complete
.....<bunch of downloading-stuff output>.....
8eaa4ff06b53: Pull complete
Status: Downloaded newer image for ubuntu:latest
$>
There are images for pretty much everything you may need on this public registry: Ubuntu, Fedora, Postgresql, MySQL, Jenkins, Elasticsearch, Redis, etc. The Docker developers maintain several images in the public registry, but the bulk of what you can pull from it come from users that publish their own creations.
There may be come a time when you need/want a private registry (for containers for developing apps and such), you should read this first. Now, there are ways to setup your own private registry.You could also just pay for one.
Step three, list your images:
$> docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
ubuntu latest 8eaa4ff06b53 4 days ago 192.7 MB
Step four, create a container from that image.
$> docker run --rm -ti ubuntu /bin/bash
root@4638a40c2fbb:/# ls
bin boot dev etc home lib lib64 media mnt opt proc root......
root@4638a40c2fbb:/# exit
Quick rundown of what you did on that last command:
- --rm: tells Docker to remove the container as soon as the process is running exits. Good for making tests and avoiding clutter
- -ti: tell Docker to allocate a pseudo tty and put me on interactive mode. This is for entering the container and is good for rapid prototyping and playing around, but for production containers you will not be turning these flags on
- ubuntu: this is the image we're basing the container on
- /bin/bash: the command to run, and since we started on interactive mode, it gives us a prompt to the container
On the run command you specify your links, volumes, ports, name of the container (Docker assings default name if you do not provide one) etc.
Now let's run a container on the background:
$> docker run -d ubuntu ping 8.8.8.8
31c68e9c09a0d632caae40debe13da3d6e612364198e2ef21f842762df4f987f
$>
The output is the assigned ID, yours will vary as it is random. Let's check out what our container is up to:
$> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
31c68e9c09a0 ubuntu:latest "ping 8.8.8.8" 2 minutes ago Up 2 minutes loving_mcclintock
There he is, his automated assigned human-readable name is loving_mcclintock. Now lets check inside the container to see what's happening:
$> docker exec -ti loving_mcclintock /bin/bash
root@31c68e9c09a0:/# ps -aux|grep ping
root 1 0.0 0.0 6504 636 ? Ss 20:46 0:00 ping 8.8.8.8
root@31c68e9c09a0:/# exit
What we just did is to execute a program inside the container, in this case the program was /bin/bash. The flags -ti serves the same purpose as in docker run, so it just placed us inside a shell in the container.