These are my personal notes about Docker Swarm, derived from the official documentation. After the theory part, there is a concrete example. Enjoy!
Docker Swarm is a Docker-native clustering system. It turns a pool of Docker hosts into a single, virtual host.
Feature highlights
- Cluster management integrated with Docker Engine
- Scaling
- Desired state reconciliation
- Multi-host networking
- Load balancing
- Secure by default
- Rolling updates
Key Concepts
A swarm consists of multiple Docker hosts which run in swarm mode and act as managers and workers. A given Docker host can be a manager, a worker, or perform both roles.
When you create a service, you define its desired state (number of replicas, network and storage resources available to it, ports the service exposes to the outside world, and more). Docker works to maintain that desired state. If a worker becomes unavailable, Docker schedules that node’s tasks on other nodes. (A task is a running container which is part of a swarm service and is managed by a swarm manager.)
You can modify a service’s configuration without having to manually restart the service. Docker will update the swarm to match the new configuration.
Node
A node is an instance of the Docker Engine. You can run one or more nodes on a single physical computer (e.g. with docker-machine), but typically each node runs on a different machine.
Load balancing
The swarm manager uses ingress load balancing to expose the services you want to make available externally to the swarm. Swarm mode has an internal DNS component that automatically assigns each service in the swarm a DNS entry. The swarm manager uses internal load balancing to distribute requests among services within the cluster based upon the DNS name of the service.
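As a minimal sketch of how this works (the service and network names here are just illustrative), two services attached to the same overlay network can reach each other simply by service name, and that name resolves to a virtual IP that load-balances across the service’s tasks:
docker network create --driver overlay backend
docker service create --name db --network backend redis
docker service create --name web --network backend -p 8080:80 nginx
# from inside any "web" task, the hostname "db" resolves to the "db" service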
How It Works
Nodes
Manager nodes
Manager nodes handle cluster management tasks:
- Maintaining cluster state
- Scheduling services
- Serving swarm mode HTTP API endpoints
The internal state is maintained using the Raft consensus algorithm. If a manager fails (and there are no other managers left), the services will continue to run, but you will need to create a new cluster to recover management of the swarm.
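For completeness, if you lose all managers (or the manager quorum), a possible recovery path is to rebuild a single-manager cluster from the local state of a surviving manager node:
docker swarm init --force-new-cluster --advertise-addr <MANAGER-IP>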
Worker nodes
Worker nodes are instances of Docker Engine whose sole purpose is to execute containers. By default, all managers are also workers. To prevent a node from receiving tasks, set its --availability to Drain. The scheduler gracefully stops tasks on nodes in Drain mode and schedules them on Active nodes.
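For example, to take a node out of the scheduling pool and later put it back (node1 is just a placeholder name; run these on a manager):
docker node update --availability drain node1
docker node update --availability active node1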
Services
A service is frequently the image for a microservice within the context of some larger application (an HTTP server, a database, or any other component of a distributed system).
When you define a service, in addition to the image you can optionally specify:
- the port where the swarm will make the service available outside the swarm
- an overlay network to connect to other services in the swarm
- CPU and memory limits
- a rolling update policy
- the number of replicas of the image to run in the swarm
A service can be one of two types (see the sketch after this list):
- replicated, where you specify the number of identical tasks you want to run (web services…)
- global, which runs one task on every node (monitoring agents, antivirus…)
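As a quick illustration (the image names are placeholders; a global service would normally run something like a monitoring agent):
# replicated: exactly 3 identical tasks, scheduled wherever there is room
docker service create --name web --replicas 3 nginx
# global: one task on every node of the swarm
docker service create --name agent --mode global nginx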
Security
The swarm mode public key infrastructure (PKI) system built into Docker makes it simple to securely deploy a container orchestration system. Nodes use TLS to authenticate, authorize and encrypt their communication with other nodes. When you initialize a new swarm, the first manager generates a new root Certificate Authority, which is used to secure communication with the nodes that join.
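As a side note, from a manager you can inspect and, if necessary, rotate the swarm root CA (the --rotate flag generates a new root certificate and propagates it to the nodes):
docker swarm ca            # print the current root CA certificate
docker swarm ca --rotate   # rotate the root CA and reissue node certificates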
Deploy a service
docker service create --name web nginx
This command creates a service called “web” that runs the nginx image. You can then change any of its configuration with the docker service update command, for example:
docker service update --publish-add 80 web
To remove the service, simply run docker service remove web.
There are lots of configuration options that can be set when creating a service.
Alessandros-MBP:~ alessandro$ docker service create --help
Usage: docker service create [OPTIONS] IMAGE [COMMAND] [ARG...]
Create a new service
Options:
--config config Specify configurations to expose to the service
--constraint list Placement constraints
--container-label list Container labels
--credential-spec credential-spec Credential spec for managed service account (Windows only)
-d, --detach Exit immediately instead of waiting for the service to converge (default true)
--dns list Set custom DNS servers
--dns-option list Set DNS options
--dns-search list Set custom DNS search domains
--endpoint-mode string Endpoint mode (vip or dnsrr) (default "vip")
--entrypoint command Overwrite the default ENTRYPOINT of the image
-e, --env list Set environment variables
--env-file list Read in a file of environment variables
--group list Set one or more supplementary user groups for the container
--health-cmd string Command to run to check health
--health-interval duration Time between running the check (ms|s|m|h)
--health-retries int Consecutive failures needed to report unhealthy
--health-start-period duration Start period for the container to initialize before counting retries towards unstable (ms|s|m|h)
--health-timeout duration Maximum time to allow one check to run (ms|s|m|h)
--help Print usage
--host list Set one or more custom host-to-IP mappings (host:ip)
--hostname string Container hostname
-l, --label list Service labels
--limit-cpu decimal Limit CPUs
--limit-memory bytes Limit Memory
--log-driver string Logging driver for service
--log-opt list Logging driver options
--mode string Service mode (replicated or global) (default "replicated")
--mount mount Attach a filesystem mount to the service
--name string Service name
--network network Network attachments
--no-healthcheck Disable any container-specified HEALTHCHECK
--no-resolve-image Do not query the registry to resolve image digest and supported platforms
--placement-pref pref Add a placement preference
-p, --publish port Publish a port as a node port
-q, --quiet Suppress progress output
--read-only Mount the container's root filesystem as read only
--replicas uint Number of tasks
--reserve-cpu decimal Reserve CPUs
--reserve-memory bytes Reserve Memory
--restart-condition string Restart when condition is met ("none"|"on-failure"|"any") (default "any")
--restart-delay duration Delay between restart attempts (ns|us|ms|s|m|h) (default 5s)
--restart-max-attempts uint Maximum number of restarts before giving up
--restart-window duration Window used to evaluate the restart policy (ns|us|ms|s|m|h)
--rollback-delay duration Delay between task rollbacks (ns|us|ms|s|m|h) (default 0s)
--rollback-failure-action string Action on rollback failure ("pause"|"continue") (default "pause")
--rollback-max-failure-ratio float Failure rate to tolerate during a rollback (default 0)
--rollback-monitor duration Duration after each task rollback to monitor for failure (ns|us|ms|s|m|h) (default 5s)
--rollback-order string Rollback order ("start-first"|"stop-first") (default "stop-first")
--rollback-parallelism uint Maximum number of tasks rolled back simultaneously (0 to roll back all at once) (default 1)
--secret secret Specify secrets to expose to the service
--stop-grace-period duration Time to wait before force killing a container (ns|us|ms|s|m|h) (default 10s)
--stop-signal string Signal to stop the container
-t, --tty Allocate a pseudo-TTY
--update-delay duration Delay between updates (ns|us|ms|s|m|h) (default 0s)
--update-failure-action string Action on update failure ("pause"|"continue"|"rollback") (default "pause")
--update-max-failure-ratio float Failure rate to tolerate during an update (default 0)
--update-monitor duration Duration after each task update to monitor for failure (ns|us|ms|s|m|h) (default 5s)
--update-order string Update order ("start-first"|"stop-first") (default "stop-first")
--update-parallelism uint Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
-u, --user string Username or UID (format: <name|uid>[:<group|gid>])
--with-registry-auth Send registry authentication details to swarm agents
-w, --workdir string Working directory inside the container
Update
As seen above, you can specify how a service is updated. The --update-delay flag configures the delay between updates to a service task or set of tasks (and --update-parallelism controls how many tasks are updated simultaneously). If a task update succeeds, the scheduler moves on to the next one; if the update fails, a configurable action is taken (pause|continue|rollback).
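A minimal sketch of a rolling update policy, using placeholder names and tags, could look like this:
docker service create --name web --replicas 6 \
  --update-parallelism 2 --update-delay 10s \
  --update-failure-action rollback \
  -p 8080:80 httpd
# later, roll out a new image two tasks at a time, 10 seconds apart,
# rolling back automatically if the update fails
docker service update --image <NEW-IMAGE> web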
Volumes
You can create two types of mounts for services in a swarm: volume mounts or bind mounts. You configure them with the --mount flag when creating a service, or with --mount-add|--mount-rm when updating one.
Data Volumes
A data volume is storage that remains available after the container for a task has been removed. The preferred method is to use an existing volume:
$ docker service create \
--mount src=<VOLUME-NAME>,dst=<CONTAINER-PATH> \
--name myservice \
<IMAGE>
You can also create a volume at deployment time, just before starting the container.
--mount type=volume,src=<VOLUME-NAME>,dst=<CONTAINER-PATH>,volume-driver=<DRIVER>,volume-opt=<KEY0>=<VALUE0>,volume-opt=<KEY1>=<VALUE1>
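Put together into a full command (the volume name and paths are just examples, using the default local driver), it could look like:
docker service create \
  --mount type=volume,src=webdata,dst=/usr/local/apache2/htdocs,volume-driver=local \
  --name myservice \
  httpd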
Bind Mounts
Bind mounts are file system paths from the host. Docker mounts the path into the container.
docker service create \
--mount type=bind,src=<HOST-PATH>,dst=<CONTAINER-PATH> \
--name myservice \
<IMAGE>
Some problems can occur with bind mounts:
- if you bind mount a host path into a service’s container, the path must exist on every swarm node.
- The Docker swarm mode scheduler may reschedule a service’s running containers at any time if they become unhealthy or unreachable
- Host bind mounts are completely non-portable. When you use bind mounts, there is no guarantee that your application will run the same way in development as it does in production.
Manage sensitive data with Docker secrets
A secret is a blob of data (such as a password, an SSH private key, an SSL certificate…) that should not be transmitted over a network or stored unencrypted in a Dockerfile or in your application’s source code.
When you add a secret to the swarm, Docker sends the secret to the swarm manager over a mutual TLS connection. The location of the mount point within the container defaults to /run/secrets/<secret_name> (Linux containers) or C:\ProgramData\Docker\secrets (Windows containers).
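A minimal example (the names are illustrative): create a secret from stdin and grant a service access to it; inside the containers it shows up under /run/secrets:
echo "s3cr3t" | docker secret create db_password -
docker service create --name app --secret db_password nginx
# inside a task of "app", the value is readable at /run/secrets/db_password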
Scale the service
To scale a service in the swarm, connect to a manager and then execute:
docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS>
Docker Swarm doesn’t provide any built-in tool to scale up or down according to load. To achieve this you can use a third-party tool such as Orbiter.
Let's play!!!
Let’s try Docker Swarm. We are going to create a single, ordinary container on Docker Engine running the Apache web server (the httpd Docker image) and try to saturate it with the Apache Benchmark tool. After that, we are going to create a Docker Swarm service and see whether it can withstand the same number of requests from Apache Benchmark.
Docker standard container
We deploy the web server as a single container with the following command:
docker run --rm -it --name web -p 8080:80 -v web:/usr/local/apache2/htdocs/ httpd:latest
(Some problems may occur on macOS when trying to mount a volume this way. Try using the absolute path to the web directory instead of the relative one.)
Now check that everything works by visiting http://localhost:8080 with your favorite browser. You should see something like “It works!”.
Now try to make the web server unavailable using ab (Apache Benchmark). You can run ab with… a Docker container! Let’s write:
time docker run --net=host --rm jordi/ab ab -c 10000 -n 30000 -r http://localhost:8080/
to measure the time needed to complete 30000 requests to the web server, with 10000 connections performed concurrently and without exiting on socket receive errors (the -r flag).
My output is
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 3000 requests
Completed 6000 requests
Completed 9000 requests
Completed 12000 requests
Completed 15000 requests
Completed 18000 requests
Completed 21000 requests
Completed 24000 requests
Completed 27000 requests
Completed 30000 requests
Finished 30000 requests
Server Software: Apache/2.4.29
Server Hostname: localhost
Server Port: 8080
Document Path: /
Document Length: 45 bytes
Concurrency Level: 10000
Time taken for tests: 141.045 seconds
Complete requests: 30000
Failed requests: 13593
(Connect: 0, Receive: 4530, Length: 4533, Exceptions: 4530)
Total transferred: 7373257 bytes
HTML transferred: 1148085 bytes
Requests per second: 212.70 [#/sec] (mean)
Time per request: 47015.097 [ms] (mean)
Time per request: 4.702 [ms] (mean, across all concurrent requests)
Transfer rate: 51.05 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 605 1363.4 9 15168
Processing: 0 23477 44276.7 1906 131857
Waiting: 0 4547 8138.6 892 63191
Total: 0 24081 44220.3 3070 135258
Percentage of the requests served within a certain time (ms)
50% 3070
66% 8556
75% 15994
80% 26979
90% 127318
95% 127405
98% 127626
99% 130373
100% 135258 (longest request)
real 2m 22.08s
user 0m 0.02s
sys 0m 0.00s
Now let’s build a distributed service that runs this web server with 5 tasks and see how long it takes to complete the same 30000 requests.
Docker Swarm Mode
Create 4 Nodes
To create 4 nodes on the same machine, we are going to use docker-machine with the following command:
docker-machine create --driver virtualbox worker1
Do this four times, or simply run:
for i in `seq 1 4`;
do
docker-machine create --driver virtualbox worker$i;
done
Now we have 4 workers; you can list them with docker-machine ls. If everything went correctly, your output should be similar to this:
Alessandros-MBP:web alessandro$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
worker1 - virtualbox Running tcp://192.168.99.100:2376 v17.11.0-ce
worker2 - virtualbox Running tcp://192.168.99.101:2376 v17.11.0-ce
worker3 - virtualbox Running tcp://192.168.99.102:2376 v17.11.0-ce
worker4 - virtualbox Running tcp://192.168.99.103:2376 v17.11.0-ce
Create a manager
Now that we have created 4 workers, let’s create a swarm manager.
docker-machine create manager1
and then get the manager1 IP address by typing docker-machine ls and reading the URL field for manager1.
Now connect to manager1 with
docker-machine ssh manager1
and then start a new swarm by running, inside manager1:
docker swarm init --advertise-addr <MANAGER-IP>
In my case, <MANAGER-IP> is 192.168.99.104.
Join the workers to the swarm
This initialization creates a token that you have to use to join the workers to the swarm.
docker@manager1:~$ docker swarm init --advertise-addr 192.168.99.104
Swarm initialized: current node (ghip4g5s8l1qaj19u9bt4z1g5) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-5vkvrcb97hwf7lnd8bwdwoz86n9roqh594aj34unp8ephtj7wb-c6vyjcedx7qsj0zvxxaje7zz7 192.168.99.104:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
So take the docker swarm join --token ... <MANAGER-IP>:<PORT> command and run it on each worker, connecting to them via docker-machine ssh.
You can perform this operation by typing:
for i in `seq 1 4`;
do
docker-machine ssh worker$i docker swarm join --token <TOKEN> <MANAGER-IP>:<PORT>;
done
Now we have the swarm created.
Reconnect to manager1 via docker-machine ssh to configure the service, then execute docker node ls to see all the workers that joined the swarm.
docker@manager1:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
ghip4g5s8l1qaj19u9bt4z1g5 * manager1 Ready Active Leader
rarxfvelnw1sl6kxll2hvum7z worker1 Ready Active
64rcnb9unhkrkx15gk43tld5b worker2 Ready Active
oerktjryawqzadakuwnjnka3l worker3 Ready Active
whq7kd3zkld9jlg7vgbi307uj worker4 Ready Active
Create a service
Now start a service with 5 tasks, publishing port 8080, with:
docker service create --replicas 5 -p 8080:80 --name web httpd
Now you can see the service progress with
$ docker service ps web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
vsyi9ywzy1xc web.1 httpd:latest worker1 Running Running 3 minutes ago
cbs958bkwx3f web.2 httpd:latest worker2 Running Running 2 minutes ago
thalirehjpfa web.3 httpd:latest worker3 Running Running 3 minutes ago
shrtupjr4bjy web.4 httpd:latest worker4 Running Running 2 minutes ago
ttz7c79ogk8r web.5 httpd:latest manager1 Running Running 3 minutes ago
Pay attention to CURRENT STATE: it must be “Running”, not “Preparing”.
Now rerun the benchmark and see the result:
$ time docker run --net=host --rm jordi/ab ab -c 10000 -n 30000 -r http://localhost:80/
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 3000 requests
Completed 6000 requests
Completed 9000 requests
Completed 12000 requests
Completed 15000 requests
Completed 18000 requests
Completed 21000 requests
Completed 24000 requests
Completed 27000 requests
Completed 30000 requests
Finished 30000 requests
Server Software: Apache/2.4.29
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 45 bytes
Concurrency Level: 10000
Time taken for tests: 77.014 seconds
Complete requests: 30000
Failed requests: 167
(Connect: 0, Receive: 22, Length: 123, Exceptions: 22)
Total transferred: 8635031 bytes
HTML transferred: 1344555 bytes
Requests per second: 389.54 [#/sec] (mean)
Time per request: 25671.464 [ms] (mean)
Time per request: 2.567 [ms] (mean, across all concurrent requests)
Transfer rate: 109.49 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 6530 10928.3 1369 35220
Processing: 1 931 2714.5 505 64186
Waiting: 0 655 2305.9 256 64186
Total: 1 7460 11398.3 2432 65780
Percentage of the requests served within a certain time (ms)
50% 2432
66% 3324
75% 3973
80% 15080
90% 33031
95% 33893
98% 34062
99% 34100
100% 65780 (longest request)
real 1m 18.10s
user 0m 0.01s
sys 0m 0.00s
This result shows another feature built into Docker Swarm: load balancing. As you can see, we hit the manager (we could equally have connected using the IP address of any other node) and the swarm redirects the requests to nodes with running tasks.
Some other stuff
By default, the manager is also a worker. To avoid this, write:
docker node update --availability drain manager1
To delete machines created with docker-machine, just write:
docker-machine rm <NAME_LIST> (e.g. manager1 worker1 worker2)
To scale the number of tasks that run for a service, you can execute:
docker service scale web=8
Try deleting worker1 (write docker-machine rm worker1), which is probably running a task, and then, on manager1, execute docker service ps web to see what happens (answer: the manager automatically reschedules the task on another node).