These are my personal notes about Docker Swarm, derived from the official documentation. After the theory part, there is a concrete example. Enjoy!
Docker Swarm is a Docker-native clustering system. It turns a pool of Docker hosts into a single, virtual host.
- Cluster management integrated with Docker Engine
- Desired state reconciliation
- Multi-host networking
- Load balancing
- Secure by default
- Rolling updates
A swarm consists of multiple Docker hosts which run in swarm mode and act as managers and workers. A given Docker host can be a manager, a worker, or perform both roles.
When you create a service, you define its optimal state (number of replicas, network and storage resources available to it, ports the service exposes to the outside world, and more). Docker works to maintain that desired state. If a worker becomes unavailable, Docker schedules that node’s tasks on other nodes. (A task is a running container which is part of a swarm service and managed by a swarm manager.)
You can modify a service’s configuration without the need to manually restart the service. Docker will update the swarm to fit new configuration.
A node is an instance of the Docker Engine. You can run one or more nodes on a single physical computer (e.g., with docker-machine), but typically each node runs on a different machine.
The swarm manager uses ingress load balancing to expose the services you want to make available externally to the swarm. Swarm mode has an internal DNS component that automatically assigns each service in the swarm a DNS entry. The swarm manager uses internal load balancing to distribute requests among services within the cluster based upon the DNS name of the service.
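As a sketch of how these two mechanisms combine (all service and network names below are made up for illustration), a service attached to a user-defined overlay network can reach another service simply by its name:

```shell
# Create a user-defined overlay network for service-to-service traffic.
docker network create --driver overlay appnet

# "db" automatically gets a DNS entry on appnet.
docker service create --name db --network appnet redis

# Tasks of "api" can reach the database at the hostname "db": the
# internal load balancer resolves it to the service's virtual IP and
# spreads connections across its tasks. (myapp:latest is hypothetical.)
docker service create --name api --network appnet -p 8000:8000 myapp:latest
```

Thanks to ingress load balancing, port 8000 is then reachable on every node of the swarm, whether or not that node runs an api task.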
Manager nodes handle cluster management tasks:
- maintaining the cluster state
- scheduling services
- serving swarm mode HTTP API endpoints
The internal state is maintained using the Raft consensus algorithm. If a manager fails (and no other managers remain), the services will continue to run, but you will need to create a new cluster to recover.
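Because of this, running a single manager is risky. A sketch of adding fault tolerance (node names are assumed): promote two workers so the swarm has three managers. A Raft cluster of N managers tolerates the loss of (N-1)/2 of them, so 3 managers survive 1 failure.

```shell
# Promote two workers to managers (names assumed).
docker node promote worker1 worker2

# Three nodes should now show a MANAGER STATUS (Leader or Reachable).
docker node ls

# A manager can later be turned back into a plain worker:
docker node demote worker2
```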
Worker nodes are instances of Docker Engine whose sole purpose is to execute containers. By default, all managers are also workers. To prevent a manager from receiving tasks, set its availability to Drain. The scheduler gracefully stops tasks on nodes in Drain mode and schedules them on an Active node.
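For example (node name assumed), draining and re-activating a node looks like:

```shell
# Tasks on worker3 are stopped gracefully and rescheduled elsewhere.
docker node update --availability drain worker3

# Bring the node back into the scheduling pool later.
docker node update --availability active worker3
```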
A service is frequently an image for a microservice within the context of some larger application (HTTP Server, database, or any other distributed environment).
When you define a service, in addition to the image you can optionally specify:
- the port where the swarm will make the service available outside the swarm
- an overlay network to connect to other services in the swarm
- CPU and memory limits
- a rolling update policy
- the number of replicas of the image to run in the swarm
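A sketch combining the options listed above in a single command (the service name, network, and values are illustrative; the overlay network must already exist):

```shell
docker service create \
  --name api \
  --replicas 3 \
  -p 8080:80 \
  --network appnet \
  --limit-cpu 0.5 \
  --limit-memory 256M \
  --update-delay 10s \
  --update-parallelism 1 \
  nginx
# --replicas : number of identical tasks to run in the swarm
# -p         : port the swarm exposes outside (on every node)
# --network  : overlay network shared with other services
# --limit-*  : per-container resource limits
# --update-* : rolling update policy
```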
Services can be of two types:
- replicated, where you specify the number of identical tasks you want to run (web services…)
- global, which runs one task on every node (monitoring agents, antivirus…)
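For instance, a node-level monitoring agent could be deployed as a global service (the image here is just a typical example of such an agent):

```shell
docker service create \
  --mode global \
  --name node-agent \
  prom/node-exporter
# No --replicas flag: the task count always equals the node count, and
# new nodes joining the swarm automatically receive a task.
```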
The swarm mode public key infrastructure (PKI) system built into Docker makes it simple to securely deploy a container orchestration system. Nodes use TLS to authenticate, authorize, and encrypt communication with other nodes. When you initialize a swarm, the first manager creates a new root Certificate Authority which is used to secure communication with the other nodes.
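As a quick sketch (these subcommands are available on newer Docker Engines), the root CA can be inspected and rotated from a manager node:

```shell
# Print the swarm's current root CA certificate.
docker swarm ca

# Rotate the root CA and reissue node certificates, e.g. after a
# suspected key compromise.
docker swarm ca --rotate
```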
Deploy a service
docker service create --name web nginx

This command creates a service called “web” that runs the nginx image. You can then change any of its configuration using the docker service update command:

docker service update --publish-add 80 web

To remove a service, simply run:

docker service remove web
There are lots of configuration options that can be set when running a service.
Alessandros-MBP:~ alessandro$ docker service create --help

Usage:  docker service create [OPTIONS] IMAGE [COMMAND] [ARG...]

Create a new service

Options:
      --config config                      Specify configurations to expose to the service
      --constraint list                    Placement constraints
      --container-label list               Container labels
      --credential-spec credential-spec    Credential spec for managed service account (Windows only)
  -d, --detach                             Exit immediately instead of waiting for the service to converge (default true)
      --dns list                           Set custom DNS servers
      --dns-option list                    Set DNS options
      --dns-search list                    Set custom DNS search domains
      --endpoint-mode string               Endpoint mode (vip or dnsrr) (default "vip")
      --entrypoint command                 Overwrite the default ENTRYPOINT of the image
  -e, --env list                           Set environment variables
      --env-file list                      Read in a file of environment variables
      --group list                         Set one or more supplementary user groups for the container
      --health-cmd string                  Command to run to check health
      --health-interval duration           Time between running the check (ms|s|m|h)
      --health-retries int                 Consecutive failures needed to report unhealthy
      --health-start-period duration       Start period for the container to initialize before counting retries towards unstable (ms|s|m|h)
      --health-timeout duration            Maximum time to allow one check to run (ms|s|m|h)
      --help                               Print usage
      --host list                          Set one or more custom host-to-IP mappings (host:ip)
      --hostname string                    Container hostname
  -l, --label list                         Service labels
      --limit-cpu decimal                  Limit CPUs
      --limit-memory bytes                 Limit Memory
      --log-driver string                  Logging driver for service
      --log-opt list                       Logging driver options
      --mode string                        Service mode (replicated or global) (default "replicated")
      --mount mount                        Attach a filesystem mount to the service
      --name string                        Service name
      --network network                    Network attachments
      --no-healthcheck                     Disable any container-specified HEALTHCHECK
      --no-resolve-image                   Do not query the registry to resolve image digest and supported platforms
      --placement-pref pref                Add a placement preference
  -p, --publish port                       Publish a port as a node port
  -q, --quiet                              Suppress progress output
      --read-only                          Mount the container's root filesystem as read only
      --replicas uint                      Number of tasks
      --reserve-cpu decimal                Reserve CPUs
      --reserve-memory bytes               Reserve Memory
      --restart-condition string           Restart when condition is met ("none"|"on-failure"|"any") (default "any")
      --restart-delay duration             Delay between restart attempts (ns|us|ms|s|m|h) (default 5s)
      --restart-max-attempts uint          Maximum number of restarts before giving up
      --restart-window duration            Window used to evaluate the restart policy (ns|us|ms|s|m|h)
      --rollback-delay duration            Delay between task rollbacks (ns|us|ms|s|m|h) (default 0s)
      --rollback-failure-action string     Action on rollback failure ("pause"|"continue") (default "pause")
      --rollback-max-failure-ratio float   Failure rate to tolerate during a rollback (default 0)
      --rollback-monitor duration          Duration after each task rollback to monitor for failure (ns|us|ms|s|m|h) (default 5s)
      --rollback-order string              Rollback order ("start-first"|"stop-first") (default "stop-first")
      --rollback-parallelism uint          Maximum number of tasks rolled back simultaneously (0 to roll back all at once) (default 1)
      --secret secret                      Specify secrets to expose to the service
      --stop-grace-period duration         Time to wait before force killing a container (ns|us|ms|s|m|h) (default 10s)
      --stop-signal string                 Signal to stop the container
  -t, --tty                                Allocate a pseudo-TTY
      --update-delay duration              Delay between updates (ns|us|ms|s|m|h) (default 0s)
      --update-failure-action string       Action on update failure ("pause"|"continue"|"rollback") (default "pause")
      --update-max-failure-ratio float     Failure rate to tolerate during an update (default 0)
      --update-monitor duration            Duration after each task update to monitor for failure (ns|us|ms|s|m|h) (default 5s)
      --update-order string                Update order ("start-first"|"stop-first") (default "stop-first")
      --update-parallelism uint            Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
  -u, --user string                        Username or UID (format: <name|uid>[:<group|gid>])
      --with-registry-auth                 Send registry authentication details to swarm agents
  -w, --workdir string                     Working directory inside the container
As seen above, you can specify how a service is updated. The --update-delay flag configures the delay between updates to a service task or sets of tasks (and --update-parallelism controls how many tasks are updated at once). If the update of a task succeeds, the scheduler moves on to the next one; if the update fails, an action can be performed (--update-failure-action, which defaults to pause).
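Putting those flags together, a sketch (service names and image versions are illustrative):

```shell
# Create a service whose future updates go 2 tasks at a time, with a
# 10s pause between batches, rolling back automatically on failure.
docker service create \
  --name web \
  --replicas 6 \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-failure-action rollback \
  nginx:1.13

# Trigger a rolling update to a newer image.
docker service update --image nginx:1.14 web
```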
You can create two types of mounts for services in a swarm: data volumes and bind mounts. You configure them using the --mount flag at creation time, or --mount-add / --mount-rm when updating.
A data volume is storage that remains alive after the container for a task has been removed. The preferred method is to leverage an existing volume:
$ docker service create \
    --mount src=<VOLUME-NAME>,dst=<CONTAINER-PATH> \
    --name myservice \
    <IMAGE>
You can also create a volume at deployment time, just before starting the container.
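For example (volume and service names assumed), a named volume referenced in --mount is created on demand on whichever node runs the task:

```shell
docker service create \
  --mount type=volume,src=webdata,dst=/usr/local/apache2/htdocs \
  --name web \
  httpd
# If "webdata" does not exist on the node where a task lands, Docker
# creates it just before starting the container. Note that the volume
# is local to that node; it is not shared or moved between nodes.
```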
Bind mounts are file system paths from the host. Docker mounts the path into the container.
docker service create \
    --mount type=bind,src=<HOST-PATH>,dst=<CONTAINER-PATH> \
    --name myservice \
    <IMAGE>
Some problems can occur with bind mounts:
- if you bind mount a host path into a service’s container, the path must exist on every swarm node.
- The Docker swarm mode scheduler may reschedule running service containers at any time if they become unhealthy or unreachable.
- Host bind mounts are completely non-portable. When you use bind mounts, there is no guarantee that your application will run the same way in development as it does in production.
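One way to soften the first problem is to constrain the service to nodes known to have the path, e.g. via node labels (the label, path, and names below are assumptions):

```shell
# Label the nodes that actually have the data directory.
docker node update --label-add has_data=true worker1

# Only nodes with that label are eligible to run the service.
docker service create \
  --constraint node.labels.has_data==true \
  --mount type=bind,src=/srv/data,dst=/data \
  --name importer \
  alpine sleep 1d
```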
Manage sensitive data with Docker secrets
A secret is a blob of data (such as a password, SSH private key, SSL certificate…) that should not be transmitted over a network or stored unencrypted in a Dockerfile or in your application’s source code.
When you add a secret to the swarm, Docker sends the secret to the swarm manager over a mutual TLS connection. The location of the mount point within the container defaults to /run/secrets/<secret_name> (Linux containers) or C:\ProgramData\Docker\secrets (Windows containers).
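A minimal sketch (the secret name and password are made up; the official postgres image genuinely supports *_FILE variables):

```shell
# Create a secret from stdin.
echo "s3cret-pw" | docker secret create db_password -

# Grant a service access to it: the container reads the value as a
# plain file under /run/secrets/, never from the environment.
docker service create \
  --name db \
  --secret db_password \
  -e POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
  postgres
```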
Scale the service
To scale a service in the swarm, connect to a manager and then execute:
docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS>
Docker Swarm doesn’t provide any built-in tool to scale up or down based on load. To achieve this you can use an external autoscaler such as Orbiter.
Let’s try Docker Swarm. First we will create a single, ordinary container on Docker Engine running the Apache web server (the httpd Docker image) and try to saturate it with the Apache Benchmark tool. Then we will create a Docker Swarm service and see whether it can withstand the same number of requests from Apache Benchmark.
Docker standard container
We deploy the web server by starting a container with the following command:
docker run --rm -it --name web -p 8080:80 -v web:/usr/local/apache2/htdocs/ httpd:latest
(Some problems may occur on macOS when mounting a volume this way. Try using an absolute path to the web directory instead of a relative one.)
Now check that everything works by visiting http://localhost:8080 in your favorite browser. You should see the default “It works!” page.
Now try to make the web server unavailable using ab (Apache Benchmark). You can run ab itself in a… Docker container! Let’s write:
time docker run --net=host --rm jordi/ab ab -c 10000 -n 30000 -r http://localhost:8080/
to measure the time needed to complete 30000 connections to the web server, with 10000 connections performed simultaneously, without exiting on socket receive errors (the -r flag). My output is:
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 3000 requests
Completed 6000 requests
Completed 9000 requests
Completed 12000 requests
Completed 15000 requests
Completed 18000 requests
Completed 21000 requests
Completed 24000 requests
Completed 27000 requests
Completed 30000 requests
Finished 30000 requests

Server Software:        Apache/2.4.29
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        45 bytes

Concurrency Level:      10000
Time taken for tests:   141.045 seconds
Complete requests:      30000
Failed requests:        13593
   (Connect: 0, Receive: 4530, Length: 4533, Exceptions: 4530)
Total transferred:      7373257 bytes
HTML transferred:       1148085 bytes
Requests per second:    212.70 [#/sec] (mean)
Time per request:       47015.097 [ms] (mean)
Time per request:       4.702 [ms] (mean, across all concurrent requests)
Transfer rate:          51.05 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  605 1363.4      9   15168
Processing:     0 23477 44276.7  1906  131857
Waiting:        0 4547 8138.6    892   63191
Total:          0 24081 44220.3  3070  135258

Percentage of the requests served within a certain time (ms)
  50%   3070
  66%   8556
  75%  15994
  80%  26979
  90%  127318
  95%  127405
  98%  127626
  99%  130373
 100%  135258 (longest request)

real 2m 22.08s
user 0m 0.02s
sys  0m 0.00s
Now let’s build a distributed service that runs this web server with 5 tasks and see how long it takes to complete the same 30000 connections.
Docker Swarm Mode
Create 4 Nodes
To create 4 nodes on the same machine, we are going to use
docker-machine with the following command:
docker-machine create --driver virtualbox worker1

and repeat this four times (worker1 through worker4), or use a loop:
for i in `seq 1 4`; do docker-machine create --driver virtualbox worker$i; done
Now we have 4 workers; you can list them with docker-machine ls. If everything went correctly, your output should look like this:
Alessandros-MBP:web alessandro$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
worker1   -        virtualbox   Running   tcp://192.168.99.100:2376           v17.11.0-ce
worker2   -        virtualbox   Running   tcp://192.168.99.101:2376           v17.11.0-ce
worker3   -        virtualbox   Running   tcp://192.168.99.102:2376           v17.11.0-ce
worker4   -        virtualbox   Running   tcp://192.168.99.103:2376           v17.11.0-ce
Create a manager
Now that we have created 4 workers, let’s create a swarm manager:
docker-machine create manager1
and then get the manager1 IP address by typing docker-machine ls and reading the URL field for manager1.
Now connect to it:

docker-machine ssh manager1

and start a new swarm by running, inside the machine:

docker swarm init --advertise-addr <MANAGER-IP>

in my case, 192.168.99.104.
Join the workers to the swarm
This initialization creates a token that the workers must use to join the swarm.
docker@manager1:~$ docker swarm init --advertise-addr 192.168.99.104
Swarm initialized: current node (ghip4g5s8l1qaj19u9bt4z1g5) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-5vkvrcb97hwf7lnd8bwdwoz86n9roqh594aj34unp8ephtj7wb-c6vyjcedx7qsj0zvxxaje7zz7 192.168.99.104:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
So grab the command docker swarm join --token ... <MANAGER-IP>:<PORT> and run it on each worker, connecting to it via docker-machine ssh.
You can perform this operation typing:

for i in `seq 1 4`; do docker-machine ssh worker$i docker swarm join --token <TOKEN> <MANAGER-IP>:<PORT>; done
Now the swarm is created. docker-machine ssh into manager1 to configure the service, then execute docker node ls to see all the workers joined to the swarm.
docker@manager1:~$ docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
ghip4g5s8l1qaj19u9bt4z1g5 *   manager1   Ready    Active         Leader
rarxfvelnw1sl6kxll2hvum7z     worker1    Ready    Active
64rcnb9unhkrkx15gk43tld5b     worker2    Ready    Active
oerktjryawqzadakuwnjnka3l     worker3    Ready    Active
whq7kd3zkld9jlg7vgbi307uj     worker4    Ready    Active
Create a service
Now start a service with 5 tasks, publishing port 8080, with:
docker service create --replicas 5 -p 8080:80 --name web httpd
Now you can see the service progress with
$ docker service ps web
ID             NAME    IMAGE          NODE       DESIRED STATE   CURRENT STATE           ERROR   PORTS
vsyi9ywzy1xc   web.1   httpd:latest   worker1    Running         Running 3 minutes ago
cbs958bkwx3f   web.2   httpd:latest   worker2    Running         Running 2 minutes ago
thalirehjpfa   web.3   httpd:latest   worker3    Running         Running 3 minutes ago
shrtupjr4bjy   web.4   httpd:latest   worker4    Running         Running 2 minutes ago
ttz7c79ogk8r   web.5   httpd:latest   manager1   Running         Running 3 minutes ago
Pay attention to CURRENT STATE: it must be “Running”, not “Preparing”.
Now rerun the benchmark and look at the result:
$ time docker run --net=host --rm jordi/ab ab -c 10000 -n 30000 -r http://localhost:80/
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 3000 requests
Completed 6000 requests
Completed 9000 requests
Completed 12000 requests
Completed 15000 requests
Completed 18000 requests
Completed 21000 requests
Completed 24000 requests
Completed 27000 requests
Completed 30000 requests
Finished 30000 requests

Server Software:        Apache/2.4.29
Server Hostname:        localhost
Server Port:            80

Document Path:          /
Document Length:        45 bytes

Concurrency Level:      10000
Time taken for tests:   77.014 seconds
Complete requests:      30000
Failed requests:        167
   (Connect: 0, Receive: 22, Length: 123, Exceptions: 22)
Total transferred:      8635031 bytes
HTML transferred:       1344555 bytes
Requests per second:    389.54 [#/sec] (mean)
Time per request:       25671.464 [ms] (mean)
Time per request:       2.567 [ms] (mean, across all concurrent requests)
Transfer rate:          109.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0 6530 10928.3   1369   35220
Processing:     1  931 2714.5     505   64186
Waiting:        0  655 2305.9     256   64186
Total:          1 7460 11398.3   2432   65780

Percentage of the requests served within a certain time (ms)
  50%   2432
  66%   3324
  75%   3973
  80%  15080
  90%  33031
  95%  33893
  98%  34062
  99%  34100
 100%  65780 (longest request)

real 1m 18.10s
user 0m 0.01s
sys  0m 0.00s
This result shows another feature built into Docker Swarm: load balancing. As you can see, we hit the manager (though we could just as well connect using the IP address of any other node) and the swarm redirects each request to a node with a running task.
Some other stuff
By default, the manager is also a worker. To avoid this, run:
docker node update --availability drain manager1
To delete the machines created with docker-machine, just write:

docker-machine rm <NAME_LIST> (e.g. manager1 worker1 worker2)
To scale the number of tasks that run for a service, you can execute:
docker service scale web=8
Try to delete a machine on which a task is probably running (docker-machine rm worker1), then execute docker service ps web on the manager and see what happens (answer: the manager reschedules the task on another node automatically).