I’m just starting to play around with docker, and I’ve been investigating the use of CoreOS for deploying a cluster of docker containers. Though I’ve only been using it for a week, I really like what I’ve seen so far. CoreOS makes it very easy to cluster together a group of machines using etcd, and in particular, I really like their fleet software, which allows you to manage systemd units (which you can use to run docker containers) across an entire CoreOS cluster. Fleet gives you high availability, failure recovery, and other useful things without too much extra effort, right out of the box. The one piece missing is how to connect the containers together. There are some ways they’ve documented to do it, but honestly most of the ways I’ve seen on the internet consist of a bunch of shell script glue that feels really hacky to me.
In the docker community, something called the ‘ambassador’ pattern has emerged, which is this idea of proxying connections to container A from container B via container P, and container P has enough smarts in it to transparently redirect connections to many different containers depending on parameters. However, most of the stuff I’ve found on the web is very labor intensive and full of nasty shell scripting that is easy to mess up.
Jeff Lindsay has created the first stage of what I think is a really good general solution to this problem: his projects registrator and ambassadord. Registrator listens for docker containers to start up, and automatically adds them to something like etcd or consul. You link your containers to ambassadord, and when your container tries to make an outgoing connection, ambassadord looks up where the connection needs to go and connects you there. It’s pretty easy, with very little configuration needed for the involved containers.
CoreOS already ships with etcd built-in, so CoreOS + registrator + ambassadord seems to be a great combination to me. I’ve modified CoreOS’s sample vagrant cluster to demonstrate how to use these to connect containers together.
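To make the idea a bit more concrete: once everything is running (setup instructions are below), registrator ends up publishing each container's exposed ports as keys in etcd, and ambassadord resolves those keys at connect time. If you want to peek at what has been registered, something like this should work from any machine in the cluster (a sketch; I'm assuming the /services prefix used later in this post and etcd's default client port of 4001):

etcdctl ls --recursive /services

Each entry should contain the host:port of a registered container.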
Set up the CoreOS cluster using Vagrant
First, use the instructions in the original README.md file to start the 3-machine cluster up — make sure you have at least 8GB of free RAM! If you already have Vagrant 1.6+ installed, it should be as easy as:
- git clone https://github.com/virtuald/coreos-vagrant-ambassadord.git
- cp user-data.sample user-data
- Uncomment the discovery token line and replace it with a new URL obtained from https://discovery.etcd.io/new (see the note just after this list)
- vagrant up
- vagrant ssh core-01
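A note on the discovery step above: in the user-data file, the discovery URL lives in the etcd section of the cloud-config, and once uncommented the line should look roughly like this (with your own token in place of the placeholder; defer to the comments in user-data.sample for the exact format):

discovery: https://discovery.etcd.io/<your token here>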
Once the machines come up, you’ll need to wait for registrator + ambassadord to download and start. You can use ‘journalctl -u registrator.service’ and ‘journalctl -u ambassadord.service’ to check on the progress. If you execute ‘docker ps’, you should see both of their containers running, something like this:
CONTAINER ID        IMAGE                          COMMAND                 CREATED             STATUS              PORTS               NAMES
9c40b5e6b823        virtuald/registrator:latest    /bin/registrator -ip    2 minutes ago       Up 2 minutes                            registrator
0e46ce5e07e1        virtuald/ambassadord:latest    /start --omnimode       2 minutes ago       Up 2 minutes        10000/tcp           backends
Alright!
Note: you’ll notice that registrator and ambassadord were downloaded from the virtuald/ repositories, not from progrium/. Those forks have fixes that Jeff hasn’t merged in as of this writing.
Verify cluster operation
When you ssh in, you should be able to use fleet to list all the machines in your cluster.
core@core-01 ~ $ fleetctl list-machines
MACHINE         IP              METADATA
909d077d...     172.17.8.102    -
971caf36...     172.17.8.103    -
d70a88f8...     172.17.8.101    -
If this doesn’t work, it’s probably because you didn’t set up the etcd discovery correctly in the user-data file. Refer to the CoreOS cluster documentation for details.
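If you want to poke at etcd directly to confirm the machines found each other, a quick sanity check from any of the cluster members is something like this (assuming etcd is listening on its default client port of 4001), which should list all three peers:

curl -s http://127.0.0.1:4001/v2/machines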
Remote fleetctl operation
Using fleetctl from within the cluster is cool, but it’s even better if you install it on your host machine. Either build it with go from their github repo, or on OSX you can do ‘brew install fleetctl’. Once you’ve done that, you can do the following to get your fleet working remotely:
$ source units/env.sh
$ ssh-add ~/.vagrant.d/insecure_private_key
Run ‘fleetctl list-machines’ to make sure it works.
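In case you're wondering what env.sh does: fleetctl can tunnel its commands over SSH based on the FLEETCTL_TUNNEL environment variable, so a minimal version of that script (assuming Vagrant's default forwarding of core-01's SSH port to localhost:2222) boils down to:

export FLEETCTL_TUNNEL=127.0.0.1:2222

The ssh-add step is just there so the tunnel can authenticate with Vagrant's insecure key.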
Example application: an NSQ messaging cluster
Ok, now that we’ve got things working, let’s do something useful. I’ve decided to setup a small NSQ messaging cluster, which consists of a lookup daemon and some messaging daemons that all need to talk to each other. Moderately complex, but pretty simple once the config is all done.
In my repo, there’s a directory called ‘units’. cd into that, and you can launch an nsq cluster from that directory. First, do ‘fleetctl start nsqlookupd.service’. You can check the status using ‘fleetctl list-units’, and once it’s ready to go it will look like this:
UNIT                    STATE       LOAD      ACTIVE    SUB        DESC                          MACHINE
nsqlookupd.service      launched    loaded    active    running    NSQ lookup daemon instance    d70a88f8.../172.17.8.101
If it says it’s ‘activating’, that means it’s still downloading the docker images. Otherwise, piece of cake. Now, let’s launch 2 NSQ daemons to connect to the lookupd. Just do ‘fleetctl start nsqd.1.service’ and ‘fleetctl start nsqd.2.service’. Once they’re done launching, list-units should show something similar to this:
UNIT                    STATE       LOAD      ACTIVE    SUB        DESC                          MACHINE
nsqd.1.service          launched    loaded    active    running    NSQ daemon instance           d70a88f8.../172.17.8.101
nsqd.2.service          launched    loaded    active    running    NSQ daemon instance           971caf36.../172.17.8.103
nsqlookupd.service      launched    loaded    active    running    NSQ lookup daemon instance    d70a88f8.../172.17.8.101
Easy! Now, we can verify that things are connected by checking the logs. ssh into the appropriate machine using ‘vagrant ssh core-0X’, run ‘journalctl -u nsqd.1.service’ (or whichever unit you’re interested in), and you should see log messages indicating that the nsq daemon has started and connected to the lookup daemon!
It doesn’t get much simpler than that. Let’s test out this setup, using the instructions taken from the NSQ docker page. In one terminal, watch nsqlookupd (substitute x.x.x.x with the IP address nsqlookupd lives at).
watch -n 0.5 "curl -s http://x.x.x.x:4161/topics"
In another terminal, you can send a message to one of the nsq daemons. But let’s do that through the ambassador container, which will route us to one of the nsq daemons at random, instead of explicitly specifying an IP address! SSH into one of the cluster machines, and execute this:
source /etc/environment; docker run --rm -it --link backends:api -e BACKEND_4151="etcd://${COREOS_PRIVATE_IPV4}:4001/services/nsqd" radial/busyboxplus
This will give you a busybox shell. Go ahead and send a message to the daemon.
curl -d 'hello world 1' http://$API_PORT_10000_TCP_ADDR:4151/put?topic=test
If everything works, it should print an ‘OK’, and in the other terminal you should see the list of topics show something like:
{"status_code":200,"status_txt":"OK","data":{"topics":["test"]}}
Now, once you have this working, you can use this cluster + techniques (see below) to connect your own applications to nsqd and nsqlookupd, without any of them needing to know explicitly where the others are.
How it works
The most important part is setting up registrator and ambassadord as services to run on the cluster. I created two systemd unit files for them, and added them to the cloud-init used to initialize the cluster. These should work pretty generically for you regardless of your setup. You can even extract the relevant parts and run them on a Linux distribution that isn’t CoreOS.
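If you just want the gist without opening the unit files: each one is an ordinary systemd unit whose ExecStart boils down to a docker run command, roughly along these lines (a sketch reconstructed from the docker ps output earlier; I'm guessing at the exact flags, so treat the units in the repo and the registrator/ambassadord READMEs as authoritative):

docker run --name registrator -v /var/run/docker.sock:/tmp/docker.sock \
    virtuald/registrator -ip $COREOS_PRIVATE_IPV4 etcd://$COREOS_PRIVATE_IPV4:4001/services

docker run --name backends virtuald/ambassadord --omnimode

There are a few details I'm glossing over here (ambassadord's omni mode needs some extra setup, for instance), which is another reason to look at the real unit files.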
Once those services are running, you connect your containers to their target services by linking them to the ambassadord container (named ‘backends’). The key part of nsqd.service is right here:
--link backends:nsqlookupd -e BACKEND_4160=etcd://${COREOS_PRIVATE_IPV4}:4001/services/nsqlookupd -e SERVICE_NAME=nsqd
The --link part means link to the backends container using nsqlookupd as an alias, and the -e BACKEND_4160 part tells ambassadord to look up the key at the specified location (which admittedly could stand to be shorter) and route any connections to port 4160 to one of the addresses it finds there. The SERVICE_NAME environment variable tells registrator to publish the ports for this container under that service name, instead of the default. As all of this stuff is really young and very much subject to change, I’m not going to go into it in huge detail; check out the README files for registrator + ambassadord to see how it works and how you can customize it for yourself.
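To make that concrete: from inside the nsqd container, the daemon simply connects to the backends container (reachable through the nsqlookupd link alias) on port 4160 as if it were talking directly to a lookup daemon, and ambassadord takes it from there. The effective startup line is something like this (a sketch only; the real invocation lives in the image and unit files in the repo):

nsqd --lookupd-tcp-address=nsqlookupd:4160 --broadcast-address=$COREOS_PRIVATE_IPV4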
Conclusion
Now, there’s still a bunch of things that are a bit awkward about this setup, and this NSQ cluster is far from production ready — but I think this is miles past what I’ve seen from other docker container connection solutions at the moment. The good part is that even though registrator/ambassadord is very new and needs some work, this setup will work today. As this software gets better, I agree with Jeff that this is going to make a huge impact on the docker community.
Let me know what you think! If you have improvements/suggestions, drop a note in the comments or do a pull request on the github repo.
Comments
If anyone else is having issues starting the nsqd.*.service units, I found that I needed to double the RAM allocated by VirtualBox. Edit both the Vagrantfile and config.rb and change $vb_memory to 2048.
Hello,
I am doing:
source /etc/environment; docker run -it --rm -p 3030:3010 --link backends:rabbitmq -e BACKEND_5672=etcd://${COREOS_PRIVATE_IPV4}:4001/services/rabbitmq --link backends:redis -e BACKEND_6379=etcd://${COREOS_PRIVATE_IPV4}:4001/services/redis --link backends:mongodb -e BACKEND_27017=etcd://${COREOS_PRIVATE_IPV4}:4001/services/mongodb -e SERVICE_NAME=notifications-api twnel/notifications-api /bin/bash
However, pinging any of the linked services gives a response from only one of them. Any clue?
Each service is in the form:
[Service]
EnvironmentFile=/etc/environment
Environment=REDIS_IMG=dockerfile/redis REDIS_CNAME=redis
ExecStartPre=/bin/bash -c "/usr/bin/docker inspect $REDIS_IMG &> /dev/null || /usr/bin/docker pull $REDIS_IMG"
ExecStartPre=/bin/bash -c "/usr/bin/docker rm -f $REDIS_CNAME &> /dev/null; exit 0"
ExecStart=/usr/bin/docker run --name $REDIS_CNAME --rm -p 6379:6379 -e SERVICE_NAME=redis $REDIS_IMG
ExecStop=/usr/bin/docker stop $REDIS_CNAME
ExecStopPost=/usr/bin/docker rm -f $REDIS_CNAME
Hi,
I have got it!
Reading the ambassadord documentation (https://github.com/progrium/ambassadord), there is a part where they show:
$ docker run -d --link backends:backends -e "BACKEND_6379=redis.services.consul" progrium/mycontainer startdaemon
So all services are reached through the backends container's ports. Given that, the correct command would be something like:
source /etc/environment; docker run -it --rm -p 3030:3010 --link backends:backends -e BACKEND_5672=etcd://${COREOS_PRIVATE_IPV4}:4001/services/rabbitmq -e BACKEND_6379=etcd://${COREOS_PRIVATE_IPV4}:4001/services/redis -e BACKEND_27017=etcd://${COREOS_PRIVATE_IPV4}:4001/services/mongodb -e SERVICE_NAME=notifications-api twnel/notifications-api /bin/bash
It only requires one --link backends:backends, and whenever the container app needs to point to a service, the host to use is ‘backends’.