Random thoughts along the roadside…

Automated docker ambassadors with CoreOS + registrator + ambassadord

July 28th, 2014

I’m just starting to play around with docker, and I’ve been investigating the use of CoreOS for deploying a cluster of docker containers. Though I’ve only been using it for a week, I really like what I’ve seen so far. CoreOS makes it very easy to cluster together a group of machines using etcd, and in particular, I really like their fleet software, which allows you to manage systemd units (which you can use to run docker containers) across an entire CoreOS cluster. Fleet makes it easy to get things like high availability and failure recovery right out of the box, without too much extra effort. The one piece missing is how to connect the containers together. There are some ways they’ve documented to do it, but honestly most of the ways I’ve seen on the internet consist of a bunch of shell script glue that feels really hacky to me.

In the docker community, something called the ‘ambassador’ pattern has emerged: instead of container B connecting directly to container A, it connects through a proxy container P, and P has enough smarts in it to transparently redirect the connection to one of many different containers depending on parameters. However, most of the implementations I’ve found on the web are very labor intensive and full of nasty shell scripting that is easy to mess up.

Jeff Lindsay has created the first stage of what I think is a really good general solution to this problem: his projects called registrator and ambassadord. Registrator listens for docker containers to start up, and automatically adds them to a service registry such as etcd or consul. You link your containers to ambassadord, and when your container makes an outgoing connection, ambassadord looks up where that connection needs to go and connects you there. It’s pretty easy, with very little configuration needed for the involved containers.

CoreOS already ships with etcd built-in, so CoreOS + registrator + ambassadord seems to be a great combination to me. I’ve modified CoreOS’s sample vagrant cluster to demonstrate how to use these to connect containers together.
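To make the division of labor concrete, here is a toy, in-process sketch of the registry-plus-ambassador idea. It is not how registrator or ambassadord are implemented: ServiceRegistry and resolve are hypothetical names, the dictionary stands in for etcd, and no real proxying or etcd key layout is modeled. It only shows the two roles: something registers "host:port" backends under a service name as containers come and go, and the ambassador looks the name up at connection time and picks a backend.

```python
import random


class ServiceRegistry(object):
    """Hypothetical in-memory stand-in for etcd/consul: maps a service name to
    the set of "host:port" addresses currently backing it."""

    def __init__(self):
        self._services = {}

    def register(self, service, address):
        # Roughly the job registrator does when a container starts
        self._services.setdefault(service, set()).add(address)

    def deregister(self, service, address):
        # ...and when the container stops
        self._services.get(service, set()).discard(address)

    def lookup(self, service):
        return sorted(self._services.get(service, ()))


def resolve(registry, service, choose=random.choice):
    """Roughly the job an ambassador does for an outgoing connection: look up
    the service and pick one backend to forward the connection to."""
    backends = registry.lookup(service)
    if not backends:
        raise LookupError("no backends registered for %r" % service)
    host, port = choose(backends).rsplit(":", 1)
    return host, int(port)
```

Because the lookup happens per connection, a container that moves to another host only needs to be re-registered; the containers linked to the ambassador never change their configuration.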


Posted in docker, tips | 3 Comments »

Concept Map playlist visualization generated by Exaile 3.4 beta3

July 15th, 2014

At work, I’ve been playing a little bit with visualization of data in D3. After hours, I DJ at local Lindy Hop dance events, and it occurred to me recently that it could be interesting to visualize my playlists to understand more about what I actually played. Using Exaile, I’m able to easily add a lot of metadata to the tracks in my collection, and its plugin framework made it easy to take that data and do something interesting with it.

From that initial investigation, I’ve built a playlist visualization templating engine plugin for Exaile that can do some very simple visualization stuff, and I’m planning on extending the types of things I can do with it. Here’s an interesting one I created from a recent playlist. This visualization was inspired by The Concept Map, except the source code for this isn’t minified. 🙂

Check out the visualization at bl.ocks.org (works best in Chrome).
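For a feel of what the data side of a concept-map style visualization can look like, here is a small sketch that turns track metadata into the nodes/links structure D3 layouts commonly consume. This is not the Exaile plugin’s code: the track dictionaries, the concept_graph name, and the choice of "artist" and "genre" as concepts are all assumptions for illustration.

```python
def concept_graph(tracks, fields=("artist", "genre")):
    """Build a {"nodes": [...], "links": [...]} graph where every track and
    every distinct metadata value is a node, and each track is linked to its
    values. Nodes are keyed by (type, label), so a shared genre becomes one
    node with several links pointing at it."""
    nodes, index, links = [], {}, []

    def node_for(kind, label):
        key = (kind, label)
        if key not in index:
            index[key] = len(nodes)
            nodes.append({"name": label, "type": kind})
        return index[key]

    for track in tracks:
        t = node_for("track", track["title"])
        for field in fields:
            value = track.get(field)
            if value:  # skip tracks missing this piece of metadata
                links.append({"source": t, "target": node_for(field, value)})

    return {"nodes": nodes, "links": links}
```

Passing the result through json.dumps gives something a D3 page can load directly, with links referring to nodes by index.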

Posted in General | No Comments »

Transparently use avro schemata (.avsc) files in a python module

June 8th, 2014

One of the cool things about avro is that it has bindings in a couple of different languages. However, I think the only one that has native code generation support for working with avro objects is Java, which makes working with avro in the other languages a bit harder. Here’s a simple way to load your schemata dynamically (and if you don’t want to write your schemata by hand, you can use maven to generate them from AVDL files using the tip from my previous post).

What this bit of code does is replace the module in sys.modules with an object that defines __getattr__, so any time you access a name on the module that isn’t already there, it will attempt to load the avro schema from a file of the same name with the .avsc extension. To use this code, create a file called __init__.py in your directory of .avsc files, and paste the following code in.

import sys
from os.path import join, dirname
import avro.schema

class AvroSchemaLoader(object):
    '''
        This object allows us to lazily load schemata files in the current
        directory and parse them as needed.
        
        It is intended to be used as a replacement of the current module in
        sys.modules, so usage of this object should be transparent to users.
        
        For example, to access the Foo schema (parsed from Foo.avsc), you
        would do the following:
        
            >>> from this_dir_name import Foo
            >>> type(Foo)
            <class 'avro.schema.RecordSchema'>
    '''
    
    def __init__(self, module):
        # Keep a reference to the original module: if it gets garbage
        # collected, its globals (such as __file__) can be cleared out
        self.__module = module

    def __getattr__(self, name):
        # Special names (__path__, __all__, ...) are not schemata; raising
        # AttributeError lets hasattr() and the import machinery behave normally
        if name.startswith('__'):
            raise AttributeError(name)
        
        path = join(dirname(__file__), '%s.avsc' % name)
        try:
            with open(path, "r") as fp:
                schema = avro.schema.parse(fp.read())
        except IOError:
            # No such schema file: behave like a missing module attribute
            raise AttributeError("no schema file %s" % path)
        
        # Cache the parsed schema so __getattr__ isn't called again for it
        setattr(self, name, schema)
        return schema


# Replace this module instance with the dynamic loader
sys.modules[__name__] = AvroSchemaLoader(sys.modules[__name__])

There’s a lot you can do to make this better, like loading a wrapper around the schema instead of using the schema directly. I’ll leave that as an exercise for the reader. 🙂
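On Python 3.7 and newer, PEP 562 lets a module define __getattr__ directly, so the sys.modules swap isn’t needed at all. Here is a sketch of that variant; make_schema_getattr is a hypothetical helper name, and the parser is passed in so the same code works with avro.schema.parse (or plain json.loads for testing).

```python
import functools
import os


def make_schema_getattr(directory, parse):
    """Build a PEP 562 module-level __getattr__ that lazily parses
    <directory>/<name>.avsc with `parse` the first time `name` is accessed."""

    @functools.lru_cache(maxsize=None)  # parse each schema file only once
    def __getattr__(name):
        # Special names (__path__, __all__, ...) are not schemata
        if name.startswith("__"):
            raise AttributeError(name)
        path = os.path.join(directory, name + ".avsc")
        try:
            with open(path) as fp:
                return parse(fp.read())
        except FileNotFoundError:
            raise AttributeError("no schema file %s" % path) from None

    return __getattr__
```

In the package’s __init__.py you would then write something like `__getattr__ = make_schema_getattr(os.path.dirname(__file__), avro.schema.parse)`, and `from this_dir_name import Foo` resolves Foo.avsc on first use. Failed lookups are not cached, so a schema file added later is still picked up.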

Posted in avro, python, Software, tips | No Comments »

Automatically generating avro schemata (avsc files) using maven

June 8th, 2014

I’ve been using avro for serialization a bit lately, and it seems like a really useful, flexible, and performant technology. To use avro containers, you have to define a schema for them — but writing out JSON files is a bit of a pain. Avro provides an IDL that you can use to specify the object types instead, and it’s much easier to work with. The avro-maven-plugin is quite useful because you can automatically generate Java objects from the IDL files — but what if you’re working with the same Avro files in a different language that can’t use the IDL?

Until they add this functionality to the maven plugin, there’s an easy way to automate it yourself using a bit of maven magic. The first thing to do is add the following dependency to your project’s pom.xml:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.7.6</version>
</dependency>

Next, you need to add a simple little class that converts an entire directory of avdl files to avsc files. Avro-tools ships with a useful class called IdlToSchemataTool that will convert a single file for you, so converting an entire directory is just a simple wrapper around that. There is a bit of improvement that could be done here, but this gets the job done assuming your directory only has avdl files in it.

package main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.tool.IdlToSchemataTool;

/**
 * Converts an entire directory from Avro IDL (.avdl) to schema (.avsc)
 */
public class ConvertIdl {

	public static void main(String [] args) throws Exception {
		IdlToSchemataTool tool = new IdlToSchemataTool();
		
		File inDir = new File(args[0]);
		File outDir = new File(args[1]);
		
		for (File inFile: inDir.listFiles()) {
			List<String> toolArgs = new ArrayList<String>();
			toolArgs.add(inFile.getAbsolutePath());
			toolArgs.add(outDir.getAbsolutePath());
			
			tool.run(System.in, System.out, System.err, toolArgs);
		}
	}
}

Finally, add the following to the plugins section of pom.xml to actually generate the avsc files. This uses the exec-maven-plugin to run the class we created above during the compile phase. This configuration assumes that you are storing your avdl files in src/main/avro, and that you want to place the generated files in a directory called schemata. Obviously you can reconfigure this however you want.

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>1.3</version>
    <executions>
        <execution>
            <phase>compile</phase>
            <goals>
                <goal>java</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <mainClass>main.ConvertIdl</mainClass>
        <arguments>
            <argument>${project.basedir}/src/main/avro/</argument>
            <argument>${project.basedir}/schemata/</argument>
        </arguments>
    </configuration>
</plugin>

And that’s it! To actually convert your avdl files to avsc files, run ‘mvn compile’ and the output directory should be filled with avsc files containing the JSON schema for your avro containers. Hope this helps you out, let me know if you find any bugs or have improvements.
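If you’d rather drive the conversion from a script than from a Java class, the same per-file loop can call avro-tools’ idl2schemata subcommand (the command-line front end for IdlToSchemataTool). This is a sketch, not part of the maven setup: the jar filename is an assumption (point it at wherever your avro-tools jar lives), and unlike ConvertIdl it skips anything that doesn’t end in .avdl.

```python
import os
import subprocess


def idl2schemata_commands(in_dir, out_dir, jar="avro-tools-1.7.6.jar"):
    """Build one `java -jar <jar> idl2schemata <file.avdl> <out_dir>` command
    per .avdl file in in_dir, in sorted order."""
    commands = []
    for fname in sorted(os.listdir(in_dir)):
        if fname.endswith(".avdl"):  # ignore anything that isn't an IDL file
            commands.append(["java", "-jar", jar, "idl2schemata",
                             os.path.join(in_dir, fname), out_dir])
    return commands


def convert_all(in_dir, out_dir, jar="avro-tools-1.7.6.jar"):
    """Run every conversion, stopping at the first one that fails."""
    os.makedirs(out_dir, exist_ok=True)
    for cmd in idl2schemata_commands(in_dir, out_dir, jar):
        subprocess.check_call(cmd)
```

Separating command building from execution keeps the file filtering easy to check without needing a JVM around.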

Posted in avro, Java, maven, Software, tips | No Comments »

Downloading and uploading dashboards to/from a graphite server

April 24th, 2014

I’ve been using graphite and statsd lately, and over lunch quickly whipped up this script to download/upload dashboard configurations to/from a graphite server. Here’s the code, use as you wish.

#!/usr/bin/env python
#
# Author: Dustin Spicuzza
# 
# Use this script to download/upload dashboards to/from a graphite server.
#

import json

import sys
import urllib
import urllib2


download_url = 'http://%s/dashboard/load/%s'
upload_url = 'http://%s/dashboard/save/%s'


def download(ip, name, fname):
    
    uf = urllib2.urlopen(download_url % (ip, name))

    data = json.loads(urllib.unquote(uf.read()))
    data_str = json.dumps(data['state'], sort_keys=True, indent=4, separators=(',',': '))

    if fname is None:
        print data_str
    else:
        with open(fname, 'w') as fp:
            fp.write(data_str)

    return 0


def upload(ip, name, fname):

    if fname is None:
        data_str = sys.stdin.read()
    else:
        with open(fname, 'r') as fp:
            data_str = fp.read()

    data = json.loads(data_str)

    if data['name'] != name:
        print "ERROR: dashboard name doesn't match name in state file"
        return 1

    post_data = urllib.urlencode([('state', json.dumps(data))])

    request = urllib2.Request(upload_url % (ip, name), post_data)
    request.add_header("Content-type", "application/x-www-form-urlencoded")

    print urllib2.urlopen(request).read()
    return 0


def usage():
    print "Usage: dasher.py [upload graphite_host name filename] | [download graphite_host name [filename]]"
    sys.exit(1)

if __name__ == '__main__':

    if len(sys.argv) < 4:
        usage()

    action = sys.argv[1]
    ip = sys.argv[2]
    name = sys.argv[3]
    fname = None

    if len(sys.argv) > 4:
        fname = sys.argv[4]

    if action == 'upload':
        retval = upload(ip, name, fname)

    elif action == 'download':
        retval = download(ip, name, fname)

    else:
        usage()

    sys.exit(retval)
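The script above is Python 2 (urllib2 and print statements). On Python 3, the request handling maps onto urllib.request and urllib.parse roughly as follows. This sketch only builds the requests and parses a response body; the function names are my own, and actually talking to graphite is left to urllib.request.urlopen(req).read().

```python
import json
from urllib.parse import unquote, urlencode
from urllib.request import Request

DOWNLOAD_URL = "http://%s/dashboard/load/%s"
UPLOAD_URL = "http://%s/dashboard/save/%s"


def build_download_request(host, name):
    """GET request that loads dashboard `name` from the graphite host."""
    return Request(DOWNLOAD_URL % (host, name))


def parse_download_response(raw):
    """Same steps as download(): unquote the body, parse it, and pretty-print
    the dashboard's 'state' section."""
    data = json.loads(unquote(raw))
    return json.dumps(data["state"], sort_keys=True, indent=4,
                      separators=(",", ": "))


def build_upload_request(host, name, state):
    """Form-encoded POST that saves `state` as dashboard `name`."""
    body = urlencode([("state", json.dumps(state))]).encode("ascii")
    req = Request(UPLOAD_URL % (host, name), data=body)  # data makes it a POST
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req
```

Note that urlopen returns bytes on Python 3, so decode the response (for example with .decode("utf-8")) before handing it to parse_download_response.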

Posted in graphite, python, statsd | 1 Comment »

Global shared development folder on Vagrant

December 20th, 2013

Vagrant is pretty awesome for development. One thing that I’ve run into is that I use a lot of vagrant instances at various times, and much of the time I want to access my development files from inside the VM. One nice thing about vagrant is that by default it maps the folder where the Vagrantfile is located to /vagrant inside the VM. However, most of the time the content I want to access isn’t in that folder, so I found a good way to access it without needing to copy content all over the place.

What you can do is set up a global Vagrantfile: every VM you bring up under your user account will pick up the settings in it. Just create a file ~/.vagrant.d/Vagrantfile that looks like the following. This will map some local folder to /src on the vagrant VM, but of course you should set the paths to values that make sense for you.

Vagrant.configure("2") do |config|

    config.vm.provider :virtualbox do |vbox, override|

        # path on your local machine
        host_folder_name = "~/local/path/to/somewhere"

        # path where the local folder is mapped to inside the VM
        vm_folder_name = "/src"

        # In newer versions of Vagrant, you should use "type" otherwise
        # you may find it rsyncing your computer to the VM
        override.vm.synced_folder File.expand_path(host_folder_name), vm_folder_name, type: "virtualbox"
    end

end

Of course, if you set something like this up, definitely use the vagrant-rekey-ssh plugin to make sure that nobody else is able to access your VM via SSH using the default insecure vagrant keys.

Posted in Vagrant | No Comments »

Beware of using Vagrant VMs on a bridged network

December 10th, 2013

I love Vagrant. If you haven’t used it, Vagrant is pretty awesome. It lets you manage VMs + configurations really easily, and it’s a great development tool to use when you need to do rapid iterative development on disposable environments.

However, last week I realized that because of the way it’s implemented, all vagrant VMs share the same set of credentials to access them. Since most people only do local development with them, this is mostly ok (though it could be used to jump process boundaries and escalate privileges if you were creative and already running on the box) — however, in a bridged configuration, this is a huge security vulnerability as *anyone* on the internet could potentially use this key to SSH into your VM.

Since some of the stuff I do involves using bridged VMs, I wrote a Vagrant plugin to fix this problem. It replaces the default vagrant SSH key with one randomly generated for your user on that host. If you want to fix this security vulnerability on your vagrant installation, just do:

vagrant plugin install vagrant-rekey-ssh

Hope you find it useful! The github site for the plugin has more useful information if you want to read more.

Posted in Vagrant | No Comments »

Tool to implement ruby-inspired DSLs in python

November 17th, 2013

This weekend, I created a library to aid developers in creating neat little embedded DSLs when using python, without having to do any complex parsing or anything like that. The resulting DSLs look a bit more like English than python does.

The idea for this was inspired by some ruby stuff that I’ve been using lately. I’ve been using ruby quite a bit, and while I am still not a huge fan of the language, I do like the idea of easy-to-code DSLs that I can use to populate objects without too much effort. Since I wanted a DSL for a python project I was working on, I played with a few ideas and ended up with this tool.
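As a flavor of the general technique (not the library’s actual API, which isn’t shown here), here is a minimal sketch of a ruby-style configuration block in Python: a hypothetical Builder class uses __getattr__ to turn any method call into a recorded attribute, so a with block reads like a small declarative DSL with no parsing involved.

```python
class Builder(object):
    """Records `b.some_name(value)` calls into a dict, similar in spirit to a
    Ruby config block. Any attribute not defined on the class becomes a setter;
    names that collide with real methods (such as `build`) can't be used as
    DSL keywords."""

    def __init__(self):
        self._attrs = {}

    def __getattr__(self, name):
        # Private/special names are not DSL keywords
        if name.startswith("_"):
            raise AttributeError(name)

        def setter(*values):
            # One argument stores a scalar, several store a list
            self._attrs[name] = values[0] if len(values) == 1 else list(values)
            return self  # allow chaining: b.name("x").port(80)

        return setter

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # don't swallow exceptions raised inside the block

    def build(self):
        return dict(self._attrs)
```

Usage then looks like `with Builder() as server:` followed by lines such as `server.name("web01")` and `server.roles("frontend", "cache")`, after which `server.build()` returns the populated dictionary.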

Posted in General | No Comments »

Versioned chef environments

November 3rd, 2013

I was just recently introduced to chef, and it has turned out to be a pretty useful tool for automating infrastructure. One feature that I’m using in our environment is chef environments: a bunch of developers are all updating our environment file, and their cookbooks depend on attributes in it*. I keep getting bitten by various user errors that could all be solved if a cookbook could state ‘make sure you have at least this version of the environment’.

I’m sure there’s a good reason for it, but chef does not currently support versioning environments. To work around this problem, I created a cookbook with a library function that can check a node attribute and compare the version information there to determine if the environment is out of date. It’s a bit of a hack for now, but it gets the job done.
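The check itself is simple enough to sketch. The real cookbook is Ruby and its attribute names aren’t given here, so the following Python version only illustrates the logic under assumptions: the environment publishes a dotted version string in a hypothetical "environment_version" attribute, and the cookbook declares the minimum it needs.

```python
class EnvironmentOutOfDate(Exception):
    pass


def _parse(version):
    return tuple(int(part) for part in version.split("."))


def require_environment_version(attributes, required, key="environment_version"):
    """Raise EnvironmentOutOfDate unless attributes[key] >= required.
    Versions are compared numerically ("1.10" > "1.9"), and shorter versions
    are padded with zeros so "1.2" equals "1.2.0"."""
    actual = attributes.get(key)
    if actual is None:
        raise EnvironmentOutOfDate("environment does not set %r" % key)
    a, r = _parse(actual), _parse(required)
    width = max(len(a), len(r))
    a += (0,) * (width - len(a))
    r += (0,) * (width - len(r))
    if a < r:
        raise EnvironmentOutOfDate(
            "environment is at %s, but at least %s is required" % (actual, required))
    return actual
```

Failing loudly at the start of a run is the point: a stale environment shows up as one clear error instead of a recipe misbehaving on a missing attribute.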

I thought someone else might find this useful, so I uploaded the code to github and to the opscode community site. If you have a need for versioned environments in chef, this might work for you too. Let me know if you find it useful!

*Yes, this could be considered an anti-pattern, but for our use case using environments to override attributes makes perfect sense, and allows us to not have to fork every community cookbook that we want to use.

Posted in General | No Comments »

Another Python SIP wrapper for the tesseract OCR library

July 26th, 2013

Tesseract is a pretty decent open source OCR engine that was developed by HP back in the day, but is now maintained as open source by Google. It has a C++ API that you can program to, and as you would expect there are a number of wrappers (of varying quality) that allow you to use libtesseract from Python. For various reasons, none of those fit my needs, so I created my own SIP-based wrapper instead. I will not be maintaining this as a formal project, but if you want an Apache-licensed python wrapper, you can find the code on github. 🙂

GitHub repository for Python Tesseract SIP wrapper

Posted in General | No Comments »
