Archive for the ‘maven’ Category

Automatically generating avro schemata (avsc files) using maven

Sunday, June 8th, 2014

I’ve been using avro for serialization a bit lately, and it seems like a really useful, flexible, and performant technology. To use avro containers, you have to define a schema for them — but writing out JSON files is a bit of a pain. Avro provides an IDL that you can use to specify the object types instead, and it’s much easier to work with. The avro-maven-plugin is quite useful because you can automatically generate Java objects from the IDL files — but what if you’re working with the same Avro files in a different language that can’t use the IDL?

Until they add the functionality to the maven plugin, there’s an easy way you can automate this yourself using a bit of maven magic. First thing to do is add the following dependency to your project’s pom.xml

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.7.6</version>
</dependency>

Next, you need to add a simple little class that converts an entire directory from avdl files to avsc files. Avro-tools ships with a useful class called IdlToSchemataTool that will convert a single file for you, so converting an entire directory is just a simple wrapper around that. There is a bit of improvement that could be done here, but this gets the job done assuming your directory only has avdl files in it.

package main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.tool.IdlToSchemataTool;

/**
 * Converts an entire directory from Avro IDL (.avdl) to schema (.avsc)
 */
public class ConvertIdl {

	public static void main(String [] args) throws Exception {
		IdlToSchemataTool tool = new IdlToSchemataTool();
		
		File inDir = new File(args[0]);
		File outDir = new File(args[1]);
		
		for (File inFile: inDir.listFiles()) {
			List<String> toolArgs = new ArrayList<String>();
			toolArgs.add(inFile.getAbsolutePath());
			toolArgs.add(outDir.getAbsolutePath());
			
			tool.run(System.in, System.out, System.err, toolArgs);
		}
	}
}

Finally, you add the following to the plugins section of pom.xml to actually generate the avsc files. This uses the exec-maven-plugin to run the class we created above during compilation. This configuration assumes that you are storing your avdl files in src/main/avro, and that you want to place the files in schemata. Obviously you can reconfigure this however you want.

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>1.3</version>
    <executions>
        <execution>
            <phase>compile</phase>
            <goals>
                <goal>java</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <mainClass>main.ConvertIdl</mainClass>
        <arguments>
            <argument>${project.basedir}/src/main/avro/</argument>
            <argument>${project.basedir}/schemata/</argument>
        </arguments>
    </configuration>
</plugin>

And that’s it! To actually convert your avdl files to avsc files, run ‘mvn compile’ and the output directory should be filled with avsc files containing the JSON schema for your avro containers. Hope this helps you out, let me know if you find any bugs or have improvements.