Obsessive Website Statistics (OWS) analysis plugin tutorial

This is a short tutorial on writing an analysis plugin for Obsessive Website Statistics (OWS). OWS is designed first and foremost to be plugin-friendly, and as you will see, adding useful functionality as a plugin is not hard and takes only a few lines of code. We are going to add DNS hostname resolution to OWS.

What is an analysis plugin?

An analysis plugin analyzes the parsed logfile data and stores the results in the database dimensions. OWS wraps all of this in an easy-to-use abstraction layer, so you won’t need to write actual SQL queries if you don’t want to.

Implementation

All OWS plugins are implemented as PHP classes. This is the bare skeleton that all OWS plugins should define.

class OWSDNS implements iPlugin{

	// this should return a unique ID identifying the plugin; it should start with a letter
	// and use basename(__FILE__) rather than __FILE__, which could expose path information
	public function getPluginId(){
		return 'p'. md5(basename(__FILE__) . get_class());
	}

	// returns an associative array describing the plugin
	public function getPluginInformation(){

		return array(

			'pluginName' => 'Name of plugin',
			'aboutUrl' => 'http://information.about.plugin',

			'author' => 'author',
			'url' => 'http://developers.website',

			'description' => 'Description of what plugin does'
		);
	}
}
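
The ID scheme used in getPluginId() can be tried out on its own. This is only a sketch: plugin_id_for() is a hypothetical helper that is not part of OWS, and it takes the file and class as arguments so it runs outside a plugin class.

```php
<?php
// Hypothetical helper (not part of OWS): same scheme as getPluginId(),
// with the file and class passed in so it can be run standalone.
function plugin_id_for(string $file, string $class): string {
    // basename() drops the directory, so the install path never feeds the hash
    return 'p' . md5(basename($file) . $class);
}

$id = plugin_id_for('/var/www/ows/plugins/owsdns.php', 'OWSDNS');

// 'p' followed by 32 lowercase hex characters, so the ID starts with a letter
var_dump(preg_match('/^p[0-9a-f]{32}$/', $id) === 1); // bool(true)

// the same file name under a different directory gives the same ID
var_dump($id === plugin_id_for('/srv/other/owsdns.php', 'OWSDNS')); // bool(true)
```

Because only the file name and class name go into the hash, two plugins need to differ in at least one of them to get different IDs.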

You should notice we define two functions — getPluginId() and getPluginInformation(). These must be defined by any OWS plugin, and are used to identify the plugin in a number of instances. This plugin also implements iPlugin. All interfaces are defined (with plenty of comments) in include/plugin_interfaces.inc.php. A plugin can implement as many interfaces as it needs to. There are a few types, but the one we are going to implement is iAnalysisPlugin. We will do so by changing the first part to:

class OWSDNS implements iPlugin, iAnalysisPlugin {

Additionally, we need to register the plugin with OWS so that it knows what kind of plugin you are defining. Add this to the end of your source file:

register_plugin('analysis',new OWSDNS());

An analysis plugin needs to implement the following functions:

define_dimensions
InitializeAnalysis
preAnalysis
getPrimaryNode
getAttributes
postAnalysis

All of these functions are documented in include/plugin_interfaces.inc.php if you need more comprehensive information.
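
As a rough picture of what these six functions look like together, here is a sketch of the interface shape. The method names and parameters are copied from the code in this tutorial. The interface name iAnalysisPluginSketch is made up, and the real definition in include/plugin_interfaces.inc.php may differ.

```php
<?php
// Assumed shape only: method names and parameters are taken from this
// tutorial; the interface name is invented so it cannot clash with OWS.
interface iAnalysisPluginSketch {
    public function define_dimensions();
    public function InitializeAnalysis($website);
    public function preAnalysis($website, &$ids);
    public function getPrimaryNode($website, $dimension, $line);
    public function getAttributes($website, $dimension, $pnode);
    public function postAnalysis($website, &$ids);
}

// List the methods a class implementing the sketch would have to provide
$required = array_map(
    function (ReflectionMethod $m) { return $m->name; },
    (new ReflectionClass('iAnalysisPluginSketch'))->getMethods()
);
print_r($required);
```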

Now, OWS stores data in multiple dimensions. Each dimension has a ‘primary node’, which is the main data element of the dimension and always has the same name as the dimension. Each primary node can have multiple attributes defined about it. Plugins can define new dimensions or extend existing dimensions.

Right now, OWS stores only the host address — which is an IP address representing the visitor. What our plugin needs to do is resolve this address, and store it as an attribute of the dimension. So, we need to extend the dimension ‘host’, which we can do using the function define_dimensions().

// this function should return a set of arrays that define the dimensions
// and attributes that this plugin defines. You should not specify an attribute
// that another plugin defines. This is not website dependent.
public function define_dimensions(){

	return array(
		'host' => array(
			'hostname' => attribute_defn('varchar',254,16)
		)
	);
}

Pretty simple, eh? The returned array says that, inside the dimension ‘host’, we are defining an attribute named ‘hostname’. The function attribute_defn is used to define the SQL type of our attribute, so the installer can create it for us. Now, we can write the actual analysis part.
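
To illustrate the rule that a plugin should not specify an attribute another plugin already defines, here is a small sketch that merges the arrays returned by several define_dimensions() calls and reports duplicates. merge_dimension_defs() is a hypothetical helper, not something OWS provides, and plain strings stand in for whatever attribute_defn() actually returns.

```php
<?php
// Hypothetical helper, not part of OWS: merges per-plugin dimension arrays
// and collects attributes that are defined more than once.
function merge_dimension_defs(array $perPlugin): array {
    $merged = [];
    $conflicts = [];
    foreach ($perPlugin as $pluginName => $dimensions) {
        foreach ($dimensions as $dimension => $attributes) {
            foreach ($attributes as $attribute => $definition) {
                if (array_key_exists($attribute, $merged[$dimension] ?? [])) {
                    // first definition wins; later ones are only reported
                    $conflicts[] = "$dimension.$attribute (redefined by $pluginName)";
                    continue;
                }
                $merged[$dimension][$attribute] = $definition;
            }
        }
    }
    return [$merged, $conflicts];
}

// Placeholder strings stand in for real attribute definitions
list($merged, $conflicts) = merge_dimension_defs([
    'OWSDNS'      => ['host' => ['hostname' => 'varchar placeholder']],
    'OtherPlugin' => ['host' => ['hostname' => 'text placeholder',
                                 'country'  => 'char placeholder']],
]);
print_r($conflicts);
```

Here OtherPlugin is an invented second plugin that tries to reuse the ‘hostname’ attribute, so it shows up in the conflict list while its ‘country’ attribute is merged normally.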

At the beginning of analysis, the function InitializeAnalysis is called in case the plugin needs to do something before the analysis begins. This function is called once per website analyzed. Our plugin isn’t going to need this, so we just return true.

public function InitializeAnalysis($website){
	return true;
}

Now, after all plugins are initialized, the logfile lines are read from the logfile (or from the database in the case of an install or a reanalysis). The lines are read in phases, each of which consists of 4 steps:

preAnalysis
getPrimaryNode
getAttributes
postAnalysis

Now, preAnalysis and postAnalysis are only called once per phase, but getPrimaryNode is typically called at least once per logfile line. Our plugin doesn’t use getPrimaryNode — getPrimaryNode is only used for plugins that define new dimensions in define_dimensions. If you don’t define a primary node, then you should return false and show an error.

It should also be noted that our plugin doesn’t need to do any preAnalysis or postAnalysis, so we can just return true.

public function preAnalysis($website,&$ids){
	return true;
}

public function getPrimaryNode($website, $dimension, $line){
	return show_error("Invalid dimension passed to plugin \"" . get_class() . "\"");
}

public function postAnalysis($website,&$ids){
	return true;
}
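
The call order within one phase can be sketched as below. This is an assumed driver loop, not the OWS implementation. CallRecorder and run_phase_sketch() are hypothetical names, a logfile line is assumed to be an array with a 'host' key, and one object plays both the primary node and attribute roles just to keep the sketch short.

```php
<?php
// Hypothetical stand-in for a plugin: it only records which hook was called.
class CallRecorder {
    public $calls = [];

    public function preAnalysis($website, &$ids) {
        $this->calls[] = 'preAnalysis';
        return true;
    }
    public function getPrimaryNode($website, $dimension, $line) {
        $this->calls[] = 'getPrimaryNode';
        return $line['host']; // assumed line format
    }
    public function getAttributes($website, $dimension, $pnode) {
        $this->calls[] = 'getAttributes';
        return ['hostname' => 'resolved-' . $pnode];
    }
    public function postAnalysis($website, &$ids) {
        $this->calls[] = 'postAnalysis';
        return true;
    }
}

// Assumed driver loop for a single phase over a batch of lines
function run_phase_sketch($plugin, $website, $dimension, array $lines) {
    $ids = [];
    $plugin->preAnalysis($website, $ids);      // once per phase
    foreach ($lines as $line) {                // once per logfile line
        $pnode = $plugin->getPrimaryNode($website, $dimension, $line);
        $plugin->getAttributes($website, $dimension, $pnode);
    }
    $plugin->postAnalysis($website, $ids);     // once per phase
}

$recorder = new CallRecorder();
run_phase_sketch($recorder, 'example.org', 'host', [
    ['host' => '192.0.2.1'],
    ['host' => '192.0.2.2'],
]);
echo implode(' > ', $recorder->calls), "\n";
```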

Now we get to the part that actually does the work. The function getAttributes needs to return an array representing the attributes that the plugin defines per dimension. The $dimension argument is passed in to the function, and we should only do analysis on the primary node. The contents of the primary node are passed in to the function as well. This makes sense, because attributes of the primary node should be discernible by looking only at the primary node itself. If this is not the case, then you should probably be defining a new dimension instead.

This function should return an array of attributes/values in the form of:

	array('attribute' => 'value', ...)

Note: The returned values can be cached (for performance reasons), so this function may NOT always be called for each row. You should ALWAYS return an array with the same keys each time, in the same order that you defined them in define_dimensions. Of course, if you do not define any attributes in the dimension passed in the $dimension parameter, or if there is an error, then return false.
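
To see why the function may not run for every row, here is a sketch of the kind of cache that could sit in front of it. AttributeCacheSketch is a hypothetical class, and how OWS really caches values is not covered here. The point is only that a repeated primary node can be answered without calling the plugin again.

```php
<?php
// Hypothetical cache wrapper, not OWS code: remembers the attributes already
// computed for a (dimension, primary node) pair and skips the plugin call.
class AttributeCacheSketch {
    private $cache = [];
    private $compute;
    public $computeCalls = 0;

    public function __construct(callable $compute) {
        $this->compute = $compute;
    }

    public function get($dimension, $pnode) {
        $key = $dimension . "\0" . $pnode;
        if (!array_key_exists($key, $this->cache)) {
            $this->computeCalls++;
            $this->cache[$key] = ($this->compute)($dimension, $pnode);
        }
        return $this->cache[$key];
    }
}

$cache = new AttributeCacheSketch(function ($dimension, $pnode) {
    // stand-in for a plugin's getAttributes(): same keys, same order, every time
    return ['hostname' => 'host-for-' . $pnode];
});

foreach (['192.0.2.1', '192.0.2.1', '192.0.2.7'] as $ip) {
    $cache->get('host', $ip);
}
echo $cache->computeCalls, "\n"; // 2: the repeated address came from the cache
```

This is also why side effects inside getAttributes are a bad idea: with a cache in front, they would only happen the first time a given primary node is seen.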

Anyway, here’s the code for this function:


public function getAttributes($website, $dimension, $pnode){

	if ($dimension != 'host')
		return show_error("Invalid dimension passed to plugin \"" . get_class() . "\" in getAttributes!");

	// return the hostname
	return array('hostname' => gethostbyaddr($pnode));

}
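
One thing worth knowing about gethostbyaddr(): on a failed lookup it returns the address unchanged, and on malformed input it returns false. A more defensive version might look like the sketch below. resolve_hostname_sketch() is a hypothetical helper, and the resolver is passed in as a parameter only so the example can run without real DNS lookups.

```php
<?php
// Hypothetical helper, not part of OWS. Returns the hostname, or null when
// the address is malformed or could not be resolved.
function resolve_hostname_sketch(string $ip, ?callable $resolver = null): ?string {
    // gethostbyaddr() returns false for a malformed address, so screen those out first
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return null;
    }
    $resolver = $resolver ?? 'gethostbyaddr';
    $name = $resolver($ip);
    // on a failed lookup gethostbyaddr() hands back the address unchanged
    if ($name === false || $name === $ip) {
        return null;
    }
    return $name;
}

// Fake resolver standing in for DNS: knows one address, echoes back the rest
$fake = function ($ip) {
    return $ip === '192.0.2.1' ? 'client.example.org' : $ip;
};

var_dump(resolve_hostname_sketch('192.0.2.1', $fake));  // "client.example.org"
var_dump(resolve_hostname_sketch('192.0.2.99', $fake)); // NULL, lookup failed
var_dump(resolve_hostname_sketch('not-an-ip', $fake));  // NULL, malformed input
```

Whether an unresolved host should be stored as empty or as the bare IP address is a design choice for the plugin. Either way, getAttributes should still return the ‘hostname’ key so the array keeps the same shape on every call.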

And that’s it! Wasn’t that easy? Of course, there are a lot more useful things we could implement to make this more polished. Now, after you install the plugin and run the analysis, the only filter you’ll be able to use on your new dimension attribute in the web interface is the manual analysis, since it allows analysis on all defined dimensions. But it would be a pretty trivial matter to either modify an existing filter plugin or create a new one. We’ll discuss this in the future.

Hope this helps you out. If you need help with OWS, or developing for OWS, don’t hesitate to ask! Leave your comments, or join the obsessive-compulsive mailing list!

Download this
Obsessive Website Statistics Website