VMware and my carputer

September 5th, 2007

The biggest problem with my carputer is that it's almost impossible to configure easily, especially when it's mounted in my car. So I used VMware to create a virtual machine that runs Linux, and then I ran a lil something like this from the virtual machine:

/usr/bin/rsync -avz --delete \
    --exclude=/dev --exclude=/sys --exclude=/proc --exclude=/tmp \
    --exclude=/var/log --exclude=/var/lock --exclude=/var/tmp --exclude=/var/run \
    -e ssh root@carputer_address:/ /

This copies practically everything onto the virtual machine. It's a great solution so far, since I can easily copy changes back. The only real challenge was that I had to recompile the kernel to support the VMware virtual hardware. I haven't gotten X working yet either, but I'm pretty sure that will be trivial compared to the other problem: my carputer supports SSE2, but the host computer doesn't, so I had to run the following

emerge -e world

after adjusting my build settings… grr. I'm recompiling 632 packages right now, actually. I kept getting ‘invalid instruction’ errors all over the place.
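A mismatch like this can be spotted before building anything. On x86 Linux, the kernel lists CPU features on the "flags" lines of /proc/cpuinfo, so checking both machines for sse2 shows whether binaries built on one will run on the other. Here is a small sketch in PHP (to match the rest of the code here); cpuHasFlag() is a hypothetical helper, and it assumes the x86 /proc/cpuinfo layout.

```php
<?php
// Hypothetical helper: does any "flags" line in a /proc/cpuinfo dump list $flag?
// Assumes the x86 layout, e.g. "flags\t\t: fpu vme ... sse sse2".
function cpuHasFlag(string $cpuinfo, string $flag): bool
{
    foreach (explode("\n", $cpuinfo) as $line) {
        if (preg_match('/^flags\s*:\s*(.*)$/', $line, $m)) {
            // exact token match, so "sse" does not count as "sse2"
            if (in_array($flag, preg_split('/\s+/', trim($m[1])), true)) {
                return true;
            }
        }
    }
    return false;
}

// Run this on both the carputer and the VM host and compare the answers.
$cpuinfo = is_readable('/proc/cpuinfo') ? file_get_contents('/proc/cpuinfo') : '';
echo cpuHasFlag($cpuinfo, 'sse2') ? "sse2 supported\n" : "no sse2\n";
```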

Obsessive Web Statistics (OWS) analysis plugin tutorial

September 1st, 2007

This is a short tutorial on how to write an analysis plugin for Obsessive Website Statistics (OWS). OWS is designed first and foremost to be plugin-friendly, and as you will see, adding useful functionality in the form of plugins is not hard at all: it can be done in just a few lines of code. We are going to add DNS hostname resolution to OWS.

What is an analysis plugin?

An analysis plugin performs analysis on the parsed logfile data and stores the results in the database dimensions. OWS wraps all of this in a nice, easy-to-use abstraction layer, so you won't need to write SQL queries yourself if you don't want to.

Implementation

All OWS plugins are implemented as PHP classes. Here is the bare skeleton that every OWS plugin should define:

class OWSDNS implements iPlugin{

	// this should return a unique ID identifying the plugin; the ID should start with a letter.
	// Use basename(__FILE__) instead of just __FILE__, otherwise it could expose path information
	public function getPluginId(){
		return 'p'. md5(basename(__FILE__) . get_class($this));
	}

	// returns an associative array describing the plugin
	public function getPluginInformation(){

		return array(

			'pluginName' => 'Name of plugin',
			'aboutUrl' => 'http://information.about.plugin',

			'author' => 'author',
			'url' => 'http://developers.website',

			'description' => 'Description of what plugin does'
		);
	}
}
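To see what getPluginId() produces, here is the same expression pulled out into a standalone function. makePluginId() and the plugin path are hypothetical. md5() always returns 32 hex digits, which may begin with a digit, so the 'p' prefix is what guarantees the ID starts with a letter; basename() keeps the ID identical no matter where the plugin is installed.

```php
<?php
// Hypothetical standalone version of the getPluginId() expression.
// Inside a plugin, $file is __FILE__ and $className is get_class($this).
function makePluginId(string $file, string $className): string
{
    // basename() drops the directory, so no path information leaks into the ID
    return 'p' . md5(basename($file) . $className);
}

$id = makePluginId('/var/www/ows/plugins/ows_dns.php', 'OWSDNS');
echo $id, "\n";   // 'p' followed by 32 hex digits
```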

Notice that we define two functions, getPluginId() and getPluginInformation(). Every OWS plugin must define these, and OWS uses them to identify the plugin in a number of places. This plugin also implements iPlugin. All interfaces are defined (with plenty of comments) in include/plugin_interfaces.inc.php. A plugin can implement as many interfaces as it needs to. There are a few types, but the one we are going to implement is iAnalysisPlugin. We will do so by changing the first line to:

class OWSDNS implements iPlugin, iAnalysisPlugin {

Additionally, we need to register the plugin with OWS so that it knows what kind of plugin you are defining. Add this to the end of your source file:

register_plugin('analysis',new OWSDNS());

An analysis plugin needs to implement the following functions:

define_dimensions
InitializeAnalysis
preAnalysis
getPrimaryNode
getAttributes
postAnalysis

All of these functions are documented in include/plugin_interfaces.inc.php if you need more comprehensive information.

Now, OWS stores data in multiple dimensions. Each dimension has a ‘primary node’, which is the main data element of the dimension and always has the same name as the dimension. Each primary node can have multiple attributes that describe it. Plugins can define new dimensions or extend existing ones.

Right now, OWS stores only the host address, which is the IP address of the visitor. Our plugin needs to resolve this address to a hostname and store the result as an attribute of that dimension. So we need to extend the dimension ‘host’, which we can do using the function define_dimensions().

// this function should return a set of arrays that define the dimensions
// and attributes that this plugin defines. You should not specify an attribute
// that another plugin defines. This is not website dependent.
public function define_dimensions(){

	return array(
		'host' => array(
			'hostname' => attribute_defn('varchar',254,16)
		)
	);
}

Pretty simple, eh? The returned array means that inside the dimension ‘host’, we are defining an attribute named ‘hostname’. The function attribute_defn is used to define the SQL type of our attribute, so the installer can create it for us. Now we can write the actual analysis part.
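The comment in define_dimensions() warns against defining an attribute that another plugin already defines. As a rough sketch of how such a clash could be detected, assume the arrays returned by every plugin's define_dimensions() are collected into one array keyed by plugin name. This is not how OWS itself checks; mergeDimensionDefinitions() and the OWSGeoIP plugin are made up for illustration, and the column types are placeholder strings standing in for attribute_defn() results.

```php
<?php
// Hypothetical sketch: merge the define_dimensions() arrays of several plugins
// and report any attribute that more than one plugin tries to define.
// Column types are placeholder strings; in OWS they come from attribute_defn().
function mergeDimensionDefinitions(array $definitionsByPlugin): array
{
    $merged = [];
    $owner = [];       // "dimension.attribute" => first plugin that defined it
    $conflicts = [];

    foreach ($definitionsByPlugin as $plugin => $dimensions) {
        foreach ($dimensions as $dimension => $attributes) {
            foreach ($attributes as $attribute => $definition) {
                $key = $dimension . '.' . $attribute;
                if (isset($owner[$key])) {
                    // keep the first definition and record the clash
                    $conflicts[] = $key . ' (' . $owner[$key] . ' and ' . $plugin . ')';
                    continue;
                }
                $owner[$key] = $plugin;
                $merged[$dimension][$attribute] = $definition;
            }
        }
    }
    return ['merged' => $merged, 'conflicts' => $conflicts];
}

$result = mergeDimensionDefinitions([
    'OWSDNS'   => ['host' => ['hostname' => 'varchar(254)']],
    'OWSGeoIP' => ['host' => ['country' => 'char(2)', 'hostname' => 'varchar(100)']],
]);
print_r($result['conflicts']);
```

Extending an existing dimension (adding ‘country’ next to ‘hostname’) is fine; only the second ‘hostname’ is reported.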

At the beginning of analysis, the function InitializeAnalysis is called in case the plugin needs to do something before the analysis begins. This function is called once per website analyzed. Our plugin isn’t going to need this, so we just return true.

public function InitializeAnalysis($website){
	return true;
}

After all plugins are initialized, the lines are read from the logfile (or from the database, in the case of an install or a reanalysis). They are read in phases, each of which consists of 4 steps:

preAnalysis
getPrimaryNode
getAttributes
postAnalysis

preAnalysis and postAnalysis are called only once per phase, but getPrimaryNode is typically called at least once per logfile line. Our plugin doesn't use getPrimaryNode, because getPrimaryNode is only used by plugins that define new dimensions in define_dimensions. If your plugin doesn't define a primary node for the dimension it is asked about, it should show an error and return false.
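To make the call order concrete, here is a simplified, hypothetical driver loop. It is not OWS's actual analysis code: phase sizing, caching, and database writes are all left out, and the $primaryNodeOf callback stands in for getPrimaryNode() of whichever plugin owns the dimension (for ‘host’, that is not our DNS plugin). RecordingPlugin is a stand-in that logs each call and "resolves" hosts from a fixed table instead of DNS.

```php
<?php
// Hypothetical driver showing the order of analysis-plugin calls per website.
function runAnalysis(object $plugin, string $website, array $phases,
                     string $dimension, callable $primaryNodeOf): array
{
    $rows = [];
    $plugin->InitializeAnalysis($website);            // once per website
    foreach ($phases as $lines) {
        $ids = [];
        $plugin->preAnalysis($website, $ids);         // once per phase
        foreach ($lines as $line) {
            $pnode = $primaryNodeOf($line);           // roughly once per line
            $rows[] = [$dimension => $pnode]
                    + $plugin->getAttributes($website, $dimension, $pnode);
        }
        $plugin->postAnalysis($website, $ids);        // once per phase
    }
    return $rows;
}

// Stand-in plugin: records each call and looks hosts up in a fixed table.
class RecordingPlugin
{
    public array $calls = [];
    private array $hosts = ['192.0.2.1' => 'a.example.net', '192.0.2.2' => 'b.example.net'];

    public function InitializeAnalysis($website) { $this->calls[] = 'init'; return true; }
    public function preAnalysis($website, &$ids) { $this->calls[] = 'pre'; return true; }
    public function postAnalysis($website, &$ids) { $this->calls[] = 'post'; return true; }

    public function getAttributes($website, $dimension, $pnode)
    {
        $this->calls[] = "attr:$pnode";
        return ['hostname' => $this->hosts[$pnode] ?? $pnode];
    }
}

$plugin = new RecordingPlugin();
$phases = [
    ['192.0.2.1 - - "GET /"', '192.0.2.2 - - "GET /a"'],   // phase 1
    ['192.0.2.1 - - "GET /b"'],                             // phase 2
];
$rows = runAnalysis($plugin, 'example.net', $phases, 'host',
                    fn ($line) => strtok($line, ' '));      // first field = client IP
print_r($plugin->calls);
```

The log shows init once, then pre, one attribute lookup per line, and post for each phase.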

Our plugin also doesn't need to do any preAnalysis or postAnalysis, so those functions can just return true.

public function preAnalysis($website,&$ids){
	return true;
}

public function getPrimaryNode($website, $dimension, $line){
	return show_error("Invalid dimension passed to plugin \"" . get_class($this) . "\"");
}

public function postAnalysis($website,&$ids){
	return true;
}

Now we get to the part that actually does the work. The function getAttributes needs to return an array of the attributes that the plugin defines for the given dimension. The function receives the $dimension argument along with the contents of the primary node, and it should base its analysis only on the primary node. This makes sense, because attributes of the primary node should be discernible by looking only at the primary node itself. If this is not the case, then you should probably be defining a new dimension instead.

This function should return an array of attributes/values in the form of:

	array('attribute' => 'value', ...)

Note: The returned values can be cached (for performance reasons), so this function may NOT always be called for each row. You should ALWAYS return an array with the same keys each time, in the same order that you defined them in define_dimensions. Of course, if you do not define any attributes in the dimension passed in the $dimension parameter, or if there is an error, then return false.
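Here is a hypothetical sketch of why those rules matter. If results are memoized per (dimension, primary node), a host that appears on thousands of log lines is looked up only once, and every later row silently reuses the first answer. That only works if getAttributes() is deterministic for a given primary node and always returns the same keys. AttributeCache is an illustration of the idea, not OWS's actual cache.

```php
<?php
// Hypothetical memoizing wrapper around a getAttributes()-style callable.
class AttributeCache
{
    private array $cache = [];
    private $getAttributes;   // callable($dimension, $pnode): array|false

    public function __construct(callable $getAttributes)
    {
        $this->getAttributes = $getAttributes;
    }

    public function get(string $dimension, string $pnode)
    {
        $key = $dimension . "\0" . $pnode;
        if (!array_key_exists($key, $this->cache)) {
            // only the first occurrence of a primary node reaches the plugin
            $this->cache[$key] = ($this->getAttributes)($dimension, $pnode);
        }
        return $this->cache[$key];
    }
}

$lookups = 0;
$cache = new AttributeCache(function ($dimension, $pnode) use (&$lookups) {
    $lookups++;
    return ['hostname' => "host-$pnode.example.net"];
});

foreach (['192.0.2.1', '192.0.2.1', '192.0.2.7', '192.0.2.1'] as $ip) {
    $attrs = $cache->get('host', $ip);
}
echo "lookups: $lookups\n";
```

Four rows, two distinct hosts, two calls to the wrapped function.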

Anyway, here's the code for this function:


public function getAttributes($website, $dimension, $pnode){

	if ($dimension != 'host')
		return show_error("Invalid dimension passed to plugin \"" . get_class($this) . "\" in getAttributes!");

	// return the hostname
	return array('hostname' => gethostbyaddr($pnode));

}
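PHP's gethostbyaddr() returns the host name on success, the unmodified IP on failure, or false on malformed input, so the one-liner above can end up storing false. A slightly more defensive variant is sketched below. resolveHostname() is a hypothetical helper; it assumes a failed lookup should just store the IP, and it assumes the 254 in attribute_defn('varchar',254,16) is the column's maximum length, so longer names are cut to fit. The optional $lookup parameter exists only so the function can be exercised without real DNS queries.

```php
<?php
// Hypothetical defensive lookup for getAttributes().
// Usage inside the plugin: return array('hostname' => resolveHostname($pnode));
function resolveHostname(string $ip, ?callable $lookup = null): string
{
    $lookup = $lookup ?? 'gethostbyaddr';

    // reject malformed input up front, where gethostbyaddr() would return false
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return $ip;
    }

    $name = $lookup($ip);
    if (!is_string($name) || $name === '') {
        return $ip;                 // fall back to the address itself
    }
    return substr($name, 0, 254);   // assumed varchar(254) column limit
}
```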

And that's it! Wasn't that easy? Of course, there are a lot more useful things we could implement to make this more polished. After you install the plugin and run the analysis, the only filter you'll be able to use on your new dimension attribute in the web interface is the manual analysis, since it allows analysis on all defined dimensions. But it would be a fairly simple matter to either modify an existing filter plugin or create a new one. We'll discuss this in a future post.

Hope this helps you out. If you need help with OWS, or developing for OWS, don’t hesitate to ask! Leave your comments, or join the obsessive-compulsive mailing list!

Download this
Obsessive Website Statistics Website

Innovative Thoughts

August 27th, 2007

I’ve been thinking a lot about innovation lately: what it means, why it's important, and how I can be more innovative professionally and personally. And I wrote a statement that I really think describes it.

Innovation is just a problem away.

I would say that at the base of all innovation is a problem that needed to be solved. Asking questions like “Why not?” or “Why can’t I…” allows us to work in a totally new direction and create solutions that are not only innovative, but useful.

Let me know what you think.

Web Interface to the fortune program

August 26th, 2007

I got bored last night, so I created a wrapper around the fortune program on my Gentoo box… then made it better with jQuery AJAX goodness. And then I combined it with my rndsay wrapper to make the fortunes be echoed by cows. 🙂 Of course, getting it to work on my host here has been annoying, but I finally got it working! Enjoy!

Random Fortune Generator

Source Code

Another antivirus complaint

August 22nd, 2007

I was recently talking to a co-worker who was complaining that their machine ran a virus scan every Wednesday at noon, which made the machine totally unusable. Of course, the virus scanner was our corporate version of Symantec, and the scan cannot be disabled. Which, in my mind, raises an interesting question:

If the antivirus “Auto-Protect” feature actually works, why the heck do you need to run a virus scan??!

Seriously. A lot of helpdesk documentation on the web recommends that users run virus scans weekly or daily. But if you have Auto-Protect working, then in theory isn't it going to stop anything from getting onto your machine, making the virus scan useless?

It used to be that running a virus scan wouldn't kill the machine, but with today's bloated and slow antivirus products, it just seems silly to run the scan. But maybe that's just me.

Another MySQL Cluster Lesson

August 17th, 2007

I learned a great truth about MySQL Cluster today, and I think MySQL in general actually:

Apparently, memory really does matter!

Heh. Imagine that. I mean, I knew it was true, but I didn't fully realize it until I tried setting up an 8-node MySQL cluster on Pentium 4s with 256MB of RAM. The performance was absolutely horrible. Seriously. So I switched some things around, and the performance was way better with two P4s with 512MB of RAM. We have a lab on campus with Core Duos, 2GB of memory, and gigabit networking. That would be nice… 🙂

Obsessive Web Stats Demo on the Virtual Roadside!

August 17th, 2007

A demo of OWS is now on the Virtual Roadside! If you're interested in seeing OWS in action, visit http://obsessive.virtualroadside.com/, which shows the traffic to the OWS SourceForge site. The only limitation is that it tracks only the main page, so you can't really do any in-depth analysis. But it shows you the key concepts behind OWS in any case.

MySQL Cluster Tips

August 17th, 2007

Well, I set up a 9-computer MySQL cluster to do some experimentation with OWS. It's pretty neat: I have DDNS set up with DHCP, and an rsync setup where every single machine syncs its configuration to the ‘primary’ machine each hour. I'll have to write some more posts about it.

Anyway, if you ever use MySQL Cluster, there's one important tip that they don't really mention in the manual:

MAKE SURE ALL OF YOUR STORAGE NODES ARE UP, OTHERWISE THE CLUSTER WON'T START.

See, one of the machines had an issue with its network card, so I decided to just try to get everything working without messing with that machine. That worked pretty well until I got around to screwing with the MySQL cluster. You would think this is perfectly obvious, but it's not. So that's my tip.

Of course, after talking to the guys on EFnet #mysql, it turns out that MySQL Cluster probably won't benefit OWS anyway. But we shall see, right? 🙂

OWS v0.8.0.1 released

August 14th, 2007

There was a huge issue with the ows_aggregate plugin in v0.8: sorting just did not work at all. v0.8.0.1 has been released to resolve this issue. Thanks to Jon for pointing this out.

OWS Download Link

Major Release of Obsessive Website Statistics

August 14th, 2007

Note: This announcement can also be found in the obsessive-compulsive mailing list and the OWS news archives at SourceForge.

The first open source Web 2.0 website log analyzer, Obsessive Website Statistics (OWS), uses PHP and jQuery to provide a powerful and intuitive interface for manipulating website log data stored in a MySQL database via easy-to-create plugins.

This is a major release of OWS. All users are strongly encouraged to upgrade. v0.7.x is completely incompatible with v0.8, as the database structure has totally changed for performance and flexibility reasons. You will need to delete your old databases and upload your logfiles again from scratch. We do not expect this to happen again in the future.

OWS v0.8 now stores its data in a multidimensional OLAP-style data schema that has shown huge performance gains for data retrieval in our initial testing, and also promises to scale better than the previous releases of OWS. Additionally, OWS plugins have been enhanced to take advantage of the new data schema, and the manual analysis option is now much more intuitive to use for individuals not familiar with SQL.

Download link for OWS
Sourceforge Project Page