I use 1and1.com for my web hosting, and they provide raw dumps of the Apache logs for the website. That's great, because you can run whatever analysis you like on them, except for two things:
- They don’t use a standard LogFormat directive
- After about 6 weeks, the oldest log files get deleted
So, after much trial and error, I figured out a decent way to use Awstats to do my log analysis. I wrote a cron job that runs every Monday morning (I had to modify my crontab from the default), which basically downloads the latest logfile from my 1and1 account and then runs awstats on it. It sounds pretty trivial, but 1and1 names their logfiles in the format access.log.[week].tar.gz, which works until the year rolls over. So I added some logic to intelligently rename the files to access.log.[year].[week].tar.gz.
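For illustration, here is a minimal sketch of what that weekly job does. The FTP location, local paths, awstats.pl path, and the config name are made-up placeholders for the example, not the contents of the actual script (which is linked below):

```bash
#!/bin/bash
# Minimal sketch of the weekly job described above. The FTP location, local
# paths, awstats.pl path, and the config name "roadside" are all placeholders.

LOGDIR=/home/awstats/roadside                      # where renamed logs accumulate
REMOTE=ftp://user:password@ftp.example.com/logs    # hypothetical 1and1 log location

WEEK=$(date +%V)    # ISO week number used in 1and1's filename
YEAR=$(date +%Y)    # four-digit year, added so files sort correctly across the rollover

# Fetch the latest weekly dump (1and1 names it access.log.[week].tar.gz)
wget -q "$REMOTE/access.log.$WEEK.tar.gz" -O "/tmp/access.log.$WEEK.tar.gz"

# Store it as access.log.[year].[week].tar.gz so week 01 never sorts before week 52
mv "/tmp/access.log.$WEEK.tar.gz" "$LOGDIR/access.log.$YEAR.$WEEK.tar.gz"

# Update the stats; the LogFile directive shown further down feeds awstats every file in $LOGDIR
perl /usr/lib/cgi-bin/awstats.pl -config=roadside -update
```

A crontab entry along the lines of 0 4 * * 1 /path/to/update-stats.sh would run such a script every Monday morning.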
You can download the awstats 1and1 cron job here.
To use it, you need to be running Linux and have awstats installed. You will need to modify some of the parameters of the cron script, but it's decently commented. Additionally, your awstats configuration file needs to have the following two directives in it:
LogFormat="%host %other %logname %time1 %methodurl %code %bytesd %virtualname %refererquot %uaquot %otherquot"
LogFile="gunzip -c $(for i in `ls -rt /home/awstats/roadside/`; do echo /home/awstats/roadside/$i; done) |"
Just make sure you replace /home/awstats/roadside with the directory where you want to keep your logfiles. Refer to the script for more documentation.
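If you want to sanity-check what awstats will be fed, you can run the command from the LogFile directive by hand (with your own directory substituted); it just decompresses and concatenates the logfiles, oldest first:

```bash
# Same pipeline as the LogFile directive above, piped into head for a quick look
gunzip -c $(for i in `ls -rt /home/awstats/roadside/`; do echo /home/awstats/roadside/$i; done) | head
```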
Note: The biggest problem I have with it is that it only updates once a week. I played with running it every day, but then awstats tended to drop or ignore records that were out of sync, and it seemed better to have complete stats than to have them updated every day. If you have a good solution for this, let me know!
I was contemplating trying to write this script myself, but ask Google and it will provide.
I haven't set it up yet, but will likely do so later today to take a break from studying.