Archive for the ‘MySQL’ Category

MySQL Cluster Tips

Friday, August 17th, 2007

Well, I set up a 9-computer MySQL cluster to do some experimentation with OWS. It's pretty neat: I have DDNS set up with DHCP, plus an rsync job where every single machine syncs its configuration to the 'primary' machine each hour. I'll have to write some more posts about it.

Anyways, if you ever use MySQL Cluster, there's one important tip that they don't really mention in the manual:

MAKE SURE ALL OF YOUR STORAGE NODES ARE UP, OTHERWISE THE CLUSTER WON'T START.

See, I had an issue with the network card on one of the machines, so I decided to just try to get things working without messing with that machine. That worked pretty well until I got around to screwing with the MySQL cluster. You would think this is perfectly obvious, but it's not. So that's my tip.
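If you want to check which nodes the cluster can actually see, the management client will tell you. A rough pointer, assuming ndb_mgm is installed and talking to your management node:

# lists the management, storage (ndbd), and SQL nodes,
# along with whether each one is connected
ndb_mgm -e show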

Of course, after talking to the guys on EFnet #mysql, it turns out that MySQL Cluster probably won't benefit OWS anyways. But we shall see, right? 🙂

Improving the speed of large MySQL inserts

Monday, August 6th, 2007

Obsessive Web Stats (OWS) has been a surprisingly addictive project for me. I had started doing some (limited) work on it before Boys State in June, and I figured it would be something simple that would be fun to work on… then I started trying to optimize for performance.

The next version of OWS (v0.8) will definitely be using a star schema to store its data. I've found that implementing this reduces the database size by 50-75%, which is definitely a positive thing. It's cut some query times by 75% as well, which I'm pretty excited about. Check out this console output from show_info.php:

Name                          Rows    Data      Idx
==========================================================
virtualroadside_com           105798  7.9 MB    26.9 MB
virtualroadside_com_agent     1558    196.6 KB  114.7 KB
virtualroadside_com_bytes     6336    229.4 KB  163.8 KB
virtualroadside_com_config    4       16.4 KB   16.4 KB
virtualroadside_com_date      292     16.4 KB   16.4 KB
virtualroadside_com_host      5517    245.8 KB  180.2 KB
virtualroadside_com_method    5       16.4 KB   16.4 KB
virtualroadside_com_protocol  2       16.4 KB   16.4 KB
virtualroadside_com_referrer  2773    409.6 KB  491.5 KB
virtualroadside_com_request   3893    540.7 KB  786.4 KB
virtualroadside_com_status    1       16.4 KB   16.4 KB
virtualroadside_com_time      46409   1.6 MB    3.2 MB
virtualroadside_com_user      2       16.4 KB   16.4 KB

Total Data:     11.2 MB
Total Indexes:  31.9 MB

Checking dimensions..............OK.

Dimension  Rows   Unique  Status
==================================
host       5605   5605    OK
user       2      2       OK
date       292    292     OK
time       45710  45710   OK
method     5      5       OK
request    3768   3768    OK
protocol   2      2       OK
status     1      1       OK
bytes      6769   6769    OK
referrer   2690   2690    OK
agent      1470   1470    OK
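For anyone unfamiliar with the layout: each distinct logfile value is stored once in a small dimension table, and the big fact table just holds integer keys pointing into them. Here's a stripped-down sketch of the idea; the column names are guesses for illustration, not the literal OWS schema:

-- one small table per dimension (host, date, request, agent, ...)
CREATE TABLE virtualroadside_com_request (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    request VARCHAR(255) NOT NULL,
    UNIQUE KEY (request)
) ENGINE=InnoDB;

-- the fact table holds one row per logfile line, as integer keys
CREATE TABLE virtualroadside_com (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    date_id    INT NOT NULL,
    time_id    INT NOT NULL,
    host_id    INT NOT NULL,
    request_id INT NOT NULL,
    KEY (date_id)
) ENGINE=InnoDB;

Repetitive strings like the request URL get stored once instead of 100,000 times, which is where the 50-75% size reduction comes from.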

However, inserts are currently horribly slow. As in, almost unbearably slow. Actually, it's not so bad initially: 36 seconds for 10,000 logfile lines, which would normally mean around 100,000 SQL queries to insert/retrieve data. However, in-memory caching reduces the number of actual SQL queries to around 5,000-10,000. Not too shabby for my Pentium III.

Once the main 'fact table' gets to around 100,000 rows, insert performance starts declining… the insert times were around 900 seconds by the 1 millionth row.

Right now, I'm inserting data like so: there are 12 tables holding the different dimensions of the data, and for each new fact row I check whether each dimension value already exists, reusing its key if it does (there's a sketch of the whole routine below). Of course, this brings up the question of whether I'm properly denormalizing the data or not. One of the useful things I found was the following magic command:

SET UNIQUE_CHECKS=0

This cut my insert time by about 1/6 or so. Pretty awesome. I also tried

ALTER TABLE table DISABLE KEYS

But unfortunately, this makes zero difference on an InnoDB table. I did try switching to a MyISAM-based table, but that didn't seem to make much of a difference either. I've been busy scouring the web for performance tips, but I think one of the biggest barriers at the moment is my hardware: dual Pentium III 500 MHz with 1 GB of RAM and 20 GB/40 GB IDE disks. I moved the MySQL database to my roommate's computer (Pentium D, 1 GB RAM, RAID 4) and performance for retrievals went way up, but the inserts are still pretty slow, though not as slow as on the PIII.
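To make the loading procedure concrete, here's a rough sketch of what handling one logfile line looks like with the speed toggles in place. Same made-up column names as the sketch above, and I've added SET AUTOCOMMIT=0, another commonly recommended InnoDB bulk-load trick:

-- speed toggles for the duration of the load
SET UNIQUE_CHECKS = 0;
SET AUTOCOMMIT = 0;    -- batch many inserts into a single transaction

-- reuse the dimension key if the value has been seen before...
SELECT id FROM virtualroadside_com_request WHERE request = '/index.php';

-- ...and only if no row came back, insert it and grab the new key
INSERT INTO virtualroadside_com_request (request) VALUES ('/index.php');
SELECT LAST_INSERT_ID();

-- with all the dimension keys in hand, insert the fact row
INSERT INTO virtualroadside_com (date_id, time_id, host_id, request_id)
VALUES (42, 1138, 17, 3);

COMMIT;
SET UNIQUE_CHECKS = 1;

(In OWS itself these queries are issued from PHP, with the in-memory cache short-circuiting most of the SELECTs.)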

In conclusion, I've been able to get pretty decent data retrieval speeds by switching to the OLAP data layout, but inserts suck, and as such OWS still doesn't scale well. If you use OWS, let me know how it's working for you! I'm always interested in hearing other opinions. 🙂

Note: I hate the tag feature of WordPress. It should ask me to tag the article AFTER I've written it; otherwise I just forget to do so. There are some auto-tagging plugins, but I haven't tried them yet. Yes, I realize it's open source, but I really don't feel like hacking on WordPress right now… lol.

MySQL and indexes

Thursday, July 26th, 2007

I was able to obtain a 1.6 GB Apache combined-format logfile from a colleague, and I've been using it to see how well OWS performs on logs of this size. Unfortunately, it looks like OWS does not handle data this size well. What makes it really disappointing is that the site in question only gets around 1,000-5,000 unique visitors a day.

The biggest performance-related problem right now is that MySQL is ignoring the indexes I have set up. From some research, this is apparently an InnoDB-related problem where the optimizer tries to use the primary key for everything instead of the secondary indexes. The evidence: index usage is normal on my tables with only 100,000 rows or so, while on the table with 6.5 million rows MySQL tries to use the primary key and ends up performing a table scan, which is definitely BAD. When I use FORCE INDEX, things work better, but I can't imagine that's the proper way to do it. What I'm going to do is try to take advantage of the clustered index by using the date as the leading part of the primary key (since almost every single query deals with the date in some way), and see what kind of performance increase I get.
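To show what I mean, here's roughly what the diagnosis and the two fixes look like (index and column names are illustrative, matching the sketch from the earlier post):

-- 'type: ALL' in EXPLAIN's output means a full table scan
EXPLAIN SELECT COUNT(*) FROM virtualroadside_com
    WHERE date_id BETWEEN 100 AND 130;

-- the brute-force workaround: name the index explicitly
SELECT COUNT(*) FROM virtualroadside_com FORCE INDEX (date_id)
    WHERE date_id BETWEEN 100 AND 130;

-- the clustered-index idea: put the date first in the primary key so
-- rows for the same date are stored physically together
ALTER TABLE virtualroadside_com
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (date_id, id),
    ADD KEY (id);    -- keeps AUTO_INCREMENT on id legal in InnoDB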

I think when it comes right down to it, though, a flat MySQL table ends up having the same types of problems you have with flat files: browsing gigabytes of data is slow. Of course, some of this can probably be eliminated with better queries, but I haven't quite figured out how to do that.

All of this analysis has led me to examine OLAP and other multidimensional ways of representing these kinds of data. Right now, I'm thinking I want to redo the backend storage model of OWS so it's faster and more efficient using a different data representation (still using MySQL), while maintaining the same easy-to-use interface.

By the way, a great site that's helped me with some of these random issues is http://www.xaprb.com/blog/, though most of the useful articles were published last year when he wasn't so busy. I encourage you to check it out.

Fixing MySQL InnoDB problems on Gentoo

Sunday, July 1st, 2007

For some reason, transactions weren't working on my MySQL installation on my Gentoo server, even though I had created the tables as InnoDB tables. I poked around and discovered that InnoDB wasn't enabled, so I figured upgrading MySQL would do the trick. I went ahead and upgraded to 5.0.40 and got the following great error:

InnoDB: Error: ib_logfiles are too small for innodb_thread_concurrency 8.
InnoDB: The combined size of ib_logfiles should be bigger than
InnoDB: 200 kB * innodb_thread_concurrency.
InnoDB: To get mysqld to start up, set innodb_thread_concurrency in my.cnf
InnoDB: to a lower value, for example, to 8. After an ERROR-FREE shutdown
InnoDB: of mysqld you can adjust the size of ib_logfiles, as explained in
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/adding-and-removing.html
InnoDB: Cannot continue operation. Calling exit(1).

After doing a number of things, including deleting the databases (nothing important in them at the moment, just test data), rebuilding them, and other weirdness, it turns out all I needed to do was increase innodb_log_file_size from 1M to 2M. That made the error go away… in retrospect, it seems like an obvious solution. Very annoying that it isn't set that way by default.
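For reference, the relevant my.cnf bit ends up looking something like this. Note that InnoDB won't accept a changed log file size against existing logs: per the error above, you need a clean mysqld shutdown, then move the old ib_logfile* out of the way before restarting.

# /etc/mysql/my.cnf (the Gentoo location; adjust to taste)
[mysqld]
innodb_log_file_size = 2M

# the error message's alternative fix would be lowering this instead:
# innodb_thread_concurrency = 8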

But SHOW VARIABLES LIKE 'have_innodb' was still showing DISABLED, so InnoDB was still not enabled. To fix this, I went back in, ran rm -rf /var/lib/mysql, then ran emerge --config =dev-db/mysql-5.0.40… and when I logged back into the mysql console, it worked. Seriously, wtf?
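If you run into something similar, a quick way to confirm InnoDB is really back:

-- should say YES, not DISABLED or NO
SHOW VARIABLES LIKE 'have_innodb';

-- sanity check: when InnoDB is unavailable, MySQL 5.0 quietly falls back
-- to the default engine, so verify the Engine column afterwards
CREATE TABLE innodb_test (id INT PRIMARY KEY) ENGINE=InnoDB;
SHOW TABLE STATUS LIKE 'innodb_test';
DROP TABLE innodb_test;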

Conclusion: InnoDB is a picky thing. Make sure your config is correct.