{"id":73,"date":"2007-08-06T01:50:47","date_gmt":"2007-08-06T05:50:47","guid":{"rendered":"http:\/\/www.virtualroadside.com\/blog\/index.php\/2007\/08\/06\/improving-the-speed-of-large-mysql-inserts\/"},"modified":"2007-08-06T01:52:07","modified_gmt":"2007-08-06T05:52:07","slug":"improving-the-speed-of-large-mysql-inserts","status":"publish","type":"post","link":"https:\/\/www.virtualroadside.com\/blog\/index.php\/2007\/08\/06\/improving-the-speed-of-large-mysql-inserts\/","title":{"rendered":"Improving the speed of large MySQL inserts"},"content":{"rendered":"<p><a href=\"http:\/\/obsessive.sourceforge.net\">Obsessive Web Stats (OWS)<\/a> has been a surprisingly addictive project for me. I had started doing some (limited) work on it before <a href=\"http:\/\/www.michiganboysstate.org\/\">Boys State<\/a> in June, and I figured it would be something simple that would be fun to work on&#8230; then I started trying to optimize for performance.<\/p>\n<p>The next version of OWS (v0.8) will definitely be using a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Star_schema\">Star Schema<\/a> to store its data. I&#8217;ve found that by implementing this, it reduces the database size by up to 50-75%. Which, is definitely a positive thing. And its cut some query times by 75% as well, which I&#8217;m pretty excited about. Check out this console screenshot from show_info.php:<\/p>\n<pre>\r\nName                          Rows    Data      Idx\r\n==========================================================\r\nvirtualroadside_com           105798  7.9 MB    26.9 MB\r\nvirtualroadside_com_agent     1558    196.6 KB  114.7 KB\r\nvirtualroadside_com_bytes     6336    229.4 KB  163.8 KB\r\nvirtualroadside_com_config    4       16.4 KB   16.4 KB\r\nvirtualroadside_com_date      292     16.4 KB   16.4 KB\r\nvirtualroadside_com_host      5517    245.8 KB  180.2 KB\r\nvirtualroadside_com_method    5       16.4 KB   16.4 KB\r\nvirtualroadside_com_protocol  2       16.4 KB   16.4 KB\r\nvirtualroadside_com_referrer  2773    409.6 KB  491.5 KB\r\nvirtualroadside_com_request   3893    540.7 KB  786.4 KB\r\nvirtualroadside_com_status    1       16.4 KB   16.4 KB\r\nvirtualroadside_com_time      46409   1.6 MB    3.2 MB\r\nvirtualroadside_com_user      2       16.4 KB   16.4 KB\r\n\r\nTotal Data:     11.2 MB\r\nTotal Indexes:  31.9 MB\r\n\r\nChecking dimensions..............OK.\r\n\r\nDimension  Rows   Unique  Status\r\n==================================\r\nhost       5605   5605    OK\r\nuser       2      2       OK\r\ndate       292    292     OK\r\ntime       45710  45710   OK\r\nmethod     5      5       OK\r\nrequest    3768   3768    OK\r\nprotocol   2      2       OK\r\nstatus     1      1       OK\r\nbytes      6769   6769    OK\r\nreferrer   2690   2690    OK\r\nagent      1470   1470    OK<\/pre>\n<p>However, inserts are currently horribly slow. As in, almost unbearably slow. Actually, its not so bad initially: 36 seconds for 10,000 logfile lines, which ends up being around 100,000 SQL queries to insert\/retrieve data. However, in-memory caching measures reduce the number of actual SQL queries to around 5-10,000 or so. Not too shabby for my Pentium III.<\/p>\n<p>Once the main &#8216;fact table&#8217; gets to around 100,000 rows, the insert times start declining&#8230; the insert times were around 900 seconds at the 1 millionth row.<\/p>\n<p>Right now, I&#8217;m inserting data like so: There are 12 tables holding the different dimensions of the data. Each insert to a new row, I check to see if the dimension key already exists, in which case I reuse it. Of course, this brings up the question of whether I&#8217;m properly denormalizing the data or not. One of the useful things I found was the following magic command:<\/p>\n<pre>\r\nSET UNIQUE_CHECKS=0<\/pre>\n<p>This cut my insert time by about 1\/6 or so. Pretty awesome. I also tried<\/p>\n<pre>\r\nALTER TABLE table DISABLE KEYS<\/pre>\n<p>But unfortunately, this makes zero difference on an InnoDB table. I did try switching to a MyISAM based table, but that didn&#8217;t seem to make much of a difference either. I&#8217;ve been busy scouring the web for performance tips, but I think one of the biggest barriers at the moment is my hardware: Dual Pentium III 500 Mhz with 1GB of ram, and 20gb\/40gb IDE disks. I moved the MySQL database to my roommates computer (Pentium D, 1GB RAM, Raid IV) and performance for retrievals went way up, but the inserts are still pretty slow &#8212; though not as slow as on the PIII.<\/p>\n<p>In conclusion, I&#8217;ve been able to get pretty decent data retrieval speeds from switching to the OLAP data layout, but inserts suck &#8212; and as such, OWS still doesn&#8217;t scale well. If you use OWS, let me know how its working for you! I&#8217;m always interested in hearing other opinions. \ud83d\ude42<\/p>\n<p>Note: I hate the tag feature of WordPress. It should ask me to tag the article AFTER I&#8217;ve written it, otherwise I just forget to do so. Theres some auto-tagging plugins but I haven&#8217;t tried them yet. Yes, I realize its open source, but I really don&#8217;t feel like hacking on WordPress right now&#8230; lol.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Obsessive Web Stats (OWS) has been a surprisingly addictive project for me. I had started doing some (limited) work on it before Boys State in June, and I figured it would be something simple that would be fun to work on&#8230; then I started trying to optimize for performance. The next version of OWS (v0.8) [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19,21],"tags":[],"_links":{"self":[{"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/73"}],"collection":[{"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=73"}],"version-history":[{"count":0,"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/73\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=73"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=73"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.virtualroadside.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=73"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}