Optimizing a really nasty SQL query

Ok, so while working on Obsessive Website Statistics (OWS), I’ve hit a situation. See, OWS tries to be semi-intelligent and combine the parameters of all the installed plugins into really giant/nasty SQL queries that make you shudder, but tend to work.

So right now, I’m trying to select the following (at the same time):

  • All pages, grouped
  • COUNT() of all hosts
  • COUNT() of all pages that end with .html, .htm, .php, /
  • COUNT() of all pages
  • SUM() of filesizes

So of course, getting the 1st, 2nd, 4th, and 5th items is pretty easy. However, the third item is getting annoying. I tried using a subquery, but considering the table is 100,000 rows this is a particularly slow query:

SELECT 
	virtualroadside_com.request_str,
	COUNT(DISTINCT virtualroadside_com.host),
	c.b,
	COUNT(virtualroadside_com.filename),
	SUM(virtualroadside_com.bytes) 
FROM 
	(	
		SELECT 
			virtualroadside_com.request_str AS a,
			COUNT(virtualroadside_com.id) AS b 
		FROM 
			virtualroadside_com 
		WHERE 
			(virtualroadside_com.filename LIKE '%.html' 
			OR virtualroadside_com.filename LIKE '%/' 
			OR virtualroadside_com.filename LIKE '%.htm' 
			OR virtualroadside_com.filename LIKE '%.php') 
		GROUP BY virtualroadside_com.request_str 
		ORDER BY virtualroadside_com.request_str DESC 
		LIMIT 0,100
	) c,
	virtualroadside_com 
WHERE 
	virtualroadside_com.request_str = c.a
GROUP BY virtualroadside_com.request_str 
ORDER BY virtualroadside_com.request_str DESC 
LIMIT 0,100;

So, the big question here is: Is there better ways to accomplish this sort of thing without using subqueries? This particular query takes around 40 seconds on 105,000 rows to execute on my computer (Dual PIII, 500Mhz). I’m positive theres a way to do with with a JOIN of some kind, but I can’t get any of those to work correctly either. Let me know if you have any good ideas! I’ll publish a better way to do this hopefully in the next few days once I figure it out. 🙂

One Response to “Optimizing a really nasty SQL query”

  1. Jon Anderson says:

    SELECT col, (col LIKE ‘%.htm’)

    That should get you off to a good start.

Leave a Reply