Ok, so while working on Obsessive Website Statistics (OWS), I’ve hit a situation. See, OWS tries to be semi-intelligent and combine the parameters of all the installed plugins into really giant/nasty SQL queries that make you shudder, but tend to work.
So right now, I’m trying to select the following (at the same time):
- All pages, grouped
- COUNT() of all hosts
- COUNT() of all pages that end with .html, .htm, .php, /
- COUNT() of all pages
- SUM() of filesizes
So of course, getting the 1st, 2nd, 4th, and 5th items is pretty easy. However, the third item is getting annoying. I tried using a subquery, but considering the table is 100,000 rows this is a particularly slow query:
SELECT
virtualroadside_com.request_str,
COUNT(DISTINCT virtualroadside_com.host),
c.b,
COUNT(virtualroadside_com.filename),
SUM(virtualroadside_com.bytes)
FROM
(
SELECT
virtualroadside_com.request_str AS a,
COUNT(virtualroadside_com.id) AS b
FROM
virtualroadside_com
WHERE
(virtualroadside_com.filename LIKE '%.html'
OR virtualroadside_com.filename LIKE '%/'
OR virtualroadside_com.filename LIKE '%.htm'
OR virtualroadside_com.filename LIKE '%.php')
GROUP BY virtualroadside_com.request_str
ORDER BY virtualroadside_com.request_str DESC
LIMIT 0,100
) c,
virtualroadside_com
WHERE
virtualroadside_com.request_str = c.a
GROUP BY virtualroadside_com.request_str
ORDER BY virtualroadside_com.request_str DESC
LIMIT 0,100;
So, the big question here is: Is there better ways to accomplish this sort of thing without using subqueries? This particular query takes around 40 seconds on 105,000 rows to execute on my computer (Dual PIII, 500Mhz). I’m positive theres a way to do with with a JOIN of some kind, but I can’t get any of those to work correctly either. Let me know if you have any good ideas! I’ll publish a better way to do this hopefully in the next few days once I figure it out. 🙂