Fastest way to do MySQL query select with WHERE clauses on an unindexed table -
i have big unindexed table called table rows this:
ip entrypoint timestamp 171.128.123.179 /page-title/?kw=abc 2016-04-14 11:59:52 170.45.121.111 /another-page/?kw=123 2016-04-12 04:13:20 169.70.121.101 /a-third-page/ 2016-05-12 09:43:30
i want make fastest query that, given 30 ips , 1 date, search rows far week before date , return recent row contains "?kw=" each ip. want distinct entrypoints recent one.
i'm stuck know it's relatively simple inner join don't know fastest way it.
by way: can't add index right because it's big , on db serves website. i'm going replace indexed table don't worry.
rows table
select ... very_big_unindexed_table t
only within past week...
t.timestamp >= now() + interval - 1 week
that contains '?kw=' in entry point
, t.entrypoint '%?kw=%'
only latest row each ip. there's couple of approaches that. correlated subquery on big unindexed table going eat lunch , lunch box. , without index, there's no getting around full scan of table , "using filesort" operation.
given unfortunate circumstances, our best bet performance going getting set whittled down small can, , perform sort, , avoid join operations (back table) , avoid correlated subqueries.
so, let's start this, return all of rows past week '?kw=' in entry point. going full scan of table, , sort operation...
select t.ip , t.timestamp , t.entry_point very_big_unindexed_table t t.timestamp >= now() + interval -1 week , t.entrypoint '%?kw=%' order t.ip desc, t.timestamp desc
we can use unsupported trick user-defined variables. (the mysql reference manual warns against using pattern this, because behavior (officially) undefined. unofficially, optimizer in mysql 5.1 , 5.5 (at least) predictable.
i think going going get, if number of rows past week significant subset of entire table. going create sizable intermediate resultset (derived table), if there lot of rows satisfy predicates.
select q.ip , q.entrypoint , q.timestamp ( select if(t.ip = @prev_ip, 0, 1) new_ip , @prev_ip := t.ip ip , t.timestamp timestamp , t.entrypoint entrypoint (select @prev_ip := null) cross join very_big_unindexed_table t t.timestamp >= now() + interval -1 week , t.entrypoint '%?kw=%' order t.ip desc, t.timestamp desc ) q q.new_ip
execution of query require (in terms of what's going take time)
- a full scan of table (there's no way around that)
- a sort operation (again, there's no way around that)
- materializing derived table containing of rows satisfy predicates
- a pass through derived table pull out "latest" row each ip
Comments
Post a Comment