Fastest way to do MySQL query select with WHERE clauses on an unindexed table -


i have big unindexed table called table rows this:

ip              entrypoint              timestamp 171.128.123.179 /page-title/?kw=abc     2016-04-14 11:59:52 170.45.121.111  /another-page/?kw=123   2016-04-12 04:13:20 169.70.121.101  /a-third-page/          2016-05-12 09:43:30 

i want make fastest query that, given 30 ips , 1 date, search rows far week before date , return recent row contains "?kw=" each ip. want distinct entrypoints recent one.

i'm stuck know it's relatively simple inner join don't know fastest way it.

by way: can't add index right because it's big , on db serves website. i'm going replace indexed table don't worry.

rows table

select ...   very_big_unindexed_table t 

only within past week...

 t.timestamp >= now() + interval - 1 week 

that contains '?kw=' in entry point

   , t.entrypoint '%?kw=%' 

only latest row each ip. there's couple of approaches that. correlated subquery on big unindexed table going eat lunch , lunch box. , without index, there's no getting around full scan of table , "using filesort" operation.

given unfortunate circumstances, our best bet performance going getting set whittled down small can, , perform sort, , avoid join operations (back table) , avoid correlated subqueries.

so, let's start this, return all of rows past week '?kw=' in entry point. going full scan of table, , sort operation...

         select t.ip               , t.timestamp               , t.entry_point            very_big_unindexed_table t           t.timestamp >= now() + interval -1 week             , t.entrypoint '%?kw=%'           order t.ip desc, t.timestamp desc 

we can use unsupported trick user-defined variables. (the mysql reference manual warns against using pattern this, because behavior (officially) undefined. unofficially, optimizer in mysql 5.1 , 5.5 (at least) predictable.

i think going going get, if number of rows past week significant subset of entire table. going create sizable intermediate resultset (derived table), if there lot of rows satisfy predicates.

select q.ip      , q.entrypoint      , q.timestamp   (          select if(t.ip = @prev_ip, 0, 1) new_ip               , @prev_ip  := t.ip         ip               , t.timestamp               timestamp               , t.entrypoint              entrypoint            (select @prev_ip := null)           cross            join very_big_unindexed_table t           t.timestamp >= now() + interval -1 week             , t.entrypoint '%?kw=%'           order t.ip desc, t.timestamp desc        ) q  q.new_ip 

execution of query require (in terms of what's going take time)

  • a full scan of table (there's no way around that)
  • a sort operation (again, there's no way around that)
  • materializing derived table containing of rows satisfy predicates
  • a pass through derived table pull out "latest" row each ip

Comments

Popular posts from this blog

java - nested exception is org.hibernate.exception.SQLGrammarException: could not extract ResultSet Hibernate+SpringMVC -

sql - Postgresql tables exists, but getting "relation does not exist" when querying -

asp.net mvc - breakpoint on javascript in CSHTML? -