Web lists-archives.com

Re: Help improving query performance




Hi Larry,

On 2/1/2015 4:49 PM, Larry Martell wrote:
I have 2 queries. One takes 4 hours to run and returns 21 rows, and
the other, which has 1 additional where clause, takes 3 minutes and
returns 20 rows. The main table being selected from is largish
(37,247,884 rows with 282 columns). Caching is off for my testing, so
it's not related to that. To short circuit anyone asking, these
queries are generated by python code, which is why there's an IN
clause with 1 value, as oppose to an =.

Here are the queries and their explains. The significant difference is
that the faster query has "Using
intersect(data_cst_bbccbce0,data_cst_fba12377)" in the query plan -
those 2 indexes are on the 2 columns in the where clause, so that's
why the second one is faster. But I am wondering what I can do to make
the first one faster.


4 hour query:

SELECT MIN(data_tool.name) as tool,
        MIN(data_cst.date_time) "start",
        MAX(data_cst.date_time) "end",
        MIN(data_target.name) as target,
        MIN(data_lot.name) as lot,
        MIN(data_wafer.name) as wafer,
        MIN(measname) as measname,
        MIN(data_recipe.name) as recipe
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
WHERE data_target.id IN (172) AND
       data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
GROUP BY wafer_id, data_cst.lot_id, target_name_id


... snipped ...



Faster query:

SELECT MIN(data_tool.name) as tool,
        MIN(data_cst.date_time) "start",
        MAX(data_cst.date_time) "end",
        MIN(data_target.name) as target,
        MIN(data_lot.name) as lot,
        MIN(data_wafer.name) as wafer,
        MIN(measname) as measname,
        MIN(data_recipe.name) as recipe
FROM data_cst
INNER JOIN data_tool ON data_tool.id = data_cst.tool_id
INNER JOIN data_target ON data_target.id = data_cst.target_name_id
INNER JOIN data_lot ON data_lot.id = data_cst.lot_id
INNER JOIN data_wafer ON data_wafer.id = data_cst.wafer_id
INNER JOIN data_measparams ON data_measparams.id = data_cst.meas_params_name_id
INNER JOIN data_recipe ON data_recipe.id = data_cst.recipe_id
WHERE data_target.id IN (172) AND
       data_recipe.id IN (148) AND
       data_cst.date_time BETWEEN '2015-01-26 00:00:00' AND '2015-01-26 23:59:59'
GROUP BY wafer_id, data_cst.lot_id, target_name_id

... snip ...

Thanks for taking the time to read this, and for any help or pointers
you can give me.


The biggest difference is the added selectivity generated by the WHERE term against the data_recipe table.

Compare the two EXPLAINS, in the faster query you see that data_recipe is listed second. This allows the additional term a chance to reduce the number of row combinations for the entire query.

To really get at the logic behind how the Optimizer chooses its execution plan, get an optimizer trace. Look at the "cost" estimates for each phase being considered.
http://dev.mysql.com/doc/refman/5.6/en/optimizer-trace-table.html
http://dev.mysql.com/doc/internals/en/optimizer-tracing.html

Yours,
--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql