2010-02-04 3 views
3

Comment optimiser cette requête? Le schéma:Optimisation SQL complexe et langage généraliste

mysql> show columns from transactionlog; 
+---------------+-------------------------------------------+------+-----+---------+----------------+ 
| Field   | Type          | Null | Key | Default | Extra   | 
+---------------+-------------------------------------------+------+-----+---------+----------------+ 
| id   | int(11)         | NO | PRI | NULL | auto_increment | 
| transactionid | varchar(10)        | NO | MUL | NULL |    | 
| queryid  | tinyint(4)        | NO |  | NULL |    | 
| tableid  | varchar(30)        | NO | MUL | NULL |    | 
| tupleid  | int(11)         | NO |  | NULL |    | 
| querytype  | enum('select','insert','delete','update') | NO |  | NULL |    | 
| schemaname | varchar(20)        | YES |  | NULL |    | 
| partition  | tinyint(3) unsigned      | YES |  | NULL |    | 
+---------------+-------------------------------------------+------+-----+---------+----------------+ 
8 rows in set (0.04 sec) 

La requête:

select concat(weight, ' ', ids, '\n') 
from (
    select 
    tableid, 
    tupleid, 
    group_concat(id separator ' ') as ids, 
    (
     select count(distinct transactionid) 
     from transactionlog 
     where transactionid in (
     select transactionid 
     from transactionlog 
     where (tableid, tupleid, querytype) = 
       (t.tableid, t.tupleid, 'update') 
     group by transactionid 
     having count(*) > 0 
    ) 
    ) weight 
    from transactionlog t 
    group by tableid, tupleid 
    having weight > 0 and count(*) > 1 
) u; 

Ceci est la sortie et EXPLIQUER mk-visuelle expliquer:

+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+------------------------------ 
----------------+ 
| id | select_type  | table   | type | possible_keys | key   | key_len | ref  | rows | Extra          | 
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+ 
| 1 | PRIMARY   | <derived2>  | ALL | NULL   | NULL   | NULL | NULL  | 13 |            | 
| 2 | DERIVED   | t    | ALL | NULL   | NULL   | NULL | NULL  | 68 | Using filesort        | 
| 3 | DEPENDENT SUBQUERY | transactionlog | index | NULL   | transactionid | 12  | NULL  | 68 | Using where; Using index      | 
| 4 | DEPENDENT SUBQUERY | transactionlog | ref | tableid  | tableid  | 36  | func,func | 2 | Using where; Using temporary; Using filesort | 
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+ 
Table scan 
rows   13 
+- DERIVED 
    table   derived(t,transactionlog,temporary(transactionlog)) 
    +- DEPENDENT SUBQUERY 
     +- DEPENDENT SUBQUERY 
     | +- Filesort 
     | | +- TEMPORARY 
     | |  table   temporary(transactionlog) 
     | |  +- Filter with WHERE 
     | |  +- Bookmark lookup 
     | |   +- Table 
     | |   | table   transactionlog 
     | |   | possible_keys tableid 
     | |   +- Index lookup 
     | |    key   transactionlog->tableid 
     | |    possible_keys tableid 
     | |    key_len  36 
     | |    ref   func,func 
     | |    rows   2 
     | +- Filter with WHERE 
     |  +- Index scan 
     |  key   transactionlog->transactionid 
     |  key_len  12 
     |  rows   68 
     +- Filesort 
     +- Table scan 
      rows   68 
      +- Table 
       table   t 

C'est beaucoup de travail. Je peux écrire la logique équivalent en Python tout en un seul passage:

results = query(""" 
    select tableid, tupleid, transactionid, id, querytype 
    from transactionlog_2warehouse 
""") 
_tab, _tup = None 
ids = [] 
weight = 0 
saw_upd = False 
for tab, tup, txn, id, qt in results: 
    if (_tab, _tup) != (tab, tup): 
    if len(ids) > 1 and weight > 0: 
     print weight, ids 
    weight = 0 
    ids = [] 
    _txn = None 
    if _txn != txn: 
    saw_upd = False 
    if qt == 'update' and not saw_upd: 
    weight += 1 
    saw_upd = True 
    ids += [id] 

Est-il possible d'atteindre les performances monopasse Python en utilisant SQL pur? Merci d'avance!

+0

Modifier au pl/sql et vous pouvez avoir une chance: P – glasnt

+0

Pouvez-vous nous donner quelques informations quelles sont les informations dont vous avez besoin pour extraire. Cela ressemble à un problème intéressant, mais sans les exigences je ne peux pas passer le temps à deviner/faire le travail de détective .. merci. – lexu

Répondre

0

Utilisation:

SELECT CONCAT(x.weight, ' ', GROUP_CONCAT(t.id SEPARATOR ' '), '\n') 
    FROM TRANSACTIONLOG t 
    JOIN (SELECT tl.tableid, 
       tl.tupleid, 
       COUNT(DISTINCT tl.transactionid) AS weight 
      FROM TRANSACTIONLOG tl 
      WHERE tl.querytype = 'update' 
     GROUP BY tl.tableid, tl.tupleid) x ON x.tableid = t.tableid 
              AND x.tupleid = t.tupleid 
              AND x.weight > 0 
GROUP BY t.tableid, t.tupleid, x.weight 
    HAVING COUNT(*) > 1