DISTINCT faster than GROUP BY

I want to run the following search:
$schema->resultset('Entity')->search({
    -or => { 'me.user_id' => $user_id, 'set_to_user.user_id' => $user_id }
}, {
    'distinct' => 1,
    'join' => {'entity_to_set' => {'entity_set' => 'set_to_user'}},
    'order_by' => {'-desc' => 'modified'},
    'page' => 1, 'rows' => 100
});
It runs against a database with the tables shown below.
CREATE TABLE entity (
id varchar(500) NOT NULL,
user_id varchar(100) NOT NULL,
modified timestamp NOT NULL,
PRIMARY KEY (id, user_id),
FOREIGN KEY (user_id) REFERENCES user(id) ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE TABLE entity_to_set (
set_id varchar(100) NOT NULL,
user_id varchar(100) NOT NULL,
entity_id varchar(500) NOT NULL,
PRIMARY KEY (set_id, user_id, entity_id),
FOREIGN KEY (entity_id, user_id) REFERENCES entity(id, user_id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (set_id) REFERENCES entity_set(id) ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE TABLE entity_set (
id varchar(100) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE set_to_user (
set_id varchar(100) NOT NULL,
user_id varchar(100) NOT NULL,
PRIMARY KEY (set_id, user_id),
FOREIGN KEY (user_id) REFERENCES user(id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (set_id) REFERENCES entity_set(id) ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE TABLE user (
id varchar(100) NOT NULL,
PRIMARY KEY (id)
);
I have about 6000 rows in entity, 6000 in entity_to_set, 10 in entity_set, and 50 in set_to_user.
Right now this query takes a while (a second or two), which is unfortunate. Queries on the entity table alone, including an ORDER BY, return almost instantly. As a first debugging step, I found the actual SQL that the DBIC code generates:
SELECT me.id, me.user_id, me.modified FROM entity me
LEFT JOIN entity_to_set entity_to_set ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id)
LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id
LEFT JOIN set_to_user set_to_user ON set_to_user.set_id = entity_set.id
WHERE ((set_to_user.user_id = 'Craigy' OR me.user_id = 'Craigy'))
GROUP BY me.id, me.user_id, me.modified ORDER BY modified DESC LIMIT 100;
and here is the output of EXPLAIN QUERY PLAN:
0|0|0|SCAN TABLE entity AS me USING INDEX sqlite_autoindex_entity_1 (~1000000 rows)
0|1|1|SEARCH TABLE entity_to_set AS entity_to_set USING COVERING INDEX entity_to_set_idx_cover (entity_id=? AND user_id=?) (~9 rows)
0|2|2|SEARCH TABLE entity_set AS entity_set USING COVERING INDEX sqlite_autoindex_entity_set_1 (id=?) (~1 rows)
0|3|3|SEARCH TABLE set_to_user AS set_to_user USING COVERING INDEX sqlite_autoindex_set_to_user_1 (set_id=?) (~5 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
where entity_to_set_idx_cover is
CREATE INDEX entity_to_set_idx_cover ON entity_to_set (entity_id, user_id, set_id);
The problem is the temporary B-tree used for sorting, instead of the index that is used when I don't do the joins.
I noticed that DBIx::Class converted 'distinct' => 1 into a GROUP BY clause (I believe the documentation says they are equivalent here). I removed the GROUP BY clause and used SELECT DISTINCT instead, which gives the following query:
SELECT DISTINCT me.id, me.user_id, me.modified FROM entity me
LEFT JOIN entity_to_set entity_to_set ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id)
LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id
LEFT JOIN set_to_user set_to_user ON set_to_user.set_id = entity_set.id
WHERE ((set_to_user.user_id = 'Craigy' OR me.user_id = 'Craigy'))
ORDER BY modified DESC LIMIT 100;
which I believe gives the same result. The EXPLAIN QUERY PLAN for this query is
0|0|0|SCAN TABLE entity AS me USING COVERING INDEX entity_sort_modified_user_id (~1000000 rows)
0|1|1|SEARCH TABLE entity_to_set AS entity_to_set USING COVERING INDEX entity_to_set_idx_cover (entity_id=? AND user_id=?) (~9 rows)
0|2|2|SEARCH TABLE entity_set AS entity_set USING COVERING INDEX sqlite_autoindex_entity_set_1 (id=?) (~1 rows)
0|3|3|SEARCH TABLE set_to_user AS set_to_user USING COVERING INDEX sqlite_autoindex_set_to_user_1 (set_id=?) (~5 rows)
where entity_sort_modified_user_id is an index created with
CREATE INDEX entity_sort_modified_user_id ON entity (modified, user_id, id);
This runs almost instantly (no temp B-tree).
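For anyone who wants to reproduce the comparison outside DBIx::Class, here is a minimal sketch in Python using the standard sqlite3 module (my choice for illustration; the application itself is Perl). It assumes an in-memory database with the schema above, minus the foreign keys, plus a handful of made-up rows. It prints the plan for both queries and checks that they return the same rows on that sample data. The exact plan text depends on the SQLite version, so the output may not match the plans shown above.

```python
import sqlite3

# Schema from the question; foreign keys are omitted for brevity (assumption).
SCHEMA = """
CREATE TABLE user (id varchar(100) NOT NULL, PRIMARY KEY (id));
CREATE TABLE entity_set (id varchar(100) NOT NULL, PRIMARY KEY (id));
CREATE TABLE entity (
    id varchar(500) NOT NULL,
    user_id varchar(100) NOT NULL,
    modified timestamp NOT NULL,
    PRIMARY KEY (id, user_id)
);
CREATE TABLE entity_to_set (
    set_id varchar(100) NOT NULL,
    user_id varchar(100) NOT NULL,
    entity_id varchar(500) NOT NULL,
    PRIMARY KEY (set_id, user_id, entity_id)
);
CREATE TABLE set_to_user (
    set_id varchar(100) NOT NULL,
    user_id varchar(100) NOT NULL,
    PRIMARY KEY (set_id, user_id)
);
CREATE INDEX entity_to_set_idx_cover ON entity_to_set (entity_id, user_id, set_id);
CREATE INDEX entity_sort_modified_user_id ON entity (modified, user_id, id);
"""

# Made-up sample rows, only here to compare the two result sets.
SAMPLE_DATA = """
INSERT INTO user VALUES ('Craigy'), ('Other');
INSERT INTO entity_set VALUES ('s1');
INSERT INTO entity VALUES ('e1', 'Craigy', '2013-01-01');
INSERT INTO entity VALUES ('e2', 'Other', '2013-01-02');
INSERT INTO entity VALUES ('e3', 'Other', '2013-01-03');
INSERT INTO entity_to_set VALUES ('s1', 'Craigy', 'e1'), ('s1', 'Other', 'e2');
INSERT INTO set_to_user VALUES ('s1', 'Craigy'), ('s1', 'Other');
"""

# FROM/JOIN/WHERE part shared by both queries.
JOINS = """
FROM entity me
LEFT JOIN entity_to_set entity_to_set
  ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id)
LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id
LEFT JOIN set_to_user set_to_user ON set_to_user.set_id = entity_set.id
WHERE ((set_to_user.user_id = 'Craigy' OR me.user_id = 'Craigy'))
"""

GROUP_BY_QUERY = (
    "SELECT me.id, me.user_id, me.modified" + JOINS
    + "GROUP BY me.id, me.user_id, me.modified ORDER BY modified DESC LIMIT 100"
)
DISTINCT_QUERY = (
    "SELECT DISTINCT me.id, me.user_id, me.modified" + JOINS
    + "ORDER BY modified DESC LIMIT 100"
)


def query_plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript(SAMPLE_DATA)

for label, sql in (("GROUP BY", GROUP_BY_QUERY), ("SELECT DISTINCT", DISTINCT_QUERY)):
    print(label)
    for step in query_plan(conn, sql):
        print("   ", step)

group_by_rows = conn.execute(GROUP_BY_QUERY).fetchall()
distinct_rows = conn.execute(DISTINCT_QUERY).fetchall()
print("same rows:", group_by_rows == distinct_rows)
```

Matching rows on this tiny data set only illustrates that the two forms agree here; it does not prove they are equivalent in general.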
EDIT: to demonstrate that the problem still occurs when the ORDER BY is ascending, and the effect the index has on these queries, here is a similar query on the same tables. The first two queries run without the index, using SELECT DISTINCT and GROUP BY respectively; the last two are the same queries with the index in place.
sqlite> EXPLAIN QUERY PLAN SELECT DISTINCT me.id, me.user_id, me.modified FROM entity me LEFT JOIN entity_to_set entity_to_set ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id) LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id WHERE (me.user_id = 'Craigy' AND entity_set.id = 'SetID') ORDER BY modified LIMIT 100;
0|0|0|SCAN TABLE entity AS me (~100000 rows)
0|1|1|SEARCH TABLE entity_to_set AS entity_to_set USING AUTOMATIC COVERING INDEX (entity_id=? AND user_id=?) (~7 rows)
0|2|2|SEARCH TABLE entity_set AS entity_set USING COVERING INDEX sqlite_autoindex_entity_set_1 (id=?) (~1 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|0|USE TEMP B-TREE FOR ORDER BY
sqlite> EXPLAIN QUERY PLAN SELECT me.id, me.user_id, me.modified FROM entity me LEFT JOIN entity_to_set entity_to_set ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id) LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id WHERE (me.user_id = 'Craigy' AND entity_set.id = 'SetID') GROUP BY me.id, me.user_id, me.modified ORDER BY modified LIMIT 100;
0|0|0|SCAN TABLE entity AS me USING INDEX sqlite_autoindex_entity_1 (~100000 rows)
0|1|1|SEARCH TABLE entity_to_set AS entity_to_set USING AUTOMATIC COVERING INDEX (entity_id=? AND user_id=?) (~7 rows)
0|2|2|SEARCH TABLE entity_set AS entity_set USING COVERING INDEX sqlite_autoindex_entity_set_1 (id=?) (~1 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
sqlite> CREATE INDEX entity_idx_user_id_modified_id ON entity (user_id, modified, id);
sqlite> EXPLAIN QUERY PLAN SELECT DISTINCT me.id, me.user_id, me.modified FROM entity me LEFT JOIN entity_to_set entity_to_set ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id) LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id WHERE (me.user_id = 'Craigy' AND entity_set.id = 'SetID') ORDER BY modified LIMIT 100;
0|0|0|SEARCH TABLE entity AS me USING COVERING INDEX entity_idx_user_id_modified_id (user_id=?) (~10 rows)
0|1|1|SEARCH TABLE entity_to_set AS entity_to_set USING AUTOMATIC COVERING INDEX (entity_id=? AND user_id=?) (~7 rows)
0|2|2|SEARCH TABLE entity_set AS entity_set USING COVERING INDEX sqlite_autoindex_entity_set_1 (id=?) (~1 rows)
sqlite> EXPLAIN QUERY PLAN SELECT me.id, me.user_id, me.modified FROM entity me LEFT JOIN entity_to_set entity_to_set ON (entity_to_set.entity_id = me.id AND entity_to_set.user_id = me.user_id) LEFT JOIN entity_set entity_set ON entity_set.id = entity_to_set.set_id WHERE (me.user_id = 'Craigy' AND entity_set.id = 'SetID') GROUP BY me.id, me.user_id, me.modified ORDER BY modified LIMIT 100;
0|0|0|SEARCH TABLE entity AS me USING COVERING INDEX entity_idx_user_id_modified_id (user_id=?) (~10 rows)
0|1|1|SEARCH TABLE entity_to_set AS entity_to_set USING AUTOMATIC COVERING INDEX (entity_id=? AND user_id=?) (~7 rows)
0|2|2|SEARCH TABLE entity_set AS entity_set USING COVERING INDEX sqlite_autoindex_entity_set_1 (id=?) (~1 rows)
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY
My question is: how can I fix my DBIx::Class code so that it performs as well as the SELECT DISTINCT query? Or how can I add an index so that it performs as well? Or is some other kind of fix needed?
I think there isn't much you can do on the sqlite side: GROUP BY is processed in ascending order, but your ORDER BY is descending. Even when you create the indexes as DESC (possible in recent versions of sqlite), you still get the temp B-tree. Note that it disappears when you use ... GROUP BY me.modified, me.user_id, me.id ORDER BY me.modified, me.user_id, me.id ASC LIMIT 100, but not when you use DESC. So I'd say you need to approach this from the SQL generation side. – Fabian
(Discussed on the sqlite mailing list [here](http://article.gmane.org/gmane.comp.db.sqlite.general/84279)) – Fabian
@Fabian Thanks for looking into it! I added an example using an ascending 'ORDER BY' that shows the same problem. It also shows the effect the index has (it removes the B-TREE in the 'SELECT DISTINCT' query and adds another B-TREE in the 'GROUP BY' query). – Craigy