Simple join on int + tstzrange column very slow on only ~1 million rows
I'm struggling with the performance of a query that involves a "simple" left join on an int column and a tstzrange column:
SELECT
table_1.id_col
, table_1.time_range
, table_1.other_col_1
, table_2.other_col_2
FROM table_1
LEFT JOIN table_2
ON table_1.id_col = table_2.id_col
AND table_1.time_range = table_2.time_range
This query takes ~80-100 seconds to execute for a final result set of ~1 million rows (table_1 and table_2 are of the same order of magnitude).
This query is part of a more complex CTE query (which actually selects only a small subset of these 1 million rows), but I have lifted out the part that is the bottleneck.
I have added (what I think is) the appropriate index (a GiST index) on the combination of these columns, but from the EXPLAIN output I gather that it is ignored because I am joining practically all rows.
Is there a way to improve the performance? For example, by physically sorting the rows for the sequential scan?
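To make the "physically sorting the rows" idea concrete, here is a minimal sketch of what I have in mind, not something I have measured. It assumes a B-tree index is acceptable here (the join only uses equality on both columns), and the index name is hypothetical. Whether this helps a join that reads nearly every row is exactly what I am unsure about.

```sql
-- Hypothetical index name. tstzrange has a default B-tree operator class,
-- so a plain B-tree can cover (id_col, time_range) for equality joins.
CREATE INDEX idx_table_1_id_col_time_range_btree
    ON data.table_1 USING btree (id_col, time_range);

-- Rewrite the table in index order. CLUSTER takes an ACCESS EXCLUSIVE lock
-- for the duration of the rewrite, and the ordering is not maintained for
-- rows inserted or updated afterwards.
CLUSTER data.table_1 USING idx_table_1_id_col_time_range_btree;

-- Refresh planner statistics after the rewrite.
ANALYZE data.table_1;
```

The same steps would apply to data.table_2 with its own index.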
My tables:
CREATE TABLE data.table_1 (
table_1_id SERIAL NOT NULL,
id_col INTEGER NOT NULL,
time_range TSTZRANGE NOT NULL,
other_col_1 INTEGER,
PRIMARY KEY (table_1_id)
);
CREATE INDEX idx_table_1_id_col ON data.table_1 (id_col);
CREATE INDEX idx_table_1_time_range ON data.table_1 USING gist (time_range);
CREATE INDEX idx_table_1_id_col_time_range ON data.table_1 USING gist (id_col, time_range);
CREATE TABLE data.table_2 (
table_2_id SERIAL NOT NULL,
id_col INTEGER NOT NULL,
time_range TSTZRANGE NOT NULL,
other_col_2 DOUBLE PRECISION,
PRIMARY KEY (table_2_id)
);
CREATE INDEX idx_table_2_id_col ON data.table_2 (id_col);
CREATE INDEX idx_table_2_time_range ON data.table_2 USING gist (time_range);
CREATE INDEX idx_table_2_id_col_time_range ON data.table_2 USING gist (id_col, time_range);
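One prerequisite worth stating: a GiST index that includes a plain INTEGER column needs GiST operator classes for scalar types, which in PostgreSQL come from the btree_gist extension. The sketch below assumes that extension is available on the server, and then lists the indexes to confirm they exist.

```sql
-- Assumption: btree_gist is installed on the server. It provides the GiST
-- operator class for integer that the composite indexes above rely on.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- List the indexes on both tables to confirm they were created.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'data'
  AND tablename IN ('table_1', 'table_2')
ORDER BY tablename, indexname;
```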
Here is the detailed EXPLAIN output:
[
{
"Plan": {
"Node Type": "Hash Join",
"Join Type": "Left",
"Startup Cost": 198185.10,
"Total Cost": 4163704.54,
"Plan Rows": 73508636,
"Plan Width": 20,
"Actual Startup Time": 31055.086,
"Actual Total Time": 89488.540,
"Actual Rows": 1015568,
"Actual Loops": 1,
"Output": ["table_1.id_col", "table_1.other_col_1", "table_2.other_col_2"],
"Hash Cond": "((table_1.id_col = table_2.id_col) AND (table_1.time_range = table_2.time_range))",
"Shared Hit Blocks": 165149,
"Shared Read Blocks": 632793,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 38220,
"Temp Written Blocks": 37966,
"I/O Read Time": 0.000,
"I/O Write Time": 0.000,
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Relation Name": "table_1",
"Schema": "data",
"Alias": "table_1",
"Startup Cost": 0.00,
"Total Cost": 1492907.36,
"Plan Rows": 73508636,
"Plan Width": 34,
"Actual Startup Time": 24827.453,
"Actual Total Time": 77143.930,
"Actual Rows": 904431,
"Actual Loops": 1,
"Output": ["table_1.id_col", "table_1.other_col_1", "table_1.time_range"],
"Shared Hit Blocks": 165147,
"Shared Read Blocks": 592674,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 0,
"Temp Written Blocks": 0,
"I/O Read Time": 0.000,
"I/O Write Time": 0.000
},
{
"Node Type": "Hash",
"Parent Relationship": "Inner",
"Startup Cost": 88292.64,
"Total Cost": 88292.64,
"Plan Rows": 4817164,
"Plan Width": 34,
"Actual Startup Time": 6204.927,
"Actual Total Time": 6204.927,
"Actual Rows": 4817085,
"Actual Loops": 1,
"Output": ["table_2.other_col_2", "table_2.id_col", "table_2.time_range"],
"Hash Buckets": 65536,
"Original Hash Buckets": 65536,
"Hash Batches": 128,
"Original Hash Batches": 128,
"Peak Memory Usage": 2930,
"Shared Hit Blocks": 2,
"Shared Read Blocks": 40119,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 0,
"Temp Written Blocks": 31422,
"I/O Read Time": 0.000,
"I/O Write Time": 0.000,
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Relation Name": "table_2",
"Schema": "data",
"Alias": "table_2",
"Startup Cost": 0.00,
"Total Cost": 88292.64,
"Plan Rows": 4817164,
"Plan Width": 34,
"Actual Startup Time": 0.650,
"Actual Total Time": 3769.157,
"Actual Rows": 4817085,
"Actual Loops": 1,
"Output": ["table_2.other_col_2", "table_2.id_col", "table_2.time_range"],
"Shared Hit Blocks": 2,
"Shared Read Blocks": 40119,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 0,
"Temp Written Blocks": 0,
"I/O Read Time": 0.000,
"I/O Write Time": 0.000
}
]
}
]
},
"Planning Time": 0.350,
"Triggers": [
],
"Execution Time": 89689.809
}
]
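Two things stand out in this plan. The Hash node reports "Hash Batches": 128 with a peak memory of about 2930 kB and 31422 temp blocks written, so the hash table does not fit in memory and spills to disk. Most of the elapsed time, however, is in the Seq Scan on table_1 (about 77 s of the 89 s total). A minimal sketch for re-running the plan with more memory follows; the 256MB value is an assumption to be adjusted to the available RAM, and it only addresses the spilling, not the slow scan.

```sql
-- Assumed value, applies to the current session only.
SET work_mem = '256MB';

-- Re-run the join with timing and buffer statistics in JSON format.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, FORMAT JSON)
SELECT
    table_1.id_col
  , table_1.time_range
  , table_1.other_col_1
  , table_2.other_col_2
FROM data.table_1
LEFT JOIN data.table_2
       ON table_1.id_col = table_2.id_col
      AND table_1.time_range = table_2.time_range;

-- Return to the server default afterwards.
RESET work_mem;
```

If the new plan shows fewer hash batches and fewer temp blocks, the spill was contributing to the run time.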
Can't you put the WHERE conditions (I assume you are going to filter these results later) directly into this query? –
@LorenzoCatalano, but that is done indirectly via the conditions coming from the CTE. I basically have other tables to which subsets of the above are joined. (If that makes sense) – salient
It looks like a normal join. I can't tell exactly what the plan says, but I see "Plan Rows": 73508636; what does that mean? –
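On the "Plan Rows" question: that field is the planner's estimate of how many rows a node will return, while "Actual Rows" is what was really returned. For table_1 the estimate is 73508636 but the scan returned only 904431 rows, while reading roughly 758,000 blocks. One possible explanation, which is an assumption and not something the plan proves, is stale statistics combined with table bloat. A sketch for checking this:

```sql
-- Planner's stored row estimate and on-disk size for both tables.
SELECT c.relname,
       c.reltuples::bigint AS estimated_rows,
       c.relpages,
       pg_size_pretty(pg_relation_size(c.oid)) AS heap_size
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE n.nspname = 'data'
  AND c.relname IN ('table_1', 'table_2');

-- Live vs. dead tuples and the last automatic maintenance runs.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'data'
  AND relname IN ('table_1', 'table_2');

-- Refresh statistics and mark dead tuples as reusable. A plain VACUUM
-- usually does not shrink the file; VACUUM FULL or CLUSTER rewrites the
-- table, at the cost of an exclusive lock.
VACUUM (VERBOSE, ANALYZE) data.table_1;
```

If n_dead_tup is large relative to n_live_tup, or heap_size is far bigger than ~1 million narrow rows should need, bloat is the more likely culprit than the join itself.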