Contexte: J'écris une maison crawler SEO pour vérifier nos positions dans Google. Le crawling fonctionne brillamment, le stockage est bien, mais j'ai maintenant des problèmes de performance avec l'affichage des données (actuellement la table de stockage a plus de 11 millions d'enregistrements et est> 6 Go.)Tableau JOIN avec augmentation de la limite et la performance
J'essaie de créer une requête SQL qui va me montrer tous les enregistrements de la table input_keywords
, puis le dernier résultat de la table rank_result
(pour le CompanyName donné) avec le résultat précédent de la table rank_result
(qui nous montrera notre mouvement, soit vers le bas)
Les tableaux sont les suivants
Tableau: input_keywords
-------------------------------------------------------------------------------------------------------
| Field | Type | Null | Key | Default | Extra |
-------------------------------------------------------------------------------------------------------
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
-------------------------------------------------------------------------------------------------------
| keyword | char(150) | YES | UNI | NULL | |
-------------------------------------------------------------------------------------------------------
| last_check | timestamp | YES | MUL | 2000-01-01 00:00:00 | |
-------------------------------------------------------------------------------------------------------
| CREATION | timestamp | YES | | CURRENT_TIMESTAMP | |
-------------------------------------------------------------------------------------------------------
| MODIFICATION | timestamp | YES | | NULL | on update CURRENT_TIMESTAMP |
-------------------------------------------------------------------------------------------------------
| p_deep | int(1) | YES | | 5 | |
-------------------------------------------------------------------------------------------------------
| check_freq_days | int(11) | YES | | 3 | |
-------------------------------------------------------------------------------------------------------
| type | char(50) | YES | | NULL | |
-------------------------------------------------------------------------------------------------------
| competitor | char(100) | YES | MUL | CompanyName | |
-------------------------------------------------------------------------------------------------------
Tableau: rank_result:
-----------------------------------------------------------------------------
| Field | Type | Null | Key | Default | Extra |
-----------------------------------------------------------------------------
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
-----------------------------------------------------------------------------
| keyword | char(150) | YES | MUL | | |
-----------------------------------------------------------------------------
| result_url | text | YES | | NULL | |
-----------------------------------------------------------------------------
| position | int(11) | YES | | NULL | |
-----------------------------------------------------------------------------
| check_time | timestamp | YES | MUL | NULL | |
-----------------------------------------------------------------------------
| useragent_used | char(255) | YES | | NULL | |
-----------------------------------------------------------------------------
| proxy_log | text | YES | | NULL | |
-----------------------------------------------------------------------------
| check_date | date | YES | | NULL | |
-----------------------------------------------------------------------------
| competitor | tinytext | YES | | NULL | |
-----------------------------------------------------------------------------
Quelques exemples de données pour montrer ce que je suis en train de réaliser
contenu Exemple: input_keywords
-----------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword | last_check | CREATION | MODIFICATION | p_deep | check_freq_days | type | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 2 | guitar accessories | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 3 | guitar amps | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 4 | guitar strings | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 5 | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5 | 3 | NULL | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------
contenu Exemple: rank_result (modifié pour afficher uniquement les données pertinentes)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | keyword | result_url | position | check_time | useragent_used | proxy_log | check_date | competitor |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 723 | guitar accessories | https://www.companyname.com/gui… | 33 | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1572 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1672 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2511 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2656 | guitar accessories | https://www.companyname.com/gui… | 33 | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2809 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201 | 2017-01-19 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3147 | guitar accessories | https://www.companyname.com/gui… | 36 | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3490 | guitar accessories | https://www.companyname.com/gui… | 31 | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4530 | guitar accessories | https://www.companyname.com/gui… | 31 | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5277 | guitar accessories | https://www.companyname.com/gui… | 34 | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281 | 2017-01-20 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 5480 | guitar accessories | https://www.companyname.com/gui… | 38 | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 9953 | guitar accessories | https://www.companyname.com/gui… | 37 | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40 | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38 | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281 | 2017-01-23 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32 | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281 | 2017-01-24 | CompanyName |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sortie Exemple:
---------------------------------------------------------------------------------------------------------------------------------------------
| search_keyword | p_deep | check_freq_days | CREATION | last_check | current_position | previous_position |
---------------------------------------------------------------------------------------------------------------------------------------------
| guitar accessories | 5 | 3 | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37 | 39 |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitar strings | 5 | 3 | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| acoustic guitars | 5 | 1 | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14 | 14 |
---------------------------------------------------------------------------------------------------------------------------------------------
| bass guitars | 5 | 1 | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41 | 44 |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar Amp1 Nanotube | 5 | 1 | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| Bluguitar NanoCab | 5 | 1 | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL | NULL |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a bass guitar | 5 | 3 | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5 | 4 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing a guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24 | 24 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an acoustic guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12 | 12 |
---------------------------------------------------------------------------------------------------------------------------------------------
| choosing an electric guitar | 5 | 3 | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10 | 11 |
---------------------------------------------------------------------------------------------------------------------------------------------
Actuellement ma requête est que suit:
SELECT i.`keyword` AS 'search_keyword', i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`,
(SELECT r.position AS 'current_position' FROM rank_result r where r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time` ORDER BY r.check_time DESC LIMIT 0,1) AS 'current_position',
(SELECT rr.`position` AS 'previous_position' FROM rank_result rr WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position'
FROM input_keywords i
WHERE i.keyword LIKE "%s"
order by i.keyword ASC
LIMIT 0,100
Mes questions sont les suivantes:
- Y at-il une meilleure façon d'écrire cette requête
- Je dois limiter à 100 résultats, ou la requête est trop longue, et expire, cela peut-il être résolu?
- Si je n'ai pas besoin de l'ORDER BY rr.check_time DESC, la requête est des centaines de fois plus rapide mais ne retournera évidemment pas l'information correcte car elle n'obtient pas les derniers enregistrements mais la première, alors je peux le faire d'une manière différente?
- J'aimerais beaucoup ne pas avoir le
WHERE KEYWORD LIKE
et juste retourner tous mesinput_keywords
avec leur rang actuel et le rang précédent.
Informations complémentaires:
pour retourner le grade actuel pour un mot clé:
***input_keywords rank_result***
keyword == keyword
last_check == check_time (this make sure that if we drop off the search results I don't keep returning an incorrect figure)
competitor == competitor (this allows us to monitor us and our competitors.)
Pour revenir au rang précédent pour un mot clé
***input_keywords rank_result***
keyword == keyword
competitor == competitor (this allows us to monitor us and our competitors.)
ORDER BY check_time desc
LIMIT 1,1 (to get the last but one result)
S'il vous plaît être gentil - je Je suis autodidacte sur tout ça!
EDIT 1.
Explain Extended sur ma requête en cours (j'ai inclus la déclaration crée ainsi)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | PRIMARY | i | ALL | NULL | NULL | NULL | NULL | 1682 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3 | DEPENDENT SUBQUERY | rr | ref | keyword | keyword | 451 | func | 32 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 2 | DEPENDENT SUBQUERY | r | ref | keyword,idx_rank_result_che… | keyword | 609 | func,GoogleCrawler.i.last_c… | 2 | 100.00 | Using where; Using filesort |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE `input_keywords` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`keyword` char(150) DEFAULT NULL COMMENT 'the keyword....',
`last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediatly',
`CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5',
`check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3',
`type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc',
`competitor` tinytext,
PRIMARY KEY (`id`),
UNIQUE KEY `UNQ_Keyword` (`keyword`),
KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100))
) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8;
CREATE TABLE `rank_result` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`keyword` char(150) DEFAULT '',
`result_url` text,
`position` int(11) DEFAULT NULL,
`check_time` timestamp NULL DEFAULT NULL,
`useragent_used` char(255) DEFAULT NULL,
`proxy_log` text,
`check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting',
`competitor` tinytext,
PRIMARY KEY (`id`),
KEY `keyword` (`keyword`,`check_time`,`competitor`(50)),
KEY `idx_rank_result_check_time` (`check_time`)
) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8;
EDIT 2:
A partir des deux réponses à ce jour, je J'ai ajusté mon index sur le rank_result
et ajouté dans la limite par échelle de temps. Je reçois maintenant mes résultats en < 1s ce qui est un résultat incroyable.
CEPENDANT. J'ai toujours l'impression que ma requête est vraiment «hacky» et pense qu'il doit y avoir une solution meilleure et plus propre - y en a-t-il une?
(requête en cours qui est actuellement en production)
SELECT i.`keyword` AS search_keyword, i.p_deep, i.check_freq_days,
i.CREATION, i.last_check,
(SELECT r.position
FROM rank_result r
WHERE r.`keyword` = search_keyword AND
r.`competitor` = 'Absolute' AND
i.`last_check` = r.`check_time`
ORDER BY r.check_time DESC
LIMIT 0,1
) AS 'current_position',
(SELECT rr.`position`
FROM rank_result rr
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK)
ORDER BY rr.check_time DESC
LIMIT 1, 1
) AS 'previous_position'
FROM input_keywords i
ORDER BY i.keyword ASC
Pour vous aider, nous avons besoin des informations supplémentaires suivantes: indexes information et extended explain (do 'explain select i ....'). Merci d'avoir posé votre question avec brio. Appréciez-le. –
@PrabhatG informations mises à jour comme demandé –