2017-07-17 2 views
1

Table JOIN with increasing limit and performance

Background: I am writing an in-house SEO crawler to check our positions in Google. The crawling works brilliantly and the storage is fine, but I now have performance problems displaying the data (the storage table currently holds more than 11 million records and is > 6 GB).

I am trying to build an SQL query that shows every record from the input_keywords table, together with the latest result from the rank_result table (for the given CompanyName) and the previous result from rank_result (which shows our movement, up or down).

The tables are as follows:

Table: input_keywords

------------------------------------------------------------------------------------------------------- 
| Field   | Type    | Null | Key | Default    | Extra      | 
------------------------------------------------------------------------------------------------------- 
| id    | int(11) unsigned | NO | PRI | NULL    | auto_increment    | 
------------------------------------------------------------------------------------------------------- 
| keyword   | char(150)  | YES | UNI | NULL    |        | 
------------------------------------------------------------------------------------------------------- 
| last_check  | timestamp  | YES | MUL | 2000-01-01 00:00:00 |        | 
------------------------------------------------------------------------------------------------------- 
| CREATION  | timestamp  | YES |  | CURRENT_TIMESTAMP |        | 
------------------------------------------------------------------------------------------------------- 
| MODIFICATION | timestamp  | YES |  | NULL    | on update CURRENT_TIMESTAMP | 
------------------------------------------------------------------------------------------------------- 
| p_deep   | int(1)   | YES |  | 5     |        | 
------------------------------------------------------------------------------------------------------- 
| check_freq_days | int(11)   | YES |  | 3     |        | 
------------------------------------------------------------------------------------------------------- 
| type   | char(50)   | YES |  | NULL    |        | 
------------------------------------------------------------------------------------------------------- 
| competitor  | char(100)  | YES | MUL | CompanyName   |       | 
------------------------------------------------------------------------------------------------------- 

Table: rank_result

----------------------------------------------------------------------------- 
| Field   | Type    | Null | Key | Default | Extra   | 
----------------------------------------------------------------------------- 
| id    | int(11) unsigned | NO | PRI | NULL | auto_increment | 
----------------------------------------------------------------------------- 
| keyword  | char(150)  | YES | MUL |   |    | 
----------------------------------------------------------------------------- 
| result_url  | text    | YES |  | NULL |    | 
----------------------------------------------------------------------------- 
| position  | int(11)   | YES |  | NULL |    | 
----------------------------------------------------------------------------- 
| check_time  | timestamp  | YES | MUL | NULL |    | 
----------------------------------------------------------------------------- 
| useragent_used | char(255)  | YES |  | NULL |    | 
----------------------------------------------------------------------------- 
| proxy_log  | text    | YES |  | NULL |    | 
----------------------------------------------------------------------------- 
| check_date  | date    | YES |  | NULL |    | 
----------------------------------------------------------------------------- 
| competitor  | tinytext   | YES |  | NULL |    | 
----------------------------------------------------------------------------- 

Some sample data to show what I am trying to achieve:

Sample contents: input_keywords

----------------------------------------------------------------------------------------------------------------------------------------------- 
| id | keyword    | last_check   | CREATION   | MODIFICATION  | p_deep | check_freq_days | type | competitor | 
----------------------------------------------------------------------------------------------------------------------------------------------- 
| 2 | guitar accessories | 2017-04-06 10:34:36 | 2017-01-20 12:27:27 | 2017-04-06 08:21:02 | 5  | 3    | NULL | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------- 
| 3 | guitar amps   | 2017-04-06 10:46:42 | 2017-01-20 12:27:33 | 2017-04-06 08:33:08 | 5  | 3    | NULL | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------- 
| 4 | guitar strings  | 2017-04-06 10:50:30 | 2017-01-20 12:27:42 | 2017-04-06 08:36:56 | 5  | 3    | NULL | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------- 
| 5 | guitar effects pedals | 2017-04-06 11:01:44 | 2017-01-20 12:27:50 | 2017-04-06 08:48:11 | 5  | 3    | NULL | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------- 

Sample contents: rank_result (trimmed to show only the relevant data)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| id | keyword   | result_url       | position | check_time   | useragent_used      | proxy_log    | check_date | competitor | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 723 | guitar accessories | https://www.companyname.com/gui… | 33  | 2017-01-19 17:23:20 | Mozilla/5.0 (X11; OpenBSD i386) App… | NULL     | 2017-01-19 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 1572 | guitar accessories | https://www.companyname.com/gui… | 37  | 2017-01-19 19:03:45 | Mozilla/5.0 (Windows NT 6.1; rv:21.… | 88.150.147.201  | 2017-01-19 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 1672 | guitar accessories | https://www.companyname.com/gui… | 37  | 2017-01-19 19:08:22 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201  | 2017-01-19 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 2511 | guitar accessories | https://www.companyname.com/gui… | 37  | 2017-01-19 19:51:25 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 88.150.147.201  | 2017-01-19 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 2656 | guitar accessories | https://www.companyname.com/gui… | 33  | 2017-01-19 19:58:08 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 5.152.200.181   | 2017-01-19 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 2809 | guitar accessories | https://www.companyname.com/gui… | 37  | 2017-01-19 20:02:51 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 88.150.147.201  | 2017-01-19 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 3147 | guitar accessories | https://www.companyname.com/gui… | 36  | 2017-01-20 09:19:40 | Mozilla/5.0 (Windows NT 5.1; rv:21.… | 5.152.200.181   | 2017-01-20 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 3490 | guitar accessories | https://www.companyname.com/gui… | 31  | 2017-01-20 11:26:39 | Mozilla/5.0 (compatible; MSIE 10.0;… | 185.17.148.252  | 2017-01-20 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 4530 | guitar accessories | https://www.companyname.com/gui… | 31  | 2017-01-20 11:37:53 | Mozilla/5.0 (Macintosh; U; Intel Ma… | 185.17.148.252  | 2017-01-20 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 5277 | guitar accessories | https://www.companyname.com/gui… | 34  | 2017-01-20 16:57:30 | Mozilla/5.0 (Windows NT 5.1) AppleW… | 5.152.200.181:27281 | 2017-01-20 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 5480 | guitar accessories | https://www.companyname.com/gui… | 38  | 2017-01-23 12:33:32 | Mozilla/5.0 (X11; OpenBSD i386) App… | 5.152.200.181:27281 | 2017-01-23 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 9953 | guitar accessories | https://www.companyname.com/gui… | 37  | 2017-01-23 16:02:19 | Mozilla/5.0 (Windows NT 6.2; rv:22.… | 149.255.105.142:27281 | 2017-01-23 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 12836 | guitar accessories | https://www.companyname.com/gui… | 40  | 2017-01-23 18:03:58 | Mozilla/5.0 (X11; Linux x86_64; rv:… | 88.150.147.201:27281 | 2017-01-23 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 14470 | guitar accessories | https://www.companyname.com/gui… | 38  | 2017-01-23 23:03:55 | Mozilla/5.0 (Windows NT 6.1; WOW64;… | 185.10.202.64:27281 | 2017-01-23 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 39524 | guitar accessories | https://www.companyname.com/gui… | 32  | 2017-01-24 13:03:09 | Mozilla/5.0 (Windows; U; Windows NT… | 185.10.201.77:27281 | 2017-01-24 | CompanyName | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Sample output:

--------------------------------------------------------------------------------------------------------------------------------------------- 
| search_keyword    | p_deep | check_freq_days | CREATION   | last_check   | current_position | previous_position | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| guitar accessories   | 5  | 3    | 2017-01-20 12:27:27 | 2017-07-17 09:03:43 | 37    | 39    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| acoustic guitar strings  | 5  | 3    | 2017-06-23 17:44:52 | 2017-07-15 01:03:56 | NULL    | NULL    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| acoustic guitars   | 5  | 1    | 2017-01-20 12:27:17 | 2017-07-16 23:03:44 | 14    | 14    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| bass guitars    | 5  | 1    | 2017-01-20 12:31:56 | 2017-07-16 22:03:51 | 41    | 44    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| Bluguitar Amp1 Nanotube  | 5  | 1    | 2017-01-30 17:48:34 | 2017-07-17 09:30:29 | NULL    | NULL    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| Bluguitar NanoCab   | 5  | 1    | 2017-01-30 17:48:34 | 2017-07-17 09:30:26 | NULL    | NULL    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| choosing a bass guitar  | 5  | 3    | 2017-05-24 22:21:40 | 2017-07-15 16:04:01 | 5    | 4     | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| choosing a guitar   | 5  | 3    | 2017-04-10 15:25:37 | 2017-07-17 00:19:02 | 24    | 24    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| choosing an acoustic guitar | 5  | 3    | 2017-04-10 15:25:37 | 2017-07-17 00:18:33 | 12    | 12    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 
| choosing an electric guitar | 5  | 3    | 2017-04-10 15:25:37 | 2017-07-17 00:18:51 | 10    | 11    | 
--------------------------------------------------------------------------------------------------------------------------------------------- 

My current query is as follows:

SELECT i.`keyword` AS 'search_keyword', i.`p_deep`, i.`check_freq_days`, i.`CREATION`, i.`last_check`, 
       (SELECT r.position AS 'current_position' FROM rank_result r where r.`keyword` = search_keyword AND r.`competitor` = 'CompanyName' AND i.`last_check` = r.`check_time` ORDER BY r.check_time DESC LIMIT 0,1) AS 'current_position', 
       (SELECT rr.`position` AS 'previous_position' FROM rank_result rr WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position' 
       FROM input_keywords i 
       WHERE i.keyword LIKE "%s" 
       order by i.keyword ASC 
       LIMIT 0,100 

My questions are:

  1. Is there a better way to write this query?
  2. I have to limit it to 100 results, otherwise the query takes too long and times out. Can this be solved?
  3. If I drop the ORDER BY rr.check_time DESC, the query is hundreds of times faster, but it obviously no longer returns the correct information, because it picks up the first records rather than the latest ones. Can I do this a different way?
  4. I would very much like to drop the WHERE keyword LIKE and just return all my input_keywords with their current rank and previous rank.

Additional information:

To return the current rank for a keyword:

***input_keywords   rank_result*** 
    keyword   == keyword 
    last_check  == check_time (this makes sure that if we drop out of the search results I don't keep returning an incorrect figure) 
    competitor  == competitor (this allows us to monitor us and our competitors.) 

To return the previous rank for a keyword:

***input_keywords   rank_result*** 
    keyword   == keyword 
    competitor  == competitor (this allows us to monitor us and our competitors.) 
    ORDER BY check_time desc 
    LIMIT 1,1 (to get the last but one result) 

Please be gentle - I am self-taught on all of this!

EDIT 1.

EXPLAIN EXTENDED on my current query (I have included the CREATE statements as well):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| id | select_type  | table | type | possible_keys    | key  | key_len | ref       | rows | filtered | Extra      | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 1 | PRIMARY   | i  | ALL | NULL       | NULL | NULL | NULL       | 1682 | 100.00 | Using where; Using filesort | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 3 | DEPENDENT SUBQUERY | rr | ref | keyword      | keyword | 451  | func       | 32 | 100.00 | Using where; Using filesort | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
| 2 | DEPENDENT SUBQUERY | r  | ref | keyword,idx_rank_result_che… | keyword | 609  | func,GoogleCrawler.i.last_c… | 2 | 100.00 | Using where; Using filesort | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

CREATE TABLE `input_keywords` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT, 
    `keyword` char(150) DEFAULT NULL COMMENT 'the keyword....', 
    `last_check` timestamp NULL DEFAULT '2000-01-01 00:00:00' COMMENT 'Last check timestamp, default to years ago so we check immediatly', 
    `CREATION` timestamp NULL DEFAULT CURRENT_TIMESTAMP, 
    `MODIFICATION` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP, 
    `p_deep` int(1) DEFAULT '5' COMMENT 'how many pages deep to search - default 5', 
    `check_freq_days` int(11) DEFAULT '3' COMMENT 'how often to check this keyword in DAYS default 3', 
    `type` char(50) DEFAULT NULL COMMENT 'Product, Category, other etc', 
    `competitor` tinytext, 
    PRIMARY KEY (`id`), 
    UNIQUE KEY `UNQ_Keyword` (`keyword`), 
    KEY `keyword` (`keyword`(100),`last_check`,`competitor`(100)) 
) ENGINE=InnoDB AUTO_INCREMENT=6001 DEFAULT CHARSET=utf8; 


CREATE TABLE `rank_result` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT, 
    `keyword` char(150) DEFAULT '', 
    `result_url` text, 
    `position` int(11) DEFAULT NULL, 
    `check_time` timestamp NULL DEFAULT NULL, 
    `useragent_used` char(255) DEFAULT NULL, 
    `proxy_log` text, 
    `check_date` date DEFAULT NULL COMMENT 'date of the check - easier for graph plotting', 
    `competitor` tinytext, 
    PRIMARY KEY (`id`), 
    KEY `keyword` (`keyword`,`check_time`,`competitor`(50)), 
    KEY `idx_rank_result_check_time` (`check_time`) 
) ENGINE=InnoDB AUTO_INCREMENT=11444318 DEFAULT CHARSET=utf8; 

EDIT 2:

Based on the two answers so far, I have adjusted my index on rank_result and added the time-window restriction. I now get my results in < 1s, which is an incredible result.

HOWEVER, I still feel that my query is really "hacky" and think there must be a better, cleaner solution. Is there one?

(Current query, which is now in production:)

SELECT i.`keyword` AS search_keyword, i.p_deep, i.check_freq_days, 
i.CREATION, i.last_check, 
     (SELECT r.position 
     FROM rank_result r 
     WHERE r.`keyword` = search_keyword AND 
       r.`competitor` = 'Absolute' AND 
       i.`last_check` = r.`check_time` 
     ORDER BY r.check_time DESC 
     LIMIT 0,1 
     ) AS 'current_position', 
     (SELECT rr.`position` 
     FROM rank_result rr 
     WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'Absolute' AND check_time > (NOW() - INTERVAL 2 WEEK) 
     ORDER BY rr.check_time DESC 
     LIMIT 1, 1 
     ) AS 'previous_position' 
     FROM input_keywords i 
     ORDER BY i.keyword ASC 
+0

To help you, we need the following additional information: index information and the extended explain output (run 'explain select i ....'). Thank you for asking your question so well. Much appreciated.

+0

@PrabhatG I have updated the information as requested.

Answers

1

I would add to the subquery

SELECT rr.`position` AS 'previous_position' 
FROM rank_result rr 
WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' 
ORDER BY rr.check_time DESC LIMIT 1,1 

a restriction, if possible, for example

AND rr.check_time > NOW() - INTERVAL 1 WEEK 

or something like this, to limit the number of records that have to be processed.

Also consider moving the subquery into the FROM clause and joining it to the main query.
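One way to read "move the subquery into the FROM clause" is sketched below. It assumes MySQL 8.0 or later (or MariaDB 10.2+), because it relies on the ROW_NUMBER() window function; the question does not state the server version. The aliases cur, prev and rn are illustrative only, and the two-week window is borrowed from the asker's production query.

```sql
-- Sketch only: assumes window-function support (MySQL 8.0+ / MariaDB 10.2+).
SELECT i.keyword      AS search_keyword,
       i.p_deep,
       i.check_freq_days,
       i.CREATION,
       i.last_check,
       cur.`position`  AS current_position,
       prev.`position` AS previous_position
FROM input_keywords i
-- Current rank: the result row recorded at the keyword's last check.
LEFT JOIN rank_result cur
       ON cur.keyword    = i.keyword
      AND cur.competitor = 'CompanyName'
      AND cur.check_time = i.last_check
-- Previous rank: number each keyword's recent results from newest to oldest
-- and keep the second one (the equivalent of LIMIT 1,1).
LEFT JOIN (
    SELECT keyword,
           `position`,
           ROW_NUMBER() OVER (PARTITION BY keyword
                              ORDER BY check_time DESC) AS rn
    FROM rank_result
    WHERE competitor = 'CompanyName'
      AND check_time > NOW() - INTERVAL 2 WEEK
) prev
       ON prev.keyword = i.keyword
      AND prev.rn      = 2
ORDER BY i.keyword ASC;
```

If rank_result can hold more than one row for the same keyword, competitor and check_time, the cur join would return duplicate rows, so it is worth comparing the output and the EXPLAIN plan against the original query before relying on this.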

+0

I have added the following to the subquery: (SELECT rr.position AS 'previous_position' FROM rank_result rr WHERE rr.keyword = search_keyword AND rr.competitor = 'absolute' AND rr.check_time > NOW() - INTERVAL 1 WEEK ORDER BY rr.check_time DESC LIMIT 1,1) AS 'previous_position'. But I am not sure how to "join" it to the main query. It doesn't seem to make a huge difference to the speed when I remove the 'LIKE' clause.

+0

This also increased the speed at which the results are returned, thank you.

2

For this query:

SELECT i.`keyword` AS search_keyword, i.p_deep, i.check_freq_days, i.CREATION, i.last_check, 
     (SELECT r.position 
     FROM rank_result r 
     WHERE r.`keyword` = search_keyword AND 
       r.`competitor` = 'CompanyName' AND 
       i.`last_check` = r.`check_time` 
     ORDER BY r.check_time DESC 
     LIMIT 0,1 
     ) AS current_position, 
     (SELECT rr.`position` 
     FROM rank_result rr 
     WHERE rr.`keyword` = search_keyword AND rr.`competitor` = 'CompanyName' 
     ORDER BY rr.check_time DESC 
     LIMIT 1, 1 
     ) AS previous_position 
FROM input_keywords i 
WHERE i.keyword LIKE "%s" 
ORDER BY i.keyword ASC 
LIMIT 0, 100; 

You want an index on rank_result(keyword, competitor, check_time, position).
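As a sketch, that index could be created as follows. The index name is hypothetical. Because competitor is declared as TINYTEXT in the posted schema, MySQL requires a prefix length for it; 50 is an assumption that mirrors the prefix already used by the existing `keyword` key.

```sql
-- Sketch only: index name and prefix length are assumptions.
-- TEXT-family columns such as TINYTEXT need an explicit prefix length in an index.
ALTER TABLE rank_result
    ADD INDEX idx_keyword_competitor_time
        (keyword, competitor(50), check_time, `position`);
```

Re-running EXPLAIN on the query afterwards should show whether the two dependent subqueries pick up the new index and whether the "Using filesort" step disappears.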

+0

This helped massively - thank you!