Only the last chunk of content is returned

function curl_get($url){ 
     $ch = curl_init(); 
     curl_setopt($ch, CURLOPT_URL, $url); 
     curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); 
     $data = curl_exec($ch); 

     print_r(curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD)); 

     curl_close($ch); 
     return $data; 
}

I was trying to match a string on this page: "wikipedia.sfstate.us/Scarves". I use the function above to fetch the content:

$url = "http://wikipedia.sfstate.us/Scarves"; 
$html = curl_get($url); 
var_dump($html);

The result looks like this:

812 //CURLINFO_SIZE_DOWNLOAD 
string(812) "..." //$html string where the content is stored

However, the whole file is 64612 bytes (according to web-sniffer.net), and 64612 = 1024 * 63 + 812. In other words, I am only receiving the last 812 bytes of the file.
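The arithmetic in that sentence can be checked directly. The sketch below uses only the two figures quoted in the question; the variable names are illustrative. As written, 1024 * 63 + 812 does not come out to 64612, so the 812 bytes received are not necessarily the tail of the full page and may be a different, shorter response.

```php
<?php
// Figures quoted in the question.
$reportedSize = 64612; // total size reported by web-sniffer.net
$received     = 812;   // bytes returned by curl_exec()

// What the formula in the question actually evaluates to.
$claimed = 1024 * 63 + $received;

// Size of the final piece if the file were cut into 1024-byte pieces.
$lastPiece = $reportedSize % 1024;

printf("1024 * 63 + 812 = %d\n", $claimed);   // prints 65324
printf("64612 %% 1024 = %d\n", $lastPiece);   // prints 100
```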

How is this possible? Any ideas on how to get the entire content? Thanks.

P.S.: I also tried something like the following, but it did not help:

if(strlen($html) < 1024){ 
    $html = ''; 
    $i = 0; 
    while($content = file_get_contents($url, FILE_TEXT, NULL, $i, $i + 1023)){ 
      $html .= $content; 
      $i += 1023; 
    } 
}

2012-08-16 ethanator

The page you are trying to scrape has user-agent-based protection. Add a suitable user agent to your request and it works:

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1");

Of course, if they have such protection, it is probably because they do not want you scraping their content.
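One way to confirm that the user agent is what makes the difference is to request the same URL with and without it and compare the status code and body size. The following is a minimal sketch under that assumption: `fetch_with_info` is a hypothetical helper, not part of the original posts, and the 10-second connect timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper (not from the original posts): fetch a URL and
// report the HTTP status code and body size alongside the body itself.
function fetch_with_info(string $url, ?string $userAgent = null): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // arbitrary timeout
    if ($userAgent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    }

    $body = curl_exec($ch); // false on a transport failure
    $result = [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'bytes'  => $body === false ? 0 : strlen($body),
        'error'  => curl_error($ch),
    ];
    curl_close($ch);
    return $result;
}

// Run the comparison only when a URL is passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1";
    $agents = ['no user agent' => null, 'browser user agent' => $ua];
    foreach ($agents as $label => $agent) {
        $r = fetch_with_info($argv[1], $agent);
        printf("%s: HTTP %d, %d bytes %s\n", $label, $r['status'], $r['bytes'], $r['error']);
    }
}
```

If the two runs report noticeably different sizes or status codes, the short response is likely a page served to unrecognized clients rather than a truncated download.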

2012-08-16 16:05:51 Tchoupi

Try this; it is my tested code and it works fine.

<?php 

function curl_get($url){ 
     $ch = curl_init(); 
     curl_setopt($ch, CURLOPT_HEADER, true); 
     curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1"); 
     curl_setopt($ch, CURLOPT_URL, $url); 
     curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); 
     $data = curl_exec($ch); 

     print_r(curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD)); 

     curl_close($ch); 
     return $data; 
} 


$url = "http://wikipedia.sfstate.us/Scarves"; 
$html = curl_get($url); 
var_dump($html);
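Note that with CURLOPT_HEADER set to true, the string returned by curl_exec() begins with the response headers, which matters if the goal is to match text in the page body. A minimal sketch of separating the two follows; `split_response` is a hypothetical helper, and the sample response is made up for illustration. In real use the header length would come from curl_getinfo($ch, CURLINFO_HEADER_SIZE), read before curl_close().

```php
<?php
// Hypothetical helper: split a raw response (as returned when
// CURLOPT_HEADER is true) into its header block and its body.
// $headerSize is what curl_getinfo($ch, CURLINFO_HEADER_SIZE) reports.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Made-up response used only for illustration.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Scarves</html>";
$headerSize = strpos($raw, "\r\n\r\n") + 4; // stands in for CURLINFO_HEADER_SIZE

$parts = split_response($raw, $headerSize);
echo $parts['body'], "\n"; // prints <html>Scarves</html>
```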

Here is another example to try, which writes the response straight to a file:

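$ch = curl_init("http://wikipedia.sfstate.us/Scarves");
// Open the output file; curl writes the response body into it.
$fp = fopen("example_htmlpage.html", "w");

// Same user agent as in curl_get() above; without it the server may return the short page.
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1");
curl_setopt($ch, CURLOPT_FILE, $fp);   // stream the body to $fp instead of returning it
curl_setopt($ch, CURLOPT_HEADER, 0);   // keep the response headers out of the file
curl_exec($ch);
curl_close($ch);
fclose($fp);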

2012-08-16 17:50:57
