Récemment, j'apprenais Python. Quand j'utilise BeautifulSoup et demande d'obtenir le code HTML, je reçois un statut qui est 405. De plus, la soupe est fausse. J'ai visité le URL.Python web crawler, quand je crawl une URL, le status_code affiche 405
Voici mon code:
def craw(url):
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0'
headers = {'User_Agent':user_agent,
'Accept':'*/*',
'Accept-Encoding':'gzip, deflate',
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language':'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
'Host':'www.qichacha.com',
'Referer':'http://www.qichacha.com/search?key=%E5%A9%9A%E5%BA%86',
}
response = requests.get(url,headers = headers)
if response.status_code != 200:
response.encoding = 'utf-8'
print(response.status_code)
print('ERROR')
soup = BeautifulSoup(response.text,'lxml')
print(soup)
if __name__ == '__main__':
url = r'http://www.qichacha.com/search?key=%E5%A9%9A%E5%BA%86'
s1 = craw(url)
la sortie:
405
ERROR
<!DOCTYPE html>
<html lang="zh-cn">
<head>
<meta charset="utf-8"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="a3c0e" name="data-spm"/>
<title>405</title>
<style>
html, body, div, a, h2, p { margin: 0; padding: 0; font-family: 微软
雅黑; }
a { text-decoration: none; color: #3b6ea3; }
.container { width: 1000px; margin: auto; color: #696969; }
.header { padding: 50px 0; }
.header .message { height: 36px; padding-left: 120px; background: ur
l(https://errors.aliyun.com/images/TB1TpamHpXXXXaJXXXXeB7nYVXX-104-162.png) no-r
epeat 0 -128px; line-height: 36px; }
.main { padding: 50px 0; background: #f4f5f7; }
.main img { position: relative; left: 120px; }
.footer { margin-top: 30px; text-align: right; }
.footer a { padding: 8px 30px; border-radius: 10px; border: 1px soli
d #4babec; }
.footer a:hover { opacity: .8; }
.alert-shadow { display: none; position: absolute; top: 0; left: 0;
width: 100%; height: 100%; background: #999; opacity: .5; }
.alert { display: none; position: absolute; top: 200px; left: 50%; w
idth: 600px; margin-left: -300px; padding-bottom: 25px; border: 1px solid #ddd;
box-shadow: 0 2px 2px 1px rgba(0, 0, 0, .1); background: #fff; font-size: 14px;
color: #696969; }
.alert h2 { margin: 0 2px; padding: 10px 15px 5px 15px; font-size:
14px; font-weight: normal; border-bottom: 1px solid #ddd; }
.alert a { display: block; position: absolute; right: 10px; top: 8px
; width: 30px; height: 20px; text-align: center; }
.alert p { padding: 20px 15px; }
</style>
</head>
<body data-spm="7663354">
<div data-spm="1998410538">
<div class="header">
<div class="container">
<div class="message">很抱歉,由于您访问的URL有可能对网站造成安全威胁,您的访问被
阻断。</div>
</div>
</div>
<div class="main">
<div class="container">
<img src="https://errors.aliyun.com/images/TB15QGaHpXXXXXOaXXXXia39XXX-660-117.p
ng"/>
</div>
</div>
<div class="footer">
<div class="container">
<a data-spm-click="gostr=/waf.123.123;locaid=d001;" href="javascript:;" id="repo
rt" target="_blank">误报反馈</a>
</div>
</div>
</div>
<div class="alert-shadow" id="alertShadow"></div>
<div class="alert" id="alertContainer">
<h2>提示:<a href="javascript:;" id="closeAlert" title="关闭">X</a></h2>
<p>感谢您的反馈,应用防火墙会尽快进行分析和确认。</p>
</div>
<script>
function show() {
var g = function(ele) { return document.getElementById(ele); };
var reportHandle = g('report');
var alertShadow = g('alertShadow');
var alertContainer = g('alertContainer');
var closeAlert = g('closeAlert');
var own = {};
own.report = function() {
// SPM
own.alert();
};
own.alert = function() {
alertShadow.style.display = 'block';
alertContainer.style.display = 'block';
};
own.close = function() {
alertShadow.style.display = 'none';
alertContainer.style.display = 'none';
};
};
</script>
<script charset="utf-8" src="https://errors.aliyun.com/error.js?s=3" type="text/
javascript"></script>
</body>
</html>
------------------
(program exited with code: 0)
请按任意键继续. . .
Selon ma sortie, je sais que je suis la soupe est pas la page que je veux. Mais où est le problème? Je suis un novice.
try sans en-tête. –
Cette balise d'image ressemble à l'image que Cloudflare affiche lorsqu'elle bloque une menace DOS. Avez-vous spammé cette URL avec beaucoup de demandes? –
@ MD.KhairulBasar, je l'ai essayé, mais malheureusement, il est toujours faux. – hey6775