2012-10-14
4

How do I parse an XML feed using Python?

I'm trying to parse this XML (http://www.reddit.com/r/videos/top/.rss) and am having trouble doing so. I want to save the YouTube links found in each of the items, but I'm running into problems because of the "channel" child node. How can I reach that level so that I can then iterate over the items?

import urllib2
from xml.etree import ElementTree as etree

#reddit parse
reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss')
#convert to string:
reddit_data = reddit_file.read()
#close file because we don't need it anymore:
reddit_file.close()

#entire feed 
reddit_root = etree.fromstring(reddit_data) 
channel = reddit_root.findall('{http://purl.org/dc/elements/1.1/}channel') 
print channel 

reddit_feed=[] 
for entry in channel: 
    #get description, url, and thumbnail 
    desc = None  # not sure how to get this

    reddit_feed.append([desc]) 

Answers

5

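You can use findall('channel/item'). The channel and item elements of an RSS 2.0 feed are not in any namespace, so the path needs no {namespace} prefix.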

import urllib2 
from xml.etree import ElementTree as etree 
#reddit parse 
reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss') 
#convert to string: 
reddit_data = reddit_file.read() 
print reddit_data 
#close file because we don't need it anymore:
reddit_file.close() 

#entire feed 
reddit_root = etree.fromstring(reddit_data) 
item = reddit_root.findall('channel/item') 
print item 

reddit_feed=[] 
for entry in item: 
    #get description, url, and thumbnail 
    desc = entry.findtext('description') 
    reddit_feed.append([desc]) 
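
Since the question asks for the YouTube links specifically, here is a minimal sketch of how the same findall('channel/item') loop could collect them. It is written for Python 3 and runs against a small inline sample rather than the live feed. The sample feed, the youtube_links helper, and the simplified URL pattern are all assumptions made for illustration; the real feed's descriptions may be laid out differently.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the downloaded feed text.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>videos</title>
    <item>
      <title>First video</title>
      <link>http://www.reddit.com/r/videos/comments/abc/first/</link>
      <description>&lt;a href="http://www.youtube.com/watch?v=AAAAAAAAAAA"&gt;[link]&lt;/a&gt;</description>
    </item>
    <item>
      <title>Second video</title>
      <link>http://www.reddit.com/r/videos/comments/def/second/</link>
      <description>&lt;a href="http://youtu.be/BBBBBBBBBBB"&gt;[link]&lt;/a&gt;</description>
    </item>
  </channel>
</rss>
"""

# Simplified pattern (an assumption); real YouTube links may take other forms.
YOUTUBE_RE = re.compile(
    r'https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+'
)


def youtube_links(rss_text):
    """Return every YouTube URL found in the item descriptions."""
    root = ET.fromstring(rss_text)
    links = []
    # channel and item carry no namespace, so a plain path reaches them.
    for item in root.findall('channel/item'):
        desc = item.findtext('description', default='')
        links.extend(YOUTUBE_RE.findall(desc))
    return links


print(youtube_links(SAMPLE_RSS))
```

To use it on the real feed, pass the downloaded text (reddit_data above) to youtube_links instead of SAMPLE_RSS.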
3

I wrote this for you using XPath expressions (tested successfully):

from lxml import etree 
import urllib2 

headers = { 'User-Agent' : 'Mozilla/5.0' } 
req = urllib2.Request('http://www.reddit.com/r/videos/top/.rss', None, headers) 
reddit_file = urllib2.urlopen(req).read() 

reddit = etree.fromstring(reddit_file) 

for item in reddit.xpath('/rss/channel/item'): 
    print "title =", item.xpath("./title/text()")[0] 
    print "description =", item.xpath("./description/text()")[0] 
    print "thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0] 
    print "link =", item.xpath("./link/text()")[0] 
    print "-" * 100 
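
One caveat with the [0] indexing above: xpath() returns an empty list when an item lacks a given child, so [0] raises IndexError for that item. As a hedged alternative, the sketch below uses only the standard library (Python 3) and looks up the namespaced thumbnail through a prefix map instead of local-name(). The sample feed, the item_summaries helper, and the assumption that the thumbnail lives in the Media RSS namespace (http://search.yahoo.com/mrss/) are illustrative; check the xmlns declarations in the actual feed.

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-URI mapping; verify it against the real feed's xmlns attributes.
NS = {'media': 'http://search.yahoo.com/mrss/'}

# Hypothetical sample standing in for the downloaded feed text.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First video</title>
      <link>http://example.com/first</link>
      <media:thumbnail url="http://example.com/first.jpg"/>
    </item>
    <item>
      <title>No thumbnail here</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def item_summaries(rss_text):
    """Return (title, link, thumbnail_url) tuples, using None for missing parts."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.findall('channel/item'):
        # find() returns None instead of raising when the element is absent.
        thumb = item.find('media:thumbnail', NS)
        rows.append((
            item.findtext('title'),
            item.findtext('link'),
            thumb.get('url') if thumb is not None else None,
        ))
    return rows


for row in item_summaries(SAMPLE_RSS):
    print(row)
```

The prefix map keeps the lookup tied to one specific namespace, whereas local-name() matches any element called thumbnail regardless of its namespace.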