2009-05-06 15 views

Répondre

8

Supposons que la test_html variable a le contenu HTML suivant:

<html> 
<head><title>Test title</title></head> 
<body> 
<p>Some paragraph</p> 
Useless Text 
<a href="http://stackoverflow.com">Some link</a>not a link 
<a href="http://python.org">Another link</a> 
</body></html> 

Il suffit de faire ceci:

from BeautifulSoup import BeautifulSoup 

test_html = load_html_from_above() 
soup = BeautifulSoup(test_html) 

for t in soup.findAll(text=True): 
    text = unicode(t) 
    for vowel in u'aeiou': 
     text = text.replace(vowel, u'') 
    t.replaceWith(text) 

print soup 

qui imprime:

<html> 
<head><title>Tst ttl</title></head> 
<body> 
<p>Sm prgrph</p> 
Uslss Txt 
<a href="http://stackoverflow.com">Sm lnk</a>nt lnk 
<a href="http://python.org">Anthr lnk</a> 
</body></html> 

Notez que les balises et les attributs sont intacts .

Questions connexes