2010-04-19 12 views
3

XML: remove a child node of a node

I want to find all nodes in an XML file that have a certain tag name, say "foo". If these foo elements themselves have direct child nodes named "bar", I want to remove those child nodes. The result should be written to a file.

<myDoc>
    <foo>
        <bar/> // remove this one
    </foo>
    <foo>
        <anyThing>
            <bar/> // don't remove this one
        </anyThing>
    </foo>
</myDoc>

Thanks for any advice. As the tag indicates, I would like to do this with Python.

Answer

3

You can use ElementTree:

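import xml.etree.ElementTree as ET

tree = ET.parse('in.xml')

# 'foo' matches only <foo> elements directly under the root;
# use './/foo' to match them at any depth.
for foo in tree.findall('foo'):
    # findall('bar') returns a list of the direct <bar> children,
    # so removing them inside the loop is safe.
    for bar in foo.findall('bar'):
        foo.remove(bar)

tree.write('out.xml')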
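
To check the behaviour on the sample from the question, the same idea can be run against an in-memory string. This is only a minimal sketch: the helper name `remove_direct_children` is hypothetical (it is not part of ElementTree), and the `// ...` notes from the sample are left out because they would be parsed as ordinary text.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<myDoc>
    <foo>
        <bar/>
    </foo>
    <foo>
        <anyThing>
            <bar/>
        </anyThing>
    </foo>
</myDoc>"""


def remove_direct_children(root, parent_tag, child_tag):
    """Remove every <child_tag> that is a direct child of a <parent_tag>.

    Hypothetical helper for illustration. Returns the number of
    elements removed.
    """
    removed = 0
    # iter() visits the root itself and elements at any depth;
    # list() takes a snapshot so the tree can be modified while looping.
    for parent in list(root.iter(parent_tag)):
        # A plain tag name in findall() matches direct children only.
        for child in parent.findall(child_tag):
            parent.remove(child)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_direct_children(root, "foo", "bar")
result = ET.tostring(root, encoding="unicode")
print(count)
print(result)
```

The `<bar/>` nested inside `<anyThing>` is left alone because it is a grandchild of `<foo>`, not a direct child.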
0

Many thanks to miles82 for giving me the idea that solved my problem. Here is how I removed several nodes and/or elements from my XML file. Sample data from my huge original file looks like this:

<Data> 
    <horsedata> 
      <horse_name>DO NOT DELETE</horse_name> 
      <stats_data> 
        <stat type="ALL_WEATHR">
           GOT TO GO
        </stat>
      </stats_data>

      <sire><sirename>GOT TO GO</sirename><tmmark>M</tmmark><stud_fee>5000</stud_fee> 

      </sire> 
      <dam><damname>GOT TO GO</damname><damsire>WISED UP</damsire> 

      </dam> 

      <jockey><stat_breed>GOT TO GO</stat_breed><jock_disp>Lopez Charles C</jock_disp> 
        <stats_data> 
          <stat type="LAST30"> 
            GOT TO GO 
          </stat>                            
        </stats_data> 
      </jockey> 

      <workoutdata>Some More Text ... </workoutdata> 
      <workoutdata>Yet More Text</workoutdata> 

      <ppdata><racedate>20150801</racedate><trackcode>DO NOT DELETE</trackcode><trackname>Monmouth Park</trackname></ppdata>
      <ppdata><racedate>20150715</racedate><trackcode>DO NOT DELET</trackcode><trackname>Belmont Park</trackname><racenumber>2</racenumber><racebreed>TB</racebreed><country>USA</country><racetype>MCL</racetype><raceclass>MC</raceclass><claimprice>20000</claimprice><purse>29000</purse><classratin>76</classratin><trackcondi>FT</trackcondi><distance>600</distance><disttype>F</disttype><aboutdist/><courseid>D</courseid><surface>D</surface><pulledofft>0</pulledofft><winddirect/><windspeed>0</windspeed><trackvaria>12</trackvaria><sealedtrac/><racegrade>0</racegrade><agerestric>3U</agerestric><sexrestric/><statebredr/><abbrevcond/><postpositi>4</postpositi><favorite>0</favorite><weightcarr>119</weightcarr><jockfirst>Eduardo</jockfirst><jockmiddle/><jocklast>Ulloa</jocklast><jocksuffix/><jockdisp>Ulloa Eduardo</jockdisp><equipment>BF</equipment><medication/><fieldsize>6</fieldsize><posttimeod>54.25</posttimeod><shortcomme>3w upper, gave way</shortcomme><longcommen>slight bobble st, vied 4p early, cleared 2p turn, 3w 1/4,gave way</longcommen><gatebreak>3</gatebreak><position1>1</position1><lenback1>-50.00</lenback1><horsetime1>22.49</horsetime1><leadertime>22.49</leadertime><pacefigure>108</pacefigure><position2>1</position2><lenback2>-150.00</lenback2><horsetime2>46.56</horsetime2><leadertim2>46.56</leadertim2><pacefigur2>77</pacefigur2><positionst>5</positionst><lenbackstr>810.00</lenbackstr><horsetimes>60.75</horsetimes><leadertim3>59.40</leadertim3><dqindicato/><positionfi>6</positionfi><lenbackfin>1700.00</lenbackfin><horsetimef>75.50</horsetimef><leadertim4>72.67</leadertim4><speedfigur>33</speedfigur><turffigure>0.0</turffigure><winnersspe>71</winnersspe><foreignspe>-97</foreignspe><horseclaim>0</horseclaim><biasstyle>F</biasstyle><biaspath>N</biaspath><complineho>Lightning Ron</complineho><complinele>275.00</complinele><complinewe>124</complinewe><complinedq/><complineh2>Thomas Knight</complineh2><complinel2>25.00</complinel2><complinew2>119</complinew2><complined2/><complineh3>Heavy Hitter</complineh3><complinel3>450.00</complinel3><complinew3>124</complinew3><complined3/><linebefore/><lineafter/><domesticpp>1</domesticpp><oflfinish>6</oflfinish><runup_dist>64</runup_dist><rail_dist>-1</rail_dist><apprweight>0</apprweight><vd_claim/><vd_reason/></ppdata>
    </horsedata> 
</Data> 

I was only interested in keeping two or three elements, namely horse_name, workoutdata and ppdata:

import xml.etree.ElementTree as ET

tree = ET.parse('bel.xml')

# Direct children of <horsedata> that should be dropped
unwanted_tags = ('stats_data', 'sire', 'dam', 'jockey', 'trainer')

for horse in tree.findall('.//horsedata'):
    for tag in unwanted_tags:
        # findall(tag) returns a list, so removing inside the loop is safe
        for child in horse.findall(tag):
            horse.remove(child)

tree.write('junk.xml')

And here is the final output:

<Data> 
    <horsedata> 
      <horse_name>DO NOT DELETE</horse_name> 
      <workoutdata>Some More Text ... </workoutdata> 
      <workoutdata>Yet More Text</workoutdata> 

      <ppdata><racedate>20150801</racedate><trackcode>DO NOT DELETE</trackcode><trackname>Monmouth Park</trackname></ppdata> 
      <ppdata><racedate>20150715</racedate><trackcode>DO NOT DELET</trackcode><trackname>Belmont Park</trackname></ppdata> 
    </horsedata> 
</Data> 
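
Since the goal was to keep only a few elements, an alternative is to list the tags to keep rather than the tags to delete. The following is a sketch under assumptions, not the method used above: the function name `keep_only` and the shortened sample string are made up for illustration, and the kept tag names are taken from the sample data.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the sample data
SAMPLE = """<Data>
  <horsedata>
    <horse_name>DO NOT DELETE</horse_name>
    <sire><sirename>GOT TO GO</sirename></sire>
    <dam><damname>GOT TO GO</damname></dam>
    <workoutdata>Yet More Text</workoutdata>
    <ppdata><racedate>20150801</racedate></ppdata>
  </horsedata>
</Data>"""

KEEP = {"horse_name", "workoutdata", "ppdata"}


def keep_only(root, parent_tag, keep_tags):
    """Drop every direct child of <parent_tag> whose tag is not in keep_tags."""
    # list(...) snapshots the matches before the tree is modified
    for parent in list(root.iter(parent_tag)):
        # list(parent) copies the child list so removal in the loop is safe
        for child in list(parent):
            if child.tag not in keep_tags:
                parent.remove(child)


root = ET.fromstring(SAMPLE)
keep_only(root, "horsedata", KEEP)
remaining = [child.tag for child in root.find("horsedata")]
print(remaining)
```

With a keep-list, any tag that appears later in the data and is not listed gets removed as well, which may or may not be what is wanted for a given file.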