2017-04-17 2 views

C#: HtmlAgilityPack Descendants

Good day. I have a task where I need to convert a Word document to HTML. This can be done by using interop and saving the document as HTML.

But I need to clean up the HTML output that interop produces.

But I have a problem with HtmlAgilityPack. I thought it was similar to XmlDocument in C#.

This is my C# code:

HtmlDocument doc = new HtmlDocument();
doc.Load(htmlLocation);
foreach (var item in doc.DocumentNode.Descendants("p"))
{
    if (item.HasChildNodes)
    {
        foreach (var itm in item.Descendants("span").ToList())
        {
            Console.WriteLine(itm.InnerText);
        }
    }
}

Here is the HTML:

<html> 

<head> 
<meta http-equiv=Content-Type content="text/html; charset=windows-1252"> 
<meta name=Generator content="Microsoft Word 12 (filtered)"> 

</head> 

<body lang=EN-US link="#0066CC" vlink=purple style='text-justify-trim:punctuation'> 

<div class=WordSection1> 

<p class=Heading61 style='margin-bottom:0in;margin-bottom:.0001pt;text-indent: 
.5in;line-height:normal;page-break-after:avoid;background:transparent'><span 
class=Heading6><span style='font-size:12.0pt;color:black;background:yellow'>Epilogue</span></span></p> 

<p class=MsoBodyText style='line-height:normal;background:transparent'><span 
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style: 
normal'>&nbsp;</span></span></p> 

<p class=MsoBodyText style='line-height:normal;background:transparent'><span 
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style: 
normal'>Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child.</span></span></p> 

<p class=MsoBodyText style='text-indent:.5in;line-height:normal;background: 
transparent'><span class=BodytextItalic2><span style='font-size:12.0pt; 
color:black;font-style:normal'>Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day.</span></span></p> 

</div> 

</body> 

</html> 

This is the output of the code above:

Epilogue 
Epilogue 
&nbsp; 
&nbsp; 
Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child. 
Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child. 
Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day. 
Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day. 

I expected the second (inner) foreach to work only on the current item of the outer loop. So why does it repeat the text?

Answer


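You have 4 p tags, and each p tag contains two span tags, one nested inside the other. Descendants returns every descendant node with the matching name at any depth, not only direct children, so your inner foreach runs twice per paragraph: once for the outer span and once for the span nested inside it. Both spans have the same InnerText, which is why each text is printed twice.

Your inner foreach could be: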

foreach (var itm in item.ChildNodes)
{
    Console.WriteLine(itm.InnerText);
}
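
As a self-contained illustration of the difference, here is a minimal sketch that compares Descendants("span") with Elements("span") on a trimmed-down copy of the first paragraph. The class name SpanTextDemo, the SampleHtml fragment, and the two helper methods are hypothetical names made up for this example. It assumes the HtmlAgilityPack NuGet package is referenced. The HtmlEntity.DeEntitize call is an optional extra that turns entities such as &nbsp; into plain characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class SpanTextDemo
{
    // Hypothetical, shortened stand-in for the Word output:
    // one <p> holding two nested <span> elements.
    public const string SampleHtml =
        "<div><p class=Heading61><span class=Heading6>" +
        "<span style='color:black'>Epilogue</span></span></p></div>";

    // Descendants("span") matches spans at any depth below each <p>,
    // so the outer span and the nested span are both returned.
    public static List<string> TextViaDescendants(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants("p")
            .SelectMany(p => p.Descendants("span"))
            .Select(span => span.InnerText)
            .ToList();
    }

    // Elements("span") matches only direct children of each <p>,
    // so each paragraph contributes its text once.
    public static List<string> TextViaElements(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants("p")
            .SelectMany(p => p.Elements("span"))
            .Select(span => HtmlEntity.DeEntitize(span.InnerText))
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", TextViaDescendants(SampleHtml)));
        Console.WriteLine(string.Join(" | ", TextViaElements(SampleHtml)));
    }
}
```

With this fragment, the Descendants version should list the text twice and the Elements version once, mirroring the duplicated output in the question. Compared with ChildNodes, Elements("span") also skips any text or comment nodes that sit directly under the p element, which may matter if the Word output ever puts whitespace between the tags.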