R XML: How to retrieve a node with a given value

2013-03-07

Here is an excerpt from the XML file I am using:

<page> 
    <title>AccessibleComputing</title> 
    <ns>0</ns> 
    <id>10</id> 
    <redirect title="Computer accessibility" /> 
    <revision> 
    <id>381202555</id> 
    <parentid>381200179</parentid> 
    <timestamp>2010-08-26T22:38:36Z</timestamp> 
    <contributor> 
     <username>OlEnglish</username> 
     <id>7181920</id> 
    </contributor> 
    <minor /> 
    <comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment> 
    <text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text> 
    <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1> 
    <model>wikitext</model> 
    <format>text/x-wiki</format> 
    </revision> 
</page> 
<page> 
    <title>AfghanistanGeography</title> 
    <ns>0</ns> 
    <id>14</id> 
    <redirect title="Geography of Afghanistan" /> 
    <revision> 
    <id>407008307</id> 
    <parentid>74466619</parentid> 
    <timestamp>2011-01-10T03:56:19Z</timestamp> 
    <contributor> 
     <username>Graham87</username> 
     <id>194203</id> 
    </contributor> 
    <minor /> 
    <comment>1 revision from [[:nost:AfghanistanGeography]]: import old edit, see [[User:Graham87/Import]]</comment> 
    <text xml:space="preserve">#REDIRECT [[Geography of Afghanistan]] {{R from CamelCase}}</text> 
    <sha1>0uwuuhiam59ufbu0uzt9lookwtx9f4r</sha1> 
    <model>wikitext</model> 
    <format>text/x-wiki</format> 
    </revision> 
</page> 
<page> 
    <title>AfghanistanPeople</title> 
    <ns>0</ns> 
    <id>15</id> 
    <redirect title="Demography of Afghanistan" /> 
    <revision> 
    <id>135089040</id> 
    <parentid>74466558</parentid> 
    <timestamp>2007-06-01T13:59:37Z</timestamp> 
    <contributor> 
     <username>RussBot</username> 
     <id>279219</id> 
    </contributor> 
    <minor /> 
    <comment>Robot: Fixing [[Special:DoubleRedirects|double-redirect]] -&quot;Demographics of Afghanistan&quot; +&quot;Demography of Afghanistan&quot;</comment> 
    <text xml:space="preserve">#REDIRECT [[Demography of Afghanistan]] {{R from CamelCase}}</text> 
    <sha1>744dgrl7ef5p53yffn2a989ly1dyr8f</sha1> 
    <model>wikitext</model> 
    <format>text/x-wiki</format> 
    </revision> 
</page> 

Now, given the value "AccessibleComputing", how can I retrieve the XMLInternalElementNode that corresponds to it? I tried using getNodeSet without success.

Thanks.

Question update

I should have included the entire sample.xml file in the first place. Here it is, followed by the problem I am facing:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en"> 
    <siteinfo> 
    <sitename>Wikipedia</sitename> 
    <base>http://en.wikipedia.org/wiki/Main_Page</base> 
    <generator>MediaWiki 1.21wmf8</generator> 
    <case>first-letter</case> 
    <namespaces> 
     <namespace key="-2" case="first-letter">Media</namespace> 
     <namespace key="-1" case="first-letter">Special</namespace> 
     <namespace key="0" case="first-letter" /> 
     <namespace key="1" case="first-letter">Talk</namespace> 
     <namespace key="2" case="first-letter">User</namespace> 
     <namespace key="3" case="first-letter">User talk</namespace> 
     <namespace key="4" case="first-letter">Wikipedia</namespace> 
     <namespace key="5" case="first-letter">Wikipedia talk</namespace> 
     <namespace key="6" case="first-letter">File</namespace> 
     <namespace key="7" case="first-letter">File talk</namespace> 
     <namespace key="8" case="first-letter">MediaWiki</namespace> 
     <namespace key="9" case="first-letter">MediaWiki talk</namespace> 
     <namespace key="10" case="first-letter">Template</namespace> 
     <namespace key="11" case="first-letter">Template talk</namespace> 
     <namespace key="12" case="first-letter">Help</namespace> 
     <namespace key="13" case="first-letter">Help talk</namespace> 
     <namespace key="14" case="first-letter">Category</namespace> 
     <namespace key="15" case="first-letter">Category talk</namespace> 
     <namespace key="100" case="first-letter">Portal</namespace> 
     <namespace key="101" case="first-letter">Portal talk</namespace> 
     <namespace key="108" case="first-letter">Book</namespace> 
     <namespace key="109" case="first-letter">Book talk</namespace> 
     <namespace key="446" case="first-letter">Education Program</namespace> 
     <namespace key="447" case="first-letter">Education Program talk</namespace> 
     <namespace key="710" case="first-letter">TimedText</namespace> 
     <namespace key="711" case="first-letter">TimedText talk</namespace> 
    </namespaces> 
    </siteinfo> 
    <page> 
    <title>AccessibleComputing</title> 
    <ns>0</ns> 
    <id>10</id> 
    <redirect title="Computer accessibility" /> 
    <revision> 
     <id>381202555</id> 
     <parentid>381200179</parentid> 
     <timestamp>2010-08-26T22:38:36Z</timestamp> 
     <contributor> 
     <username>OlEnglish</username> 
     <id>7181920</id> 
     </contributor> 
     <minor /> 
     <comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment> 
     <text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text> 
     <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1> 
     <model>wikitext</model> 
     <format>text/x-wiki</format> 
    </revision> 
    </page> 
    <page> 
    <title>History</title> 
    <ns>0</ns> 
    <id>13</id> 
    <redirect title="History of " /> 
    <revision> 
     <id>74466652</id> 
     <parentid>15898948</parentid> 
     <timestamp>2006-09-08T04:15:52Z</timestamp> 
     <contributor> 
     <username>Rory096</username> 
     <id>750223</id> 
     </contributor> 
     <comment>cat rd</comment> 
     <text xml:space="preserve">#REDIRECT [[History of ]] {{R from CamelCase}}</text> 
     <sha1>d4tdz2eojqzamnuockahzcbrgd1t9oi</sha1> 
     <model>wikitext</model> 
     <format>text/x-wiki</format> 
    </revision> 
    </page> 
    <page> 
    <title>Geography</title> 
    <ns>0</ns> 
    <id>14</id> 
    <redirect title="Geography of " /> 
    <revision> 
     <id>407008307</id> 
     <parentid>74466619</parentid> 
     <timestamp>2011-01-10T03:56:19Z</timestamp> 
     <contributor> 
     <username>Graham87</username> 
     <id>194203</id> 
     </contributor> 
     <minor /> 
     <comment>1 revision from [[:nost:Geography]]: import old edit, see [[User:Graham87/Import]]</comment> 
     <text xml:space="preserve">#REDIRECT [[Geography of ]] {{R from CamelCase}}</text> 
     <sha1>0uwuuhiam59ufbu0uzt9lookwtx9f4r</sha1> 
     <model>wikitext</model> 
     <format>text/x-wiki</format> 
    </revision> 
    </page> 
    <page> 
    <title>People</title> 
    <ns>0</ns> 
    <id>15</id> 
    <redirect title="Demography of " /> 
    <revision> 
     <id>135089040</id> 
     <parentid>74466558</parentid> 
     <timestamp>2007-06-01T13:59:37Z</timestamp> 
     <contributor> 
     <username>RussBot</username> 
     <id>279219</id> 
     </contributor> 
     <minor /> 
     <comment>Robot: Fixing [[Special:DoubleRedirects|double-redirect]] -&quot;Demographics of &quot; +&quot;Demography of &quot;</comment> 
     <text xml:space="preserve">#REDIRECT [[Demography of ]] {{R from CamelCase}}</text> 
     <sha1>744dgrl7ef5p53yffn2a989ly1dyr8f</sha1> 
     <model>wikitext</model> 
     <format>text/x-wiki</format> 
    </revision> 
    </page> 
</mediawiki> 

How do I get the page node whose title element has the value "AccessibleComputing"? I tried the following:

doc = xmlTreeParse('sample.xml',useInternalNodes=TRUE) 
getNodeSet(doc, "//page[title=\"AccessibleComputing\"]") 

It returned:

list() 
attr(,"class") 
[1] "XMLNodeSet" 

Expected output:

[[1]] 
<page> 
    <title>AccessibleComputing</title> 
    <ns>0</ns> 
    <id>10</id> 
    <redirect title="Computer accessibility"/> 
    <revision> 
    <id>381202555</id> 
    <parentid>381200179</parentid> 
    <timestamp>2010-08-26T22:38:36Z</timestamp> 
    <contributor> 
     <username>OlEnglish</username> 
     <id>7181920</id> 
    </contributor> 
    <minor/> 
    <comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment> 
    <text xml:space="preserve">#REDIRECT [[Computer accessibility]] {{R from CamelCase}} </text> 
    <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1> 
    <model>wikitext</model> 
    <format>text/x-wiki</format> 
    </revision> 
</page> 

attr(,"class") 
[1] "XMLNodeSet" 

I suppose my XPath query is incorrect: the single 'siteinfo' node seems to break what I have tried. Any suggestions?

You should also post what you have tried and what error you get, to make the question more complete. –

Can you also post the expected output? – agstudy

@agstudy I have updated the question with the information you asked for. –

Answers

2

To parse the file, wrap the pages in a new root tag, so that the document has a single root element:

<pages> 
.... 
</pages> 

Then, using xpathSApply, I can retrieve all the title elements:

library(XML) 
doc = xmlTreeParse('c:/temp/testxml.xml',useInternalNodes=TRUE) 
xpathSApply(doc,'//page/title',xmlValue) 
"AccessibleComputing" "AfghanistanGeography" "AfghanistanPeople" 

You can also use getNodeSet:

getNodeSet(doc,'//page/title') 
[[1]] 
<title>AccessibleComputing</title> 

[[2]] 
<title>AfghanistanGeography</title> 

[[3]] 
<title>AfghanistanPeople</title> 

I think the OP wants the 'page' node whose title element has the value 'AccessibleComputing', although that is not clear from the question. Just getting the 'title' element that contains the value we want does not seem very useful. –

As @geektrader pointed out, I want the page node whose title element has the value AccessibleComputing. Any suggestions? –

+1 This was very helpful for a problem I had. Could you recommend a good tutorial for this? – Shambho

0

If you are looking for a page whose title has the value AccessibleComputing, you should use getNodeSet(doc,'//page[title="AccessibleComputing"]')

If you want any node that has an immediate child node called title whose value is AccessibleComputing, you should use getNodeSet(doc,'//node()[title="AccessibleComputing"]')

library(XML) 

xml <- "<pages><page>\n<title>AccessibleComputing</title>\n<ns>0</ns>\n<id>10</id>\n<redirect title=\"Computer accessibility\" />\n<revision>\n<id>381202555</id>\n<parentid>381200179</parentid>\n<timestamp>2010-08-26T22:38:36Z</timestamp>\n<contributor>\n<username>OlEnglish</username>\n<id>7181920</id>\n</contributor>\n<minor />\n<comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment>\n<text xml:space=\"preserve\"> %InLiNe_IdEnTiFiEr% \"#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>\"\n<sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1>\n<model>wikitext</model>\n<format>text/x-wiki</format>\n</revision>\n</page>\n<page>\n<title>AfghanistanGeography</title>\n<ns>0</ns>\n<id>14</id>\n<redirect title=\"Geography of Afghanistan\" />\n<revision>\n<id>407008307</id>\n<parentid>74466619</parentid>\n<timestamp>2011-01-10T03:56:19Z</timestamp>\n<contributor>\n<username>Graham87</username>\n<id>194203</id>\n</contributor>\n<minor />\n<comment>1 revision from [[:nost:AfghanistanGeography]]: import old edit, see [[User:Graham87/Import]]</comment>\n<text xml:space=\"preserve\"> %InLiNe_IdEnTiFiEr% \"#REDIRECT [[Geography of Afghanistan]] {{R from CamelCase}}</text>\"\n<sha1>0uwuuhiam59ufbu0uzt9lookwtx9f4r</sha1>\n<model>wikitext</model>\n<format>text/x-wiki</format>\n</revision>\n</page>\n<page>\n<title>AfghanistanPeople</title>\n<ns>0</ns>\n<id>15</id>\n<redirect title=\"Demography of Afghanistan\" />\n<revision>\n<id>135089040</id>\n<parentid>74466558</parentid>\n<timestamp>2007-06-01T13:59:37Z</timestamp>\n<contributor>\n<username>RussBot</username>\n<id>279219</id>\n</contributor>\n<minor />\n<comment>Robot: Fixing [[Special:DoubleRedirects|double-redirect]] -&quot;Demographics of Afghanistan&quot; +&quot;Demography of Afghanistan&quot;</comment>\n<text xml:space=\"preserve\"> %InLiNe_IdEnTiFiEr% \"#REDIRECT [[Demography of Afghanistan]] {{R from 
CamelCase}}</text>\"\n<sha1>744dgrl7ef5p53yffn2a989ly1dyr8f</sha1>\n<model>wikitext</model>\n<format>text/x-wiki</format>\n</revision>\n</page></pages>" 


doc = xmlTreeParse(xml, useInternalNodes = TRUE) 


# If you want to get page which has immediate child node called title whose 
# value is 'AccessibleComputing' 
getNodeSet(doc, "//page[title=\"AccessibleComputing\"]") 
## [[1]] 
## <page> 
## <title>AccessibleComputing</title> 
## <ns>0</ns> 
## <id>10</id> 
## <redirect title="Computer accessibility"/> 
## <revision><id>381202555</id><parentid>381200179</parentid><timestamp>2010-08-26T22:38:36Z</timestamp><contributor><username>OlEnglish</username><id>7181920</id></contributor><minor/><comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment><text xml:space="preserve"> %InLiNe_IdEnTiFiEr% "#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>" 
## <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1><model>wikitext</model><format>text/x-wiki</format></revision> 
## </page> 
## 
## attr(,"class") 
## [1] "XMLNodeSet" 



# If you want to get any node which has immediate child node called title whose 
# value is 'AccessibleComputing' 
getNodeSet(doc, "//node()[title=\"AccessibleComputing\"]") 
## [[1]] 
## <page> 
## <title>AccessibleComputing</title> 
## <ns>0</ns> 
## <id>10</id> 
## <redirect title="Computer accessibility"/> 
## <revision><id>381202555</id><parentid>381200179</parentid><timestamp>2010-08-26T22:38:36Z</timestamp><contributor><username>OlEnglish</username><id>7181920</id></contributor><minor/><comment>[[Help:Reverting|Reverted]] edits by [[Special:Contributions/76.28.186.133|76.28.186.133]] ([[User talk:76.28.186.133|talk]]) to last version by Gurch</comment><text xml:space="preserve"> %InLiNe_IdEnTiFiEr% "#REDIRECT [[Computer accessibility]] {{R from CamelCase}}</text>" 
## <sha1>lo15ponaybcg2sf49sstw9gdjmdetnk</sha1><model>wikitext</model><format>text/x-wiki</format></revision> 
## </page> 
## 
## attr(,"class") 
## [1] "XMLNodeSet" 
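
The queries above match because the test document declares no namespace. The full sample.xml in the question declares a default namespace on the mediawiki root (xmlns="http://www.mediawiki.org/xml/export-0.8/"), and in XPath 1.0 an unprefixed name such as page only matches elements in no namespace. That, rather than the siteinfo node, is a likely reason the same query returns an empty node set there. The following is a minimal sketch, assuming the `namespaces` argument of getNodeSet in the XML package; the prefix `w` and the trimmed-down document are illustrative choices, not part of the original file.

```r
library(XML)

# Small stand-in for sample.xml (an assumption for illustration): same root
# element and default namespace, trimmed to two pages.
xml <- paste0(
  '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">',
  '<siteinfo><sitename>Wikipedia</sitename></siteinfo>',
  '<page><title>AccessibleComputing</title><id>10</id></page>',
  '<page><title>History</title><id>13</id></page>',
  '</mediawiki>'
)
doc <- xmlParse(xml, asText = TRUE)

# The prefix "w" is an arbitrary label chosen here; only the URI has to match
# the xmlns declaration on the root element.
ns <- c(w = "http://www.mediawiki.org/xml/export-0.8/")

# Unprefixed names do not match the namespaced elements.
no_ns <- getNodeSet(doc, "//page[title='AccessibleComputing']")

# Prefixing every step with w: selects the page node by its title value.
pages <- getNodeSet(doc, "//w:page[w:title='AccessibleComputing']",
                    namespaces = ns)

# Titles of all pages, using the same prefix mapping.
titles <- xpathSApply(doc, "//w:page/w:title", xmlValue, namespaces = ns)

print(length(no_ns))
print(length(pages))
print(titles)
```

Every step of the path needs the prefix, including names inside the predicate, since title lives in the same namespace as page.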

Thanks @geektrader. The actual schema of the XML file I have is as follows: ... ... ... ... ... Because of the intermediate (but single) 'siteinfo', the pattern "//page[title=\"AccessibleComputing\"]" does not work. Any suggestions? Sorry to bother you. –

Edit the question. Don't put so much code in a comment. –

I have updated the question with the entire sample.xml and the problem I am facing. Any suggestions? I am new to XPath. –
