Comment analyser les "nœuds" multiniveaux dans le texte?

J'ai un format de configuration similaire au format * .sln, prenez donc ce qui suit comme un exemple:Comment analyser les "nœuds" multiniveaux dans le texte?

DCOM Productions Configuration File, Format Version 1.0 

BeginSection:Global 
    GlobalKeyA = AnswerOne 

    .: Stores the global configuration key 
    :: for the application. This key is used 
    :: to save the current state of the app. 
    :: as well as prevent lockups 
    GlobalKey3 = AnswerTwo 

    .: Secondary Key. See above setting 
    GlobalKeyC = AnswerThree 

    BeginSection: UpdateSystem 
     NestedKeyA = One 
     NestedKeyB = Two 
     NestedKeyC = { A set of multiline data 
         where we will show how 
         to write a multiline 
         paragraph } 
     NestedKeyD = System.Int32, 100 
    EndSection 
EndSection 

BeginSection:Application 
    InstallPath = C:\Program Files\DCOM Productions\BitFlex 
EndSection

Je sais que je vais avoir besoin d'une fonction récursive probablement que prend un segment de texte en tant que paramètre ainsi, Par exemple, passez-y une section entière et analysez-la récursivement de cette façon. Je n'arrive pas à comprendre comment faire cela. Chaque section peut potentiellement avoir plus de sections enfants. C'est comme un document Xml .. Je ne demande pas vraiment de code ici, juste une méthodologie sur la façon de procéder pour analyser un document comme celui-ci.

Je pensais à utiliser les onglets (spécifie l'index) pour déterminer avec quelle section je travaille, mais cela échouerait si le document n'était pas correctement formaté. De meilleures pensées?

Source

2009-07-25 David Anderson - DCOM

Peut-être pouvez-vous dessiner un parallèle entre ce format et XML. dire beginsection < ==> "< ouverture>" EndSection < ==> "</fermeture>"

le considérer comme fichier XML avec de nombreux éléments racine. Ce qui est à l'intérieur de BeginSection et EndSection sera votre noeud xml interne avec par exemple NestedKeyA = comme nom de noeud et "One" comme valeur.

.: Semble être un commentaire, donc vous pouvez l'ignorer. System.Int32, 100 - peut être un attribut et une valeur d'un nœud

{Un ensemble de données multilignes où nous montrerons comment écrire un multiligne paragraphe} - vous pouvez venir avec l'algorithme pour analyser ça aussi.

Source

2009-07-25 01:10:47 Sorantis

Ouais, Begin et de EndSection sont essentiellement des noeuds commencent d'arrêt de fin, mais comment pourrais-je faire la différence entre ce qui appartient à EndSection qui beginsection? Je ne pouvais pas saisir le premier, car il pourrait s'agir de la fin d'un nœud imbriqué et non du premier analysé. –

Ecrivez un analyseur qui analyse un objet BeginSection et, s'il rencontre un objet BeginSection dans BeginSection, appelle lui-même le début de la nouvelle sous-section. Passer le résultat en tant que réf de hachage, qui peut être ajouté au hachage dans la fonction d'appel – Sorantis

Bon, merci pour la perspicacité. Je pense que je sais comment y aller maintenant et je suppose que je posterai si j'ai d'autres questions surgissent. Merci! –

Tout à fait, je l'ai fait. * *

ouf

/// <summary> 
/// Reads and parses xdf strings 
/// </summary> 
public sealed class XdfReader { 
    /// <summary> 
    /// Instantiates a new instance of the DCOMProductions.BitFlex.IO.XdfReader class. 
    /// </summary> 
    public XdfReader() { 
     // 
     // TODO: Any constructor code here 
     // 
    } 

    #region Constants 

    /// <devdoc> 
    /// This regular expression matches against a section beginning. A section may look like the following: 
    /// 
    ///  SectionName:Begin 
    ///  
    /// Where 'SectionName' is the name of the section, and ':Begin' represents that this is the 
    /// opening tag for the section. This allows the parser to differentiate between open and 
    /// close tags. 
    /// </devdoc> 
    private const String SectionBeginRegularExpression = @"[0-9a-zA-Z]*:Begin"; 

    /// <devdoc> 
    /// This regular expression matches against a section ending. A section may look like the following: 
    /// 
    ///  SectionName:End 
    ///  
    /// Where 'SectionName' is the name of the section, and ':End' represents that this is the 
    /// closing tag for the section. This allows the parser to differentiate between open and 
    /// close tags. 
    /// </devdoc> 
    private const String SectionEndRegularExpression = @"[0-9a-zA-Z]*:End"; 

    /// <devdoc> 
    /// This regular expression matches against a key and it's value. A key may look like the following: 
    /// 
    ///  KeyName=KeyValue 
    ///  KeyName = KeyValue 
    ///  KeyName =KeyValue 
    ///  KeyName= KeyValue 
    ///  KeyName =  KeyValue 
    ///     
    /// And so on so forth. This regular expression matches against all of these, where the whitespace 
    /// former and latter of the assignment operator are optional. 
    /// </devdoc> 
    private const String KeyRegularExpression = @"[0-9a-zA-Z]*\s*?=\s*?[^\r]*"; 

    #endregion 

    #region Methods 

    public void Flush() { 
     throw new System.NotImplementedException(); 
    } 

    private String GetSectionName(String xdf) { 
     Match sectionMatch = Regex.Match(xdf, SectionBeginRegularExpression); 

     if (sectionMatch.Success) { 
      String retVal = sectionMatch.Value; 
      retVal = retVal.Substring(0, retVal.IndexOf(':')); 
      return retVal; 
     } 
     else { 
      throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid section."); 
     } 
    } 

    public XdfFile ReadFile(String fileName) { 
     throw new System.NotImplementedException(); 
    } 

    public XdfKey ReadKey(String xdf) { 
     Match keyMatch = Regex.Match(xdf, KeyRegularExpression); 

     if (keyMatch.Success) { 
      String name = keyMatch.Value.Substring(0, keyMatch.Value.IndexOf('=')); 
      name = name.TrimEnd(' '); 

      XdfKey retVal = new XdfKey(name); 

      String value = keyMatch.Value.Remove(0, keyMatch.Value.IndexOf('=') + 1); 
      value = value.TrimStart(' '); 

      retVal.Value = value; 
      return retVal; 
     } 
     else { 
      throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid key."); 
     } 
    } 

    public XdfSection ReadSection(String xdf) { 
     if (ValidateSection(xdf)) { 
      String[] rows = xdf.Split(new String[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries); 
      XdfSection rootSection = new XdfSection(GetSectionName(rows[0])); System.Diagnostics.Debug.WriteLine(rootSection.Name); 

      do { 
       Match beginMatch = Regex.Match(xdf, SectionBeginRegularExpression); 
       beginMatch = beginMatch.NextMatch(); 

       if (beginMatch.Success) { 
        Match endMatch = Regex.Match(xdf, String.Format("{0}:End", GetSectionName(beginMatch.Value))); 

        if (endMatch.Success) { 
         String sectionXdf = xdf.Substring(beginMatch.Index, (endMatch.Index + endMatch.Length) - beginMatch.Index); 
         xdf = xdf.Remove(beginMatch.Index, (endMatch.Index + endMatch.Length) - beginMatch.Index); 

         XdfSection section = ReadSection(sectionXdf); System.Diagnostics.Debug.WriteLine(section.Name); 

         rootSection.Sections.Add(section); 
        } 
        else { 
         throw new BitFlex.IO.XdfException(String.Format("There is a missing section ending at index {0}.", endMatch.Index)); 
        } 
       } 
       else { 
        break; 
       } 
      } while (true); 

      MatchCollection keyMatches = Regex.Matches(xdf, KeyRegularExpression); 

      foreach (Match item in keyMatches) { 
       XdfKey key = ReadKey(item.Value); 
       rootSection.Keys.Add(key); 
      } 

      return rootSection; 
     } 
     else { 
      throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid section."); 
     } 
    } 

    private Boolean ValidateSection(String xdf) { 
     String[] rows = xdf.Split(new String[] { "\r\n" }, StringSplitOptions.None); 

     if (Regex.Match(rows[0], SectionBeginRegularExpression).Success) { 
      if (Regex.Match(rows[rows.Length - 1], SectionEndRegularExpression).Success) { 
       return true; 
      } 
      else { 
       return false; 
      } 
     } 
     else { 
      return false; 
     } 
    } 

    #endregion 
}

}

Source

2009-07-28 20:04:29

Comment analyser les "nœuds" multiniveaux dans le texte?

Répondre

Questions connexes