Creating a tokenization program

I need help. I have to write a tokenization program. I load a text file and split it into tokens, but I also need to display the start and end position of each word and the word's length (relative to the text file). I would be very grateful for any help. I have been trying to do this for the last 3 days without success; here is what I have so far:

import java.util.StringTokenizer;
import java.io.*;

public class Tokenizer1 {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        BufferedReader br = new BufferedReader(new FileReader("C://text.txt"));
        FileWriter fw = new FileWriter("C://result.txt");
        PrintWriter pw = new PrintWriter(fw); // result.txt is opened but not written to yet

        // Only the first line of the file is read here
        String line = br.readLine();

        // Print each space-separated token
        StringTokenizer st = new StringTokenizer(line, " ");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }

        // Attempt at tracking where each token starts (assumes exactly one space between tokens)
        String[] tokens = line.split(" ");
        int tokenStartIndex = 0;
        for (String token : tokens) {
            System.out.println("token: " + token + ", tokenStartIndex: " + tokenStartIndex);
            tokenStartIndex += token.length() + 1;
        }

        pw.close();
        br.close();
    }
}


2016-10-16 Lana

What is your actual question or problem? –

Try this one if you don't need to process the file line by line:

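import java.io.FileInputStream;
import java.io.IOException;
import java.util.StringTokenizer;

public class Tokenizer1 {

    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("C:/text.txt");
        StringBuilder sb = new StringBuilder();

        int c;
        // Byte-by-byte read; the (char) cast is only correct for single-byte (e.g. ASCII) text
        while ((c = fis.read()) != -1) {
            sb.append((char) c);
        }
        fis.close();

        String text = sb.toString();
        System.out.println(text);
        System.out.println("---------------------");

        // OPTION 1: using String.split method
        int searchFrom = 0;
        for (String t : text.split("[\\s,]+")) {
            if (t.isEmpty()) continue; // split yields a leading "" when the text starts with a delimiter
            // Find the token in the original text so START also counts the skipped delimiters
            int start = text.indexOf(t, searchFrom);
            System.out.println("START: " + start + "\tLENGTH: " + t.length() + "\tWORD: " + t);
            searchFrom = start + t.length();
        }

        System.out.println("---------------------");

        // OPTION 2: using StringTokenizer class (the space must be part of the delimiter set)
        searchFrom = 0;
        StringTokenizer st = new StringTokenizer(text, " ,\t\n\f\r");
        while (st.hasMoreTokens()) {
            String next = st.nextToken();
            int start = text.indexOf(next, searchFrom);
            System.out.println("START: " + start + "\tLENGTH: " + next.length() + "\tWORD: " + next);
            searchFrom = start + next.length();
        }
    }
}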
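The byte-by-byte loop above casts each byte to char, which is only safe for single-byte encodings, and it does not print the end position the question asks for. The following is a minimal sketch, assuming the file is UTF-8 and Java 7+ (java.nio.file), that reads the whole file at once and reports an exclusive END so that END - START == LENGTH. The class name WholeFileTokens and the helper describe are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WholeFileTokens {

    // Hypothetical helper: describes every token separated by whitespace or commas.
    // END is exclusive, so END - START == LENGTH.
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        int searchFrom = 0;
        for (String t : text.split("[\\s,]+")) {
            if (t.isEmpty()) continue; // a leading delimiter produces an empty first element
            int start = text.indexOf(t, searchFrom); // first match after the previous token
            int end = start + t.length();
            out.add("START: " + start + "\tEND: " + end + "\tLENGTH: " + t.length() + "\tWORD: " + t);
            searchFrom = end;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumes a UTF-8 encoded file; the default path is the one used in the answer
        String path = args.length > 0 ? args[0] : "C:/text.txt";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String row : describe(text)) {
            System.out.println(row);
        }
    }
}
```

For the input "Hello, world" this reports Hello at 0 to 5 and world at 7 to 12; the comma and the space are counted in the positions even though they are not tokens.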

If you need to process the file line by line, you may want to try this one:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class Tokenizer1 {

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("C:/text.txt"));

        String line;
        int lineNumber = -1;
        while ((line = br.readLine()) != null) {
            ++lineNumber;
            System.out.println("\nLINE: " + lineNumber);
            int searchFrom = 0; // START values below are columns within the current line

            // OPTION 1: using String.split method
            /*for (String content : line.split("[\\s,]+")) {
                if (content.isEmpty()) continue;
                int start = line.indexOf(content, searchFrom);
                System.out.println("\tSTART: " + start + "\tLENGTH: " + content.length() + "\tWORD: " + content);
                searchFrom = start + content.length();
            }*/

            // OPTION 2: using StringTokenizer class on the current line only
            StringTokenizer st = new StringTokenizer(line, " ,\t\n\f\r");
            while (st.hasMoreTokens()) {
                String next = st.nextToken();
                int start = line.indexOf(next, searchFrom);
                System.out.println("\tSTART: " + start + "\tLENGTH: " + next.length() + "\tWORD: " + next);
                searchFrom = start + next.length();
            }
        }
        br.close();
    }
}
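When reading line by line, readLine() strips the line terminator, so positions restart at 0 on every line. If the positions must be relative to the whole file, one way is to keep a running offset of where each line begins. This is a sketch under the assumption that every line ends with a single '\n' (a file with "\r\n" endings would be off by one per line); LineOffsets and describe are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LineOffsets {

    // Hypothetical helper: tokenizes line by line, but reports START/END (END exclusive)
    // as offsets from the beginning of the whole input, not from the start of each line.
    // Assumes every line is terminated by a single '\n'.
    static List<String> describe(BufferedReader br) {
        List<String> out = new ArrayList<>();
        int lineOffset = 0; // offset of the current line's first character in the input
        int lineNumber = -1;
        try {
            String line;
            while ((line = br.readLine()) != null) {
                ++lineNumber;
                int searchFrom = 0;
                StringTokenizer st = new StringTokenizer(line, " ,\t\n\f\r");
                while (st.hasMoreTokens()) {
                    String next = st.nextToken();
                    int col = line.indexOf(next, searchFrom); // position within this line
                    searchFrom = col + next.length();
                    out.add("LINE: " + lineNumber + "\tSTART: " + (lineOffset + col)
                            + "\tEND: " + (lineOffset + searchFrom) + "\tWORD: " + next);
                }
                lineOffset += line.length() + 1; // +1 for the '\n' that readLine() strips
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:/text.txt";
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            describe(br).forEach(System.out::println);
        }
    }
}
```

For the input "one two\nthree", the word three is reported at 8 to 13, which matches its index in the full text rather than column 0 of line 1.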

I hope this helps.


2016-10-16 11:01:26

Yes, it works well, thank you very much. But is it possible to tokenize the file using the StringTokenizer class? – Lana

It is possible, but StringTokenizer is a legacy class: "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead." http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html –
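Following that recommendation, java.util.regex makes the position bookkeeping unnecessary: Matcher.start() and Matcher.end() already give the inclusive start and exclusive end of each match. This is a minimal sketch, assuming a token is any run of characters that is neither whitespace nor a comma (the same definition as the split("[\\s,]+") version) and that the file is UTF-8; RegexTokens and describe are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {

    // A token is a maximal run of characters that are neither whitespace nor commas
    private static final Pattern TOKEN = Pattern.compile("[^\\s,]+");

    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // start() is inclusive and end() is exclusive, so no manual offset tracking is needed
            out.add("START: " + m.start() + "\tEND: " + m.end()
                    + "\tLENGTH: " + (m.end() - m.start()) + "\tWORD: " + m.group());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumes a UTF-8 encoded file; the default path is the one used in the answer
        String path = args.length > 0 ? args[0] : "C:/text.txt";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        describe(text).forEach(System.out::println);
    }
}
```

Changing the token definition only means changing the pattern, for example "\\w+" to keep letters, digits and underscores and drop all punctuation.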

I have also edited the answer to include the StringTokenizer solutions. –
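If you do want to stay with StringTokenizer, its three-argument constructor with returnDelims = true returns each delimiter character as a one-character token. Summing the lengths of all tokens then gives each word's exact position without any indexOf searching. This is a sketch assuming the delimiter set " ,\t\n\f\r" used in the answer; DelimTokens and describe are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimTokens {

    private static final String DELIMS = " ,\t\n\f\r";

    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        // returnDelims = true: every delimiter character comes back as its own one-char token,
        // so the running sum of token lengths is the exact position in the text.
        StringTokenizer st = new StringTokenizer(text, DELIMS, true);
        int pos = 0;
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            boolean isDelimiter = tok.length() == 1 && DELIMS.indexOf(tok.charAt(0)) >= 0;
            if (!isDelimiter) {
                out.add("START: " + pos + "\tEND: " + (pos + tok.length())
                        + "\tLENGTH: " + tok.length() + "\tWORD: " + tok);
            }
            pos += tok.length(); // advance past words and delimiters alike
        }
        return out;
    }

    public static void main(String[] args) {
        describe("Hello, world").forEach(System.out::println);
    }
}
```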
