2016-10-22

NLTK spell checker does not work correctly

I want to check the spelling of a sentence in Python using NLTK. The built-in spell checker is not working correctly: it reports "with" and "and" as misspellings.

import nltk
from nltk.corpus import wordnet as WN

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        # A token with no WordNet synsets is reported as misspelled
        if not WN.synsets(strip):
            print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(text):
    return "".join(c for c in text if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "
noPunct = removePunct(l.lower())
if(SpellChecker(noPunct)):
    print(l)
    print(noPunct)

Can anyone tell me why?

Answer

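It reports those words as misspelled because they are stop words, which are not contained in WordNet (check the FAQs). WordNet covers only nouns, verbs, adjectives, and adverbs, so function words such as "with" and "and" have no synsets.

So you can use the stop-word list from the NLTK corpus to check for such words as well.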

# Add these lines (the stopwords import and the stop-word set are new).
# The NLTK data for the tokenizer, WordNet, and stopwords must be installed,
# e.g. via nltk.download().
import nltk
from nltk.corpus import wordnet as WN
from nltk.corpus import stopwords
stop_words_en = set(stopwords.words('english'))

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            if strip in stop_words_en:  # <--- Check whether it's in the stop-word list
                print("No mistakes :" + i)
            else:
                print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(text):
    return "".join(c for c in text if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "

noPunct = removePunct(l.lower())
# SpellChecker prints its results and returns None, so call it directly
# instead of using it as an `if` condition (that branch would never run).
SpellChecker(noPunct)
print(l)
print(noPunct)
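
As a side note, the same two-step check (lexicon first, then stop-word list) can be written so that it returns the suspicious tokens instead of printing them, which makes the result usable in a condition. The following is only a minimal sketch: `find_misspellings`, `in_lexicon`, `toy_lexicon`, and `toy_stop_words` are hypothetical names introduced here, and the toy sets stand in for WordNet and the NLTK stop-word corpus so the example runs without any NLTK data.

```python
from typing import Callable, Iterable, List, Set


def find_misspellings(
    tokens: Iterable[str],
    in_lexicon: Callable[[str], bool],
    stop_words: Set[str],
) -> List[str]:
    """Return the tokens found in neither the lexicon nor the stop-word set."""
    misspelled = []
    for token in tokens:
        word = token.strip()
        # A word is accepted if the lexicon knows it OR it is a stop word.
        if in_lexicon(word) or word in stop_words:
            continue
        misspelled.append(word)
    return misspelled


# Toy stand-ins (assumptions for illustration, not real corpora).
toy_lexicon = {"movie", "plot", "acting", "poor"}
toy_stop_words = {"the", "and", "with", "was"}

sentence = "the acting was poor and the plot was thinn".split()
result = find_misspellings(sentence, lambda w: w in toy_lexicon, toy_stop_words)
print(result)  # ['thinn']
```

With NLTK, the same function could presumably be called with `nltk.word_tokenize(noPunct)` as the tokens, `lambda w: bool(WN.synsets(w))` as `in_lexicon`, and `stop_words_en` as `stop_words`. Because it returns a list, `if find_misspellings(...):` is true exactly when at least one token was flagged.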