2017-07-24

POS tagging for each record of a data frame in R

Task Response 

1 NA 
2 NA 
3 EFFICACY 
4 I was sent to external vendor for solution (PDA parts), but at PDA parts they identified within few minites that new battery would not solve the issue. I wonder why this diagnosis part could no have been done at the locla IS service in the Amgen office. Now I spent time to visit PDA parts at their place, while this finally did not bring any solution. 
5 Issue could not be resolved 

The 2 columns are Task and Response, and Response has some NA values.

Now I want to create the POS tagging for each record and extract only the nouns.

The POS tagging produced for the 5 records should look like this:

Task POSTagged 
1  NA/NNP 
2  NA/NNP 
3  EFFICACY/NNP 
4  vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN 
5  Issue/NN 

So the result should be a matrix with 2 columns and 5 records.

I am using this function:

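library(NLP)      # as.String(), annotate()
library(openNLP)  # Maxent_* annotators

# Returns the POS tags (tags only, not word/tag pairs) of x as one string.
tagPOS <- function(x) {
    s <- as.String(x)

    # Sentence and word boundaries must be annotated before POS tagging.
    sent_token_annotator <- Maxent_Sent_Token_Annotator()
    word_token_annotator <- Maxent_Word_Token_Annotator()
    a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
    pos_tag_annotator <- Maxent_POS_Tag_Annotator()
    a3 <- annotate(s, pos_tag_annotator, a2)
    a3w <- subset(a3, type == "word")
    POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
    gc()  # free memory between calls
    return(paste(POStags, collapse = " "))
}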

I tried lapply, with, and looping over the records, but every approach gives the combined POS-tagged output of all 5 records against each record.
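One likely cause, offered as an assumption rather than a confirmed diagnosis: `as.String()` joins a character vector of length greater than one into a single string, so passing the whole `df2$Response` column to `tagPOS()` tags all records at once. The base R sketch below imitates that behaviour with `paste(..., collapse = "\n")`; `fake_tag()` and the `responses` vector are hypothetical stand-ins and not part of openNLP.

```r
# Hypothetical stand-in for tagPOS(): collapses its input into one string
# (as as.String() is assumed to do) and appends a dummy tag to every word.
fake_tag <- function(x) {
  s <- paste(x, collapse = "\n")
  words <- strsplit(s, "[[:space:]]+")[[1]]
  paste(paste0(words, "/X"), collapse = " ")
}

responses <- c("NA", "EFFICACY", "Issue could not be resolved")

# Whole column in one call: a single combined string comes back.
combined <- fake_tag(responses)

# One call per element: one result per record.
per_record <- vapply(responses, fake_tag, character(1), USE.NAMES = FALSE)

length(combined)    # 1
length(per_record)  # 3
```

If this is what happens, any construct that evaluates `tagPOS(df2$Response)` once and reuses the value will repeat the same combined string on every row.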

That is, for each record I get the POS-tagged output as:

NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN 

What I get is:

Task Response 
1 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN 
2 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN 

3 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN 

4 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN 

5 NA/NNP NA/NNP EFFICACY/NNP vendor/NN solution/NN PDA/NN parts/NNS PDA/NNP parts/NNS minites/NNS battery/NN issue/NN diagnosis/NN part/NN locla/NN service/NN Amgen/NNP office/NN time/NN PDA/NNP parts/NNS place/NN solution/NN Issue/NN 

This is not what I am looking for. The code I have tried:

lapply(df2$Task, tagPOS (df2$Response), data = df2) 
resultset <- group_by(df2, Task) %>% do(tagPOS (df2$Response)) 
df2[,c("Keywords"):= tagPOS(strip(df2$Response)),by = Task] 
Responsedf<-lapply(Response, extractPOS, "NN") 
df2$noun <- with(df2, extractPOS(df2$Response, "NN")) 

But nothing has worked so far. I hope this makes sense.

Any suggestion would be appreciated.

Answer


Found the solution:

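# R vectors are 1-indexed, so iterate over seq_len(nrow(df2)) rather than 0:nrow(df2).
for (i in seq_len(nrow(df2))) {
    # One record per call; extractPOS() is not defined in this post.
    df2$noun[i] <- lapply(df2$short_description[i], extractPOS, "NN")
    gc()
}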
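The same per-record idea can also be written without an explicit loop. The sketch below is an illustration under stated assumptions: `keep_tags()` is a hypothetical helper that filters already tagged `word/TAG` text, standing in for the `extractPOS()` call above (whose definition is not shown), and `tagged` is made-up sample input.

```r
# Hypothetical helper: keep only the word/TAG tokens whose tag starts
# with `prefix` (e.g. "NN" matches NN, NNS and NNP).
keep_tags <- function(tagged, prefix = "NN") {
  tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
  tags <- sub("^.*/", "", tokens)
  paste(tokens[startsWith(tags, prefix)], collapse = " ")
}

# Made-up sample input, one tagged string per record.
tagged <- c("EFFICACY/NNP",
            "Issue/NN could/MD not/RB be/VB resolved/VBN")

# vapply() calls keep_tags() once per element, so each record keeps
# its own result instead of sharing one combined string.
nouns <- vapply(tagged, keep_tags, character(1), USE.NAMES = FALSE)
nouns
```

With a real tagger the same pattern would apply to a data frame column, provided the function returns a single string per record; `vapply()` then fails loudly if a call returns anything else.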

Thanks.