2017-06-28

I have looked at this but I can't get it to work. How can I run a chi-squared test on a data frame in R?

Here is my tibble:

Source: local data frame [10 x 4]
Groups: word [10]

        word Detractor Passive Promoter
*      <chr>     <int>   <int>    <int>
1  broadband       833     766      507
2       call       441     348      118
3    cheaper       641     949      182
4   customer      1563    1128      758
5   internet       297     277      195
6       line       389     392      182
7      price      1022    1212      549
8   reliable       230     316      743
9    service      1546    1231     2119
10     speed       262     228      194

Here is what I tried:

csv %>%
     select(word, NPS_Level, total_word_count_by_cust) %>%
     spread(NPS_Level, total_word_count_by_cust) %>%
     rowwise() %>%
     mutate(
       test_stat = chisq.test(c(word, Detractor))$statistic,
       p_val = chisq.test(c(word, Detractor))$p.value
     )

I get the following error:

Error in mutate_impl(.data, dots) : invalid 'type' (character) of argument 
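The error most likely comes from c(word, Detractor): inside rowwise(), word is a single string, so c() coerces the pair to a character vector, and chisq.test() fails when it tries to sum those entries. A minimal base-R reproduction, using the first row of the data purely as an illustration (the names x and res are hypothetical):

```r
# c() with a string and an integer coerces everything to character
x <- c("broadband", 833L)
print(class(x))  # "character"

# chisq.test() tries to sum the entries of x; sum() rejects a character
# vector, so we catch the error and keep only its message
res <- tryCatch(
  chisq.test(x),
  error = function(e) conditionMessage(e)
)
print(res)
```

Even with numbers only, calling chisq.test() on one row at a time would be a goodness-of-fit test of that row against equal proportions, not the test of association across the whole table that the Minitab output below reports.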

Here is the dput() output:

structure(list(word = c("broadband", "call", "cheaper", "customer", 
"internet", "line", "price", "reliable", "service", "speed"), 
    Detractor = c(833L, 441L, 641L, 1563L, 297L, 389L, 1022L, 
    230L, 1546L, 262L), Passive = c(766L, 348L, 949L, 1128L, 
    277L, 392L, 1212L, 316L, 1231L, 228L), Promoter = c(507L, 
    118L, 182L, 758L, 195L, 182L, 549L, 743L, 2119L, 194L)), .Names = c("word", 
"Detractor", "Passive", "Promoter"), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, 10L), vars = list(
    word), drop = TRUE, indices = list(0L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L), group_sizes = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
    word = c("broadband", "call", "cheaper", "customer", "internet", 
    "line", "price", "reliable", "service", "speed")), class = "data.frame", row.names = c(NA, 
-10L), vars = list(word), drop = TRUE, .Names = "word")) 

Is there a way to show the results for all the variables the way Minitab does? For example:

Chi-Square Test for Association: word, Worksheet columns

Rows: word   Columns: Worksheet columns

          Detractor  Passive  Promoter    All

broadband       833      766       507   2106
              775.5    735.0     595.5
              4.263    1.305    13.145

call            441      348       118    907
              334.0    316.6     256.5
             34.288    3.123    74.749

cheaper         641      949       182   1772
              652.5    618.5     501.0
              0.203  176.664   203.145

customer       1563     1128       758   3449
             1270.0   1203.8     975.2
             67.579    4.768    48.378

internet        297      277       195    769
              283.2    268.4     217.4
              0.675    0.276     2.315

line            389      392       182    963
              354.6    336.1     272.3
              3.335    9.296    29.939

price          1022     1212       549   2783
             1024.8    971.3     786.9
              0.008   59.642    71.921

reliable        230      316       743   1289
              474.7    449.9     364.5
            126.103   39.842   393.147

service        1546     1231      2119   4896
             1802.9   1708.8    1384.3
             36.598  133.590   389.870

speed           262      228       194    684
              251.9    238.7     193.4
              0.407    0.482     0.002

All            7224     6847      5547  19618

Cell Contents:  Count
                Expected count
                Contribution to Chi-square


Pearson Chi-Square = 1929.058, DF = 18, P-Value = 0.000
Likelihood Ratio Chi-Square = 1898.013, DF = 18, P-Value = 0.000
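Base R has no single function that prints this Minitab layout, but the pieces are in the object returned by chisq.test(). A sketch, assuming the counts from the dput() output above: observed and expected counts come from the result, the per-cell contribution is (O - E)^2 / E, and the likelihood-ratio line is computed by hand as G = 2 * sum(O * log(O / E)) with the same degrees of freedom. The variable names (counts, contrib, g_stat, and so on) are only illustrative. On these counts it should reproduce the Pearson statistic and DF = 18 shown above.

```r
# Counts from the question's dput() output, filled column by column
counts <- matrix(
  c(833, 441, 641, 1563, 297, 389, 1022, 230, 1546, 262,    # Detractor
    766, 348, 949, 1128, 277, 392, 1212, 316, 1231, 228,    # Passive
    507, 118, 182, 758, 195, 182, 549, 743, 2119, 194),     # Promoter
  ncol = 3,
  dimnames = list(
    word = c("broadband", "call", "cheaper", "customer", "internet",
             "line", "price", "reliable", "service", "speed"),
    NPS = c("Detractor", "Passive", "Promoter")
  )
)

xsq <- chisq.test(counts)

observed <- xsq$observed
expected <- xsq$expected
contrib  <- (observed - expected)^2 / expected   # contribution of each cell

# Likelihood-ratio (G) statistic, using the same df as the Pearson test
g_stat <- 2 * sum(observed * log(observed / expected))
g_df   <- as.integer(xsq$parameter)
g_p    <- pchisq(g_stat, df = g_df, lower.tail = FALSE)

print(addmargins(observed))   # counts with row/column totals
print(round(expected, 1))     # expected counts
print(round(contrib, 3))      # contributions to the chi-square

cat(sprintf("Pearson Chi-Square = %.3f, DF = %d, P-Value = %.3g\n",
            xsq$statistic, g_df, xsq$p.value))
cat(sprintf("Likelihood Ratio Chi-Square = %.3f, DF = %d, P-Value = %.3g\n",
            g_stat, g_df, g_p))
```

addmargins() adds the "Sum" row and column that play the role of Minitab's "All" margins. chisq.test() applies its continuity correction only to 2 x 2 tables, so for this 10 x 3 table the statistic is the plain Pearson chi-square.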

Please give a reproducible example. – Jimbou


@Jimbou I added the 'dput()' output – Shery


'rownames(dat) = dat$word; chisq.test(dat[-1])' gives the counts/expected counts. – user20650
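The comment's suggestion, sketched end to end on the dput() data rebuilt as a plain data.frame (the name dat follows the comment). Besides the statistic, the returned object carries the observed counts, the expected counts and the Pearson residuals:

```r
# The question's counts as a plain data frame (no tibble, no grouping)
dat <- data.frame(
  word      = c("broadband", "call", "cheaper", "customer", "internet",
                "line", "price", "reliable", "service", "speed"),
  Detractor = c(833L, 441L, 641L, 1563L, 297L, 389L, 1022L, 230L, 1546L, 262L),
  Passive   = c(766L, 348L, 949L, 1128L, 277L, 392L, 1212L, 316L, 1231L, 228L),
  Promoter  = c(507L, 118L, 182L, 758L, 195L, 182L, 549L, 743L, 2119L, 194L)
)

rownames(dat) <- dat$word     # keep the words as row labels
res <- chisq.test(dat[-1])    # drop the character column so only counts remain

print(res)                    # statistic, df and p-value
print(res$observed)           # observed counts, labelled by word
print(round(res$expected, 1)) # expected counts under independence
print(round(res$residuals, 2))# Pearson residuals (O - E) / sqrt(E)
```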

Answer


Thanks to the comments, here is how I did it:

To run the chi-squared test we need a table made only of numbers, with rows and columns of counts, so the word column has to become the row names. A tibble does not support row names, so it must be converted to a data frame before calling chisq.test().

library(dplyr)
library(tidyr)

m <- csv %>%
     select(word, NPS_Level, total_word_count_by_cust) %>%
     spread(NPS_Level, total_word_count_by_cust) %>%
     as.data.frame()          # tibbles don't keep row names, so convert first

rownames(m) <- m$word         # move the words into the row names

Xsq <- chisq.test(m[-1])      # exclude the word column, now stored as row names
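As an alternative that avoids spread() and row names altogether, base R's xtabs() can build the contingency table straight from long-format data. The sketch below assumes csv has the columns used in the question (word, NPS_Level, total_word_count_by_cust); the small data frame long is a hypothetical stand-in built from two words of the question's counts. If there are several rows per word and NPS level, xtabs() sums them.

```r
# Hypothetical long-format stand-in for `csv` (two words only)
long <- data.frame(
  word = rep(c("broadband", "call"), each = 3),
  NPS_Level = rep(c("Detractor", "Passive", "Promoter"), times = 2),
  total_word_count_by_cust = c(833, 766, 507, 441, 348, 118)
)

# xtabs() sums the count column into a word x NPS_Level table,
# so there is no reshaping step and no row names to manage
tab <- xtabs(total_word_count_by_cust ~ word + NPS_Level, data = long)
print(tab)

print(chisq.test(tab))
```

On the full data, chisq.test(xtabs(total_word_count_by_cust ~ word + NPS_Level, data = csv)) should give the same statistic as Xsq, since it tests the same 10 x 3 table.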