2017-10-06

Using group_by with mutate_if by column name

I'm trying to use mutate_if to perform calculations based on the variable name. For example, if a variable's name contains "demo", compute the mean, and if the name contains "meas", compute the median:

library(tidyverse) 
library(stringr) 

exm_data <- tibble(
    group = sample(letters[1:5], size = 50, replace = TRUE), 
    demo_age = rnorm(50), 
    demo_height = runif(50, min = 48, max = 80), 
    meas_score1 = rnorm(50), 
    meas_score2 = rnorm(50) 
) 
exm_data 
#> # A tibble: 50 x 5 
#> group demo_age demo_height meas_score1 meas_score2 
#> <chr>  <dbl>  <dbl>  <dbl>  <dbl> 
#> 1  a -1.46539563 58.22435 -0.760692567 0.1077901 
#> 2  b 1.90983770 56.57976 0.262933462 -1.0186600 
#> 3  c 0.58502114 66.26322 2.283491647 0.3215542 
#> 4  b -0.97228337 74.82932 2.447551824 -0.4763201 
#> 5  a 0.65814161 72.19627 -0.592671739 -0.0521247 
#> 6  c -0.62133706 75.49976 0.005813255 -0.4195284 
#> 7  b 0.40650836 60.99083 0.809183477 -0.1127530 
#> 8  c -0.48251421 50.94077 -1.171749420 1.7268231 
#> 9  b 1.24476630 71.39803 1.786950340 0.7980217 
#> 10  c -0.09704469 69.52001 -0.511872217 -1.1465523 
#> # ... with 40 more rows 


exm_data %>% 
    mutate_if(str_detect(colnames(.), "demo"), mean) %>% 
    mutate_if(str_detect(colnames(.), "meas"), median) 
#> # A tibble: 50 x 5 
#> group demo_age demo_height meas_score1 meas_score2 
#> <chr>  <dbl>  <dbl>  <dbl>  <dbl> 
#> 1  a -0.03250753 64.31412 -0.09909911 0.1307904 
#> 2  b -0.03250753 64.31412 -0.09909911 0.1307904 
#> 3  c -0.03250753 64.31412 -0.09909911 0.1307904 
#> 4  b -0.03250753 64.31412 -0.09909911 0.1307904 
#> 5  a -0.03250753 64.31412 -0.09909911 0.1307904 
#> 6  c -0.03250753 64.31412 -0.09909911 0.1307904 
#> 7  b -0.03250753 64.31412 -0.09909911 0.1307904 
#> 8  c -0.03250753 64.31412 -0.09909911 0.1307904 
#> 9  b -0.03250753 64.31412 -0.09909911 0.1307904 
#> 10  c -0.03250753 64.31412 -0.09909911 0.1307904 
#> # ... with 40 more rows 

As you can see, this works as expected. However, I want to do these calculations by group, and when I add the group_by statement it breaks:

exm_data %>% 
    group_by(group) %>% 
    mutate_if(str_detect(colnames(.), "demo"), mean) %>% 
    mutate_if(str_detect(colnames(.), "meas"), median) 
#> Error: length(.p) == length(vars) is not TRUE 

Is it possible to use mutate_if on a grouped tibble using column names?

Answer

You can use mutate_at with contains from dplyr, as follows:

library(dplyr) 

exm_data %>% 
    group_by(group) %>% 
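    # Per-group mean for every column whose name contains "demo"
    mutate_at(vars(contains('demo')), mean) %>% 
    # Per-group median for every column whose name contains "meas"
    mutate_at(vars(contains('meas')), median) 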

which gives:

# A tibble: 50 x 5 
# Groups: group [5] 
    group demo_age demo_height meas_score1 meas_score2 
    <chr>  <dbl>  <dbl>  <dbl>  <dbl> 
1  d 0.12916082 60.26550 0.1932882 -0.5356818 
2  b -0.31142894 64.50839 0.3219514 -0.4777860 
3  b -0.31142894 64.50839 0.3219514 -0.4777860 
4  a -0.34373403 64.84180 0.1929516 -0.3821047 
5  a -0.34373403 64.84180 0.1929516 -0.3821047 
6  b -0.31142894 64.50839 0.3219514 -0.4777860 
7  d 0.12916082 60.26550 0.1932882 -0.5356818 
8  a -0.34373403 64.84180 0.1929516 -0.3821047 
9  d 0.12916082 60.26550 0.1932882 -0.5356818 
10  c -0.05963747 59.07845 -0.2395409 -0.4484245 
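
As for why the mutate_if version fails once the data is grouped: the likely cause is that the scoped verbs skip grouping columns, so the logical vector built from colnames(.) has one more element than the set of columns mutate_if is willing to touch. Here is a minimal sketch of that mismatch. The `toy` tibble and the names `p` and `non_group_cols` are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical small data set with the same column layout as exm_data
toy <- tibble(
  group = c("a", "a", "b", "b"),
  demo_age = c(1, 3, 5, 7),
  meas_score1 = c(10, 20, 30, 50)
)
grouped <- group_by(toy, group)

# Logical vector built from all column names: one entry per column,
# including the grouping column `group`
p <- grepl("demo", colnames(grouped))

# Columns left once the grouping column is excluded
non_group_cols <- setdiff(colnames(grouped), group_vars(grouped))

length(p)              # 3
length(non_group_cols) # 2
```

The two lengths differ by the number of grouping columns, which matches the `length(.p) == length(vars)` check in the error message. Selecting by name with vars(contains(...)) avoids building such a vector by hand.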

Bonus: you don't need to load stringr.
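
If you are on a newer dplyr (this assumes version 1.0.0 or later, where across() is available), the same idea can be written inside a single mutate() call. This is only a sketch of an alternative, not a replacement for the answer above, and the `toy` tibble is a made-up stand-in for exm_data so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical small data set with the same column layout as exm_data
toy <- tibble(
  group = c("a", "a", "b", "b"),
  demo_age = c(1, 3, 5, 7),
  meas_score1 = c(10, 20, 30, 50)
)

result <- toy %>%
  group_by(group) %>%
  mutate(
    # Per-group mean for columns whose name contains "demo"
    across(contains("demo"), mean),
    # Per-group median for columns whose name contains "meas"
    across(contains("meas"), median)
  ) %>%
  ungroup()

result$demo_age     # 2 2 6 6
result$meas_score1  # 15 15 40 40
```

Each row ends up holding its own group's summary value, just like the mutate_at output above.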