Add a conditional group identifier using cumulative functions
I have a data structure that contains subsequences (groups of rows), and the condition for identifying these subsequences is to watch for a jump in the diff column. This is what the data looks like:
> dput(test)
structure(list(vid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = "2a38ebc2-dd97-43c8-9726-59c247854df5", class = "factor"),
events = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L,
2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L), .Label = c("click",
"mousedown", "mousemove", "mouseup"), class = "factor"),
deltas = structure(6:25, .Label = c("154875", "154878", "154880",
"155866", "155870", "38479", "38488", "38492", "38775", "45595",
"45602", "45606", "45987", "50280", "50285", "50288", "50646",
"54995", "55001", "55005", "55317", "59528", "59533", "59537",
"59921", "63392", "63403", "63408", "63822", "66706", "66710",
"66716", "67002", "73750", "73755", "73759", "74158", "77999",
"78003", "78006", "78076", "81360", "81367", "81371", "82381",
"93365", "93370", "93374", "93872"), class = "factor"),
serial = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20), diff = c(0, 9, 4, 283, 6820, 7, 4, 381, 4293, 5, 3, 358, 4349, 6, 4,
312, 4211, 5, 4, 384)),
.Names = c("vid", "events", "deltas", "serial", "diff"),
row.names = c(NA, 20L), class = "data.frame")
I am trying to add a column that marks when a new subsequence is identified and assigns the whole subsequence a unique identifier. I will illustrate the grouping criterion with the following example:
The diff value in row 5 is 6820, which is more than 10 times the maximum value up to that row (283). The result should look something like this data frame:
structure(list(vid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = "2a38ebc2-dd97-43c8-9726-59c247854df5", class = "factor"),
events = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L,
2L, 4L, 1L, 3L, 2L, 4L, 1L, 3L, 2L, 4L, 1L), .Label = c("click",
"mousedown", "mousemove", "mouseup"), class = "factor"),
deltas = structure(6:25, .Label = c("154875", "154878", "154880",
"155866", "155870", "38479", "38488", "38492", "38775", "45595",
"45602", "45606", "45987", "50280", "50285", "50288", "50646",
"54995", "55001", "55005", "55317", "59528", "59533", "59537",
"59921", "63392", "63403", "63408", "63822", "66706", "66710",
"66716", "67002", "73750", "73755", "73759", "74158", "77999",
"78003", "78006", "78076", "81360", "81367", "81371", "82381",
"93365", "93370", "93374", "93872"), class = "factor"), serial = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20),
diff = c(0, 9, 4, 283, 6820, 7, 4, 381, 4293, 5,
3, 358, 4349, 6, 4, 312, 4211, 5, 4, 384),
group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)),
.Names = c("vid", "events", "deltas", "serial", "diff", "group"),
row.names = c(NA, 20L), class = "data.frame")
Any help greatly appreciated.
How about 'df$group <- cumsum(df$diff > 500) + 1' (or whatever threshold you specify). – Gopala
It works! But I don't understand why :-) cumsum just keeps growing as R goes down the rows of df?! I don't see how this works, but –
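To see why the cumsum suggestion works, it may help to split the one-liner into steps. The following is a minimal sketch, not the commenter's own explanation: it copies the diff values from the question into a plain vector, and the names diffs, is_start, and expected are introduced here only for illustration. The threshold of 500 is the one from the comment and is an assumption about the data, not a general rule.

```r
# diff column copied from the question's data frame
diffs <- c(0, 9, 4, 283, 6820, 7, 4, 381, 4293, 5,
           3, 358, 4349, 6, 4, 312, 4211, 5, 4, 384)

# Step 1: the comparison is vectorised, so it yields one TRUE/FALSE per row.
# TRUE marks a row whose diff exceeds the (assumed) threshold of 500.
is_start <- diffs > 500
which(is_start)

# Step 2: cumsum() treats TRUE as 1 and FALSE as 0, so the running total
# goes up by one at each TRUE row and stays flat on the rows in between.
# Adding 1 makes the first group start at 1 instead of 0.
group <- cumsum(is_start) + 1

# Compare with the group column given in the question's expected output.
expected <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5)
all(group == expected)
```

The running total only ever increases, but it increases exactly at the rows that open a new subsequence, so every row inherits the count of jumps seen so far, which is its group number.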