R package RHmm: hidden Markov model methods for selecting the optimal number of states
I have a vector that I fit with an HMM in an attempt to select the optimal number of states for a hidden Markov model:
x<-c(-0.0961421466,-0.0375458485,0.0681121271,0.0259201028,0.0016780785,0.0311860542,
0.0067940299,0.0126520055,0.0357599812,0.0007679569,0.0409759326,0.0560839083,-0.0272581160,-0.0439501404,0.0321578353,0.0196158110,-0.0097262133,-0.0226182376,0.0119897380,-0.0099522863,-0.0359443106,-0.0039363349,-0.0476283592,-0.0383203835,-0.0518624079,0.0187455678,0.0950535435,0.0057115192,-0.0307805051,-0.0272725295,-0.0254645538,-0.0102565781,-0.0267986024,-0.0482906267,-0.0256826510,-0.0414746754,-0.0470666997,0.0284912760,0.1021992517,0.0875572274,0.0064152031,0.0200731787,-0.0091688456,-0.0575608699,-0.0442028942,-0.0277449185,-0.0115369429,0.0084710328,0.0745290085,0.0159369842,-0.0784550401,-0.0934970644,-0.0978390888,0.0160188869,0.0275268626,-0.0552651617,0.0033928140,0.0468507896,0.0374087653,0.0521167410,-0.0177752833,-0.0592673076,0.0514406681,0.0847486437,0.0738066194,-0.0098354049,-0.0572274292,0.0478305465,0.0096885221,-0.0445535022,-0.0153455265,-0.0105375508,0.0100704249,-0.0035215994,0.0243363762,0.0504443519,0.0570023276,0.0395103033,-0.0612817210,-0.0557737453,-0.0273657697,-0.0220077940,0.0083501817,0.0275081574,0.0323161331,0.0385741087,0.0175820844,-0.0410599399,-0.0071019642,0.0431060115,-0.0107360128,-0.0007280372,0.0360799385,-0.0061620858,0.0164458899,-0.0050461344,-0.0578381588,0.0097198169,0.0027277926,-0.0127642317,
-0.0037062560, -0.0045482803, 0.0367596953, 0.0021176710,-0.0319243533,-0.0194663776,0.0091915981,0.0061495737,-0.0090424506,0.0127655251,0.0161735008,0.0193814765,-0.0208605478,-0.0598025722,0.0022554035,0.0473633792,0.0247213549,-0.0063206694,-0.0201626938,0.0207952819,0.0379032576,0.0151612333,0.0038692090,0.0111271847,0.0497851603,0.0273431360,-0.0172488883,-0.0038909126,0.0264670631,-0.0065249612,-0.0467169856,-0.0255090099,0.0082489658, 0.0352569415,0.0272149172,0.0074228928,-0.0040191315,-0.0170611558,-0.0309531801,-0.0327952044,-0.0239372287,-0.0212792531,-0.0132712774,0.0086866983,-0.0007553260,0.0107026497,0.0065106253,-0.0321813990,-0.0081734233,0.0296845524,0.0268925281,-0.0025994962,-0.0038915206, -0.0126335449,0.0040244308,0.0227324065,0.0114903822,-0.0031516422,0.0031563335,0.0137143092,0.0026222849,0.0035802606,0.0111382363,-0.0008037881, -0.0282458124, 0.0056121633, 0.0254201390,0.0033781147,-0.0166139097,-0.0124559340,0.0088520417,0.0072600174, -0.0050320069,-0.0114740312,-0.0066160556, -0.0042080799, -0.0205501042,0.0027078715, 0.0122158472,-0.0206261771,-0.0267682015,-0.0107602258,0.0088477499,0.0165057256, 0.0106637013,0.0115216769,0.0278296526,0.0026376283,-0.0231543960,-0.0141964203)
library(RHmm)

# train/test partition
nhs <- c(2, 3, 4)                # candidate numbers of states
S <- runif(length(x)) <= 0.66    # random mask: roughly 66% of the points go to training
train <- S
error <- numeric(length(nhs))    # one score per candidate number of states
# log probability of seeing the held-out sequence of observations
for (i in seq_along(nhs)) {
  pred <- vector("list", length(x))
  for (fold in seq_along(x)) {   # every pass reuses the same train/test mask
    fit <- HMMFit(x[train], dis = "NORMAL", nStates = nhs[i],
                  asymptCov = FALSE)
    pred[[fold]] <- forwardBackward(fit, x[!train])
  }
  error[i] <- pred[[fold]]$LLH   # log-likelihood from the last pass
}
nhs[which.max(error)]  # optimal number of hidden states (maximum log-likelihood)
Every time I run this to find the best number of states for the hidden Markov model, I get a different number of states, which I think is because the model is trained on a newly selected random subset each time. This does not happen if I simply fit the model on the whole series:
# score proportional to the probability that the sequence was generated by a given model
nhs <- c(2, 3, 4)
error <- numeric(length(nhs))
for (i in seq_along(nhs)) {
  fit <- HMMFit(x, dis = "NORMAL", nStates = nhs[i], asymptCov = FALSE)
  VitPath <- viterbi(fit, x)
  error[i] <- fit[[3]]           # third element of the fitted object, used as the criterion
}
error[is.na(error)] <- 10000     # penalize fits that returned NA
nhs[which.min(error)]  # optimal number of hidden states (minimum AIC)
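For context on what an information criterion measures here, the following is a minimal sketch of how AIC and BIC are commonly computed from a log-likelihood for a univariate Gaussian HMM. The helper names `n_params_gaussian_hmm` and `info_criteria` are hypothetical, the log-likelihood value in the call is made up, and the parameter count is the textbook one, which is an assumption and may not match what RHmm uses internally.

```r
# Hypothetical helper: free parameters of a univariate Gaussian HMM with K states.
# Assumed count: (K - 1) initial probabilities, K * (K - 1) transition
# probabilities, K means and K variances.
n_params_gaussian_hmm <- function(n_states) {
  (n_states - 1) + n_states * (n_states - 1) + 2 * n_states
}

# Hypothetical helper: AIC and BIC from a log-likelihood.
# llh: log-likelihood of the fitted model, n_obs: number of observations.
info_criteria <- function(llh, n_states, n_obs) {
  k <- n_params_gaussian_hmm(n_states)
  c(AIC = -2 * llh + 2 * k,
    BIC = -2 * llh + k * log(n_obs))
}

# Illustrative call only; llh = 400 is an invented value, not a result from x.
ic <- info_criteria(llh = 400, n_states = 2, n_obs = 200)
ic
```

Both criteria are minimized, and BIC penalizes extra states more heavily than AIC once the series has more than a handful of observations, so the two can point to different numbers of states.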
However, the two approaches give very different results. Which one is better? With the first, I have a model that I can test on new samples. The second gives the best fit on the samples already seen. With the first approach, if I repeat the test, the train/test split changes (it is random) and the resulting number of states changes too. In that case, what train/test proportion should I use to be confident that the chosen number of states will generalize?
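On the changing partition, here is a minimal base R sketch of two ways to make the split repeatable. Fixing the seed reproduces the same random mask on every run, and a contiguous split keeps the observations in their original order, which may matter for a sequence model. The helper `split_series` is hypothetical, the seed value 42 is arbitrary, and the toy series `1:10` stands in for `x`.

```r
# Hypothetical helper: contiguous train/test split that preserves time order.
split_series <- function(x, train_frac = 0.66) {
  n_train <- floor(length(x) * train_frac)
  stopifnot(n_train >= 1, n_train < length(x))  # both parts must be non-empty
  list(train = x[seq_len(n_train)],
       test  = x[seq(n_train + 1, length(x))])
}

# Same seed, same random mask: the partition no longer changes between runs.
set.seed(42)                     # arbitrary seed, chosen only for illustration
mask_a <- runif(10) <= 0.66
set.seed(42)
mask_b <- runif(10) <= 0.66
identical(mask_a, mask_b)

# Contiguous split on a toy series.
parts <- split_series(1:10)
parts$train
parts$test
```

A fixed seed only makes one particular split repeatable; it does not show whether the chosen number of states is stable across other splits.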
What additional methods could I use to choose an optimal number of states?
Many thanks
This would get more attention on stats.stackexchange.com – RockScience
Thanks, I have also posted there. – Barnaby