Gradient vanishes when using batch normalization in Caffe
I am running into problems when using batch normalization in Caffe. Here is the code I used in train_val.prototxt.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "conv0"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.0589
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    engine: CUDNN
  }
}
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv16"
  type: "Convolution"
  bottom: "conv1"
  top: "conv16"
  param {
    lr_mult: 1
    decay_mult: 1
  }
However, training does not converge. If I remove the BN layers (BatchNorm + Scale), training does converge, so I started comparing the log files with and without the BN layers. Here are the log files, produced with DEBUG_INFO = true:
BN:
I0804 10:22:42.074671 8318 net.cpp:638] [Forward] Layer loadtestdata, top blob data data: 0.368457
I0804 10:22:42.074757 8318 net.cpp:638] [Forward] Layer loadtestdata, top blob label data: 0.514496
I0804 10:22:42.076117 8318 net.cpp:638] [Forward] Layer conv0, top blob conv0 data: 0.115678
I0804 10:22:42.076200 8318 net.cpp:650] [Forward] Layer conv0, param blob 0 data: 0.0455077
I0804 10:22:42.076273 8318 net.cpp:650] [Forward] Layer conv0, param blob 1 data: 0
I0804 10:22:42.076539 8318 net.cpp:638] [Forward] Layer relu0, top blob conv0 data: 0.0446758
I0804 10:22:42.078435 8318 net.cpp:638] [Forward] Layer conv1, top blob conv1 data: 0.0675479
I0804 10:22:42.078516 8318 net.cpp:650] [Forward] Layer conv1, param blob 0 data: 0.0470226
I0804 10:22:42.078589 8318 net.cpp:650] [Forward] Layer conv1, param blob 1 data: 0
I0804 10:22:42.079108 8318 net.cpp:638] [Forward] Layer bnorm1, top blob conv1 data: 0
I0804 10:22:42.079197 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 0 data: 0
I0804 10:22:42.079270 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 1 data: 0
I0804 10:22:42.079350 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 2 data: 0
I0804 10:22:42.079421 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 3 data: 0
I0804 10:22:42.079505 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 4 data: 0
I0804 10:22:42.080267 8318 net.cpp:638] [Forward] Layer scale1, top blob conv1 data: 0
I0804 10:22:42.080345 8318 net.cpp:650] [Forward] Layer scale1, param blob 0 data: 1
I0804 10:22:42.080418 8318 net.cpp:650] [Forward] Layer scale1, param blob 1 data: 0
I0804 10:22:42.080651 8318 net.cpp:638] [Forward] Layer relu1, top blob conv1 data: 0
I0804 10:22:42.082074 8318 net.cpp:638] [Forward] Layer conv16, top blob conv16 data: 0
I0804 10:22:42.082154 8318 net.cpp:650] [Forward] Layer conv16, param blob 0 data: 0.0485365
I0804 10:22:42.082226 8318 net.cpp:650] [Forward] Layer conv16, param blob 1 data: 0
I0804 10:22:42.082675 8318 net.cpp:638] [Forward] Layer loss, top blob loss data: 42.0327
Without BN:
I0803 17:01:29.700850 30274 net.cpp:638] [Forward] Layer loadtestdata, top blob data data: 0.320584
I0803 17:01:29.700920 30274 net.cpp:638] [Forward] Layer loadtestdata, top blob label data: 0.236383
I0803 17:01:29.701556 30274 net.cpp:638] [Forward] Layer conv0, top blob conv0 data: 0.106141
I0803 17:01:29.701633 30274 net.cpp:650] [Forward] Layer conv0, param blob 0 data: 0.0467062
I0803 17:01:29.701692 30274 net.cpp:650] [Forward] Layer conv0, param blob 1 data: 0
I0803 17:01:29.701835 30274 net.cpp:638] [Forward] Layer relu0, top blob conv0 data: 0.0547961
I0803 17:01:29.702193 30274 net.cpp:638] [Forward] Layer conv1, top blob conv1 data: 0.0716117
I0803 17:01:29.702267 30274 net.cpp:650] [Forward] Layer conv1, param blob 0 data: 0.0473551
I0803 17:01:29.702327 30274 net.cpp:650] [Forward] Layer conv1, param blob 1 data: 0
I0803 17:01:29.702425 30274 net.cpp:638] [Forward] Layer relu1, top blob conv1 data: 0.0318472
I0803 17:01:29.702781 30274 net.cpp:638] [Forward] Layer conv16, top blob conv16 data: 0.0403702
I0803 17:01:29.702847 30274 net.cpp:650] [Forward] Layer conv16, param blob 0 data: 0.0474007
I0803 17:01:29.702908 30274 net.cpp:650] [Forward] Layer conv16, param blob 1 data: 0
I0803 17:01:29.703228 30274 net.cpp:638] [Forward] Layer loss, top blob loss data: 11.2245
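(For reference, both logs were produced with debug info turned on; a minimal sketch of what I mean, assuming this corresponds to the debug_info field of a standard Caffe solver prototxt, with all other values as placeholders:)
# solver.prototxt (sketch)
net: "train_val.prototxt"   # placeholder path to the net above
base_lr: 0.001              # placeholder learning rate
lr_policy: "fixed"
max_iter: 10000
solver_mode: GPU
debug_info: true            # prints the per-layer [Forward]/[Backward] blob statistics shown above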
Strangely, in the forward pass, every layer starting from batchnorm outputs 0! It is also worth mentioning that ReLU (an in-place layer) has only 4 lines, while batchnorm and scale (supposedly in-place layers as well) have 6 and 3 lines in the log file. Do you know what the problem is?
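For what it is worth, the BatchNorm declarations I have seen in public Caffe prototxts (e.g., the ResNet ones) expose three internal blobs (mean, variance, moving-average factor) and freeze them with lr_mult: 0, whereas my log above reports five param blobs for bnorm1. This is only a sketch of that common convention, not necessarily the fix here:
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # stock BVLC Caffe keeps 3 internal blobs for BatchNorm;
  # they are usually frozen rather than learned by the solver
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    use_global_stats: false   # false during training, true at test time
  }
}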
which version of caffe are you using? – Shai