
Training of the SSD implementation in Keras stops after a few iterations without any output or error

After a few iterations of the first epoch, the training process stops without any output or error message. The SSD implementation in Keras is the one from https://github.com/rykov8/ssd_keras.

import keras
from ssd_training import MultiboxLoss  # loss class from the ssd_keras repo

base_lr = 3e-4
# optim = keras.optimizers.Adam(lr=base_lr)
optim = keras.optimizers.RMSprop(lr=base_lr)
# optim = keras.optimizers.SGD(lr=base_lr, momentum=0.9, decay=decay, nesterov=True)

# NUM_CLASSES + 1 accounts for the background class
model.compile(optimizer=optim,
              loss=MultiboxLoss(NUM_CLASSES + 1, neg_pos_ratio=2.0).compute_loss)

nb_epoch = 10
# Keras 1.x signature: (generator, samples_per_epoch, nb_epoch, ...)
history = model.fit_generator(gen.generate(True), gen.train_batches,
                              nb_epoch, verbose=1,
                              callbacks=None,
                              validation_data=gen.generate(False),
                              nb_val_samples=gen.val_batches,
                              nb_worker=1)

The program output is shown below:

Epoch 1/10 
/home/deepesh/Documents/ssd_traffic/ssd_utils.py:119: RuntimeWarning: divide by zero encountered in log 
    assigned_priors_wh) 
2017-10-15 18:00:53.763886: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:02.602807: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:03.831092: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:03.831138: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:04.774444: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:05.897872: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.46GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:05.897923: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.94GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:09.133494: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:09.133541: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.15GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
2017-10-15 18:01:11.266114: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
13/14 [==========================>...] - ETA: 9s - loss: 2.9617 

There is no further output or error message after this point.

Answer


You don't have enough memory. Things you can do to resolve the problem (a sketch of the first and last options follows the list):

  • reduce the batch size
  • reduce the size of the training data
  • train your models in the cloud (AWS, Google Cloud, etc.)
  • use another GPU card with more memory
  • or try the CPU
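
A minimal sketch of the first and last options, assuming the data pipeline is built as in the repo's SSD_training notebook; the Generator class and the gt, bbox_util, path_prefix, train_keys, val_keys and model variables below come from that notebook, not from the question:

# Hypothetical reconstruction of the generator setup from the ssd_keras
# SSD_training notebook; only the batch size value is the point here.
import os

# "or try the CPU": hide the GPU before TensorFlow is imported so the model
# runs on the CPU and never touches the GPU allocator (much slower).
# os.environ['CUDA_VISIBLE_DEVICES'] = ''

# "reduce the batch size": the batch size is baked into the generator, not
# into fit_generator, so this is the value to lower (e.g. 16 -> 8 -> 4 -> 2)
# until the "ran out of memory" warnings disappear.
batch_size = 4
gen = Generator(gt, bbox_util, batch_size, path_prefix,
                train_keys, val_keys, (300, 300))

history = model.fit_generator(gen.generate(True), gen.train_batches,
                              10, verbose=1,
                              validation_data=gen.generate(False),
                              nb_val_samples=gen.val_batches,
                              nb_worker=1)

Smaller batches directly shrink the 1-4 GiB tensors that the allocator warnings above complain about, which is consistent with the comment below that a batch size of 2 made the problem go away.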

I trained the model on an AWS g2.8xlarge instance, but the problem was not solved. When I reduce the batch size to just 2, the problem is solved. –


Good to hear :) – Paddy