2016-05-17

Using the "Bitfusion Ubuntu 14 TensorFlow" AMI, any attempt to perform operations on large tensors fails with OOM errors. For example,

sess.run(tf.argmax(y, 1), feed_dict={x: use_x}) 

where use_x is a tf.Tensor of floats of length 28,000, results in

"Resource exhausted: OOM"

errors. This makes the AMI unusable for me.

Is there a setting I am missing that would avoid this?

----------

I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (16384):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (32768):  Total Chunks: 1, Chunks in use: 0 56.8KiB allocated for chunks. 3.1KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (65536):  Total Chunks: 1, Chunks in use: 0 111.2KiB allocated for chunks. 4B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8388608): Total Chunks: 2, Chunks in use: 0 23.73MiB allocated for chunks. 440.3KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (134217728):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (268435456):  Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin. 
I tensorflow/core/common_runtime/bfc_allocator.cc:656] Bin for 83.74MiB was 64.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0000 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0100 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0200 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0300 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0400 of size 8192 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a2400 of size 6144 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a3c00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a3d00 of size 3328 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a4a00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a4b00 of size 204800 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023d6b00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023d6c00 of size 25088000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x703bc3c00 of size 8192 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x703bc5c00 of size 12000000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704737700 of size 6144 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704738f00 of size 60160 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704747a00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704747b00 of size 8192 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749b00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749c00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749d00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749e00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749f00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70474a000 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70474a100 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70474a200 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704758600 of size 60160 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704767100 of size 76288 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779b00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779c00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779d00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779e00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779f00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a000 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a100 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a200 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a300 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a400 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a500 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a600 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a700 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a800 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a900 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477aa00 of size 3328 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477b700 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477b800 of size 204800 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7047ad800 of size 12000000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x705f67a00 of size 8192 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x705f69a00 of size 25088000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x707756a00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082c8600 of size 6144 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082c9e00 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082c9f00 of size 6144 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082e7400 of size 256 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082e7500 of size 25088000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x709ad4500 of size 12000000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70a646000 of size 3328 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70a646d00 of size 204800 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70a678d00 of size 87808000 
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70fa36500 of size 3703905024 
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x70474a300 of size 58112 
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x70531f300 of size 12879616 
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x707756b00 of size 12000000 
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x7082cb700 of size 113920 
I tensorflow/core/common_runtime/bfc_allocator.cc:689]  Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 35 Chunks of size 256 totalling 8.8KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 3328 totalling 9.8KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 6144 totalling 24.0KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 8192 totalling 32.0KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 60160 totalling 117.5KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 76288 totalling 74.5KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 204800 totalling 600.0KiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 12000000 totalling 34.33MiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 25088000 totalling 71.78MiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 87808000 totalling 83.74MiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 3703905024 totalling 3.45GiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 3.64GiB 
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats: 
Limit:     3928915968 
InUse:     3903864320 
MaxInUse:    3903864320 
NumAllocs:     418794 
MaxAllocSize:   3703905024 

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ******************************************************************************xxxxxxxxxxxxxxxxxxxxxx 
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 83.74MiB. See logs for memory state. 
W tensorflow/core/framework/op_kernel.cc:907] Resource exhausted: OOM when allocating tensor with shape[28000,1,28,28] 

Traceback (most recent call last): 
    File "tf_simple.py", line 173, in <module> 
    evals = sess.run(tf.argmax(y, 1), feed_dict={x: use_x}) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 343, in run 
    run_metadata_ptr) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 567, in _run 
    feed_dict_string, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 640, in _do_run 
    target_list, options, run_metadata) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 662, in _do_call 
    e.code) 
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[28000,1,28,28] 
    [[Node: 1_conv_layer/kernel_logits/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](as_grid, 1_conv_layer/kernel_weights/W1/read)]] 
    [[Node: ArgMax/_2316 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1481_ArgMax", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]] 
Caused by op u'1_conv_layer/kernel_logits/Conv2D', defined at: 
    File "tf_simple.py", line 47, in <module> 
    final_dropout=final_dropout) 
    File "/home/ubuntu/mlcode/tf_utils.py", line 150, in make_ff_network 
    layer_name) 
    File "/home/ubuntu/mlcode/tf_utils.py", line 86, in _add_conv_layer 
    kernel_logits = tf.nn.conv2d(input_tensor, weights, strides=[1, 1, 1, 1], padding='SAME') + biases 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 295, in conv2d 
    data_format=data_format, name=name) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 694, in apply_op 
    op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op 
    original_op=self._default_original_op, op_def=op_def) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__ 
    self._traceback = _extract_stack() 

Answer


The problem is the ~4 GB memory limit on AWS GPUs; it is not a problem with the AMI:

Limit: 3928915968

InUse: 3903864320

MaxInUse: 3903864320

NumAllocs: 418794

MaxAllocSize: 3703905024
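
These figures can be checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the log in the question, and the 4-byte element size assumes the DT_FLOAT type shown in the traceback.

```python
# Values copied from the BFC allocator stats in the question's log.
limit_bytes = 3928915968
in_use_bytes = 3903864320

# The failing tensor has shape [28000, 1, 28, 28].
# Assumption: DT_FLOAT elements, i.e. 4 bytes each.
shape = (28000, 1, 28, 28)
request_bytes = 4
for dim in shape:
    request_bytes *= dim

free_bytes = limit_bytes - in_use_bytes
MIB = float(1024 ** 2)

print("request: %d bytes (%.2f MiB)" % (request_bytes, request_bytes / MIB))
print("free:    %d bytes (%.2f MiB)" % (free_bytes, free_bytes / MIB))
print("fits:    %s" % (request_bytes <= free_bytes))
```

The request comes to 87,808,000 bytes, which matches the 83.74 MiB in the log, while only 25,051,648 bytes remain under the limit. That remainder equals the sum of the four "Free at" chunks in the log, the largest of which is 12,879,616 bytes.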

The memory limit is about 3.93 GB, about 3.90 GB is already in use, and the allocation request is for 83.74 MiB (about 0.088 GB), which would push usage past the limit. On AWS, your options are to rewrite your code so that it runs within the 4 GB limit, to run this section of code in CPU-only mode and use system memory (which defeats the purpose of using a GPU), or to wait for AWS to introduce new GPU instances with more memory. Alternatively, you could look at another cloud provider such as Nimbix that offers more up-to-date GPUs.
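
One possible way to stay within the limit is to evaluate the input in smaller batches instead of feeding all 28,000 rows in a single call. The following is a minimal sketch, assuming the rows are independent at inference time. `predict_in_batches`, `run_batch`, `pred_op` and the batch size of 1000 are hypothetical names and values, not part of the original script.

```python
def predict_in_batches(run_batch, rows, batch_size=1000):
    """Apply run_batch to consecutive slices of rows and concatenate the results."""
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        results.extend(run_batch(batch))
    return results


# Hypothetical use with the session from the question (not run here):
#   pred_op = tf.argmax(y, 1)  # build the op once, outside the loop
#   evals = predict_in_batches(
#       lambda batch: sess.run(pred_op, feed_dict={x: batch}), use_x)
```

With 1000 rows per call, the tensor that failed to allocate would have shape [1000,1,28,28], roughly 3 MB instead of 83.74 MiB. Whether the rest of the network then fits depends on what else is held in GPU memory, so the batch size may need tuning.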


AWS now has p2 instances, each of which has 12 GB of memory. This should let you work with larger tensors without running out of memory on those GPUs. – mbajkowski