Utilisation du "Bitfusion Ubuntu 14 TensorFlow" AMI, toute tentative de préformer les opérations avec de grandes tenseurs tels queBitfusion Ubuntu 14 tensorflow AMI échoue avec les erreurs de MOO
sess.run(tf.argmax(y, 1), feed_dict={x: use_x})
lorsque use_x
est un 28.000 tf.Tensor
des flotteurs, des résultats dans
"Ressource Épuisée: OOM"
erreurs Cela rend l'AMI inutilisable pour moi
Y a-t-il un paramètre qui me manque pour éviter cela?
----------
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (32768): Total Chunks: 1, Chunks in use: 0 56.8KiB allocated for chunks. 3.1KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (65536): Total Chunks: 1, Chunks in use: 0 111.2KiB allocated for chunks. 4B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (8388608): Total Chunks: 2, Chunks in use: 0 23.73MiB allocated for chunks. 440.3KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:639] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:656] Bin for 83.74MiB was 64.00MiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a0400 of size 8192
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a2400 of size 6144
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a3c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a3d00 of size 3328
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a4a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023a4b00 of size 204800
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023d6b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7023d6c00 of size 25088000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x703bc3c00 of size 8192
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x703bc5c00 of size 12000000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704737700 of size 6144
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704738f00 of size 60160
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704747a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704747b00 of size 8192
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704749f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70474a000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70474a100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70474a200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704758600 of size 60160
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704767100 of size 76288
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x704779f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477a900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477aa00 of size 3328
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477b700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70477b800 of size 204800
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7047ad800 of size 12000000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x705f67a00 of size 8192
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x705f69a00 of size 25088000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x707756a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082c8600 of size 6144
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082c9e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082c9f00 of size 6144
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082e7400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x7082e7500 of size 25088000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x709ad4500 of size 12000000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70a646000 of size 3328
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70a646d00 of size 204800
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70a678d00 of size 87808000
I tensorflow/core/common_runtime/bfc_allocator.cc:674] Chunk at 0x70fa36500 of size 3703905024
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x70474a300 of size 58112
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x70531f300 of size 12879616
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x707756b00 of size 12000000
I tensorflow/core/common_runtime/bfc_allocator.cc:683] Free at 0x7082cb700 of size 113920
I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 35 Chunks of size 256 totalling 8.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 3328 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 6144 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 8192 totalling 32.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 60160 totalling 117.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 76288 totalling 74.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 204800 totalling 600.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 12000000 totalling 34.33MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 3 Chunks of size 25088000 totalling 71.78MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 87808000 totalling 83.74MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 3703905024 totalling 3.45GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 3.64GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 3928915968
InUse: 3903864320
MaxInUse: 3903864320
NumAllocs: 418794
MaxAllocSize: 3703905024
W tensorflow/core/common_runtime/bfc_allocator.cc:270] ******************************************************************************xxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 83.74MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:907] Resource exhausted: OOM when allocating tensor with shape[28000,1,28,28]
Traceback (most recent call last):
File "tf_simple.py", line 173, in <module>
evals = sess.run(tf.argmax(y, 1), feed_dict={x: use_x})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 343, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 567, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 640, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 662, in _do_call
e.code)
tensorflow.python.framework.errors.ResourceExhaustedError: OOM when allocating tensor with shape[28000,1,28,28]
[[Node: 1_conv_layer/kernel_logits/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](as_grid, 1_conv_layer/kernel_weights/W1/read)]]
[[Node: ArgMax/_2316 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1481_ArgMax", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'1_conv_layer/kernel_logits/Conv2D', defined at:
File "tf_simple.py", line 47, in <module>
final_dropout=final_dropout)
File "/home/ubuntu/mlcode/tf_utils.py", line 150, in make_ff_network
layer_name)
File "/home/ubuntu/mlcode/tf_utils.py", line 86, in _add_conv_layer
kernel_logits = tf.nn.conv2d(input_tensor, weights, strides=[1, 1, 1, 1], padding='SAME') + biases
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 295, in conv2d
data_format=data_format, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 694, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
AWS possède maintenant des instal- lations p2 qui disposent chacune de 12 Go de mémoire. Cela devrait vous permettre de travailler avec des tenseurs plus grands sans manquer de mémoire sur ces GPU. – mbajkowski