Erreur de ressource extériorisée MOO lors de l'allocation du tenseur avec forme [384,192]

Je suis maintenant confronté à un problème de classification d'images médicales. Les images sont de multiples tranches du cerveau du patient. Les données ont déjà été nettoyées. J'ai 150 patients AD, 150 patients MCI (déficience cognitive légère), 150 patients NC (normal). Chaque patient a 96 fichiers Dicom ou en disant, 96 tranches. Chaque tranche est de 160 * 160.Erreur de ressource extériorisée MOO lors de l'allocation du tenseur avec forme [384,192]

J'utilise le code Tensorflow cifar10 comme modèle et pour que le code fonctionne, je dois changer sa partie read_cifar10. Je change le code comme le suggère le lien ci-dessous.

Attach a queue to a numpy array in tensorflow for data fetch instead of files?

D'abord, je change les données aux fichiers binaires avec le mon propre load_img_M.py module python. Pour diminuer la quantité de données, je ne sélectionner que les tranches intermédiaires, de 30 à 60.

import numpy as np 
    import dicom 
    import os      
    ad_path = '/home/zmz/Pictures/AD' 
    mci_path = '/home/zmz/Pictures/MCI' 
    nc_path = '/home/zmz/Pictures/NC' 
    data_path =['/home/zmz/Pictures/%s' %i for i in ['NC','MCI','AD']] 
    KDOWN = 29 
    KUP = 61 
    SLICESNUM = KUP - KDOWN + 1 
    RECORDBYTES = 160*160*SLICESNUM + 1 
    #load image from the directory and save to binary file 
    def img2binary():   
     train_arr = np.zeros([100,SLICESNUM*160*160+1]) 
     test_arr = np.zeros([50,SLICESNUM*160*160+1]) 
     for p in range(len(data_path)): 
      Patientlist = os.listdir(data_path[p])   
      for q in range(len(Patientlist)): 
       Dicompath = os.path.join(data_path[p],Patientlist[q]) 
       Dicomlist = os.listdir(Dicompath) 
       if q<100: 
        train_arr[q,0] = p 
       else: 
        test_arr[q-100,0] = p    
       for k in range(len(Dicomlist)): 
        if k>KDOWN and k<KUP:#select the middle slices which have the most information 
         Picturepath = os.path.join(Dicompath,Dicomlist[k]) 
         img = dicom.read_file(Picturepath) 
         #print(type(img.pixel_array)) 
         imgpixel = img.pixel_array.reshape(25600) 
         if q <100: 
          # assign the label of the picture 
          train_arr[q,(1+(k-KDOWN-1)*25600):(1+(k-KDOWN)*25600)] = imgpixel #assign the pixel 
         else:       
          test_arr[q-100,(1+(k-KDOWN-1)*25600):(1+(k-KDOWN)*25600)] = imgpixel             
      train_arr.tofile("/home/zmz/Pictures/tmp/images/train%s.bin"%p) 
      test_arr.tofile("/home/zmz/Pictures/tmp/images/test%s.bin"%p)

Et les fichiers binaires ressembler à ceci:

How binary files look like

Ensuite, je change le module cifar10_input:

"""Routine for decoding the Alzeheimer dicom format""" 

from __future__ import absolute_import 
from __future__ import division 
from __future__ import print_function 
import load_img_M 
import os 
import numpy as np 
import dicom 

from six.moves import xrange # pylint: disable=redefined-builtin 
import tensorflow as tf 

# Process images of this size. Note that this differs from the original CIFAR 
# image size of 32 x 32. If one alters this number, then the entire model 
# architecture will change and any model would need to be retrained. 
IMAGE_SIZE = 160 

# Global constants describing the ADNI data set. 
IMAGE_HEIGHT = 160 
IMAGE_WIDTH = 160 
IMAGE_CHANNEL = 1 
SLICES_NUM = load_img_M.SLICESNUM 
NUM_CLASSES = 3 
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 300 
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 150 
#define a dicom reader to read a record 


def read_ADNI(filename_queue): 
    """Reads and parses examples from ADNI data files. 

    Recommendation: if you want N-way read parallelism, call this function 
    N times. This will give you N independent Readers reading different 
    files & positions within those files, which will give better mixing of 
    examples. 

    Args: 
    filename_queue: A queue of strings with the filenames to read from. 

    Returns: 
    An object representing a single example, with the following fields: 
     height: number of rows in the result (160) 
     width: number of columns in the result (160) 
     channels: number of color channels in the result (1) 
     key: a scalar string Tensor describing the filename & record number 
     for this example. 
     label: an int32 Tensor with the label in the range 0,1,2. 
     uint8image: a [slice, height, width, channels] uint8 Tensor with the image data 
    """ 

    class ADNIRecord(object): 
    pass#do nothing the class is vacant 
    result = ADNIRecord() 


    label_bytes = 1 
    result.height = IMAGE_HEIGHT 
    result.width = IMAGE_WIDTH 
    result.depth = IMAGE_CHANNEL 
    result.slice = SLICES_NUM 
    image_bytes = result.height * result.width * result.depth * result.slice 
    record_bytes = label_bytes + image_bytes 

    assert record_bytes == load_img_M.RECORDBYTES 

    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes) 
    result.key, value = reader.read(filename_queue) 

    # Convert from a string to a vector of uint8 that is record_bytes long. 
    record_bytes = tf.decode_raw(value, tf.uint8) 

    # The first bytes represent the label, which we convert from uint8->int32. 
    result.label = tf.cast(
     tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32) 

    # The remaining bytes after the label represent the image, which we reshape 
    # from [depth * height * width] to [depth, height, width]. 
    depth_major = tf.reshape(
     tf.strided_slice(record_bytes, [label_bytes], 
         [label_bytes + image_bytes]), 
     [result.slice,result.height, result.width, result.depth]) 
    # Convert from [depth, height, width] to [height, width, depth]. 
    #result.uint8image = tf.transpose(depth_major, [1, 2, 0]) 
    result.uint8image = depth_major 
    return result

Enfin, je modifie l'entrée déformée. Supprimer les blocs pour l'image comme pré-traitement des cultures et flipping:

def distorted_inputs(data_dir, batch_size): 
    """Construct distorted input for ADNI training using the Reader ops. 

    Args: 
    data_dir: Path to the ADNI data directory. 
    batch_size: Number of images per batch. 

    Returns: 
    images: Images. 5D tensor of [batch_size, slices , IMAGE_SIZE , IMAGE_SIZE, 1] size. 
    labels: Labels. 1D tensor of [batch_size] size. 
    """ 
    #filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) 
    #   for i in xrange(1, 6)]#data_batch_1,2,3.bin 
    filenames = [os.path.join(data_dir,'tmp/images/train%s.bin' % i) for i in [0,1,2]] 
    for f in filenames: 
    if not tf.gfile.Exists(f): 
     raise ValueError('Failed to find file: ' + f) 

    # Create a queue that produces the filenames to read. 
    filename_queue = tf.train.string_input_producer(filenames) 
    ''' 
    call the first function defined at the very beggining 
    ''' 
    # Read examples from files in the filename queue. 
    read_input = read_ADNI(filename_queue) 
    reshaped_image = tf.cast(read_input.uint8image, tf.float32) 
# Set the shapes of tensors. 
    reshaped_image.set_shape([SLICES_NUM,IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNEL]) 
    read_input.label.set_shape([1]) 
    # Ensure that the random shuffling has good mixing properties. 
    min_fraction_of_examples_in_queue = 0.4 
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN * 
          min_fraction_of_examples_in_queue) 
    print ('Filling queue with %d ADNI images before starting to train. ' 
     'This will take a few minutes.' % min_queue_examples) 
return _generate_image_and_label_batch(reshaped_image, read_input.label, 
             min_queue_examples, batch_size, 
             shuffle=True)

Ce problème est un problème 3d réseau conv. J'utilise tf.conv3d et apporter des modifications afin que le code pourrait fonctionner:

def inference(images):#core code 
    """Build the ADNI model. 

    Args: 
    images: Images returned from distorted_inputs() or inputs(). 

    Returns: 
    Logits. 
    """ 
    # We instantiate all variables using tf.get_variable() instead of 
    # tf.Variable() in order to share variables across multiple GPU training runs. 
    # If we only ran this model on a single GPU, we could simplify this function 
    # by replacing all instances of tf.get_variable() with tf.Variable(). 
    # 
    # conv1 
    with tf.variable_scope('conv1') as scope: 
    kernel = _variable_with_weight_decay('weights', 
             shape=[3, 3, 3, 1, 64], 
             stddev=5e-2, 
             wd=0.0) 
    conv = tf.nn.conv3d(images, kernel, [1, 1, 1, 1, 1], padding='SAME') 
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0)) 
    pre_activation = tf.nn.bias_add(conv, biases) 
    conv1 = tf.nn.relu(pre_activation, name=scope.name) 
    _activation_summary(conv1) 

    # pool1 
    pool1 = tf.nn.max_pool3d(conv1, ksize=[1, 3, 3, 3, 1], strides=[1, 1, 2, 2, 1], 
         padding='SAME', name='pool1') 
    # norm1 
    #norm1 = tf.nn.lrn3d(pool1, 4, bias=1.0, alpha=0.001/9.0, beta=0.75,name='norm1') 
    norm1 = pool1 

    # conv2 
    with tf.variable_scope('conv2') as scope: 
    kernel = _variable_with_weight_decay('weights', 
             shape=[3, 3, 3, 64, 64], 
             stddev=5e-2, 
             wd=0.0) 
    conv = tf.nn.conv3d(norm1, kernel, [1, 1, 1, 1, 1], padding='SAME') 
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1)) 
    pre_activation = tf.nn.bias_add(conv, biases) 
    conv2 = tf.nn.relu(pre_activation, name=scope.name) 
    _activation_summary(conv2) 

    # norm2 
    #norm2 = tf.nn.lrn3d(conv2, 4, bias=1.0, alpha=0.001/9.0, beta=0.75,name='norm2') 
    norm2 = conv2 
    # pool2 
    pool2 = tf.nn.max_pool3d(norm2, ksize=[1, 3, 3 ,3, 1], 
         strides=[1, 1, 2, 2, 1], padding='SAME', name='pool2') 

    # local3 
    with tf.variable_scope('local3') as scope: 
    # Move everything into depth so we can perform a single matrix multiply. 
    reshape = tf.reshape(pool2, [FLAGS.batch_size, -1]) 
    dim = reshape.get_shape()[1].value 
    weights = _variable_with_weight_decay('weights', shape=[dim, 384], 
              stddev=0.04, wd=0.004) 
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1)) 
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name) 
    _activation_summary(local3) 

    # local4 
    with tf.variable_scope('local4') as scope: 
    weights = _variable_with_weight_decay('weights', shape=[384, 192], 
              stddev=0.04, wd=0.004) 
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1)) 
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name) 
    _activation_summary(local4) 

    # linear layer(WX + b), 
    # We don't apply softmax here because 
    # tf.nn.sparse_softmax_cross_entropy_with_logits accepts the unscaled logits 
    # and performs the softmax internally for efficiency. 
    with tf.variable_scope('softmax_linear') as scope: 
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES], 
              stddev=1/192.0, wd=0.0) 
    biases = _variable_on_cpu('biases', [NUM_CLASSES], 
           tf.constant_initializer(0.0)) 
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name) 
    _activation_summary(softmax_linear) 

    return softmax_linear

J'avais peur de l'ordinateur ne pouvait pas gérer cette grande quantité de data.So j'ai choisi la taille du lot à 1 dans un premier temps, dans l'espoir le code pourrait fonctionner au moins. Toujours rencontrer le problème dans le titre. En fait, j'ai très bon Linux workstation.12G GPU titan xp, mémoire 64G. Mais je le partage avec mes camarades de classe. Les ressources affectées à mon compte peuvent donc être faibles. My Linux GPU parameters

Et c'est encore mieux si vous pouvez faire un calcul pour démontrer pourquoi la ressource est épuisée.

Source

2017-10-10 oliverzzm

Même pour la taille de lot 1, le réseau est trop grand. Le problème peut être trouvé dans le nombre de poids nécessaires, pas dans le lot. Essayez de retirer quelques couches de la convonet ou réduisez les pixels des images 3d et voyez où vous en êtes.

Source

2017-10-10 12:07:23

Existe-t-il de meilleurs moyens pour entrer les données Dicom? La mise en file d'attente de ces fichiers binaires avec 16 threads prend-elle beaucoup de RAM? – oliverzzm

Je ne pense pas que votre problème soit les données. La mémoire est trop petite pour stocker les poids de votre réseau convolutif. Nourrir les données d'une manière différente ne va pas faire une différence significative dans votre cas. –

Je redimensionner à 80 * 80, et plus de problème de ressources épuisées. – oliverzzm

Erreur de ressource extériorisée MOO lors de l'allocation du tenseur avec forme [384,192]

Répondre

Questions connexes