2017-10-03 1 views

How to create a nested dictionary from a csv file with N rows in Python

I was looking for a way to read a csv file with an unknown number of columns into a nested dictionary. That is, for input of the form:

file.csv: 
1, 2, 3, 4 
1, 6, 7, 8 
9, 10, 11, 12 

I want a dictionary of the form:

{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}} 

This is to allow O(1) lookup of a value from the csv file. Building the dictionary may take a relatively long time, because in my application I create it only once but search it millions of times.
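
To make the intended lookup concrete, here is a minimal sketch against the target structure shown above. The helper name `lookup` is hypothetical, chosen only for illustration. Each level is one hash-table access, so the cost of a lookup depends on the number of columns, not on the number of rows in the file.

```python
from functools import reduce

# The target structure from the example above.
nested = {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}


def lookup(d, keys):
    # Hypothetical helper: descend one dictionary level per key.
    # Raises KeyError if any key along the path is missing.
    return reduce(lambda level, key: level[key], keys, d)


value = lookup(nested, [1, 6, 7])   # same as nested[1][6][7]
```

Plain chained indexing (`nested[1][6][7]`) does the same thing when the depth is known in advance; the helper is only useful when the number of columns is not fixed.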

I also wanted an option to name the relevant columns, so that I can ignore the ones I don't need.

Answers

Here is what I came up with. Feel free to comment and suggest improvements.

import csv 
import itertools 

def list_to_dict(lst):
    # Takes a list, and recursively turns it into a nested dictionary, where
    # the first element is a key, whose value is the dictionary created from the
    # rest of the list. The last element in the list will be the value of the
    # innermost dictionary.
    # INPUTS:
    #   lst - a non-empty list (e.g. of strings or floats)
    # OUTPUT:
    #   A nested dictionary (or the bare element for a one-element list)
    # EXAMPLE RUN:
    # >>> lst = [1, 2, 3, 4]
    # >>> list_to_dict(lst)
    # {1: {2: {3: 4}}}
    if len(lst) == 1:
        return lst[0]
    # Nest the rest of the list under the first element; lst is not mutated
    return {lst[0]: list_to_dict(lst[1:])}


def dict_combine(d1, d2):
    # Combines two nested dictionaries into one.
    # INPUTS:
    #   d1, d2: Two nested dictionaries. The function modifies d1 in place
    #           and may share sub-dictionaries with d2, so pass copies if
    #           the inputs must not be mutated.
    #           Note that the function works more efficiently if d1 is the
    #           bigger dictionary.
    # OUTPUT:
    #   The combined dictionary
    # EXAMPLE RUN:
    # >>> d1 = {1: {2: {3: 4, 5: 6}}}
    # >>> d2 = {1: {2: {7: 8}, 9: {10: 11}}}
    # >>> dict_combine(d1, d2)
    # {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10: 11}}}

    for key in d2:
        if key in d1 and isinstance(d1[key], dict) and isinstance(d2[key], dict):
            # Both sides are dictionaries: merge them recursively
            d1[key] = dict_combine(d1[key], d2[key])
        else:
            # New key, or a leaf value on either side: d2's value wins
            d1[key] = d2[key]
    return d1


def csv_to_dict(csv_file_path, params=None, n_row_max=None):
    # NAME: csv_to_dict
    #
    # DESCRIPTION: Reads a csv file and turns relevant columns into a nested
    #              dictionary. The first line of the file must be a title
    #              line containing the column names.
    #
    # INPUTS:
    #   csv_file_path: The full path to the data file
    #   params:        A list of relevant column names. The resulting dictionary
    #                  will be nested in the same order as the names in 'params'.
    #                  Default is None (read all columns)
    #   n_row_max:     The maximum number of data rows to read. Default is None
    #                  (read all rows)
    #
    # OUTPUT:
    #   A nested dictionary containing all the relevant csv data

    csv_dictionary = {}

    with open(csv_file_path, 'r', newline='') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        names = next(csv_data)  # Read title line
        if not params:
            # Read every column
            relevant_param_indices = list(range(len(names)))
        else:
            # A list of column indices to read from csv
            relevant_param_indices = []
            for name in params:
                if name not in names:
                    # Parameter name is not found in title line
                    raise ValueError('Could not find {} in csv file'.format(name))
                # Get the index of the relevant column
                relevant_param_indices.append(names.index(name))
        # The title line is already consumed, so start at the first data row
        for row in itertools.islice(csv_data, n_row_max):
            # Get a list containing relevant columns only
            relevant_cols = [row[i] for i in relevant_param_indices]
            # Turn the strings into numbers. Not necessary
            float_row = [float(element) for element in relevant_cols]
            # Build nested dictionary
            csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))

    return csv_dictionary
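
For comparison, selecting columns by name could also be sketched with `csv.DictReader` from the standard library, which handles the title line itself. This is only an illustration under stated assumptions: the function name `nested_from_named_columns` and the column names `a` to `d` are hypothetical, and the sample data is the question's example with a title line added.

```python
import csv
import io


def nested_from_named_columns(csv_file, columns):
    # Hypothetical helper: nest by the given column names, in order.
    # The last name supplies the innermost value.
    if len(columns) < 2:
        raise ValueError('Need at least two column names')
    # skipinitialspace tolerates "1, 2, 3" style files
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    missing = [name for name in columns if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError('Could not find {} in csv file'.format(missing))
    result = {}
    for row in reader:
        values = [float(row[name]) for name in columns]
        level = result
        for key in values[:-2]:
            # Descend, creating intermediate dictionaries as needed
            level = level.setdefault(key, {})
        level[values[-2]] = values[-1]
    return result


# Assumed sample data: the question's file with a title line added.
sample = "a, b, c, d\n1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n"
with io.StringIO(sample) as f:
    by_name = nested_from_named_columns(f, ['a', 'c', 'd'])
```

With a real file, pass the object returned by `open(path, newline='')` instead of the `io.StringIO` stand-in.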

Here is a simple but fragile approach:

>>> import csv, io
>>> s = "1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n"
>>> d = {}
>>> with io.StringIO(s) as f:  # fake a file
...     reader = csv.reader(f)
...     for row in reader:
...         nested = d
...         for val in map(int, row[:-2]):
...             nested = nested.setdefault(val, {})
...         k, v = map(int, row[-2:])  # this will fail if you don't have enough columns
...         nested[k] = v
...
>>> d
{1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}

However, this assumes that the number of columns is at least 2.
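
One way to make that assumption explicit is to check each row before unpacking it. The following is a minimal sketch, not a definitive implementation: the function name `rows_to_nested` and its `convert` parameter are hypothetical, and it assumes that blank lines should be skipped and that a row with fewer than two columns is an error.

```python
import csv
import io


def rows_to_nested(rows, convert=int):
    # Hypothetical helper: build the nested dictionary from an iterable of
    # rows (for example a csv.reader), validating the column count.
    result = {}
    for line_no, row in enumerate(rows, start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) < 2:
            raise ValueError(
                'row {} has fewer than 2 columns: {!r}'.format(line_no, row))
        *path, key, value = map(convert, row)
        level = result
        for part in path:
            # Descend, creating intermediate dictionaries as needed
            level = level.setdefault(part, {})
        level[key] = value
    return result


s = "1, 2, 3, 4\n1, 6, 7, 8\n\n9, 10, 11, 12\n"
with io.StringIO(s) as f:  # fake a file
    d = rows_to_nested(csv.reader(f))
```

The sketch still assumes that every row has the same number of columns. Rows of different lengths that share a prefix are not handled and may fail or overwrite earlier values.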