2015-04-16 1 views
3

Nested dictionary - Python

I have three large lists and I want to create a nested dictionary like so:

dic={"gene":{"isoform1":positions1,"isoform2":positions2}, "gene2":{"isoform1":positions1, "isoform2":positions2...etc}} 

I am able to get the isoforms and the positions into a dictionary like so:

Dictionary = dict(zip(Isoform, ExonPos)) 

However, I don't know how to add the gene name as the key of the dictionary built from Isoform and ExonPos.

Also, is there a way to use a list as the value of a key?

Something like this:

Dictionary = {key:[1,2,3,4,5], key2:[3,5,4]} 

Here are my sample lists:

Genes = ['A2M', 'ACADM', 'ACADS', 'ACADVL', 'ACAT1', 'ACVRL1', 'PSEN1', 'ADA', 'SGCA', 'ADRB2', 'ADSL', 'AGA', 'AGT', 'AGXT', 'ALAD', 'ALAS2', 'ABCD1', 'ALDOA', 'ALDOB'] 

Isoforms = ['NM_000014', 'NM_000016', 'NM_000017', 'NM_000018', 'NM_000019', 'NM_000020', 'NM_000021', 'NM_000022', 'NM_000023', 'NM_000024', 'NM_000026', 'NM_000027', 'NM_000029', 'NM_000030', 'NM_000031', 'NM_000032', 'NM_000033', 'NM_000034'] 

ExonPos = ['9220303,9220778,9221335,9222340,9223083,9224954,9225248,9227155,9229351,9229941,9230296,9231839,9232234,9232689,9241795,9242497,9242951,9243796,9246060,9247568,9248134,9251202,9251976,9253739,9254042,9256834,9258831,9259086,9260119,9261916,9262462,9262909,9264754,9264972,9265955,9268359,', '76190031,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376,', '121163570,121164828,121174788,121175158,121175639,121176082,121176335,121176622,121176942,121177098,', '7123149,7123440,7123782,7123922,7124084,7124242,7124856,7125270,7125495,7125985,7126451,7126962,7127131,7127286,7127464,7127639,7127798,7127960,7128127,7128275,', '107992257,108002633,108004546,108004947,108005868,108009624,108010791,108,108013163,108014709,108016928,108017996,', '52301201,52306253,52306882,52307342,52307757,52308222,52309008,52309819,52312768,52314542,', '73603142,73614502,73614674,73637504,73640273,73653560,73659351,73664738,73673093,73678476,73683833,73685841,', '43248162,43248939,43249658,43251228,43251469,43251647,43252842,43254209,43255096,43257687,43264867,43280215,', '48243365,48244728,48244942,48245307,48245734,48246452,48247503,48248000,48252617,48253072,', '148206155,', '40742503,40745835,40749076,40750251,40754867,40755263,40756405,40757276,40757491,40758984,40760279,40760883,40762439,'] 
+2

Put your sample lists here – itzMEonTV

+0

Added my sample lists – cosmictypist

+0

The lists have different lengths, so I'm not sure how the isoforms map to the genes, the genes to the exon positions, and the isoforms to the exon positions. –
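To make the length mismatch concrete: `zip` stops at the shortest input, so any extra genes are silently dropped. Here is a minimal sketch using shortened, made-up lists (the lowercase names `genes`, `isoforms` and `exon_pos` and their values are illustrative, not the asker's data):

```python
from itertools import zip_longest

# Illustrative, shortened inputs of unequal length (not the real data).
genes = ['A2M', 'ACADM', 'ACADS']
isoforms = ['NM_000014', 'NM_000016']
exon_pos = ['9220303,9220778,', '76190031,']

# zip() stops at the shortest list, so 'ACADS' is silently dropped.
paired = list(zip(genes, isoforms, exon_pos))

# zip_longest() pads the shorter lists with None, which exposes the mismatch.
padded = list(zip_longest(genes, isoforms, exon_pos))
unmatched = [row[0] for row in padded if None in row]

print(len(paired), unmatched)
```

Checking `len()` of each list before zipping is probably the simplest way to catch this early.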

Answers

2

You can use a dict comprehension:

>>> dic={gene:{iso:exon.split(',')} for gene, iso, exon in zip(Genes, Isoforms, ExonPos)} 
>>> dic 
{'ACADVL': {'NM_000018': ['7123149', '7123440', '7123782', '7123922', '7124084', '7124242', '7124856', '7125270', '7125495', '7125985', '7126451', '7126962', '7127131', '7127286', '7127464', '7127639', '7127798', '7127960', '7128127', '7128275', '']}, 'PSEN1': {'NM_000021': ['73603142', '73614502', '73614674', '73637504', '73640273', '73653560', '73659351', '73664738', '73673093', '73678476', '73683833', '73685841', '']}, 'SGCA': {'NM_000023': ['48243365', '48244728', '48244942', '48245307', '48245734', '48246452', '48247503', '48248000', '48252617', '48253072', '']}, 'ACADM': {'NM_000016': ['76190031', '76194085', '76198328', '76198537', '76199212', '76200475', '76205664', '76211490', '76215103', '76216135', '76226806', '76228376', '']}, 'ACAT1': {'NM_000019': ['107992257', '108002633', '108004546', '108004947', '108005868', '108009624', '108010791', '108', '108013163', '108014709', '108016928', '108017996', '']}, 'ADRB2': {'NM_000024': ['148206155', '']}, 'ACADS': {'NM_000017': ['121163570', '121164828', '121174788', '121175158', '121175639', '121176082', '121176335', '121176622', '121176942', '121177098', '']}, 'ACVRL1': {'NM_000020': ['52301201', '52306253', '52306882', '52307342', '52307757', '52308222', '52309008', '52309819', '52312768', '52314542', '']}, 'ADA': {'NM_000022': ['43248162', '43248939', '43249658', '43251228', '43251469', '43251647', '43252842', '43254209', '43255096', '43257687', '43264867', '43280215', '']}, 'ADSL': {'NM_000026': ['40742503', '40745835', '40749076', '40750251', '40754867', '40755263', '40756405', '40757276', '40757491', '40758984', '40760279', '40760883', '40762439', '']}, 'A2M': {'NM_000014': ['9220303', '9220778', '9221335', '9222340', '9223083', '9224954', '9225248', '9227155', '9229351', '9229941', '9230296', '9231839', '9232234', '9232689', '9241795', '9242497', '9242951', '9243796', '9246060', '9247568', '9248134', '9251202', '9251976', '9253739', '9254042', '9256834', '9258831', '9259086', '9260119', '9261916', 
'9262462', '9262909', '9264754', '9264972', '9265955', '9268359', '']}} 
Or, if you want a string rather than a list:

>>> dic={gene:{iso:exon} for gene, iso, exon in zip(Genes, Isoforms, ExonPos)} 
>>> dic 
{'ACADVL': {'NM_000018': '7123149,7123440,7123782,7123922,7124084,7124242,7124856,7125270,7125495,7125985,7126451,7126962,7127131,7127286,7127464,7127639,7127798,7127960,7128127,7128275,'}, 'PSEN1': {'NM_000021': '73603142,73614502,73614674,73637504,73640273,73653560,73659351,73664738,73673093,73678476,73683833,73685841,'}, 'SGCA': {'NM_000023': '48243365,48244728,48244942,48245307,48245734,48246452,48247503,48248000,48252617,48253072,'}, 'ACADM': {'NM_000016': '76190031,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376,'}, 'ACAT1': {'NM_000019': '107992257,108002633,108004546,108004947,108005868,108009624,108010791,108,108013163,108014709,108016928,108017996,'}, 'ADRB2': {'NM_000024': '148206155,'}, 'ACADS': {'NM_000017': '121163570,121164828,121174788,121175158,121175639,121176082,121176335,121176622,121176942,121177098,'}, 'ACVRL1': {'NM_000020': '52301201,52306253,52306882,52307342,52307757,52308222,52309008,52309819,52312768,52314542,'}, 'ADA': {'NM_000022': '43248162,43248939,43249658,43251228,43251469,43251647,43252842,43254209,43255096,43257687,43264867,43280215,'}, 'ADSL': {'NM_000026': '40742503,40745835,40749076,40750251,40754867,40755263,40756405,40757276,40757491,40758984,40760279,40760883,40762439,'}, 'A2M': {'NM_000014': '9220303,9220778,9221335,9222340,9223083,9224954,9225248,9227155,9229351,9229941,9230296,9231839,9232234,9232689,9241795,9242497,9242951,9243796,9246060,9247568,9248134,9251202,9251976,9253739,9254042,9256834,9258831,9259086,9260119,9261916,9262462,9262909,9264754,9264972,9265955,9268359,'}} 
+0

Thanks! This worked even better. – cosmictypist

+0

I just realized that this won't work if several isoforms share the same gene name. Is there a way around that? – cosmictypist

+0

If the isoforms are the same but have more than one list entry, what result are you looking for? – dawg
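One possible way to handle several isoforms under the same gene, sketched under the assumption that each isoform should become its own key in that gene's inner dictionary. The data below is made up, and `NM_999999` is a hypothetical identifier used only to create a repeated gene:

```python
# Illustrative data in which one gene appears twice (values are made up).
genes = ['A2M', 'A2M', 'ACADM']
isoforms = ['NM_000014', 'NM_999999', 'NM_000016']
exon_pos = ['9220303,9220778,', '9220303,9221335,', '76190031,76194085,']

nested = {}
for gene, iso, exon in zip(genes, isoforms, exon_pos):
    # setdefault creates the inner dict the first time a gene is seen,
    # so later isoforms are added to it instead of replacing it.
    nested.setdefault(gene, {})[iso] = exon.split(',')

print(sorted(nested['A2M']))
```

With a dict comprehension, a repeated gene key keeps only its last inner dictionary, which is why the loop with `setdefault` is used here instead.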

2

Just zip twice and there you have it:

Dictionary = dict(zip(Genes, [{i[0]: i[1:]} for i in zip(Isoforms, ExonPos)])) 


print(Dictionary) 
+0

This won't work with the unequal lists provided. – ILostMySpoon

+0

I fixed it. Thanks – giaosudau

+0

Perfect. That worked. Thanks – cosmictypist

0
result = {} 
for gene, iso, exon in zip(Genes, Isoforms, ExonPos): 
    result[gene] = {iso: exon.split(',')} 

If you don't need to convert the comma-separated string of values into a list, use exon instead of exon.split(',').
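Because each position string in the question ends with a comma, a plain `split(',')` leaves an empty string at the end of every list. A small sketch of one way to drop it and get integers, assuming numeric positions are wanted (the sample string is made up):

```python
exon = '148206155,148206500,'  # made-up string with the same trailing comma

# A plain split leaves an empty string after the last comma.
raw = exon.split(',')

# Skipping empty pieces and converting to int gives usable positions.
positions = [int(p) for p in exon.split(',') if p]

print(raw)
print(positions)
```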

1

Answering your second question: yes, you can of course have a dictionary whose values are lists. Look at this:

>>> dic = {} 
>>> dic = {"key1":[1,2,3]} 
>>> dic.update({"key2":[4,5,6]}) 
>>> dic['key3'] = [7,8,9] 
>>> dic 
{'key3': [7, 8, 9], 'key2': [4, 5, 6], 'key1': [1, 2, 3]} 
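If the lists need to grow one item at a time, `collections.defaultdict` may be convenient. A minimal sketch with made-up keys and values:

```python
from collections import defaultdict

# defaultdict(list) creates an empty list the first time a key is used.
dic = defaultdict(list)
for key, value in [('key1', 1), ('key2', 4), ('key1', 2)]:
    dic[key].append(value)

print(dict(dic))
```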

Answering your first question: since you said you already have the other two lists zipped the way you want, a very crude approach is to do something like this:

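newdictionary = {} 
# Pair each gene with its {isoform: positions} entry by position in the lists
for index, (iso, exon) in enumerate(zip(Isoforms, ExonPos)): 
    newdictionary.update({Genes[index]: {iso: exon}})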