2017-03-28

I tried to implement the approach from "How to use expand in snakemake when some particular combinations of wildcards are not desired?"

The goal is to process only the crossed combinations of SUPERGROUPS, that is, pairs of two different supergroups:

from itertools import product 

DOMAINS=["Metallophos"] 
SUPERGROUPS=["2supergroups","5supergroups"] 
SUPERGROUPS_INVERSED=["5supergroups","2supergroups"] 
CUTOFFS=["0"] 

def filter_combinator(combinator, blacklist): 
    def filtered_combinator(*args, **kwargs): 
        for wc_comb in combinator(*args, **kwargs): 
            # Use frozenset instead of tuple 
            # in order to accommodate 
            # unpredictable wildcard order 
            if frozenset(wc_comb) not in blacklist: 
                yield wc_comb 
    return filtered_combinator 

# Pairing a supergroup with itself ("2supergroups/2supergroups" and 
# "5supergroups/5supergroups") is undesired 
forbidden = { 
    frozenset({("supergroup", "2supergroups"), ("supergroup_other", "2supergroups")}), 
    frozenset({("supergroup", "5supergroups"), ("supergroup_other", "5supergroups")})} 

filtered_product = filter_combinator(product, forbidden) 

rule target : 
    input: 
     expand(expand("results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics", filtered_product, supergroup=SUPERGROUPS, supergroup_other = SUPERGROUPS_INVERSED), cutoff=CUTOFFS, domain = DOMAINS) 

rule tree_measures: 
    input: 
     tree="results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups.for.notung", 
     list="results/{domain}/{supergroup}/hmmer_search_bbh_1/bbhlist.txt.{domain}.fa.OGs.tbl.txt.0.list.txt.nh.OGs.txt", 
     mapping1="results/{domain}/{supergroup_other}/{supergroup}/OGSmapping.txt.list", 
     categories="results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.categories", 
     mapping2="results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.list", 
     supergroups="results/{domain}/{supergroup}/hmmer_search_2/{domain}.fa.OGs.tbl.txt.{cutoff}.supergroups.csv" 
    output: 
     "results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{cutoff}.statistics" 
    shell: 
     "~/tools/Python-2.7.11/python scripts/tree_measures.py {input.tree} {input.list} {input.mapping1} {input.categories} {input.mapping2} {input.supergroups} {wildcards.cutoff} results/{wildcards.domain}/{wildcards.supergroup}/{wildcards.supergroup_other}/" 

But I still get an error message:

Missing input files for rule tree_measures: 
results/Metallophos/5supergroups/5supergroups/OGSmapping.txt.list 
results/Metallophos/5supergroups/5supergroups/OGSmapping.txt.categories 

What am I missing?

Answer

It seems that you need to perform the expansion in 2 steps, as follows:

rule target : 
    input: 
     expand(expand("results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics", filtered_product, supergroup=SUPERGROUPS, supergroup_other = SUPERGROUPS_INVERSED), cutoff=CUTOFFS, domain = DOMAINS) 

The inner expand uses the filtered_product trick, and the outer one is a normal expand.
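
To see what the two stages produce, here is a minimal pure-Python sketch that imitates the nested call without importing Snakemake. The helper `expand_like` is hypothetical: it assumes, as the filtered_product trick does, that the combinator receives one list of `(wildcard, value)` pairs per wildcard. It is an illustration under that assumption, not Snakemake's own implementation.

```python
from itertools import product

SUPERGROUPS = ["2supergroups", "5supergroups"]
SUPERGROUPS_INVERSED = ["5supergroups", "2supergroups"]
DOMAINS = ["Metallophos"]
CUTOFFS = ["0"]


def filter_combinator(combinator, blacklist):
    """Wrap a combinator so that blacklisted wildcard combinations are skipped."""
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator


def expand_like(pattern, combinator=product, **wildcards):
    """Hypothetical stand-in for expand: fill one pattern with every combination."""
    # One list of (name, value) pairs per wildcard (assumed calling convention).
    pairs = [[(name, value) for value in values]
             for name, values in wildcards.items()]
    return [pattern.format(**dict(comb)) for comb in combinator(*pairs)]


# Forbid pairing a supergroup with itself.
forbidden = {
    frozenset({("supergroup", sg), ("supergroup_other", sg)})
    for sg in SUPERGROUPS
}
filtered_product = filter_combinator(product, forbidden)

pattern = ("results/{{domain}}/{supergroup}/{supergroup_other}/"
           "OGSmapping.txt.list.{{cutoff}}.statistics")

# Inner stage: only the supergroup pair is filled in;
# the doubled braces collapse to {domain} and {cutoff}.
inner = expand_like(pattern, filtered_product,
                    supergroup=SUPERGROUPS,
                    supergroup_other=SUPERGROUPS_INVERSED)

# Outer stage: a plain product over the remaining wildcards.
targets = [t for p in inner
           for t in expand_like(p, domain=DOMAINS, cutoff=CUTOFFS)]

for t in targets:
    print(t)
```

The inner stage leaves `{domain}` and `{cutoff}` as single-brace placeholders, which is why the filtered combinator only ever sees the two supergroup wildcards.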

Another approach is to use itertools.permutations for the inner list:

from itertools import permutations 

DOMAINS=["Metallophos"] 
SUPERGROUPS=["2supergroups","5supergroups"] 
CUTOFFS=["0"] 

rule target : 
    input: 
     expand(
      ["results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics".format(supergroup=sgrp1, supergroup_other=sgrp2) 
       for (sgrp1, sgrp2) in permutations(SUPERGROUPS, 2)], 
      cutoff=CUTOFFS, domain = DOMAINS) 
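
As a quick check in plain Python, the ordered pairs produced by `permutations` are the same ones that survive when self-pairs are filtered out of the full cross product. The sketch passes the length 2 explicitly so that the two-name unpacking keeps working with more than two supergroups; with the default length, `permutations` yields tuples as long as the input list. The third supergroup name is hypothetical and is there only to make this visible.

```python
from itertools import permutations, product

# "10supergroups" is a hypothetical third entry, used only for illustration.
SUPERGROUPS = ["2supergroups", "5supergroups", "10supergroups"]

# Ordered pairs of two distinct supergroups.
pairs = list(permutations(SUPERGROUPS, 2))

# The same pairs, obtained by filtering the full cross product.
filtered = [(a, b) for a, b in product(SUPERGROUPS, repeat=2) if a != b]

print(len(pairs))                    # 6 ordered pairs for 3 supergroups
print(set(pairs) == set(filtered))   # True
```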

Another possibility is to use zip:

rule target : 
    input: 
     expand(
      ["results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics".format(supergroup=sgrp1, supergroup_other=sgrp2) 
       for (sgrp1, sgrp2) in zip(SUPERGROUPS, SUPERGROUPS_INVERSED)], 
      cutoff=CUTOFFS, domain = DOMAINS)
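
Note that `zip` pairs the two lists position by position instead of crossing them, so it gives the desired pairs here only because the second list is the first one reversed and there are exactly two entries. A small plain-Python sketch of that behaviour follows; the three-entry list is hypothetical and only illustrates the limitation.

```python
SUPERGROUPS = ["2supergroups", "5supergroups"]
SUPERGROUPS_INVERSED = ["5supergroups", "2supergroups"]

# With two entries, zipping with the reversed list gives both crossed pairs.
two_pairs = list(zip(SUPERGROUPS, SUPERGROUPS_INVERSED))
print(two_pairs)

# Hypothetical three-entry list: the middle entry is paired with itself,
# and only 3 of the 6 possible crossed pairs appear.
three = ["2supergroups", "5supergroups", "10supergroups"]
three_pairs = list(zip(three, reversed(three)))
print(three_pairs)
```

With more than two supergroups, the permutations variant or the filtered product is therefore the safer choice.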