2010-04-01 2 views
1

Converting chunks of data into tabular format using Perl

I have data that looks like this:

1:SRX000566 
Submitter: WoldLab 
Study: RNASeq expression profiling for ENCODE project(SRP000228) 
Sample: Human cell line GM12878(SRS000567) 
Instrument: Solexa 1G Genome Analyzer 
Total: 4 runs, 62.7M spots, 2.1G bases 
Run #1: SRR002055, 11373440 spots, 375323520 bases 
Run #2: SRR002063, 22995209 spots, 758841897 bases 
Run #3: SRR005091, 13934766 spots, 459847278 bases 
Run #4: SRR005096, 14370900 spots, 474239700 bases 

2:SRX000565 
Submitter: WoldLab 
Study: RNASeq expression profiling for ENCODE project(SRP000228) 
Sample: Human cell line GM12878(SRS000567) 
Instrument: Solexa 1G Genome Analyzer 
Total: 3 runs, 51.2M spots, 1.7G bases 
Run #1: SRR002052, 12607931 spots, 416061723 bases 
Run #2: SRR002054, 12880281 spots, 425049273 bases 
Run #3: SRR002060, 25740337 spots, 849431121 bases 

3:SRX012407 
Submitter: GEO 
Study: GSE17153: Illumina sequencing of small RNAs from C. elegans embryos(SRP001363) 
Sample: Caenorhabditis elegans(SRS006961) 
Instrument: Illumina Genome Analyzer II 
Total: 1 run, 3M spots, 106.8M bases 
Run #1: SRR029428, 2965597 spots, 106761492 bases 

Is there a compact way to convert it into tabular (tab-separated) format, with one entry/row per chunk? In this case, that means 3 rows.

I tried this, but it doesn't seem to work.

perl -laF/\n/ -000ne"print join chr(9),@F" myfile.txt 

Answers

1

If you don't mind awk:

$ awk -vRS= -vFS="\n" '{$1=$1}1' OFS="\t" file 
1:SRX000566  Submitter: WoldLab  Study: RNASeq expression profiling for ENCODE project(SRP000228)  Sample: Human cell line GM12878(SRS000567) Instrument: Solexa 1G Genome Analyzer Total: 4 runs, 62.7M spots, 2.1G bases Run #1: SRR002055, 11373440 spots, 375323520 bases  Run #2: SRR002063, 22995209 spots, 758841897 bases Run #3: SRR005091, 13934766 spots, 459847278 bases  Run #4: SRR005096, 14370900 spots, 474239700 bases 
2:SRX000565  Submitter: WoldLab  Study: RNASeq expression profiling for ENCODE project(SRP000228)  Sample: Human cell line GM12878(SRS000567) Instrument: Solexa 1G Genome Analyzer Total: 3 runs, 51.2M spots, 1.7G bases Run #1: SRR002052, 12607931 spots, 416061723 bases  Run #2: SRR002054, 12880281 spots, 425049273 bases Run #3: SRR002060, 25740337 spots, 849431121 bases 
3:SRX012407  Submitter: GEO Study: GSE17153: Illumina sequencing of small RNAs from C. elegans embryos(SRP001363) Sample: Caenorhabditis elegans(SRS006961) Instrument: Illumina Genome Analyzer II Total: 1 run, 3M spots, 106.8M bases Run #1: SRR029428, 2965597 spots, 106761492 bases 
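To see why the command above works, it may help to run it on a tiny input. The sketch below is only an illustration: the sample contents and the temporary file are made up, not taken from the question. An empty RS puts awk in paragraph mode (records are separated by blank lines), FS="\n" turns each line into a field, and the assignment $1=$1 forces awk to rebuild the record using OFS, here a tab.

```shell
# Hypothetical two-record sample; the temp file name is chosen by mktemp.
sample=$(mktemp)
printf 'a1\nb1\nc1\n\na2\nb2\n' > "$sample"

# RS=      paragraph mode: blank lines separate records
# FS='\n'  each line of a record becomes one field
# OFS='\t' fields are joined with a tab when the record is rebuilt
# $1=$1    triggers the rebuild; the trailing 1 prints the record
out=$(awk -v RS= -v FS='\n' -v OFS='\t' '{$1=$1}1' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

Each blank-line-separated chunk should come out as one tab-separated row, so this sample yields two rows.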

Otherwise, here is a Perl equivalent of the awk statement above:

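#!/usr/bin/perl 
$\ = "\n";      # output record separator: print appends a newline 
$/ = "\n\n";    # input record separator: read one blank-line-separated chunk at a time 
while (<>) { 
    chomp;                    # strip the trailing "\n\n" 
    @F = split(/\n/, $_);     # one field per line of the chunk 
    print join("\t", @F);     # emit the chunk as a single tab-separated row 
} 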
1

Treat this as a normal parsing problem and add a bit of state:

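my @records; 
my @current_record; 

while (my $line = <>) { 
    chomp $line;    # chomp the line that was read, not $_ 

    if (length $line) { 
        # Store record data 
        push @current_record, $line; 
    } 
    else { 
        # Blank line: close the current record and start a new one 
        push @records, [@current_record] if @current_record; 
        @current_record = (); 
    } 
} 

# Flush the last record when the input does not end with a blank line 
push @records, [@current_record] if @current_record; 

print join("\t", @$_), "\n" for @records; 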

This is untested and I have to go to bed. If it doesn't work, I'll have to revisit it tomorrow.

3
perl -lanF"\n" -000 -e 'print join "\t", @F' file 