2017-07-11 1 views
1

Add a new column with a specific row's value in U-SQL

I need to extract records from a text file in U-SQL. The first line is different from the other lines and contains the date. I need to skip that first line, but I also need to copy its date into a new column on every row. In my final U-SQL output, the first column of each row should therefore hold the same value, taken from the first line of the file. Please see the attached image for more details.

enter image description here

Please suggest the correct U-SQL query to accomplish this.

Answers

3

Here is another way. I used the SearchLog.tsv from the samples to demonstrate it. At the top of the file, I added the line FROM 01JAN17 TO 31JAN17.

// Skip the first row and read all the other rows
@searchlog =
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int,
            Urls        string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLogWithHeader.tsv"
    USING Extractors.Tsv(skipFirstNRows: 1);

// Extract all the text in the same file but don't parse out the individual columns
@searchlogAllText =
    EXTRACT rowText string
    FROM @"/Samples/Data/SearchLogWithHeader.tsv"
    USING Extractors.Text(delimiter: '\n');

// Find a pattern that works for you and use the .NET expressions that match the string
@searchlogHeaderDate =
    SELECT rowText.Split(' ')[1] AS FromDate
    FROM @searchlogAllText
    WHERE rowText.StartsWith("FROM");

@output =
    SELECT *
    FROM @searchlogHeaderDate
         CROSS JOIN @searchlog;

OUTPUT @output
TO @"/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();
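
If the end of the period is needed as well, the same header row can supply it. The following is only a sketch under the same assumptions as the script above (a header of the form FROM 01JAN17 TO 31JAN17, words separated by single spaces). The names @headerDates and ToDate and the output path are illustrative and not part of the original answer.

```usql
// Sketch: same input file as above; @headerDates, ToDate and the output path are hypothetical names.
@searchlog =
    EXTRACT UserId      int,
            Start       DateTime,
            Region      string,
            Query       string,
            Duration    int,
            Urls        string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLogWithHeader.tsv"
    USING Extractors.Tsv(skipFirstNRows: 1);

// Read every line as a single string column, as in the original script
@searchlogAllText =
    EXTRACT rowText string
    FROM @"/Samples/Data/SearchLogWithHeader.tsv"
    USING Extractors.Text(delimiter: '\n');

// "FROM 01JAN17 TO 31JAN17" splits into FROM / 01JAN17 / TO / 31JAN17,
// so index 1 is the start date and index 3 is the end date
@headerDates =
    SELECT rowText.Split(' ')[1] AS FromDate,
           rowText.Split(' ')[3] AS ToDate
    FROM @searchlogAllText
    WHERE rowText.StartsWith("FROM");

// The header rowset has one row, so the cross join repeats it on every data row
@output =
    SELECT h.FromDate,
           h.ToDate,
           s.*
    FROM @headerDates AS h
         CROSS JOIN @searchlog AS s;

OUTPUT @output
TO @"/Samples/Output/SearchLog_output_with_range.tsv"
USING Outputters.Tsv();
```

Listing the columns explicitly instead of using SELECT * makes the column order of the output file obvious. The input and output listings that follow belong to the original script, not to this variation.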

Input:

FROM 01JAN17 TO 31JAN17 
399266 2/15/2012 11:53:16 AM en-us how to make nachos 73 www.nachos.com;www.wikipedia.com NULL 
382045 2/15/2012 11:53:18 AM en-gb best ski resorts 614 skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm ski-europe.com;www.travelersdigest.com/ski_resorts.htm 
382045 2/16/2012 11:53:20 AM en-gb broken leg 74 mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture 
106479 2/16/2012 11:53:50 AM en-ca south park episodes 24 southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall southparkstudios.com 
906441 2/16/2012 11:54:01 AM en-us cosmos 1213 cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos NULL 
351530 2/16/2012 11:54:01 AM en-fr microsoft 241 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL 
640806 2/16/2012 11:54:02 AM en-us wireless headphones 502 www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com www.amazon.com;store.apple.com 
304305 2/16/2012 11:54:03 AM en-us dominos pizza 60 dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos dominos.com 
460748 2/16/2012 11:54:04 AM en-us yelp 1270 yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp yelp.com 
354841 2/16/2012 11:59:01 AM en-us how to run 610 running.about.com;ehow.com;go.com running.about.com;ehow.com 
354068 2/16/2012 12:00:33 PM en-mx what is sql 422 wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL wikipedia.org/wiki/SQL 
674364 2/16/2012 12:00:55 PM en-us mexican food redmond 283 eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com NULL 
347413 2/16/2012 12:11:55 PM en-gr microsoft 305 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL 
848434 2/16/2012 12:12:35 PM en-ch facebook 10 facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook facebook.com 
604846 2/16/2012 12:13:55 PM en-us wikipedia 612 wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia wikipedia.org 
840614 2/16/2012 12:13:56 PM en-us xbox 1220 xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360 xbox.com/xbox360 
656666 2/16/2012 12:15:55 PM en-us hotmail 691 hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail NULL 
951513 2/16/2012 12:17:00 PM en-us pokemon 63 pokemon.com;pokemon.com/us;serebii.net pokemon.com 
350350 2/16/2012 12:18:17 PM en-us wolfram 30 wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram NULL 
641615 2/16/2012 12:19:55 PM en-us kahn 119 khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name) khanacademy.org 
65 2/16/2012 12:20:03 PM en-us clothes 732 gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars footballfanatics.com/college_washington_state_cougars 
651777 2/16/2012 12:20:33 PM en-us food recipes 183 allrecipes.com;foodnetwork.com;simplyrecipes.com foodnetwork.com 
666352 2/16/2012 12:21:03 PM en-us weight loss 630 en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com webmd.com/diet 

Output:

"01JAN17" 399266 2012-02-15T11:53:16.0000000 "en-us" "how to make nachos" 73 "www.nachos.com;www.wikipedia.com" "NULL"
"01JAN17" 382045 2012-02-15T11:53:18.0000000 "en-gb" "best ski resorts" 614 "skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm" "ski-europe.com;www.travelersdigest.com/ski_resorts.htm"
"01JAN17" 382045 2012-02-16T11:53:20.0000000 "en-gb" "broken leg" 74 "mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture" "mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture"
"01JAN17" 106479 2012-02-16T11:53:50.0000000 "en-ca" "south park episodes" 24 "southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall" "southparkstudios.com"
"01JAN17" 906441 2012-02-16T11:54:01.0000000 "en-us" "cosmos" 1213 "cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos" "NULL"
"01JAN17" 351530 2012-02-16T11:54:01.0000000 "en-fr" "microsoft" 241 "microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com" "NULL"
"01JAN17" 640806 2012-02-16T11:54:02.0000000 "en-us" "wireless headphones" 502 "www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com" "www.amazon.com;store.apple.com"
"01JAN17" 304305 2012-02-16T11:54:03.0000000 "en-us" "dominos pizza" 60 "dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos" "dominos.com"
"01JAN17" 460748 2012-02-16T11:54:04.0000000 "en-us" "yelp" 1270 "yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp" "yelp.com"
"01JAN17" 354841 2012-02-16T11:59:01.0000000 "en-us" "how to run" 610 "running.about.com;ehow.com;go.com" "running.about.com;ehow.com"
"01JAN17" 354068 2012-02-16T12:00:33.0000000 "en-mx" "what is sql" 422 "wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL" "wikipedia.org/wiki/SQL"
"01JAN17" 674364 2012-02-16T12:00:55.0000000 "en-us" "mexican food redmond" 283 "eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com" "NULL"
"01JAN17" 347413 2012-02-16T12:11:55.0000000 "en-gr" "microsoft" 305 "microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com" "NULL"
"01JAN17" 848434 2012-02-16T12:12:35.0000000 "en-ch" "facebook" 10 "facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook" "facebook.com"
"01JAN17" 604846 2012-02-16T12:13:55.0000000 "en-us" "wikipedia" 612 "wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia" "wikipedia.org"
"01JAN17" 840614 2012-02-16T12:13:56.0000000 "en-us" "xbox" 1220 "xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360" "xbox.com/xbox360"
"01JAN17" 656666 2012-02-16T12:15:55.0000000 "en-us" "hotmail" 691 "hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail" "NULL"
"01JAN17" 951513 2012-02-16T12:17:00.0000000 "en-us" "pokemon" 63 "pokemon.com;pokemon.com/us;serebii.net" "pokemon.com"
"01JAN17" 350350 2012-02-16T12:18:17.0000000 "en-us" "wolfram" 30 "wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram" "NULL"
"01JAN17" 641615 2012-02-16T12:19:55.0000000 "en-us" "kahn" 119 "khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name)" "khanacademy.org"
"01JAN17" 65 2012-02-16T12:20:03.0000000 "en-us" "clothes" 732 "gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars" "footballfanatics.com/college_washington_state_cougars"
"01JAN17" 651777 2012-02-16T12:20:33.0000000 "en-us" "food recipes" 183 "allrecipes.com;foodnetwork.com;simplyrecipes.com" "foodnetwork.com"
"01JAN17" 666352 2012-02-16T12:21:03.0000000 "en-us" "weight loss" 630 "en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com" "webmd.com/diet"
2

I was able to do something like this with plain U-SQL (not a custom UDO) and this sample file:

USING rx = System.Text.RegularExpressions.Regex; 

DECLARE @inputFilepath string = "input/input71.txt"; 
DECLARE @outputFilepath string = "output/output71.csv"; 


// Get the first line; use the silent option to skip all other lines,
// i.e. those that have more than one column
@file = 
    EXTRACT headerLine string 
    FROM @inputFilepath 
    USING Extractors.Text(delimiter : '|', silent : true); 


// Get the start date from the header 
@header = 
    SELECT 
     headerLine, 
     rx.Match(headerLine, @"FROM (?<startDate>\d{2}[A-Z]{3}\d{2}) TO (?<endDate>\d{2}[A-Z]{3}\d{2})").Groups["startDate"].ToString() AS startDate 
    FROM @file 
    WHERE headerLine.Contains("FROM"); 



// Get the rest of the lines; skip the header row explicitly.
// 'silent' should not be required here because the header row is skipped
@body = 
    EXTRACT runDate string, 
      col1 int, 
      col2 int, 
      col3 int 
    FROM @inputFilepath 
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1); 


@result = 
    SELECT h.startDate, p.* 
    FROM @header AS h 
     CROSS JOIN 
      @body AS p; 


// Export as csv 
OUTPUT @result 
TO @outputFilepath 
USING Outputters.Csv(quoting:false); 

My results:

Results

This was a simple example to demonstrate the power of U-SQL, pairing regular expressions with set operations. See if something similar works for you.
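
One possible refinement of the script above, offered as a sketch rather than a tested solution: the pattern already defines an endDate group, so it can be returned too, and filtering with Regex.IsMatch instead of Contains("FROM") keeps only lines that really have the expected header shape. The variable @headerPattern, the leading ^ anchor, and the output file name are assumptions added for illustration. The @body schema is copied unchanged from the answer.

```usql
USING rx = System.Text.RegularExpressions.Regex;

DECLARE @inputFilepath string = "input/input71.txt";
DECLARE @outputFilepath string = "output/output71_range.csv";   // hypothetical output name

// Same pattern as in the answer, anchored at the start of the line (assumption)
DECLARE @headerPattern string = @"^FROM (?<startDate>\d{2}[A-Z]{3}\d{2}) TO (?<endDate>\d{2}[A-Z]{3}\d{2})";

// Single-column read; silent skips the multi-column data rows
@file =
    EXTRACT headerLine string
    FROM @inputFilepath
    USING Extractors.Text(delimiter : '|', silent : true);

// Keep only lines that match the full header pattern and read both named groups
@header =
    SELECT rx.Match(headerLine, @headerPattern).Groups["startDate"].Value AS startDate,
           rx.Match(headerLine, @headerPattern).Groups["endDate"].Value AS endDate
    FROM @file
    WHERE rx.IsMatch(headerLine, @headerPattern);

// Data rows, skipping the header row
@body =
    EXTRACT runDate string,
            col1 int,
            col2 int,
            col3 int
    FROM @inputFilepath
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);

// One header row cross joined with every data row
@result =
    SELECT h.startDate,
           h.endDate,
           p.*
    FROM @header AS h
         CROSS JOIN @body AS p;

OUTPUT @result
TO @outputFilepath
USING Outputters.Csv(quoting : false);
```

Note that if no line matches the pattern, @header is empty and the cross join yields no rows at all, so an empty output file usually points to a header format that differs from the one assumed here.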