Split a string based on the number of commas in SQL Server

I need a SQL Server query. Examples below:

1) 19003 IH-10 West, San Antonio, 78257, TX, United States

In the first example there are four commas in the string, so the output should be

19003 IH-10 West

2) Chevron Pipe Line Company, 4800 Fournace Place, Bellaire, 77401-2324, TX, United States

In the second example there are five commas in the string, so the output should be

Chevron Pipe Line Company, 4800 Fournace Place

If there are four commas, the output should be the substring of the main string up to the 1st comma; if there are five commas, it should be the substring up to the 2nd comma.

The input only ever contains 4 or 5 commas.

I can meet this requirement with a UDF, but I need a direct query.

2017-10-09 Vijay

Not sure how well this will hold up in general. I strongly recommend finding a better way to split the addresses and store them as separate fields; there are plenty of address-parsing solutions available.

If that's not an option, the following will work, though it may need tweaking depending on performance.

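If Object_Id('tempdb..#Temp') Is Not Null Drop Table #Temp -- drop only if it exists, so the script also runs the first time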
Create Table #Temp (Field Varchar(8000)) 
Insert #Temp Values ('19003 IH-10 West, San Antonio, 78257, TX, United States') 
Insert #Temp Values ('Chevron Pipe Line Company, 4800 Fournace Place, Bellaire, 77401-2324, TX, United States') 

;With cteFindCommas As 
(
Select CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field)+1)+1)+1)+1) FifthCommaPos, 
     CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field)+1)+1)+1) FourthCommaPos, 
     CharIndex(',', Field) FirstCommaPos, 
     CharIndex(',', Field, CharIndex(',', Field)+1) SecondCommaPos, 
     * 
    From #Temp 
) 
Select Case When FifthCommaPos > 0 Then Substring(Field, 0, SecondCommaPos) 
      Else Substring(Field, 0, FirstCommaPos) End String,    
     * 
    From cteFindCommas
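Both branches of the `Case` above call `Substring` with a start position of 0, which can look like an off-by-one. A short sketch of why it still returns exactly the text before the comma, using the first sample address (the variable names are illustrative, not from the answer):

```sql
-- SUBSTRING counts from position 1. With a start of 0, the "virtual" position 0
-- uses up one character of the requested length, so SUBSTRING(s, 0, n) returns
-- the first n - 1 characters: the text before a comma found at position n.
DECLARE @s varchar(100) = '19003 IH-10 West, San Antonio, 78257, TX, United States';
DECLARE @firstComma int = CHARINDEX(',', @s);   -- 17

SELECT SUBSTRING(@s, 0, @firstComma)     AS StartAtZero,  -- '19003 IH-10 West'
       SUBSTRING(@s, 1, @firstComma - 1) AS StartAtOne,   -- '19003 IH-10 West'
       LEFT(@s, @firstComma - 1)         AS UsingLeft;    -- '19003 IH-10 West'
```

A side effect of the 0 start: if a comma were missing (`CharIndex` returning 0), `SUBSTRING(@s, 0, 0)` returns an empty string, whereas `LEFT(@s, -1)` raises an invalid-length error.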

2017-10-09 16:18:03

This is a very good solution. You don't need to calculate the position of every comma, though: you only need the positions of the 1st and 2nd commas plus a count of the delimiters. See my solution.

The comma count is pretty neat. I don't have time to test it, but I wonder whether using REPLACE instead of multiple CHARINDEX calls makes a performance difference, and, if REPLACE turns out slower here, by how much it tips the other way.

I added a performance test. It looks like what you put together is ~15% faster. Well done.

Here's how I would do it (note the comments in my code):

-- sample data 
declare @sometable table (someid int identity, someAddress varchar(200)); 
insert @sometable (someAddress) 
values ('19003 IH-10 West, San Antonio, 78257, TX, United States'), 
     ('Chevron Pipe Line Company, 4800 Fournace Place, Bellaire, 77401-2324, TX, US'); 

-- solution 
select 
    someid, 
    someAddress, 
    newAddress = case commaCount when 4 then C4 when 5 then C5 end -- conditions for 4 or 5 commas only 
from 
(
    select 
    someid, 
    someAddress, 
    commaCount = len(v.a) - len(replace(v.a,',','')), -- calculate the number of commas 
    C4 = substring(v.a, 1, charindex(',', v.a)-1), -- grab everything up to the 1st comma 
    C5 = substring(v.a, 1, charindex(',', v.a, charindex(',', v.a)+1)-1) -- everything up to 2nd comma 
    from @sometable t 
    cross apply (values (t.someAddress)) v(a) -- how I avoid repeated references to t.SomeAddress 
) formatStrings;
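The comma count relies on `len(v.a) - len(replace(v.a,',',''))`. One caveat, not triggered by the sample data: `LEN` ignores trailing spaces, so if removing a comma leaves a trailing space, that space is dropped as well and the count comes out one too high. A minimal sketch with a made-up value (`'TX ,'`), comparing `LEN` with `DATALENGTH` and assuming a `varchar` column where each comma occupies one byte:

```sql
-- Made-up value: a space before a trailing comma.
DECLARE @a varchar(50) = 'TX ,';

SELECT LEN(@a) - LEN(REPLACE(@a, ',', ''))               AS LenCount,        -- 2: LEN('TX ') trims the space
       DATALENGTH(@a) - DATALENGTH(REPLACE(@a, ',', '')) AS DataLengthCount; -- 1: bytes are not trimmed
```

For `nvarchar` data the `DATALENGTH` difference would need to be divided by 2.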

Results

someid  someAddress                                              newAddress
------  -------------------------------------------------------  ----------------------------------------------
1       19003 IH-10 West, San Antonio, 78257, TX, United States  19003 IH-10 West
2       Chevron Pipe Line Company, 4800 Fournace Place...        Chevron Pipe Line Company, 4800 Fournace Place

UPDATE - ADDED PERFORMANCE TEST

Here's a test harness I put together; note my comments.

Sample data
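The script that follows calls `dbo.DelimitedSplit8K`, which is not built into SQL Server; it is a community-written splitter that has to be created separately, and the harness reads its `ItemNumber` and `Item` columns. On SQL Server 2022 or Azure SQL Database, where `STRING_SPLIT` accepts a third `enable_ordinal` argument (and the database is at compatibility level 130 or higher), a possible stand-in is sketched below. The substitution and the aliases are assumptions, not part of the original test:

```sql
-- Assumption: SQL Server 2022+ / Azure SQL Database (STRING_SPLIT with enable_ordinal = 1).
-- Hypothetical stand-in for dbo.DelimitedSplit8K; aliases mirror the columns the harness reads.
DECLARE @addr varchar(200) = '886 Hartford Ave., Gwynn Oak, MD 21207';

SELECT ds.ItemNumber, ds.Item
FROM (VALUES (@addr)) AS t(someAddress)
CROSS APPLY (
    SELECT ItemNumber = s.ordinal,  -- 1-based position of the piece in the string
           Item       = s.value     -- the piece itself; surrounding spaces are kept
    FROM STRING_SPLIT(t.someAddress, ',', 1) AS s
) AS ds
ORDER BY ds.ItemNumber;
-- 1 '886 Hartford Ave.' / 2 ' Gwynn Oak' / 3 ' MD 21207'
```

In the harness, `cross apply dbo.DelimitedSplit8K(t.someAddress,',')` would be replaced by the `CROSS APPLY (SELECT ItemNumber = ..., Item = ...)` derived table. Without `enable_ordinal`, `STRING_SPLIT` does not guarantee the order of its output, so the earlier two-argument form is not a safe substitute.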

-- #1: Create Sample Data 
------------------------------------------------------------------------------------------ 
declare @rows int = 100000; 
if object_id('tempdb..#base') is not null drop table #base; 
if object_id('tempdb..#address') is not null drop table #address; 

-- grabbed 50 random addresses from here: https://www.randomlists.com/random-addresses 
-- manually added the first comma (between street address and city) 
select 
    someId  = identity(int,1,1), 
    someAddress = stuff(addr,patindex('%'+replicate('[0-9]',5)+'%',addr),0,',') 
into #base from (values 
('886 Hartford Ave., Gwynn Oak, MD 21207'), 
('322 Wakehurst St., Deerfield, IL 60015'), 
('62 South Oak Valley St., Lorain, OH 44052'), 
('72 53rd St., New Bern, NC 28560'), 
('569 Swanson Ave., Snellville, GA 30039'), 
('15 Walnut St., New Bern, NC 28560'), 
('94 Kingston St., North Royalton, OH 44133'), 
('77 Rock Creek St., Ocean Springs, MS 39564'), 
('688 S. Bellevue St., Mableton, GA 30126'), 
('61 Queen Rd., Potomac, MD 20854'), 
('72 Jockey Hollow Drive, Elgin, IL 60120'), 
('777 School St., Clarksville, TN 37040'), 
('50 North 1st Street, Mount Prospect, IL 60056'), 
('8004 Valley Drive, Long Beach, NY 11561'), 
('8569 Franklin Court, Lakeland, FL 33801'), 
('837 Buckingham St., Newnan, GA 30263'), 
('46 Birch Hill St., Helena, MT 59601'), 
('617 E. Brookside Drive, Jersey City, NJ 07302'), 
('8133 Valley View St., Clearwater, FL 33756'), 
('42 South Ave., Greensburg, PA 15601'), 
('8782 Oak Meadow St., Helotes, TX 78023'), 
('35 Valley Farms Ave., Racine, WI 53402'), 
('7613 Cobblestone Road, Orlando, FL 32806'), 
('27 Broad Lane, Kaukauna, WI 54130'), 
('9213 Corona Dr., Rockville, MD 20850'), 
('7390 W. Bay Court, Mason, OH 45040'), 
('561 W. St Louis Ave., Silver Spring, MD 20901'), 
('7447 Evergreen Ave., Rocky Mount, NC 27804'), 
('24 NW. Pilgrim Road, Sun Prairie, WI 53590'), 
('846 E. Hall St., Lake Villa, IL 60046'), 
('919 Green Hill Street, New Orleans, LA 70115'), 
('532 Newbridge Lane, Hanover, PA 17331'), 
('3 E. Rose Rd., Waukegan, IL 60085'), 
('15 South Euclid Rd., Springfield Gardens, NY 11413'), 
('453 Mulberry Ave., Parlin, NJ 08859'), 
('8128 New Saddle Court, Fullerton, CA 92831'), 
('9143 Lafayette Ave., Jackson Heights, NY 11372'), 
('481 Edgewater St., Dacula, GA 30019'), 
('8243 Hilltop St., Camp Hill, PA 17011'), 
('70 Lookout St., Marlborough, MA 01752'), 
('9370 South Shirley Drive, King Of Prussia, PA 19406'), 
('8071 Plymouth Road, Huntersville, NC 28078'), 
('593 Charles St., Buckeye, AZ 85326'), 
('9092 Atlantic Ave., Yuma, AZ 85365'), 
('81 Longbranch Road, Ontario, CA 91762'), 
('868 Garfield St., New Lenox, IL 60451'), 
('8333 Kirkland Rd., Plainview, NY 11803'), 
('9714 Prospect Ave., Monroe Township, NJ 08831'), 
('7 N. Atlantic Ave., Reidsville, NC 27320'), 
('9283 Cherry Lane, Waukegan, IL 60085')) a(addr); 

-- index to support large sample data requests: 
create unique clustered index uq_cl_base on #base(someid); 

-- Create randomized addresses: up to 6,250,000 dummy rows (50^4) 
with r(x) as (select top(@rows/50) 1 from #base a, #base b, #base c, #base d), 
base(Field) as 
(
    select Field = 
    max(
     case itemNumber when 1 then substring(item,charindex(' ',item)+1, len(item)+1) end+ 
     case abs(checksum(newid())%10) 
     when 0 then ', Unit '+ cast(abs(checksum(newid())%100)+1 as varchar(3)) 
     when 1 then ', Suite '+ cast(abs(checksum(newid())%100)+1 as varchar(3)) 
     when 2 then ', Penthouse' else '' end)+','+ 
    max(case ItemNumber when 2 then item end)+','+ 
    max(case ItemNumber when 3 then left(item,3)+', ' end)+ 
    max(case ItemNumber when 4 then item end)+', United States' 
    from #base t 
    cross apply dbo.DelimitedSplit8K(t.someAddress,',') 
    group by t.someId 
) 
select addressId = identity(int,1,1), 
     field  = 
     left(cast(abs(checksum(newid())%10000)+1 as varchar(5)), 
      case checksum(newid())%3 when 4 then 4 when 1 then 1 else 3 end)+' '+b.Field 
into #address 
from base b 
cross join r 
order by newid(); 
go
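The `stuff(addr, patindex(...), 0, ',')` expression in the script above inserts a comma in front of the ZIP code rather than replacing anything: `PATINDEX` finds where the first run of five digits starts, and `STUFF` with a length of 0 deletes nothing and inserts `','` at that position. A standalone sketch on one of the listed addresses:

```sql
DECLARE @addr varchar(100) = '886 Hartford Ave., Gwynn Oak, MD 21207';

-- Insert ',' where the five-digit ZIP code starts; the space before it is kept.
SELECT STUFF(@addr, PATINDEX('%' + REPLICATE('[0-9]', 5) + '%', @addr), 0, ',') AS WithZipComma;
-- '886 Hartford Ave., Gwynn Oak, MD ,21207'

-- With no five-digit run, PATINDEX returns 0 and STUFF with a start of 0 returns NULL.
SELECT STUFF('no zip here', PATINDEX('%' + REPLICATE('[0-9]', 5) + '%', 'no zip here'), 0, ',') AS NoZip;
```

Every address in the list has a five-digit ZIP code, so the NULL case does not arise in this sample data.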

Performance test

set nocount on; 
print 'solution 1'+char(10)+replicate('-',50); 
go 
declare @st datetime = getdate(), @field varchar(100); 
    ;With cteFindCommas As 
    (
    Select CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field)+1)+1)+1)+1) FifthCommaPos, 
      CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field, CharIndex(',', Field)+1)+1)+1) FourthCommaPos, 
      CharIndex(',', Field) FirstCommaPos, 
      CharIndex(',', Field, CharIndex(',', Field)+1) SecondCommaPos, * 
    From #Address 
) 
    select @field = case when FifthCommaPos > 0 Then Substring(Field, 0, SecondCommaPos) 
        else Substring(Field, 0, FirstCommaPos) end 
    From cteFindCommas; 
print datediff(ms,@st,getdate()); 
go 5 

print 'solution 2'+char(10)+replicate('-',50); 
go 
declare @st datetime = getdate(), @field varchar(100); 
    select @field = case commaCount when 4 then C4 when 5 then C5 end -- conditions for 4 or 5 commas only 
    from 
    (
    select 
     addressId, 
     field, 
     commaCount = len(v.a) - len(replace(v.a,',','')), -- calculate the number of commas 
     C4 = substring(v.a, 1, charindex(',', v.a)-1), -- grab everything up to the 1st comma 
     C5 = substring(v.a, 1, charindex(',', v.a, charindex(',', v.a)+1)-1) -- everything up to 2nd comma 
    from #address t cross apply (values (t.field)) v(a) 
) formatStrings; 
print datediff(ms,@st,getdate()); 
go 5
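Timings are only comparable if both queries return the same strings. A hypothetical sanity check, not part of the original harness, that evaluates both expressions side by side on the question's two addresses and lists any row where they disagree (an empty result means they agree):

```sql
DECLARE @t table (id int identity, Field varchar(200));
INSERT @t (Field) VALUES
    ('19003 IH-10 West, San Antonio, 78257, TX, United States'),
    ('Chevron Pipe Line Company, 4800 Fournace Place, Bellaire, 77401-2324, TX, United States');

SELECT id, Sol1, Sol2
FROM (
    SELECT id,
           -- Solution 1: look for a fifth comma with nested CHARINDEX calls
           Sol1 = CASE WHEN CHARINDEX(',', Field, CHARINDEX(',', Field, CHARINDEX(',', Field,
                            CHARINDEX(',', Field, CHARINDEX(',', Field) + 1) + 1) + 1) + 1) > 0
                       THEN SUBSTRING(Field, 0, CHARINDEX(',', Field, CHARINDEX(',', Field) + 1))
                       ELSE SUBSTRING(Field, 0, CHARINDEX(',', Field)) END,
           -- Solution 2: count the commas with LEN/REPLACE
           Sol2 = CASE LEN(Field) - LEN(REPLACE(Field, ',', ''))
                       WHEN 4 THEN SUBSTRING(Field, 1, CHARINDEX(',', Field) - 1)
                       WHEN 5 THEN SUBSTRING(Field, 1, CHARINDEX(',', Field, CHARINDEX(',', Field) + 1) - 1) END
    FROM @t
) AS x
WHERE Sol1 <> Sol2 OR Sol1 IS NULL OR Sol2 IS NULL;  -- rows where the two approaches differ
```

The same `WHERE` filter could be pointed at `#address` to check the generated rows before timing them.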

Results (100,000-row test)

solution 1 
-------------------------------------------------- 
Beginning execution loop 
133 
133 
130 
126 
126 
Batch execution completed 5 times. 

solution 2 
-------------------------------------------------- 
Beginning execution loop 
156 
160 
156 
157 
153 
Batch execution completed 5 times.

It looks like Joe C's solution (solution #1) was faster: 126 to 133 ms per run versus 153 to 160 ms for mine, about 17% less time on average.

2017-10-09 17:08:25