2017-05-18

How to group by a repeated field in BigQuery

In BigQuery, I created a table with the schema below:

id     INTEGER NULLABLE  
visits    INTEGER NULLABLE  
dimensions   RECORD REPEATED  
dimensions.value STRING 
dimensions.key  STRING 

How can I get SUM(visits) grouped by the device and state values?

Sample data:

{"id": 1, "visits": 100, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"CA"}]}
{"id": 1, "visits": 500, "dimensions": [{"key":"device","value":"desktop"}, {"key":"state","value":"CA"}]}
{"id": 1, "visits": 200, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"NY"}]}
{"id": 2, "visits": 100, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"CA"}]}
{"id": 2, "visits": 500, "dimensions": [{"key":"device","value":"desktop"}, {"key":"state","value":"CA"}]}
{"id": 2, "visits": 200, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"NY"}]}
{"id": 2, "visits": 780, "dimensions": [{"key":"device","value":"desktop"}, {"key":"state","value":"NY"}]}

I want id, device, state, and SUM(visits) in the output.

I can group by a single dimension with the query below, but I don't know how to do it for several dimensions at once.

SELECT id,d.value, sum(visits) FROM dataset.tabe_name,UNNEST(dimensions) as d where d.key = "device" group by id, d.value LIMIT 1000 

Is it also possible to write a generic query when the key values are not known in advance?

Answer

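Below is a solution for BigQuery standard SQL. Each scalar subquery pulls the value for one key out of the repeated dimensions field, so device and state become ordinary columns that can be grouped on: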

#standardSQL 
SELECT 
    id, 
    (SELECT value FROM UNNEST(dimensions) WHERE key = "device") AS device, 
    (SELECT value FROM UNNEST(dimensions) WHERE key = "state") AS state, 
    SUM(visits) AS visits 
FROM `dataset.tabe_name` 
GROUP BY id, device, state 
LIMIT 1000 

You can test it with dummy data built from your example, as below:

#standardSQL 
WITH data AS (
    SELECT 1 AS id, 100 AS visits, ARRAY<STRUCT<key STRING, value STRING>>[("device", "mobile"), ("state", "CA")] AS dimensions UNION ALL 
    SELECT 1, 500, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "CA")] UNION ALL 
    SELECT 1, 200, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "NY")] UNION ALL 
    SELECT 2, 100, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "CA")] UNION ALL 
    SELECT 2, 500, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "CA")] UNION ALL 
    SELECT 2, 200, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "NY")] UNION ALL 
    SELECT 2, 780, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "NY")] 
) 
SELECT 
    id, 
    (SELECT value FROM UNNEST(dimensions) WHERE key = "device") AS device, 
    (SELECT value FROM UNNEST(dimensions) WHERE key = "state") AS state, 
    SUM(visits) AS visits 
FROM data 
GROUP BY id, device, state 
-- ORDER BY id, device, state
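
For the generic case, where the keys are not known in advance, one possible approach (a sketch, not something I have run against your table) is to collapse each row's dimensions into a single canonical string and group on that. The column name dims and the "key=value" format are my own choices for illustration. The result has one string column describing the combination, rather than one column per key.

```sql
#standardSQL
SELECT
    id,
    -- Build one "key=value, key=value" string per row; ordering by key
    -- makes rows with the same dimensions produce the same string.
    (SELECT STRING_AGG(CONCAT(key, '=', value), ', ' ORDER BY key)
     FROM UNNEST(dimensions)) AS dims,
    SUM(visits) AS visits
FROM `dataset.tabe_name`
GROUP BY id, dims
```

Note that STRING_AGG skips NULL inputs, so a dimension whose key or value is NULL is left out of dims. Getting one output column per key without listing the keys would need the query text to be generated from the distinct keys first.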