AWS Data Pipeline: copying log data from S3 to Redshift

I have set up an AWS Data Pipeline to import CSV log data from S3 into a Redshift cluster.

My Redshift table has the following structure:

CREATE TABLE access_log 
(
    id bigint identity(1, 1), 
    host character varying(64), 
    cf_host character varying(64), 
    xff_host character varying(64), 
    event_time timestamp, 
    method character varying(16), 
    url text, 
    response_code integer, 
    referer text, 
    user_agent text, 
    device_id character varying(40), 
    primary key(id) 
) 
sortkey(id); 

Here is an excerpt of my CSV log data:

"172.20.2.224", "nul", "nul", "2016 -03-16 00:01:28 "," GET ","/"," 302 "," null "," null " " 172.20.2.224 "," null "," null "," 2016-03- 16 00:01:33 "," GET ","/"," 200 "," null "," null " " 172.20.2.224 "," null "," null "," 2016-03-16 00: 11:28 "," GET ","/"," 302 "," null "," null " " 172.20.2.224 "," null "," null "," 2016-03-16 00:11:33 "," GET ","/"," 200 "," null "," null " "172.20.2.224", "null", "null", "2016-03-16 00:21:28", "GET", "/", "302", "null", "null" "172,20 .2.224 "," null "," null "," 2016-03-16 00:21:33 "," GET ","/"," 200 "," null "," null "

From SQLWorkbenchJ, the following copy command works fine:

copy access_log 
from 's3://mylogrepo' 
credentials 
'aws_access_key_id=myaccesskey;aws_secret_access_key=myaccesskeysecret' 
DELIMITER ',' 
REMOVEQUOTES 
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS' 

But when the Redshift copy activity runs, I get the following error:

[Amazon](500310) Invalid operation: cannot set an identity column to a value; 

What I find interesting is this line from the error stack trace:

private.com.amazonaws.services.datapipeline.redshift.QueryStatementException: Exception Amazon Invalid operation: cannot set an identity column to a value; while executing START TRANSACTION; INSERT INTO public.access_log SELECT s.* FROM staging s LEFT JOIN public.access_log t ON s."id" = t."id" WHERE t."id" IS NULL; COMMIT;
    at private.com.amazonaws.services.datapipeline.redshift.RedshiftQueryStatement.(RedshiftQueryStatement.java:43)
    at private.com.amazonaws.services.datapipeline.redshift.RedshiftQueryStatementFactory.newQueryStatement(RedshiftQueryStatementFactory.java:9)
    at ... private.com.amazonaws.services.datapipeline.redshift.SqlHelper.prepareStatement(SqlHelper.java:84)
    at $TaskRunner.run(HeartbeatingTaskRunner.java:34)
    ... 1 more
Caused by: java.sql.SQLException: Amazon Invalid operation: cannot set an identity column to a value;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)

Is it possible that the IP address in my CSV data is being interpreted as the id column?

Thanks!

I suspect you have an invalid S3 column mapping.

Answer

Can you share your column mapping?

I forgot to export the device_id information to my CSV data, so the copy command was trying to map 9 CSV columns onto the table's 10 non-identity columns. After adding device_id to my CSV data, it seems to work. Thank you!
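
A column-count mismatch like this can be caught before the pipeline runs. The following is a minimal sketch, not part of the original setup: it assumes the CSV is quoted and comma-delimited as in the excerpt above, and that every column of access_log except the identity column id must be present in each row. The function name find_bad_rows and the sample device_id value abc123 are made up for illustration.

```python
import csv
import io

# Columns the load is expected to fill: every column of access_log
# except the identity column `id`, in table order.
EXPECTED_COLUMNS = [
    "host", "cf_host", "xff_host", "event_time", "method",
    "url", "response_code", "referer", "user_agent", "device_id",
]


def find_bad_rows(csv_text, expected=len(EXPECTED_COLUMNS)):
    """Return (row_number, field_count) for rows with the wrong field count."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        (row_no, len(row))
        for row_no, row in enumerate(reader, start=1)
        if row and len(row) != expected  # skip blank lines
    ]


# First row mirrors the original export (no device_id); the second row
# adds a hypothetical device_id value.
sample = (
    '"172.20.2.224","null","null","2016-03-16 00:01:28","GET","/","302","null","null"\n'
    '"172.20.2.224","null","null","2016-03-16 00:01:33","GET","/","200","null","null","abc123"\n'
)
print(find_bad_rows(sample))  # [(1, 9)]
```

Running a check like this against a sample of the S3 files would have flagged the 9-field rows before the copy activity tried to load them.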