java IOException: Write end dead during a Hadoop job

I have a map-only Hadoop job that throws several IO exceptions during its run:

1) java.io.IOException: Write end dead

2) java.io.IOException: Pipe closed

It manages to finish its work, but there are exceptions that worry me. Is there something I'm doing wrong?

Practically the same job runs daily on another dataset that is 20 times smaller, and no exceptions are raised. The jobs are run on Google DataProc.

The configuration file I use:

#!/bin/bash 
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \ 
-D mapreduce.output.fileoutputformat.compress=true \ 
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \ 
-D mapreduce.job.reduces=0 \ 
-D mapreduce.input.fileinputformat.split.maxsize=1500000000 \ 
-D mapreduce.map.failures.maxpercent=1 \ 
-D mapreduce.fileoutputcommitter.algorithm.version=2 \ 
-D mapreduce.task.timeout=900000 \ 
-D mapreduce.map.memory.mb=2048 \ 
-file mymapper.py \ 
-input gs://input_folder/* \ 
-output gs://output_folder/$1 \ 
-mapper mymapper.py \ 
-reducer org.apache.hadoop.mapred.lib.IdentityReducer \ 
-inputformat org.apache.hadoop.mapred.lib.CombineTextInputFormat 

Here is an error log:

17/03/15 09:53:30 INFO mapreduce.Job: Running job: job_1489571529338_0001 
17/03/15 09:53:37 INFO mapreduce.Job: Job job_1489571529338_0001 running in uber mode : false 
17/03/15 09:53:37 INFO mapreduce.Job: map 0% reduce 0% 
17/03/15 09:56:58 INFO mapreduce.Job: map 1% reduce 0% 
17/03/15 10:00:16 INFO mapreduce.Job: Task Id : attempt_1489571529338_0001_m_000744_0, Status : FAILED 
Error: java.io.IOException: java.io.IOException: Write end dead 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432) 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:256) 
    at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.write(CacheSupplementedGoogleCloudStorage.java:58) 
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) 
    at java.nio.channels.Channels.writeFully(Channels.java:101) 
    at java.nio.channels.Channels.access$000(Channels.java:61) 
    at java.nio.channels.Channels$1.write(Channels.java:174) 
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158) 
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126) 
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) 
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
    at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109) 
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
    at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108) 
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
    Suppressed: java.io.IOException: java.io.IOException: Write end dead 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287) 
     at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68) 
     at java.nio.channels.Channels$1.close(Channels.java:178) 
     at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
     ... 14 more 
    Caused by: java.io.IOException: Write end dead 
     at java.io.PipedInputStream.read(PipedInputStream.java:310) 
     at java.io.PipedInputStream.read(PipedInputStream.java:377) 
     at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
    [CIRCULAR REFERENCE:java.io.IOException: Write end dead] 

Container killed by the ApplicationMaster. 
Container killed on request. Exit code is 143 
Container exited with a non-zero exit code 143 

17/03/15 10:01:06 INFO mapreduce.Job: map 2% reduce 0% 
17/03/15 10:02:46 INFO mapreduce.Job: Task Id : attempt_1489571529338_0001_m_001089_0, Status : FAILED 
Error: java.io.IOException: Pipe closed 
    at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260) 
    at java.io.PipedInputStream.receive(PipedInputStream.java:226) 
    at java.io.PipedOutputStream.write(PipedOutputStream.java:149) 
    at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:458) 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:259) 
    at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.write(CacheSupplementedGoogleCloudStorage.java:58) 
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) 
    at java.nio.channels.Channels.writeFully(Channels.java:101) 
    at java.nio.channels.Channels.access$000(Channels.java:61) 
    at java.nio.channels.Channels$1.write(Channels.java:174) 
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158) 
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126) 
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) 
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
    at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109) 
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
    at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108) 
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
    Suppressed: java.io.IOException: java.io.IOException: Write end dead 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287) 
     at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68) 
     at java.nio.channels.Channels$1.close(Channels.java:178) 
     at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
     ... 14 more 
    Caused by: java.io.IOException: Write end dead 
     at java.io.PipedInputStream.read(PipedInputStream.java:310) 
     at java.io.PipedInputStream.read(PipedInputStream.java:377) 
     at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 

Container killed by the ApplicationMaster. 
Container killed on request. Exit code is 143 
Container exited with a non-zero exit code 143 

17/03/15 10:03:35 INFO mapreduce.Job: Task Id : attempt_1489571529338_0001_m_001217_0, Status : FAILED 
Error: java.io.IOException: Pipe closed 
    at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260) 
    at java.io.PipedInputStream.receive(PipedInputStream.java:226) 
    at java.io.PipedOutputStream.write(PipedOutputStream.java:149) 
    at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:458) 
    at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.write(AbstractGoogleAsyncWriteChannel.java:259) 
    at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.write(CacheSupplementedGoogleCloudStorage.java:58) 
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) 
    at java.nio.channels.Channels.writeFully(Channels.java:101) 
    at java.nio.channels.Channels.access$000(Channels.java:61) 
    at java.nio.channels.Channels$1.write(Channels.java:174) 
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) 
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) 
    at java.io.FilterOutputStream.close(FilterOutputStream.java:158) 
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126) 
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) 
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
    at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109) 
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
    at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108) 
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
    Suppressed: java.io.IOException: java.io.IOException: Write end dead 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287) 
     at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68) 
     at java.nio.channels.Channels$1.close(Channels.java:178) 
     at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
     ... 14 more 
    Caused by: java.io.IOException: Write end dead 
     at java.io.PipedInputStream.read(PipedInputStream.java:310) 
     at java.io.PipedInputStream.read(PipedInputStream.java:377) 
     at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409) 
     at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) 
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:358) 
     at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 

Container killed by the ApplicationMaster. 
Container killed on request. Exit code is 143 
Container exited with a non-zero exit code 143 

17/03/15 10:04:51 INFO mapreduce.Job: map 3% reduce 0% 
17/03/15 10:08:34 INFO mapreduce.Job: map 4% reduce 0% 
17/03/15 10:12:12 INFO mapreduce.Job: map 5% reduce 0% 

UPD.

Now it also comes with a Backend Error:

Error: java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone 
{ 
    "code" : 500, 
    "errors" : [ { 
    "domain" : "global", 
    "message" : "Backend Error", 
    "reason" : "backendError" 
    } ], 
    "message" : "Backend Error" 
} 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432) 
     at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287) 
     at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage$WritableByteChannelImpl.close(CacheSupplementedGoogleCloudStorage.java:68) 
     at java.nio.channels.Channels$1.close(Channels.java:178) 
     at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
     at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:126) 
     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) 
     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) 
     at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:109) 
     at java.io.FilterOutputStream.close(FilterOutputStream.java:159) 
     at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108) 
     at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:844) 
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) 
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:422) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone 
{ 
    "code" : 500, 
    "errors" : [ { 
    "domain" : "global", 
    "message" : "Backend Error", 
    "reason" : "backendError" 
    } ], 
    "message" : "Backend Error" 
} 

Answer:

Usually Write end dead means a writer thread failed to close() its output stream before exiting, but if it happens somewhere inside the underlying framework rather than in any kind of manually created write channel, it is likely the result of a transient failure that caused a single task to fail for other reasons, and the Write end dead message is just another symptom of that failure.
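
To illustrate the mechanism, here is a minimal standalone sketch (not taken from the job above; WriteEndDeadDemo and everything in it are made up for illustration). The JDK throws this message from PipedInputStream.read() when the thread that last wrote into the pipe has died without closing its end:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Minimal sketch of the JDK behaviour behind the message: the writer
// thread dies without calling close(), so once the pipe's buffer is
// drained, the reader gets "Write end dead".
public class WriteEndDeadDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);

        Thread writer = new Thread(() -> {
            try {
                out.write("partial data".getBytes());
                // Bug: the thread exits here without out.close()
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();
        writer.join(); // the write-side thread is now dead

        byte[] buf = new byte[64];
        in.read(buf); // drains the buffered bytes successfully
        in.read(buf); // throws java.io.IOException: Write end dead
    }
}

In the GCS connector this pipe sits between the task thread and a background upload thread, so a transient failure on either side can surface as this message even though nothing in your own script is at fault.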

In your case, the 410 Gone error is a known transient failure mode of GCS that is not recoverable within the same stream (recoverable errors are automatically retried silently under the hood). But it is only a single failed task, and Hadoop makes sure failed tasks are retried end-to-end for the job; only if the same task fails too many times will the overall job fail. So in general, this means that as long as your overall job completes successfully, all of your data was processed correctly; single-task failures can simply be treated as warnings.
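
If retries are the concern, the per-task attempt budget is tunable. A hedged sketch using the classic mapred API that the traces above go through (the default shown is from stock Hadoop 2.x; verify it on your Dataproc image); on a streaming job the same knob is just -D mapreduce.map.maxattempts=8:

import org.apache.hadoop.mapred.JobConf;

// Hedged sketch (not from the original post): raising the per-task retry
// budget so transient GCS failures get more chances to succeed on re-attempt.
// mapreduce.map.maxattempts defaults to 4 in stock Hadoop 2.x.
public class RetryBudget {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setMaxMapAttempts(8);            // same key as -D mapreduce.map.maxattempts=8
        conf.setMaxMapTaskFailuresPercent(1); // matches the job script's failures.maxpercent=1
        System.out.println("map attempts allowed: " + conf.getMaxMapAttempts());
    }
}

Raising the budget only helps with transient failures; a deterministic failure will still exhaust any attempt limit.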

Well, from a certain point on my jobs fail to complete successfully because of the '410' error, so I am stuck with unprocessed data. Is 'Write end dead' caused by one of my scripts or my Hadoop config, or is it just a backend error? Is there anything I can do to work around this situation? Anything to avoid the '410' error or the 'Write end dead' warning?

So the jobs fail entirely? I thought you said "it manages to finish its work, but there are exceptions that worry me"; if the job completes, the exceptions just mean that individual tasks had to be retried, but there is no unprocessed data at the end.