Configure logging on AWS EMR for s3distcp

I would like to change s3distcp and other Hadoop commands to log only WARN messages and above, whereas currently they log INFO and above. How can I configure this on the master node of an AWS EMR cluster? Here is an example of the output I am trying to suppress:
$ hadoop jar ~hadoop/lib/emr-s3distcp-1.0.jar --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:03 INFO s3distcp.S3DistCp: Running with args: --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:03 INFO s3distcp.S3DistCp: S3DistCp args: --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Using output path 'hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/output'
16/06/01 17:18:06 INFO s3distcp.S3DistCp: GET http://x.x.x.x/latest/meta-data/placement/availability-zone result: us-east-1b
16/06/01 17:18:06 INFO s3distcp.FileInfoListing: Opening new file: hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/files/1
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Created 1 files to copy 88 files
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Reducer number: 15
16/06/01 17:18:06 INFO client.RMProxy: Connecting to ResourceManager at /x.x.x.x:9022
16/06/01 17:18:07 INFO input.FileInputFormat: Total input paths to process : 1
16/06/01 17:18:07 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 17:18:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464201102672_0019
16/06/01 17:18:07 INFO impl.YarnClientImpl: Submitted application application_1464201102672_0019
16/06/01 17:18:07 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:9046/proxy/application_1464201102672_0019/
16/06/01 17:18:07 INFO mapreduce.Job: Running job: job_1464201102672_0019
16/06/01 17:18:13 INFO mapreduce.Job: Job job_1464201102672_0019 running in uber mode : false
16/06/01 17:18:13 INFO mapreduce.Job: map 0% reduce 0%
16/06/01 17:18:19 INFO mapreduce.Job: map 100% reduce 0%
16/06/01 17:18:30 INFO mapreduce.Job: map 100% reduce 5%
16/06/01 17:18:31 INFO mapreduce.Job: map 100% reduce 10%
16/06/01 17:18:32 INFO mapreduce.Job: map 100% reduce 22%
16/06/01 17:18:33 INFO mapreduce.Job: map 100% reduce 23%
16/06/01 17:18:34 INFO mapreduce.Job: map 100% reduce 33%
16/06/01 17:18:35 INFO mapreduce.Job: map 100% reduce 40%
16/06/01 17:18:36 INFO mapreduce.Job: map 100% reduce 50%
16/06/01 17:18:37 INFO mapreduce.Job: map 100% reduce 57%
16/06/01 17:18:38 INFO mapreduce.Job: map 100% reduce 77%
16/06/01 17:18:39 INFO mapreduce.Job: map 100% reduce 85%
16/06/01 17:18:40 INFO mapreduce.Job: map 100% reduce 90%
16/06/01 17:18:41 INFO mapreduce.Job: map 100% reduce 95%
16/06/01 17:18:42 INFO mapreduce.Job: map 100% reduce 98%
16/06/01 17:18:43 INFO mapreduce.Job: map 100% reduce 100%
16/06/01 17:18:43 INFO mapreduce.Job: Job job_1464201102672_0019 completed successfully
16/06/01 17:18:43 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=5447
FILE: Number of bytes written=1640535
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=113570708
HDFS: Number of bytes written=56776676
HDFS: Number of read operations=401
HDFS: Number of large read operations=0
HDFS: Number of write operations=206
S3: Number of bytes read=0
S3: Number of bytes written=0
S3: Number of read operations=0
S3: Number of large read operations=0
S3: Number of write operations=0
Job Counters
Launched map tasks=1
Launched reduce tasks=15
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=166005
Total time spent by all reduces in occupied slots (ms)=18351000
Total time spent by all map tasks (ms)=3689
Total time spent by all reduce tasks (ms)=203900
Total vcore-seconds taken by all map tasks=3689
Total vcore-seconds taken by all reduce tasks=203900
Total megabyte-seconds taken by all map tasks=5312160
Total megabyte-seconds taken by all reduce tasks=587232000
Map-Reduce Framework
Map input records=88
Map output records=88
Map output bytes=20500
Map output materialized bytes=5387
Input split bytes=138
Combine input records=0
Combine output records=0
Reduce input groups=88
Reduce shuffle bytes=5387
Reduce input records=88
Reduce output records=0
Spilled Records=176
Shuffled Maps =15
Failed Shuffles=0
Merged Map outputs=15
GC time elapsed (ms)=2658
CPU time spent (ms)=98620
Physical memory (bytes) snapshot=5777489920
Virtual memory (bytes) snapshot=50741022720
Total committed heap usage (bytes)=9051308032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=17218
File Output Format Counters
Bytes Written=0
16/06/01 17:18:43 INFO s3distcp.S3DistCp: Try to recursively delete hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/tempspace
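For reference, one common approach is to raise the Hadoop client log threshold via log4j, either per invocation or cluster-wide. This is a sketch, not a verified EMR recipe: the config path below is the usual location on emr-4.x and later releases, but it may differ on other EMR versions.

```shell
# Option 1: override the root logger for a single command using Hadoop's
# HADOOP_ROOT_LOGGER environment variable (no files edited).
HADOOP_ROOT_LOGGER=WARN,console hadoop jar ~hadoop/lib/emr-s3distcp-1.0.jar \
    --src /user/myusername/test --dest s3://some-bucket/myusername/data/test

# Option 2: change the default on the master node by editing the Hadoop
# client log4j configuration. The path is an assumption for recent EMR
# releases; check $HADOOP_CONF_DIR on your cluster.
sudo sed -i 's/^log4j.rootLogger=INFO/log4j.rootLogger=WARN/' \
    /etc/hadoop/conf/log4j.properties
```

Note that option 2 only affects processes launched on the master node after the edit; running tasks and the YARN daemons keep their own logging configuration.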