Flume: not enough space while data flows from Kafka to HDFS

We are struggling with a data flow from Kafka to HDFS managed by Flume. The data is not fully transported to HDFS because of the exceptions shown below. However, the error looks misleading to us: we have enough space both in the data directory and in HDFS. We think the problem may lie in the channel configuration, but we have a similar configuration for other sources and it works correctly for them. If anyone has dealt with this problem, I would be grateful for advice.
17 Aug 2017 14:15:24,335 ERROR [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log$BackgroundWorker.run:1204) - Error doing checkpoint
java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288000 bytes
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1003)
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:986)
at org.apache.flume.channel.file.Log.access$200(Log.java:75)
at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1201)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17 Aug 2017 14:15:27,552 ERROR [PollableSourceRunner-KafkaSource-kafkaSource] (org.apache.flume.source.kafka.KafkaSource.doProcess:305) - KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:639)
at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
at org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:286)
at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58)
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288026 bytes
at org.apache.flume.channel.file.Log.rollback(Log.java:722)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:637)
... 6 more
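The "required 524288000 bytes" figure in the trace is exactly 500 MiB (500 * 1024 * 1024), and the second figure, 524288026, is 26 bytes larger, presumably the size of the record being written. A rough way to see what the JVM itself reports for a directory is sketched below. It assumes the file channel compares `java.io.File.getUsableSpace()` against that minimum, which is inferred from the message and not confirmed in this post; the class name `UsableSpaceCheck` is hypothetical. Note that `getUsableSpace()` returns 0 when the path does not name a partition, which can differ from what `df` shows.

```java
import java.io.File;

public class UsableSpaceCheck {
    // 500 MiB; equals the "required 524288000 bytes" figure from the log.
    public static final long MINIMUM_REQUIRED_SPACE = 500L * 1024 * 1024;

    /** True when the usable space strictly exceeds the required space. */
    public static boolean hasEnoughSpace(long usableBytes, long requiredBytes) {
        return usableBytes > requiredBytes;
    }

    /** One-line report for a directory, as seen by the user running this JVM. */
    public static String report(File dir) {
        // getUsableSpace() returns 0 if the path does not name a partition.
        long usable = dir.getUsableSpace();
        return String.format("%s exists=%b writable=%b usable=%d enough=%b",
                dir.getPath(), dir.exists(), dir.canWrite(), usable,
                hasEnoughSpace(usable, MINIMUM_REQUIRED_SPACE));
    }

    public static void main(String[] args) {
        // Pass the checkpoint and data directories as arguments.
        for (String path : args) {
            System.out.println(report(new File(path)));
        }
    }
}
```

Running this as the same OS user as the Flume agent, with the channel's checkpoint and data directories as arguments, would show whether the JVM sees 0 usable bytes even though the disk has free space.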
Flume configuration:
agent2.sources = kafkaSource
#sources defined
agent2.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.kafkaSource.kafka.bootstrap.servers = …
agent2.sources.kafkaSource.kafka.topics = pega-campaign-response
agent2.sources.kafkaSource.channels = channel1
# channels defined
agent2.channels = channel1
agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000
#hdfs sinks
agent2.sinks = sink
agent2.sinks.sink.type = hdfs
agent2.sinks.sink.hdfs.fileType = DataStream
agent2.sinks.sink.hdfs.path = hdfs://bigdata-cls:8020/stage/data/pega/campaign-response/%d%m%Y
agent2.sinks.sink.hdfs.batchSize = 1000
agent2.sinks.sink.hdfs.rollCount = 0
agent2.sinks.sink.hdfs.rollSize = 0
agent2.sinks.sink.hdfs.rollInterval = 120
agent2.sinks.sink.hdfs.useLocalTimeStamp = true
agent2.sinks.sink.hdfs.filePrefix = pega-
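Since the channel directories are the part of this configuration that touches the local disk, it may help to list them programmatically and check that each one exists and is writable. The sketch below is only an illustration: the class `ChannelDirs` and its methods are hypothetical helpers, not part of Flume, and it reads just the `checkpointDir` and `dataDirs` keys shown above (`dataDirs` is treated as a comma-separated list).

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ChannelDirs {
    /** Parses Flume-style "key = value" text into Properties. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns checkpointDir followed by every dataDirs entry of the channel. */
    public static List<String> channelDirs(Properties props, String agent, String channel) {
        String prefix = agent + ".channels." + channel + ".";
        List<String> dirs = new ArrayList<>();
        String checkpoint = props.getProperty(prefix + "checkpointDir");
        if (checkpoint != null && !checkpoint.trim().isEmpty()) {
            dirs.add(checkpoint.trim());
        }
        String data = props.getProperty(prefix + "dataDirs");
        if (data != null) {
            for (String entry : data.split(",")) {
                if (!entry.trim().isEmpty()) {
                    dirs.add(entry.trim());
                }
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // The two channel lines from the configuration in the question.
        String conf =
            "agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega\n"
          + "agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega\n";
        for (String dir : channelDirs(parse(conf), "agent2", "channel1")) {
            File f = new File(dir);
            System.out.println(dir + " exists=" + f.exists()
                    + " isDirectory=" + f.isDirectory() + " writable=" + f.canWrite());
        }
    }
}
```

The result is only meaningful when run as the OS user that owns the Flume agent process, because permissions are evaluated for the current user.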
Output of df -h:
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   26G  6.8G   18G  28% /
devtmpfs               126G     0  126G   0% /dev
tmpfs                  126G  6.3M  126G   1% /dev/shm
tmpfs                  126G  2.9G  123G   3% /run
tmpfs                  126G     0  126G   0% /sys/fs/cgroup
/dev/sda1              477M  133M  315M  30% /boot
tmpfs                   26G     0   26G   0% /run/user/0
cm_processes           126G  1.9G  124G   2% /run/cloudera-scm-agent/process
/dev/scinib            2.0T   53G  1.9T   3% /data
tmpfs                   26G   20K   26G   1% /run/user/2000
What about using a Kafka channel or a Memory channel?