NoSuchMethodError when trying to run Gobblin on Dataproc
I'm trying to run Gobblin on Google Dataproc, but I'm getting this NoSuchMethodError and can't figure out how to fix it.
Waiting for job output...
...
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
Caused by: java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder()Lorg/apache/commons/cli/Option$Builder;
at gobblin.runtime.cli.CliOption
...
This same job (contents below) runs fine on my local Hadoop setup (on my laptop), but not on Dataproc. Has anyone tried running Gobblin on Dataproc before?
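Since the same job works locally, one plausible difference is which copy of Commons CLI the launcher JVM loads on Dataproc. `Option.builder()` was added in Commons CLI 1.3, so a `NoSuchMethodError` on it suggests an older copy (such as the commons-cli-1.2 mentioned at the end of this post) is found first. The stack trace also shows the failure in thread "main", i.e. in the launcher rather than in a map task. Below is a minimal diagnostic sketch; the class name `ClasspathProbe` is hypothetical and not part of Gobblin or Hadoop. It prints which jar `org.apache.commons.cli.Option` was loaded from and whether it has a static `builder()` method, using only Java 8 APIs.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

/**
 * Diagnostic sketch (hypothetical helper, not part of Gobblin or Hadoop):
 * reports where a class was loaded from and whether it declares a given
 * public static no-argument method.
 */
final class ClasspathProbe {

    private static Class<?> load(String className) throws ClassNotFoundException {
        // initialize=false: only locate the class, do not run its static initializers
        return Class.forName(className, false, ClasspathProbe.class.getClassLoader());
    }

    /** Jar/directory URL the class came from, "<bootstrap>" if it has no CodeSource (e.g. JDK classes), or "<not found>". */
    static String locate(String className) {
        try {
            CodeSource src = load(className).getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null) ? "<bootstrap>" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not found>";
        }
    }

    /** True if the class is loadable and has a public static method with no parameters. */
    static boolean hasStaticNoArgMethod(String className, String methodName) {
        try {
            Method m = load(className).getMethod(methodName);
            return Modifier.isStatic(m.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String option = "org.apache.commons.cli.Option";
        System.out.println(option + " loaded from: " + locate(option));
        // Option.builder() exists in Commons CLI 1.3 and later, not in 1.2.
        System.out.println("Option.builder() present: " + hasStaticNoArgMethod(option, "builder"));
    }
}
```

One way to use it, assuming you package it into a jar of your own, is to submit it with `gcloud dataproc jobs submit hadoop --class ClasspathProbe` plus the same `--jars` and `--properties` as the Gobblin job, so it should see roughly the same driver classpath as `CliMRJobLauncher`.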
Here is my Gobblin job file:
job.name=kafka2gcs
job.group=gkafka2gcs
job.description=Gobblin job to read messages from Kafka and save as is on GCS
job.lock.enabled=false
kafka.brokers=mykafka:9092
topic.whitelist=mytopic
bootstrap.with.offset=earliest
source.class=gobblin.source.extractor.extract.kafka.KafkaDeserializerSource
kafka.deserializer.type=BYTE_ARRAY
extract.namespace=nskafka2gcs
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.destination.type=HDFS
mr.job.max.mappers=2
writer.output.format=txt
data.publisher.type=gobblin.publisher.BaseDataPublisher
metrics.enabled=false
fs.uri=file:///.
writer.fs.uri=${fs.uri}
mr.job.root.dir=gobblin
writer.output.dir=${mr.job.root.dir}/out
writer.staging.dir=${mr.job.root.dir}/stg
fs.gs.project.id=my-test-project
data.publisher.fs.uri=gs://my-bucket
state.store.fs.uri=${data.publisher.fs.uri}
data.publisher.final.dir=gobblin/pub
state.store.dir=gobblin/state
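To double-check what the `${...}` references in this job file expand to (for example, `writer.output.dir` becomes `gobblin/out` on the filesystem named by `fs.uri=file:///.`, while `data.publisher.fs.uri` and `state.store.fs.uri` point at `gs://my-bucket`), a small sketch like the following can parse the file and expand the references. `JobFileResolver` is a hypothetical helper, and it is an assumption that Gobblin resolves references this simply; its own loader may consult other sources as well.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch (hypothetical helper): parses a .job file as java.util.Properties and
 * expands ${key} references against the same file. Assumes a simple expansion
 * scheme; Gobblin's real loader may differ.
 */
final class JobFileResolver {
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final int MAX_DEPTH = 10; // guards against reference cycles

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return p;
    }

    /** Value of key with ${...} references expanded; unknown references are kept verbatim. */
    static String resolve(Properties p, String key) {
        return expand(p, p.getProperty(key), 0);
    }

    private static String expand(Properties p, String value, int depth) {
        if (value == null || depth > MAX_DEPTH) {
            return value;
        }
        Matcher m = REF.matcher(value);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String ref = p.getProperty(m.group(1));
            String replacement = (ref == null) ? m.group() : expand(p, ref, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Properties p = parse(text);
        String[] keys = {"writer.fs.uri", "writer.output.dir", "writer.staging.dir",
                         "data.publisher.fs.uri", "state.store.fs.uri"};
        for (String key : keys) {
            System.out.println(key + " = " + resolve(p, key));
        }
    }
}
```

Run it against a local copy of the file, e.g. `java JobFileResolver gobblin-kafka-gcs.job`.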
And these are the commands I run on Dataproc:
gcloud dataproc clusters create myspark \
--image-version 1.1 \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 10
gcloud dataproc jobs submit hadoop --cluster=myspark \
--class gobblin.runtime.mapreduce.CliMRJobLauncher \
--jars /opt/gobblin-dist/lib/gobblin-runtime-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-api-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-avro-json-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-codecs-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-core-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-crypto-provider-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-data-management-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metastore-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metrics-base-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-metadata-0.10.0.jar,/opt/gobblin-dist/lib/gobblin-utility-0.10.0.jar,/opt/gobblin-dist/lib/avro-1.8.1.jar,/opt/gobblin-dist/lib/avro-mapred-1.8.1.jar,/opt/gobblin-dist/lib/commons-lang3-3.4.jar,/opt/gobblin-dist/lib/config-1.2.1.jar,/opt/gobblin-dist/lib/data-2.6.0.jar,/opt/gobblin-dist/lib/gson-2.6.2.jar,/opt/gobblin-dist/lib/guava-15.0.jar,/opt/gobblin-dist/lib/guava-retrying-2.0.0.jar,/opt/gobblin-dist/lib/joda-time-2.9.3.jar,/opt/gobblin-dist/lib/javassist-3.18.2-GA.jar,/opt/gobblin-dist/lib/kafka_2.11-0.8.2.2.jar,/opt/gobblin-dist/lib/kafka-clients-0.8.2.2.jar,/opt/gobblin-dist/lib/metrics-core-2.2.0.jar,/opt/gobblin-dist/lib/metrics-core-3.1.0.jar,/opt/gobblin-dist/lib/metrics-graphite-3.1.0.jar,/opt/gobblin-dist/lib/scala-library-2.11.8.jar,/opt/gobblin-dist/lib/influxdb-java-2.1.jar,/opt/gobblin-dist/lib/okhttp-2.4.0.jar,/opt/gobblin-dist/lib/okio-1.4.0.jar,/opt/gobblin-dist/lib/retrofit-1.9.0.jar,/opt/gobblin-dist/lib/reflections-0.9.10.jar \
--properties mapreduce.job.user.classpath.first=true \
-- -jobconfig gs://my-bucket/gobblin-kafka-gcs.job
I've also tried copying all the Gobblin lib jars into /usr/lib/hadoop/lib on every machine in the Dataproc cluster, but that didn't work either.
Any ideas?
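Copying jars into `/usr/lib/hadoop/lib` can leave two versions of the same library side by side, and which one wins then depends on classpath order. A quick check is to list every jar in a directory that contains a given class. The sketch below (hypothetical `DuplicateClassFinder`, standard `java.util.jar` only) does that for `org.apache.commons.cli.Option` by default.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

/**
 * Sketch (hypothetical helper): lists the *.jar files in a directory that
 * contain a given class, to spot duplicate copies of a library.
 */
final class DuplicateClassFinder {

    static List<Path> jarsContaining(Path dir, String className) throws IOException {
        // e.g. org.apache.commons.cli.Option -> org/apache/commons/cli/Option.class
        String entryName = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getJarEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        Collections.sort(hits); // directory iteration order is unspecified
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String className = args.length > 1 ? args[1] : "org.apache.commons.cli.Option";
        for (Path jar : jarsContaining(Paths.get(args[0]), className)) {
            System.out.println(jar);
        }
    }
}
```

For example, running `java DuplicateClassFinder /usr/lib/hadoop/lib` and `java DuplicateClassFinder /opt/gobblin-dist/lib` on a cluster node shows whether more than one jar provides `Option`.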
gobblin 0.10.0
hadoop 2.7.3
dataproc image 1.1
Thanks for the detailed answer. I was able to get past this error by removing commons-cli-1.2 and a few other jars from the classpath and replacing them with the Gobblin-specific ones. But I still can't get it to run successfully on Dataproc :-( I'm trying the [gobblin-users group](https://groups.google.com/d/msg/gobblin-users/YJv49jvJrtI/SoNFarAiBgAJ) now.