
Transferring data from Oracle to Hive using Spark

How do I import data from an Oracle database into a Spark DataFrame or RDD, and then write that data into a Hive table?

I have the following code:

import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public static void main(String[] args) {

    SparkConf conf = new SparkConf().setAppName("Data transfer test (Oracle -> Hive)").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // JDBC connection options for the Oracle source table
    HashMap<String, String> options = new HashMap<>();
    options.put("url", "jdbc:oracle:thin:@<ip>:<port>:orcl");
    options.put("dbtable", "ACCOUNTS");
    options.put("user", "username");
    options.put("password", "12345");
    options.put("driver", "oracle.jdbc.OracleDriver");
    options.put("numPartitions", "4");

    // Read the Oracle table into a DataFrame over JDBC
    DataFrame oracleDataFrame = sqlContext.read()
       .format("jdbc")
       .options(options)
       .load();

}

But if I create an instance of HiveContext in order to use Hive:

HiveContext hiveContext = new HiveContext(sc); 

I get the following error:

ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser oracle.xml.jaxp.JXDocumentBuilderFactory@...:java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory
     at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:614) 
     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2534) 
     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2503) 
     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2409) 
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1144) 
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1116) 
     at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:525) 
     at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:543) 
     at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:437) 
     at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:2750) 
     at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:2713) 
     at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:185) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249) 
     at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:329) 
     at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239) 
     at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:443) 
     at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) 
     at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) 
     at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 
     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) 
     at scala.collection.AbstractIterable.foreach(Iterable.scala:54) 
     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:103) 
     at replicator.ImportFromOracleToHive.init(ImportFromOracleToHive.java:52) 
     at replicator.ImportFromOracleToHive.main(ImportFromOracleToHive.java:76) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

Answer


The issue would appear to be an outdated Xerces dependency, as detailed in this question. My guess is that you pulled it in transitively, but it's impossible to say without seeing your pom.xml. Notice from the stack trace you posted that the error originates in the Hadoop Common Configuration object, not in Spark itself. The fix is to make sure you are using a recent version:

<dependency> 
    <groupId>xerces</groupId> 
    <artifactId>xercesImpl</artifactId> 
    <version>2.11.0</version> 
</dependency>
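You can run mvn dependency:tree to see which artifact dragged in the conflicting XML parser. Once the parser conflict is resolved, the rest of the transfer is straightforward: read over JDBC and write through HiveContext, which extends SQLContext. Below is a minimal sketch assuming Spark 1.x (matching the DataFrame/HiveContext API in your code) and a cluster where Hive is configured; the target table name "accounts" is a placeholder.

import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

public class OracleToHive {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Oracle -> Hive");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HiveContext extends SQLContext, so the same context can both
        // read over JDBC and write to the Hive metastore
        HiveContext hiveContext = new HiveContext(sc);

        // Same JDBC options as in the question
        HashMap<String, String> options = new HashMap<>();
        options.put("url", "jdbc:oracle:thin:@<ip>:<port>:orcl");
        options.put("dbtable", "ACCOUNTS");
        options.put("user", "username");
        options.put("password", "12345");
        options.put("driver", "oracle.jdbc.OracleDriver");
        options.put("numPartitions", "4");

        DataFrame oracleDataFrame = hiveContext.read()
            .format("jdbc")
            .options(options)
            .load();

        // Persist the DataFrame as a Hive table ("accounts" is a hypothetical name)
        oracleDataFrame.write()
            .mode(SaveMode.Overwrite)
            .saveAsTable("accounts");
    }
}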