2017-06-03

Error with H2OContext running PySparkling with Spark 2.1

I get this error when I try to run a PySparkling script on an AWS EMR cluster. Everything works when I download Sparkling Water 2.1.8 and run it from a pysparkling shell. However, spark-submit does not seem to work.

Error:

NameError: name 'H2OContext' is not defined 

My spark-submit command:

spark-submit --packages ai.h2o:sparkling-water-core_2.11:2.1.7,ai.h2o:sparkling-water-examples_2.11:2.1.7 --conf spark.dynamicAllocation.enabled=false spark.py 

Python file:

from pysparkling import * 

hc = H2OContext.getOrCreate(sc) 

I also tried explicitly creating a Spark context, but that produces the same error; it just takes longer.

Bootstrap file:

#!/usr/bin/env bash 


# install conda (Anaconda3 4.2 defaults to Python 3.5) 
wget --quiet http://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh -O ~/anaconda.sh \ 
    && /bin/bash ~/anaconda.sh -b -p $HOME/conda 

echo -e '\nexport PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc 

# install packages 
conda install -y ipython jupyter 

# needed for PySparkling 
# -y skips the confirmation prompt, which a non-interactive bootstrap cannot answer 
conda install -y requests 
conda install -y six 
conda install -y future 
conda install -y tabulate 

# install pysparkling 
pip install h2o 
# pip install pysparkling 
pip install h2o_pysparkling_2.1 

More detailed output:

[[email protected] test]$ spark-submit --packages ai.h2o:sparkling-water-core_2.11:2.1.7,ai.h2o:sparkling-water-examples_2.11:2.1.7 --conf spark.dynamicAllocation.enabled=false spark.py 
Ivy Default Cache set to: /home/hadoop/.ivy2/cache 
The jars for the packages stored in: /home/hadoop/.ivy2/jars 
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml 
ai.h2o#sparkling-water-core_2.11 added as a dependency 
ai.h2o#sparkling-water-examples_2.11 added as a dependency 
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0 
     confs: [default] 
     found ai.h2o#sparkling-water-core_2.11;2.1.7 in central 
     found ai.h2o#h2o-genmodel;3.10.4.7 in central 
     found net.sf.opencsv#opencsv;2.3 in central 
     found ai.h2o#deepwater-backend-api;1.0.2 in central 
     found com.google.guava#guava;19.0 in central 
     found ai.h2o#h2o-core;3.10.4.7 in central 
     found joda-time#joda-time;2.3 in central 
     found gov.nist.math#jama;1.0.3 in central 
     found org.javassist#javassist;3.18.2-GA in central 
     found org.apache.commons#commons-math3;3.3 in central 
     found commons-io#commons-io;2.4 in central 
     found ai.h2o#google-analytics-java;1.1.2-H2O-CUSTOM in central 
     found org.apache.httpcomponents#httpclient;4.1 in central 
     found org.apache.httpcomponents#httpcore;4.1 in central 
     found commons-logging#commons-logging;1.1.1 in central 
     found commons-codec#commons-codec;1.4 in central 
     found org.eclipse.jetty.aggregate#jetty-servlet;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-plus;8.1.17.v20150415 in central 
     found org.eclipse.jetty.orbit#javax.transaction;1.1.1.v201105210645 in central 
     found org.eclipse.jetty#jetty-webapp;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-xml;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-util;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-servlet;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-security;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-server;8.1.17.v20150415 in central 
     found org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016 in central 
     found org.eclipse.jetty#jetty-continuation;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-http;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-io;8.1.17.v20150415 in central 
     found org.eclipse.jetty#jetty-jndi;8.1.17.v20150415 in central 
     found org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020 in central 
     found org.eclipse.jetty.orbit#javax.activation;1.1.0.v201105071233 in central 
     found com.github.rwl#jtransforms;2.4.0 in central 
     found ai.h2o#h2o-jaas-pam;3.10.4.7 in central 
     found org.kohsuke#libpam4j;1.8 in central 
     found net.java.dev.jna#jna;4.0.0 in central 
     found log4j#log4j;1.2.15 in central 
     found com.google.code.gson#gson;2.3.1 in central 
     found commons-lang#commons-lang;2.6 in central 
     found ai.h2o#reflections;0.9.11-h2o-custom in central 
     found com.google.code.findbugs#jsr305;3.0.0 in central 
     found ai.h2o#h2o-algos;3.10.4.7 in central 
     found ai.h2o#h2o-web;3.10.4.7 in central 
     found ai.h2o#h2o-avro-parser;3.10.4.7 in central 
     found ai.h2o#h2o-parquet-parser;3.10.4.7 in central 
     found ai.h2o#h2o-orc-parser;3.10.4.7 in central 
     found ai.h2o#h2o-scala_2.11;3.10.4.7 in central 
     found ai.h2o#h2o-persist-hdfs;3.10.4.7 in central 
     found net.java.dev.jets3t#jets3t;0.6.1 in central 
     found commons-httpclient#commons-httpclient;3.1 in central 
     found ai.h2o#h2o-persist-s3;3.10.4.7 in central 
     found com.amazonaws#aws-java-sdk-s3;1.10.47 in central 
     found com.amazonaws#aws-java-sdk-kms;1.10.47 in central 
     found com.amazonaws#aws-java-sdk-core;1.10.47 in central 
     found commons-logging#commons-logging;1.1.3 in central 
     found org.apache.httpcomponents#httpclient;4.3.6 in central 
     found org.apache.httpcomponents#httpcore;4.3.3 in central 
     found commons-codec#commons-codec;1.6 in central 
     found joda-time#joda-time;2.8.1 in central 
     found ai.h2o#sparkling-water-repl_2.11;2.1.7 in central 
     found org.joda#joda-convert;1.7 in central 
     found ai.h2o#sparkling-water-examples_2.11;2.1.7 in central 
     found ai.h2o#sparkling-water-ml_2.11;2.1.7 in central 
downloading https://repo1.maven.org/maven2/ai/h2o/sparkling-water-core_2.11/2.1.7/sparkling-water-core_2.11-2.1.7.jar ... 
     [SUCCESSFUL ] ai.h2o#sparkling-water-core_2.11;2.1.7!sparkling-water-core_2.11.jar (56ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/sparkling-water-examples_2.11/2.1.7/sparkling-water-examples_2.11-2.1.7.jar ... 
     [SUCCESSFUL ] ai.h2o#sparkling-water-examples_2.11;2.1.7!sparkling-water-examples_2.11.jar (15ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-genmodel/3.10.4.7/h2o-genmodel-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-genmodel;3.10.4.7!h2o-genmodel.jar (7ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-core/3.10.4.7/h2o-core-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-core;3.10.4.7!h2o-core.jar (129ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-algos/3.10.4.7/h2o-algos-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-algos;3.10.4.7!h2o-algos.jar (35ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-web/3.10.4.7/h2o-web-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-web;3.10.4.7!h2o-web.jar (512ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-scala_2.11/3.10.4.7/h2o-scala_2.11-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-scala_2.11;3.10.4.7!h2o-scala_2.11.jar (4ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-persist-hdfs/3.10.4.7/h2o-persist-hdfs-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-persist-hdfs;3.10.4.7!h2o-persist-hdfs.jar (2ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-persist-s3/3.10.4.7/h2o-persist-s3-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-persist-s3;3.10.4.7!h2o-persist-s3.jar (2ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/sparkling-water-repl_2.11/2.1.7/sparkling-water-repl_2.11-2.1.7.jar ... 
     [SUCCESSFUL ] ai.h2o#sparkling-water-repl_2.11;2.1.7!sparkling-water-repl_2.11.jar (4ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/h2o-jaas-pam/3.10.4.7/h2o-jaas-pam-3.10.4.7.jar ... 
     [SUCCESSFUL ] ai.h2o#h2o-jaas-pam;3.10.4.7!h2o-jaas-pam.jar (2ms) 
downloading https://repo1.maven.org/maven2/ai/h2o/sparkling-water-ml_2.11/2.1.7/sparkling-water-ml_2.11-2.1.7.jar ... 
     [SUCCESSFUL ] ai.h2o#sparkling-water-ml_2.11;2.1.7!sparkling-water-ml_2.11.jar (10ms) 
:: resolution report :: resolve 6024ms :: artifacts dl 802ms 
     :: modules in use: 
     ai.h2o#deepwater-backend-api;1.0.2 from central in [default] 
     ai.h2o#google-analytics-java;1.1.2-H2O-CUSTOM from central in [default] 
     ai.h2o#h2o-algos;3.10.4.7 from central in [default] 
     ai.h2o#h2o-avro-parser;3.10.4.7 from central in [default] 
     ai.h2o#h2o-core;3.10.4.7 from central in [default] 
     ai.h2o#h2o-genmodel;3.10.4.7 from central in [default] 
     ai.h2o#h2o-jaas-pam;3.10.4.7 from central in [default] 
     ai.h2o#h2o-orc-parser;3.10.4.7 from central in [default] 
     ai.h2o#h2o-parquet-parser;3.10.4.7 from central in [default] 
     ai.h2o#h2o-persist-hdfs;3.10.4.7 from central in [default] 
     ai.h2o#h2o-persist-s3;3.10.4.7 from central in [default] 
     ai.h2o#h2o-scala_2.11;3.10.4.7 from central in [default] 
     ai.h2o#h2o-web;3.10.4.7 from central in [default] 
     ai.h2o#reflections;0.9.11-h2o-custom from central in [default] 
     ai.h2o#sparkling-water-core_2.11;2.1.7 from central in [default] 
     ai.h2o#sparkling-water-examples_2.11;2.1.7 from central in [default] 
     ai.h2o#sparkling-water-ml_2.11;2.1.7 from central in [default] 
     ai.h2o#sparkling-water-repl_2.11;2.1.7 from central in [default] 
     com.amazonaws#aws-java-sdk-core;1.10.47 from central in [default] 
     com.amazonaws#aws-java-sdk-kms;1.10.47 from central in [default] 
     com.amazonaws#aws-java-sdk-s3;1.10.47 from central in [default] 
     com.github.rwl#jtransforms;2.4.0 from central in [default] 
     com.google.code.findbugs#jsr305;3.0.0 from central in [default] 
     com.google.code.gson#gson;2.3.1 from central in [default] 
     com.google.guava#guava;19.0 from central in [default] 
     commons-codec#commons-codec;1.6 from central in [default] 
     commons-httpclient#commons-httpclient;3.1 from central in [default] 
     commons-io#commons-io;2.4 from central in [default] 
     commons-lang#commons-lang;2.6 from central in [default] 
     commons-logging#commons-logging;1.1.3 from central in [default] 
     gov.nist.math#jama;1.0.3 from central in [default] 
     joda-time#joda-time;2.8.1 from central in [default] 
     log4j#log4j;1.2.15 from central in [default] 
     net.java.dev.jets3t#jets3t;0.6.1 from central in [default] 
     net.java.dev.jna#jna;4.0.0 from central in [default] 
     net.sf.opencsv#opencsv;2.3 from central in [default] 
     org.apache.commons#commons-math3;3.3 from central in [default] 
     org.apache.httpcomponents#httpclient;4.3.6 from central in [default] 
     org.apache.httpcomponents#httpcore;4.3.3 from central in [default] 
     org.eclipse.jetty#jetty-continuation;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-http;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-io;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-jndi;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-plus;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-security;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-server;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-servlet;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-util;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-webapp;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty#jetty-xml;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty.aggregate#jetty-servlet;8.1.17.v20150415 from central in [default] 
     org.eclipse.jetty.orbit#javax.activation;1.1.0.v201105071233 from central in [default] 
     org.eclipse.jetty.orbit#javax.mail.glassfish;1.4.1.v201005082020 from central in [default] 
     org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016 from central in [default] 
     org.eclipse.jetty.orbit#javax.transaction;1.1.1.v201105210645 from central in [default] 
     org.javassist#javassist;3.18.2-GA from central in [default] 
     org.joda#joda-convert;1.7 from central in [default] 
     org.kohsuke#libpam4j;1.8 from central in [default] 
     :: evicted modules: 
     joda-time#joda-time;2.3 by [joda-time#joda-time;2.8.1] in [default] 
     org.apache.httpcomponents#httpclient;4.1 by [org.apache.httpcomponents#httpclient;4.3.6] in [default] 
     org.apache.httpcomponents#httpcore;4.1 by [org.apache.httpcomponents#httpcore;4.3.3] in [default] 
     commons-logging#commons-logging;1.1.1 by [commons-logging#commons-logging;1.1.3] in [default] 
     commons-codec#commons-codec;1.4 by [commons-codec#commons-codec;1.6] in [default] 
     com.google.guava#guava;16.0.1 by [com.google.guava#guava;19.0] in [default] 
     com.google.guava#guava;18.0 by [com.google.guava#guava;19.0] in [default] 
     commons-codec#commons-codec;1.3 by [commons-codec#commons-codec;1.4] in [default] 
     commons-logging#commons-logging;1.0.4 by [commons-logging#commons-logging;1.1.1] in [default] 
     commons-codec#commons-codec;1.2 by [commons-codec#commons-codec;1.4] in [default] 
     --------------------------------------------------------------------- 
     |     |   modules   || artifacts | 
     |  conf  | number| search|dwnlded|evicted|| number|dwnlded| 
     --------------------------------------------------------------------- 
     |  default  | 68 | 15 | 15 | 10 || 55 | 12 | 
     --------------------------------------------------------------------- 
:: retrieving :: org.apache.spark#spark-submit-parent 
     confs: [default] 
     12 artifacts copied, 43 already retrieved (23416kB/63ms) 
Traceback (most recent call last): 
    File "/home/hadoop/scripts/test/spark.py", line 3, in <module> 
    hc = H2OContext.getOrCreate(sc) 
NameError: name 'H2OContext' is not defined 

The command shows 2.1.7 because that was the last version I tried, but the result is the same with 2.1.8. – Keston


There is another package called pysparkling that is not associated with H2O. Having it installed was the problem. After terminating that cluster and running a corrected bootstrap that installs only h2o_pysparkling_2.1 and not pysparkling, it worked, and I can now run the script on a cluster without problems. Having both installed causes problems. – Keston
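The conflict described in this comment can be checked for before a job is submitted. The following is only a sketch: the helper names are made up for illustration, and it assumes the two distributions are registered under the names used in this thread (`pysparkling` and `h2o_pysparkling_2.1`), compared after the usual pip-style name normalization.

```python
import re
from importlib import metadata


def normalize(name):
    """Lower-case a distribution name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_pysparkling_conflict(installed_names):
    """Return ("pysparkling", [h2o builds]) if both kinds are installed, else None."""
    names = {normalize(n) for n in installed_names}
    # H2O's builds are published per Spark version, e.g. h2o_pysparkling_2.1.
    h2o_builds = sorted(n for n in names if n.startswith("h2o-pysparkling"))
    if "pysparkling" in names and h2o_builds:
        return ("pysparkling", h2o_builds)
    return None


def installed_distribution_names():
    """Names of every distribution visible to the current interpreter."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            found.append(name)
    return found


if __name__ == "__main__":
    conflict = find_pysparkling_conflict(installed_distribution_names())
    if conflict:
        print("Conflicting installs:", conflict)
    else:
        print("No pysparkling conflict detected")
```

Run with the same interpreter that spark-submit uses (here, the conda Python from the bootstrap), a reported conflict would suggest uninstalling the unrelated `pysparkling` package.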

Answer

1

You do not need to specify the Sparkling Water packages (via the --packages option), but you do need to provide the pysparkling Python package (it bundles all the necessary binary dependencies internally).

The best way is to download the Sparkling Water binary distribution from http://h2o.ai/download and use the bin/pysparkling script, or use Spark directly:

$SPARK_HOME/bin/pyspark --py-files h2o_pysparkling_2.1-2.1.8.zip 

Thanks! I tested that and it works. I was looking for an .egg file in the directories, which I guess no longer exists in newer versions. The --packages option works too; my problem was that I had installed pysparkling on that cluster, which is an unrelated package not associated with H2O. A bootstrap that runs pip install h2o_pysparkling_2.1 also works with the --packages argument. – Keston
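Because `from pysparkling import *` succeeds even when the wrong package is picked up, the failure only surfaces later as a NameError. A small guard in the job script can make the cause visible at import time. This is a sketch under the assumption, taken from the thread, that H2O's package exposes `H2OContext` at the top level of the `pysparkling` module; the helper name `load_h2o_context_class` is hypothetical.

```python
import importlib


def load_h2o_context_class():
    """Import pysparkling and return its H2OContext class, or fail with a readable error."""
    module = importlib.import_module("pysparkling")
    try:
        return module.H2OContext
    except AttributeError:
        # The module imported fine but is not H2O's: report where it was loaded from.
        raise ImportError(
            "pysparkling was loaded from %r but has no H2OContext; an unrelated "
            "'pysparkling' package may be shadowing h2o_pysparkling"
            % getattr(module, "__file__", None)
        )
```

In the script from the question, `H2OContext = load_h2o_context_class()` would replace the star import, and `H2OContext.getOrCreate(sc)` would then be called as before.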