2017-07-06

Calculating a confidence interval in Spark Scala

I have the following DataFrame:

+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+------+--------------------+-------+-------+--------------------+ 
| time_stamp_0|sender_ip_1|receiver_ip_2|count|rank| xi|     pi|     r|attack|    myvalue|max_int|min_int|     int| 
+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+------+--------------------+-------+-------+--------------------+ 
|12:18:52.702936| 10.0.0.1|  10.0.0.4|11139| 1| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:53.702976| 10.0.0.1|  10.0.0.4|11139| 2| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.702873| 10.0.0.1|  10.0.0.4|11139| 3| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:55.702825| 10.0.0.1|  10.0.0.4|11139| 4| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:56.703021| 10.0.0.1|  10.0.0.4|11139| 5| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:57.703786| 10.0.0.1|  10.0.0.4|11139| 6| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:58.706354| 10.0.0.1|  10.0.0.4|11139| 7| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:59.705885| 10.0.0.1|  10.0.0.4|11139| 8| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:14.703371| 10.0.0.1|  10.0.0.4|11139| 9| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:15.702891| 10.0.0.1|  10.0.0.4|11139| 10| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:16.703450| 10.0.0.1|  10.0.0.4|11139| 11| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:17.703087| 10.0.0.1|  10.0.0.4|11139| 12| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:18.704467| 10.0.0.1|  10.0.0.4|11139| 13| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:19.703472| 10.0.0.1|  10.0.0.4|11139| 14| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:20:20.703268| 10.0.0.1|  10.0.0.4|11139| 15| 15| 0.00134661998384056|0.49609480204686235|  0|0.008901370242045487|11139.0|11139.0|[11139.000, 11139...| 
|12:18:52.995718| 10.0.0.5|  10.0.0.1|11139| 1| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:53.995478| 10.0.0.5|  10.0.0.1|11139| 2| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.995653| 10.0.0.5|  10.0.0.1|11139| 3| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:55.995978| 10.0.0.5|  10.0.0.1|11139| 4| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:56.994984| 10.0.0.5|  10.0.0.1|11139| 5| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:57.995190| 10.0.0.5|  10.0.0.1|11139| 6| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:58.994970| 10.0.0.5|  10.0.0.1|11139| 7| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:14.995142| 10.0.0.5|  10.0.0.1|11139| 8| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:15.995244| 10.0.0.5|  10.0.0.1|11139| 9| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:16.995481| 10.0.0.5|  10.0.0.1|11139| 10| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:17.995213| 10.0.0.5|  10.0.0.1|11139| 11| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:18.994985| 10.0.0.5|  10.0.0.1|11139| 12| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:19.994872| 10.0.0.5|  10.0.0.1|11139| 13| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:20.994932| 10.0.0.5|  10.0.0.1|11139| 14| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:52.995744| 10.0.0.1|  10.0.0.5|11139| 1| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:53.995496| 10.0.0.1|  10.0.0.5|11139| 2| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.995665| 10.0.0.1|  10.0.0.5|11139| 3| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:55.995986| 10.0.0.1|  10.0.0.5|11139| 4| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:56.994999| 10.0.0.1|  10.0.0.5|11139| 5| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:57.995204| 10.0.0.1|  10.0.0.5|11139| 6| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:58.995057| 10.0.0.1|  10.0.0.5|11139| 7| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:14.995169| 10.0.0.1|  10.0.0.5|11139| 8| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:15.995261| 10.0.0.1|  10.0.0.5|11139| 9| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:16.995499| 10.0.0.1|  10.0.0.5|11139| 10| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:17.995220| 10.0.0.1|  10.0.0.5|11139| 11| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:18.994997| 10.0.0.1|  10.0.0.5|11139| 12| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:19.994891| 10.0.0.1|  10.0.0.5|11139| 13| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:20:20.994951| 10.0.0.1|  10.0.0.5|11139| 14| 14|0.001256845318251...|0.49609480204686235|  0|0.008394658926763537|11139.0|11139.0|[11139.000, 11139...| 
|12:18:52.811535| 10.0.0.1|  10.0.0.2|11139| 1|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:53.812029| 10.0.0.1|  10.0.0.2|11139| 2|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.480070| 10.0.0.1|  10.0.0.2|11139| 3|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.481196| 10.0.0.1|  10.0.0.2|11139| 4|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.483532| 10.0.0.1|  10.0.0.2|11139| 5|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.485713| 10.0.0.1|  10.0.0.2|11139| 6|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.487091| 10.0.0.1|  10.0.0.2|11139| 7|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 
|12:18:54.488272| 10.0.0.1|  10.0.0.2|11139| 8|5526| 0.49609480204686235|0.49609480204686235|  0| 0.347756620851195|11139.0|11139.0|[11139.000, 11139...| 

I need to compute the confidence interval, the lower bound of the confidence interval and the upper bound of the confidence interval (about computing a confidence interval: http://www.statisticshowto.com/how-to-find-a-confidence-interval/) for the column "myvalue". I used the following code:

import org.apache.spark.sql.functions.lit

// countApprox returns a PartialResult[BoundedDouble]: an estimate of the number of rows
val cntInterval = final_add_count_rank_xi_pi_r_attack_antropy.select("myvalue").rdd.countApprox(timeout = 1000L, confidence = 0.95)
val (lowCnt, highCnt) = (cntInterval.getFinalValue().low, cntInterval.getFinalValue().high)

// Add the confidence interval to the DataFrame
val final_integration_df = final_add_count_rank_xi_pi_r_attack_antropy
  .withColumn("max_int", lit(highCnt))
  .withColumn("min_int", lit(lowCnt))
  .withColumn("int", lit(cntInterval.getFinalValue().toString()))

// Show the result
final_integration_df.show(100)

However, my problem is that all three values (the confidence interval, its lower bound and its upper bound) are 11139.0 in my DataFrame, which is equal to the number of connections between "10.0.0.1" and "10.0.0.2" (the count column in the output above)! Can you help me solve this problem? Thanks.
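For reference, the interval the question asks about is usually the sample mean plus or minus a critical value times the standard error, s / sqrt(n). The plain-Scala sketch below illustrates that formula under stated assumptions: it uses the normal approximation with z = 1.96 (roughly 95% confidence), the usual choice for large samples; for small samples a Student t critical value would replace z. The object name `MeanConfidenceInterval` is hypothetical and not part of Spark.

```scala
// Hypothetical helper (not a Spark API): large-sample confidence interval for a mean.
object MeanConfidenceInterval {
  /** Returns (low, mean, high) where low/high = mean -/+ z * s / sqrt(n).
    * Assumption: z = 1.96 gives roughly 95% confidence under the normal approximation.
    * s is the sample standard deviation (divides by n - 1).
    */
  def apply(xs: Seq[Double], z: Double = 1.96): (Double, Double, Double) = {
    require(xs.size >= 2, "at least two observations are needed for a sample standard deviation")
    val n = xs.size
    val mean = xs.sum / n
    val s = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / (n - 1))
    val margin = z * s / math.sqrt(n.toDouble)
    (mean - margin, mean, mean + margin)
  }
}
```

For example, `MeanConfidenceInterval(Seq(1.0, 3.0))` gives approximately (0.04, 2.0, 3.96). Note that a sample whose values are all identical yields an interval of zero width.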


I don't understand how you intend to compute confidence intervals with `countApprox`, which only gives you the approximate number of rows.
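The comment's point can be checked directly. `RDD.countApprox` returns a `PartialResult[BoundedDouble]`; `initialValue` holds the estimate available when the timeout expires, while `getFinalValue()` blocks until every partition has been counted, at which point low, mean and high all equal the exact count. That is consistent with all three columns in the question showing the same value, 11139.0: it is presumably a row count, not a statistic of `myvalue`. A minimal sketch, assuming a local `SparkSession`; the object `CountApproxDemo` is hypothetical.

```scala
import org.apache.spark.partial.{BoundedDouble, PartialResult}
import org.apache.spark.sql.SparkSession

// Hypothetical demo object: shows what countApprox actually reports.
object CountApproxDemo {
  def run(spark: SparkSession, n: Int): (BoundedDouble, BoundedDouble) = {
    val rdd = spark.sparkContext.parallelize(1 to n, numSlices = 4)
    val result: PartialResult[BoundedDouble] =
      rdd.countApprox(timeout = 1000L, confidence = 0.95)
    // initialValue: the estimate available when the timeout expired
    // (already exact if the job finished within the timeout)
    val initial = result.initialValue
    // getFinalValue() waits for all partitions, so low == mean == high == n
    val complete = result.getFinalValue()
    (initial, complete)
  }
}
```

In other words, `countApprox` bounds the number of rows; it says nothing about the distribution of the values in `myvalue`.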


As far as I understand, you want to compute the confidence for each row of your DataFrame. To do that, use a UDF instead of `lit`: the `lit` function inserts the same value into every row.

Here is an example of a UDF:

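import org.apache.spark.sql.functions.udf
import spark.implicits._ // provides toDF on RDDs (assumes a SparkSession named spark)

val df = spark.sparkContext.parallelize(Seq(1, 2, 3, 4)).toDF("first")
// A UDF is applied once per row, to that row's input value
val func = udf((i1: Int) => i1 + 3)
df.withColumn("sum", func(df("first"))).show()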
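A row-wise UDF still sees only one row at a time, so it cannot compute a mean or standard deviation on its own. A different technique from the UDF above is to compute the statistics with `groupBy`/`agg` and join them back onto the rows. The sketch below assumes the goal is a confidence interval for the mean of `myvalue` per connection, using the normal approximation (z = 1.96) and Spark's sample standard deviation `stddev_samp`; the object `ConfidenceIntervalSketch` and its method `withMeanCI` are hypothetical. If a single interval for the whole column is wanted instead, the same `agg` without `groupBy` returns one row, and attaching that row's values with `lit` is then correct.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, count, lit, sqrt, stddev_samp}

// Hypothetical helper, not part of Spark: attaches a per-group confidence
// interval for the mean of `valueCol` to every row of `df`.
object ConfidenceIntervalSketch {
  def withMeanCI(df: DataFrame, valueCol: String, groupCols: Seq[String],
                 z: Double = 1.96): DataFrame = {
    // One row of statistics per group: mean, sample standard deviation and size,
    // then margin = z * s / sqrt(n) and the bounds mean -/+ margin.
    val stats = df.groupBy(groupCols.map(col): _*)
      .agg(
        avg(col(valueCol)).as("ci_mean"),
        stddev_samp(col(valueCol)).as("ci_sd"),
        count(col(valueCol)).as("ci_n"))
      .withColumn("ci_margin", lit(z) * col("ci_sd") / sqrt(col("ci_n")))
      .withColumn("ci_low", col("ci_mean") - col("ci_margin"))
      .withColumn("ci_high", col("ci_mean") + col("ci_margin"))
      .drop("ci_sd", "ci_n", "ci_margin")
    // Equi-join on the grouping columns puts each group's interval on all of its rows
    df.join(stats, groupCols)
  }
}
```

Applied to the question's data this might look like `ConfidenceIntervalSketch.withMeanCI(final_add_count_rank_xi_pi_r_attack_antropy, "myvalue", Seq("sender_ip_1", "receiver_ip_2")).show(100)`. In the rows shown, `myvalue` is constant within each connection, so each interval would have zero width; a group with a single row gets a null (or NaN, depending on the Spark version) standard deviation and therefore no interval.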

This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/low-quality-posts/16629527)


The question was "Can you help me solve this problem?". So I pointed out what should be fixed, how, and why. – Bartek


Thanks for your answer. So how should I do that? – Queen