2017-09-07

Answer

You can use the size function from org.apache.spark.sql.functions:

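import org.apache.spark.sql.functions.size  // already in scope in spark-shell
import spark.implicits._                    // enables toDF and $"col"; also already in scope in spark-shell

val df = Seq((Array("a","b","c"), 2), (Array("a"), 4)).toDF("friends", "id")
// df: org.apache.spark.sql.DataFrame = [friends: array<string>, id: int]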

df.select(size($"friends").as("no_of_friends")).show 
+-------------+
|no_of_friends|
+-------------+
|            3|
|            1|
+-------------+
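One case the example above does not cover is a null array (as opposed to an empty one). With the default settings of Spark 2.x and 3.x, size returns -1 for a null input; it returns null instead when spark.sql.legacy.sizeOfNull is false or ANSI mode is enabled. The following is a minimal sketch, assuming a local SparkSession and a hypothetical helper named sizeOrZero (not part of Spark), that maps null arrays to 0 explicitly so the result does not depend on those settings.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, size, when}

object SizeOrZeroSketch {
  // Hypothetical helper: element count of an array column, with null arrays mapped to 0.
  // The explicit isNull check makes the result independent of
  // spark.sql.legacy.sizeOfNull and ANSI mode, which only affect size(null).
  def sizeOrZero(c: Column): Column =
    when(c.isNull, 0).otherwise(size(c))

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session; in spark-shell, reuse the existing `spark` instead
    val spark = SparkSession.builder().master("local[1]").appName("size-or-zero").getOrCreate()
    import spark.implicits._

    // The second row has no friends array at all (null), not an empty one
    val df = Seq[(Option[Seq[String]], Int)]((Some(Seq("a", "b", "c")), 2), (None, 5))
      .toDF("friends", "id")

    df.select(col("id"), sizeOrZero(col("friends")).as("no_of_friends")).show()
    spark.stop()
  }
}
```

If you would rather keep null as null, drop the helper and compare against size(col("friends")) under your cluster's settings.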

To add the count as a new column:

df.withColumn("no_of_friends", size($"friends")).show 
+---------+---+-------------+
|  friends| id|no_of_friends|
+---------+---+-------------+
|[a, b, c]|  2|            3|
|      [a]|  4|            1|
+---------+---+-------------+
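The same size column can also drive a filter, or be written as a SQL expression string through expr. The following sketch assumes a local SparkSession and uses a hypothetical helper, withMoreThan, that is not part of Spark; size, col, expr and DataFrame.filter are the standard Spark SQL API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, size}

object SizeFilterSketch {
  // Hypothetical helper: keep only rows whose array column has more than `min` elements
  def withMoreThan(df: DataFrame, arrayCol: String, min: Int): DataFrame =
    df.filter(size(col(arrayCol)) > min)

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session; in spark-shell, reuse the existing `spark` instead
    val spark = SparkSession.builder().master("local[1]").appName("size-filter").getOrCreate()
    import spark.implicits._

    val df = Seq((Array("a", "b", "c"), 2), (Array("a"), 4)).toDF("friends", "id")

    // Same result as withColumn + size($"friends"), written as a SQL expression string
    df.withColumn("no_of_friends", expr("size(friends)")).show()

    // Only the row with id 2 has more than one friend
    withMoreThan(df, "friends", 1).show()
    spark.stop()
  }
}
```

The Column form (size(col(...))) is checked by the Scala compiler, while the expr string is only parsed at runtime; both resolve to the same Spark SQL size function.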