
Recent versions of Spark not supported ? (ClassCastException error) #1334

Open

Description

@AWebNagra

Hello,

We recently tried using almond as a Scala kernel in our JupyterLab environment, but we are encountering errors when trying to use recent versions of Spark.

Spark version tested: 3.5.0
Scala version: 2.13.8 (the Scala version used by Spark 3.5.0)
Java version: 17.0.2
Almond versions tried: 0.13.14 and 0.14.0-RC13

The errors arise when code has to be sent to the executors, which triggers serialization and deserialization of the closures. Operations that don't ship user-defined code work fine (count, show, etc.).
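For contrast, this kind of thing runs without problems in the same notebook (illustrative snippet, not the exact cells we ran):

val ok = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))
println(ok.count())   // fine: no user-defined closure is shipped
spark.range(5).show() // fine as well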

Here is a minimal example that triggers the error:

import org.apache.spark.rdd.RDD
val rdd: RDD[Int] = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))
val multipliedRDD = rdd.map(_ * 2)
println(multipliedRDD.collect().mkString(", "))

and the error is:

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

Note that running the exact same code in a spark-shell on the JupyterLab instance works fine, hence the problem must come from almond.
Our best guess is that the classpath used by almond pulls in mismatched versions of some libraries, but we don't have any proof that this is actually the issue.
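For reference, here is a minimal sketch of how such a mismatch could be checked from the notebook, by printing which jar key classes are loaded from and comparing against the spark-shell classpath (jarOf is a throwaway helper made up for this illustration, not something almond provides):

// Print the jar each key class is loaded from, to spot classpath mismatches
// between the almond kernel and the working spark-shell.
def jarOf(c: Class[_]): String =
  Option(c.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("<bootstrap / no code source>")

println(jarOf(classOf[org.apache.spark.rdd.RDD[_]]))  // spark-core jar
println(jarOf(classOf[scala.Function3[_, _, _, _]]))  // scala-library jar
println(jarOf(spark.getClass))                        // spark-sql jar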

Second note: we tried both using our own Spark installation baked into the JupyterLab image and installing Spark directly with ivy from a Scala notebook; both produce the same error.
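For completeness, a simplified sketch of what we mean by installing Spark with ivy from the notebook (details such as the app name, master URL, and any extra dependencies are approximate, not our exact configuration):

// Simplified sketch of the ivy-based setup; exact configuration approximate.
import $ivy.`org.apache.spark::spark-sql:3.5.0`

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("almond-spark-test") // hypothetical app name, for illustration
  .master("local[*]")           // the real setup may point at a cluster instead
  .getOrCreate()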

Does anyone have any idea what could be causing this issue?
