
Installed version is not aligned with dependencies #1321

Open
@djuarezg

Description

I am installing Almond as follows:

# Install Almond Scala kernel
RUN curl -Lo coursier https://git.io/coursier-cli && \
    chmod +x coursier && \
    ./coursier -J-Dhttps.proxyHost=<proxy> -J-Dhttps.proxyPort=3128 -J-Dhttp.proxyHost=<proxy> -J-Dhttp.proxyPort=3128 -J-Dhttp.nonProxyHosts=<non_proxy> -J-Dhttps.nonProxyHosts=<non_proxy> bootstrap --hybrid  almond:0.14.0-RC8 --scala 2.13.8 -o almond
RUN ./almond --install --force --jupyter-path="/opt/conda/share/jupyter/kernels"

Then, in my notebook, which used to work with older versions, I get the following:

import $ivy.`org.apache.spark::spark-sql:3.5.1`
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.5.1/spark-sql_2.13-3.5.1.pom
Downloading https://repo1.maven.org/maven2/sh/almond/almond-spark_2.13/0.14.0-RC3/almond-spark_2.13-0.14.0-RC3.pom
Downloaded https://repo1.maven.org/maven2/sh/almond/almond-spark_2.13/0.14.0-RC3/almond-spark_2.13-0.14.0-RC3.pom
(...)
import org.apache.spark.sql._
import $ivy.`sh.almond::almond-spark:0.14.0-RC8`

As you can see, it downloads RC3-related libraries instead of the RC8 ones.
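
One way to check which jar a class actually comes from is to ask the JVM for its code source in a notebook cell. This is only a sketch: `jarOf` is a helper name made up here (it is not part of Almond), and `scala.Option` is a stand-in for whichever class is of interest, for example a class from almond-spark once it is on the classpath.

```scala
// Hypothetical helper (not part of Almond): report where a class was loaded from.
// Uses only standard JVM reflection; returns None when the JVM does not
// expose a code source for the class.
def jarOf(cls: Class[_]): Option[String] =
  for {
    domain   <- Option(cls.getProtectionDomain)
    source   <- Option(domain.getCodeSource)
    location <- Option(source.getLocation)
  } yield location.toString

// scala.Option is only a stand-in; substitute the class you want to inspect.
println(jarOf(classOf[scala.Option[_]]))
```

If the printed location contains a version number, it shows which version of the artifact is in use for that class.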

When building the session, RC3 shows up again in the logs:

val spark = {
  NotebookSparkSession.builder()
    .appName("MvSparkNotebook")
    .master("spark://vmk-tdtspark-01:7070")
    .config("spark.cores.max", "1")
    .config("spark.executor.instances", "1")
    .config("spark.executor.cores", "1")
    .config("spark.executor.memory", "1g")
    .getOrCreate()
}
Downloading https://repo1.maven.org/maven2/sh/almond/spark-stubs_32_2.13/0.14.0-RC3/spark-stubs_32_2.13-0.14.0-RC3.pom
Downloaded https://repo1.maven.org/maven2/sh/almond/spark-stubs_32_2.13/0.14.0-RC3/spark-stubs_32_2.13-0.14.0-RC3.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.2.0/spark-sql_2.13-3.2.0.pom
Downloaded https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.2.0/spark-sql_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-parent_2.13/3.2.0/spark-parent_2.13-3.2.0.pom
Downloaded https://repo1.maven.org/maven2/org/apache/spark/spark-parent_2.13/3.2.0/spark-parent_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.1/parquet-hadoop-1.12.1.pom
Downloading https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sketch_2.13/3.2.0/spark-sketch_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-catalyst_2.13/3.2.0/spark-catalyst_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/hive/hive-storage-api/2.7.2/hive-storage-api-2.7.2.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-tags_2.13/3.2.0/spark-tags_2.13-3.2.0.pom
Downloaded https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.pom
(...)
Downloading https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.jar
Downloaded https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.20/commons-compress-1.20.jar
Downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar
Downloaded https://repo1.maven.org/maven2/org/glassfish/jersey/core/jersey-client/2.34/jersey-client-2.34-sources.jar
Downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.12.3/jackson-databind-2.12.3-sources.jar
Downloaded https://repo1.maven.org/maven2/io/netty/netty-handler/4.1.50.Final/netty-handler-4.1.50.Final.jar
Downloading https://repo1.maven.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar
Downloaded https://repo1.maven.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar
Downloading https://repo1.maven.org/maven2/io/netty/netty-transport-native-epoll/4.1.50.Final/netty-transport-native-epoll-4.1.50.Final.jar
Downloaded https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar
Downloading https://repo1.maven.org/maven2/io/dropwizard/metrics/metrics-json/4.2.0/metrics-json-4.2.0.jar
Downloaded https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.jar

From this, you can see that the same library is downloaded twice (spark-sql 3.5.1 first, then 3.2.0), and the wrong one is then used when opening the session against the Spark master, causing a version mismatch and a failed connection:

24/02/29 16:13:46 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: scala.concurrent.duration.FiniteDuration; local class incompatible: stream classdesc serialVersionUID = -6513803676778706429, local class serialVersionUID = -4594686286536372853
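
The exception compares two `serialVersionUID` values for `scala.concurrent.duration.FiniteDuration`. As a rough check, the cell below prints the Scala library version and the `serialVersionUID` that the notebook's JVM uses for that class, so they can be compared with the two numbers in the error. It is a sketch that relies only on the standard library; the value names are illustrative.

```scala
import java.io.ObjectStreamClass
import scala.concurrent.duration.FiniteDuration

// Version of the Scala standard library on the notebook's classpath.
val localScalaVersion: String = scala.util.Properties.versionNumberString

// serialVersionUID that Java serialization uses locally for FiniteDuration.
val localFiniteDurationUid: Long =
  ObjectStreamClass.lookup(classOf[FiniteDuration]).getSerialVersionUID

println(s"Scala library version: $localScalaVersion")
println(s"FiniteDuration serialVersionUID: $localFiniteDurationUid")
```

Running the same lines on the Spark side, if that is possible, would show whether the two ends use different Scala library builds.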

This means that the version I specify, RC8, is not respected.
How do I enforce it?
