[SPARK-1876] Windows fixes to deal with latest distribution layout changes

- Look for JARs in the right place
- Launch examples the same way as on Unix
- Load datanucleus JARs if they exist
- Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs (see the sketch below)
- Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was); in the shade plugin's per-artifact filter, exclude patterns match file paths inside each jar, as the neighboring META-INF excludes show, so the coordinate-style pattern org.datanucleus:* never matched anything
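
The SparkSubmit change appears as a one-line Scala diff near the bottom of this page. A minimal sketch of the problem it fixes, using a hypothetical jar path (not taken from the commit):

    import java.io.File
    import java.net.URI
    import scala.util.Try

    val localJar = "C:\\spark\\lib\\app.jar"  // typical Windows path

    // Old approach: parse the path as a URI first. Backslashes are illegal
    // in URIs, so this throws URISyntaxException:
    println(Try(new URI(localJar).getPath))            // Failure(URISyntaxException)
    // Even with forward slashes, "C:" parses as the URI scheme and the
    // drive letter is lost:
    println(new URI("C:/spark/lib/app.jar").getPath)   // "/spark/lib/app.jar"

    // New approach: treat a local path as a plain path.
    val localJarFile = new File(localJar)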

Author: Matei Zaharia <[email protected]>

Closes apache#819 from mateiz/win-fixes and squashes the following commits:

d558f96 [Matei Zaharia] Fix comment
228577b [Matei Zaharia] Review comments
d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
mateiz authored and tdas committed May 19, 2014
1 parent df0aa83 commit 7b70a70
Showing 7 changed files with 81 additions and 30 deletions.
7 changes: 4 additions & 3 deletions README.md
@@ -9,13 +9,14 @@ You can find the latest Spark documentation, including a programming
guide, on the project webpage at <http://spark.apache.org/documentation.html>.
This README file only contains basic setup instructions.


## Building Spark

Spark is built on Scala 2.10. To build Spark and its example programs, run:

./sbt/sbt assembly

(You do not need to do this if you downloaded a pre-built package.)

## Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:
@@ -41,9 +42,9 @@ And run the following command, which should also return 1000:
Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> [params]`. For example:

-    ./bin/run-example org.apache.spark.examples.SparkLR
+    ./bin/run-example SparkPi

-will run the Logistic Regression example locally.
+will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL,
2 changes: 1 addition & 1 deletion assembly/pom.xml
@@ -96,7 +96,7 @@
<filter>
<artifact>*:*</artifact>
<excludes>
-<exclude>org.datanucleus:*</exclude>
+<exclude>org/datanucleus/**</exclude>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
24 changes: 23 additions & 1 deletion bin/compute-classpath.cmd
@@ -20,6 +20,13 @@ rem
rem This script computes Spark's classpath and prints it to stdout; it's used by both the "run"
rem script and the ExecutorRunner in standalone cluster mode.

+rem If we're called from spark-class2.cmd, it already set enabledelayedexpansion and setting
+rem it here would stop us from affecting its copy of the CLASSPATH variable; otherwise we
+rem need to set it here because we use !datanucleus_jars! below.
+if "%DONT_PRINT_CLASSPATH%"=="1" goto skip_delayed_expansion
+setlocal enabledelayedexpansion
+:skip_delayed_expansion

set SCALA_VERSION=2.10

rem Figure out where the Spark framework is installed
@@ -31,7 +38,7 @@ if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"
rem Build up classpath
set CLASSPATH=%FWDIR%conf
if exist "%FWDIR%RELEASE" (
for %%d in ("%FWDIR%jars\spark-assembly*.jar") do (
for %%d in ("%FWDIR%lib\spark-assembly*.jar") do (
set ASSEMBLY_JAR=%%d
)
) else (
@@ -42,6 +49,21 @@ if exist "%FWDIR%RELEASE" (

set CLASSPATH=%CLASSPATH%;%ASSEMBLY_JAR%

+rem When Hive support is needed, Datanucleus jars must be included on the classpath.
+rem Datanucleus jars do not work if only included in the uber jar as plugin.xml metadata is lost.
+rem Both sbt and maven will populate "lib_managed/jars/" with the datanucleus jars when Spark is
+rem built with Hive, so look for them there.
+if exist "%FWDIR%RELEASE" (
+set datanucleus_dir=%FWDIR%lib
+) else (
+set datanucleus_dir=%FWDIR%lib_managed\jars
+)
+set "datanucleus_jars="
+for %%d in ("%datanucleus_dir%\datanucleus-*.jar") do (
+set datanucleus_jars=!datanucleus_jars!;%%d
+)
+set CLASSPATH=%CLASSPATH%;%datanucleus_jars%

set SPARK_CLASSES=%FWDIR%core\target\scala-%SCALA_VERSION%\classes
set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%repl\target\scala-%SCALA_VERSION%\classes
set SPARK_CLASSES=%SPARK_CLASSES%;%FWDIR%mllib\target\scala-%SCALA_VERSION%\classes
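The rem comments above explain the Datanucleus lookup; a rough Scala sketch of the same logic (helper name and structure are mine, not code from the commit):

    import java.io.File

    def datanucleusJars(sparkHome: String): Seq[String] = {
      // Released packages ship jars under lib\; source builds place the
      // Hive dependencies under lib_managed\jars.
      val dir =
        if (new File(sparkHome, "RELEASE").exists()) new File(sparkHome, "lib")
        else new File(sparkHome, "lib_managed/jars")
      Option(dir.listFiles()).getOrElse(Array.empty[File])
        .filter(f => f.getName.startsWith("datanucleus-") && f.getName.endsWith(".jar"))
        .map(_.getAbsolutePath)
        .toSeq
    }

    // The script then appends these to CLASSPATH, ';'-separated on Windows.
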
23 changes: 11 additions & 12 deletions bin/run-example
@@ -23,6 +23,16 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
export SPARK_HOME="$FWDIR"
EXAMPLES_DIR="$FWDIR"/examples

+if [ -n "$1" ]; then
+EXAMPLE_CLASS="$1"
+shift
+else
+echo "Usage: ./bin/run-example <example-class> [example-args]"
+echo "  - set MASTER=XX to use a specific master"
+echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)"
+exit 1
+fi

if [ -f "$FWDIR/RELEASE" ]; then
export SPARK_EXAMPLES_JAR=`ls "$FWDIR"/lib/spark-examples-*hadoop*.jar`
elif [ -e "$EXAMPLES_DIR"/target/scala-$SCALA_VERSION/spark-examples-*hadoop*.jar ]; then
@@ -37,23 +47,12 @@ fi

EXAMPLE_MASTER=${MASTER:-"local[*]"}

-if [ -n "$1" ]; then
-EXAMPLE_CLASS="$1"
-shift
-else
-echo "usage: ./bin/run-example <example-class> [example-args]"
-echo "  - set MASTER=XX to use a specific master"
-echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.MovieLensALS)"
-echo
-exit -1
-fi

if [[ ! $EXAMPLE_CLASS == org.apache.spark.examples* ]]; then
EXAMPLE_CLASS="org.apache.spark.examples.$EXAMPLE_CLASS"
fi

./bin/spark-submit \
--master $EXAMPLE_MASTER \
--class $EXAMPLE_CLASS \
-$SPARK_EXAMPLES_JAR \
+"$SPARK_EXAMPLES_JAR" \
"$@"
51 changes: 39 additions & 12 deletions bin/run-example2.cmd
@@ -30,32 +30,59 @@ if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"

rem Test that an argument was given
if not "x%1"=="x" goto arg_given
-echo Usage: run-example ^<example-class^> [^<args^>]
+echo Usage: run-example ^<example-class^> [example-args]
+echo - set MASTER=XX to use a specific master
+echo - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)
goto exit
:arg_given

set EXAMPLES_DIR=%FWDIR%examples

rem Figure out the JAR file that our examples were packaged into.
set SPARK_EXAMPLES_JAR=
for %%d in ("%EXAMPLES_DIR%\target\scala-%SCALA_VERSION%\spark-examples*assembly*.jar") do (
set SPARK_EXAMPLES_JAR=%%d
if exist "%FWDIR%RELEASE" (
for %%d in ("%FWDIR%lib\spark-examples*.jar") do (
set SPARK_EXAMPLES_JAR=%%d
)
) else (
for %%d in ("%EXAMPLES_DIR%\target\scala-%SCALA_VERSION%\spark-examples*.jar") do (
set SPARK_EXAMPLES_JAR=%%d
)
)
if "x%SPARK_EXAMPLES_JAR%"=="x" (
echo Failed to find Spark examples assembly JAR.
echo You need to build Spark with sbt\sbt assembly before running this program.
goto exit
)

-rem Compute Spark classpath using external script
-set DONT_PRINT_CLASSPATH=1
-call "%FWDIR%bin\compute-classpath.cmd"
-set DONT_PRINT_CLASSPATH=0
-set CLASSPATH=%SPARK_EXAMPLES_JAR%;%CLASSPATH%
+rem Set master from MASTER environment variable if given
+if "x%MASTER%"=="x" (
+set EXAMPLE_MASTER=local[*]
+) else (
+set EXAMPLE_MASTER=%MASTER%
+)

+rem If the EXAMPLE_CLASS does not start with org.apache.spark.examples, add that
+set EXAMPLE_CLASS=%1
+set PREFIX=%EXAMPLE_CLASS:~0,25%
+if not %PREFIX%==org.apache.spark.examples (
+set EXAMPLE_CLASS=org.apache.spark.examples.%EXAMPLE_CLASS%
+)

+rem Get the tail of the argument list, to skip the first one. This is surprisingly
+rem complicated on Windows.
+set "ARGS="
+:top
+shift
+if "%~1" neq "" (
+set ARGS=%ARGS% "%~1"
+goto :top
+)
+if defined ARGS set ARGS=%ARGS:~1%

-rem Figure out where java is.
-set RUNNER=java
-if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java
+call "%FWDIR%bin\spark-submit.cmd" ^
+--master %EXAMPLE_MASTER% ^
+--class %EXAMPLE_CLASS% ^
+"%SPARK_EXAMPLES_JAR%" %ARGS%

-"%RUNNER%" -cp "%CLASSPATH%" %JAVA_OPTS% %*
:exit
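
What the :top loop above computes, stated as a plain Scala function (name is mine): everything after the first argument, re-quoted and space-joined.

    def tailArgs(args: Seq[String]): String =
      args.drop(1).map(a => "\"" + a + "\"").mkString(" ")

    // tailArgs(Seq("SparkPi", "10", "some arg")) == "\"10\" \"some arg\""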
2 changes: 2 additions & 0 deletions bin/spark-class2.cmd
@@ -17,6 +17,8 @@ rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

+setlocal enabledelayedexpansion

set SCALA_VERSION=2.10

rem Figure out where the Spark framework is installed
2 changes: 1 addition & 1 deletion core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -299,7 +299,7 @@ object SparkSubmit {
}

private def addJarToClasspath(localJar: String, loader: ExecutorURLClassLoader) {
-val localJarFile = new File(new URI(localJar).getPath)
+val localJarFile = new File(localJar)
if (!localJarFile.exists()) {
printWarning(s"Jar $localJar does not exist, skipping.")
}
Expand Down
