Stratosphere version 0.5 available
We are happy to announce a new major Stratosphere release, version 0.5. This release adds many new features and improves the interoperability, stability, and performance of the system. The major theme of the release is the completely new Java API that makes it easy to write powerful distributed programs.
The release can be downloaded from the [Stratosphere website] (http://stratosphere.eu/downloads/) and from [GitHub] (https://github.com/stratosphere/stratosphere/releases/tag/release-0.5). All components are available as Apache Maven dependencies, making it simple to include Stratosphere in other projects. The website provides extensive documentation of the system and the new features.
Shortlist of new Features
Below is a short list of the most important additions to the Stratosphere system.
New Java API
This release introduces a completely new data set-centric Java API. This programming model significantly eases the development of Stratosphere programs, supports flexible use of regular Java classes as data types, and adds many new built-in operators to simplify the writing of powerful programs. The result are programs that need less code, are more readable, interoperate better with existing code, and execute faster.
Take a look at the examples to get a feel for the API.
General API Improvements
Broadcast Variables: Publish a data set to all instances of another operator. This is handy if the your operator depends on the result of a computation, e.g., filter all values smaller than the average.
Distributed Cache: Make (local and HDFS) files locally available on each machine processing a task.
Iteration Termination Improvements Iterative algorithms can now terminate based on intermediate data sets, not only through aggregated statistics.
Collection data sources and sinks: Speed-up the development and testing of Stratosphere programs by reading data from regular Java collections and inserting back into them.
JDBC data sources and sinks: Read data from and write data to relational databases using a JDBC driver.
Hadoop input format and output format support: Read and write data with any Hadoop input or output format.
Support for Avro encoded data: Read data that has been materialized using Avro.
Deflate Files: Stratosphere now transparently reads .deflate
compressed files.
Runtime and Optimizer Improvements
DAG Runtime Streaming: Detection and resolution of streaming data flow deadlocks in the data flow optimizer.
Intermediate results across iteration boundaries: Intermediate results computed outside iterative parts can be used inside iterative parts of the program.
Stability fixes: Various stability fixes in both optimizer and runtime.
Setup & Tooling
Improved YARN support: Many improvements based on user-feedback: Packaging, Permissions, Error handling.
Java 8 compatibility
Contributors
In total, 26 people have contributed to Stratosphere since the last release. Thank you for making this project possible!
- Alexander Alexandrov
- Jesus Camacho
- Ufuk Celebi
- Mikhail Erofeev
- Stephan Ewen
- Alexandr Ferodov
- Filip Haase
- Jonathan Hasenberg
- Markus Holzemer
- Fabian Hueske
- Vasia Kalavri
- Aljoscha Krettek
- Rajika Kumarasiri
- Sebastian Kunert
- Aaron Lam
- Robert Metzger
- Faisal Moeen
- Martin Neumann
- Mingliang Qi
- Till Rohrmann
- Chesnay Schepler
- Vyachislav Soludev
- Tuan Trieu
- Artem Tsikiridis
- Timo Walther
- Robert Waury
Stratosphere is going Apache
The Stratosphere project has been accepted to the Apache Incubator and will continue its work under the umbrella of the Apache Software Foundation. Due to a name conflict, we are switching the name of the project. We will make future releases of Stratosphere through the Apache foundation under a new name.