Wvlet: Redesigning 50-Year-Old SQL for Modern Data Analytics

December 30, 2024 · 5 min read

Senior Principal Engineer @ Treasure Data

We are excited to announce the release of Wvlet version 2024.9, an open-source flow-style query language designed to help users write efficient queries for SQL engines. You can try Wvlet, pronounced weave-let, directly in your web browser at Wvlet Playground. The source code of the Wvlet compiler is available on GitHub at wvlet/wvlet.

Why Wvlet?

At Treasure Data, we process over 3 million SQL queries daily. Managing this volume of queries while helping users (including LLMs) write efficient queries presents significant challenges.

The primary challenge lies in SQL's syntax: its syntactic order doesn't match the actual data flow. This mismatch makes complex and deeply nested queries difficult to debug, even for SQL experts. The paper A Critique of Modern SQL And A Proposal Towards A Simple and Expressive Query Language (CIDR '24) clearly illustrates this issue in this figure:

[Figure: semantic-order]
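The mismatch can be made concrete with a small sketch. The example below is illustrative only: the `orders` table, its sample rows, and the function names are assumptions, and it uses Python's built-in `sqlite3` module rather than any Wvlet tooling. It runs one query as SQL, then recomputes the same result step by step in the order the data actually flows.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (item, qty) pairs.
ROWS = [("apple", 3), ("apple", 5), ("pear", 2), ("pear", 1), ("kiwi", 9)]


def run_sql(rows):
    """Run the query as SQL; clauses appear in SQL's syntactic order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT item, SUM(qty) AS total   -- logically evaluated 4th
        FROM orders                      -- logically evaluated 1st
        WHERE qty > 1                    -- logically evaluated 2nd
        GROUP BY item                    -- logically evaluated 3rd
        ORDER BY total DESC              -- logically evaluated 5th
        """
    ).fetchall()
    con.close()
    return result


def run_in_flow_order(rows):
    """Compute the same result, one step at a time, in data-flow order."""
    # 1. FROM: start with the source rows.
    data = list(rows)
    # 2. WHERE: filter rows.
    data = [(item, qty) for item, qty in data if qty > 1]
    # 3. GROUP BY: group by item and sum the quantities.
    totals = defaultdict(int)
    for item, qty in data:
        totals[item] += qty
    # 4. SELECT: project the output columns (item, total).
    projected = list(totals.items())
    # 5. ORDER BY: sort by total, descending.
    return sorted(projected, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print(run_sql(ROWS))
    print(run_in_flow_order(ROWS))
```

Reading `run_in_flow_order` top to bottom follows the data; reading the SQL top to bottom does not, because the projection is written first but applied after grouping.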

Additionally, the SQL standard (e.g., SQL-92) is limited in scope and lacks essential software engineering features for managing multiple queries. For example, SQL has:

No built-in support for reusing and generating queries.
No extension point for multi-query optimization, such as incremental processing and pipeline execution like dbt.
No built-in debugging or testing capabilities.

These limitations stem from SQL's origins: born in the 1970s, it was not designed for today's complex data analytics needs. Wvlet addresses these challenges by modernizing 50-year-old SQL, making it more intuitive and functional while incorporating software engineering best practices.

What's The Current State of Wvlet?

Though still in early development, Wvlet already lets users write and run queries against DuckDB and Trino through either a command-line client (wv) or the web-based UI (wvlet ui).

Interactive Editor (wv)

If you are using a Mac, you can easily install the interactive shell (wv) with the Homebrew command: brew install wvlet/wvlet/wvlet.

The wv interactive editor (REPL) supports various shortcut keys, allowing you to check the schema (ctrl-j, ctrl-d), test a subquery (ctrl-j, ctrl-t), or run the query (ctrl-j, ctrl-r) even in the middle of a query.

[Screenshot: wvlet shell]

To use the Trino SQL engine, you need to configure the ~/.wvlet/profiles.yml file to specify the target Trino server address.

Wvlet Playground

Wvlet is written in Scala 3, which can be compiled to JavaScript using the power of Scala.js, enabling browser-based execution. You can try out Wvlet queries in the Wvlet Playground, where the Wvlet queries are compiled into SQL and run on DuckDB's WebAssembly version (DuckDB Wasm)--all without requiring any installation.

[Screenshot: Wvlet Playground]

Wvlet also provides a standalone web UI, which starts a local web server so that you can run Wvlet queries in your browser.

Flow-Style Query Syntax

Wvlet has redesigned SQL in various ways to match the syntax with the natural data flow. It introduces flow-style relational operators (e.g., add, agg, concat, sample) and column-at-a-time operators (e.g., rename, exclude, shift) that reduce the burden of enumerating columns.
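To show why column-at-a-time operators save effort, here is a toy sketch in Python. It is not Wvlet's compiler and does not use Wvlet syntax: `compile_flow`, the `(operator, argument)` step format, and the `orders` table are all hypothetical. Each step wraps the SQL produced so far, and `exclude` and `rename` generate the explicit column list that plain SQL would force you to write by hand.

```python
import sqlite3


def compile_flow(table, columns, steps):
    """Compile flow-style steps into nested SQL (toy example).

    `columns` is the known schema of `table`; it is tracked through the
    pipeline so that exclude/rename can expand into explicit column lists.
    """
    sql = f"SELECT * FROM {table}"
    cols = list(columns)
    for op, arg in steps:
        if op == "where":
            sql = f"SELECT * FROM ({sql}) WHERE {arg}"
        elif op == "exclude":
            # Drop one column: SQL needs every remaining column spelled out.
            cols = [c for c in cols if c != arg]
            sql = f"SELECT {', '.join(cols)} FROM ({sql})"
        elif op == "rename":
            old, new = arg
            select_list = ", ".join(
                f"{c} AS {new}" if c == old else c for c in cols
            )
            cols = [new if c == old else c for c in cols]
            sql = f"SELECT {select_list} FROM ({sql})"
        else:
            raise ValueError(f"unsupported operator: {op}")
    return sql, cols


def run_flow(rows, steps):
    """Run the compiled SQL on an in-memory SQLite table of sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER, note TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    sql, _ = compile_flow("orders", ["id", "item", "qty", "note"], steps)
    cursor = con.execute(sql)
    names = [d[0] for d in cursor.description]
    result = sorted(cursor.fetchall())
    con.close()
    return names, result


if __name__ == "__main__":
    sample = [(1, "apple", 3, "x"), (2, "pear", 1, "y"), (3, "kiwi", 9, "z")]
    pipeline = [
        ("where", "qty > 1"),
        ("exclude", "note"),
        ("rename", ("qty", "quantity")),
    ]
    print(run_flow(sample, pipeline))
```

The pipeline reads in the order the data is transformed, while the generated SQL nests from the inside out. The sketch interpolates strings directly into SQL, which is acceptable for a demonstration but not for untrusted input.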

Notably, Wvlet enhances SQL with test syntax: test expressions embedded in Wvlet queries are used to verify Wvlet's own functionality.

For more details on the query syntax, refer to the following presentation slides from Trino Summit 2024:

Functional Data Modeling

Queries written in Wvlet are reusable and composable, making it easier to manage complex queries. Once you write your queries in .wv files, you can call or reuse them in other queries.

[Figure: Data Models]
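The idea of treating queries as reusable, composable units can be sketched in plain Python. This is an analogy rather than Wvlet's model syntax: `large_orders`, `totals_by_item`, and the `orders` table are hypothetical names chosen for illustration. Each function plays the role of a named query, and one query takes another as its input.

```python
import sqlite3

# Hypothetical sample data: (item, qty) pairs.
ROWS = [("apple", 3), ("apple", 5), ("pear", 2), ("pear", 1), ("kiwi", 9)]


def large_orders(min_qty):
    """A reusable, parameterized query: orders with at least min_qty units."""
    return f"SELECT item, qty FROM orders WHERE qty >= {int(min_qty)}"


def totals_by_item(source_sql):
    """A query that builds on any other query exposing item and qty."""
    return f"SELECT item, SUM(qty) AS total FROM ({source_sql}) GROUP BY item"


def run(sql, rows):
    """Execute SQL against an in-memory SQLite table holding `rows`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = sorted(con.execute(sql).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    # Compose the two queries instead of copying and pasting SQL text.
    print(run(totals_by_item(large_orders(2)), ROWS))
```

Changing the threshold or swapping the source query touches one definition, and every query built on it picks up the change.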

Wvlet SDKs

We plan to add SDKs for multiple programming languages to enable users to convert Wvlet queries into SQL. The Wvlet compiler, written in Scala 3, can be compiled into native LLVM code through Scala Native. This generates binaries that integrate with languages like Python, Rust, Ruby, and C/C++. Our 2024.9 release includes an initial version of the Python SDK:

[Figure: Wvlet SDKs]

Thanks to contributors from the community, we are getting closer to supporting multiple programming languages. For example, an extension for using Wvlet with DuckDB has been created. After the build pipeline is stabilized, we will release the official SDKs for various programming languages on PyPI, Maven, and other package repositories.

What's Next?

We plan to release milestone versions approximately every 3 months, following the format (year).(milestone month).(patch). The next milestone version will be 2025.1. You can find our project roadmap and features under active development on the Wvlet Roadmap.
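As a small illustration of this version format, the sketch below parses version strings and compares them numerically. The `Version` type and `parse_version` function are hypothetical helpers, not part of Wvlet, and treating a two-part version such as 2024.9 as patch 0 is an assumption made here.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A (year).(milestone month).(patch) version; tuples compare field by field."""

    year: int
    month: int
    patch: int = 0


def parse_version(text: str) -> Version:
    """Parse 'YYYY.M' or 'YYYY.M.P' into a Version.

    Assumption: a missing patch component is treated as 0.
    """
    parts = text.strip().split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected (year).(month)[.(patch)], got {text!r}")
    numbers = [int(p) for p in parts]
    year, month = numbers[0], numbers[1]
    patch = numbers[2] if len(numbers) == 3 else 0
    if not 1 <= month <= 12:
        raise ValueError(f"milestone month out of range in {text!r}")
    return Version(year, month, patch)


if __name__ == "__main__":
    current = parse_version("2024.9")
    upcoming = parse_version("2025.1")
    print(current, upcoming, upcoming > current)
```

Comparing the parsed tuples avoids a pitfall of comparing the raw strings: as text, a hypothetical "2024.10.0" would sort before "2024.9.3", while the numeric comparison orders them correctly.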

The next 2025.1 milestone will focus on functional data modeling features, including:

Advanced query optimization with cascading updates and materialization of Wvlet data models, similar to dbt, featuring incremental processing.
Support for importing Wvlet queries from GitHub repositories.
An enhanced type system with improved dot-syntax support for complex expressions.
Support for more SQL dialects through context-specific query inlining.

Join our discussions in the Discord channel. We welcome your thoughts, feedback, and feature requests.
