Materialize Blog https://materialize.com The latest technical articles, product updates and company news from Materialize: A streaming-first data warehouse for operational workloads. Fri, 03 Jan 2025 16:31:02 GMT https://validator.w3.org/feed/docs/rss2.html https://github.com/jpmonette/feed en <![CDATA[Efficient Real-Time App with TAIL | Materialize]]> https://materialize.com/blog/a-simple-and-efficient-real-time-application-powered-by-materializes-tail-command https://materialize.com/blog/a-simple-and-efficient-real-time-application-powered-by-materializes-tail-command Wed, 20 Jan 2021 00:00:00 GMT <![CDATA[

Let's build a Python application to demonstrate how developers can create real-time, event-driven experiences for their users, powered by Materialize.

]]>
<![CDATA[

...to this data structure.

timestamp = 1608081358001
inserted  = [('Epidosis', '4595'), ('Matlin', '5221')]
deleted   = [('Lockal', '4590'), ('Matlin', '5220')]
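
For context, tuples of this shape are exactly what TAIL emits. A minimal SQL sketch of the interaction (the view name counter is assumed here), producing rows of (timestamp, progressed, diff, columns...):

BEGIN;
DECLARE c CURSOR FOR TAIL counter WITH (PROGRESS);
FETCH ALL c;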

    async for (timestamp, progressed, diff, *columns) in cursor:
        # The progressed column serves as a synchronization primitive indicating that all
        # rows for an update have been read. We should publish this update.
        if progressed:
            self.update(deleted, inserted, timestamp)
            inserted = []
            deleted = []
            continue

        # Simplify our implementation by creating "diff" copies of each row instead
        # of tracking counts per row
        if diff < 0:
            deleted.extend([columns] * abs(diff))
        elif diff > 0:
            inserted.extend([columns] * diff)
        else:
            raise ValueError(f"Bad data from TAIL: {(timestamp, diff, columns)}")

# Remove any rows that have been deleted
for r in deleted:
    self.current_rows.remove(r)

# And add any rows that have been inserted
self.current_rows.extend(inserted)

# If we have listeners configured, broadcast this diff
if self.listeners:
    payload = {"deleted": deleted, "inserted": inserted, "timestamp": timestamp}
    self.broadcast(payload)

connection.onmessage = function (event) {
  var data = JSON.parse(event.data);
  // Counter is a single row table, so every update should contain one insert and
  // maybe one delete (which we don't care about)
  document.getElementById('counter').innerHTML = data.inserted[0][0];
};

function convert_to_subject(row) {
  return { subject: row[0], count: parseInt(row[1]) };
}

function subject_in_array(e, arr) {
  return arr.find((i) => i.subject === e.subject && i.count === e.count);
}

connection.onmessage = function (event) {
  var data = JSON.parse(event.data);
  var insert_values = data.inserted.map(convert_to_subject);
  var delete_values = data.deleted.map(convert_to_subject);
  var changeSet = vega
    .changeset()
    .insert(insert_values)
    .remove((d) => subject_in_array(d, delete_values));

  chart.view.change('data', changeSet).resize().run();
};
});

]]>
<![CDATA[Materialize and Advent of Code: Using SQL to solve your puzzles!]]> https://materialize.com/blog/advent-of-code-2023 https://materialize.com/blog/advent-of-code-2023 Fri, 19 Jan 2024 00:00:00 GMT <![CDATA[

The Materialize team participated in Advent of Code 2023 and took a bold approach in using SQL to solve each puzzle. Check it out.

]]>
<![CDATA[
WITH MUTUALLY RECURSIVE

    -- Parse the problem input into tabular form.
    lines(line TEXT) AS ( .. ),

    -- SQL leading up to part 1.
    part1(part1 BIGINT) AS ( .. ),

    -- SQL leading up to part 2.
    part2(part2 BIGINT) AS ( .. )

SELECT * FROM part1, part2;

Link to puzzle(s) 🟢 🟢

Part one

The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

Consider your entire calibration document. What is the sum of all of the calibration values?

Part two

Your calculation isn't quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".

Equipped with this new information, you now need to find the real first and last digit on each line.
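
For a concrete flavor of part one, here is a minimal sketch (not necessarily the team's solution), assuming the input is already parsed into a lines(line) relation and that Postgres-style regexp_match and reverse are available:

WITH digits AS (
    SELECT
        (regexp_match(line, '[0-9]'))[1] AS first,
        (regexp_match(reverse(line), '[0-9]'))[1] AS last
    FROM lines
)
SELECT sum((first || last)::int) AS part1 FROM digits;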

Contributors

Day 1 was brought to you by: @chass, @def-, @doy-materialize, @frankmcsherry, @josharenberg, @morsapaes, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one

Given a table with the following format:

Part two

Part one + two in one go!

Contributors

Day 2 was brought to you by: @def-, @frankmcsherry, @morsapaes

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 3 was brought to you by: @frankmcsherry, @morsapaes

Link to puzzle(s) 🟢 🟢

Part one

Part two

Part one + two in one go!

Contributors

Day 4 was brought to you by: @chass, @doy-materialize, @frankmcsherry, @morsapaes

Link to puzzle(s) 🟢 🟢

Part one

Part one + two in one go!

Contributors

Day 5 was brought to you by: @doy-materialize, @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one

Part one + two in one go!

Contributors

Day 6 was brought to you by: @doy-materialize, @frankmcsherry, @nrainer-materialize, @petrosagg

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 7 was brought to you by: @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 8 was brought to you by: @doy-materialize, @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 9 was brought to you by: @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 10 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 11 was brought to you by: @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one

Contributors

Day 12 was brought to you by: @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 13 was brought to you by: @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one

Part two

Contributors

Day 14 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 15 was brought to you by: @frankmcsherry, @nrainer-materialize

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 16 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 17 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 18 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 19 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 20 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 21 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 22 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 23 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢 🟢

Part one + two in one go!

Contributors

Day 24 was brought to you by: @frankmcsherry

Link to puzzle(s) 🟢

Part one

Contributors

Day 25 was brought to you by: @frankmcsherry

]]>
<![CDATA[Building a MySQL source for Materialize]]> https://materialize.com/blog/building-a-mysql-source https://materialize.com/blog/building-a-mysql-source Thu, 21 Mar 2024 00:00:00 GMT <![CDATA[

An in-depth breakdown of how we architected and built a native MySQL CDC source

]]>
<![CDATA[How we built the SQL Shell]]> https://materialize.com/blog/building-sql-shell https://materialize.com/blog/building-sql-shell Thu, 21 Dec 2023 00:00:00 GMT <![CDATA[

Learn how we built an in-browser SQL shell that empowers Materialize users to interact with their databases

]]>
<![CDATA[Bulk exports to S3, now in Private Preview!]]> https://materialize.com/blog/bulk-exports-s3 https://materialize.com/blog/bulk-exports-s3 Mon, 03 Jun 2024 00:00:00 GMT <![CDATA[

Export a snapshot of your data to Amazon S3 object storage as an intermediary to sink data to a broader set of systems downstream

]]>
<![CDATA[Capturing Change Data Capture (CDC) Data]]> https://materialize.com/blog/capturing-cdc-data https://materialize.com/blog/capturing-cdc-data Tue, 01 Aug 2023 00:00:00 GMT <![CDATA[

An illustration of the unexpectedly high downstream cost of clever optimizations to change data capture.

]]>
<![CDATA[The Challenges With Microservices (and how Materialize can help)]]> https://materialize.com/blog/challenges-with-microservices https://materialize.com/blog/challenges-with-microservices Wed, 11 Dec 2024 00:00:00 GMT <![CDATA[

Explore how Materialize overcomes key microservices challenges like data silos, network fan-out, and reconvergence issues. Learn how database-level transformations unlock real-time, consistent, and efficient operations in microservices architectures.

]]>
<![CDATA[Change Data Capture is having a moment. Why?]]> https://materialize.com/blog/change-data-capture-is-having-a-moment-why https://materialize.com/blog/change-data-capture-is-having-a-moment-why Tue, 21 Sep 2021 00:00:00 GMT <![CDATA[

Change Data Capture (CDC) is finally gaining widespread adoption as an architectural primitive. Why now?

]]>
<![CDATA[Change Data Capture (part 1)]]> https://materialize.com/blog/change-data-capture-part-1 https://materialize.com/blog/change-data-capture-part-1 Thu, 13 Aug 2020 00:00:00 GMT <![CDATA[

Here we set the context for and propose a change data capture protocol: a means of writing down and reading back changes to data.

]]>
<![CDATA[

// one record is "updated"
(record1, time1, -1)
(record2, time1, +1)

// two records are deleted
(record0, time2, -1)
(record2, time2, -1)

/// Frontier through which `Self` has reported updates.
///
/// All updates not beyond this frontier have been reported.
/// Any information related to times not beyond this frontier can be discarded.
///
/// This frontier tracks the meet of `progress_frontier` and `updates_frontier`,
/// our two bounds on potential uncertainty in progress and update messages.
reported_frontier: Antichain<T>,

/// Updates that have been received, but are still beyond `reported_frontier`.
///
/// These updates are retained both so that they can eventually be transmitted,
/// but also so that they can deduplicate updates that may still be received.
updates: std::collections::HashSet<(D, T, R)>,

/// Frontier of accepted progress statements.
///
/// All progress message counts for times not beyond this frontier have been
/// incorporated into `updates_frontier`. This frontier also guides which
/// received progress statements can be incorporated: those for which
/// this frontier is beyond their lower bound.
progress_frontier: Antichain<T>,

/// Counts of outstanding messages at times.
///
/// These counts track the difference between message counts at times announced
/// by progress messages, and message counts at times received in distinct updates.
updates_frontier: MutableAntichain<T>,

/// Progress statements that are not yet actionable due to out-of-orderedness.
///
/// A progress statement becomes actionable once the progress frontier is beyond
/// its lower frontier. This ensures that the [0, lower) interval is already
/// covered, and that we will not leave a gap by incorporating the counts
/// and reflecting the progress statement's upper frontier.
progress_queue: Vec<Progress<T>>,

}

    // Drain actionable progress messages.
    unimplemented!()

    // Determine if the lower bound of `progress_frontier` and `updates_frontier` has advanced.
    // If so, we can determine and return a batch of updates and a newly advanced frontier.
    unimplemented!()
}
// If we've exhausted our iterator, we have nothing to say.
None

]]>
<![CDATA[Clusters, explained with Data Warehouses]]> https://materialize.com/blog/clusters-explained https://materialize.com/blog/clusters-explained Tue, 31 Jan 2023 00:00:00 GMT <![CDATA[

If you're familiar with data warehouses, this article will help you understand Materialize Clusters in relation to well-known components in Snowflake.

]]>
<![CDATA[CMU DB Talk: Building Materialize]]> https://materialize.com/blog/cmudb https://materialize.com/blog/cmudb Mon, 08 Jun 2020 00:00:00 GMT <![CDATA[

Arjun Narayan introduces the CMU DB group to streaming databases, the problems they solve, and specific architectural decisions in Materialize.

]]>
<![CDATA[Compile Times and Code Graphs]]> https://materialize.com/blog/compile-times-and-code-graphs https://materialize.com/blog/compile-times-and-code-graphs Fri, 27 Oct 2023 00:00:00 GMT <![CDATA[

Recently, I've felt the pain of long Rust compile times at Materialize, and so was motivated to improve them a bit. Here's how I did it.

]]>
<![CDATA[Confluent & Materialize Expand Streaming | Materialize]]> https://materialize.com/blog/confluent-partnership https://materialize.com/blog/confluent-partnership Tue, 18 Jul 2023 00:00:00 GMT <![CDATA[

Materialize & Confluent partnership offers SQL on Kafka capabilities for efficient data team integration.

]]>
<![CDATA[Direct PostgreSQL Replication Stream Setup | Materialize]]> https://materialize.com/blog/connecting-materialize-directly-to-postgresql-via-the-replication-stream https://materialize.com/blog/connecting-materialize-directly-to-postgresql-via-the-replication-stream Wed, 16 Feb 2022 00:00:00 GMT <![CDATA[

Comprehensive guide on using PostgreSQL's write-ahead log as a data source for Materialize, with technical insights & benefits.

]]>
<![CDATA[

START_REPLICATION slot_name;
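
On the Materialize side, this replication machinery is wrapped up in a source definition. A sketch using the connection-based syntax that appears later in this feed (host, password, and publication names are placeholders):

CREATE SECRET pgpass AS '<password>';
CREATE CONNECTION pg TO POSTGRES (
    HOST 'my-postgres-host',
    DATABASE postgres,
    USER postgres,
    PASSWORD SECRET pgpass
);
CREATE SOURCE mz_source
    FROM POSTGRES CONNECTION pg (PUBLICATION 'mz_source')
    FOR ALL TABLES;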

]]>
<![CDATA[Consistency Guarantees in Data Streaming | Materialize]]> https://materialize.com/blog/consistency https://materialize.com/blog/consistency Tue, 31 Mar 2020 00:00:00 GMT <![CDATA[

Understand the necessary consistency guarantees for a streaming data platform & how they ensure accurate data views.

]]>
<![CDATA[Real-Time Customer Data Platform Views on Materialize]]> https://materialize.com/blog/customer-data-platforms https://materialize.com/blog/customer-data-platforms Wed, 19 Oct 2022 00:00:00 GMT <![CDATA[

Let's demonstrate the unique features of Materialize by building the core functionality of a customer data platform.

]]>
<![CDATA[

const client = new Client({
  user: MATERIALIZE_USERNAME,
  password: MATERIALIZE_PASSWORD,
  host: MATERIALIZE_HOST,
  port: 6875,
  database: 'materialize',
  ssl: true
});

async function main() {
  await client.connect();
  const res = await client.query("SELECT * FROM cdp_users WHERE uuid = 'ABC123'");
  console.log(res.rows);
}

main();

]]>
<![CDATA[Fresh Data, Complex Queries: A Guide for PostgreSQL Users]]> https://materialize.com/blog/data-queries-postgres https://materialize.com/blog/data-queries-postgres Fri, 04 Oct 2024 00:00:00 GMT <![CDATA[

Let's explore why many teams rely on PostgreSQL for analytics, the challenges they face, and how Materialize solves these problems.

]]>
<![CDATA[

WITH latest_orders AS (
    SELECT *
    FROM orders
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
),

updated_totals AS (
    SELECT customer_id, SUM(order_total) AS total_sales
    FROM latest_orders
    GROUP BY customer_id
),

existing_totals AS (
    SELECT customer_id, total_sales
    FROM {{ this }}
    WHERE customer_id NOT IN (SELECT customer_id FROM updated_totals)
)

SELECT * FROM updated_totals
UNION ALL
SELECT * FROM existing_totals;

SELECT customer_id, SUM(order_total) AS total_sales
FROM orders
GROUP BY customer_id;
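
That second, plain query is all Materialize needs: declared once as a materialized view, it is kept incrementally up to date as orders change. A minimal sketch (standard Materialize DDL; the view name is ours):

CREATE MATERIALIZED VIEW total_sales AS
SELECT customer_id, SUM(order_total) AS total_sales
FROM orders
GROUP BY customer_id;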

]]>
<![CDATA[dbt & Materialize: Streamline Jaffle Shop Demo | Materialize]]> https://materialize.com/blog/dbt-materialize-jaffle-shop-demo https://materialize.com/blog/dbt-materialize-jaffle-shop-demo Wed, 24 Mar 2021 00:00:00 GMT <![CDATA[

Let's demonstrate how to manage streaming SQL in Materialize with dbt by porting the classic dbt jaffle-shop demo scenario to the world of streaming.

]]>
<![CDATA[
   target: dev

See only the materialized views

materialize=> SHOW MATERIALIZED VIEWS IN jaffle_shop;

Output:

      name
---------------
 dim_customers
 fct_orders
 raw_customers
 raw_orders
 raw_payments

Check out data in one of your core models

materialize=> SELECT * FROM jaffle_shop.dim_customers WHERE customer_id = 1;

Output:

 customer_id | first_order | most_recent_order | number_of_orders | customer_lifetime_value
-------------+-------------+-------------------+------------------+-------------------------
           1 | 2018-01-01  | 2018-02-10        |                2 |                      33

]]>
<![CDATA[How Materialize and other databases optimize SQL subqueries]]> https://materialize.com/blog/decorrelation-subquery-optimization https://materialize.com/blog/decorrelation-subquery-optimization Mon, 01 Mar 2021 00:00:00 GMT <![CDATA[

Insight into SQL subquery optimization & how Materialize's approach differs from other databases, enhancing query performance.

]]>
<![CDATA[

// Filter out null posts.user_id
// (Materialize doesn't understand foreign key constraints yet)
%2 =
| Get jamie.public.posts (u5)
| Filter !(isnull(#1))

// Join %1 and %2 on users.id = posts.user_id
// Group by users.id and count distinct posts.content
%3 =
| Join %1 %2 (= #0 #2)
| | implementation = Differential %2 %1.(#0)
| | demand = (#0, #3)
| Filter !(isnull(#0))
| Reduce group=(#0)
| | agg count(distinct #3)

// Request an index on users.id
// (Materialize doesn't understand unique keys yet, so doesn't realize this index is redundant)
%4 =
| Get jamie.public.users (u3)
| ArrangeBy (#0)

// Find values of users.id for which there are no posts and assign count 0
%5 =
| Get %3
| Negate
| Project (#0)

%6 =
| Union %5 %0
| Map 0

// Union the zero counts and the non-zero counts
%7 =
| Union %3 %6

// Join the results against users to recover row counts that were erased by the group-by above
// (Materialize doesn't understand unique keys yet, so doesn't realize this join is redundant)
%8 =
| Join %4 %7 (= #0 #2)
| | implementation = Differential %7 %4.(#0)
| | demand = (#0, #3)
| Project (#0, #3)

]]>
<![CDATA[Lower Data Freshness Costs for Teams | Materialize]]> https://materialize.com/blog/decouple-cost-and-freshness https://materialize.com/blog/decouple-cost-and-freshness Tue, 29 Aug 2023 00:00:00 GMT <![CDATA[

Materialize has a subtly different cost model that is a huge advantage for operational workloads that need fresh data.

]]>
<![CDATA[Strategies for Reducing Data Warehouse Costs: Part 3]]> https://materialize.com/blog/decrease-data-warehouse-bill https://materialize.com/blog/decrease-data-warehouse-bill Tue, 16 Apr 2024 00:00:00 GMT <![CDATA[

Decrease your data warehouse costs by sinking precomputed results and leveraging real-time analytics.

]]>
<![CDATA[Delta Joins and Late Materialization]]> https://materialize.com/blog/delta-joins https://materialize.com/blog/delta-joins Wed, 18 Jan 2023 00:00:00 GMT <![CDATA[

Understand how to optimize joins with indexes and late materialization.

]]>
<![CDATA[Building Differential Dataflow from Scratch]]> https://materialize.com/blog/differential-from-scratch https://materialize.com/blog/differential-from-scratch Thu, 09 Feb 2023 00:00:00 GMT <![CDATA[

Let's build (in Python) the Differential Dataflow framework at the heart of Materialize, and explain what it's doing along the way.

]]>
<![CDATA[

[((1, 1), 1), ((2, 4), 1), ((3, 9), 1), ((4, 16), 1), ((5, 25), 1)]

# Define a function that produces, for each input record, the set
# {record * 2^0, record * 2^1, record * 2^2, ..., record * 2^n} s.t.
# the produced outputs are <= 50.
def geometric_series(collection):
    return (
        collection.map(lambda data: data * 2)
        .concat(collection)
        .filter(lambda data: data <= 50)
        .map(lambda data: (data, ()))
        .distinct()
        .map(lambda data: data[0])
        .consolidate()
    )

# Iterate over the input, print outputs to stdout, and connect a reader to
# the output so that we can track progress.
output = input_a.iterate(geometric_series).debug("iterate").connect_reader()
graph = graph_builder.finalize()

# Keep doing work until the output advances to version 1.
while output.probe_frontier_less_than(Version(1)):
    graph.step()

output = input_a.iterate(example).connect_reader()
graph = graph_builder.finalize()

input_a_writer.send_data(Version(0), Collection([(1, 1)]))
input_a_writer.send_frontier(Antichain([Version(1)]))

while output.probe_frontier_less_than(Antichain([Version(1)])):
    graph.step()

]]>
<![CDATA[Doing business with recursive SQL]]> https://materialize.com/blog/doing-business-with-recursive-sql https://materialize.com/blog/doing-business-with-recursive-sql Mon, 12 Feb 2024 00:00:00 GMT <![CDATA[

Learn how recursive SQL provides an elegant solution for a fundamental use case in economics: stable matching.

]]>
<![CDATA[Eventual Consistency isn't for Streaming]]> https://materialize.com/blog/eventual-consistency-isnt-for-streaming https://materialize.com/blog/eventual-consistency-isnt-for-streaming Tue, 14 Jul 2020 00:00:00 GMT <![CDATA[

Understand why eventual consistency isn't suitable for streaming systems & the systematic errors it can cause with Materialize's insights.

]]>
<![CDATA[

-- select out surprisingly large values
select data.key, data.value
from data, stats_by_key
where data.key = stats_by_key.key
  and data.value > average + 3 * deviation

// Delayed map from values back to their keys.
let input2 = data.delay(|t| t + 1)
                 .map(|(key,val)| (val,key));

// Observe any results
input2.semijoin(&input)
      .inspect(|x| println!("KEY: {:?}", x));

-- select out surprisingly large values
select data.key, data.value
from data, stats_by_key
where data.key = stats_by_key.key
  and data.value > average + 3 * deviation

// These collections should always be empty.
let errors_a = histogram1a.concat(histogram2a.negate());
let errors_b = histogram1b.concat(histogram3b.negate());
let errors_c = histogram2c.concat(histogram3c.negate());

]]>
<![CDATA[Sync your data into Materialize with Fivetran]]> https://materialize.com/blog/fivetran-and-materialize https://materialize.com/blog/fivetran-and-materialize Mon, 22 Jul 2024 00:00:00 GMT <![CDATA[

A breakdown of how we built the Materialize Fivetran Destination with Fivetran's Partner SDK, and how this unlocks new workflows in Materialize.

]]>
<![CDATA[Real-Time Fraud Detection: Analytical vs. Operational Data Warehouses]]> https://materialize.com/blog/fraud-detection-latency-accuracy https://materialize.com/blog/fraud-detection-latency-accuracy Thu, 07 Mar 2024 00:00:00 GMT <![CDATA[

In this blog, we’ll explain the different roles of analytical and operational data warehouses in building real-time fraud detection systems.

]]>
<![CDATA[Freshness and Operational Autonomy]]> https://materialize.com/blog/freshness https://materialize.com/blog/freshness Thu, 12 Oct 2023 00:00:00 GMT <![CDATA[

At the heart of freshness in Materialize is autonomous proactive work, done in response to the arrival of data rather than waiting for a user command.

]]>
<![CDATA[Generalizing linear operators in differential dataflow]]> https://materialize.com/blog/generalizing-linear-operators https://materialize.com/blog/generalizing-linear-operators Thu, 29 Apr 2021 00:00:00 GMT <![CDATA[

Differential dataflow has simple linear operators (map, filter, flat_map) as well as more complex ones (explode and temporal filter operators). But, with some thinking, we can generalize them all to a restricted form of join.

]]>
<![CDATA[Indexes: A Silent Frenemy]]> https://materialize.com/blog/indexes-a-silent-frenemy https://materialize.com/blog/indexes-a-silent-frenemy Wed, 27 Jul 2022 00:00:00 GMT <![CDATA[

Insights on how indexes impact scaling in databases & their evolution in streaming-first data warehouses.

]]>
<![CDATA[

-- 15 million rows
INSERT INTO contacts
SELECT
    'Kelly' AS name,
    generate_series(650000000, 665500000) AS phone,
    1 AS prefix;

CREATE INDEX contacts_name_idx ON contacts (name);
CREATE INDEX contacts_phone_idx ON contacts (phone);

ANALYZE contacts;

                          QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on contacts  (cost=0.00..277533.14 rows=15499931 width=14)
   Filter: (name = 'Kelly'::text)

                                 QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan using contacts_phone_idx on contacts  (cost=0.43..8.45 rows=1 width=14)
   Index Cond: (phone = 2)

 count |          name          |          dataflow_name
-------+------------------------+----------------------------------
     3 | ArrangeBy[[Column(1)]] | Dataflow: 1.3.contacts_phone_idx

]]>
<![CDATA[Introducing: dbt + Materialize]]> https://materialize.com/blog/introducing-dbt-materialize https://materialize.com/blog/introducing-dbt-materialize Mon, 01 Mar 2021 00:00:00 GMT <![CDATA[

Efficient SQL data transformations & real-time analytics with dbt + Materialize: a powerful operational data warehouse combo.

]]>
<![CDATA[Introducing: Tailscale + Materialize]]> https://materialize.com/blog/introducing-tailscale-materialize https://materialize.com/blog/introducing-tailscale-materialize Wed, 19 Jan 2022 00:00:00 GMT <![CDATA[

Materialize Cloud integrates with Tailscale, offering secure & easy connection of clusters to private networks using WireGuard protocol.

]]>
<![CDATA[Introducing Materialize: the Streaming Data Warehouse]]> https://materialize.com/blog/introduction https://materialize.com/blog/introduction Tue, 18 Feb 2020 00:00:00 GMT <![CDATA[

Materialize offers a streaming data warehouse for real-time analytics & interoperability with millisecond latency, revolutionizing data handling.

]]>
<![CDATA[Incremental View Maintenance Replicas: Improve Database Stability and Accelerate Workloads]]> https://materialize.com/blog/ivm-database-replica https://materialize.com/blog/ivm-database-replica Wed, 14 Aug 2024 00:00:00 GMT <![CDATA[

IVMRs can deliver 1000x performance for read-heavy workloads, without losing freshness, and do so at a fraction of the price of a traditional replica.

]]>
<![CDATA[Join Kafka with a Database using Debezium and Materialize]]> https://materialize.com/blog/join-kafka-with-database-debezium-materialize https://materialize.com/blog/join-kafka-with-database-debezium-materialize Tue, 27 Apr 2021 00:00:00 GMT <![CDATA[

Debezium and Materialize can be used as powerful tools for joining high-volume streams of data from Kafka and tables from databases.

]]>
<![CDATA[

CREATE SOURCE users
FROM KAFKA BROKER 'kafka:9092' TOPIC 'mysql.shop.users'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081'
ENVELOPE DEBEZIUM;
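
Once the source exists, the payoff is a continuously maintained join between the Debezium-fed table and other streams. A sketch (the purchases stream and both tables' columns are assumed for illustration):

CREATE MATERIALIZED VIEW purchases_with_users AS
SELECT u.email, p.*
FROM purchases p
JOIN users u ON p.user_id = u.id;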

]]>
<![CDATA[Joins in Materialize]]> https://materialize.com/blog/joins-in-materialize https://materialize.com/blog/joins-in-materialize Mon, 14 Dec 2020 00:00:00 GMT <![CDATA[

Comprehensive guide to implementing joins in Materialize, covering binary to delta joins for efficient streaming systems.

]]>
<![CDATA[

Time: 12.927 ms
materialize=>

]]>
<![CDATA[Kafka is not a Database]]> https://materialize.com/blog/kafka-is-not-a-database https://materialize.com/blog/kafka-is-not-a-database Tue, 08 Dec 2020 00:00:00 GMT <![CDATA[

In principle, it is possible to use Kafka as a database. But in doing so you will confront every hard problem that database management systems have faced for decades.

]]>
<![CDATA[The Problem with Lying is Keeping Track of All the Lies]]> https://materialize.com/blog/keeping-track-lies https://materialize.com/blog/keeping-track-lies Wed, 05 Jun 2024 00:00:00 GMT <![CDATA[

Or why clear consistency guarantees are how to stay sane when programming distributed systems.

]]>
<![CDATA[Lateral Joins and Demand-Driven Queries]]> https://materialize.com/blog/lateral-joins-and-demand-driven-queries https://materialize.com/blog/lateral-joins-and-demand-driven-queries Tue, 18 Aug 2020 00:00:00 GMT <![CDATA[

Comprehensive guide to using Materialize's LATERAL join for efficient query patterns in incremental view maintenance engines.

]]>
<![CDATA[

INSERT INTO cities VALUES
    ('Los_Angeles', 'CA', 3979576),
    ('Phoenix', 'AZ', 1680992),
    ('Houston', 'TX', 2320268),
    ('San_Diego', 'CA', 1423851),
    ('San_Francisco', 'CA', 881549),
    ('New_York', 'NY', 8336817),
    ('Dallas', 'TX', 1343573),
    ('San_Antonio', 'TX', 1547253),
    ('San_Jose', 'CA', 1021795),
    ('Chicago', 'IL', 2695598),
    ('Austin', 'TX', 978908);

-- same query as above, but starting from queries.
-- also, we materialize a view to build a dataflow.
CREATE MATERIALIZED VIEW top_3s AS
SELECT state, name FROM
    -- for each distinct state we are asked about ...
    (SELECT DISTINCT state FROM queries) states,
    -- ... extract the top 3 cities by population.
    LATERAL (
        SELECT name, pop
        FROM cities
        WHERE state = states.state
        ORDER BY pop DESC
        LIMIT 3
    );

materialize=>

materialize=>

%1 =
| Get materialize.public.cities (u8544)

%2 =
| Join %0 %1 (= #0 #2)
| | implementation = Differential %1 %0.(#0)
| | demand = (#0, #1, #3)
| TopK group=(#0) order=(#3 desc) limit=3 offset=0
| Project (#0, #1)

%1 =
| Get materialize.public.queries (u8548)
| Distinct group=(#0)
| ArrangeBy (#0)

%2 =
| Get materialize.public.cities (u8544)

%3 =
| Join %1 %2 (= #0 #2)
| | implementation = Differential %2 %1.(#0)
| | demand = (#0, #1, #3)
| TopK group=(#0) order=(#3 desc) limit=3 offset=0

%4 =
| Join %0 %3 (= #0 #2)
| | implementation = Differential %3 %0.(#0)
| | demand = (#0, #1, #3)
| Project (#1, #0, #3)

]]>
<![CDATA[Let’s talk about Data Apps]]> https://materialize.com/blog/lets-talk-about-data-apps https://materialize.com/blog/lets-talk-about-data-apps Thu, 09 Jun 2022 00:00:00 GMT <![CDATA[

What is a Data Application? How do they help our customers? What new challenges do we face when building Data Apps? Here's our perspective.

]]>
<![CDATA[Understanding Differential Dataflow]]> https://materialize.com/blog/life-in-differential-dataflow https://materialize.com/blog/life-in-differential-dataflow Mon, 11 Jan 2021 00:00:00 GMT <![CDATA[

How to write algorithms in differential dataflow, using Conway's Game of Life as an example.

]]>
<![CDATA[

fn intersection(first: &[i32], second: &[i32]) -> Vec<i32> {
    let mut output = Vec::new();

    let first_set: HashSet<_> = first.iter().cloned().collect();
    let second_set: HashSet<_> = second.iter().cloned().collect();

for element in first_set.iter() {
    if second_set.contains(element) {
        output.push(*element);
    }
}

output

}

// Send some sample data to our dataflow
for i in 0..10 {
    // Advance time to i
    first.advance_to(i);
    second.advance_to(i);

    for x in i..(i + 10) {
        first.insert(x);
        second.insert(x + 5);
    }
}

})

multiplicity: 1

(x, str.to_string())

});
let output = input.concat(&successors).distinct();

    let result = initial.iterate(|input| {
        let successors = input.map(|(x, _)| x + 1).map(|x| {
            let str = if x % 3 == 0 && x % 5 == 0 {
                "FizzBuzz"
            } else if x % 5 == 0 {
                "Buzz"
            } else if x % 3 == 0 {
                "Fizz"
            } else {
                ""
            };

            (x, str.to_string())
        });
        let output = input.concat(&successors).distinct();
        output.filter(|(x, _)| *x <= 100)
    });
    result
        .inspect(|(x, time, m)| println!("x: {:?} time: {:?} multiplicity: {}", x, time, m));
});

})

let live_with_three_neighbors = maybe_live_cells
    .filter(|(_, count)| *count == 3)
    .map(|(cell, _)| cell);
let live_with_two_neighbors = maybe_live_cells
    .filter(|(_, count)| *count == 2)
    .semijoin(&live)
    .map(|(cell, _)| cell);

let live_next_round = live_with_two_neighbors
    .concat(&live_with_three_neighbors)
    .distinct();

live_next_round

})

]]>
<![CDATA[Live Maintained Views on Boston Transit to Run at Home]]> https://materialize.com/blog/live-maintained-views-on-boston-transit-to-run-at-home https://materialize.com/blog/live-maintained-views-on-boston-transit-to-run-at-home Wed, 02 Dec 2020 00:00:00 GMT <![CDATA[

Real-time apps for Boston Transit with live data are easy to set up using Materialize; see two examples you can run at home.

]]>
<![CDATA[

CREATE MATERIALIZED SOURCE mbta_stops FROM FILE '/workdir/workspace/MBTA_GTFS/stops.txt' FORMAT CSV WITH HEADER;

CREATE MATERIALIZED SOURCE mbta_routes FROM FILE '/workdir/workspace/MBTA_GTFS/routes.txt' FORMAT CSV WITH HEADER;

SELECT * FROM south_from_kendall ORDER BY departure_time;

CREATE MATERIALIZED VIEW parsed_all_trip AS
SELECT
    trip_id,
    payload->'attributes'->>'bikes_allowed' AS bikes_allowed,
    CAST(CAST(payload->'attributes'->>'direction_id' AS DECIMAL(5,1)) AS INT) AS direction_id,
    payload->'attributes'->>'headsign' AS headsign,
    payload->'attributes'->>'wheelchair_accessible' AS wheelchair_accessible,
    payload->'relationships'->'route'->'data'->>'id' AS route_id,
    payload->'relationships'->'route_pattern'->'data'->>'id' AS route_pattern_id,
    payload->'relationships'->'service'->'data'->>'id' AS service_id,
    payload->'relationships'->'shape'->'data'->>'id' AS shape_id
FROM (SELECT key0 AS trip_id, CAST("text" AS jsonb) AS payload FROM all_trip);

CREATE MATERIALIZED VIEW parsed_all_vehicles AS
SELECT
    vehicle_id,
    payload->'attributes'->>'current_status' AS status,
    CAST(CAST(payload->'attributes'->>'direction_id' AS DECIMAL(5,1)) AS INT) AS direction_id,
    payload->'relationships'->'route'->'data'->>'id' AS route_id,
    payload->'relationships'->'stop'->'data'->>'id' AS stop_id,
    payload->'relationships'->'trip'->'data'->>'id' AS trip_id
FROM (SELECT key0 AS vehicle_id, CAST("text" AS jsonb) AS payload FROM all_vehicles);

CREATE MATERIALIZED VIEW current_time_v AS
SELECT max(to_timestamp(CAST(text AS int))) AS now
FROM current_time;

CREATE INDEX countdown_stop_dir_rt ON countdown(stop_name, direction, route_name);

CREATE INDEX one_leg_stops ON one_leg_travel_time(origin, destination);

SELECT departure_time, arrival_time, headsign FROM one_leg_travel_time WHERE origin = 'Kendall/MIT' and destination = 'South Station' ORDER BY arrival_time;

]]>
<![CDATA[Loan Underwriting Process: The Move to Big Data & SQL]]> https://materialize.com/blog/loan-underwriting-big-data-sql https://materialize.com/blog/loan-underwriting-big-data-sql Tue, 07 May 2024 00:00:00 GMT <![CDATA[

In this blog, we'll examine the loan underwriting process, including the current landscape, credit modeling, and the move toward big data and SQL.

]]>
<![CDATA[Loan Underwriting: Real-Time Data Architectures]]> https://materialize.com/blog/loan-underwriting-real-time-data-architectures https://materialize.com/blog/loan-underwriting-real-time-data-architectures Wed, 08 May 2024 00:00:00 GMT <![CDATA[

This blog will provide an overview of the different data architectures lenders use to power real-time loan underwriting.

]]>
<![CDATA[Strategies for Reducing Data Warehouse Costs: Part 2]]> https://materialize.com/blog/lower-data-warehouse-costs https://materialize.com/blog/lower-data-warehouse-costs Tue, 16 Apr 2024 00:00:00 GMT <![CDATA[

Here's how to save money on your data warehouse bill with normalized data models and data mesh principles.

]]>
<![CDATA[Maintaining Joins using Few Resources]]> https://materialize.com/blog/maintaining-joins-using-few-resources https://materialize.com/blog/maintaining-joins-using-few-resources Wed, 02 Jun 2021 00:00:00 GMT <![CDATA[

Efficiently maintain joins with shared arrangements & reduce resource usage with Materialize's innovative approach.

]]>
<![CDATA[The Making of Self-Managed Materialize: Flexible Deployments Explained]]> https://materialize.com/blog/making-self-managed-materialize-flexible-deployments https://materialize.com/blog/making-self-managed-materialize-flexible-deployments Tue, 17 Dec 2024 00:00:00 GMT <![CDATA[

Discover how Materialize engineered its self-managed product to support flexible deployments, improve architecture, and meet diverse customer needs, all while refining its managed cloud service.

]]>
<![CDATA[Managing memory with differential dataflow]]> https://materialize.com/blog/managing-memory-with-differential-dataflow https://materialize.com/blog/managing-memory-with-differential-dataflow Tue, 05 May 2020 00:00:00 GMT <![CDATA[

Insights on how Differential Dataflow manages & limits memory use for processing unbounded data streams, ensuring efficiency.

]]>
<![CDATA[
// Build a dataflow to present most recent values for keys.
worker.dataflow(|scope| {

    use differential_dataflow::operators::reduce::Reduce;

    // Determine the most recent inputs for each key.
    input
        .to_collection(scope)
        .reduce(|_key, input, output| {
            // Emit the last value with a count of 1
            let max = input.last().unwrap();
            output.push((*max.0, 1));
        })
        .probe_with(&mut probe);
});

loop {
    // Refresh our view of elapsed time.
    let elapsed = worker.timer().elapsed();

    // Refresh the maximum gap between elapsed and completed times.
    // Important: this varies based on rate; low rate ups the latency.
    let completed = probe.with_frontier(|frontier| frontier[0]);
    if max_latency < elapsed - completed {
        max_latency = elapsed - completed;
    }

    // Report how large a gap we just experienced.
    if input.time().as_secs() != elapsed.as_secs() {
        println!("{:?}\tmax latency: {:?}", elapsed, max_latency);
    }

    // Insert any newly released requests.
    while pause * req_counter < elapsed {
        input.advance_to(pause * req_counter);
        input.insert((0, pause * req_counter));
        req_counter += worker.peers() as u32;
    }
    input.advance_to(elapsed);
    input.flush();

    // Take just one step! (perhaps we should take more)
    worker.step();
}

// Build a dataflow to present most recent values for keys.
worker.dataflow(|scope| {

    use differential_dataflow::operators::reduce::Reduce;

    // Give input its own name to re-use later.
    let input = input.to_collection(scope);

    // Determine the most recent inputs for each key.
    let results = input
        .reduce(|_key, input, output| {
            // Emit the last value with a count of 1
            let max = input.last().unwrap();
            output.push((*max.0, 1));
        })
        .probe_with(&amp;mut probe);

    // Retract any input not present in the output.
    let retractions = input.concat(&results.negate());
});

    use differential_dataflow::operators::reduce::Reduce;
    use differential_dataflow::operators::iterate::Variable;

    // Prepare some delayed feedback from the output.
    // Explanation of `delay` deferred for the moment.
    let delay = Duration::from_nanos(delay_ns);
    let retractions = Variable::new(scope, delay);

    // Give input its own name to re-use later.
    let input = input.to_collection(scope);

    // Determine the results minus any retractions.
    let results = input
        .concat(&retractions.negate())
        .reduce(|_key, input, output| {
            let max = input.last().unwrap();
            output.push((*max.0, max.1));
        })
        .probe_with(&amp;mut probe);

    // Retract any input that is not an output.
    retractions.set(&input.concat(&results.negate()));

});

]]>
<![CDATA[Managing streaming analytics pipelines with dbt]]> https://materialize.com/blog/managing-streaming-analytics-pipelines-with-dbt https://materialize.com/blog/managing-streaming-analytics-pipelines-with-dbt Wed, 15 Jun 2022 00:00:00 GMT <![CDATA[

Using dbt to manage and document a streaming analytics workflow from a message broker to Metabase.

]]>
<![CDATA[Materialize and Memory]]> https://materialize.com/blog/materialize-and-memory https://materialize.com/blog/materialize-and-memory Thu, 16 May 2024 00:00:00 GMT <![CDATA[

We reduced memory requirements for many users by nearly 2x, resulting in significant cost-savings.

]]>
<![CDATA[The Software Architecture of Materialize]]> https://materialize.com/blog/materialize-architecture https://materialize.com/blog/materialize-architecture Thu, 23 Feb 2023 00:00:00 GMT <![CDATA[

Materialize aims to be usable by anyone who knows SQL, but for those interested in going deeper and understanding the architecture powering Materialize, this post is for you!

]]>
<![CDATA[Celebrating our newest partnership at Data Cloud Summit]]> https://materialize.com/blog/materialize-at-snowflake-summit https://materialize.com/blog/materialize-at-snowflake-summit Thu, 30 May 2024 00:00:00 GMT <![CDATA[

Materialize is now partners with Snowflake. Celebrate with us next week at Snowflake Data Cloud Summit.

]]>
<![CDATA[Materialize: More Cost-Effective than Aurora Read Replicas]]> https://materialize.com/blog/materialize-aurora-read-replica-cost https://materialize.com/blog/materialize-aurora-read-replica-cost Mon, 09 Sep 2024 00:00:00 GMT <![CDATA[

Materialize costs 1/20th what Aurora PostgreSQL read replicas cost, when you have non-trivial business logic.

]]>
<![CDATA[Materialize Beta: The Details]]> https://materialize.com/blog/materialize-beta-the-details https://materialize.com/blog/materialize-beta-the-details Thu, 20 Feb 2020 00:00:00 GMT <![CDATA[

Materialize Beta offers insights on a cloud data warehouse with real-time streaming capabilities for immediate action on current data.

]]>
<![CDATA[Materialize Cloud Enters Open Beta]]> https://materialize.com/blog/materialize-cloud-open-beta https://materialize.com/blog/materialize-cloud-open-beta Mon, 13 Sep 2021 00:00:00 GMT <![CDATA[

Materialize Cloud, now in open beta, offers real-time data warehousing for immediate insights & action on live data.

]]>
<![CDATA[Supporting Open Source: Materialize’s Community Sponsorship Program]]> https://materialize.com/blog/materialize-community-sponsorship-program https://materialize.com/blog/materialize-community-sponsorship-program Wed, 25 Sep 2024 00:00:00 GMT <![CDATA[

Read about how we give back to the open source community through our Community Sponsorship Program.

]]>
<![CDATA[Announcing the Materialize Integration with Cube]]> https://materialize.com/blog/materialize-cube-integration https://materialize.com/blog/materialize-cube-integration Fri, 13 May 2022 00:00:00 GMT <![CDATA[

Connect headless BI tool Cube.js to the read-side of Materialize to get REST/GraphQL APIs, authentication, metrics modelling, and more out of the box.

]]>
<![CDATA[

services:
  materialize:
    image: materialize/materialized:v0.26.1
    ports:
      - 6875:6875

  seed:
    image: jbergknoff/postgresql-client
    volumes:
      - .:/seed
    entrypoint: ['sh', 'seed/seed.sh']
    depends_on:
      - materialize

  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
    environment:
      - CUBEJS_DEV_MODE=true
      - CUBEJS_DB_TYPE=materialize
      - CUBEJS_DB_HOST=materialize
      - CUBEJS_DB_PORT=6875
      - CUBEJS_DB_NAME=materialize
      - CUBEJS_DB_USER=materialize
      - CUBEJS_API_SECRET=SECRET
    volumes:
      - .:/cube/conf
    depends_on:
      - seed

cat > seed.sql << EOL

CREATE SOURCE hn_raw
FROM PUBNUB
SUBSCRIBE KEY 'sub-c-c00db4fc-a1e7-11e6-8bfd-0619f8945a4f'
CHANNEL 'hacker-news';

CREATE VIEW hn AS
SELECT
    (item::jsonb)->>'link' AS link,
    (item::jsonb)->>'comments' AS comments,
    (item::jsonb)->>'title' AS title,
    ((item::jsonb)->>'rank')::int AS rank
FROM (
    SELECT jsonb_array_elements(text::jsonb) AS item FROM hn_raw
);

CREATE MATERIALIZED VIEW hn_top AS
SELECT link, comments, title, MIN(rank) AS rank
FROM hn
GROUP BY 1, 2, 3;

EOL

psql -U materialize -h materialize -p 6875 materialize -f ./seed.sql

refreshKey: {
  every: '1 second'
},

measures: {
  count: {
    type: `count`
  },

  countTop3: {
    type: `count`,
    filters: [
      {
        sql: `${rank} <= 3`
      }
    ]
  },

  bestRank: {
    sql: `rank`,
    type: `min`
  }
},

dimensions: {
  link: {
    sql: `link`,
    type: `string`
  },

  comments: {
    sql: `comments`,
    type: `string`
  },

  title: {
    sql: `title`,
    type: `string`
  },

  rank: {
    sql: `rank`,
    type: `number`
  }
},

segments: {
  show: {
    sql: `${title} LIKE 'Show HN:%'`
  }
}
});

]]>
<![CDATA[Materialize & Datalot: Real-time Application Development]]> https://materialize.com/blog/materialize-datalot-real-time-application-development https://materialize.com/blog/materialize-datalot-real-time-application-development Thu, 05 Aug 2021 00:00:00 GMT <![CDATA[

Materialize & Datalot collaborate on cutting-edge real-time application development, leveraging streaming data for immediate insights & action.

]]>
<![CDATA[How to Use the Materialize Emulator]]> https://materialize.com/blog/materialize-emulator https://materialize.com/blog/materialize-emulator Thu, 10 Oct 2024 00:00:00 GMT <![CDATA[

Here's a step-by-step walkthrough of how to use the Materialize Emulator.

]]>
<![CDATA[

Issue a SQL query to get started. Need help?
 * View documentation: https://materialize.com/s/docs
 * Join our Slack community: https://materialize.com/s/chat

psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 9.5.0)
Type "help" for help.

materialize=>

postgres=# CREATE PUBLICATION mz_source FOR ALL TABLES;
CREATE PUBLICATION
postgres=# CREATE TABLE t (f1 INTEGER);
CREATE TABLE
postgres=# ALTER TABLE t REPLICA IDENTITY FULL;
ALTER TABLE
postgres=# INSERT INTO t VALUES (1), (2), (3);
INSERT 0 3

materialize=> SELECT * FROM mv;
      sum
---------------
 1000400050002
(1 row)

Time: 40.362 ms

PREF="${PWD##*/}"

wait_for_health() {
  echo -n "waiting for container '$PREF-$1' to be healthy"
  while [ "$(docker inspect -f '{{.State.Health.Status}}' "$PREF-$1")" != "healthy" ]; do
    echo -n "."
    sleep 1
  done
  printf "\ncontainer '%s' is healthy\n" "$PREF-$1"
}

cat > docker-compose.yml <<EOF
version: '3.8'
services:
  materialized:
    image: materialize/materialized:latest
    container_name: $PREF-materialized
    environment:
      MZ_SYSTEM_PARAMETER_DEFAULT: "enable_copy_to_expr=true"
    networks:
      - network
    ports:
      - "127.0.0.1:6875:6875"
      - "127.0.0.1:6876:6876"
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:6878/api/readyz"]
      interval: 1s
      start_period: 60s

  postgres:
    image: postgres:latest
    container_name: $PREF-postgres
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_INITDB_ARGS: "-c wal_level=logical"
    networks:
      - network
    ports:
      - "127.0.0.1:5432:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-d", "db_prod"]
      interval: 1s
      start_period: 60s

  mysql:
    image: mysql:latest
    container_name: $PREF-mysql
    environment:
      MYSQL_ROOT_PASSWORD: mysql
    networks:
      - network
    ports:
      - "127.0.0.1:3306:3306"
    command:
      - "--log-bin=mysql-bin"
      - "--gtid_mode=ON"
      - "--enforce_gtid_consistency=ON"
      - "--binlog-format=row"
      - "--binlog-row-image=full"
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "--password=mysql", "--protocol=TCP"]
      interval: 1s
      start_period: 60s

  redpanda:
    image: vectorized/redpanda:latest
    container_name: $PREF-redpanda
    networks:
      - network
    ports:
      - "127.0.0.1:9092:9092"
      - "127.0.0.1:8081:8081"
    command:
      - "redpanda"
      - "start"
      - "--overprovisioned"
      - "--smp=1"
      - "--memory=1G"
      - "--reserve-memory=0M"
      - "--node-id=0"
      - "--check=false"
      - "--set"
      - "redpanda.enable_transactions=true"
      - "--set"
      - "redpanda.enable_idempotence=true"
      - "--set"
      - "--advertise-kafka-addr=redpanda:9092"
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:9644/v1/status/ready"]
      interval: 1s
      start_period: 60s

  minio:
    image: minio/minio:latest
    container_name: $PREF-minio
    environment:
      MINIO_STORAGE_CLASS_STANDARD: "EC:0"
    networks:
      - network
    ports:
      - "127.0.0.1:9000:9000"
      - "127.0.0.1:9001:9001"
    entrypoint: ["sh", "-c"]
    command: ["mkdir -p /data/$PREF && minio server /data --console-address :9001"]
    healthcheck:
      test: ["CMD", "curl", "-f", "localhost:9000/minio/health/live"]
      interval: 1s
      start_period: 60s

networks:
  network:
    driver: bridge
EOF

docker compose down || true
docker compose up -d

wait_for_health postgres
psql postgres://postgres:postgres@127.0.0.1:5432/postgres <<EOF
CREATE PUBLICATION mz_source FOR ALL TABLES;
CREATE TABLE pg_table (f1 INTEGER);
ALTER TABLE pg_table REPLICA IDENTITY FULL;
INSERT INTO pg_table VALUES (1), (2), (3);
EOF

wait_for_health mysql
mysql --protocol=tcp --user=root --password=mysql <<EOF
CREATE DATABASE public;
USE public;
CREATE TABLE mysql_table (f1 INTEGER);
INSERT INTO mysql_table VALUES (1), (2), (3);
EOF

wait_for_health redpanda
docker compose exec -T redpanda rpk topic create redpanda_table
docker compose exec -T redpanda rpk topic produce redpanda_table <<EOF
{"f1": 1}
{"f1": 2}
{"f1": 3}
EOF

wait_for_health materialized
psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
-- Create a Postgres source
CREATE SECRET pgpass AS 'postgres';
CREATE CONNECTION pg TO POSTGRES (
    HOST '$PREF-postgres',
    DATABASE postgres,
    USER postgres,
    PASSWORD SECRET pgpass
);
CREATE SOURCE mz_source
    FROM POSTGRES CONNECTION pg (PUBLICATION 'mz_source')
    FOR SCHEMAS (public);

-- Create a MySQL source
CREATE SECRET mysqlpass AS 'mysql';
CREATE CONNECTION mysql TO MYSQL (
    HOST '$PREF-mysql',
    USER root,
    PASSWORD SECRET mysqlpass
);
CREATE SOURCE mysql_source
    FROM MYSQL CONNECTION mysql
    FOR ALL TABLES;

-- Create a Webhook source
CREATE SOURCE webhook_table FROM WEBHOOK BODY FORMAT TEXT;

-- Create a Redpanda (Kafka-compatible) source
CREATE CONNECTION kafka_conn TO KAFKA (
    BROKER '$PREF-redpanda:9092',
    SECURITY PROTOCOL PLAINTEXT
);
CREATE CONNECTION csr_conn TO CONFLUENT SCHEMA REGISTRY (
    URL 'http://$PREF-redpanda:8081/'
);
CREATE SOURCE redpanda_table
    FROM KAFKA CONNECTION kafka_conn (TOPIC 'redpanda_table')
    FORMAT JSON;

-- Simple materialized view, incrementally updated, with data from all sources
CREATE MATERIALIZED VIEW mv AS
SELECT sum(
    pg_table.f1 +
    mysql_table.f1 +
    webhook_table.body::int +
    (redpanda_table.data->'f1')::int
)
FROM pg_table
JOIN mysql_table ON TRUE
JOIN webhook_table ON TRUE
JOIN redpanda_table ON TRUE;

-- Create a sink to Redpanda so that the topic will always be up to date
CREATE SINK sink
    FROM mv
    INTO KAFKA CONNECTION kafka_conn (TOPIC 'mv')
    FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
    ENVELOPE DEBEZIUM;

-- One-off export of our materialized view to S3-compatible MinIO
CREATE SECRET miniopass AS 'minioadmin';
CREATE CONNECTION minio TO AWS (
    ENDPOINT 'http://minio:9000',
    REGION 'minio',
    ACCESS KEY ID 'minioadmin',
    SECRET ACCESS KEY SECRET miniopass
);
COPY (SELECT * FROM mv) TO 's3://$PREF/mv'
WITH (AWS CONNECTION = minio, FORMAT = 'csv');

-- Allow HTTP API read requests without a token
CREATE ROLE anonymous_http_user;
GRANT SELECT ON TABLE mv TO anonymous_http_user;
EOF

Write additional data into Webhook source

curl -d "1" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table curl -d "2" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table curl -d "3" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table

Read latest data from Redpanda

docker compose exec -T redpanda rpk topic consume mv --num 1

Verify that the CSV exists on S3-compatible MinIO

docker compose exec -T minio mc ls data/mzemulator/mv

Use the Postgres wire-compatible interface

psql postgres://materialize@127.0.0.1:6875/materialize <<EOF
SELECT * FROM pg_table;
SELECT * FROM mysql_table;
SELECT * FROM webhook_table;
SELECT * FROM redpanda_table;
SELECT * FROM mv;
EOF

Use HTTP API

curl -s -X POST -H "Content-Type: application/json" \
  --data '{"queries": [{"query": "SELECT * FROM mv"}]}' \
  http://localhost:6876/api/sql | jq -r ".results[0].rows[0][0]"

]]>
<![CDATA[Materialize Secures $60M Series C Funding | Materialize]]> https://materialize.com/blog/materialize-raises-a-series-c https://materialize.com/blog/materialize-raises-a-series-c Thu, 30 Sep 2021 00:00:00 GMT <![CDATA[

Materialize raises another round of funding to help build a cloud-native streaming data warehouse.

]]>
<![CDATA[Materialize Raises a Series B]]> https://materialize.com/blog/materialize-series-b https://materialize.com/blog/materialize-series-b Mon, 30 Nov 2020 00:00:00 GMT <![CDATA[

Materialize secures Series B funding to enhance its Operational Data Warehouse with real-time streaming capabilities for immediate data action.

]]>
<![CDATA[Materialize's unbundled cloud architecture]]> https://materialize.com/blog/materialize-unbundled https://materialize.com/blog/materialize-unbundled Fri, 06 May 2022 00:00:00 GMT <![CDATA[

Materialize's new cloud architecture enhances scalability & performance by breaking the materialized binary into separate services.

]]>
<![CDATA[Materialize under the Hood]]> https://materialize.com/blog/materialize-under-the-hood https://materialize.com/blog/materialize-under-the-hood Wed, 30 Sep 2020 00:00:00 GMT <![CDATA[

An in-depth look at Materialize, the Operational Data Warehouse with streaming capabilities for real-time data action.

]]>
<![CDATA[Migrating from dbt-postgres to dbt-materialize]]> https://materialize.com/blog/migrating-postgres-materialize https://materialize.com/blog/migrating-postgres-materialize Wed, 02 Oct 2024 00:00:00 GMT <![CDATA[

In this guide, we’ll show you how to migrate your existing PostgreSQL dbt project to Materialize with minimal SQL tweaks.

]]>
<![CDATA[

SELECT customer_id, SUM(order_total) AS total_revenue
FROM orders
GROUP BY customer_id;

WITH latest_orders AS (
    SELECT *
    FROM {{ source('public', 'orders') }}
    WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01'::timestamp) FROM {{ this }})
),

updated_customers AS (
    SELECT customer_id, SUM(order_total) AS total_revenue
    FROM latest_orders
    GROUP BY customer_id
),

existing_customers AS (
    SELECT customer_id, total_revenue
    FROM {{ this }}
    WHERE customer_id NOT IN (SELECT customer_id FROM updated_customers)
)

SELECT * FROM updated_customers
UNION ALL
SELECT * FROM existing_customers

SELECT customer_id, SUM(order_total) AS total_revenue
FROM orders
GROUP BY customer_id;

SELECT customer_id, SUM(order_value) AS total_value
FROM orders
GROUP BY customer_id
HAVING SUM(order_value) > 1000;

SELECT order_id, customer_id, order_total, order_date
FROM orders
WHERE order_date + INTERVAL '24 hours' >= mz_now();
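
To run the plain rollup on Materialize, the main change is the dbt materialization type. A sketch of the model header (hedged: the dbt-materialize adapter's materialization name has changed across versions, e.g. materializedview in early releases):

{{ config(materialized='materialized_view') }}

SELECT customer_id, SUM(order_total) AS total_revenue
FROM {{ source('public', 'orders') }}
GROUP BY customer_id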

]]>
<![CDATA[The Missing Element in Your Data Architecture]]> https://materialize.com/blog/missing-element-data-architecture https://materialize.com/blog/missing-element-data-architecture Wed, 26 Jun 2024 00:00:00 GMT <![CDATA[

Learn how replacing the legacy materialized view with a new element is transformational for your data stack.

]]>
<![CDATA[Shifting Workloads from Data Warehouses | Materialize]]> https://materialize.com/blog/moving-workloads-from-warehouses https://materialize.com/blog/moving-workloads-from-warehouses Fri, 02 Jun 2023 00:00:00 GMT <![CDATA[

A framework for understanding why and when to shift a workload from traditional cloud data warehouses to Materialize.

]]>
<![CDATA[Native MySQL Source, now in Private Preview]]> https://materialize.com/blog/mysql-source-private-preview https://materialize.com/blog/mysql-source-private-preview Fri, 15 Mar 2024 00:00:00 GMT <![CDATA[

Access the freshest data in MySQL to power your operational workflows

]]>
<![CDATA[Announcing our new CEO: Nate Stewart]]> https://materialize.com/blog/new-ceo-nate-stewart https://materialize.com/blog/new-ceo-nate-stewart Mon, 08 Apr 2024 00:00:00 GMT <![CDATA[

Materialize welcomes new CEO Nate Stewart, who previously served on the Materialize board and comes to us from Cockroach Labs.

]]>
<![CDATA[Now Generally Available: New Cluster Sizes]]> https://materialize.com/blog/new-cluster-sizes https://materialize.com/blog/new-cluster-sizes Wed, 01 May 2024 00:00:00 GMT <![CDATA[

New names, new sizes, plus spill-to-disk capabilities

]]>
<![CDATA[Announcing the next generation of Materialize]]> https://materialize.com/blog/next-generation https://materialize.com/blog/next-generation Mon, 03 Oct 2022 00:00:00 GMT <![CDATA[

Today, we’re excited to announce a product that we feel is transformational: a persistent, scalable, cloud-native Materialize.

]]>
<![CDATA[Materialize + Novu: Real-Time Alerting Powered by a Cloud Operational Data Store]]> https://materialize.com/blog/novu-materialize-real-time-alerting https://materialize.com/blog/novu-materialize-real-time-alerting Thu, 12 Sep 2024 00:00:00 GMT <![CDATA[

In the following blog, we’ll show you how to create real-time alerts using Materialize’s integration with Novu.

]]>
<![CDATA[

INSERT INTO materialize.auction.auction_alerts VALUES
    ('expensive pizza', 90, 'Best Pizza in Town'),
    ('all art', 0, 'Custom Art');

CREATE VIEW active_alerts AS
SELECT alert_name, id AS auction_id, item_name, amount AS price
FROM (
    SELECT id, item, amount
    FROM materialize.auction.winning_bids
) p,
LATERAL (
    SELECT price_above, item_name, alert_name
    FROM materialize.auction.auction_alerts a
    WHERE a.item_name = p.item
      AND a.price_above <= p.amount
);

CREATE INDEX active_alerts_idx ON active_alerts (alert_name)
    WITH (RETAIN HISTORY FOR '1hr');
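
With an hour of history retained, a downstream notifier can follow changes to the view rather than poll it. A minimal sketch using Materialize's SUBSCRIBE through a cursor (the cursor name is ours; the Novu integration's exact consumption path may differ):

BEGIN;
DECLARE alerts CURSOR FOR SUBSCRIBE active_alerts;
FETCH ALL alerts;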

]]>
<![CDATA[Transforming Real-Time Data with Operational Data Stores: A Dynamic Pricing Use Case]]> https://materialize.com/blog/ods-ecommerce-demo https://materialize.com/blog/ods-ecommerce-demo Wed, 23 Oct 2024 00:00:00 GMT <![CDATA[

To showcase the power of an ODS, we’ve developed a demo for an e-commerce company, based on a dynamic pricing use case.

]]>
<![CDATA[
promotion_effect AS (
    SELECT
        p.product_id,
        min(pr.promotion_discount) AS promotion_discount
    FROM public.promotions AS pr
    INNER JOIN public.products AS p ON pr.product_id = p.product_id
    WHERE pr.active = TRUE
    GROUP BY p.product_id
),

popularity_score AS (
    SELECT
        s.product_id,
        rank() OVER (PARTITION BY p.category_id ORDER BY count(s.sale_id) DESC) AS popularity_rank,
        count(s.sale_id) AS sale_count
    FROM public.sales AS s
    INNER JOIN public.products AS p ON s.product_id = p.product_id
    GROUP BY s.product_id, p.category_id
),

inventory_status AS (
    SELECT
        i.product_id,
        sum(i.stock) AS total_stock,
        rank() OVER (ORDER BY sum(i.stock) DESC) AS stock_rank
    FROM public.inventory AS i
    GROUP BY i.product_id
),

high_demand_products AS (
    SELECT
        p.product_id,
        avg(s.sale_price) AS avg_sale_price,
        count(s.sale_id) AS total_sales
    FROM public.products AS p
    INNER JOIN public.sales AS s ON p.product_id = s.product_id
    GROUP BY p.product_id
    HAVING count(s.sale_id) > (
        SELECT avg(total_sales)
        FROM (SELECT count(*) AS total_sales FROM public.sales GROUP BY product_id) AS subquery
    )
),

dynamic_pricing AS (
    SELECT
        p.product_id,
        p.base_price,
        CASE
            WHEN pop.popularity_rank <= 3 THEN 1.2
            WHEN pop.popularity_rank BETWEEN 4 AND 10 THEN 1.1
            ELSE 0.9
        END AS popularity_adjustment,
        rp.avg_price,
        coalesce(1.0 - (pe.promotion_discount / 100), 1) AS promotion_discount,
        CASE
            WHEN inv.stock_rank <= 3 THEN 1.1
            WHEN inv.stock_rank BETWEEN 4 AND 10 THEN 1.05
            ELSE 1
        END AS stock_adjustment,
        CASE
            WHEN p.base_price > rp.avg_price THEN 1 + (p.base_price - rp.avg_price) / rp.avg_price
            ELSE 1 - (rp.avg_price - p.base_price) / rp.avg_price
        END AS demand_multiplier,
        hd.avg_sale_price,
        CASE
            WHEN p.product_name ILIKE '%cheap%' THEN 0.8
            ELSE 1.0
        END AS additional_discount
    FROM public.products AS p
    LEFT JOIN recent_prices AS rp ON p.product_id = rp.product_id
    LEFT JOIN promotion_effect AS pe ON p.product_id = pe.product_id
    INNER JOIN popularity_score AS pop ON p.product_id = pop.product_id
    LEFT JOIN inventory_status AS inv ON p.product_id = inv.product_id
    LEFT JOIN high_demand_products AS hd ON p.product_id = hd.product_id
)

SELECT dp.product_id, round(dp.base_price * dp.popularity_adjustment * dp.stock_adjustment * dp.demand_multiplier, 2) AS adjusted_price, round(dp.base_price * dp.popularity_adjustment * dp.stock_adjustment * dp.demand_multiplier * dp.promotion_discount * dp.additional_discount, 2) AS discounted_price FROM dynamic_pricing AS dp;

ALTER TABLE public.inventory ADD CONSTRAINT inventory_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);
ALTER TABLE public.promotions ADD CONSTRAINT promotions_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);
ALTER TABLE public.sales ADD CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

CREATE INDEX idx_products_product_name ON products (product_name);
CREATE INDEX idx_sales_product_id ON sales (product_id);
CREATE INDEX idx_sales_sale_date ON sales (sale_date);
CREATE INDEX idx_sales_product_id_sale_date ON sales (product_id, sale_date);
CREATE INDEX idx_promotions_product_id ON promotions (product_id);
CREATE INDEX idx_promotions_active ON promotions (active);
CREATE INDEX idx_promotions_product_id_active ON promotions (product_id, active);
CREATE INDEX idx_inventory_product_id ON inventory (product_id);

]]>
<![CDATA[OLTP Queries: Transfer Expensive Workloads to Materialize]]> https://materialize.com/blog/oltp-queries https://materialize.com/blog/oltp-queries Thu, 01 Aug 2024 00:00:00 GMT <![CDATA[

There are many different methods for OLTP offload, and in the following blog, we will examine the most popular options.

]]>
<![CDATA[OLTP Workloads: Offload Complex Queries From Your Operational Database]]> https://materialize.com/blog/oltp-workloads https://materialize.com/blog/oltp-workloads Tue, 23 Jul 2024 00:00:00 GMT <![CDATA[

Read the following blog to learn about OLTP vs. OLAP, problems with complex OLTP workloads, and the case for OLTP offload.

]]>
<![CDATA[View Maintenance: A New Approach to Data Processing]]> https://materialize.com/blog/olvm https://materialize.com/blog/olvm Mon, 24 Feb 2020 00:00:00 GMT <![CDATA[

Materialize's approach to data processing & view maintenance offers real-time insights for immediate action on live data.

]]>
<![CDATA[A guided tour through Materialize's product principles]]> https://materialize.com/blog/operational-attributes https://materialize.com/blog/operational-attributes Fri, 22 Sep 2023 00:00:00 GMT <![CDATA[

Take a guided tour through Materialize's three pillars of product value, and see how we think about providing value for your operational workloads.

]]>
<![CDATA[Consistency and Operational Confidence]]> https://materialize.com/blog/operational-consistency https://materialize.com/blog/operational-consistency Tue, 26 Sep 2023 00:00:00 GMT <![CDATA[

Materialize's consistency guarantees are key for confidence in data warehouses. Understand the benefits & see real-world tests in action.

]]>
<![CDATA[

-- Maintain the debits owed by each account.
CREATE MATERIALIZED VIEW debits AS
SELECT buyer, SUM(amount) AS total
FROM winning_bids
GROUP BY buyer;
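
The `balance` view below also reads from a `credits` view that the excerpt omits; a minimal sketch, mirroring `debits` (the post's exact definition may differ):

-- Sketch: maintain the credits owed to each account.
CREATE MATERIALIZED VIEW credits AS
SELECT seller, SUM(amount) AS total
FROM winning_bids
GROUP BY seller;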

-- Maintain the net balance for each account.
CREATE VIEW balance AS
SELECT
    coalesce(seller, buyer) AS id,
    coalesce(credits.total, 0) - coalesce(debits.total, 0) AS total
FROM credits
FULL OUTER JOIN debits ON (credits.seller = debits.buyer);

-- This will always equal zero.
SELECT SUM(total) FROM balance;

]]>
<![CDATA[Demonstrating Operational Data with SQL]]> https://materialize.com/blog/operational-data-sql https://materialize.com/blog/operational-data-sql Wed, 17 Jul 2024 00:00:00 GMT <![CDATA[

In this post, we'll build a recipe for a generic live data source using standard SQL primitives and some Materialize magic.

]]>
<![CDATA[

Time: 52.580 ms

Time: 19.711 ms

Time: 87428.589 ms (01:27.429)

Time: 61.283 ms

We have enough auctions that some folks will be both buyers and sellers, and for some fraction of them it's the same item at an increased price.

materialize=> select count(*) from potential_flips;
 count
-------
  9755
(1 row)

Time: 602.481 ms
materialize=> select seller, count(*) from potential_flips group by seller order by count(*) desc limit 5;
 seller | count
--------+-------
  42091 |     7
  42518 |     6
  10529 |     6
  39840 |     6
  49317 |     6
(5 rows)

Time: 678.330 ms

This is now pretty interactive, using scant resources, over enough data and through complex enough views that starting from scratch each time would be exhausting. Maintained indexes keep intermediate results up to date, and you get the same results as if re-run from scratch, just without the latency.

1716312983129 1 0

-- Supporting view to translate ids into text.
CREATE VIEW items (id, item) AS VALUES
    (0, 'Signed Memorabilia'),
    (1, 'City Bar Crawl'),
    (2, 'Best Pizza in Town'),
    (3, 'Gift Basket'),
    (4, 'Custom Art');

-- Each year-long interval of interest
CREATE VIEW years AS
SELECT *
FROM generate_series(
    '1970-01-01 00:00:00+00',
    '2099-01-01 00:00:00+00',
    '1 year') year
WHERE mz_now() BETWEEN year AND year + '1 year' + '1 day';
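
The views below union in an `empty` helper that the excerpt never defines; its only job is to be a zero-row relation of the right type. One possible sketch:

-- Hypothetical: a typed, zero-row relation for the UNION ALL branches below.
CREATE VIEW empty AS SELECT year FROM years WHERE false;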

-- Each day-long interval of interest
CREATE VIEW days AS
SELECT * FROM (
    SELECT generate_series(year, year + '1 year' - '1 day'::interval, '1 day') AS day
    FROM years
    UNION ALL SELECT * FROM empty
)
WHERE mz_now() BETWEEN day AND day + '1 day' + '1 day';

-- Each hour-long interval of interest
CREATE VIEW hours AS
SELECT * FROM (
    SELECT generate_series(day, day + '1 day' - '1 hour'::interval, '1 hour') AS hour
    FROM days
    UNION ALL SELECT * FROM empty
)
WHERE mz_now() BETWEEN hour AND hour + '1 hour' + '1 day';

-- Each minute-long interval of interest
CREATE VIEW minutes AS
SELECT * FROM (
    SELECT generate_series(hour, hour + '1 hour' - '1 minute'::interval, '1 minute') AS minute
    FROM hours
    UNION ALL SELECT * FROM empty
)
WHERE mz_now() BETWEEN minute AND minute + '1 minute' + '1 day';

-- Any second-long interval of interest
CREATE VIEW seconds AS
SELECT * FROM (
    SELECT generate_series(minute, minute + '1 minute' - '1 second'::interval, '1 second') AS second
    FROM minutes
    UNION ALL SELECT * FROM empty
)
WHERE mz_now() BETWEEN second AND second + '1 second' + '1 day';

-- Indexes are important to ensure we expand intervals carefully.
CREATE DEFAULT INDEX ON years;
CREATE DEFAULT INDEX ON days;
CREATE DEFAULT INDEX ON hours;
CREATE DEFAULT INDEX ON minutes;
CREATE DEFAULT INDEX ON seconds;

-- The final view we'll want to use.
CREATE VIEW moments AS
SELECT second AS moment
FROM seconds
WHERE mz_now() >= second AND mz_now() < second + '1 day';

-- Extract pseudorandom bytes from each moment.
CREATE VIEW random AS
SELECT moment, digest(moment::text, 'md5') AS random
FROM moments;

-- Present as auction
CREATE VIEW auctions_core AS
SELECT
    moment,
    random,
    get_byte(random, 0) + get_byte(random, 1) * 256 + get_byte(random, 2) * 65536 AS id,
    get_byte(random, 3) + get_byte(random, 4) * 256 AS seller,
    get_byte(random, 5) AS item,
    -- Have each auction expire after up to 256 minutes.
    moment + (get_byte(random, 6)::text || ' minutes')::interval AS end_time
FROM random;

-- Refine and materialize auction data.
CREATE MATERIALIZED VIEW auctions AS
SELECT auctions_core.id, seller, items.item, end_time
FROM auctions_core, items
WHERE auctions_core.item % 5 = items.id;

-- Create and materialize bid data.
CREATE MATERIALIZED VIEW bids AS
-- Establish per-bid records and randomness.
WITH prework AS (
    SELECT
        id AS auction_id,
        moment AS auction_start,
        end_time AS auction_end,
        digest(random::text || generate_series(1, get_byte(random, 5))::text, 'md5') AS random
    FROM auctions_core
)
SELECT
    get_byte(random, 0) + get_byte(random, 1) * 256 + get_byte(random, 2) * 65536 AS id,
    get_byte(random, 3) + get_byte(random, 4) * 256 AS buyer,
    auction_id,
    get_byte(random, 5)::numeric AS amount,
    auction_start + (get_byte(random, 6)::text || ' minutes')::interval AS bid_time
FROM prework;

]]>
<![CDATA[What Happened to the Operational Data Store?]]> https://materialize.com/blog/operational-data-store https://materialize.com/blog/operational-data-store Wed, 21 Aug 2024 00:00:00 GMT <![CDATA[

Operational data stores maintained real-time data, and allowed access to denormalized data across databases. But why don't you see that pattern much any more?

]]>
<![CDATA[Operational Data Warehouse: Streaming Solution for Small Data Teams]]> https://materialize.com/blog/operational-data-warehouse-small-team https://materialize.com/blog/operational-data-warehouse-small-team Wed, 10 Jul 2024 00:00:00 GMT <![CDATA[

Under-resourced small data teams can now leverage a SaaS solution with streaming data and SQL support to build real-time applications.

]]>
<![CDATA[Operational Data Warehouse Overview | Materialize]]> https://materialize.com/blog/operational-data-warehouse https://materialize.com/blog/operational-data-warehouse Tue, 12 Sep 2023 00:00:00 GMT <![CDATA[

We've built Materialize as a new kind of data warehouse, optimized to handle operational data work with the same familiar process from analytical warehouses.

]]>
<![CDATA[Real-Time CDC from Oracle to Materialize Using Estuary Flow]]> https://materialize.com/blog/oracle-materialize-estuary-flow https://materialize.com/blog/oracle-materialize-estuary-flow Tue, 24 Sep 2024 00:00:00 GMT <![CDATA[

In this tutorial, we’ll connect Oracle CDC to Materialize in just a few minutes using Estuary Flow’s Dekaf.

]]>
<![CDATA[

CREATE CONNECTION estuary_connection TO KAFKA (
    BROKER 'dekaf.estuary.dev',
    SECURITY PROTOCOL = 'SASL_SSL',
    SASL MECHANISMS = 'PLAIN',
    SASL USERNAME = '{}',
    SASL PASSWORD = SECRET estuary_refresh_token
);

CREATE CONNECTION csr_estuary_connection TO CONFLUENT SCHEMA REGISTRY (
    URL 'https://dekaf.estuary.dev',
    USERNAME = '{}',
    PASSWORD = SECRET estuary_refresh_token
);

CREATE SOURCE sales_source
FROM KAFKA CONNECTION estuary_connection (TOPIC '<name-of-your-flow-collection>')
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_estuary_connection
ENVELOPE UPSERT;
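
The index below presupposes an aggregation over the new source; a hypothetical sketch (view and column names are ours):

-- Hypothetical: aggregate sales from the Estuary-backed source.
CREATE VIEW aggregated_sales AS
SELECT product_id, SUM(price) AS total_sales
FROM sales_source
GROUP BY product_id;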

CREATE INDEX idx_aggregated_sales ON aggregated_sales(total_sales);

]]>
<![CDATA[Rust for high-performance concurrency and network services]]> https://materialize.com/blog/our-experience-with-rust https://materialize.com/blog/our-experience-with-rust Tue, 06 Dec 2022 00:00:00 GMT <![CDATA[

Materialize is written in Rust. Why did we make that decision and how has it turned out for the project?

]]>
<![CDATA[

let mut v = Vec::new(); // The element type is not yet known here

v.push("World"); // Now the compiler knows what the vector contains

println!("Hello {}", v[0]); // And can statically guarantee the type is something we can print!

match s.find("great") {
    Some(idx) => println!("substring: {}", &s[idx..idx + 5]),
    None => {
        // hmm, I didn't find the substring, so I'll have to handle it somehow
    }
}

// Add something to the vector
v.push(4);

// Change something in the vector
*end = 3;

]]>
<![CDATA[Performance Benchmark: Aurora PostgreSQL vs. Materialize]]> https://materialize.com/blog/performance-benchmark-aurora-postgresql-materialize https://materialize.com/blog/performance-benchmark-aurora-postgresql-materialize Mon, 12 Aug 2024 00:00:00 GMT <![CDATA[

Materialize outperforms Aurora for complex queries over relatively small data volumes. Here are the benchmarks.

]]>
<![CDATA[

promotion_effect AS (
    SELECT p.product_id, MIN(pr.promotion_discount) AS promotion_discount
    FROM promotions pr
    JOIN products p ON pr.product_id = p.product_id
    WHERE pr.active = TRUE
    GROUP BY p.product_id
),

popularity_score AS (
    SELECT
        s.product_id,
        RANK() OVER (PARTITION BY p.category_id ORDER BY COUNT(s.sale_id) DESC) AS popularity_rank,
        COUNT(s.sale_id) AS sale_count
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    GROUP BY s.product_id, p.category_id
),

inventory_status AS (
    SELECT
        i.product_id,
        SUM(i.stock) AS total_stock,
        RANK() OVER (ORDER BY SUM(i.stock) DESC) AS stock_rank
    FROM inventory i
    GROUP BY i.product_id
),

high_demand_products AS (
    SELECT
        p.product_id,
        AVG(s.sale_price) AS avg_sale_price,
        COUNT(s.sale_id) AS total_sales
    FROM products p
    JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.product_id
    HAVING COUNT(s.sale_id) > (
        SELECT AVG(total_sales)
        FROM (SELECT COUNT(*) AS total_sales FROM sales GROUP BY product_id) subquery
    )
),

dynamic_pricing AS (
    SELECT
        p.product_id,
        p.base_price,
        CASE
            WHEN pop.popularity_rank <= 3 THEN 1.2
            WHEN pop.popularity_rank BETWEEN 4 AND 10 THEN 1.1
            ELSE 0.9
        END AS popularity_adjustment,
        rp.avg_price,
        COALESCE(1.0 - (pe.promotion_discount / 100), 1) AS promotion_discount,
        CASE
            WHEN inv.stock_rank <= 3 THEN 1.1
            WHEN inv.stock_rank BETWEEN 4 AND 10 THEN 1.05
            ELSE 1
        END AS stock_adjustment,
        CASE
            WHEN p.base_price > rp.avg_price THEN 1 + (p.base_price - rp.avg_price) / rp.avg_price
            ELSE 1 - (rp.avg_price - p.base_price) / rp.avg_price
        END AS demand_multiplier,
        hd.avg_sale_price,
        CASE
            WHEN p.product_name ILIKE '%cheap%' THEN 0.8
            ELSE 1.0
        END AS additional_discount
    FROM products p
    LEFT JOIN recent_prices rp ON p.product_id = rp.product_id
    LEFT JOIN promotion_effect pe ON p.product_id = pe.product_id
    JOIN popularity_score pop ON p.product_id = pop.product_id
    LEFT JOIN inventory_status inv ON p.product_id = inv.product_id
    LEFT JOIN high_demand_products hd ON p.product_id = hd.product_id
)

SELECT
    dp.product_id,
    dp.base_price * dp.popularity_adjustment * dp.promotion_discount * dp.stock_adjustment * dp.demand_multiplier * dp.additional_discount AS adjusted_price
FROM dynamic_pricing dp;

ALTER TABLE categories ADD CONSTRAINT categories_pkey PRIMARY KEY (category_id);

ALTER TABLE suppliers ADD CONSTRAINT suppliers_pkey PRIMARY KEY (supplier_id);

ALTER TABLE sales ADD CONSTRAINT sales_pkey PRIMARY KEY (sale_id);

ALTER TABLE inventory ADD CONSTRAINT inventory_pkey PRIMARY KEY (inventory_id);

ALTER TABLE promotions ADD CONSTRAINT promotions_pkey PRIMARY KEY (promotion_id);

ALTER TABLE public.inventory ADD CONSTRAINT inventory_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

ALTER TABLE public.promotions ADD CONSTRAINT promotions_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

ALTER TABLE public.sales ADD CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);

CREATE INDEX idx_products_product_name ON products(product_name);
CREATE INDEX idx_sales_product_id ON sales(product_id);
CREATE INDEX idx_sales_sale_date ON sales(sale_date);
CREATE INDEX idx_sales_product_id_sale_date ON sales(product_id, sale_date);
CREATE INDEX idx_promotions_product_id ON promotions(product_id);
CREATE INDEX idx_promotions_active ON promotions(active);
CREATE INDEX idx_promotions_product_id_active ON promotions(product_id, active);
CREATE INDEX idx_inventory_product_id ON inventory(product_id);

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(255) NOT NULL
);

CREATE TABLE suppliers (
    supplier_id SERIAL PRIMARY KEY,
    supplier_name VARCHAR(255) NOT NULL
);

CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    sale_price NUMERIC(10, 2) NOT NULL,
    sale_date TIMESTAMP NOT NULL,
    price NUMERIC(10, 2) NOT NULL
);

CREATE TABLE inventory (
    inventory_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    stock INTEGER NOT NULL,
    warehouse_id INTEGER NOT NULL,
    restock_date TIMESTAMP NOT NULL
);

CREATE TABLE promotions (
    promotion_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    promotion_discount NUMERIC(10, 2) NOT NULL,
    start_date TIMESTAMP NOT NULL,
    end_date TIMESTAMP NOT NULL,
    active BOOLEAN NOT NULL
);

]]>
<![CDATA[How and why is Materialize compatible with PostgreSQL?]]> https://materialize.com/blog/postgres-compatibility https://materialize.com/blog/postgres-compatibility Tue, 18 Oct 2022 00:00:00 GMT <![CDATA[

As an operational data store, Materialize is fundamentally different on the inside, but it's compatible with PostgreSQL in a few important ways.

]]>
<![CDATA[Real-Time Postgres Views Updates | Materialize]]> https://materialize.com/blog/postgres-source-updates https://materialize.com/blog/postgres-source-updates Thu, 18 May 2023 00:00:00 GMT <![CDATA[

Major updates to PostgreSQL streaming replication allow for real-time & incrementally updated materialized views with Materialize.

]]>
<![CDATA[Everything you need to know to be a Materialize power-user]]> https://materialize.com/blog/power-user https://materialize.com/blog/power-user Thu, 20 Apr 2023 00:00:00 GMT <![CDATA[

Master Materialize for enhanced scale, performance & power with key internal insights. A guide for aspiring power-users.

]]>
<![CDATA[How Materialize Unlocks Private Kafka Connectivity via PrivateLink and SSH]]> https://materialize.com/blog/private-kafka-connectivity https://materialize.com/blog/private-kafka-connectivity Mon, 10 Jun 2024 00:00:00 GMT <![CDATA[

Here's how we developed frictionless private networking for Kafka by using librdkafka.

]]>
<![CDATA[Testing Materialize: Our QA Process]]> https://materialize.com/blog/qa-process-overview https://materialize.com/blog/qa-process-overview Mon, 13 May 2024 00:00:00 GMT <![CDATA[

The following blog will show you how we keep our customers and developers happy with our rigorous QA process, including our tools and testing methods.

]]>
<![CDATA[
    assert_eq!(
        mz_sql::catalog::ObjectType::ClusterReplica,
        conn_catalog.get_object_type(&ObjectId::ClusterReplica((
            ClusterId::User(1),
            ReplicaId::User(1)
        )))
    );
    assert_eq!(
        mz_sql::catalog::ObjectType::Role,
        conn_catalog.get_object_type(&ObjectId::Role(RoleId::User(1)))
    );
    catalog.expire().await;
})
.await;

}

query error column "hello world" does not exist
select "hello world"

> SELECT * FROM data a

1 2

$ kafka-verify-data format=avro sink=materialize.public.sink sort-messages=true
{"before": null, "after": {"a": 1}}
{"before": null, "after": {"a": 2}}

def workflow_test(c: Composition):
    c.up("zookeeper", "kafka", "schema-registry", "materialized")
    c.run_testdrive_files("*.td")

    def manipulate(self) -> list[Testdrive]:
        return [
            Testdrive("> DELETE FROM delete_table WHERE f1 % 3 = 0;"),
            Testdrive("> DELETE FROM delete_table WHERE f1 % 3 = 1;"),
        ]

    def validate(self) -> Testdrive:
        return Testdrive(
            dedent(
                """
                > SELECT COUNT(*), MIN(f1), MAX(f1), COUNT(f1), COUNT(DISTINCT f1) FROM delete_table GROUP BY f1 % 3;
                3333 2 9998 3333 3333
                """
            )
        )

> SELECT COUNT(*) > 0 FROM mz_internal.mz_source_statuses WHERE error LIKE '%Connection refused%';
true

    def bootstrap(self) -> list[ActionOrFactory]:
        return super().bootstrap() + [PostgresStart]

    def actions_with_weight(self) -> dict[ActionOrFactory, float]:
        return {
            CreatePostgresTable: 10,
            CreatePostgresCdcTable: 10,
            KillClusterd: 5,
            StoragedKill: 5,
            StoragedStart: 5,
            PostgresRestart: 10,
            CreateViewParameterized(): 10,
            ValidateView: 20,
            PostgresDML: 100,
        }

]]>
<![CDATA[Introducing Query History]]> https://materialize.com/blog/query-history-private-preview https://materialize.com/blog/query-history-private-preview Thu, 29 Feb 2024 00:00:00 GMT <![CDATA[

Now in Private Preview, Query History lets you monitor your SQL query performance to detect potential bottlenecks

]]>
<![CDATA[RBAC now available for all customers]]> https://materialize.com/blog/rbac https://materialize.com/blog/rbac Thu, 31 Aug 2023 00:00:00 GMT <![CDATA[

Comprehensive RBAC for Materialize users ensures secure, production-grade environment management & access control.

]]>
<![CDATA[

CREATE DATABASE data_scientists_db;
CREATE CLUSTER data_scientists_cluster SIZE = 'medium';
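
The grants below assume a `data_scientists` role already exists; creating one is a single statement:

CREATE ROLE data_scientists;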

GRANT ALL PRIVILEGES ON DATABASE data_scientists_db TO data_scientists;
GRANT ALL PRIVILEGES ON CLUSTER data_scientists_cluster TO data_scientists;

materialize=> INSERT INTO payments_db.public.purchase_history VALUES (42);
ERROR: permission denied for TABLE "payments_db.public.purchase_history"

]]>
<![CDATA[Real-time A/B Testing with Segment & Kinesis | Materialize]]> https://materialize.com/blog/real-time-a-b-test-results https://materialize.com/blog/real-time-a-b-test-results Wed, 21 Apr 2021 00:00:00 GMT <![CDATA[

Build a real-time A/B testing stack with Segment, Kinesis and Materialize.

]]>
<![CDATA[Real-time data quality tests using dbt and Materialize]]> https://materialize.com/blog/real-time-data-quality-tests-using-dbt-and-materialize https://materialize.com/blog/real-time-data-quality-tests-using-dbt-and-materialize Thu, 14 Jul 2022 00:00:00 GMT <![CDATA[

Real-time SQL monitoring & data quality tests with dbt & Materialize for continuous insights as data evolves.

]]>
<![CDATA[

tests:
  project:
    +store_failures: true
    +schema: test

materialize=> select * from public_test.not_null_stg_postgres__items_price;
 id |   item   | price | inventory
----+----------+-------+-----------
  5 | NEW_ITEM |       |
(1 row)

materialize=> select * from public_test.dim_items_accepted_values;
 value_field | n_records
-------------+-----------
 NEW_ITEM    |         1
(1 row)

materialize=> select * from public_test.etl_alert;
             view_name              | n_records
------------------------------------+-----------
 not_null_stg_postgres__items_price |         1
 dim_items_accepted_values          |         1
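
The `etl_alert` relation above can be derived as a union over the stored test failures; a sketch, assuming only the two test views shown:

-- Sketch: one row per test view that currently has failing records.
CREATE VIEW etl_alert AS
SELECT 'not_null_stg_postgres__items_price' AS view_name, count(*) AS n_records
FROM public_test.not_null_stg_postgres__items_price
HAVING count(*) > 0
UNION ALL
SELECT 'dim_items_accepted_values', count(*)
FROM public_test.dim_items_accepted_values
HAVING count(*) > 0;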

materialize=*> FETCH all c;
 mz_timestamp  | mz_diff |             view_name              | n_records
---------------+---------+------------------------------------+-----------
 1657555763000 |      -1 | not_null_stg_postgres__items_price |         1
(1 row)

materialize=> select * from public.dim_users where id = 256;
 id  |          email           | is_vip | revenue | orders | items_sold |      last_purchase_ts      |     first_purchase_ts      | pageviews |    last_pageview_ts    |   first_pageview_ts
-----+--------------------------+--------+---------+--------+------------+----------------------------+----------------------------+-----------+------------------------+------------------------
 256 | [email protected] | f      | 2993.59 |      6 |         16 | 2022-07-14 14:18:50.849612 | 2022-07-14 14:10:26.434826 |        76 | 2022-07-14 14:18:50+00 | 2022-07-14 14:07:42+00
 256 | [email protected] | f      | 2993.59 |      6 |         16 | 2022-07-14 14:18:50.849612 | 2022-07-14 14:10:26.434826 |       156 | 2022-07-14 14:23:53+00 | 2022-07-14 14:11:46+00
(2 rows)

]]>
<![CDATA[Towards Real-Time dbt]]> https://materialize.com/blog/real-time-dbt https://materialize.com/blog/real-time-dbt Thu, 09 Mar 2023 00:00:00 GMT <![CDATA[

Explore strategies for unleashing real-time dbt, from materializing views to leveraging micro-batches and incrementally maintained views.

]]>
<![CDATA[Creating a Real-Time Feature Store with Materialize]]> https://materialize.com/blog/real-time-feature-store-with-materialize https://materialize.com/blog/real-time-feature-store-with-materialize Mon, 25 Apr 2022 00:00:00 GMT <![CDATA[

Materialize provides a real-time feature store that updates dimensions with new data instantly & maintains speed & accuracy.

]]>
<![CDATA[Real-Time Data Architectures: Why Small Data Teams Can't Wait]]> https://materialize.com/blog/real-time-small-data-teams https://materialize.com/blog/real-time-small-data-teams Tue, 02 Jul 2024 00:00:00 GMT <![CDATA[

Small data teams can't wait to build real-time data architectures. Find out why, and how they're approaching the problem.

]]>
<![CDATA[Recursion in Materialize]]> https://materialize.com/blog/recursion-in-materialize https://materialize.com/blog/recursion-in-materialize Wed, 11 Jan 2023 00:00:00 GMT <![CDATA[

Understanding recursion in Materialize & its significance in differential dataflow for SQL updates.

]]>
<![CDATA[


WITH MUTUALLY RECURSIVE
    -- Ranges [lower, upper) that can be produced by symbol.
    parses (lower int, upper int, symbol int) AS (
        -- Base case: each literal is produced by some symbols.
        SELECT pos, pos+1, lhs
        FROM input, grammar_terms
        WHERE input.lit = grammar_terms.lit
        UNION
        -- Recursive case: two adjacent parses that follow the grammar.
        SELECT p1.lower, p2.upper, lhs
        FROM parses p1, parses p2, grammar_nonts
        WHERE p1.upper = p2.lower
          AND p1.symbol = grammar_nonts.rhs1
          AND p2.symbol = grammar_nonts.rhs2
    )
SELECT * FROM parses;

]]>
<![CDATA[Recursive SQL Queries in Materialize | Materialize]]> https://materialize.com/blog/recursive-ctes-in-materialize https://materialize.com/blog/recursive-ctes-in-materialize Wed, 12 Jul 2023 00:00:00 GMT <![CDATA[

Support for recursive SQL queries in Materialize is now available.
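
For a taste of the syntax, here is a minimal transitive-closure sketch, assuming an `edges (src, dst)` relation:

-- Illustrative only: all nodes reachable from each source node.
WITH MUTUALLY RECURSIVE
    reach (src int, dst int) AS (
        SELECT src, dst FROM edges
        UNION
        SELECT r.src, e.dst
        FROM reach r, edges e
        WHERE r.dst = e.src
    )
SELECT * FROM reach;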

]]>
<![CDATA[Solving the Cache Invalidation Dilemma with Materialize and Redis]]> https://materialize.com/blog/redis-cache-invalidation https://materialize.com/blog/redis-cache-invalidation Tue, 17 Sep 2024 00:00:00 GMT <![CDATA[

In this post, we’ll explore the difficulties of cache invalidation, how Materialize and Redis address them, and when this solution is most effective.

]]>
<![CDATA[Materialize + Redpanda Serverless: Simplified developer experience for real-time apps]]> https://materialize.com/blog/redpanda-serverless https://materialize.com/blog/redpanda-serverless Tue, 19 Mar 2024 00:00:00 GMT <![CDATA[

Combining Redpanda Serverless with Materialize makes developing streaming data apps easier than ever before.

]]>
<![CDATA[Strategies for Reducing Data Warehouse Costs: Part 1]]> https://materialize.com/blog/reduce-data-warehouse-costs https://materialize.com/blog/reduce-data-warehouse-costs Tue, 09 Apr 2024 00:00:00 GMT <![CDATA[

With Materialize, teams can lower the cost of their data warehouse bill and implement new use cases.

]]>
<![CDATA[Reimagining Agentic Orchestration: Materialize and the Future of Autonomous Systems]]> https://materialize.com/blog/reimagining-agentic-orchestration-materialize https://materialize.com/blog/reimagining-agentic-orchestration-materialize Fri, 13 Dec 2024 00:00:00 GMT <![CDATA[

Discover how Materialize empowers intelligent agents to collaborate in real-time, ensuring cost-effective and efficient orchestration for autonomous systems. Transform the future of AI-powered ecosystems with fresh, consistent, and actionable insights.

]]>
<![CDATA[Re:Inventing Real-Time Data Integration]]> https://materialize.com/blog/reinvent-real-time-data-integration-takeaways https://materialize.com/blog/reinvent-real-time-data-integration-takeaways Mon, 09 Dec 2024 00:00:00 GMT <![CDATA[

Four Takeaways from AWS re:Invent 2024

]]>
<![CDATA[Release: 0.6]]> https://materialize.com/blog/release-0-6 https://materialize.com/blog/release-0-6 Thu, 07 Jan 2021 00:00:00 GMT <![CDATA[

Materialize's Release 0.6 enhances cloud data warehousing with real-time streaming capabilities for immediate action on live data.

]]>
<![CDATA[Release: 0.7]]> https://materialize.com/blog/release-0-7 https://materialize.com/blog/release-0-7 Tue, 09 Mar 2021 00:00:00 GMT <![CDATA[

Comprehensive insights & updates in Materialize's Release 0.7, enhancing real-time data warehouse capabilities.

]]>
<![CDATA[Release: 0.8]]> https://materialize.com/blog/release-0-8 https://materialize.com/blog/release-0-8 Mon, 14 Jun 2021 00:00:00 GMT <![CDATA[

Comprehensive insights & updates on Materialize's Release 0.8, enhancing real-time data warehousing capabilities for immediate action.

]]>
<![CDATA[Release: 0.9]]> https://materialize.com/blog/release-0-9 https://materialize.com/blog/release-0-9 Fri, 27 Aug 2021 00:00:00 GMT <![CDATA[

Materialize's Release 0.9 introduces an Operational Data Warehouse optimized for real-time data actions & cloud efficiency.

]]>
<![CDATA[Release: Materialize 0.3]]> https://materialize.com/blog/release-materialize-0-3 https://materialize.com/blog/release-materialize-0-3 Mon, 01 Jun 2020 00:00:00 GMT <![CDATA[

Materialize 0.3, an Operational Data Warehouse with cloud & streaming capabilities, optimizes real-time data action.

]]>
<![CDATA[Release: Materialize 0.4]]> https://materialize.com/blog/release-materialize-0-4 https://materialize.com/blog/release-materialize-0-4 Tue, 28 Jul 2020 00:00:00 GMT <![CDATA[

Materialize 0.4 introduces an Operational Data Warehouse with real-time streaming capabilities for immediate data action & analysis.

]]>
<![CDATA[Release: Materialize 0.5]]> https://materialize.com/blog/release-materialize-0-5 https://materialize.com/blog/release-materialize-0-5 Tue, 24 Nov 2020 00:00:00 GMT <![CDATA[

Materialize 0.5 operational data warehouse offers real-time action on live data for efficient & immediate insights.

]]>
<![CDATA[Responsiveness and Operational Agility]]> https://materialize.com/blog/responsiveness https://materialize.com/blog/responsiveness Thu, 11 Jan 2024 00:00:00 GMT <![CDATA[

See how Materialize supports operational work with responsiveness.

]]>
<![CDATA[Streaming Database Roadmap Guide | Materialize]]> https://materialize.com/blog/roadmap https://materialize.com/blog/roadmap Thu, 11 Jun 2020 00:00:00 GMT <![CDATA[

A guide to creating a streaming database with Materialize, from using a streaming framework to developing a scalable platform.

]]>
<![CDATA[Robust Reductions in Materialize]]> https://materialize.com/blog/robust-reductions-in-materialize https://materialize.com/blog/robust-reductions-in-materialize Tue, 04 Aug 2020 00:00:00 GMT <![CDATA[

Comprehensive guide to implementing robust reductions in Materialize, ensuring efficient & real-time data processing.

]]>
<![CDATA[

Time: 1741.500 ms (00:01.742)

SELECT passenger_count, MAX(fare_amount) ..

SELECT passenger_count, COUNT(DISTINCT trip_distance) ..

]]>
<![CDATA[Rust for Data-Intensive Computation]]> https://materialize.com/blog/rust-for-data-intensive-computation https://materialize.com/blog/rust-for-data-intensive-computation Mon, 22 Jun 2020 00:00:00 GMT <![CDATA[

Harness the power of Rust for data-intensive tasks with Materialize, offering real-time insights & performance benefits.

]]>
<![CDATA[

(&self, mut predicate: P) -> Stream<G, D>
where
    P: FnMut(&D) -> bool + 'static
{ ... }

]]>
<![CDATA[Self-Managed Materialize: Early Access Now Available]]> https://materialize.com/blog/self-managed-materialize-early-access https://materialize.com/blog/self-managed-materialize-early-access Mon, 16 Dec 2024 00:00:00 GMT <![CDATA[

Get early access to self-managed Materialize and run it within your private infrastructure. Meet governance, compliance needs, and deploy in any cloud with Materialize's real-time data transformation capabilities.

]]>
<![CDATA[It’s (almost) here: self-managed Materialize]]> https://materialize.com/blog/self-managed https://materialize.com/blog/self-managed Mon, 25 Nov 2024 00:00:00 GMT <![CDATA[

A new way to run Materialize in the cloud for organizations with unique operational requirements. Join the Early Access program today!

]]>
<![CDATA[Slicing up Temporal Aggregates in Materialize]]> https://materialize.com/blog/slicing-up-temporal-aggregates-in-materialize https://materialize.com/blog/slicing-up-temporal-aggregates-in-materialize Thu, 14 Jan 2021 00:00:00 GMT <![CDATA[

Comprehensive guide on slicing temporal aggregates with Materialize for real-time data analysis & actionable insights.

]]>
<![CDATA[When to use Materialize vs a Stream Processor]]> https://materialize.com/blog/stream-processor-comparison https://materialize.com/blog/stream-processor-comparison Thu, 11 May 2023 00:00:00 GMT <![CDATA[

If you're already familiar with stream processors you may wonder: When is it better to use Materialize vs a Stream Processor? And why?

]]>
<![CDATA[Streaming TAIL to the Browser - A One Day Project]]> https://materialize.com/blog/streaming-tail-to-the-browser-a-one-day-project https://materialize.com/blog/streaming-tail-to-the-browser-a-one-day-project Fri, 24 Jul 2020 00:00:00 GMT <![CDATA[

Real-time data streaming directly to your browser with Materialize's latest one-day project; understand the technical journey & outcomes.

]]>
<![CDATA[Subscribe to changes in a view with Materialize]]> https://materialize.com/blog/subscribe-to-changes-in-a-view-with-tail-in-materialize https://materialize.com/blog/subscribe-to-changes-in-a-view-with-tail-in-materialize Thu, 03 Mar 2022 00:00:00 GMT <![CDATA[

Real-time SQL query & view update subscriptions are made simple with Materialize's SUBSCRIBE feature.

]]>
<![CDATA[

-- Windowed aggregation
CREATE MATERIALIZED VIEW avg_last_minute_temperature AS
SELECT
    DATE_TRUNC('second', to_timestamp(updated_at / 1000)) AS ts_second,
    AVG(temperature)
FROM temperatures
WHERE (updated_at + 60000) > mz_logical_timestamp()
GROUP BY ts_second;

-- Indexing view (materializing) with a custom compaction
CREATE INDEX avg_last_minute_temperature_idx
ON avg_last_minute_temperature (ts_second)
WITH (logical_compaction_window = '1minute');
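
Once the view is maintained, subscribing to its changes from a SQL session is a single statement; a minimal sketch:

-- Stream every update to the view as it happens.
COPY (SUBSCRIBE (SELECT * FROM avg_last_minute_temperature)) TO STDOUT;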

]]>
<![CDATA[Taking Materialize for a spin on NYC taxi data]]> https://materialize.com/blog/taking-materialize-for-a-spin-on-nyc-taxi-data https://materialize.com/blog/taking-materialize-for-a-spin-on-nyc-taxi-data Wed, 18 Mar 2020 00:00:00 GMT <![CDATA[

Experience real-time data analysis with Materialize on NYC taxi data, showcasing a practical application of streaming SQL.

]]>
<![CDATA[

Time: 796.667 ms

Time: 1741.500 ms (00:01.742)

Time: 0.669 ms

Time: 608.168 ms

Time: 0.447 ms

Time: 0.818 ms

Time: 11.524 ms

Time: 58.558 ms

Time: 0.863 ms

Time: 23.611 ms

Time: 25.061 ms

Time: 169.186 ms

Time: 4.563 ms

Time: 12.504 ms
materialize=> SELECT * FROM aggregates;
 passenger_count | MIN  |  MAX
-----------------+------+--------
                 |      |
               0 |  -90 |  40502
               1 | -800 | 907070
               2 | -498 | 214748
               3 | -498 | 349026
               4 | -415 |    974
               5 | -300 |   1271
               6 | -100 |    433
               7 |  -70 |    140
               8 |  -89 |    129
               9 |    0 |    110
              96 |    6 |      6
             192 |    6 |      6
(13 rows)

Time: 0.935 ms

Time: 1.067 ms

Time: 14.764 ms

]]>
<![CDATA[Stream Analytics with Redpanda & Materialize | Materialize]]> https://materialize.com/blog/taking-streaming-analytics-further-faster-with-redpanda-materialize https://materialize.com/blog/taking-streaming-analytics-further-faster-with-redpanda-materialize Tue, 19 Oct 2021 00:00:00 GMT <![CDATA[

Enhance your data workflows with Redpanda & Materialize for faster & more efficient streaming analytics. Get insights on integration & usage.

]]>
<![CDATA[

SELECT * FROM hvu_test LIMIT 2;

]]>
<![CDATA[Taming the beast that is a SQL database]]> https://materialize.com/blog/taming-the-beast-that-is-a-sql-database https://materialize.com/blog/taming-the-beast-that-is-a-sql-database Tue, 01 Feb 2022 00:00:00 GMT <![CDATA[

In this article, we will talk about one of the ways we approach the testing of the SQL engine of the product at Materialize. We hope to cover other modules and interesting angles in the future.

]]>
<![CDATA[Temporal Filters: Enabling Windowed Queries in Materialize]]> https://materialize.com/blog/temporal-filters https://materialize.com/blog/temporal-filters Tue, 16 Feb 2021 00:00:00 GMT <![CDATA[

Temporal filters give you a powerful SQL primitive for defining time-windowed computations over temporal data.

]]>
<![CDATA[
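
The transcript below repeatedly queries a view whose contents are windowed by a temporal filter. A sketch of the shape of such a view (hypothetical name and window arithmetic; the post's exact definition may differ), assuming an `events` source with `content`, `insert_ts`, and `delete_ts` columns:

-- Sketch: rows are visible only while the logical time is in their window.
CREATE MATERIALIZED VIEW valid AS
SELECT content, insert_ts, delete_ts
FROM events
WHERE mz_logical_timestamp() >= insert_ts
  AND mz_logical_timestamp() < delete_ts + 10000;  -- illustrative window

-- Queried repeatedly as:
SELECT *, mz_logical_timestamp() FROM valid;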

 content | insert_ts | delete_ts | mz_logical_timestamp
---------+-----------+-----------+----------------------
(0 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 hello   | 1627380752528 | 1627380752528 |        1627380754223
 welcome | 1627380752530 | 1627380752530 |        1627380754223
 goodbye | 1627380752533 | 1627380752533 |        1627380754223
(3 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 hello   | 1627380752528 | 1627380752528 |        1627380755920
 welcome | 1627380752530 | 1627380752530 |        1627380755920
 goodbye | 1627380752533 | 1627380752533 |        1627380755920
(3 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 welcome | 1627380752530 | 1627380752530 |        1627380757989
 goodbye | 1627380752533 | 1627380752533 |        1627380757989
(2 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 goodbye | 1627380752533 | 1627380752533 |        1627380762667
(1 row)

 content | insert_ts | delete_ts | mz_logical_timestamp
---------+-----------+-----------+----------------------
(0 rows)

]]>
<![CDATA[A Terraform Provider for Materialize]]> https://materialize.com/blog/terraform-provider https://materialize.com/blog/terraform-provider Tue, 25 Apr 2023 00:00:00 GMT <![CDATA[

Materialize maintains an official Terraform Provider you can use to manage your clusters, replicas, connections and secrets as code.

]]>
<![CDATA[

resource "materialize_cluster_replica" "cluster_replica_1" { name = "r1" cluster_name = materialize_cluster.cluster.name size = "medium" }

resource "materialize_cluster_replica" "cluster_replica_2" { name = "r2" cluster_name = materialize_cluster.cluster.name size = "medium" }

resource "aws_vpc_endpoint_service" "example" { acceptance_required = false allowed_principals = [data.aws_caller_identity.current.arn] gateway_load_balancer_arns = [aws_lb.example.arn] }

resource "materialize_connection_aws_privatelink" "example_privatelink_connection" { name = "example_privatelink_connection" service_name = aws_vpc_endpoint_service.example.service_name availability_zones = ["use1-az2", "use1-az6"] }

]]>
<![CDATA[The Four ACID Questions]]> https://materialize.com/blog/the-four-acid-questions https://materialize.com/blog/the-four-acid-questions Wed, 05 Apr 2023 00:00:00 GMT <![CDATA[

Four questions, and their answers, to explain ACID transactions and how they are handled within Materialize.

]]>
<![CDATA[Upserts in Differential Dataflow]]> https://materialize.com/blog/upserts-in-differential-dataflow https://materialize.com/blog/upserts-in-differential-dataflow Fri, 27 Mar 2020 00:00:00 GMT <![CDATA[

Comprehensive guide to implementing upserts in differential dataflow with Materialize for real-time data warehouse optimization & efficiency.

]]>
<![CDATA[
for (key, mut list) in to_process.drain() {

    // Maintains the prior value associated with the key.
    let mut prev_value: Option<Tr::Val> = None;

    // Attempt to find the key in the trace.
    trace_cursor.seek_key(&trace_storage, &key);
    if trace_cursor.get_key(&trace_storage) == Some(&key) {
        // Determine the prior value associated with the key.
        // There may be multiple historical values; we'll want the one
        // that accumulates to a non-zero (ideally one) count.
        while let Some(val) = trace_cursor.get_val(&trace_storage) {
            let mut count = 0;
            trace_cursor.map_times(&trace_storage, |_time, diff| count += *diff);
            assert!(count == 0 || count == 1);
            if count == 1 {
                assert!(prev_value.is_none());
                prev_value = Some(val.clone());
            }
            trace_cursor.step_val(&amp;trace_storage);
        }
        trace_cursor.step_key(&amp;trace_storage);
    }

    // Sort the list of upserts to `key` by their time, suppress multiple updates.
    list.sort();
    list.dedup_by(|(t1,_), (t2,_)| t1 == t2);
    // Process distinct times; add updates into batch builder.
    for (time, std::cmp::Reverse(next)) in list {
        if prev_value != next {
            if let Some(prev) = prev_value {
                // A prior value exists, retract it!
                builder.push((key.clone(), prev, time.clone(), -1));
            }
            if let Some(next) = next.as_ref() {
                // A new value exists, introduce it!
                builder.push((key.clone(), next.clone(), time.clone(), 1));
            }
            prev_value = next;
        }
    }
}

]]>
<![CDATA[View your usage and billing history]]> https://materialize.com/blog/usage-and-billing https://materialize.com/blog/usage-and-billing Tue, 05 Mar 2024 00:00:00 GMT <![CDATA[

Get complete visibility into your usage trends and billing history to manage your spend effectively

]]>
<![CDATA[When to Use Indexes and Materialized Views]]> https://materialize.com/blog/views-indexes https://materialize.com/blog/views-indexes Thu, 16 Feb 2023 00:00:00 GMT <![CDATA[

If you are familiar with materialized views and indexes from other databases, this article will help you apply that understanding to Materialize.
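
If it helps to see the contrast concretely, here is a minimal sketch (object names are ours):

-- A materialized view: results computed eagerly and stored durably,
-- readable from any cluster.
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(order_total) AS total
FROM orders
GROUP BY customer_id;

-- An index: results kept up to date in memory on one cluster,
-- serving fast point lookups.
CREATE INDEX order_totals_by_customer ON order_totals (customer_id);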

]]>
<![CDATA[Virtual Time for Scalable Performance | Materialize]]> https://materialize.com/blog/virtual-time-consistency-scalability https://materialize.com/blog/virtual-time-consistency-scalability Tue, 14 Jun 2022 00:00:00 GMT <![CDATA[

The key to Materialize's ability to separate compute from storage and scale horizontally without sacrificing consistency is a concept called virtual time.

]]>
<![CDATA[VS Code Integration Guide | Materialize]]> https://materialize.com/blog/vs-code-integration https://materialize.com/blog/vs-code-integration Mon, 16 Oct 2023 00:00:00 GMT <![CDATA[

Integrate Materialize with VS Code for schema exploration, SQL validation & query execution, all within your IDE for efficient development.

]]>
<![CDATA[Cloud Data Warehouse Uses & Misuses | Materialize]]> https://materialize.com/blog/warehouse-abuse https://materialize.com/blog/warehouse-abuse Thu, 27 Jul 2023 00:00:00 GMT <![CDATA[

Data Warehouses are great for many things but often misused for operational workloads.

]]>
<![CDATA[Announcing Webhook Sources]]> https://materialize.com/blog/webhook-sources https://materialize.com/blog/webhook-sources Thu, 28 Sep 2023 00:00:00 GMT <![CDATA[

Today Materialize customers can create webhook sources, making it much easier to pipe in events from a long tail of SaaS platforms, services, and tools.
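
Creating one is a single statement; a minimal sketch (source name and body format are illustrative):

-- Illustrative: accept JSON payloads pushed over HTTPS.
CREATE SOURCE my_webhook_source FROM WEBHOOK
BODY FORMAT JSON;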

]]>
<![CDATA[What is a real-time analytics database?]]> https://materialize.com/blog/what-is-a-real-time-analytics-database https://materialize.com/blog/what-is-a-real-time-analytics-database Fri, 26 Jan 2024 00:00:00 GMT <![CDATA[

Discover the essentials of real-time analytics databases, their benefits, and how they compare to traditional databases for better operational decision-making.

]]>
<![CDATA[What is an operational data warehouse?]]> https://materialize.com/blog/what-is-an-operational-data-warehouse https://materialize.com/blog/what-is-an-operational-data-warehouse Fri, 02 Feb 2024 00:00:00 GMT <![CDATA[

Learn how an operational data warehouse enables organizations to use their freshest data for day-to-day decision-making

]]>
<![CDATA[Data Freshness: Why It Matters and How to Deliver It]]> https://materialize.com/blog/what-is-data-freshness https://materialize.com/blog/what-is-data-freshness Fri, 23 Feb 2024 00:00:00 GMT <![CDATA[

Data freshness is essential for real-time business use cases. Here's how an operational data warehouse powers your business processes with fresh data.

]]>
<![CDATA[What's new in Materialize? Vol. 2]]> https://materialize.com/blog/whats-new-in-materialize-vol-2 https://materialize.com/blog/whats-new-in-materialize-vol-2 Tue, 01 Mar 2022 00:00:00 GMT <![CDATA[

Comprehensive updates in Materialize Vol. 2: AWS roles, PostgreSQL enhancements, Schema Registry SSL, & more for streamlined data management.

]]>
<![CDATA[What's new in Materialize? Volume 1]]> https://materialize.com/blog/whats-new-in-materialize-vol1 https://materialize.com/blog/whats-new-in-materialize-vol1 Mon, 20 Dec 2021 00:00:00 GMT <![CDATA[

Stay updated with Materialize: Kafka source metadata, protobuf & schema registry integration, time bucketing, Metabase, cloud metrics & monitoring enhancements.

]]>
<![CDATA[Why not RocksDB for streaming storage?]]> https://materialize.com/blog/why-not-rocksdb https://materialize.com/blog/why-not-rocksdb Thu, 06 Aug 2020 00:00:00 GMT <![CDATA[

An explanation of our rationale for why Materialize chose not to use RocksDB as its underlying storage engine.

]]>
<![CDATA[Why, How, and When To Use Materialized Views]]> https://materialize.com/blog/why-use-a-materialized-view https://materialize.com/blog/why-use-a-materialized-view Tue, 11 Aug 2020 00:00:00 GMT <![CDATA[

Discover how to reduce database query costs with Materialized Views. This guide will walk you through the benefits, creation process, and impact on database efficiency.

]]>
<![CDATA[Zero-Staleness: Like using your primary, but faster]]> https://materialize.com/blog/zero-staleness-faster-primary https://materialize.com/blog/zero-staleness-faster-primary Fri, 13 Sep 2024 00:00:00 GMT <![CDATA[

Materialize can respond faster than your primary database, with results that are at least as fresh as your primary would provide.

]]>