Let's build a Python application to demonstrate how developers can create real-time, event-driven experiences for their users, powered by Materialize.
timestamp = 1608081358001 inserted = [('Epidosis', '4595'), ('Matlin', '5221')] deleted = [('Lockal', '4590'), ('Matlin', '5220')]
async for (timestamp, progressed, diff, *columns) in cursor:
    # The progressed column serves as a synchronization primitive indicating that all
    # rows for an update have been read. We should publish this update.
    if progressed:
        self.update(deleted, inserted, timestamp)
        inserted = []
        deleted = []
        continue
    # Simplify our implementation by creating "diff" copies of each row instead
    # of tracking counts per row
    if diff < 0:
        deleted.extend([columns] * abs(diff))
    elif diff > 0:
        inserted.extend([columns] * diff)
    else:
        raise ValueError(f"Bad data from TAIL: diff={diff}")
# Remove any rows that have been deleted
for r in deleted:
    self.current_rows.remove(r)
# And add any rows that have been inserted
self.current_rows.extend(inserted)
# If we have listeners configured, broadcast this diff
if self.listeners:
    payload = {"deleted": deleted, "inserted": inserted, "timestamp": timestamp}
    self.broadcast(payload)
connection.onmessage = function (event) {
  var data = JSON.parse(event.data);
  // Counter is a single row table, so every update should contain one insert and
  // maybe one delete (which we don't care about)
  document.getElementById('counter').innerHTML = data.inserted[0][0];
};
function convert_to_subject(row) { return { subject: row[0], count: parseInt(row[1]) }; }
function subject_in_array(e, arr) { return arr.find((i) => i.subject === e.subject && i.count === e.count); }
connection.onmessage = function (event) { var data = JSON.parse(event.data); var insert_values = data.inserted.map(convert_to_subject); var delete_values = data.deleted.map(convert_to_subject); var changeSet = vega .changeset() .insert(insert_values) .remove((d) => subject_in_array(d, delete_values));
chart.view.change('data', changeSet).resize().run();
}; });
The Materialize team participated in Advent of Code 2023 and took a bold approach in using SQL to solve each puzzle. Check it out.
-- Parse the problem input into tabular form.
lines(line TEXT) AS ( .. ),
-- SQL leading up to part 1.
part1(part1 BIGINT) AS ( .. ),
-- SQL leading up to part 2.
part2(part2 BIGINT) AS ( .. )
SELECT * FROM part1, part2;
The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
Consider your entire calibration document. What is the sum of all of the calibration values?
Your calculation isn't quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".
Equipped with this new information, you now need to find the real first and last digit on each line.
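As a rough illustration of how part 1 might slot into the skeleton above (a sketch, not the team's published solution, and it assumes regexp_match is available): pick out the first and last digit on each line, glue them together, and sum. Part 2 would additionally map spelled-out digits like "one" and "two" onto their numeric forms before doing the same extraction.

-- Sketch of part 1 only; not the published solution.
part1(part1 BIGINT) AS (
    SELECT SUM((
        (regexp_match(line, '^[^0-9]*([0-9])'))[1] ||
        (regexp_match(line, '([0-9])[^0-9]*$'))[1]
    )::BIGINT)
    FROM lines
),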
Day 1 was brought to you by: @chass, @def-, @doy-materialize, @frankmcsherry, @josharenberg, @morsapaes, @nrainer-materialize
Given a table with the following format:
Day 2 was brought to you by: @def-, @frankmcsherry, @morsapaes
Day 3 was brought to you by: @frankmcsherry, @morsapaes
Day 4 was brought to you by: @chass, @doy-materialize, @frankmcsherry, @morsapaes
Day 5 was brought to you by: @doy-materialize, @frankmcsherry, @nrainer-materialize
Day 6 was brought to you by: @doy-materialize, @frankmcsherry, @nrainer-materialize, @petrosagg
Day 7 was brought to you by: @frankmcsherry, @nrainer-materialize
Day 8 was brought to you by: @doy-materialize, @frankmcsherry, @nrainer-materialize
Day 9 was brought to you by: @frankmcsherry, @nrainer-materialize
Day 10 was brought to you by: @frankmcsherry
Day 11 was brought to you by: @frankmcsherry, @nrainer-materialize
Day 12 was brought to you by: @frankmcsherry, @nrainer-materialize
Day 13 was brought to you by: @frankmcsherry, @nrainer-materialize
Day 14 was brought to you by: @frankmcsherry
Day 15 was brought to you by: @frankmcsherry, @nrainer-materialize
Day 16 was brought to you by: @frankmcsherry
Day 17 was brought to you by: @frankmcsherry
Day 18 was brought to you by: @frankmcsherry
Day 19 was brought to you by: @frankmcsherry
Day 20 was brought to you by: @frankmcsherry
Day 21 was brought to you by: @frankmcsherry
Day 22 was brought to you by: @frankmcsherry
Day 23 was brought to you by: @frankmcsherry
Day 24 was brought to you by: @frankmcsherry
Day 25 was brought to you by: @frankmcsherry
An in-depth breakdown of how we architected and built a native MySQL CDC source
Learn how we built an in-browser SQL shell that empowers Materialize users to interact with their databases
Export a snapshot of your data to Amazon S3 object storage as an intermediary to sink data to a broader set of systems downstream
An illustration of the unexpectedly high downstream cost of clever optimizations to change data capture.
Explore how Materialize overcomes key microservices challenges like data silos, network fan-out, and reconvergence issues. Learn how database-level transformations unlock real-time, consistent, and efficient operations in microservices architectures.
Change Data Capture (CDC) is finally gaining widespread adoption as an architectural primitive. Why now?
Here we set the context for and propose a change data capture protocol: a means of writing down and reading back changes to data.
// one record is "updated"
(record1, time1, -1)
(record2, time1, +1)
// two records are deleted
(record0, time2, -1)
(record2, time2, -1)
/// Frontier through which `Self` has reported updates.
///
/// All updates not beyond this frontier have been reported.
/// Any information related to times not beyond this frontier can be discarded.
///
/// This frontier tracks the meet of `progress_frontier` and `updates_frontier`,
/// our two bounds on potential uncertainty in progress and update messages.
reported_frontier: Antichain<T>,
/// Updates that have been received, but are still beyond `reported_frontier`.
///
/// These updates are retained both so that they can eventually be transmitted
/// and so that they can deduplicate updates that may still be received.
updates: std::collections::HashSet<(D, T, R)>,
/// Frontier of accepted progress statements.
///
/// All progress message counts for times not beyond this frontier have been
/// incorporated into `updates_frontier`. This frontier also guides which
/// received progress statements can be incorporated: those for which
/// this frontier is beyond their lower bound.
progress_frontier: Antichain<T>,
/// Counts of outstanding messages at times.
///
/// These counts track the difference between message counts at times announced
/// by progress messages, and message counts at times received in distinct updates.
updates_frontier: MutableAntichain<T>,
/// Progress statements that are not yet actionable due to out-of-orderedness.
///
/// A progress statement becomes actionable once the progress frontier is beyond
/// its lower frontier. This ensures that the [0, lower) interval is already
/// covered, and that we will not leave a gap by incorporating the counts
/// and reflecting the progress statement's upper frontier.
progress_queue: Vec<Progress<T>>,
}
// Drain actionable progress messages.
unimplemented!()
// Determine if the lower bound of `progress_frontier` and `updates_frontier` has advanced.
// If so, we can determine and return a batch of updates and a newly advanced frontier.
unimplemented!()
}
// If we've exhausted our iterator, we have nothing to say.
None
If you're familiar with data warehouses, this article will help you understand Materialize Clusters in relation to well-known components in Snowflake.
Arjun Narayan introduces the CMU DB group to streaming databases, the problems they solve, and specific architectural decisions in Materialize.
Recently, I've felt the pain of long Rust compile times at Materialize, and so was motivated to improve them a bit. Here's how I did it.
Materialize & Confluent partnership offers SQL on Kafka capabilities for efficient data team integration.
Comprehensive guide on using PostgreSQL's write-ahead log as a data source for Materialize, with technical insights & benefits.
START_REPLICATION slot_name;
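For context on the PostgreSQL side of this setup: a logical replication slot is created first, and a replication client then streams changes from it. A sketch using standard PostgreSQL functions (the slot name and plugin here are illustrative):

-- Create a logical replication slot (slot name and output plugin are illustrative).
SELECT * FROM pg_create_logical_replication_slot('materialize_slot', 'pgoutput');
-- A replication connection can then stream from it, e.g.
-- START_REPLICATION SLOT materialize_slot LOGICAL 0/0;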
Understand the necessary consistency guarantees for a streaming data platform & how they ensure accurate data views.
Let's demonstrate the unique features of Materialize by building the core functionality of a customer data platform.
const client = new Client({ user: MATERIALIZE_USERNAME, password: MATERIALIZE_PASSWORD, host: MATERIALIZE_HOST, port: 6875, database: 'materialize', ssl: true });
async function main() { await client.connect(); const res = await client.query("SELECT * FROM cdp_users WHERE uuid = 'ABC123'"); console.log(res.rows); }
main();
Let's explore why many teams rely on PostgreSQL for analytics, the challenges they face, and how Materialize solves these problems.
WITH latest_orders AS ( SELECT * FROM orders WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }}) ), updated_totals AS ( SELECT customer_id, SUM(order_total) AS total_sales FROM latest_orders GROUP BY customer_id ), existing_totals AS ( SELECT customer_id, total_sales FROM {{ this }} WHERE customer_id NOT IN (SELECT customer_id FROM updated_totals) ) SELECT * FROM updated_totals UNION ALL SELECT * FROM existing_totals;
SELECT customer_id, SUM(order_total) AS total_sales FROM orders GROUP BY customer_id;
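In Materialize, that last query is all you need: wrapping it in a materialized view keeps the totals incrementally up to date as orders change, with none of the incremental-model bookkeeping. A sketch (the view name here is ours):

-- Illustrative: Materialize maintains this aggregate as new orders arrive.
CREATE MATERIALIZED VIEW customer_total_sales AS
SELECT customer_id, SUM(order_total) AS total_sales
FROM orders
GROUP BY customer_id;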
Let's demonstrate how to manage streaming SQL in Materialize with dbt by porting the classic dbt jaffle-shop demo scenario to the world of streaming.
target: dev
materialize=> SHOW MATERIALIZED VIEWS IN jaffle_shop;
name
dim_customers fct_orders raw_customers raw_orders raw_payments
materialize=> SELECT * FROM jaffle_shop.dim_customers WHERE customer_id = 1;
customer_id | first_order | most_recent_order | number_of_orders | customer_lifetime_value ------------+-------------+-------------------+------------------+------------------------- 1 | 2018-01-01 | 2018-02-10 | 2 | 33
Insight into SQL subquery optimization & how Materialize's approach differs from other databases, enhancing query performance.
// Filter out null posts.user_id
// (Materialize doesn't understand foreign key constraints yet)
%2 =
| Get jamie.public.posts (u5)
| Filter !(isnull(#1))
// Join %1 and %2 on users.id = posts.user_id
// Group by users.id and count distinct posts.content
%3 =
| Join %1 %2 (= #0 #2)
| | implementation = Differential %2 %1.(#0)
| | demand = (#0, #3)
| Filter !(isnull(#0))
| Reduce group=(#0)
| | agg count(distinct #3)
// Request an index on users.id
// (Materialize doesn't understand unique keys yet, so doesn't realize this index is redundant)
%4 =
| Get jamie.public.users (u3)
| ArrangeBy (#0)
// Find values of users.id for which there are no posts and assign count 0
%5 =
| Get %3
| Negate
| Project (#0)
%6 =
| Union %5 %0
| Map 0
// Union the zero counts and the non-zero counts
%7 =
| Union %3 %6
// Join the results against users to recover row counts that were erased by the group-by above
// (Materialize doesn't understand unique keys yet, so doesn't realize this join is redundant)
%8 =
| Join %4 %7 (= #0 #2)
| | implementation = Differential %7 %4.(#0)
| | demand = (#0, #3)
| Project (#0, #3)
Materialize has a subtly different cost model that is a huge advantage for operational workloads that need fresh data.
Decrease your data warehouse costs by sinking precomputed results and leveraging real-time analytics.
Understand how to optimize joins with indexes and late materialization.
Let's build (in Python) the Differential Dataflow framework at the heart of Materialize, and explain what it's doing along the way.
[((1, 1), 1), ((2, 4), 1), ((3, 9), 1), ((4, 16), 1), ((5, 25), 1)]
def geometric_series(collection): return ( collection.map(lambda data: data * 2) .concat(collection) .filter(lambda data: data <= 50) .map(lambda data: (data, ())) .distinct() .map(lambda data: data[0]) .consolidate() )
output = input_a.iterate(geometric_series).debug("iterate").connect_reader()
graph = graph_builder.finalize()
while output.probe_frontier_less_than(Version(1)): graph.step()
output = input_a.iterate(example).connect_reader()
graph = graph_builder.finalize()
input_a_writer.send_data(Version(0), Collection([(1, 1)]))
input_a_writer.send_frontier(Antichain([Version(1)]))
while output.probe_frontier_less_than(Antichain([Version(1)])): graph.step()
Learn how recursive SQL provides an elegant solution for a fundamental use case in economics - stable matching.
Understand why eventual consistency isn't suitable for streaming systems & the systematic errors it can cause with Materialize's insights.
-- select out surprisingly large values
select data.key, data.value from data, stats_by_key where data.key = stats_by_key.key and data.value > average + 3 * deviation
// Delayed map from values back to their keys.
let input2 = data.delay(|t| t + 1).map(|(key, val)| (val, key));
// Observe any results
input2.semijoin(&input).inspect(|x| println!("KEY: {:?}", x));
-- select out surprisingly large values
select data.key, data.value from data, stats_by_key where data.key = stats_by_key.key and data.value > average + 3 * deviation
// These collections should always be empty.
let errors_a = histogram1a.concat(histogram2a.negate());
let errors_b = histogram1b.concat(histogram3b.negate());
let errors_c = histogram2c.concat(histogram3c.negate());
A breakdown of how we built the Materialize Fivetran Destination with Fivetran's Partner SDK, and how this unlocks new workflows in Materialize.
In this blog, we'll explain the different roles of analytical and operational data warehouses in building real-time fraud detection systems.
At the heart of freshness in Materialize is autonomous proactive work, done in response to the arrival of data rather than waiting for a user command.
Differential dataflow uses simple linear operators such as map, filter, and flat_map, as well as more complex operators like explode and temporal filters. But, with some thinking, we can generalize them all to a restricted form of join.
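A loose SQL analogy of that idea (not the post's own formulation): a filter over a collection can be rewritten as a join against a relation that holds exactly the values passing the predicate, which is the sense in which these operators reduce to a restricted join. The relation name here is hypothetical.

-- These two produce the same rows, assuming allowed_values holds one row
-- per distinct value with value <= 50.
SELECT data.* FROM data WHERE data.value <= 50;
SELECT data.* FROM data JOIN allowed_values USING (value);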
Insights on how indexes impact scaling in databases & their evolution in streaming-first data warehouses.
-- 15 million rows
INSERT INTO contacts SELECT 'Kelly' as name, generate_series(650000000, 665500000) as phone, 1 as prefix;
CREATE INDEX contacts_name_idx ON contacts (name); CREATE INDEX contacts_phone_idx ON contacts (phone);
ANALYZE contacts;
QUERY PLAN
Seq Scan on contacts (cost=0.00..277533.14 rows=15499931 width=14) Filter: (name = 'Kelly'::text)
QUERY PLAN
Index Scan using contacts_phone_idx on contacts (cost=0.43..8.45 rows=1 width=14) Index Cond: (phone = 2)
count | name | dataflow_name -------+------------------------+---------------------------------- 3 | ArrangeBy[[Column(1)]] | Dataflow: 1.3.contacts_phone_idx
Efficient SQL data transformations & real-time analytics with dbt + Materialize: a powerful operational data warehouse combo.
Materialize Cloud integrates with Tailscale, offering secure & easy connection of clusters to private networks using WireGuard protocol.
Materialize offers a streaming data warehouse for real-time analytics & interoperability with millisecond latency, revolutionizing data handling.
IVMRs can deliver 1000x performance for read-heavy workloads, without losing freshness, and do so at a fraction of the price of a traditional replica.
Debezium and Materialize can be used as powerful tools for joining high-volume streams of data from Kafka and tables from databases.
CREATE SOURCE users FROM KAFKA BROKER 'kafka:9092' TOPIC 'mysql.shop.users' FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081' ENVELOPE DEBEZIUM;
Comprehensive guide to implementing joins in Materialize, covering binary to delta joins for efficient streaming systems.
Time: 12.927 ms materialize=>
In principle, it is possible to use Kafka as a database. But in doing so you will confront every hard problem that database management systems have faced for decades.
Or why clear consistency guarantees are how to stay sane when programming distributed systems.
Comprehensive guide to using Materialize's LATERAL join for efficient query patterns in incremental view maintenance engines.
INSERT INTO cities VALUES ('Los_Angeles', 'CA', 3979576), ('Phoenix', 'AZ', 1680992), ('Houston', 'TX', 2320268), ('San_Diego', 'CA', 1423851), ('San_Francisco', 'CA', 881549), ('New_York', 'NY', 8336817), ('Dallas', 'TX', 1343573), ('San_Antonio', 'TX', 1547253), ('San_Jose', 'CA', 1021795), ('Chicago', 'IL', 2695598), ('Austin', 'TX', 978908);
-- same query as above, but starting from queries.
-- also, we materialize a view to build a dataflow.
CREATE MATERIALIZED VIEW top_3s AS
SELECT state, name FROM
-- for each distinct state we are asked about ...
(SELECT DISTINCT state FROM queries) states,
-- ... extract the top 3 cities by population.
LATERAL (
SELECT name, pop
FROM cities
WHERE state = states.state
ORDER BY pop
DESC LIMIT 3
);
materialize=>
materialize=>
%1 = | Get materialize.public.cities (u8544)
%2 = | Join %0 %1 (= #0 #2) | | implementation = Differential %1 %0.(#0) | | demand = (#0, #1, #3) | TopK group=(#0) order=(#3 desc) limit=3 offset=0 | Project (#0, #1)
%1 = | Get materialize.public.queries (u8548) | Distinct group=(#0) | ArrangeBy (#0)
%2 = | Get materialize.public.cities (u8544)
%3 = | Join %1 %2 (= #0 #2) | | implementation = Differential %2 %1.(#0) | | demand = (#0, #1, #3) | TopK group=(#0) order=(#3 desc) limit=3 offset=0
%4 = | Join %0 %3 (= #0 #2) | | implementation = Differential %3 %0.(#0) | | demand = (#0, #1, #3) | Project (#1, #0, #3)
What is a Data Application? How do they help our customers? What new challenges do we face when building Data Apps? Here's our perspective.
How to write algorithms in differential dataflow, using Conway's Game of Life as an example.
use std::collections::HashSet;

fn intersection(first: &[i32], second: &[i32]) -> Vec<i32> {
let mut output = Vec::new();
let first_set: HashSet<_> = first.iter().cloned().collect();
let second_set: HashSet<_> = second.iter().cloned().collect();
for element in first_set.iter() {
if second_set.contains(element) {
output.push(*element);
}
}
output
}
// Send some sample data to our dataflow
for i in 0..10 {
// Advance time to i
first.advance_to(i);
second.advance_to(i);
for x in i..(i + 10) {
first.insert(x);
second.insert(x + 5);
}
}
})
let result = initial.iterate(|input| {
let successors = input.map(|(x, _)| x + 1).map(|x| {
let str = if x % 3 == 0 && x % 5 == 0 {
"FizzBuzz"
} else if x % 5 == 0 {
"Buzz"
} else if x % 3 == 0 {
"Fizz"
} else {
""
};
(x, str.to_string())
});
let output = input.concat(&successors).distinct();
output.filter(|(x, _)| *x <= 100)
});
result
.inspect(|(x, time, m)| println!("x: {:?} time: {:?} multiplicity: {}", x, time, m));
});
})
let live_with_three_neighbors = maybe_live_cells
.filter(|(_, count)| *count == 3)
.map(|(cell, _)| cell);
let live_with_two_neighbors = maybe_live_cells
.filter(|(_, count)| *count == 2)
.semijoin(&live)
.map(|(cell, _)| cell);
let live_next_round = live_with_two_neighbors
.concat(&live_with_three_neighbors)
.distinct();
live_next_round
})
Real-time apps for Boston Transit with live data are easy to set up using Materialize; see two examples you can run at home.
CREATE MATERIALIZED SOURCE mbta_stops FROM FILE '/workdir/workspace/MBTA_GTFS/stops.txt' FORMAT CSV WITH HEADER;
CREATE MATERIALIZED SOURCE mbta_routes FROM FILE '/workdir/workspace/MBTA_GTFS/routes.txt' FORMAT CSV WITH HEADER;
SELECT * FROM south_from_kendall ORDER BY departure_time;
CREATE MATERIALIZED VIEW parsed_all_trip as SELECT trip_id, payload->'attributes'->>'bikes_allowed' bikes_allowed, CAST(CAST(payload->'attributes'->>'direction_id' AS DECIMAL(5,1)) AS INT) direction_id, payload->'attributes'->>'headsign' headsign, payload->'attributes'->>'wheelchair_accessible' wheelchair_accessible, payload->'relationships'->'route'->'data'->>'id' route_id, payload->'relationships'->'route_pattern'->'data'->>'id' route_pattern_id, payload->'relationships'->'service'->'data'->>'id' service_id, payload->'relationships'->'shape'->'data'->>'id' shape_id FROM (SELECT key0 as trip_id, cast ("text" as jsonb) AS payload FROM all_trip);
CREATE MATERIALIZED VIEW parsed_all_vehicles as SELECT vehicle_id, payload->'attributes'->>'current_status' status, CAST(CAST(payload->'attributes'->>'direction_id' AS DECIMAL(5,1)) AS INT) direction_id, payload->'relationships'->'route'->'data'->>'id' route_id, payload->'relationships'->'stop'->'data'->>'id' stop_id, payload->'relationships'->'trip'->'data'->>'id' trip_id FROM (SELECT key0 as vehicle_id, cast ("text" as jsonb) AS payload FROM all_vehicles);
CREATE MATERIALIZED VIEW current_time_v AS SELECT max(to_timestamp(cast(text as int))) AS now FROM current_time;
CREATE INDEX countdown_stop_dir_rt ON countdown(stop_name, direction, route_name);
CREATE INDEX one_leg_stops ON one_leg_travel_time(origin, destination);
SELECT departure_time, arrival_time, headsign FROM one_leg_travel_time WHERE origin = 'Kendall/MIT' and destination = 'South Station' ORDER BY arrival_time;
In this blog, we'll examine the loan underwriting process, including the current landscape, credit modeling, and the move toward big data and SQL.
This blog will provide an overview of the different data architectures lenders use to power real-time loan underwriting.
Here's how to save money on your data warehouse bill with normalized data models and data mesh principles.
Efficiently maintain joins with shared arrangements & reduce resource usage with Materialize's innovative approach.
Discover how Materialize engineered its self-managed product to support flexible deployments, improve architecture, and meet diverse customer needs, all while refining its managed cloud service.
Insights on how Differential Dataflow manages & limits memory use for processing unbounded data streams, ensuring efficiency.
// Build a dataflow to present most recent values for keys.
worker.dataflow(|scope| {
use differential_dataflow::operators::reduce::Reduce;
// Determine the most recent inputs for each key.
input
.to_collection(scope)
.reduce(|_key, input, output| {
// Emit the last value with a count of 1
let max = input.last().unwrap();
output.push((*max.0, 1));
})
.probe_with(&mut probe);
});
loop {
// Refresh our view of elapsed time.
let elapsed = worker.timer().elapsed();
// Refresh the maximum gap between elapsed and completed times.
// Important: this varies based on rate; low rate ups the latency.
let completed = probe.with_frontier(|frontier| frontier[0]);
if max_latency < elapsed - completed {
max_latency = elapsed - completed;
}
// Report how large a gap we just experienced.
if input.time().as_secs() != elapsed.as_secs() {
println!("{:?}\tmax latency: {:?}", elapsed, max_latency);
}
// Insert any newly released requests.
while pause * req_counter < elapsed {
input.advance_to(pause * req_counter);
input.insert((0, pause * req_counter));
req_counter += worker.peers() as u32;
}
input.advance_to(elapsed);
input.flush();
// Take just one step! (perhaps we should take more)
worker.step();
}
// Build a dataflow to present most recent values for keys.
worker.dataflow(|scope| {
use differential_dataflow::operators::reduce::Reduce;
// Give input its own name to re-use later.
let input = input.to_collection(scope);
// Determine the most recent inputs for each key.
let results = input
.reduce(|_key, input, output| {
// Emit the last value with a count of 1
let max = input.last().unwrap();
output.push((*max.0, 1));
})
.probe_with(&mut probe);
// Retract any input not present in the output.
let retractions = input.concat(&results.negate());
});
use differential_dataflow::operators::reduce::Reduce;
use differential_dataflow::operators::iterate::Variable;
// Prepare some delayed feedback from the output.
// Explanation of `delay` deferred for the moment.
let delay = Duration::from_nanos(delay_ns);
let retractions = Variable::new(scope, delay);
// Give input its own name to re-use later.
let input = input.to_collection(scope);
// Determine the results minus any retractions.
let results = input
.concat(&retractions.negate())
.reduce(|_key, input, output| {
let max = input.last().unwrap();
output.push((*max.0, max.1));
})
.probe_with(&mut probe);
// Retract any input that is not an output.
retractions.set(&input.concat(&results.negate()));
});
Using dbt to manage and document a streaming analytics workflow from a message broker to Metabase.
We reduced memory requirements for many users by nearly 2x, resulting in significant cost-savings.
Materialize aims to be usable by anyone who knows SQL, but for those interested in going deeper and understanding the architecture powering Materialize, this post is for you!
Materialize is now partners with Snowflake. Celebrate with us next week at Snowflake Data Cloud Summit.
Materialize costs 1/20th what Aurora PostgreSQL read replicas cost, when you have non-trivial business logic.
Materialize Beta offers insights on a cloud data warehouse with real-time streaming capabilities for immediate action on current data.
Materialize Cloud, now in open beta, offers real-time data warehousing for immediate insights & action on live data.
Read about how we give back to the open source community through our Community Sponsorship Program.
Connect headless BI tool Cube.js to the read-side of Materialize to get REST/GraphQL APIs, authentication, metrics modelling, and more out of the box.
services: materialize: image: materialize/materialized:v0.26.1 ports: - 6875:6875
seed: image: jbergknoff/postgresql-client volumes: - .:/seed entrypoint: ['sh', 'seed/seed.sh'] depends_on: - materialize
cube: image: cubejs/cube:latest ports: - 4000:4000 environment: - CUBEJS_DEV_MODE=true - CUBEJS_DB_TYPE=materialize - CUBEJS_DB_HOST=materialize - CUBEJS_DB_PORT=6875 - CUBEJS_DB_NAME=materialize - CUBEJS_DB_USER=materialize - CUBEJS_API_SECRET=SECRET volumes: - .:/cube/conf depends_on: - seed
cat > seed.sql << EOL
CREATE SOURCE hn_raw FROM PUBNUB SUBSCRIBE KEY 'sub-c-c00db4fc-a1e7-11e6-8bfd-0619f8945a4f' CHANNEL 'hacker-news';
CREATE VIEW hn AS SELECT (item::jsonb)->>'link' AS link, (item::jsonb)->>'comments' AS comments, (item::jsonb)->>'title' AS title, ((item::jsonb)->>'rank')::int AS rank FROM ( SELECT jsonb_array_elements(text::jsonb) AS item FROM hn_raw );
CREATE MATERIALIZED VIEW hn_top AS SELECT link, comments, title, MIN(rank) AS rank FROM hn GROUP BY 1, 2, 3;
EOL
psql -U materialize -h materialize -p 6875 materialize -f ./seed.sql
refreshKey: {
every: '1 second'
},
measures: {
count: {
type: `count`
},
countTop3: {
type: `count`,
filters: [
{
sql: `${rank} <= 3`
}
]
},
bestRank: {
sql: `rank`,
type: `min`
}
},
dimensions: {
link: {
sql: `link`,
type: `string`
},
comments: {
sql: `comments`,
type: `string`
},
title: {
sql: `title`,
type: `string`
},
rank: {
sql: `rank`,
type: `number`
}
},
segments: {
show: {
sql: `${title} LIKE 'Show HN:%'`
}
}
});
Materialize & Datalot collaborate on cutting-edge real-time application development, leveraging streaming data for immediate insights & action.
Here's a step-by-step walkthrough of how to use the Materialize Emulator.
Issue a SQL query to get started. Need help?
View documentation: https://materialize.com/s/docs
Join our Slack community: https://materialize.com/s/chat
psql (15.7 (Ubuntu 15.7-0ubuntu0.23.10.1), server 9.5.0) Type "help" for help.
materialize=>
postgres=# CREATE PUBLICATION mz_source FOR ALL TABLES; CREATE PUBLICATION postgres=# CREATE TABLE t (f1 INTEGER); CREATE TABLE postgres=# ALTER TABLE t REPLICA IDENTITY FULL; ALTER TABLE postgres=# INSERT INTO t VALUES (1), (2), (3); INSERT 0 3
1000400050002 (1 row) Time: 40.362 ms
PREF="${PWD##*/}"
wait_for_health() { echo -n "waiting for container '$PREF-$1' to be healthy" while [ "$(docker inspect -f '{{.State.Health.Status}}' "$PREF-$1")" != "healthy" ]; do echo -n "." sleep 1 done printf "\ncontainer '%s' is healthy\n" "$PREF-$1" }
cat > docker-compose.yml <<EOF version: '3.8' services: materialized: image: materialize/materialized:latest container_name: $PREF-materialized environment: MZ_SYSTEM_PARAMETER_DEFAULT: "enable_copy_to_expr=true" networks: - network ports: - "127.0.0.1:6875:6875" - "127.0.0.1:6876:6876" healthcheck: test: ["CMD", "curl", "-f", "localhost:6878/api/readyz"] interval: 1s start_period: 60s
postgres: image: postgres:latest container_name: $PREF-postgres environment: POSTGRES_PASSWORD: postgres POSTGRES_INITDB_ARGS: "-c wal_level=logical" networks: - network ports: - "127.0.0.1:5432:5432" healthcheck: test: ["CMD", "pg_isready", "-d", "db_prod"] interval: 1s start_period: 60s
mysql: image: mysql:latest container_name: $PREF-mysql environment: MYSQL_ROOT_PASSWORD: mysql networks: - network ports: - "127.0.0.1:3306:3306" command: - "--log-bin=mysql-bin" - "--gtid_mode=ON" - "--enforce_gtid_consistency=ON" - "--binlog-format=row" - "--binlog-row-image=full" healthcheck: test: ["CMD", "mysqladmin", "ping", "--password=mysql", "--protocol=TCP"] interval: 1s start_period: 60s
redpanda: image: vectorized/redpanda:latest container_name: $PREF-redpanda networks: - network ports: - "127.0.0.1:9092:9092" - "127.0.0.1:8081:8081" command: - "redpanda" - "start" - "--overprovisioned" - "--smp=1" - "--memory=1G" - "--reserve-memory=0M" - "--node-id=0" - "--check=false" - "--set" - "redpanda.enable_transactions=true" - "--set" - "redpanda.enable_idempotence=true" - "--set" - "--advertise-kafka-addr=redpanda:9092" healthcheck: test: ["CMD", "curl", "-f", "localhost:9644/v1/status/ready"] interval: 1s start_period: 60s
minio: image: minio/minio:latest container_name: $PREF-minio environment: MINIO_STORAGE_CLASS_STANDARD: "EC:0" networks: - network ports: - "127.0.0.1:9000:9000" - "127.0.0.1:9001:9001" entrypoint: ["sh", "-c"] command: ["mkdir -p /data/$PREF && minio server /data --console-address :9001"] healthcheck: test: ["CMD", "curl", "-f", "localhost:9000/minio/health/live"] interval: 1s start_period: 60s
networks: network: driver: bridge EOF docker compose down || true docker compose up -d
wait_for_health postgres psql postgres://postgres:[email protected]:5432/postgres <<EOF CREATE PUBLICATION mz_source FOR ALL TABLES; CREATE TABLE pg_table (f1 INTEGER); ALTER TABLE pg_table REPLICA IDENTITY FULL; INSERT INTO pg_table VALUES (1), (2), (3); EOF
wait_for_health mysql mysql --protocol=tcp --user=root --password=mysql <<EOF CREATE DATABASE public; USE public; CREATE TABLE mysql_table (f1 INTEGER); INSERT INTO mysql_table VALUES (1), (2), (3); EOF
wait_for_health redpanda docker compose exec -T redpanda rpk topic create redpanda_table docker compose exec -T redpanda rpk topic produce redpanda_table <<EOF {"f1": 1} {"f1": 2} {"f1": 3} EOF
wait_for_health materialized psql postgres://[email protected]:6875/materialize <<EOF -- Create a Postgres source CREATE SECRET pgpass AS 'postgres'; CREATE CONNECTION pg TO POSTGRES ( HOST '$PREF-postgres', DATABASE postgres, USER postgres, PASSWORD SECRET pgpass ); CREATE SOURCE mz_source FROM POSTGRES CONNECTION pg ( PUBLICATION 'mz_source' ) FOR SCHEMAS (public);
-- Create a MySQL source CREATE SECRET mysqlpass AS 'mysql'; CREATE CONNECTION mysql TO MYSQL ( HOST '$PREF-mysql', USER root, PASSWORD SECRET mysqlpass ); CREATE SOURCE mysql_source FROM MYSQL CONNECTION mysql FOR ALL TABLES;
-- Create a Webhook source CREATE SOURCE webhook_table FROM WEBHOOK BODY FORMAT TEXT;
-- Create a Redpanda (Kafka-compatible) source CREATE CONNECTION kafka_conn TO KAFKA ( BROKER '$PREF-redpanda:9092', SECURITY PROTOCOL PLAINTEXT ); CREATE CONNECTION csr_conn TO CONFLUENT SCHEMA REGISTRY ( URL 'http://$PREF-redpanda:8081/' ); CREATE SOURCE redpanda_table FROM KAFKA CONNECTION kafka_conn ( TOPIC 'redpanda_table' ) FORMAT JSON;
-- Simple materialized view, incrementally updated, with data from all sources CREATE MATERIALIZED VIEW mv AS SELECT sum(pg_table.f1 + mysql_table.f1 + webhook_table.body::int + (redpanda_table.data->'f1')::int) FROM pg_table JOIN mysql_table ON TRUE JOIN webhook_table ON TRUE JOIN redpanda_table ON TRUE;
-- Create a sink to Redpanda so that the topic will always be up to date CREATE SINK sink FROM mv INTO KAFKA CONNECTION kafka_conn (TOPIC 'mv') FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn ENVELOPE DEBEZIUM;
-- One-off export of our materialized view to S3-compatible MinIO CREATE SECRET miniopass AS 'minioadmin'; CREATE CONNECTION minio TO AWS ( ENDPOINT 'http://minio:9000', REGION 'minio', ACCESS KEY ID 'minioadmin', SECRET ACCESS KEY SECRET miniopass ); COPY (SELECT * FROM mv) TO 's3://$PREF/mv' WITH ( AWS CONNECTION = minio, FORMAT = 'csv' );
-- Allow HTTP API read requests without a token CREATE ROLE anonymous_http_user; GRANT SELECT ON TABLE mv TO anonymous_http_user; EOF
curl -d "1" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table curl -d "2" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table curl -d "3" -X POST http://127.0.0.1:6876/api/webhook/materialize/public/webhook_table
docker compose exec -T redpanda rpk topic consume mv --num 1
docker compose exec -T minio mc ls data/mzemulator/mv
psql postgres://[email protected]:6875/materialize <<EOF SELECT * FROM pg_table; SELECT * FROM mysql_table; SELECT * FROM webhook_table; SELECT * FROM redpanda_table; SELECT * FROM mv; EOF
curl -s -X POST -H "Content-Type: application/json" \
--data '{"queries": [{"query": "SELECT * FROM mv"}]}' \
http://localhost:6876/api/sql | jq -r ".results[0].rows[0][0]"
Materialize raises another round of funding to help build a cloud-native streaming data warehouse.
Materialize secures Series B funding to enhance its Operational Data Warehouse with real-time streaming capabilities for immediate data action.
Materialize's new cloud architecture enhances scalability & performance by breaking the materialized binary into separate services.
An in-depth look at Materialize, the Operational Data Warehouse with streaming capabilities for real-time data action.
In this guide, we'll show you how to migrate your existing PostgreSQL dbt project to Materialize with minimal SQL tweaks.
SELECT customer_id, SUM(order_total) AS total_revenue FROM orders GROUP BY customer_id;
WITH latest_orders AS ( SELECT * FROM {{ source('public', 'orders') }} WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01'::timestamp) FROM {{ this }}) ),
updated_customers AS ( SELECT customer_id, SUM(order_total) AS total_revenue FROM latest_orders GROUP BY customer_id ),
existing_customers AS ( SELECT customer_id, total_revenue FROM {{ this }} WHERE customer_id NOT IN (SELECT customer_id FROM updated_customers) )
SELECT * FROM updated_customers UNION ALL SELECT * FROM existing_customers
SELECT customer_id, SUM(order_total) AS total_revenue FROM orders GROUP BY customer_id;
SELECT customer_id, SUM(order_value) AS total_value FROM orders GROUP BY customer_id HAVING SUM(order_value) > 1000;
SELECT order_id, customer_id, order_total, order_date FROM orders WHERE order_date + INTERVAL '24 hours' >= mz_now();
Learn how replacing the legacy materialized view with a new element is transformational for your data stack.
A framework for understanding why and when to shift a workload from traditional cloud data warehouses to Materialize.
Access the freshest data in MySQL to power your operational workflows
Materialize welcomes new CEO Nate Stewart, who previously served on the Materialize board and comes to us from Cockroach Labs.
New names, new sizes, plus spill-to-disk capabilities
Today, we're excited to announce a product that we feel is transformational: a persistent, scalable, cloud-native Materialize.
In the following blog, we'll show you how to create real-time alerts using Materialize's integration with Novu.
INSERT INTO materialize.auction.auction_alerts VALUES ('expensive pizza', 90, 'Best Pizza in Town' ), ('all art', 0, 'Custom Art');
CREATE VIEW active_alerts AS SELECT alert_name, id as auction_id, item_name, amount as price FROM ( SELECT id, item, amount FROM materialize.auction.winning_bids ) p, LATERAL ( SELECT price_above, item_name, alert_name FROM materialize.auction.auction_alerts a WHERE a.item_name = p.item AND a.price_above <= p.amount );
CREATE INDEX active_alerts_idx ON active_alerts (alert_name) WITH (RETAIN HISTORY FOR '1hr');
To showcase the power of an ODS, we've developed a demo for an e-commerce company, based on a dynamic pricing use case.
]]>promotion_effect AS (
SELECT
p.product_id,
min(pr.promotion_discount) AS promotion_discount
FROM public.promotions AS pr
INNER JOIN public.products AS p ON pr.product_id = p.product_id
WHERE pr.active = TRUE
GROUP BY p.product_id
),
popularity_score AS (
SELECT
s.product_id,
rank() OVER (PARTITION BY p.category_id ORDER BY count(s.sale_id) DESC) AS popularity_rank,
count(s.sale_id) AS sale_count
FROM public.sales AS s
INNER JOIN public.products AS p ON s.product_id = p.product_id
GROUP BY s.product_id, p.category_id
),
inventory_status AS (
SELECT
i.product_id,
sum(i.stock) AS total_stock,
rank() OVER (ORDER BY sum(i.stock) DESC) AS stock_rank
FROM public.inventory AS i
GROUP BY i.product_id
),
high_demand_products AS (
SELECT
p.product_id,
avg(s.sale_price) AS avg_sale_price,
count(s.sale_id) AS total_sales
FROM public.products AS p
INNER JOIN public.sales AS s ON p.product_id = s.product_id
GROUP BY p.product_id
HAVING count(s.sale_id) > (SELECT avg(total_sales) FROM (SELECT count(*) AS total_sales FROM public.sales GROUP BY product_id) AS subquery)
),
dynamic_pricing AS (
SELECT
p.product_id,
p.base_price,
CASE
WHEN pop.popularity_rank <= 3 THEN 1.2
WHEN pop.popularity_rank BETWEEN 4 AND 10 THEN 1.1
ELSE 0.9
END AS popularity_adjustment,
rp.avg_price,
coalesce(1.0 - (pe.promotion_discount / 100), 1) AS promotion_discount,
CASE
WHEN inv.stock_rank <= 3 THEN 1.1
WHEN inv.stock_rank BETWEEN 4 AND 10 THEN 1.05
ELSE 1
END AS stock_adjustment,
CASE
WHEN p.base_price > rp.avg_price THEN 1 + (p.base_price - rp.avg_price) / rp.avg_price
ELSE 1 - (rp.avg_price - p.base_price) / rp.avg_price
END AS demand_multiplier,
hd.avg_sale_price,
CASE
WHEN p.product_name ILIKE '%cheap%' THEN 0.8
ELSE 1.0
END AS additional_discount
FROM public.products AS p
LEFT JOIN recent_prices AS rp ON p.product_id = rp.product_id
LEFT JOIN promotion_effect AS pe ON p.product_id = pe.product_id
INNER JOIN popularity_score AS pop ON p.product_id = pop.product_id
LEFT JOIN inventory_status AS inv ON p.product_id = inv.product_id
LEFT JOIN high_demand_products AS hd ON p.product_id = hd.product_id
)
SELECT dp.product_id, round(dp.base_price * dp.popularity_adjustment * dp.stock_adjustment * dp.demand_multiplier, 2) AS adjusted_price, round(dp.base_price * dp.popularity_adjustment * dp.stock_adjustment * dp.demand_multiplier * dp.promotion_discount * dp.additional_discount, 2) AS discounted_price FROM dynamic_pricing AS dp;
ALTER TABLE public.inventory ADD CONSTRAINT inventory_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id); ALTER TABLE public.promotions ADD CONSTRAINT promotions_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id); ALTER TABLE public.sales ADD CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);
CREATE INDEX idx_products_product_name ON products (product_name); CREATE INDEX idx_sales_product_id ON sales (product_id); CREATE INDEX idx_sales_sale_date ON sales (sale_date); CREATE INDEX idx_sales_product_id_sale_date ON sales (product_id, sale_date); CREATE INDEX idx_promotions_product_id ON promotions (product_id); CREATE INDEX idx_promotions_active ON promotions (active); CREATE INDEX idx_promotions_product_id_active ON promotions (product_id, active); CREATE INDEX idx_inventory_product_id ON inventory (product_id);
There are many different methods for OLTP offload, and in the following blog, we will examine the most popular options.
Read the following blog to learn about OLTP vs. OLAP, problems with complex OLTP workloads, and the case for OLTP offload.
Materialize's approach to data processing & view maintenance offers real-time insights for immediate action on live data.
Take a guided tour through Materialize's three pillars of product value, and see how we think about providing value for your operational workloads.
Materialize's consistency guarantees are key for confidence in data warehouses. Understand the benefits & see real-world tests in action.
-- Maintain the credits owed by each account.
CREATE MATERIALIZED VIEW debits AS SELECT buyer, SUM(amount) AS total FROM winning_bids GROUP BY buyer;
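The balance view below also reads from a credits relation that isn't shown in this excerpt; by symmetry with debits, it is presumably maintained along these lines (a sketch, not necessarily the post's exact definition):

-- Sketch: maintain the credits earned by each account, mirroring debits.
CREATE MATERIALIZED VIEW credits AS SELECT seller, SUM(amount) AS total FROM winning_bids GROUP BY seller;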
-- Maintain the net balance for each account.
CREATE VIEW balance AS SELECT coalesce(seller, buyer) as id, coalesce(credits.total, 0) - coalesce(debits.total, 0) AS total FROM credits FULL OUTER JOIN debits ON(credits.seller = debits.buyer);
-- This will always equal zero.
SELECT SUM(total) FROM balance;
In this post, we'll build a recipe for a generic live data source using standard SQL primitives and some Materialize magic.
Time: 52.580 ms
Time: 19.711 ms
Time: 87428.589 ms (01:27.429)
Time: 61.283 ms
9755 (1 row)
Time: 602.481 ms materialize=> select seller, count(*) from potential_flips group by seller order by count(*) desc limit 5; seller | count --------+------- 42091 | 7 42518 | 6 10529 | 6 39840 | 6 49317 | 6 (5 rows)
Time: 678.330 ms
This is now pretty interactive, using scant resources, over enough data and through complex views that to start from scratch would be exhausting. However, maintained indexes keep intermediate results up to date, and you get the same results as if re-run from scratch, just without the latency.
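Concretely, "maintained indexes" here means indexing the views once so their results stay up to date and resident in memory; the interactive queries above then read precomputed answers. The post's own index definitions aren't shown in this excerpt, but the shape is simply:

-- Illustrative: keep potential_flips maintained and resident in memory.
CREATE DEFAULT INDEX ON potential_flips;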
1716312983129 1 0
-- Supporting view to translate ids into text.
CREATE VIEW items (id, item) AS VALUES (0, 'Signed Memorabilia'), (1, 'City Bar Crawl'), (2, 'Best Pizza in Town'), (3, 'Gift Basket'), (4, 'Custom Art');
-- Each year-long interval of interest
CREATE VIEW years AS SELECT * FROM generate_series( '1970-01-01 00:00:00+00', '2099-01-01 00:00:00+00', '1 year') year WHERE mz_now() BETWEEN year AND year + '1 year' + '1 day';
-- Each day-long interval of interest
CREATE VIEW days AS SELECT * FROM ( SELECT generate_series(year, year + '1 year' - '1 day'::interval, '1 day') as day FROM years UNION ALL SELECT * FROM empty ) WHERE mz_now() BETWEEN day AND day + '1 day' + '1 day';
-- Each hour-long interval of interest
CREATE VIEW hours AS SELECT * FROM ( SELECT generate_series(day, day + '1 day' - '1 hour'::interval, '1 hour') as hour FROM days UNION ALL SELECT * FROM empty ) WHERE mz_now() BETWEEN hour AND hour + '1 hour' + '1 day';
-- Each minute-long interval of interest
CREATE VIEW minutes AS SELECT * FROM ( SELECT generate_series(hour, hour + '1 hour' - '1 minute'::interval, '1 minute') AS minute FROM hours UNION ALL SELECT * FROM empty ) WHERE mz_now() BETWEEN minute AND minute + '1 minute' + '1 day';
-- Any second-long interval of interest
CREATE VIEW seconds AS SELECT * FROM ( SELECT generate_series(minute, minute + '1 minute' - '1 second'::interval, '1 second') as second FROM minutes UNION ALL SELECT * FROM empty ) WHERE mz_now() BETWEEN second AND second + '1 second' + '1 day';
-- Indexes are important to ensure we expand intervals carefully.
CREATE DEFAULT INDEX ON years; CREATE DEFAULT INDEX ON days; CREATE DEFAULT INDEX ON hours; CREATE DEFAULT INDEX ON minutes; CREATE DEFAULT INDEX ON seconds;
-- The final view we'll want to use.
CREATE VIEW moments AS SELECT second AS moment FROM seconds WHERE mz_now() >= second AND mz_now() < second + '1 day';
-- Extract pseudorandom bytes from each moment.
CREATE VIEW random AS SELECT moment, digest(moment::text, 'md5') as random FROM moments;
-- Present as auction
CREATE VIEW auctions_core AS SELECT moment, random, get_byte(random, 0) + get_byte(random, 1) * 256 + get_byte(random, 2) * 65536 as id, get_byte(random, 3) + get_byte(random, 4) * 256 as seller, get_byte(random, 5) as item,
-- Have each auction expire after up to 256 minutes.
moment + (get_byte(random, 6)::text || ' minutes')::interval as end_time FROM random;
-- Refine and materialize auction data.
CREATE MATERIALIZED VIEW auctions AS SELECT auctions_core.id, seller, items.item, end_time FROM auctions_core, items WHERE auctions_core.item % 5 = items.id;
-- Create and materialize bid data.
CREATE MATERIALIZED VIEW bids AS
-- Establish per-bid records and randomness.
WITH prework AS ( SELECT id AS auction_id, moment as auction_start, end_time as auction_end, digest(random::text || generate_series(1, get_byte(random, 5))::text, 'md5') as random FROM auctions_core ) SELECT get_byte(random, 0) + get_byte(random, 1) * 256 + get_byte(random, 2) * 65536 as id, get_byte(random, 3) + get_byte(random, 4) * 256 AS buyer, auction_id, get_byte(random, 5)::numeric AS amount, auction_start + (get_byte(random, 6)::text || ' minutes')::interval as bid_time FROM prework;
Operational data stores maintained real-time data, and allowed access to denormalized data across databases. But why don't you see that pattern much any more?
Under-resourced small data teams can now leverage a SaaS solution with streaming data and SQL support to build real-time applications.
We've built Materialize as a new kind of data warehouse, optimized to handle operational data work with the same familiar process from analytical warehouses.
In this tutorial, we'll connect Oracle CDC to Materialize in just a few minutes using Estuary Flow's Dekaf.
CREATE CONNECTION estuary_connection TO KAFKA ( BROKER 'dekaf.estuary.dev', SECURITY PROTOCOL = 'SASL_SSL', SASL MECHANISMS = 'PLAIN', SASL USERNAME = '{}', SASL PASSWORD = SECRET estuary_refresh_token );
CREATE CONNECTION csr_estuary_connection TO CONFLUENT SCHEMA REGISTRY ( URL 'https://dekaf.estuary.dev', USERNAME = '{}', PASSWORD = SECRET estuary_refresh_token );
CREATE SOURCE sales_source FROM KAFKA CONNECTION estuary_connection (TOPIC '<name-of-your-flow-collection>') FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_estuary_connection ENVELOPE UPSERT;
CREATE INDEX idx_aggregated_sales ON aggregated_sales(total_sales);
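The index above presupposes an aggregation view defined over the new source; hypothetically, it might look something like the following (the view body and column names are illustrative, not from the tutorial):

-- Hypothetical shape of the view indexed above; column names are illustrative.
CREATE MATERIALIZED VIEW aggregated_sales AS
SELECT product_id, SUM(amount) AS total_sales
FROM sales_source
GROUP BY product_id;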
Materialize is written in Rust. Why did we make that decision and how has it turned out for the project?
v.push("World"); // Now the compiler knows what the vector contains
println!("Hello {}", v[0]); // And can statically guarantee the type is something we can print!
match s.find("great") {
    Some(idx) => println!("substring: {}", &s[idx..idx + 5]),
    None => {
        // hmm, I didn't find the substring, so I'll have to handle it somehow
    }
}
// Add something to the vector
v.push(4);
// change something in the vector
*end = 3;
Materialize outperforms Aurora for complex queries over relatively small data volumes. Here are the benchmarks.
promotion_effect AS ( SELECT p.product_id, MIN(pr.promotion_discount) AS promotion_discount FROM promotions pr JOIN products p ON pr.product_id = p.product_id WHERE pr.active = TRUE GROUP BY p.product_id ),
popularity_score AS ( SELECT s.product_id, RANK() OVER (PARTITION BY p.category_id ORDER BY COUNT(s.sale_id) DESC) AS popularity_rank, COUNT(s.sale_id) AS sale_count FROM sales s JOIN products p ON s.product_id = p.product_id GROUP BY s.product_id, p.category_id ),
inventory_status AS ( SELECT i.product_id, SUM(i.stock) AS total_stock, RANK() OVER (ORDER BY SUM(i.stock) DESC) AS stock_rank FROM inventory i GROUP BY i.product_id ),
high_demand_products AS ( SELECT p.product_id, AVG(s.sale_price) AS avg_sale_price, COUNT(s.sale_id) AS total_sales FROM products p JOIN sales s ON p.product_id = s.product_id GROUP BY p.product_id HAVING COUNT(s.sale_id) > (SELECT AVG(total_sales) FROM (SELECT COUNT(*) AS total_sales FROM sales GROUP BY product_id) subquery) ),
dynamic_pricing AS ( SELECT p.product_id, p.base_price, CASE WHEN pop.popularity_rank <= 3 THEN 1.2 WHEN pop.popularity_rank BETWEEN 4 AND 10 THEN 1.1 ELSE 0.9 END AS popularity_adjustment, rp.avg_price, COALESCE(1.0 - (pe.promotion_discount / 100), 1) AS promotion_discount, CASE WHEN inv.stock_rank <= 3 THEN 1.1 WHEN inv.stock_rank BETWEEN 4 AND 10 THEN 1.05 ELSE 1 END AS stock_adjustment, CASE WHEN p.base_price > rp.avg_price THEN 1 + (p.base_price - rp.avg_price) / rp.avg_price ELSE 1 - (rp.avg_price - p.base_price) / rp.avg_price END AS demand_multiplier, hd.avg_sale_price, CASE WHEN p.product_name ilike '%cheap%' THEN 0.8 ELSE 1.0 END AS additional_discount FROM products p LEFT JOIN recent_prices rp ON p.product_id = rp.product_id LEFT JOIN promotion_effect pe ON p.product_id = pe.product_id JOIN popularity_score pop ON p.product_id = pop.product_id LEFT JOIN inventory_status inv ON p.product_id = inv.product_id LEFT JOIN high_demand_products hd ON p.product_id = hd.product_id )
SELECT dp.product_id, dp.base_price * dp.popularity_adjustment * dp.promotion_discount * dp.stock_adjustment * dp.demand_multiplier * dp.additional_discount AS adjusted_price FROM dynamic_pricing dp;
ALTER TABLE categories ADD CONSTRAINT categories_pkey PRIMARY KEY (category_id);
ALTER TABLE suppliers ADD CONSTRAINT suppliers_pkey PRIMARY KEY (supplier_id);
ALTER TABLE sales ADD CONSTRAINT sales_pkey PRIMARY KEY (sale_id);
ALTER TABLE inventory ADD CONSTRAINT inventory_pkey PRIMARY KEY (inventory_id);
ALTER TABLE promotions ADD CONSTRAINT promotions_pkey PRIMARY KEY (promotion_id);
ALTER TABLE public.inventory ADD CONSTRAINT inventory_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);
ALTER TABLE public.promotions ADD CONSTRAINT promotions_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);
ALTER TABLE public.sales ADD CONSTRAINT sales_product_id_fkey FOREIGN KEY (product_id) REFERENCES public.products (product_id);
CREATE INDEX idx_products_product_name ON products(product_name); CREATE INDEX idx_sales_product_id ON sales(product_id); CREATE INDEX idx_sales_sale_date ON sales(sale_date); CREATE INDEX idx_sales_product_id_sale_date ON sales(product_id, sale_date); CREATE INDEX idx_promotions_product_id ON promotions(product_id); CREATE INDEX idx_promotions_active ON promotions(active); CREATE INDEX idx_promotions_product_id_active ON promotions(product_id, active); CREATE INDEX idx_inventory_product_id ON inventory(product_id);
CREATE TABLE categories ( category_id SERIAL PRIMARY KEY, category_name VARCHAR(255) NOT NULL );
CREATE TABLE suppliers ( supplier_id SERIAL PRIMARY KEY, supplier_name VARCHAR(255) NOT NULL );
CREATE TABLE sales ( sale_id SERIAL PRIMARY KEY, product_id INTEGER NOT NULL, sale_price NUMERIC(10, 2) NOT NULL, sale_date TIMESTAMP NOT NULL, price NUMERIC(10, 2) NOT NULL );
CREATE TABLE inventory ( inventory_id SERIAL PRIMARY KEY, product_id INTEGER NOT NULL, stock INTEGER NOT NULL, warehouse_id INTEGER NOT NULL, restock_date TIMESTAMP NOT NULL );
CREATE TABLE promotions ( promotion_id SERIAL PRIMARY KEY, product_id INTEGER NOT NULL, promotion_discount NUMERIC(10, 2) NOT NULL, start_date TIMESTAMP NOT NULL, end_date TIMESTAMP NOT NULL, active BOOLEAN NOT NULL );
As an operational data store, Materialize is fundamentally different on the inside, but it's compatible with PostgreSQL in a few important ways.
Major updates to PostgreSQL streaming replication allow for real-time & incrementally updated materialized views with Materialize.
Master Materialize for enhanced scale, performance & power with key internal insights. A guide for aspiring power-users.
Here's how we developed frictionless private networking for Kafka by using librdkafka.
The following blog will show you how we keep our customers and developers happy with our rigorous QA process, including our tools and testing methods.
assert_eq!(
mz_sql::catalog::ObjectType::ClusterReplica,
conn_catalog.get_object_type(&ObjectId::ClusterReplica((
ClusterId::User(1),
ReplicaId::User(1)
)))
);
assert_eq!(
mz_sql::catalog::ObjectType::Role,
conn_catalog.get_object_type(&ObjectId::Role(RoleId::User(1)))
);
catalog.expire().await;
})
.await;
}
query error column "hello world" does not exist
select "hello world"
1 2
$ kafka-verify-data format=avro sink=materialize.public.sink sort-messages=true {"before": null, "after": {"a": 1}} {"before": null, "after": {"a": 2}}
def workflow_test(c: Composition):
    c.up("zookeeper", "kafka", "schema-registry", "materialized")
    c.run_testdrive_files("*.td")
def manipulate(self) -> list[Testdrive]: return [ Testdrive("> DELETE FROM delete_table WHERE f1 % 3 = 0;"), Testdrive("> DELETE FROM delete_table WHERE f1 % 3 = 1;") ]
def validate(self) -> Testdrive: return Testdrive( dedent( """ > SELECT COUNT(*), MIN(f1), MAX(f1), COUNT(f1), COUNT(DISTINCT f1) FROM delete_table GROUP BY f1 % 3; 3333 2 9998 3333 3333 """ ) )
]
> SELECT COUNT(*) > 0 FROM mz_internal.mz_source_statuses WHERE error LIKE '%Connection refused%'; true
def bootstrap(self) -> list[ActionOrFactory]: return super().bootstrap() + [PostgresStart]
def actions_with_weight(self) -> dict[ActionOrFactory, float]: return { CreatePostgresTable: 10, CreatePostgresCdcTable: 10, KillClusterd: 5, StoragedKill: 5, StoragedStart: 5, PostgresRestart: 10, CreateViewParameterized(): 10, ValidateView: 20, PostgresDML: 100, }
Now in Private Preview, Query History lets you monitor your SQL query performance to detect potential bottlenecks
Comprehensive RBAC for Materialize users ensures secure, production-grade environment management & access control.
CREATE DATABASE data_scientists_db; CREATE CLUSTER data_scientists_cluster SIZE = 'medium';
GRANT ALL PRIVILEGES ON DATABASE data_scientists_db TO data_scientists; GRANT ALL PRIVILEGES ON CLUSTER data_scientists_cluster TO data_scientists;
materialize=> INSERT INTO payments_db.public.purchase_history VALUES (42); ERROR: permission denied for TABLE "payments_db.public.purchase_history"
Build a real-time A/B testing stack with Segment, Kinesis and Materialize.
Real-time SQL monitoring & data quality tests with dbt & Materialize for continuous insights as data evolves.
tests:
  project:
    +store_failures: true
    +schema: test
materialize=> select * from public_test.not_null_stg_postgres__items_price; id | item | price | inventory ----+----------+-------+----------- 5 | NEW_ITEM | | (1 row)
materialize=> select * from public_test.dim_items_accepted_values; value_field | n_records -------------+----------- NEW_ITEM | 1 (1 row)
materialize=> select * from public_test.etl_alert; view_name | n_records ------------------------------------+----------- not_null_stg_postgres__items_price | 1 dim_items_accepted_values | 1
materialize=*> FETCH all c; mz_timestamp | mz_diff | view_name | n_records ---------------+---------+------------------------------------+----------- 1657555763000 | -1 | not_null_stg_postgres__items_price | 1 (1 row)
materialize=> select * from public.dim_users where id = 256; id | email | is_vip | revenue | orders | items_sold | last_purchase_ts | first_purchase_ts | pageviews | last_pageview_ts | first_pageview_ts -----+--------------------------+--------+---------+--------+------------+----------------------------+----------------------------+-----------+------------------------+------------------------ 256 | [email protected] | f | 2993.59 | 6 | 16 | 2022-07-14 14:18:50.849612 | 2022-07-14 14:10:26.434826 | 76 | 2022-07-14 14:18:50+00 | 2022-07-14 14:07:42+00 256 | [email protected] | f | 2993.59 | 6 | 16 | 2022-07-14 14:18:50.849612 | 2022-07-14 14:10:26.434826 | 156 | 2022-07-14 14:23:53+00 | 2022-07-14 14:11:46+00 (2 rows)
Explore strategies for unleashing real-time dbt, from materializing views to leveraging micro-batches and incrementally maintained views.
Materialize provides a real-time feature store that updates dimensions with new data instantly & maintains speed & accuracy.
Small data teams can't wait to build real-time data architectures. Find out why, and how they're approaching the problem.
Understanding recursion in Materialize & its significance in differential dataflow for SQL updates.
mcsherry=#
materialize=>
WITH MUTUALLY RECURSIVE
-- Ranges [lower, upper) that can be produced by symbol.
parses (lower int, upper int, symbol int) AS (
-- Base case: each literal is produced by some symbols.
SELECT pos, pos+1, lhs
FROM input, grammar_terms
WHERE input.lit = grammar_terms.lit
UNION
-- Recursive case: two adjacent parses that follow the grammar.
SELECT p1.lower, p2.upper, lhs
FROM parses p1, parses p2, grammar_nonts
WHERE p1.upper = p2.lower
AND p1.symbol = grammar_nonts.rhs1
AND p2.symbol = grammar_nonts.rhs2
)
SELECT * FROM parses;
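For context, the query above references three supporting relations whose shapes can be inferred from how their columns are used; a hedged sketch of what they might look like (the names come from the query, the column types are assumptions):

-- The input string: one row per position, holding the literal at that position.
CREATE TABLE input (pos INT, lit TEXT);
-- Terminal productions of the form lhs ::= lit.
CREATE TABLE grammar_terms (lhs INT, lit TEXT);
-- Nonterminal productions of the form lhs ::= rhs1 rhs2.
CREATE TABLE grammar_nonts (lhs INT, rhs1 INT, rhs2 INT);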
Support for recursive SQL queries in Materialize is now available.
]]>In this post, we’ll explore the difficulties of cache invalidation, how Materialize and Redis address them, and when this solution is most effective.
]]>Combining Redpanda Serverless with Materialize makes developing streaming data apps easier than ever before.
]]>With Materialize, teams can lower the cost of their data warehouse bill and implement new use cases.
]]>Discover how Materialize empowers intelligent agents to collaborate in real-time, ensuring cost-effective and efficient orchestration for autonomous systems. Transform the future of AI-powered ecosystems with fresh, consistent, and actionable insights.
]]>Four Takeaways from AWS re:Invent 2024
]]>Materialize's Release 0.6 enhances cloud data warehousing with real-time streaming capabilities for immediate action on live data.
]]>Comprehensive insights & updates in Materialize's Release 0.7, enhancing real-time data warehouse capabilities.
]]>Comprehensive insights & updates on Materialize's Release 0.8, enhancing real-time data warehousing capabilities for immediate action.
]]>Materialize's Release 0.9 introduces an Operational Data Warehouse optimized for real-time data actions & cloud efficiency.
]]>Materialize 0.3, an Operational Data Warehouse with cloud & streaming capabilities, optimizes real-time data action.
]]>Materialize 0.4 introduces an Operational Data Warehouse with real-time streaming capabilities for immediate data action & analysis.
]]>Materialize 0.5 operational data warehouse offers real-time action on live data for efficient & immediate insights.
]]>See how Materialize supports operational work while staying responsive.
]]>A guide to creating a streaming database with Materialize, from using a streaming framework to developing a scalable platform.
]]>Comprehensive guide to implementing robust reductions in Materialize, ensuring efficient & real-time data processing.
]]>Time: 1741.500 ms (00:01.742) materialize=>
SELECT passenger_count, MAX(fare_amount) ..
SELECT passenger_count, COUNT(DISTINCT trip_distance) ..
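As a hedged illustration of the kind of reduction being maintained (the ".." above elide the post's full statements, and the source name taxi_trips is an assumption), such an aggregate can be kept incrementally up to date as a materialized view:

CREATE MATERIALIZED VIEW max_fare_by_passenger_count AS
SELECT passenger_count, MAX(fare_amount) AS max_fare
FROM taxi_trips
GROUP BY passenger_count;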
]]>Harness the power of Rust for data-intensive tasks with Materialize, offering real-time insights & performance benefits.
]]>(&self, mut predicate: P) -> Stream<G, D> where P: FnMut(&D)->bool+'static { ... }
]]>Get early access to self-managed Materialize and run it within your private infrastructure. Meet governance and compliance needs, and deploy in any cloud with Materialize's real-time data transformation capabilities.
]]>A new way to run Materialize in the cloud for organizations with unique operational requirements. Join the Early Access program today!
]]>Comprehensive guide on slicing temporal aggregates with Materialize for real-time data analysis & actionable insights.
]]>If you're already familiar with stream processors, you may wonder: when is it better to use Materialize vs. a stream processor? And why?
]]>Real-time data streaming directly to your browser with Materialize's latest one-day project; understand the technical journey & outcomes.
]]>Real-time SQL query & view update subscriptions are made simple with Materialize's SUBSCRIBE feature.
]]>-- Windowed aggregation
CREATE MATERIALIZED VIEW avg_last_minute_temperature AS
SELECT
  DATE_TRUNC('second', to_timestamp(updated_at / 1000)) AS ts_second,
  AVG(temperature)
FROM temperatures
WHERE (updated_at + 60000) > mz_logical_timestamp()
GROUP BY ts_second;

-- Indexing view (Materializing) with a custom compaction
CREATE INDEX avg_last_minute_temperature_idx
ON avg_last_minute_temperature (ts_second)
WITH (logical_compaction_window = '1minute');
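To consume changes to this view from a client session, SUBSCRIBE can be wrapped in a cursor; a minimal sketch (the cursor name is arbitrary):

BEGIN;
DECLARE c CURSOR FOR SUBSCRIBE avg_last_minute_temperature;
-- Fetch the rows available so far; repeat FETCH to keep streaming updates.
FETCH ALL c;
COMMIT;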
]]>Experience real-time data analysis with Materialize on NYC taxi data, showcasing a practical application of streaming SQL.
]]>materialize=>
Time: 796.667 ms materialize=>
Time: 1741.500 ms (00:01.742) materialize=>
Time: 0.669 ms materialize=>
Time: 608.168 ms materialize=>
Time: 0.447 ms materialize=>
Time: 0.818 ms materialize=>
Time: 11.524 ms materialize=>
Time: 58.558 ms materialize=>
Time: 0.863 ms materialize=>
Time: 23.611 ms materialize=>
Time: 25.061 ms materialize=>
Time: 169.186 ms materialize=>
Time: 4.563 ms materialize=>
Time: 12.504 ms
materialize=> SELECT * FROM aggregates;
 passenger_count | MIN  |  MAX
-----------------+------+--------
                 |      |
               0 |  -90 |  40502
               1 | -800 | 907070
               2 | -498 | 214748
               3 | -498 | 349026
               4 | -415 |    974
               5 | -300 |   1271
               6 | -100 |    433
               7 |  -70 |    140
               8 |  -89 |    129
               9 |    0 |    110
              96 |    6 |      6
             192 |    6 |      6
(13 rows)
Time: 0.935 ms materialize=>
Time: 1.067 ms materialize=>
Time: 14.764 ms materialize=>
]]>Enhance your data workflows with Redpanda & Materialize for faster & more efficient streaming analytics. Get insights on integration & usage.
]]>SELECT * FROM hvu_test LIMIT 2;
]]>In this article, we will talk about one of the ways we approach the testing of the SQL engine of the product at Materialize. We hope to cover other modules and interesting angles in the future.
]]>Temporal filters give you a powerful SQL primitive for defining time-windowed computations over temporal data.
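The result snapshots below come from repeatedly querying a view whose WHERE clause references mz_logical_timestamp(); a minimal sketch of such a view, assuming an events(content, insert_ts, delete_ts) table and a short retention window (the post's exact predicate may differ):

CREATE MATERIALIZED VIEW valid AS
SELECT content, insert_ts, delete_ts
FROM events
WHERE mz_logical_timestamp() >= insert_ts
  AND mz_logical_timestamp() < delete_ts + 30000;

-- Snapshots like the ones below can then be taken with an ad-hoc query:
SELECT *, mz_logical_timestamp() FROM valid;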
 content | insert_ts | delete_ts | mz_logical_timestamp
---------+-----------+-----------+----------------------
(0 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 hello   | 1627380752528 | 1627380752528 | 1627380754223
 welcome | 1627380752530 | 1627380752530 | 1627380754223
 goodbye | 1627380752533 | 1627380752533 | 1627380754223
(3 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 hello   | 1627380752528 | 1627380752528 | 1627380755920
 welcome | 1627380752530 | 1627380752530 | 1627380755920
 goodbye | 1627380752533 | 1627380752533 | 1627380755920
(3 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 welcome | 1627380752530 | 1627380752530 | 1627380757989
 goodbye | 1627380752533 | 1627380752533 | 1627380757989
(2 rows)

 content |   insert_ts   |   delete_ts   | mz_logical_timestamp
---------+---------------+---------------+----------------------
 goodbye | 1627380752533 | 1627380752533 | 1627380762667
(1 row)

 content | insert_ts | delete_ts | mz_logical_timestamp
---------+-----------+-----------+----------------------
(0 rows)
]]>Materialize maintains an official Terraform Provider you can use to manage your clusters, replicas, connections and secrets as code.
]]>resource "materialize_cluster_replica" "cluster_replica_1" { name = "r1" cluster_name = materialize_cluster.cluster.name size = "medium" }
resource "materialize_cluster_replica" "cluster_replica_2" { name = "r2" cluster_name = materialize_cluster.cluster.name size = "medium" }
resource "aws_vpc_endpoint_service" "example" { acceptance_required = false allowed_principals = [data.aws_caller_identity.current.arn] gateway_load_balancer_arns = [aws_lb.example.arn] }
resource "materialize_connection_aws_privatelink" "example_privatelink_connection" { name = "example_privatelink_connection" service_name = aws_vpc_endpoint_service.example.service_name availability_zones = ["use1-az2", "use1-az6"] }
]]>Four questions, and their answers, to explain ACID transactions and how they are handled within Materialize.
]]>Comprehensive guide to implementing upserts in differential dataflow with Materialize for real-time data warehouse optimization & efficiency.
]]>for (key, mut list) in to_process.drain() {
// Maintains the prior value associated with the key.
let mut prev_value: Option<Tr::Val> = None;
// Attempt to find the key in the trace.
trace_cursor.seek_key(&trace_storage, &key);
if trace_cursor.get_key(&trace_storage) == Some(&key) {
// Determine the prior value associated with the key.
// There may be multiple historical values; we'll want the one
// that accumulates to a non-zero (ideally one) count.
while let Some(val) = trace_cursor.get_val(&trace_storage) {
let mut count = 0;
trace_cursor.map_times(&trace_storage, |_time, diff| count += *diff);
assert!(count == 0 || count == 1);
if count == 1 {
assert!(prev_value.is_none());
prev_value = Some(val.clone());
}
trace_cursor.step_val(&trace_storage);
}
trace_cursor.step_key(&trace_storage);
}
// Sort the list of upserts to `key` by their time, suppress multiple updates.
list.sort();
list.dedup_by(|(t1,_), (t2,_)| t1 == t2);
// Process distinct times; add updates into batch builder.
for (time, std::cmp::Reverse(next)) in list {
if prev_value != next {
if let Some(prev) = prev_value {
// A prior value exists, retract it!
builder.push((key.clone(), prev, time.clone(), -1));
}
if let Some(next) = next.as_ref() {
// A new value exists, introduce it!
builder.push((key.clone(), next.clone(), time.clone(), 1));
}
prev_value = next;
}
}
}
]]>Get complete visibility into your usage trends and billing history to manage your spend effectively.
]]>If you are familiar with materialized views and indexes from other databases, this article will help you apply that understanding to Materialize.
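As a minimal, hedged illustration of how the two concepts pair up in Materialize (the object names here are invented for the example):

-- A materialized view keeps its results incrementally up to date in durable storage.
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- An index additionally keeps those results in memory on a cluster for fast lookups.
CREATE INDEX order_totals_idx ON order_totals (customer_id);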
]]>The key to Materialize's ability to separate compute from storage and scale horizontally without sacrificing consistency is a concept called virtual time.
]]>Integrate Materialize with VS Code for schema exploration, SQL validation & query execution, all within your IDE for efficient development.
]]>Data Warehouses are great for many things but often misused for operational workloads.
]]>Today Materialize customers can create webhook sources, making it much easier to pipe in events from a long tail of SaaS platforms, services, and tools.
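A hedged sketch of creating one (the source and cluster names are assumptions; see the documentation for validation and authentication options):

CREATE SOURCE my_webhook_source
  IN CLUSTER my_cluster
  FROM WEBHOOK
  BODY FORMAT JSON;

Materialize then exposes an HTTPS endpoint for this source, and each event POSTed to it lands as a row you can query or join like any other relation.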
]]>Discover the essentials of real-time analytics databases, their benefits, and how they compare to traditional databases for better operational decision-making.
]]>Learn how an operational data warehouse enables organizations to use their freshest data for day-to-day decision-making
]]>Data freshness is essential for real-time business use cases. Here's how an operational data warehouse powers your business processes with fresh data.
]]>Comprehensive updates in Materialize Vol. 2: AWS roles, PostgreSQL enhancements, Schema Registry SSL, & more for streamlined data management.
]]>Stay updated with Materialize: Kafka source metadata, protobuf & schema registry integration, time bucketing, Metabase, cloud metrics & monitoring enhancements.
]]>An explanation of our rationale for why Materialize chose not to use RocksDB as its underlying storage engine.
]]>Discover how to reduce database query costs with Materialized Views. This guide will walk you through the benefits, creation process, and impact on database efficiency.
]]>Materialize can respond faster than your primary database, with results that are at least as fresh as your primary would provide.
]]>