Migrate from Postgres

In this guide, you'll learn how to migrate events from Postgres to Tinybird so that you can begin building performant, real-time analytics over your event data.

Need to create a Tinybird account? It's free! Start here.

Prerequisites

You'll need a free Tinybird account and a Workspace.

At a high level

Postgres is an incredible general-purpose database, and it can even be extended to support columnar functionality for analytics.

That said, Tinybird can be a great alternative to Postgres extensions for a few reasons:

  • It uses ClickHouse® as its underlying database, which is one of the fastest real-time analytics databases in the world.
  • It provides additional services on top of the database - like an integrated API backend, ingestion load balancing, and native connectors - that keep you from having to spin up additional services and infrastructure for your analytics.

Tinybird is a data platform for data and engineering teams to solve complex real-time, operational, and user-facing analytics use cases at any scale, with end-to-end latency in milliseconds for streaming ingest and high QPS workloads.

It's a SQL-first analytics engine, purpose-built for the cloud, with real-time data ingest and full JOIN support. Native, managed ingest connectors make it easy to ingest data from a variety of sources. SQL queries can be published as production-grade, scalable REST APIs for public use or secured with JWTs.

Tinybird is a managed platform that scales transparently, requiring no cluster operations, shard management, or replica maintenance.

See how Tinybird is used by industry-leading companies today in the Customer Stories hub.

Follow these steps to migrate from Postgres to Tinybird

Below you'll find an example walkthrough migrating 100M rows of events data from Postgres to Tinybird. You can apply the same workflow to your existing Postgres instance. If at any point you get stuck and would like assistance with your migration, contact Tinybird at [email protected] or in the Slack Community.

The Postgres table

Suppose you have a table in Postgres that looks like this:

postgres=# CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ NOT NULL,
    user_id TEXT NOT NULL,
    session_id TEXT NOT NULL,
    action TEXT NOT NULL,
    version TEXT NOT NULL,
    payload TEXT NOT NULL
);

The table contains 100 million rows totalling about 15GB of data:

postgres=# SELECT pg_size_pretty(pg_relation_size('events')) AS size;
 size
-------
 15 GB
(1 row)

The table stores website click events, including an unstructured JSON payload column.

Setup

In your Postgres instance, create a user with read-only permissions on the table (or tables) you need to export:

postgres=# CREATE USER tb_read_user WITH PASSWORD '<password>';
postgres=# GRANT CONNECT ON DATABASE test_db TO tb_read_user;
postgres=# GRANT USAGE ON SCHEMA public TO tb_read_user;
postgres=# GRANT SELECT ON TABLE events TO tb_read_user;
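
To confirm the grants work, you can connect as the new user and run a quick query. This is a minimal check; the host, port, and database name below are placeholders for your own setup:

psql "postgresql://tb_read_user:<password>@your.postgres.host:5432/test_db" \
  -c "SELECT count(*) FROM events;"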

Limits

To perform this migration, we'll be running a series of Copy Jobs to incrementally migrate the events from Postgres to Tinybird. We break it up into chunks so as to remain under the limits of both Tinybird and Postgres.

There are two limits to take into account:

  1. Copy Pipe limits: Copy Pipes have a default max execution time of 20s on Build plans, 30s on Pro plans, and 30m on Enterprise plans. If you're on a Build or Pro plan and need to temporarily extend your limits to perform the migration, please reach out to us at [email protected].
  2. The max execution time of queries in Postgres, which is controlled by the statement_timeout setting. We recommend setting this value in Postgres equal (or close) to the max execution time of the Copy Pipe in Tinybird. For this example, we'll use three minutes:
postgres=# ALTER ROLE tb_read_user SET statement_timeout = '180000';  -- 3 minutes
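
Because ALTER ROLE ... SET only takes effect for new sessions, you can verify the value by opening a fresh connection as tb_read_user and checking the setting. The connection string is a placeholder for your own setup:

psql "postgresql://tb_read_user:<password>@your.postgres.host:5432/test_db" \
  -c "SHOW statement_timeout;"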

Create a local Tinybird project

Install Tinybird CLI, then create a new Data Project:

export TB_ADMIN_TOKEN=<YOUR_WORKSPACE_ADMIN_TOKEN>
export TB_HOST=https://api.us-east.aws.tinybird.co #replace with your host

tb auth --host $TB_HOST --token $TB_ADMIN_TOKEN
tb init

Create the target Data Source in Tinybird:

touch datasources/events.datasource

Define a schema that matches your Postgres schema, keeping in mind that Tinybird may use different data types. For our example:

# datasources/events.datasource
SCHEMA >
    `id` Int32,
    `timestamp` DateTime64(6),
    `user_id` String,
    `session_id` String,
    `action` String,
    `version` String,
    `payload` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYear(timestamp)"
ENGINE_SORTING_KEY "timestamp, session_id, user_id"

Push the Data Source to the Tinybird server:

tb push datasources/events.datasource
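
At this point the Data Source exists but is empty. As a quick check, you can list the Data Sources in your Workspace with the CLI (assuming your CLI version includes the ls subcommand):

tb datasource ls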

Backfilling your existing Postgres data

We're going to create a parameterized Copy Pipe to perform the initial backfill in chunks. We'll use a script to run the Copy Job on demand.

Storing secrets in Tinybird

Start by adding two secrets to Tinybird using the Environment Variables API. This will prevent hard-coded credentials in your Copy Pipe.

Create one for your Postgres username:

curl \
  -X POST "${TB_HOST}/v0/variables" \
  -H "Authorization: Bearer ${TB_ADMIN_TOKEN}" \
  -d "type=secret" \
  -d "name=tb_read_user" \
  -d "value=tb_read_user"

And one for the password:

curl \
  -X POST "${TB_HOST}/v0/variables" \
  -H "Authorization: Bearer ${TB_ADMIN_TOKEN}" \
  -d "type=secret" \
  -d "name=tb_read_password" \
  -d "value=<password>"

Define the Copy Pipe

Create a new Pipe:

touch pipes/backfill_postgres.pipe

And paste the following code, changing the host/port, database name, and table name to match your Postgres setup:

NODE migrate
SQL >
    %
    SELECT *
    FROM postgresql(
        'your.postgres.host:port',
        'your_postgres_database',
        'your_postgres_table_name',
        {{tb_secret('tb_read_user')}},
        {{tb_secret('tb_read_password')}},
        'public'
    )
    WHERE
        timestamp > {{DateTime(from_date, '2020-01-01 00:00:00')}} --adjust based on your data
        AND timestamp <= {{DateTime(to_date, '2020-01-01 00:00:01')}} --use a small default range

TYPE COPY
TARGET_DATASOURCE events

This uses the PostgreSQL Table Function to select data from the remote Postgres table. The timestamp filters are pushed down to Postgres, so each run incrementally queries your Postgres table and copies the matching rows into your events Data Source in Tinybird.

Push this Pipe to the server:

tb push pipes/backfill_postgres.pipe
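
Before launching the full backfill, it can be worth a quick smoke test using the Pipe's tiny default date range (the one-second window defined above); it validates connectivity and credentials while copying little or no data:

tb pipe copy run backfill_postgres --wait --yes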

Backfill in one go

Depending on the size of your Postgres table, you may be able to perform the migration in a single Copy Job.

For example, get the minimum timestamp from Postgres (and the current datetime):

postgres=# SELECT min(timestamp) FROM events;
          min
------------------------
 2023-01-01 00:00:00+00
(1 row)

❯ date -u +"%Y-%m-%d %H:%M:%S"
2024-08-29 10:20:57

And run the Copy Job with those parameters:

tb pipe copy run backfill_postgres --param from_date="2023-01-01 00:00:00" --param to_date="2024-08-29 10:20:57" --wait --yes

If it succeeds, you'll see something like this:

** Running backfill_postgres
** Copy to 'events' job created: https://api.us-east.aws.tinybird.co/v0/jobs/4dd482f9-168b-44f7-a4c9-d1b64fc9665d
** Copying data   [████████████████████████████████████]  100%
** Data copied to 'events'

And you'll be able to query the resulting Data Source:

tb sql "select count() from events"
-------------
|   count() |
-------------
| 100000000 |
-------------

tb sql "select count() as c, action from events group by action order by c asc" --stats
** Query took 0.228730096 seconds
** Rows read: 100,000,000
** Bytes read: 1.48 GB
-----------------------
|      c   | action   |
-----------------------
| 19996881 | logout   |
| 19997421 | signup   |
| 20000982 | purchase |
| 20001649 | view     |
| 20003067 | click    |
-----------------------

Note that Copy operations in Tinybird are atomic, so a bulk backfill will either succeed completely or fail with an error.

For instance, if the statement_timeout in Postgres is not large enough to export the table with a single query, you'll get an error like this:

** Copy to 'copy_migrate_events_from_pg' job created: https://api.us-east.aws.tinybird.co/v0/jobs/ec58749a-f4c3-4302-9236-f8036f0cb67b
** Copying data
Error:
** Failed creating copy job:
** Error while running job: There was a problem while copying data: [Error] Query cancelled due to statement timeout in postgres. Make sure you use a user with a proper statement timeout to run this type of query.

In this case you can try increasing the statement_timeout or backfill in chunks. As a reference, copying 100M rows from Postgres to Tinybird takes about 150s when Postgres and Tinybird are in the same cloud and region.

Internally, the Tinybird PostgreSQL Table Function uses a PostgreSQL COPY TO statement. You can tweak other Postgres settings if necessary, but it's usually not needed; refer to your Postgres provider or admin.

Backfilling in chunks

If you find that you're hitting the limits of either your Postgres or Tinybird's Copy Pipes, you can backfill in chunks.

First of all, make sure your Postgres table is indexed by the column you are filtering on, in this case timestamp:

postgres=# CREATE INDEX idx_events_timestamp ON events (timestamp);
postgres=# VACUUM ANALYZE events;

Then confirm that a query like the one sent from Tinybird uses the index (see the Index Scan in the plan below):

postgres=# explain select * from events where timestamp > '2024-01-01 00:00:00' and timestamp <= '2024-01-02 00:00:00';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_events_timestamp on events  (cost=0.57..607150.89 rows=151690 width=115)
   Index Cond: (("timestamp" > '2024-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2024-01-02 00:00:00+00'::timestamp with time zone))
 JIT:
   Functions: 2
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(5 rows)

Then run multiple Copy Jobs, adjusting the amount of data copied per job to stay within your Postgres statement timeout and Tinybird's max execution time. This is a trial-and-error process that depends on the granularity of your data.

For example, here's a migration script that first tries a full backfill, and if it fails uses daily chunks:

#!/bin/bash


HOST="YOUR_TB_HOST"
TOKEN="YOUR_TB_TOKEN"
PIPE_NAME="backfill_postgres"
FROM_DATE="2023-01-01 00:00:00"
TO_DATE="2024-08-31 00:00:00"
LOG_FILE="pipe_copy.log"

run_command() {
    local from_date="$1"
    local to_date="$2"
    echo "Copying from $from_date to $to_date" | tee -a $LOG_FILE

    if output=$(tb --host $HOST --token $TOKEN pipe copy run $PIPE_NAME --param from_date="$from_date" --param to_date="$to_date" --wait --yes 2>&1); then
        echo "Success $from_date - $to_date" | tee -a $LOG_FILE
        return 0
    else
        echo "Error $from_date - $to_date" | tee -a $LOG_FILE
        echo "Error detail: $output" | tee -a $LOG_FILE
        return 1
    fi
}

iterate_chunks() {
    local from_date="$1"
    local to_date="$2"

    local current_from="$from_date"
    local next_to=""

    while [[ "$(date -d "$current_from" +"%s")" -lt "$(date -d "$to_date" +"%s")" ]]; do
        # End of current day (23:59:59)
        next_to=$(date -d "$current_from +1 day -1 second" +"%Y-%m-%d")" 23:59:59"
        # Adjust next_to if it's bigger than to_date
        if [[ "$(date -d "$next_to" +"%s")" -ge "$(date -d "$to_date" +"%s")" ]]; then
            next_to="$to_date"
        fi

        # Create copy job for one single day
        if ! run_command "$current_from" "$next_to"; then
            echo "Error processing $current_from to $next_to"
            return 1
        fi

        # Go to next day (starting at 00:00:00)
        current_from=$(date -d "$(date -d "$current_from" +'%Y-%m-%d') +1 day $(date -d "$current_from" +'%H:%M:%S')" +'%Y-%m-%d %H:%M:%S')
    done
}

# Step 1: Try full backfill
echo "Running full backfill..." | tee -a $LOG_FILE

if ! run_command "$FROM_DATE" "$TO_DATE"; then
    echo "Full backfill failed, iterating in daily chunks..." | tee -a $LOG_FILE
    iterate_chunks "$FROM_DATE" "$TO_DATE"
fi

echo "Process completed." | tee -a $LOG_FILE

Using either a full backfill or backfilling in chunks, you can successfully migrate your data from Postgres to Tinybird.

Syncing new events from Postgres to Tinybird

The next step is keeping your Tinybird Data Source in sync with events in your Postgres as new events arrive.

The steps below show how to use Tinybird's PostgreSQL Table Function and scheduled Copy Jobs to continually sync data from Postgres to Tinybird. However, you should consider sending future events to Tinybird directly, using either the Events API or another streaming Data Source connector, as this is more resource efficient (and more real-time).
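
As a sketch of that direct approach, a new event could be appended to the same events Data Source with a single request to the Events API; the field values here are illustrative and should match your schema:

curl \
  -X POST "${TB_HOST}/v0/events?name=events" \
  -H "Authorization: Bearer ${TB_ADMIN_TOKEN}" \
  -d '{"id": 100000001, "timestamp": "2024-08-29 10:21:00.000000", "user_id": "user_42", "session_id": "session_99", "action": "click", "version": "1.0.0", "payload": "{\"page\": \"/pricing\"}"}'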

Create the incremental Copy Pipe

Create another Copy Pipe to perform the incremental syncs:

touch pipes/sync_events_from_pg.pipe

Paste in this code, again updating your Postgres connection details and setting the sync schedule you want. Note that the Copy Pipe limits apply here too.

NODE sync_from_pg
SQL >
    %
    SELECT *
    FROM postgresql(
        'your.postgres.host:port',
        'your_postgres_database',
        'your_postgres_table_name',
        {{tb_secret('tb_read_user')}},
        {{tb_secret('tb_read_password')}},
        'public'
    )
    WHERE timestamp > (SELECT max(timestamp) FROM events)

TYPE COPY
TARGET_DATASOURCE events
COPY_SCHEDULE */5 * * * *

Push this to the Tinybird server:

tb push pipes/sync_events_from_pg.pipe

It's important to complete the backfill operation before pushing the sync Pipe. The sync Pipe uses the latest timestamp in the Tinybird Data Source to perform a filtered select from Postgres; if you skip the backfill, every scheduled run will result in a full scan of your Postgres table.

Once you've pushed this Pipe, Tinybird will sync with your Postgres updates based on the schedule you set.
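
Once a few scheduled runs have completed, you can sanity-check that Tinybird is keeping up by comparing the latest timestamp on each side (connection details are placeholders, as before):

tb sql "select max(timestamp) from events"

psql "postgresql://tb_read_user:<password>@your.postgres.host:5432/test_db" \
  -c "SELECT max(timestamp) FROM events;"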

Next steps

If you'd like assistance with your migration, contact Tinybird at [email protected] or in the Community Slack.

  • Set up a free Tinybird account and build a working prototype: Sign up here.
  • Run through a quick example with your free account: Tinybird quick start.
  • Read the billing docs to understand plans and pricing on Tinybird.
Tinybird is not affiliated with, associated with, or sponsored by ClickHouse, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.