BigQuery Connector

Use the BigQuery Connector to load data from BigQuery into Tinybird so that you can quickly turn it into high-concurrency, low-latency API Endpoints. You can load full tables or the result of an SQL query.

The BigQuery Connector is fully managed and requires no additional tooling. You can define a sync schedule inside Tinybird and execution is taken care of for you.

With the BigQuery Connector you can:

  • Connect to your BigQuery database with a handful of clicks. Select which tables to sync and set the schedule.
  • Use an SQL query to get the data you need from BigQuery and then run SQL queries on that data in Tinybird.
  • Use authentication tokens to control access to API endpoints. Implement access policies as you need, with support for row-level security.

Check the use case examples repository for examples of iterating BigQuery Data Sources using the Git integration.

The BigQuery Connector can't access BigQuery external tables, like connected Google Sheets. If you need this functionality, reach out to Tinybird support.

Prerequisites

You can switch the Tinybird CLI to the correct Workspace using tb workspace use <workspace_name>.

To use version control, connect your Tinybird Workspace with your repository, and set the CI/CD configuration. For testing purposes, use a different connection than in the main branches or Workspaces.

For instance, to create the connections in the main Branch or Workspace using the CLI:

tb auth # Use the main Workspace admin Token
tb connection create bigquery
# Prompts are interactive and ask you to insert the necessary information

You can only create connections in the main Workspace. Even when you create the connection in a Branch or as part of a Data Source creation flow, it's created in the main Workspace, and from there it's available to every Branch.

Load a BigQuery table

Load a BigQuery table in the UI

Open the Tinybird UI and add a new Data Source by clicking Create new (+) next to the Data Sources section on the left hand side navigation bar (see Mark 1 below).

In the modal, select the BigQuery option from the list of Data Sources.

The next modal screen shows the Connection details. Follow the instructions and configure access to your BigQuery. Access the GCP IAM Dashboard by selecting the IAM & Admin link, and use the provided principal name from this modal.

In the GCP IAM Dashboard, click the Grant Access button (see Mark 1 below).

In the box that appears on the right-hand side, paste the principal name you just copied into the New principals box (see Mark 1 below). Next, in the Role box, find and select the role BigQuery Data Viewer (see Mark 2 below).

Click Save to complete.

The principal should now be listed in the View By Principals list (see Mark 1 below).

OK! Now return to the Tinybird UI. In the modal, click Next (see Mark 1 below).

Note: It can take a few seconds for the GCP permissions to apply.

The next screen allows you to browse the tables available in BigQuery, and select the table you wish to load. Start by selecting the project that the table belongs to (see Mark 1 below), then the dataset (see Mark 2 below) and finally the table (see Mark 3 below). Finish by clicking Next (see Mark 4 below).

Note: The maximum allowed table size is 50 million rows. If the table exceeds that limit, the result is truncated.

You can now configure the schedule on which you wish to load data. You can configure a schedule in minutes, hours, or days by using the drop down selector, and set the value for the schedule in the text field (see Mark 1 below). The screenshot below shows a schedule of 10 minutes. Next, you can configure the Import Strategy. The strategy Replace data is selected by default (see Mark 2 below). Finish by clicking Next (see Mark 3 below).

Note: The shortest allowed interval between imports is 5 minutes.

The final screen of the modal shows you the interpreted schema of the table, which you can edit as needed. You can also modify what the Data Source in Tinybird will be called by changing the name at the top (see Mark 1 below). To finish, click Create Data Source (see Mark 2 below).

You are now on the Data Source data page, where you can view the data that has been loaded (see Mark 1 below) and a status chart showing executions of the loading schedule (see Mark 2 below).

Load a BigQuery table in the CLI

You need to create a connection before you can load a BigQuery table into Tinybird using the CLI. Creating a connection grants your Tinybird Workspace the appropriate permissions to view data from BigQuery.

Authenticate your CLI and switch to the desired Workspace. Then run:

tb connection create bigquery

The output of this command includes instructions to configure a GCP principal with read-only access to your data in BigQuery.

The instructions include the URL to access the appropriate page in GCP's IAM Dashboard.

Copy the principal name shown in the output.

In the GCP IAM Dashboard, select the Grant Access button (see Mark 1 below).

In the box that appears on the right-hand side, paste the principal name you just copied into the New principals box (see Mark 1 below). Next, in the Role box, find and select the role BigQuery Data Viewer (see Mark 2 below).

Click Save to complete.

The principal should now be listed in the View By Principals list (see Mark 1 below).

Note: It can take a few seconds for the GCP permissions to apply.

Once done, select yes (y) to create the connection. A new bigquery.connection file is created in your project files.

Note: At the moment, the .connection file isn't used and can't be pushed to Tinybird. It's safe to delete this file. A future release will allow you to push this file to Tinybird to automate the creation of connections, similar to Kafka connections.

Now that your connection is created, you can create a Data Source and configure the schedule to import data from BigQuery.

The BigQuery import is configured using the following options, which can be added at the end of your .datasource file:

  • IMPORT_SERVICE: name of the import service to use, in this case, bigquery
  • IMPORT_SCHEDULE: a cron expression (UTC) that sets how often imports run. The interval between runs must be at least 5 minutes, e.g. */5 * * * *
  • IMPORT_STRATEGY: the strategy to use when inserting data, either REPLACE or APPEND
  • IMPORT_EXTERNAL_DATASOURCE: (optional) the fully qualified name of the source table in BigQuery e.g. project.dataset.table
  • IMPORT_QUERY: (optional) the SELECT query to extract your data from BigQuery when you don't need all the columns or want to make a transformation before ingestion. The FROM must reference a table using the full scope: project.dataset.table

Both IMPORT_EXTERNAL_DATASOURCE and IMPORT_QUERY are optional, but you must provide one of them for the connector to work.

Note: For IMPORT_STRATEGY only REPLACE is supported today. The APPEND strategy will be enabled in a future release.
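
The rules above can be sketched as a small pre-push check. This is a minimal illustration, not part of the Tinybird CLI: the function name check_import_settings and the dictionary layout are hypothetical, and the schedule check only understands the simple */N minute form.

```python
import re

# Hypothetical helper, not part of the Tinybird CLI. It only encodes the
# rules listed above so mistakes surface before pushing the Data Source.
def check_import_settings(settings: dict) -> list:
    """Return a list of problems found in BigQuery import settings."""
    problems = []

    if settings.get("IMPORT_SERVICE") != "bigquery":
        problems.append("IMPORT_SERVICE must be bigquery")

    # Only REPLACE is supported today; APPEND is planned for a later release.
    if settings.get("IMPORT_STRATEGY") != "REPLACE":
        problems.append("IMPORT_STRATEGY must be REPLACE")

    # Both options are optional on their own, but one of them is required.
    if not (settings.get("IMPORT_EXTERNAL_DATASOURCE") or settings.get("IMPORT_QUERY")):
        problems.append("set IMPORT_EXTERNAL_DATASOURCE or IMPORT_QUERY")

    fields = settings.get("IMPORT_SCHEDULE", "").split()
    if len(fields) != 5:
        problems.append("IMPORT_SCHEDULE must be a five-field cron expression")
    else:
        # Simplification: only `*` and `*/N` in the minute field are checked.
        # Lists and ranges such as `0,2,4` are not analysed here.
        step = re.fullmatch(r"\*/(\d+)", fields[0])
        if fields[0] == "*" or (step and int(step.group(1)) < 5):
            problems.append("IMPORT_SCHEDULE must leave at least 5 minutes between runs")

    return problems


settings = {
    "IMPORT_SERVICE": "bigquery",
    "IMPORT_SCHEDULE": "*/5 * * * *",
    "IMPORT_STRATEGY": "REPLACE",
    "IMPORT_EXTERNAL_DATASOURCE": "mydb.raw.events",
}
print(check_import_settings(settings))  # []
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a single pass, for example in a CI step.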

For example, a complete .datasource file with BigQuery import settings:

bigquery.datasource file
DESCRIPTION >
    bigquery demo data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `id` Integer `json:$.id`,
    `orderid` LowCardinality(String) `json:$.orderid`,
    `status` LowCardinality(String) `json:$.status`,
    `amount` Integer `json:$.amount`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_SERVICE bigquery
IMPORT_SCHEDULE */5 * * * *
IMPORT_EXTERNAL_DATASOURCE mydb.raw.events
IMPORT_STRATEGY REPLACE
IMPORT_QUERY >
    select
        timestamp,
        id,
        orderid,
        status,
        amount
    from
        mydb.raw.events

The columns you select in the IMPORT_QUERY must match the columns defined in the Data Source schema. For example, if your Data Source has the columns ColumnA, ColumnB then your IMPORT_QUERY must contain SELECT ColumnA, ColumnB FROM .... A mismatch of columns causes data to arrive in the quarantine Data Source.
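
One way to catch a mismatch before data lands in quarantine is to compare the two column lists locally. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the schema parser expects column names in backticks at the start of each line (as in the example file), and the query parser only handles a plain column list without aliases, expressions, or *.

```python
import re

def schema_columns(schema_block: str) -> list:
    """Column names from a SCHEMA block, in order (names are in backticks)."""
    return re.findall(r"^\s*`(\w+)`", schema_block, flags=re.MULTILINE)

def query_columns(import_query: str) -> list:
    """Columns from a plain `select a, b from ...` query.

    Simplification: no aliases, expressions, or `*`.
    """
    match = re.search(
        r"select\s+(.*?)\s+from\s", import_query, flags=re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a SELECT ... FROM ... query")
    return [col.strip() for col in match.group(1).split(",")]

def column_mismatch(schema_cols: list, query_cols: list) -> tuple:
    """Return (missing from the query, not present in the schema)."""
    missing = [c for c in schema_cols if c not in query_cols]
    unexpected = [c for c in query_cols if c not in schema_cols]
    return missing, unexpected


schema = """
    `timestamp` DateTime `json:$.timestamp`,
    `id` Integer `json:$.id`,
    `status` LowCardinality(String) `json:$.status`
"""
query = "select timestamp, id from mydb.raw.events"
print(column_mismatch(schema_columns(schema), query_columns(query)))
# (['status'], [])
```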

With your connection created and Data Source defined, you can now push your project to Tinybird using:

tb push

The first import runs the next time the cron expression matches.
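
To estimate when that first run happens, the following sketch computes the next matching time for a schedule of the form */N * * * *. It's a simplified illustration that only covers that form, and the function name next_run is hypothetical. Times are in UTC, as the schedule is evaluated in UTC.

```python
from datetime import datetime, timedelta, timezone

def next_run(now: datetime, every_minutes: int) -> datetime:
    """Next time a `*/N * * * *` cron expression matches, strictly after `now`.

    In the minute field, `*/N` matches minutes 0, N, 2N, ... within each hour.
    """
    if not 1 <= every_minutes <= 59:
        raise ValueError("every_minutes must be between 1 and 59")
    # Start from the next whole minute and walk forward until one matches.
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute % every_minutes != 0:
        candidate += timedelta(minutes=1)
    return candidate


pushed_at = datetime(2024, 1, 1, 12, 3, 20, tzinfo=timezone.utc)
print(next_run(pushed_at, 5))  # 2024-01-01 12:05:00+00:00
```

With the */5 * * * * schedule from the example, a push at 12:03:20 UTC would see its first import at 12:05 UTC.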

Configure granular permissions

If you need to configure more granular permissions for BigQuery, you can always grant access at dataset or individual object level.

The first step is creating a new role in your IAM & Admin Console in GCP, and assigning the resourcemanager.projects.get permission.

The Connector needs this permission to list the available projects the generated Service Account has access to, so you can explore the BigQuery tables and views in the Tinybird UI.

After that, you can grant permissions to specific datasets to the Service Account by clicking on Sharing > Permissions:

Then ADD PRINCIPAL:

And finally paste the principal name copied earlier into the New principals box. Next, in the Role box, find and select the role BigQuery Data Viewer:

Now the Tinybird Connector UI only shows the specific resources you've granted permissions to.

Schema evolution

The BigQuery Connector supports backwards compatible changes made in the source table. This means that, if you add a new column in BigQuery, the next sync job will automatically add it to the Tinybird Data Source.

Non-backwards compatible changes, such as dropping or renaming columns, are not supported and will cause the next sync to fail.
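
The distinction can be sketched as a comparison of column names before and after a change in BigQuery. This is an illustration only: the function name is hypothetical, it looks at names alone, and it doesn't cover column type changes, which this page doesn't describe.

```python
def classify_schema_change(old_columns: list, new_columns: list) -> dict:
    """Compare source table columns before and after a change.

    Added columns are backwards compatible. A dropped column is not, and a
    rename shows up as one removed name plus one added name.
    """
    added = [c for c in new_columns if c not in old_columns]
    removed = [c for c in old_columns if c not in new_columns]
    return {"compatible": not removed, "added": added, "removed": removed}


before = ["timestamp", "id", "status"]

# Adding a column: the next sync adds it to the Tinybird Data Source.
print(classify_schema_change(before, before + ["amount"]))
# {'compatible': True, 'added': ['amount'], 'removed': []}

# Renaming `status` to `state`: the next sync is expected to fail.
print(classify_schema_change(before, ["timestamp", "id", "state"]))
# {'compatible': False, 'added': ['state'], 'removed': ['status']}
```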

Iterate a BigQuery Data Source

To iterate a BigQuery Data Source, use the Tinybird CLI and the version control integration to handle your resources.

You can only create connections in the main Workspace. When creating the connection in a Branch, it's created in the main Workspace and from there is available to every Branch.

Add a new BigQuery Data Source

You can add a new Data Source directly with the UI or the CLI tool by following the steps in the Load a BigQuery table section.

When adding a Data Source in a Tinybird Branch, it will work for testing purposes, but won't have any connection details internally. You must add the connection and BigQuery configuration in the .datasource Datafile when moving to production.

To add a new Data Source using the recommended version control workflow check the instructions in the examples repository.

Update a Data Source

  • BigQuery Data Sources can't be modified directly from the UI.
  • When you create a new Tinybird Branch, the existing BigQuery Data Sources won't be connected. You need to re-create them in the Branch.
  • In Branches, it's usually useful to work with fixtures, as they're applied as part of the CI/CD. This keeps the full process deterministic in every iteration and avoids consuming quota from external services.

BigQuery Data Sources can be modified from the CLI tool:

tb auth
# modify the .datasource Datafile with your editor
tb push --force {datafile}
# check the command output for errors

To update it using the recommended version control workflow check the instructions in the examples repository.

Delete a Data Source

BigQuery Data Sources can be deleted directly from the UI or the CLI, like any other Data Source.

To delete it using the recommended version control workflow check the instructions in the examples repository.

Logs

Job executions are logged in the datasources_ops_log Service Data Source. This log can be checked directly in the Data Source view page in the UI. Filter by datasource_id to monitor ingestion through the BigQuery Connector from the datasources_ops_log:

SELECT
  timestamp,
  event_type,
  result,
  error,
  job_id
FROM
  tinybird.datasources_ops_log
WHERE
  datasource_id = 't_1234'
AND
  event_type = 'replace'
ORDER BY timestamp DESC
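
To run the same check from a script, the query can be sent over HTTP. The sketch below only builds the request and rests on several assumptions to verify for your account: that the Tinybird Query API is reachable at /v0/sql with the query in the q parameter, that https://api.tinybird.co is the right host for your region, and that the token can read Service Data Sources. The function name ops_log_request is hypothetical.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: replace with the API host for your Tinybird region.
API_HOST = "https://api.tinybird.co"

def ops_log_request(datasource_id: str, token: str) -> Request:
    """Build (but do not send) a request for recent import jobs of one Data Source."""
    # Reject anything that is not a plain identifier before placing it in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", datasource_id):
        raise ValueError("unexpected characters in datasource_id")
    sql = (
        "SELECT timestamp, event_type, result, error, job_id "
        "FROM tinybird.datasources_ops_log "
        f"WHERE datasource_id = '{datasource_id}' AND event_type = 'replace' "
        "ORDER BY timestamp DESC "
        "FORMAT JSON"
    )
    url = f"{API_HOST}/v0/sql?{urlencode({'q': sql})}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


request = ops_log_request("t_1234", "<your token>")
print(request.full_url.split("?")[0])  # https://api.tinybird.co/v0/sql
```

Passing the returned object to urllib.request.urlopen sends the request. Building it separately keeps the query easy to inspect without network access.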

Limits

See BigQuery Connector limits.
