BigQuery Connector¶
Use the BigQuery Connector to load data from BigQuery into Tinybird so that you can quickly turn it into high-concurrency, low-latency API Endpoints. You can load full tables or the result of an SQL query.
The BigQuery Connector is fully managed and requires no additional tooling. You can define a sync schedule inside Tinybird and execution is taken care of for you.
With the BigQuery Connector you can:
- Connect to your BigQuery database with a handful of clicks. Select which tables to sync and set the schedule.
- Use an SQL query to get the data you need from BigQuery and then run SQL queries on that data in Tinybird.
- Use authentication tokens to control access to API endpoints. Implement access policies as you need, with support for row-level security.
Check the use case examples repository for examples of BigQuery Data Sources iteration using Git integration.
The BigQuery Connector can't access BigQuery external tables, like connected Google Sheets. If you need this functionality, reach out to [email protected].
Prerequisites¶
- Tinybird CLI. See the Tinybird CLI quick start.
- Tinybird CLI authenticated with the desired Workspace.
You can switch the Tinybird CLI to the correct Workspace using tb workspace use <workspace_name>.
To use version control, connect your Tinybird Workspace with your repository, and set the CI/CD configuration. For testing purposes, use a different connection than in the main branches or Workspaces.
For instance, to create the connection in the main branch or Workspace using the CLI:
tb auth # Use the main Workspace admin Token
tb connection create bigquery # Prompts are interactive and ask you to insert the necessary information
You can only create connections in the main Workspace. Even when creating the connection in the branch or as part of a Data Source creation flow, it's created in the main workspace and from there it's available for every branch.
Load a BigQuery table¶
Load a BigQuery table in the UI¶
Open the Tinybird UI and add a new Data Source by clicking Create new (+) next to the Data Sources section on the left hand side navigation bar (see Mark 1 below).
In the modal, select the BigQuery option from the list of Data Sources.
The next modal screen shows the Connection details. Follow the instructions and configure access to your BigQuery. Access the GCP IAM Dashboard by selecting the IAM & Admin link, and use the provided principal name from this modal.
In the GCP IAM Dashboard, click the Grant Access button (see Mark 1 below).
In the box that appears on the right-hand side, paste the principal name you just copied into the New principals box (see Mark 1 below). Next, in the Role box, find and select the role BigQuery Data Viewer (see Mark 2 below).
Click Save to complete.
The principal should now be listed in the View By Principals list (see Mark 1 below).
OK! Now return to the Tinybird UI. In the modal, click Next (see Mark 1 below).
Note: It can take a few seconds for the GCP permissions to apply.
The next screen allows you to browse the tables available in BigQuery, and select the table you wish to load. Start by selecting the project that the table belongs to (see Mark 1 below), then the dataset (see Mark 2 below) and finally the table (see Mark 3 below). Finish by clicking Next (see Mark 4 below).
Note: The maximum allowed table size is 50 million rows. If the table exceeds that limit, the result is truncated.
You can now configure the schedule on which you wish to load data. You can configure a schedule in minutes, hours, or days by using the drop down selector, and set the value for the schedule in the text field (see Mark 1 below). The screenshot below shows a schedule of 10 minutes. Next, you can configure the Import Strategy. The strategy Replace data is selected by default (see Mark 2 below). Finish by clicking Next (see Mark 3 below).
Note: Imports can run at most once every 5 minutes, so the shortest schedule you can set is 5 minutes.
The final screen of the modal shows you the interpreted schema of the table, which you can edit as needed. You can also modify what the Data Source in Tinybird will be called by changing the name at the top (see Mark 1 below). To finish, click Create Data Source (see Mark 2 below).
You are now on the Data Source data page, where you can view the data that has been loaded (see Mark 1 below) and a status chart showing executions of the loading schedule (see Mark 2 below).
Load a BigQuery table in the CLI¶
You need to create a connection before you can load a BigQuery table into Tinybird using the CLI. Creating a connection grants your Tinybird Workspace the appropriate permissions to view data from BigQuery.
Authenticate your CLI and switch to the desired Workspace. Then run:
tb connection create bigquery
The output of this command includes instructions to configure a GCP principal with read only access to your data in BigQuery.
The instructions include the URL to access the appropriate page in GCP's IAM Dashboard.
Copy the principal name shown in the output.
In the GCP IAM Dashboard, select the Grant Access button (see Mark 1 below).
In the box that appears on the right-hand side, paste the principal name you just copied into the New principals box (see Mark 1 below). Next, in the Role box, find and select the role BigQuery Data Viewer (see Mark 2 below).
Click Save to complete.
The principal should now be listed in the View By Principals list (see Mark 1 below).
Note: It can take a few seconds for the GCP permissions to apply.
Once done, select yes (y) to create the connection. A new bigquery.connection file is created in your project files.
Note: At the moment, the .connection file is not used and cannot be pushed to Tinybird. It is safe to delete this file. A future release will allow you to push this file to Tinybird to automate the creation of connections, similar to Kafka connections.
Now that your connection is created, you can create a Data Source and configure the schedule to import data from BigQuery.
The BigQuery import is configured using the following options, which can be added at the end of your .datasource file:
- IMPORT_SERVICE: name of the import service to use, in this case bigquery.
- IMPORT_SCHEDULE: a cron expression (UTC) that sets how often imports run. The interval must be at least 5 minutes, for example */5 * * * *.
- IMPORT_STRATEGY: the strategy to use when inserting data, either REPLACE or APPEND.
- IMPORT_EXTERNAL_DATASOURCE: (optional) the fully qualified name of the source table in BigQuery, for example project.dataset.table.
- IMPORT_QUERY: (optional) the SELECT query to extract your data from BigQuery when you don't need all the columns or want to make a transformation before ingestion. The FROM clause must reference a table using the full scope: project.dataset.table.
Both IMPORT_EXTERNAL_DATASOURCE and IMPORT_QUERY are optional, but you must provide one of them for the connector to work.
Note: For IMPORT_STRATEGY only REPLACE is supported today. The APPEND strategy will be enabled in a future release.
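The rules above can be checked before pushing a Datafile. The following is a minimal sketch in Python, not part of the Tinybird CLI: the helper name check_import_options and the dict representation of the settings are hypothetical, and the five-field cron check is an assumption based on the example schedule in this page.

```python
# Hypothetical pre-push check; Tinybird itself performs its own validation.
SUPPORTED_STRATEGIES = {"REPLACE"}  # APPEND is documented as not yet enabled


def check_import_options(options):
    """Return a list of problems found in a dict of IMPORT_* settings."""
    problems = []
    if options.get("IMPORT_SERVICE") != "bigquery":
        problems.append("IMPORT_SERVICE must be 'bigquery'")
    # Assumption: schedules use the classic five-field cron layout.
    schedule = options.get("IMPORT_SCHEDULE", "")
    if len(schedule.split()) != 5:
        problems.append("IMPORT_SCHEDULE must be a 5-field cron expression")
    if options.get("IMPORT_STRATEGY") not in SUPPORTED_STRATEGIES:
        problems.append("IMPORT_STRATEGY must be REPLACE")
    # At least one of the two source settings has to be present.
    has_table = bool(options.get("IMPORT_EXTERNAL_DATASOURCE"))
    has_query = bool(options.get("IMPORT_QUERY"))
    if not (has_table or has_query):
        problems.append("set IMPORT_EXTERNAL_DATASOURCE or IMPORT_QUERY")
    return problems


example = {
    "IMPORT_SERVICE": "bigquery",
    "IMPORT_SCHEDULE": "*/5 * * * *",
    "IMPORT_STRATEGY": "REPLACE",
    "IMPORT_EXTERNAL_DATASOURCE": "mydb.raw.events",
}
print(check_import_options(example))  # prints []
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a single pass, for instance from a CI step.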
For example:
bigquery.datasource file
DESCRIPTION >
    bigquery demo data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `id` Integer `json:$.id`,
    `orderid` LowCardinality(String) `json:$.orderid`,
    `status` LowCardinality(String) `json:$.status`,
    `amount` Integer `json:$.amount`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_SERVICE bigquery
IMPORT_SCHEDULE */5 * * * *
IMPORT_EXTERNAL_DATASOURCE mydb.raw.events
IMPORT_STRATEGY REPLACE
IMPORT_QUERY >
    select
        timestamp,
        id,
        orderid,
        status,
        amount
    from
        mydb.raw.events
The columns you select in the IMPORT_QUERY must match the columns defined in the Data Source schema. For example, if your Data Source has the columns ColumnA, ColumnB then your IMPORT_QUERY must contain SELECT ColumnA, ColumnB FROM ... . A mismatch of columns causes data to arrive in the quarantine Data Source.
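A column mismatch can be caught locally before any rows reach quarantine. The sketch below is illustrative only: the helper names select_columns and missing_and_extra are hypothetical, and the regular expression handles only a plain list of bare column names (no aliases, expressions, or nested queries).

```python
import re


def select_columns(import_query):
    """Extract column names from a plain 'SELECT a, b FROM ...' query."""
    match = re.search(
        r"select\s+(.*?)\s+from\s", import_query, re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a SELECT ... FROM ... query")
    return [col.strip() for col in match.group(1).split(",")]


def missing_and_extra(schema_columns, import_query):
    """Compare schema columns with the columns selected by the query."""
    selected = select_columns(import_query)
    missing = [c for c in schema_columns if c not in selected]
    extra = [c for c in selected if c not in schema_columns]
    return missing, extra


schema = ["timestamp", "id", "orderid", "status", "amount"]
query = "select timestamp, id, orderid, status, amount from mydb.raw.events"
print(missing_and_extra(schema, query))  # prints ([], [])
```

Two empty lists mean the query and the schema agree. Any name in the first list is defined in the schema but not selected, and any name in the second is selected but not defined.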
With your connection created and Data Source defined, you can now push your project to Tinybird using:
tb push
The first import runs the next time the cron expression matches.
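To estimate when that first run happens, you can compute the next matching minute yourself. This is a minimal sketch that assumes the scheduler follows standard cron semantics for a schedule of the form */N * * * * in UTC; the function name next_run is hypothetical and other cron forms are not handled.

```python
from datetime import datetime, timedelta, timezone


def next_run(now, step_minutes):
    """Next firing time of '*/step_minutes * * * *', strictly after `now`."""
    # Start from the next whole minute and walk forward until the minute
    # value is a multiple of the step, which is how */N is interpreted.
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute % step_minutes != 0:
        candidate += timedelta(minutes=1)
    return candidate


pushed_at = datetime(2024, 1, 1, 10, 2, 30, tzinfo=timezone.utc)
print(next_run(pushed_at, 5))  # prints 2024-01-01 10:05:00+00:00
```

With the */5 * * * * schedule used in the example Datafile, a push at 10:02:30 UTC would therefore see its first import at 10:05 UTC under these assumptions.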
Configure granular permissions¶
If you need to configure more granular permissions for BigQuery, you can always grant access at dataset or individual object level.
The first step is creating a new role in your IAM & Admin Console in GCP, and assigning the resourcemanager.projects.get permission.
The Connector needs this permission to list the available projects the generated Service Account has access to, so you can explore the BigQuery tables and views in the Tinybird UI.
After that, you can grant permissions to specific datasets to the Service Account by clicking on Sharing > Permissions:
Then ADD PRINCIPAL:
And finally paste the principal name copied earlier into the New principals box. Next, in the Role box, find and select the role BigQuery Data Viewer:
Now the Tinybird Connector UI only shows the specific resources you've granted permissions to.
Schema evolution¶
The BigQuery Connector supports backwards compatible changes made in the source table. This means that, if you add a new column in BigQuery, the next sync job will automatically add it to the Tinybird Data Source.
Non-backwards compatible changes, such as dropping or renaming columns, are not supported and will cause the next sync to fail.
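The distinction between the two kinds of change can be sketched as a comparison of column names before and after the change. This is an illustration of the rule described above, not the connector's actual logic: the function name classify_schema_change is hypothetical, and type changes are not considered.

```python
def classify_schema_change(old_columns, new_columns):
    """Return 'unchanged', 'compatible' (columns only added), or 'breaking'."""
    old, new = set(old_columns), set(new_columns)
    # A dropped column is breaking. A rename looks like one dropped column
    # plus one added column, so it is also reported as breaking.
    if old - new:
        return "breaking"
    if new - old:
        return "compatible"
    return "unchanged"


before = ["timestamp", "id", "status"]
print(classify_schema_change(before, before + ["amount"]))  # prints compatible
print(classify_schema_change(before, ["timestamp", "id"]))  # prints breaking
```

Running a check like this against the BigQuery table's column list before altering it can flag changes that would make the next sync fail.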
Iterate a BigQuery Data Source¶
To iterate a BigQuery Data Source, use the Tinybird CLI and the version control integration to handle your resources.
You can only create connections in the main Workspace. When creating the connection in a Branch, it's created in the main Workspace and from there is available to every Branch.
Add a new BigQuery Data Source¶
You can add a new Data Source directly with the UI or the CLI tool by following the steps in the Load a BigQuery table section.
When adding a Data Source in a Tinybird Branch, it will work for testing purposes, but won't have any connection details internally. You must add the connection and BigQuery configuration in the .datasource Datafile when moving to production.
To add a new Data Source using the recommended version control workflow, check the instructions in the examples repository.
Update a Data Source¶
- BigQuery Data Sources can't be modified directly from the UI.
- When you create a new Tinybird Branch, the existing BigQuery Data Sources won't be connected. You need to re-create them in the Branch.
- In Branches, it's usually useful to work with fixtures, as they'll be applied as part of the CI/CD. This makes the full process deterministic in every iteration and avoids consuming quota from external services.
BigQuery Data Sources can be modified from the CLI tool:
tb auth
# modify the .datasource Datafile with your editor
tb push --force {datafile}
# check the command output for errors
To update it using the recommended version control workflow, check the instructions in the examples repository.
Delete a Data Source¶
BigQuery Data Sources can be deleted directly from the UI or the CLI like any other Data Source.
To delete it using the recommended version control workflow, check the instructions in the examples repository.
Logs¶
Job executions are logged in the datasources_ops_log Service Data Source. This log can be checked directly in the Data Source view page in the UI. Filter by datasource_id to monitor ingestion through the BigQuery Connector from the datasources_ops_log:
SELECT
  timestamp,
  event_type,
  result,
  error,
  job_id
FROM tinybird.datasources_ops_log
WHERE datasource_id = 't_1234'
  AND event_type = 'replace'
ORDER BY timestamp DESC
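The same query can also be prepared from a script for automated monitoring. The sketch below only builds the HTTP request and does not send it. It assumes the query is sent to Tinybird's /v0/sql Query API with a Bearer token, that https://api.tinybird.co is the right host for your region, and that the token can read Service Data Sources; the function name build_ops_log_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

OPS_LOG_SQL = """
SELECT timestamp, event_type, result, error, job_id
FROM tinybird.datasources_ops_log
WHERE datasource_id = '{datasource_id}' AND event_type = 'replace'
ORDER BY timestamp DESC
FORMAT JSON
"""


def build_ops_log_request(datasource_id, token, host="https://api.tinybird.co"):
    """Build (but do not send) a request for the ops log of one Data Source."""
    # datasource_id is interpolated directly, so pass only trusted values.
    sql = OPS_LOG_SQL.format(datasource_id=datasource_id).strip()
    # Assumption: the Query API accepts the SQL in the `q` parameter.
    url = f"{host}/v0/sql?{urlencode({'q': sql})}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_ops_log_request("t_1234", "<your_token>")
print(request.full_url.split("?")[0])  # prints https://api.tinybird.co/v0/sql
```

To run the query, pass the request to urllib.request.urlopen and decode the JSON body. Keeping request construction separate from sending makes the helper easy to test without network access.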