BigQuery Benchmark

This directory contains benchmark scripts for the BigQuery client. It exists primarily so that project maintainers can measure library performance.

Usage

python benchmark.py

Flags

Run python benchmark.py -h for detailed information on available flags.

--reruns overrides the number of times each query is rerun. The value must be a positive integer; the default is 3.

--projectid can be used to run benchmarks in a different project. If unset, the GOOGLE_CLOUD_PROJECT environment variable is used.

--queryfile can be used to override the default file which contains queries to be instrumented.

--table can be used to specify a table to which benchmarking results should be streamed. The string uses BigQuery standard SQL notation without escapes, e.g. projectid.datasetid.tableid.

--create_table can be used to have the benchmarking tool create the destination table prior to streaming.

--tag allows arbitrary key:value pairs to be set. This flag can be specified multiple times.

When the --create_table flag is set, you must also specify the name of the new table using --table.
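
The flag rules above can be sketched with argparse. This is a minimal illustration, not the actual contents of benchmark.py: the helper names positive_int, build_parser, and parse_flags are hypothetical, and how the real script validates its flags is an assumption.

```python
import argparse
import os


def positive_int(text):
    """Parse text as an integer and reject values below 1."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


def build_parser():
    """Declare the flags described in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of the benchmark flags")
    parser.add_argument("--reruns", type=positive_int, default=3)
    parser.add_argument("--projectid")
    parser.add_argument("--queryfile")
    parser.add_argument("--table")
    parser.add_argument("--create_table", action="store_true")
    # action="append" lets --tag be given multiple times.
    parser.add_argument("--tag", action="append", default=None)
    return parser


def parse_flags(argv):
    """Parse argv and apply the cross-flag rules stated above."""
    parser = build_parser()
    args = parser.parse_args(argv)
    args.tag = args.tag or []
    # --create_table only makes sense when a destination table is named.
    if args.create_table and not args.table:
        parser.error("--create_table requires --table")
    # Fall back to the environment when --projectid is unset.
    if args.projectid is None:
        args.projectid = os.environ.get("GOOGLE_CLOUD_PROJECT")
    return args
```

Calling parse_flags(["--create_table"]) exits with a usage error, which mirrors the requirement that --create_table be paired with --table.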

Example invocations

Setting all the flags

python benchmark.py \
  --reruns 5 \
  --projectid test_project_id \
  --table logging_project_id.querybenchmarks.measurements \
  --create_table \
  --tag source:myhostname \
  --tag somekeywithnovalue \
  --tag experiment:special_environment_thing

Or, a more realistic invocation using shell substitutions:

python benchmark.py \
  --reruns 5 \
  --table $BENCHMARK_TABLE \
  --tag origin:$(hostname) \
  --tag branch:$(git branch --show-current) \
  --tag latestcommit:$(git log --pretty=format:'%H' -n 1)
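
To illustrate how key:value tag strings like those above could map onto key and value records, here is a small sketch. The function parse_tag is hypothetical, and treating a tag with no colon (such as somekeywithnovalue) as a key with an empty value is an assumption, not confirmed behaviour of the tool.

```python
def parse_tag(raw):
    """Split a 'key:value' string at the first colon.

    A string without a colon becomes a key with an empty value
    (an assumption made for this sketch).
    """
    key, _, value = raw.partition(":")
    return {"key": key, "value": value}


def parse_tags(raw_tags):
    """Convert a list of raw --tag strings into key/value records."""
    return [parse_tag(raw) for raw in raw_tags]
```

Splitting at the first colon only means a value may itself contain colons, which matters for tags built from shell output.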

Stream Results To A BigQuery Table

When streaming benchmarking results to a BigQuery table, the table schema is as follows:

[
  {
    "name": "groupname",
    "type": "STRING"
  },
  {
    "name": "name",
    "type": "STRING"
  },
  {
    "name": "tags",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "key",
        "type": "STRING"
      },
      {
        "name": "value",
        "type": "STRING"
      }
    ]
  },
  {
    "name": "SQL",
    "type": "STRING"
  },
  {
    "name": "runs",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "errorstring",
        "type": "STRING"
      },
      {
        "name": "start_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "query_end_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "first_row_returned_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "all_rows_returned_time",
        "type": "TIMESTAMP"
      },
      {
        "name": "total_rows",
        "type": "INTEGER"
      }
    ]
  },
  {
    "name": "event_time",
    "type": "TIMESTAMP"
  }
]

The table schema is the same as that of the Go benchmark, so results from both languages can be streamed to the same table.
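
As a rough illustration of the schema, the sketch below assembles one result row as a plain dictionary and derives a query duration from its timestamps. The helpers make_run, make_row, and query_seconds are hypothetical, and rendering timestamps as ISO 8601 strings is an assumption; the real script may build and serialize its rows differently.

```python
from datetime import datetime, timezone


def make_run(start, query_end, first_row, all_rows, total_rows, error=""):
    """Build one entry of the repeated 'runs' record."""
    return {
        "errorstring": error,
        "start_time": start.isoformat(),
        "query_end_time": query_end.isoformat(),
        "first_row_returned_time": first_row.isoformat(),
        "all_rows_returned_time": all_rows.isoformat(),
        "total_rows": total_rows,
    }


def make_row(groupname, name, sql, tags, runs, event_time):
    """Build one top-level row; field names follow the schema above."""
    return {
        "groupname": groupname,
        "name": name,
        "tags": tags,  # list of {"key": ..., "value": ...} records
        "SQL": sql,
        "runs": runs,  # list of dictionaries from make_run
        "event_time": event_time.isoformat(),
    }


def query_seconds(run):
    """Seconds between start_time and query_end_time for one run."""
    start = datetime.fromisoformat(run["start_time"])
    end = datetime.fromisoformat(run["query_end_time"])
    return (end - start).total_seconds()


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    t1 = datetime(2024, 1, 1, 0, 0, 2, tzinfo=timezone.utc)
    run = make_run(t0, t1, t1, t1, total_rows=10)
    row = make_row("group", "query", "SELECT 1", [], [run], t0)
    print(row["name"], query_seconds(run))
```

Because both tags and runs are REPEATED records, each is represented as a list of dictionaries, even when only one entry is present.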

BigQuery Benchmarks In Other Languages