Google Bigtable Driver for YCSB

This driver provides a YCSB workload binding for Google's hosted Bigtable, the inspiration for a number of key-value stores like HBase and Cassandra. The Bigtable Java client provides both an idiomatic Java API and an HBase client API. This binding uses the idiomatic Java API to test the native client. To test Bigtable using the HBase API, see the hbase1 binding. Note that this driver replaces the googlebigtable driver, which used a deprecated API.

Quickstart

1. Set up a Bigtable Instance

Log in to the Google Cloud Console and follow the Creating Instance steps. Make a note of your instance ID and project ID.

2. Launch the Bigtable Shell

From the Cloud Console, launch a shell and follow the Quickstart up to step 4 where you install .

3. Create a Table

For best results, use the pre-splitting strategy recommended in HBASE-4163:

PROJECT=<PROJECT_ID>
INSTANCE=<INSTANCE>
FAMILY=cf
SPLITS=$(echo 'num_splits = 200; puts (1..num_splits).map {|i| "user#{1000+i*(9999-1000)/num_splits}"}.join(",")' | ruby)
cbt -project $PROJECT -instance=$INSTANCE createtable usertable families=$FAMILY:maxversions=1 splits=$SPLITS

Make a note of the column family; in this example it's `cf`.
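
To preview the split points without Ruby, the same keys can be generated in plain shell. This is a minimal sketch that mirrors the integer arithmetic of the one-liner above; the function name `ycsb_splits` is hypothetical and is not part of YCSB or cbt.

```shell
# Hypothetical helper: print N evenly spaced split keys between user1000 and
# user9999 as a comma-separated list, using the same integer arithmetic as the
# Ruby one-liner (1000 + i * (9999 - 1000) / N, truncating division).
ycsb_splits() {
  num_splits="${1:-200}"
  i=1
  keys=""
  while [ "$i" -le "$num_splits" ]; do
    keys="${keys}user$((1000 + i * (9999 - 1000) / num_splits)),"
    i=$((i + 1))
  done
  printf '%s\n' "${keys%,}"   # drop the trailing comma
}

ycsb_splits 4   # prints user3249,user5499,user7749,user9999
```

Under these assumptions, `SPLITS=$(ycsb_splits 200)` could stand in for the Ruby line when Ruby is not installed in the shell.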

4. Download JSON Credentials

Follow these instructions for Generating a JSON key and save it to your host.

5. Load a Workload

Switch to the root of the YCSB repo, choose the workload you want to run, and load it first. With the CLI, you must provide the project, instance, and column family properties to load.

GOOGLE_APPLICATION_CREDENTIALS=<PATH_TO_JSON_KEY> \
  ./bin/ycsb load googlebigtable2 \
  -p googlebigtable2.project=$PROJECT -p googlebigtable2.instance=$INSTANCE -p googlebigtable2.family=cf \
  -P workloads/workloada

Make sure to replace the placeholders in angle brackets above with the proper values for your setup. Additional configuration parameters are available below.

The load step only executes inserts into the datastore. After loading data, run the same workload to mix reads with writes.

GOOGLE_APPLICATION_CREDENTIALS=<PATH_TO_JSON_KEY> \
  bin/ycsb run googlebigtable2 \
  -p googlebigtable2.project=$PROJECT -p googlebigtable2.instance=$INSTANCE -p googlebigtable2.family=cf \
  -P workloads/workloada

Configuration Options

The following options can be configured from the CLI (using the -p parameter).

  • googlebigtable2.project: (Required) The ID of a Bigtable project.
  • googlebigtable2.instance: (Required) The name of a Bigtable instance.
  • googlebigtable2.app-profile: (Optional) The app profile to use.
  • googlebigtable2.family: (Required) The Bigtable column family to target.
  • debug: If true, prints debug information to standard out. The default is false.
  • googlebigtable2.use-batching: (Optional) Whether or not to use client-side buffering and batching of write operations. This can significantly improve performance and defaults to true.
  • googlebigtable2.max-outstanding-bytes: (Optional) When batching is enabled, overrides the limit on the number of outstanding mutation bytes.
  • googlebigtable2.reverse-scans: (Optional) When enabled, scan start keys will be treated as end keys.
  • googlebigtable2.timestamp: (Optional) When set, the timestamp will be used for all mutations, avoiding unbounded growth of cell versions.
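
As a sketch of how the optional settings combine with the required ones, the helper below assembles the -p flags and prints the resulting command instead of running it. The function name `bigtable_ycsb_cmd` and the placeholder project and instance IDs are hypothetical; the property names come from the list above, and the boolean value shown for reverse-scans is an assumption based on its description.

```shell
# Hypothetical helper: print a ycsb command line for the googlebigtable2 binding.
# Usage: bigtable_ycsb_cmd <load|run> <project> <instance> <family> [key=value ...]
# Each extra key=value pair is prefixed with "googlebigtable2." and passed via -p.
bigtable_ycsb_cmd() {
  phase="$1"; project="$2"; instance="$3"; family="$4"
  shift 4
  cmd="bin/ycsb $phase googlebigtable2"
  cmd="$cmd -p googlebigtable2.project=$project"
  cmd="$cmd -p googlebigtable2.instance=$instance"
  cmd="$cmd -p googlebigtable2.family=$family"
  for opt in "$@"; do
    cmd="$cmd -p googlebigtable2.$opt"
  done
  printf '%s -P workloads/workloada\n' "$cmd"
}

# Dry run: disable client-side batching and enable reverse scans.
bigtable_ycsb_cmd run my-project my-instance cf use-batching=false reverse-scans=true
```

Printing the command first makes it easy to check the property names before spending time on a real benchmark run.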

Bigtable client version

As of this writing, Cloud Bigtable releases a new version of the client every 2 weeks. Newer client versions will have performance optimizations not present in the version referenced by YCSB. When invoking ycsb with Maven 3.9.0 or later, you can override the Cloud Bigtable client version through the MAVEN_ARGS environment variable. Note that the oldest client version this driver currently supports is 2.37.0 (released 2024/03/27).

Here are a couple of examples:

# Force version 2.47.0 of the Cloud Bigtable client (released 2024/11/13)
GOOGLE_APPLICATION_CREDENTIALS=<PATH_TO_JSON_KEY> \
MAVEN_ARGS="-Dgooglebigtable2.version=2.47.0" \
  bin/ycsb run googlebigtable2 \
  -p googlebigtable2.project=$PROJECT -p googlebigtable2.instance=$INSTANCE -p googlebigtable2.family=cf \
  -P workloads/workloada

# Use the latest released version of the Cloud Bigtable client
GOOGLE_APPLICATION_CREDENTIALS=<PATH_TO_JSON_KEY> \
MAVEN_ARGS="-Dgooglebigtable2.version=RELEASE" \
  bin/ycsb run googlebigtable2 \
  -p googlebigtable2.project=$PROJECT -p googlebigtable2.instance=$INSTANCE -p googlebigtable2.family=cf \
  -P workloads/workloada
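
Because the override depends on two version floors (Maven 3.9.0 for MAVEN_ARGS and client 2.37.0 for this driver), it can help to check them before launching a run. The following is a minimal sketch; `version_ge` is a hypothetical helper, not part of YCSB or Maven, and it assumes a `sort` that supports the `-V` version-sort option (as in GNU coreutils).

```shell
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Sorts the two versions and checks that the smaller one is the required floor.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# The MAVEN_ARGS override needs Maven >= 3.9.0; the driver needs client >= 2.37.0.
version_ge "3.9.6" "3.9.0" && echo "Maven version is new enough"
version_ge "2.47.0" "2.37.0" && echo "client version is supported"
version_ge "2.9.0" "2.37.0" || echo "client version is too old"
```

A plain string comparison would wrongly rank 2.9.0 above 2.37.0, which is why the sketch uses version sort.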