Spanner provides lock statistics that enable you to identify the row key and
table column(s) that were the main sources of transaction lock conflicts in your
database during a particular time period. You can retrieve these statistics from
the SPANNER_SYS.LOCK_STATS*
system tables using SQL statements.
Availability
SPANNER_SYS
data is available only through SQL interfaces; for example:
-
A database's Spanner Studio page in the Google Cloud console
The
gcloud spanner databases execute-sql
commandThe Lock insights dashboard
The
executeQuery
API
Other single read methods that Spanner provides don't support
SPANNER_SYS
.
Lock statistics by row key
The following tables track the row key with the highest wait time:
SPANNER_SYS.LOCK_STATS_TOP_MINUTE
: Row keys with the highest lock wait times during 1 minute intervals.SPANNER_SYS.LOCK_STATS_TOP_10MINUTE
: Row keys with the highest lock wait times during 10 minute intervals.SPANNER_SYS.LOCK_STATS_TOP_HOUR
: Row keys with the highest lock wait times during 1 hour intervals
These tables have the following properties:
Each table contains data for non-overlapping time intervals of the length the table name specifies.
Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour. After each interval, Spanner collects data from all servers and then makes the data available in the SPANNER_SYS tables shortly thereafter.
For example, at 11:59:30 AM, the most recent intervals available to SQL queries are:
- 1 minute: 11:58:00–11:58:59 AM
- 10 minute: 11:40:00–11:49:59 AM
- 1 hour: 10:00:00–10:59:59 AM
Spanner groups the statistics by starting row key range.
Each row contains statistics for total lock wait time of a particular starting row key range that Spanner captures statistics for during the specified interval.
If Spanner is unable to store information about every row key range for lock waits during the interval, the system prioritizes row key range with the highest lock wait time during the specified interval.
All columns in the tables are nullable.
Table schema
Column name | Type | Description |
---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval in which the included lock conflicts occurred. |
ROW_RANGE_START_KEY |
BYTES(MAX) |
The row key where the lock conflict occurred. When the conflict
involves a range of rows, this value represents the starting
key of that range. A plus sign, + , signifies a range.
For more information,
see What's a row range start key.
|
LOCK_WAIT_SECONDS |
FLOAT64 |
The cumulative lock wait time of lock conflicts recorded for all the columns in the row key range, in seconds. |
SAMPLE_LOCK_REQUESTS |
ARRAY<STRUCT<
|
Each entry in this array corresponds to a sample lock request that
contributed to the lock conflict by either waiting for a lock or
blocking other transactions from taking the lock, on the given row
key (range). The maximum number of samples in this array is 20.
Each sample contains the following three fields:
|
Lock modes
Spanner operations acquire locks when the operations are part of a read-write transaction. Read-only transactions don't acquire locks. Spanner uses different lock modes to maximize the number of transactions that have access to a particular data cell at a given time. Different locks have different characteristics. For example, some locks can be shared among multiple transactions, while others can't.
A lock conflict can occur when you attempt to acquire one of the following lock modes in a transaction.
ReaderShared
Lock - A lock which allows other reads to still access the data until your transaction is ready to commit. This shared lock is acquired when a read-write transaction reads data.WriterShared
Lock - This lock is acquired when a read-write transaction tries to commit a write.Exclusive
Lock - an exclusive lock is acquired when a read-write transaction, which has already acquired a ReaderShared lock, tries to write data after the completion of read. An exclusive lock is an upgrade from aReaderShared
lock. An exclusive lock is a special case of a transaction holding both theReaderShared
lock and theWriterShared
lock at the same time. No other transaction can acquire any lock on the same cell.WriterSharedTimestamp
Lock - a special type ofWriterShared
lock which is acquired when inserting new rows into a table that has a commit timestamp as part of the primary key. This type of lock prevents transaction participants from creating the exact same row and, therefore, conflicting with each other. Spanner updates the key of the inserted row to match the commit timestamp of the transaction that performed the insert.
For more information on transaction types and the kinds of locks that are available, see Transactions.
Lock mode conflicts
The following table shows the possible conflicts between different lock modes.
Lock Modes | ReaderShared |
WriterShared |
Exclusive |
WriterSharedTimestamp |
---|---|---|---|---|
ReaderShared |
No | Yes | Yes | Yes |
WriterShared |
Yes | No | Yes | Not applicable |
Exclusive |
Yes | Yes | Yes | Not applicable |
WriterSharedTimestamp |
Yes | Not applicable | Not applicable | Yes |
WriterSharedTimestamp
locks are only used when inserting new rows with a
timestamp as part of its primary key. WriterShared
and Exclusive
locks are
used when writing to existing cells or inserting new rows without timestamps. As
a result, WriterSharedTimestamp
can't conflict with other types of locks, and
those scenarios are shown as Not applicable in the preceding table.
The only exception is ReaderShared
, which can be applied to non-existing rows
and, therefore, could potentially conflict with WriterSharedTimestamp
. For
example, a full table scan locks the whole table even for rows that haven't been
created, so it's possible for ReaderShared
to conflict with
WriterSharedTimestamp
.
What is a row range start key?
The ROW_RANGE_START_KEY
column identifies the composite primary key, or
starting primary key of a row range, that has lock conflicts. The following
schema is used to illustrate an example.
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (SingerId);
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
CREATE TABLE Songs (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
TrackId INT64 NOT NULL,
SongName STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId, TrackId),
INTERLEAVE IN PARENT Albums ON DELETE CASCADE;
CREATE TABLE Users (
UserId INT64 NOT NULL,
LastAccess TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
...
) PRIMARY KEY (UserId, LastAccess);
As the following table of row key and row key ranges shows, a range is represented with a plus, '+', sign in the key. The key in those cases represents the starting key of a key range in which a lock conflict occurred.
ROW_RANGE_START_KEY | Explanation |
---|---|
singers(2) | Singers table at key SingerId=2 |
albums(2,1) | Albums table at key SingerId=2,AlbumId=1 |
songs(2,1,5) | Songs table at key SingerId=2,AlbumId=1,TrackId=5 |
songs(2,1,5+) | Songs table key range starting at SingerId=2,AlbumId=1,TrackId=5 |
albums(2,1+) | Albums table key range starting at SingerId=2,AlbumId=1 |
users(3, 2020-11-01 12:34:56.426426+00:00) | Users table at key UserId=3, LastAccess=commit_timestamp |
Aggregate statistics
SPANNER_SYS
also contains tables to store aggregate data for lock statistics
captured by Spanner in a specific time period:
SPANNER_SYS.LOCK_STATS_TOTAL_MINUTE
: Aggregate statistics for all lock waits during 1 minute intervals.SPANNER_SYS.LOCK_STATS_TOTAL_10MINUTE
: Aggregate statistics for all lock waits during 10 minute intervals.SPANNER_SYS.LOCK_STATS_TOTAL_HOUR
: Aggregate statistics for all lock waits during 1 hour intervals.
Aggregate statistics tables have the following properties:
Each table contains data for non-overlapping time intervals of the length the table name specifies.
Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour.
For example, at 11:59:30 AM, the most recent intervals available to SQL queries on aggregate lock statistics are:
- 1 minute: 11:58:00–11:58:59 AM
- 10 minute: 11:40:00–11:49:59 AM
- 1 hour: 10:00:00–10:59:59 AM
Each row contains statistics for all lock waits on the database during the specified interval, aggregated together. There is only one row per time interval.
The statistics captured in the
SPANNER_SYS.LOCK_STATS_TOTAL_*
tables include lock waits that Spanner did not capture in theSPANNER_SYS.LOCK_STATS_TOP_*
tables.Some columns in these tables are exposed as metrics in Cloud Monitoring. The exposed metrics are:
- Lock wait time
For more information, see Spanner metrics.
Table schema
Column name | Type | Description |
---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval in which the lock conflict occurred. |
TOTAL_LOCK_WAIT_SECONDS |
FLOAT64 |
Total lock wait time for lock conflicts recorded for the entire database, in seconds. |
Example queries
The following is an example of a SQL statement that you can use to retrieve lock statistics. You can run these SQL statements using the client libraries, the gcloud spanner, or the Google Cloud console.
List the lock statistics for the previous 1-minute interval
The following query returns the lock wait information for each row key with a lock conflict, including the fraction of total lock conflicts, during the most recent 1-minute time interval.
The CAST()
function converts the row_range_start_key BYTES
field to a STRING.
SELECT CAST(s.row_range_start_key AS STRING) AS row_range_start_key,
t.total_lock_wait_seconds,
s.lock_wait_seconds,
s.lock_wait_seconds/t.total_lock_wait_seconds frac_of_total,
s.sample_lock_requests
FROM spanner_sys.lock_stats_total_minute t, spanner_sys.lock_stats_top_minute s
WHERE t.interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.lock_stats_total_minute)
AND s.interval_end = t.interval_end
ORDER BY s.lock_wait_seconds DESC;
Query output
row_range_start_key | total_lock_wait_seconds | lock_wait_seconds | frac_of_total | sample_lock_requests |
---|---|---|---|---|
Songs(2,1,1) | 2.37 | 1.76 | 0.7426 | LOCK_MODE: ReaderShared COLUMN: Singers.SingerInfo LOCK_MODE: WriterShared COLUMN: Singers.SingerInfo |
Users(3, 2020-11-01 12:34:56.426426+00:00) | 2.37 | 0.61 | 0.2573 | LOCK_MODE: ReaderShared COLUMN: users._exists1 LOCK_MODE: WriterShared COLUMN: users._exists1 |
1 _exists
is an internal field that is used to check whether a
certain row exists or not.
Data retention
At a minimum, Spanner keeps data for each table for the following time periods:
SPANNER_SYS.LOCK_STATS_TOP_MINUTE
andSPANNER_SYS.LOCK_STATS_TOTAL_MINUTE
: Intervals covering the previous 6 hours.SPANNER_SYS.LOCK_STATS_TOP_10MINUTE
andSPANNER_SYS.LOCK_STATS_TOTAL_10MINUTE
: Intervals covering the previous 4 days.SPANNER_SYS.LOCK_STATS_TOP_HOUR
andSPANNER_SYS.LOCK_STATS_TOTAL_HOUR
: Intervals covering the previous 30 days.
Troubleshoot lock conflicts in your database using lock statistics
You can use SQL or the Lock insights dashboard to view lock conflicts in your database.
The following topics show how you can investigate such lock conflicts using SQL code.
Select a time period to investigate
You examine the Latency metrics for your Spanner database and discover a time period when your app is experiencing high latency and CPU usage. For example, the issue started occurring around 10:50 PM on November 12th, 2020.
Determine whether transaction commit latency increased along with the lock wait time during the selected period
Locks are acquired by transactions so, if lock conflicts cause long wait times, we should be able to see the increase in the transaction commit latency along with the increase in the lock wait time.
Having selected a time period to start our investigation, we'll join the
transaction statistics TXN_STATS_TOTAL_10MINUTE
with lock statistics LOCK_STATS_TOTAL_10MINUTE
around that time to help us
understand if the increase of the average commit latency is contributed to by
the increase of the lock waiting time.
SELECT t.interval_end, t.avg_commit_latency_seconds, l.total_lock_wait_seconds
FROM spanner_sys.txn_stats_total_10minute t
LEFT JOIN spanner_sys.lock_stats_total_10minute l
ON t.interval_end = l.interval_end
WHERE
t.interval_end >= "2020-11-12T21:50:00Z"
AND t.interval_end <= "2020-11-12T23:50:00Z"
ORDER BY interval_end;
Take the following data as an example of the results we get back from our query.
interval_end | avg_commit_latency_seconds | total_lock_wait_seconds |
---|---|---|
2020-11-12 21:40:00-07:00 | 0.002 | 0.090 |
2020-11-12 21:50:00-07:00 | 0.003 | 0.110 |
2020-11-12 22:00:00-07:00 | 0.002 | 0.100 |
2020-11-12 22:10:00-07:00 | 0.002 | 0.080 |
2020-11-12 22:20:00-07:00 | 0.030 | 0.240 |
2020-11-12 22:30:00-07:00 | 0.034 | 0.220 |
2020-11-12 22:40:00-07:00 | 0.034 | 0.218 |
2020-11-12 22:50:00-07:00 | 3.741 | 780.193 |
2020-11-12 23:00:00-07:00 | 0.042 | 0.240 |
2020-11-12 23:10:00-07:00 | 0.038 | 0.129 |
2020-11-12 23:20:00-07:00 | 0.021 | 0.128 |
2020-11-12 23:30:00-07:00 | 0.038 | 0.231 |
This preceding results show a dramatic increase in avg_commit_latency_seconds
and total_lock_wait_seconds
during the same time period from 2020-11-12
22:40:00 to 2020-11-12 22:50:00, and dropped after that. One thing to note
is that the avg_commit_latency_seconds
is the average time spent for only
the commit step. On the other hand, total_lock_wait_seconds
is the
aggregate lock time for the period, so the time looks much longer than the
transaction commit time.
Now that we've confirmed the lock wait time is closely related to the increase in write latency, we'll investigate in the next step which rows and columns cause the long wait.
Discover which row keys and columns had long lock wait times during the selected period
To find out which row keys and columns experienced the high lock wait times
during the period we are investigating, we query the LOCK_STAT_TOP_10MINUTE
table, which lists the row keys and columns that contribute the most to lock
wait.
The CAST()
function in the following query converts the
row_range_start_key BYTES field to a STRING.
SELECT CAST(s.row_range_start_key AS STRING) AS row_range_start_key,
t.total_lock_wait_seconds,
s.lock_wait_seconds,
s.lock_wait_seconds/t.total_lock_wait_seconds frac_of_total,
s.sample_lock_requests
FROM spanner_sys.lock_stats_total_10minute t, spanner_sys.lock_stats_top_10minute s
WHERE
t.interval_end = "2020-11-12T22:50:00Z" and s.interval_end = t.interval_end;
row_range_start_key | total_lock_wait_seconds | lock_wait_seconds | frac_of_total | sample_lock_requests |
---|---|---|---|---|
Singers(32) | 780.193 | 780.193 | 1 | LOCK_MODE: WriterShared COLUMN: Singers.SingerInfo LOCK_MODE: ReaderShared COLUMN: Singers.SingerInfo |
From this table of results, we can see the conflict happened on the Singers
table at key SingerId=32. The Singers.SingerInfo
is the column where the
lock conflict happened between ReaderShared
and WriterShared
.
This is a common type of the conflict when there's one transaction trying to read a certain cell and the other transaction is trying to write to the same cell. We now know the exact data cell for which the transactions are contending the lock, so in the next step we'll identify the transactions that are contending for the locks.
Find which transactions are accessing the columns involved in the lock conflict
To identify the transactions that are experiencing significant commit latency
within a specific time interval due to lock conflicts, you need to query for
the following columns from the
SPANNER_SYS.TXN_STATS_TOTAL_10MINUTE
table:
fprint
read_columns
write_constructive_columns
avg_commit_latency_seconds
You need to filter for locked columns identified from the
SPANNER_SYS.LOCK_STATS_TOP_10MINUTE
table:
Transactions that read any column that incurred a lock conflict when attempting to acquire the
ReaderShared
lock.Transactions that write to any column that incurred a lock conflict when attempting to acquire a
WriterShared
lock.
SELECT
fprint,
read_columns,
write_constructive_columns,
avg_commit_latency_seconds
FROM spanner_sys.txn_stats_top_10minute t2
WHERE (
EXISTS (
SELECT * FROM t2.read_columns columns WHERE columns IN (
SELECT DISTINCT(req.COLUMN)
FROM spanner_sys.lock_stats_top_10minute t, t.SAMPLE_LOCK_REQUESTS req
WHERE req.LOCK_MODE = "ReaderShared" AND t.interval_end ="2020-11-12T23:50:00Z"))
OR
EXISTS (
SELECT * FROM t2.write_constructive_columns columns WHERE columns IN (
SELECT DISTINCT(req.COLUMN)
FROM spanner_sys.lock_stats_top_10minute t, t.SAMPLE_LOCK_REQUESTS req
WHERE req.LOCK_MODE = "WriterShared" AND t.interval_end ="2020-11-12T23:50:00Z"))
)
AND t2.interval_end ="2020-11-12T23:50:00Z"
ORDER BY avg_commit_latency_seconds DESC;
The query result is sorted by the avg_commit_latency_seconds
column so that you see the
transaction experiencing the highest commit latency first.
fprint | read_columns | write_constructive_columns | avg_commit_latency_seconds |
---|---|---|---|
1866043996151916800 |
['Singers.SingerInfo', 'Singers.FirstName', 'Singers.LastName', 'Singers._exists'] |
['Singers.SingerInfo'] | 4.89 |
4168578515815911936 | [] | ['Singers.SingerInfo'] | 3.65 |
The query results show that two transactions tried to access the
Singers.SingerInfo
column, which is the column that had lock conflicts during the time period.
Once you identify the transactions causing the lock conflicts, you can analyze
the transactions using their fingerprint, fprint
, to identify potential issues
that contributed to the lock conflict.
After reviewing the transaction with fprint=1866043996151916800, you can use the
read_columns
and write_constructive_columns
columns to identify which part
of your application code triggered the transaction. You can then view the
underlying DML that's not filtering on the primary key, SingerId
. This caused
a full table scan and locked the table until the transaction was committed.
To resolve the lock conflict, you can do the following:
- Use a read-only transaction to identify the required
SingerId
values. - Use a separate read-write transaction to update the rows for the required
SingerId
values.
Apply best practices to reduce lock contention
In our example scenario, we were able to use lock statistics and transaction statistics to narrow down our problem to a transaction that wasn't using the primary key of our table when making updates. We came up with ideas to improve the transaction based on whether we knew the keys of the rows we wanted to update beforehand or not.
When looking at potential issues in your solution, or even when designing your solution, consider these best practices to reduce the number of lock conflicts in your database.
Use read-only transactions whenever possible, because they don't acquire any locks.
Avoid full table scans in a read-write transaction. This includes writing a DML conditional on the primary key or assigning a specific key range when using the Read API.
Keep the locking period short by committing the change as soon after you read the data as possible in a read-write transaction. A Read-write transaction guarantees that the data remains unchanged after you read the data until you successfully commit the change. To achieve this, the transaction requires locking the data cells during the read and during the commit. As a result, if you can keep the locking period short, transactions are less likely to have lock conflicts.
Favor small transactions over large transactions, or consider Partitioned DML for long running DML transactions. A long running transaction acquires a lock for a long time, so consider breaking a transaction which touches thousands of rows into multiple smaller transactions which update hundreds of rows whenever possible.
If you don't need the guarantee provided by a read-write transaction, avoid reading any data in the read-write transaction before committing the change, for example, by reading the data in a separate read-only transaction. Most lock conflicts occur due to the strong guarantee, to ensure data remain unchanged between the read and commit. So, if the read-write transaction doesn't read any data, it doesn't need to lock the cells for a long time.
Specify only the minimal set of columns required in a read-write transaction. As Spanner locks are per data cell, when a read-write transaction reads excessive columns, it acquires a
ReaderShared
lock on these cells. This might cause lock conflicts when other transactions acquire aWriterShared
lock on writes to the excessive columns. For example, consider specifying a set of columns instead of*
on read.Minimize API calls in a read-write transaction. The latency of API calls might lead to lock contention in Spanner, as API calls are subject to network delays as well as service-side delays. We recommend making API calls outside of read-write transactions whenever possible. If you must execute API calls inside a read-write transaction, make sure to monitor the latency of your API calls to minimize the impact on the lock acquisition period.
Follow schema design best practices.
What's next
- Learn about other Introspection tools.
- Learn about other information Spanner stores for each database in the databases information schema table.
- Learn more about SQL best practices for Spanner.