Software Transactional Memory Library

General information
Configurations
XL compiler
Platforms
Functions
Statistics

General information

The software transactional memory (STM) library code provides several configurations and policies for implementing transactional memory (TM). A transaction performs a set of reads and writes to shared memory. The TM system, under certain conditions, guarantees that each transaction appears as if all of its reads and writes are performed atomically together in relation to other transactions.

For the current implementation, it is assumed that shared locations are either always accessed inside transactions or always outside transactions, and that transactions include only revocable operations without side effects, and hence can be safely undone and retried.

The STM implementations use metadata to synchronize access to shared locations accessed transactionally. Each shared location is associated with a metadata entry. A metadata entry serves as a version number tracking updates of associated shared locations, and as a lock protecting updates of these shared locations.
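One common way for a single metadata entry to act as both a version number and a lock is to pack the two into one machine word. The following is a minimal sketch under that assumption: the low bit is a lock flag and the remaining bits hold the version. The layout and every name in it (orec_t, orec_try_lock, and so on) are hypothetical and are not taken from the library source.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metadata entry: bit 0 = lock flag, upper bits = version. */
typedef struct { _Atomic uint64_t word; } orec_t;

#define OREC_LOCK_BIT ((uint64_t)1)

static inline bool orec_is_locked(uint64_t w) { return (w & OREC_LOCK_BIT) != 0; }
static inline uint64_t orec_version(uint64_t w) { return w >> 1; }

/* Try to lock the entry. Succeeds only if the entry is unlocked and still
   at the version the caller observed earlier. */
static inline bool orec_try_lock(orec_t *o, uint64_t seen_version)
{
    uint64_t expected = seen_version << 1;   /* unlocked, same version */
    return atomic_compare_exchange_strong(&o->word, &expected,
                                          expected | OREC_LOCK_BIT);
}

/* Release after an update: clear the lock bit and publish a new version. */
static inline void orec_unlock_new_version(orec_t *o, uint64_t new_version)
{
    atomic_store(&o->word, new_version << 1);
}
```

With this layout a reader can detect both "locked by a writer" and "changed since I last looked" from a single atomic load of the word.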

In general, at the beginning of a transaction, the thread sets private per-thread transactional status data and, in some configurations, reads some global data. For transactional reads, the thread reads and checks the metadata corresponding to the target location, then reads the target location, and in some configurations checks the consistency of the read set. For transactional writes, the thread records the target address and the value to be written, and in some configurations acquires the corresponding metadata lock. At the end of a transaction, the thread acquires the metadata locks corresponding to its write set if not already acquired, validates the consistency of its read set, writes the values to the addresses in its write set, releases the metadata locks, and finally resets its private transactional data.
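The end-of-transaction steps can be sketched as a single-threaded simulation. This is an illustration of the ordering only, not the library's code: it uses plain fields where a real implementation needs atomic operations, and all types and names (meta_t, txn_t, txn_commit, meta_for) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_META 16   /* tiny table, for illustration only */

typedef struct { uint64_t version; int owner; } meta_t;   /* owner 0 = unlocked */
typedef struct { int *addr; uint64_t seen_version; } read_entry_t;
typedef struct { int *addr; int value; } write_entry_t;

typedef struct {
    int id;                                   /* nonzero transaction id */
    read_entry_t  reads[8];  size_t n_reads;
    write_entry_t writes[8]; size_t n_writes;
} txn_t;

static meta_t meta[NUM_META];

/* Hypothetical mapping from an address to its metadata entry. */
static meta_t *meta_for(const void *addr)
{
    return &meta[((uintptr_t)addr >> 3) % NUM_META];
}

/* Returns true on commit, false if the transaction must be retried. */
static bool txn_commit(txn_t *t)
{
    bool ok = true;
    size_t i;

    /* 1. Acquire the metadata locks for the write set. */
    for (i = 0; i < t->n_writes && ok; i++) {
        meta_t *m = meta_for(t->writes[i].addr);
        if (m->owner == 0) m->owner = t->id;
        else if (m->owner != t->id) ok = false;
    }
    /* 2. Validate the read set: versions unchanged, not locked by others. */
    for (i = 0; i < t->n_reads && ok; i++) {
        meta_t *m = meta_for(t->reads[i].addr);
        if (m->version != t->reads[i].seen_version ||
            (m->owner != 0 && m->owner != t->id)) ok = false;
    }
    /* 3. Write the buffered values to their target addresses. */
    for (i = 0; ok && i < t->n_writes; i++)
        *t->writes[i].addr = t->writes[i].value;

    /* 4. Release the locks, publishing a new version only on success. */
    for (i = 0; i < t->n_writes; i++) {
        meta_t *m = meta_for(t->writes[i].addr);
        if (m->owner == t->id) {
            if (ok) m->version++;
            m->owner = 0;
        }
    }
    /* 5. Reset the private transactional data. */
    t->n_reads = t->n_writes = 0;
    return ok;
}
```

A caller that gets false back would re-execute the transaction body from the start, which is why transactions are assumed to contain only revocable operations.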

The main design configuration issues are:

Read set inconsistency may lead to failures such as segmentation faults, division by zero, and infinite loops. Under configurations that allow read set inconsistency, the STM system attempts to catch and recover from these failures. However, these attempts are not guaranteed to catch every possible failure. For example, the STM library cannot detect and recover from an infinite loop that does not generate any calls to the STM code. When all failure types that may arise from inconsistent reads can be anticipated, it may be advantageous to forgo read set consistency in order to avoid the cost of the frequent read set validations needed to guarantee it.
If read set consistency is to be guaranteed, there are multiple configurations. One is to check the metadata of each read in the read set after every read, which leads to a cost that is quadratic in the number of reads. The advantage of this choice is that it is scalable, as in general disjoint transactions do not interfere with each other.
Another configuration employs a global version number to enable fast validation: the individual metadata entries corresponding to the reads in the transaction's read set are checked only if a change in the global version number is detected. However, this requires every writing transaction to increment the global version number, which may limit concurrency even among non-conflicting transactions.
For programs with transactions with small read sets, the cost of full validation after every read may not be high, while incrementing a global version may be expensive relative to the small transactions and limits concurrency, and thus it may be a better choice to use full validation.
For programs with transactions with large read sets, the cost of updating the global version number may be small relative to the large transactions, and the low frequency of global version number updates may not pose significant limitations on concurrency, while the quadratic cost of full validation after every read can be very high. Thus, in such cases it may be better to use a configuration with a global version number.
Another design issue is when to acquire the metadata locks for the write set. Acquiring locks at encounter time lets a read determine quickly, without searching the write set, whether the location has already been written by the same transaction, but it increases the time during which locks are held and thus the chances of inter-transaction conflicts. Acquiring locks at the end of the transaction minimizes the time during which locks are held, but requires another mechanism for detecting read-after-write cases, such as Bloom filters for fast search of write sets.
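A write set Bloom filter answers "has this transaction possibly written this address?" without a search in the common case. The following is a minimal sketch assuming a 512-bit filter, a single hash function, and an 8-byte conflict unit; the hash and all names (bloom_t, bloom_add, bloom_may_contain) are hypothetical and may differ from what the library does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS  512                       /* default size noted in this document */
#define BLOOM_WORDS (BLOOM_BITS / 64)

typedef struct { uint64_t bits[BLOOM_WORDS]; } bloom_t;

/* Hypothetical hash: drop the 3 low address bits (8-byte conflict unit),
   mix with a multiplicative constant, and keep the top 9 bits (0..511). */
static unsigned bloom_index(const void *addr)
{
    uint64_t h = (uint64_t)(uintptr_t)addr >> 3;
    h *= 0x9E3779B97F4A7C15ull;
    return (unsigned)(h >> (64 - 9));
}

/* Reset the filter at the start of each transaction. */
static void bloom_clear(bloom_t *b) { memset(b, 0, sizeof *b); }

/* Record a transactional write to addr. */
static void bloom_add(bloom_t *b, const void *addr)
{
    unsigned i = bloom_index(addr);
    b->bits[i / 64] |= (uint64_t)1 << (i % 64);
}

/* false: addr was definitely not written by this transaction.
   true:  addr may have been written; search the write set to confirm. */
static bool bloom_may_contain(const bloom_t *b, const void *addr)
{
    unsigned i = bloom_index(addr);
    return ((b->bits[i / 64] >> (i % 64)) & 1) != 0;
}
```

A transactional read would consult bloom_may_contain first and fall back to a write set search only on a hit, so false positives cost time but never correctness.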

Configurations

The default implementation uses the following policy and configuration options:

Consistent read sets
Global version number
Write set lock acquisition at the end of the transaction
1M metadata entries
8 byte conflict unit
512-bit write set Bloom filters

In order to use different policies and configurations, add EXTRA_FLAGS settings to the make command. For example, to use an encounter-time acquire policy for transactional writes, use the command:

make all EXTRA_FLAGS="-DENCOUNTER_ACQUIRE"

Some of the main configuration flags are:

NOINC_VALIDATION: To allow inconsistent reads and use signals to catch failures.
NOGLOBAL_VERSION: To avoid the use of a global version number.
ENCOUNTER_ACQUIRE: To acquire metadata locks on encountering transactional writes instead of at the end of the transaction
LOG_2_NUM_ORECS=<integer>: Log 2 of the number of metadata entries. Default 20.
LOG_2_BLOOM_FILTER_BITS=<integer>: Log 2 of the write set Bloom filter size. Default 9.
LOG_2_BLOCKS_SIZE=<integer>: Log 2 of the conflict unit block size. Default 3.
MAX_TXNS=<integer>: Maximum number of static transactions in the programs. Used only for statistics. Default 64.
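The LOG_2_* flags give sizes as powers of two. The sketch below shows how the documented defaults translate into the default table size, conflict unit, and Bloom filter size. The derived macro names and the orec_index mapping are hypothetical illustrations; the library's actual hash from address to metadata entry may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Documented defaults; a -D setting on the compiler command line overrides them. */
#ifndef LOG_2_NUM_ORECS
#define LOG_2_NUM_ORECS 20
#endif
#ifndef LOG_2_BLOCKS_SIZE
#define LOG_2_BLOCKS_SIZE 3
#endif
#ifndef LOG_2_BLOOM_FILTER_BITS
#define LOG_2_BLOOM_FILTER_BITS 9
#endif

/* Hypothetical derived constants. */
#define NUM_ORECS         ((size_t)1 << LOG_2_NUM_ORECS)          /* 1M entries */
#define BLOCK_SIZE_BYTES  ((size_t)1 << LOG_2_BLOCKS_SIZE)        /* 8 bytes    */
#define BLOOM_FILTER_BITS ((size_t)1 << LOG_2_BLOOM_FILTER_BITS)  /* 512 bits   */

/* Hypothetical mapping: all addresses in one conflict unit share an entry,
   and consecutive conflict units map to consecutive entries (mod table size). */
static size_t orec_index(const void *addr)
{
    return (size_t)(((uintptr_t)addr >> LOG_2_BLOCKS_SIZE) & (NUM_ORECS - 1));
}
```

Under this mapping, a larger LOG_2_BLOCKS_SIZE makes more neighboring addresses share one metadata entry, and a smaller LOG_2_NUM_ORECS makes more unrelated addresses collide; both can cause conflicts between transactions that touch different data.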

XL compiler

The STM implementations support a low-level interface. The IBM XL C/C++ compiler for Transactional Memory, available from IBM AlphaWorks (http://www.alphaworks.ibm.com/tech/xlcstm/), uses this interface. Therefore, STM libraries built from this code release can be linked with user programs compiled with that compiler, without the need to instrument individual reads and writes inside transactions. Programs need to use a high-level interface as specified in http://www.alphaworks.ibm.com/tech/xlcstm/. Note that the IBM XL C/C++ compiler for Transactional Memory runs on AIX systems only at this point, while the STM code runs on a variety of systems as noted below.

It is possible for other compilers to use this STM code if they use the same low-level interface described below.

Platforms

The STM code runs on several platforms, including AIX-PowerPC, Linux-PowerPC, Linux-X86, and Linux-X86_64. The code may run on Solaris-SPARC, but it has not been tested there.

Functions

The low-level STM interface is defined in the file stm.h. The main functions and macros are:

The macros STM_BEGIN and STM_END are to be used at the beginning and end of transactions.
stm_thr_init takes no arguments and returns a pointer to the thread's private transactional descriptor. This function should be called once before a thread starts using transactions.
stm_thr_retire takes no arguments and has no return value. This function should be called once when a thread will no longer use transactions. A thread may call stm_thr_init after it calls stm_thr_retire in order to resume using transactions.
stm_desc takes no arguments and returns a pointer to the per-thread private transactional descriptor that is passed as an argument to most other STM functions as described below.
stm_read_* functions, e.g., stm_read_int, stm_read_float, etc. These functions take two arguments: a pointer to a location of the specified type and a pointer to the per-thread private transactional descriptor.
stm_write_* functions, e.g., stm_write_int, stm_write_double, etc. These functions take three arguments: a pointer to the target location, the value to be written, and a pointer to the per-thread private transactional descriptor.
stm_malloc, stm_free are similar to the standard malloc and free functions but take an additional argument, a pointer to the thread's transactional descriptor.
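The following sketch shows how manually instrumented code might use these functions, with the argument order described above. To keep it compilable on its own, it defines single-threaded stand-ins for the library calls. Everything in the stand-in section is an assumption: the descriptor type name stm_desc_t, the int return type of stm_read_int, and the argument-less form of STM_BEGIN and STM_END are not confirmed here and should be checked against stm.h.

```c
/* ---- Hypothetical single-threaded stand-ins (assumptions, not the library).
        In real use, remove this section and include stm.h instead. ---- */
typedef struct { int unused; } stm_desc_t;             /* assumed type name */
static stm_desc_t stub_desc;
static stm_desc_t *stm_thr_init(void) { return &stub_desc; }
static void stm_thr_retire(void) { }
static int  stm_read_int(int *addr, stm_desc_t *d) { (void)d; return *addr; }
static void stm_write_int(int *addr, int v, stm_desc_t *d) { (void)d; *addr = v; }
#define STM_BEGIN   /* assumed to take no arguments */
#define STM_END     /* assumed to take no arguments */
/* ---- end of stand-ins ---- */

/* Move `amount` from one shared counter to another inside a transaction.
   Every access to shared data goes through stm_read_int or stm_write_int. */
static void transfer(int *from, int *to, int amount, stm_desc_t *d)
{
    STM_BEGIN
        int f = stm_read_int(from, d);
        int t = stm_read_int(to, d);
        stm_write_int(from, f - amount, d);
        stm_write_int(to, t + amount, d);
    STM_END
}
```

A thread would call stm_thr_init once before its first call to transfer, pass the returned descriptor to it, and call stm_thr_retire when it has finished using transactions.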

Statistics

The STM code can collect runtime statistics related to the inherent transactional characteristics of the program, independent of the STM implementation, such as transaction sizes and frequencies. It can also collect statistics on the interaction of the program with STM implementation specifics, such as Bloom filter matches and metadata locks acquired.

In order to build an STM library that collects statistics use:

make all STATS=on

Note that the performance of the STM library is significantly lower when statistics collection is enabled.

Runtime statistics are categorized by static transactions. Each static transaction is identified by the file name and line number where it starts. Aggregate statistics are also generated.

In order to generate statistics files, the program needs to call the function stm_stats_out(). The program may call this function multiple times. Note that if the function is called by a thread while other threads are actively using transactions, the statistics may be inconsistent, because the statistics of other threads may change while the snapshot is being taken. Typically, calls to stm_stats_out should be made at stable points of the program, such as at the end when only the main thread is active. In such cases, the statistics should be consistent.

The output files take the form stm_stats_tag<xx>_txn_<filename>_<lineno>.out for statistics per static transaction and stm_stats_tag<xx>_all.out for aggregate statistics. The tag indicates the number of times stm_stats_out has been called. The file name and line number indicate the associated static transaction. For example, on the second call to stm_stats_out, the runtime statistics for the static transaction starting at line 100 in the file foo.c will be included in the file stm_stats_tag02_txn_foo.c_100.out.
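A script or test harness that needs to locate these files can rebuild the names from the general form stm_stats_tag<xx>_txn_<filename>_<lineno>.out. The helpers below are a hypothetical sketch, not library functions, and they assume the tag is written as a two-digit, zero-padded decimal number.

```c
#include <stddef.h>
#include <stdio.h>

/* Name of the per-transaction statistics file for call number `tag`.
   Returns the snprintf result (length the full name needs). */
static int stats_txn_filename(char *buf, size_t n, int tag,
                              const char *file, int line)
{
    return snprintf(buf, n, "stm_stats_tag%02d_txn_%s_%d.out", tag, file, line);
}

/* Name of the aggregate statistics file for call number `tag`. */
static int stats_all_filename(char *buf, size_t n, int tag)
{
    return snprintf(buf, n, "stm_stats_tag%02d_all.out", tag);
}
```

Callers should compare the return value with the buffer size to detect truncation of long source file names.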

A list of statistics is in the file stats.h. The following are some of the main statistics collected by the STM code:

READ_ONLY_COMMITS: Number of committed transactions with no writes
READ_WRITE_COMMITS: Number of committed transactions with writes
TOTAL_COMMITS: Total number of successfully committed transactions
TOTAL_RETRIES: Total number of transaction retries
AVG_RETRIES_PER_TXN: Average number of retries per committed transaction
READ_SET_SIZES: Total number of items in read sets of committed transactions excluding duplicates
WRITE_SET_SIZES: Total number of items in write sets of committed transactions excluding duplicates
AVG_READ_SET_SIZE: Average number of transactional reads to unique locations per transaction
AVG_WRITE_SET_SIZE: Average number of transactional writes to unique locations per transaction
PCT_DUPLICATE_WRITES: Percent of transactional writes that are to locations already written in the same transaction
NUM_SILENT_WRITES: Number of writes with new value the same as the current value
PCT_SILENT_WRITES: Percent of transactional writes with values the same as the current values
DUPLICATE_WRITES: Number of writes to locations already written in the same transaction
DUPLICATE_READS: Number of reads of locations already read in the same transaction
PCT_DUPLICATE_READS: Percent of transactional reads that are to locations already read in the same transaction
READ_AFTER_WRITE_MATCHES: Number of reads of locations already written in the same transaction
PCT_READ_AFTER_WRITE: Percent of transactional reads that are to locations already written by the same transaction
NUM_MALLOCS: Number of calls to malloc inside transactions
NUM_FREES: Number of calls to free inside transactions
NUM_FREE_PRIVATE: Number of calls to free of blocks allocated in the same transaction
READ_LIST_MAX_SIZE: Maximum number of items in a transaction read list
WRITE_LIST_MAX_SIZE: Maximum number of items in a transaction write list
READ_SET_MAX_SIZE: Maximum number of unique items in a transaction read set
WRITE_SET_MAX_SIZE: Maximum number of unique items in a transaction write set
MAX_NESTING: Maximum level of nesting
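Several of the averages above can plausibly be derived from the raw totals. The sketch below assumes that each average is the corresponding total divided by TOTAL_COMMITS; that reading follows from the descriptions but is not confirmed against stats.h, and the struct and function names are hypothetical.

```c
/* Hypothetical container for a few of the raw totals. */
typedef struct {
    unsigned long total_commits;    /* TOTAL_COMMITS   */
    unsigned long total_retries;    /* TOTAL_RETRIES   */
    unsigned long read_set_sizes;   /* READ_SET_SIZES  */
    unsigned long write_set_sizes;  /* WRITE_SET_SIZES */
} stm_counts_t;

/* Division that yields 0 instead of dividing by zero when nothing committed. */
static double ratio(unsigned long num, unsigned long den)
{
    return den != 0 ? (double)num / (double)den : 0.0;
}

/* Assumed: AVG_RETRIES_PER_TXN = TOTAL_RETRIES / TOTAL_COMMITS */
static double avg_retries_per_txn(const stm_counts_t *c)
{
    return ratio(c->total_retries, c->total_commits);
}

/* Assumed: AVG_READ_SET_SIZE = READ_SET_SIZES / TOTAL_COMMITS */
static double avg_read_set_size(const stm_counts_t *c)
{
    return ratio(c->read_set_sizes, c->total_commits);
}

/* Assumed: AVG_WRITE_SET_SIZE = WRITE_SET_SIZES / TOTAL_COMMITS */
static double avg_write_set_size(const stm_counts_t *c)
{
    return ratio(c->write_set_sizes, c->total_commits);
}
```

For example, 4 commits with 2 retries, 40 read set items, and 10 write set items would give 0.5 retries, 10 reads, and 2.5 writes per committed transaction under these assumptions.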