Software Transactional Memory Library
- General information
- Configurations
- XL compiler
- Platforms
- Functions
- Statistics
General information
The software transactional memory (STM) library code provides
several configurations and policies for implementing transactional
memory (TM). A transaction performs a set of reads and writes to shared
memory. The TM system, under certain conditions, guarantees that each
transaction appears as if all of its reads and writes are performed
atomically together in relation to other transactions.
For the current implementation, it is assumed that shared locations
are either always accessed inside transactions or always outside
transactions, and that transactions include only revocable operations
without side effects, and hence can be safely undone and retried.
The STM implementations use metadata to synchronize access to shared
locations accessed transactionally. Each shared location is associated
with a metadata entry. A metadata entry serves as a version number
tracking updates of associated shared locations, and as a lock
protecting updates of these shared locations.
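The dual role of a metadata entry, version number and lock, can be sketched as a single word with one bit reserved for the lock. This is a minimal illustration only: the layout, the type name orec_t (borrowed from the LOG_2_NUM_ORECS flag name) and all function names are assumptions, not the library's actual representation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the lock flag, the remaining bits hold
 * the version number. The library's real encoding may differ. */
typedef _Atomic uint64_t orec_t;

#define OREC_LOCK_BIT ((uint64_t)1)

static bool orec_is_locked(uint64_t word) { return (word & OREC_LOCK_BIT) != 0; }
static uint64_t orec_version(uint64_t word) { return word >> 1; }

/* Try to lock the entry, but only if it still holds the word the caller
 * observed earlier. Returns true on success. */
static bool orec_try_lock(orec_t *o, uint64_t seen)
{
    if (orec_is_locked(seen))
        return false;
    uint64_t expected = seen;
    return atomic_compare_exchange_strong(o, &expected, seen | OREC_LOCK_BIT);
}

/* Release the lock and publish a new version in a single store. */
static void orec_unlock_with_version(orec_t *o, uint64_t new_version)
{
    assert(orec_is_locked(atomic_load(o)));
    atomic_store(o, new_version << 1);
}
```

Because the lock bit and the version share one word, a reader that sees the same unlocked word before and after reading the data knows that no writer updated the associated locations in between.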
In general, at the beginning of a transaction, the thread sets
private per-thread transactional status data and, in some configurations,
reads some global data. For transactional reads, the thread reads and
checks the metadata corresponding to the target location and then reads
the target location, and in some configurations, checks the consistency
of the read set. For transactional writes, the thread records the
target address and the value to be written, and in some configurations
acquires the corresponding metadata lock. At the end of a transaction,
the thread acquires the metadata locks corresponding to its write set
if not already acquired, validates the consistency of its read set,
writes the values to the addresses in its write set, releases the
metadata locks, and finally resets its private transactional data.
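The end-of-transaction steps described above can be sketched as a sequential model. It only shows the order of the steps (lock, validate, write back, release, reset); it deliberately omits the atomic operations and memory fences that a concurrent implementation needs, and every type and function name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential model only; all names are hypothetical. */
typedef struct { uint64_t version; bool locked; int owner; } meta_t;
typedef struct { meta_t *meta; uint64_t seen_version; } read_entry_t;
typedef struct { int *addr; int value; meta_t *meta; } write_entry_t;

typedef struct {
    int id;
    read_entry_t reads[16];   size_t n_reads;
    write_entry_t writes[16]; size_t n_writes;
} txn_t;

/* Release locks held by t among its first `upto` write entries.
 * If bump is true, advance the version before unlocking. */
static void release_locks(txn_t *t, size_t upto, bool bump)
{
    for (size_t i = 0; i < upto; i++) {
        meta_t *m = t->writes[i].meta;
        if (m->locked && m->owner == t->id) {
            if (bump)
                m->version++;
            m->locked = false;
        }
    }
}

/* Returns true if the transaction committed; false means abort and retry. */
static bool txn_commit(txn_t *t)
{
    assert(t->n_reads <= 16 && t->n_writes <= 16);

    /* 1. Acquire the metadata locks for the write set. */
    for (size_t i = 0; i < t->n_writes; i++) {
        meta_t *m = t->writes[i].meta;
        if (m->locked && m->owner == t->id)
            continue;                       /* entry already locked by us */
        if (m->locked) {
            release_locks(t, i, false);     /* conflict: back out */
            return false;
        }
        m->locked = true;
        m->owner = t->id;
    }

    /* 2. Validate the read set: versions unchanged, not locked by others. */
    for (size_t i = 0; i < t->n_reads; i++) {
        meta_t *m = t->reads[i].meta;
        bool locked_by_other = m->locked && m->owner != t->id;
        if (locked_by_other || m->version != t->reads[i].seen_version) {
            release_locks(t, t->n_writes, false);
            return false;
        }
    }

    /* 3. Write the buffered values to their target addresses. */
    for (size_t i = 0; i < t->n_writes; i++)
        *t->writes[i].addr = t->writes[i].value;

    /* 4. Release the locks, advancing each version. */
    release_locks(t, t->n_writes, true);

    /* 5. Reset the private transactional data. */
    t->n_reads = 0;
    t->n_writes = 0;
    return true;
}
```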
The main design configuration issues are:
- Read set inconsistency may lead to failures such as segmentation
faults, division by zero and infinite loops. Under
configurations that allow read set inconsistency, the STM
system attempts to catch and recover from these failures.
However, these attempts are not guaranteed to catch every
possible failure. For example, the STM library cannot detect
and recover from an infinite loop that does not generate any
calls to the STM code. In some cases all possible failure types
that may arise due to inconsistent reads can be anticipated,
and it may be advantageous to forgo read set consistency, in
order to avoid the costs of frequent read set validations that
are needed to guarantee read set consistency.
- If read set consistency is to be guaranteed, there are multiple
configurations. One is to check the metadata of each read in
the read set after every read, which leads to a cost quadratic in
the size of the read set. The advantage of this choice is that it
is scalable, as in general disjoint transactions do not interfere
with each other.
- Another configuration employs a global version number to enable
fast validation without checking each individual metadata entry
corresponding to the reads in the transaction's read set unless
a change in the global version number is detected. However,
this requires each writing transaction to increment the global
version number, and hence this may limit concurrency, even among
non-conflicting transactions.
- For programs with transactions with small read sets, the cost of
full validation after every read may not be high, while
incrementing a global version may be expensive relative to the
small transactions and limits concurrency, and thus it may be a
better choice to use full validation.
- For programs with transactions with large read sets, the cost of
updating the global version number may be small relative to the
large transactions, and the low frequency of global version number
updates may not pose significant limitations on concurrency, while
the quadratic cost of full validation after every read can be very
high. Thus, in such cases it may be better to use a configuration
with a global version number.
- Another design issue is when to acquire the metadata locks for the
write set. Acquiring locks at encounter time lets a read determine
quickly whether the location has been written by the same
transaction, without searching the write set, but lengthens the time
during which locks are held and thus increases the chances of
inter-transaction conflicts. Acquiring locks at the end of the
transaction minimizes the time during which locks are held, but
requires another mechanism for detecting read-after-write cases;
Bloom filters are used for fast search of write sets.
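The role of the write set Bloom filter in the last item can be sketched as follows. On a transactional read, a negative filter answer means the location was definitely not written by this transaction, so the write set search can be skipped; a positive answer may be a false positive and still requires the search. The single multiplicative hash and all names below are assumptions for illustration; only the 512-bit size and the 8-byte conflict unit come from the defaults listed in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#ifndef LOG_2_BLOOM_FILTER_BITS
#define LOG_2_BLOOM_FILTER_BITS 9          /* 2^9 = 512 bits */
#endif
#define BLOOM_BITS  (1u << LOG_2_BLOOM_FILTER_BITS)
#define BLOOM_WORDS (BLOOM_BITS / 64)

typedef struct { uint64_t bits[BLOOM_WORDS]; } bloom_t;

/* Empty the filter, e.g. at the start of each transaction. */
static void bloom_clear(bloom_t *b) { memset(b->bits, 0, sizeof b->bits); }

/* Hypothetical hash: drop the low 3 address bits (8-byte conflict unit),
 * mix with a multiplicative constant, keep the top bits as the index. */
static unsigned bloom_hash(const void *addr)
{
    uint64_t h = (uint64_t)(uintptr_t)addr >> 3;
    h *= 0x9E3779B97F4A7C15ull;
    return (unsigned)(h >> (64 - LOG_2_BLOOM_FILTER_BITS));
}

/* Record a transactional write to addr. */
static void bloom_add(bloom_t *b, const void *addr)
{
    unsigned i = bloom_hash(addr);
    assert(i < BLOOM_BITS);
    b->bits[i / 64] |= (uint64_t)1 << (i % 64);
}

/* false: addr was definitely not written in this transaction.
 * true:  addr may have been written; search the write set to confirm. */
static bool bloom_may_contain(const bloom_t *b, const void *addr)
{
    unsigned i = bloom_hash(addr);
    return (b->bits[i / 64] >> (i % 64)) & 1u;
}
```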
Configurations
The default implementation uses the following policy and configuration
options:
- Consistent read sets
- Global version number
- Write set lock acquisition at the end of the transaction
- 1M metadata entries
- 8 byte conflict unit
- 512-bit write set Bloom filters
In order to use different policies and configurations, add EXTRA_FLAGS
settings to the make command. For example, to use an encounter-time
acquire policy for transactional writes use the command:
make all EXTRA_FLAGS="-DENCOUNTER_ACQUIRE"
Some of the main configuration flags are:
- NOINC_VALIDATION: To allow inconsistent reads and use signals to catch failures.
- NOGLOBAL_VERSION: To avoid the use of a global version number.
- ENCOUNTER_ACQUIRE: To acquire metadata locks on encountering transactional writes instead of at the end of the transaction.
- LOG_2_NUM_ORECS=<integer>: Log 2 of the number of metadata entries. Default 20.
- LOG_2_BLOOM_FILTER_BITS=<integer>: Log 2 of the write set Bloom filter size. Default 9.
- LOG_2_BLOCKS_SIZE=<integer>: Log 2 of the conflict unit block size. Default 3.
- MAX_TXNS=<integer>: Maximum number of static transactions in the program. Used only for statistics. Default 64.
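The LOG_2_* flags are base-2 logarithms, so the defaults 20, 9 and 3 correspond to the 1M metadata entries, 512-bit Bloom filters and 8-byte conflict unit listed earlier. The sketch below works through that arithmetic. The helper names and the address-to-entry mapping in orec_index are assumptions for illustration; the library's actual mapping may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defaults taken from the flag descriptions above. */
#ifndef LOG_2_NUM_ORECS
#define LOG_2_NUM_ORECS 20
#endif
#ifndef LOG_2_BLOOM_FILTER_BITS
#define LOG_2_BLOOM_FILTER_BITS 9
#endif
#ifndef LOG_2_BLOCKS_SIZE
#define LOG_2_BLOCKS_SIZE 3
#endif

/* Number of metadata entries: 2^20 = 1,048,576 by default. */
static size_t num_orecs(void)
{
    assert((size_t)LOG_2_NUM_ORECS < 8 * sizeof(size_t));
    return (size_t)1 << LOG_2_NUM_ORECS;
}

/* Write set Bloom filter size in bits: 2^9 = 512 by default. */
static size_t bloom_filter_bits(void) { return (size_t)1 << LOG_2_BLOOM_FILTER_BITS; }

/* Conflict unit size in bytes: 2^3 = 8 by default. */
static size_t block_size_bytes(void) { return (size_t)1 << LOG_2_BLOCKS_SIZE; }

/* Hypothetical mapping from an address to a metadata entry index:
 * addresses in the same conflict unit share one entry. */
static size_t orec_index(uintptr_t addr)
{
    return (size_t)(addr >> LOG_2_BLOCKS_SIZE) & (num_orecs() - 1);
}
```

Under this assumed mapping, a larger LOG_2_BLOCKS_SIZE makes more neighboring addresses share an entry, and a smaller LOG_2_NUM_ORECS makes more unrelated addresses collide on the same entry.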
XL compiler
The STM implementations support a low-level interface. The IBM XL
C/C++ compiler for Transactional Memory available from IBM AlphaWorks
(http://www.alphaworks.ibm.com/tech/xlcstm/) uses this
interface. Therefore, STM libraries built from this code release can
be linked with user programs using that compiler without the need to
instrument individual reads and writes inside transactions. Programs
need to use a high-level interface as specified in
http://www.alphaworks.ibm.com/tech/xlcstm/. Note that the IBM XL C/C++
compiler for Transactional Memory runs on AIX systems only at this
point, while the STM code runs on a variety of systems as noted below.
It is possible for other compilers to use this STM code, if they use
the same low-level interface described below.
Platforms
The STM code runs on several platforms including AIX-PowerPC,
Linux-PowerPC, Linux-X86, and Linux-X86_64. The code may run on
Solaris-SPARC, but it has not been tested there.
Functions
The low-level STM interface is defined in the file stm.h.
The main functions and macros are:
- The macros STM_BEGIN and STM_END are to be used at the beginning and
end of transactions.
- stm_thr_init takes no arguments and returns a pointer to the thread's
private transactional descriptor. This function should be called once
before a thread starts using transactions.
- stm_thr_retire takes no arguments and has no return value. This
function should be called once when a thread will no longer use
transactions. A thread may call stm_thr_init after it calls stm_thr_retire
in order to resume using transactions.
- stm_desc takes no arguments and returns a pointer to the per-thread
private transactional descriptor that is passed as an argument to most
other STM functions as described below.
- stm_read_* functions, e.g., stm_read_int, stm_read_float, etc. These
functions take two arguments: a pointer to a location of the specified type
and a pointer to the per-thread private transactional descriptor.
- stm_write_* functions, e.g., stm_write_int, stm_write_double, etc.
These functions take three arguments: a pointer to the target location, the
value to be written, and a pointer to the per-thread private transactional
descriptor.
- stm_malloc, stm_free are similar to the standard malloc and free
functions but take an additional argument, a pointer to the thread's
transactional descriptor.
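The calling order and argument order described above can be sketched as follows. So that the sketch compiles without the library, it uses stand-ins with a mock_ prefix instead of the real functions. The stand-ins mirror only what this section states; the argument of MOCK_BEGIN and MOCK_END, the void pointer descriptor type and the return types are assumptions, and the real declarations in stm.h may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins only: they execute directly, with no synchronization,
 * conflict detection or retry. The real interface is in stm.h. */
typedef struct { int active; } mock_desc_t;
static mock_desc_t mock_thread_desc;

/* Mirrors stm_thr_init: no arguments, returns the thread's descriptor. */
static void *mock_thr_init(void) { mock_thread_desc.active = 0; return &mock_thread_desc; }

/* Mirrors stm_thr_retire: no arguments, no return value. */
static void mock_thr_retire(void) { mock_thread_desc.active = 0; }

/* Mirrors stm_desc: returns the per-thread descriptor. */
static void *mock_desc(void) { return &mock_thread_desc; }

/* Mirrors stm_read_int: (pointer to location, descriptor). */
static int mock_read_int(int *addr, void *desc)
{
    assert(((mock_desc_t *)desc)->active);   /* must be inside a transaction */
    return *addr;
}

/* Mirrors stm_write_int: (pointer to location, value, descriptor). */
static void mock_write_int(int *addr, int value, void *desc)
{
    assert(((mock_desc_t *)desc)->active);
    *addr = value;
}

/* Stand-ins for STM_BEGIN and STM_END; their argument is an assumption. */
#define MOCK_BEGIN(desc) (((mock_desc_t *)(desc))->active = 1)
#define MOCK_END(desc)   (((mock_desc_t *)(desc))->active = 0)

/* Move `amount` from one shared counter to another in one transaction. */
static void transfer(int *from, int *to, int amount)
{
    void *desc = mock_desc();
    MOCK_BEGIN(desc);
    int f = mock_read_int(from, desc);
    int t = mock_read_int(to, desc);
    mock_write_int(from, f - amount, desc);
    mock_write_int(to, t + amount, desc);
    MOCK_END(desc);
}
```

A thread would call the initialization function once before its first call to transfer and the retire function once after its last.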
Statistics
The STM code can collect runtime statistics related to the inherent
transactional characteristics of the program independent of the STM
implementation, such as transaction sizes and frequencies, as well as the
interaction of the program with the STM implementation specifics, such as Bloom
filter matches and metadata locks acquired.
In order to build an STM library that collects statistics use:
make all STATS=on
Note that the performance of the STM library is significantly lower when
statistics collection is enabled.
Runtime statistics are categorized by static transactions. Each static
transaction is identified by the file name and line number where it starts.
Aggregate statistics are also generated.
In order to generate statistics files, the program needs to call the
function stm_stats_out(). The program may call this function multiple times.
Note that if the function is called by a thread while other threads are
actively using transactions, then the statistics may be inconsistent as
statistics of other threads may change while taking a snapshot of the
statistics. Typically, calls to stm_stats_out should be at stable points of the
program, such as at the end, where only the main thread is active. In such
cases, the statistics should be consistent.
The output files take the form
stm_stats_tag<xx>_txn_<filename>_<lineno>.out for statistics
per static transaction and stm_stats_tag<xx>_all.out for aggregate
statistics. The tag indicates the number of times stm_stats_out has been
called. The file name and line number indicate the associated static
transaction. For example, on the second call to stm_stats_out, the runtime
statistics for the static transaction starting at line 100 in the file foo.c
will be included in the file stm_stats_tag02_txn_foo.c_100.out.
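The per-transaction naming pattern stated at the start of this paragraph, stm_stats_tag<xx>_txn_<filename>_<lineno>.out, can be reproduced with a small helper, for example to locate an output file from a test script. The helper name is hypothetical, and the two-digit zero padding of the tag is an assumption inferred from the <xx> placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the per-transaction statistics file name for the given call
 * count (tag), source file name and starting line number.
 * Returns the value returned by snprintf. */
static int stats_file_name(char *buf, size_t len, int tag,
                           const char *file, int line)
{
    assert(buf != NULL && file != NULL);
    assert(strlen(file) > 0 && tag >= 0 && line > 0);
    return snprintf(buf, len, "stm_stats_tag%02d_txn_%s_%d.out",
                    tag, file, line);
}
```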
A list of statistics is in the file stats.h. The following are some of the
main statistics collected by the STM code:
- READ_ONLY_COMMITS: Number of committed transactions with no writes
- READ_WRITE_COMMITS: Number of committed transactions with writes
- TOTAL_COMMITS: Total number of successfully committed transactions
- TOTAL_RETRIES: Total number of transaction retries
- AVG_RETRIES_PER_TXN: Average number of retries per committed
transaction
- READ_SET_SIZES: Total number of items in read sets of committed
transactions excluding duplicates
- WRITE_SET_SIZES: Total number of items in write sets of committed
transactions excluding duplicates
- AVG_READ_SET_SIZE: Average number of transactional reads to unique
locations per transaction
- AVG_WRITE_SET_SIZE: Average number of transactional writes to unique
locations per transaction
- PCT_DUPLICATE_WRITES: Percent of transactional writes that are to
locations already written in the same transaction
- NUM_SILENT_WRITES: Number of writes with new value the same as the
current value
- PCT_SILENT_WRITES: Percent of transactional writes with values the same
as the current values
- DUPLICATE_WRITES: Number of writes to locations already written in the
same transaction
- DUPLICATE_READS: Number of reads of locations already read in the same
transaction
- PCT_DUPLICATE_READS: Percent of transactional reads that are to
locations already read in the same transaction
- READ_AFTER_WRITE_MATCHES: Number of reads after a write of the same
location in the same transaction
- PCT_READ_AFTER_WRITE: Percent of transactional reads that are to
locations already written by the same transaction
- NUM_MALLOCS: Number of calls to malloc inside transactions
- NUM_FREES: Number of calls to free inside transactions
- NUM_FREE_PRIVATE: Number of calls to free of blocks allocated in the
same transaction
- READ_LIST_MAX_SIZE: Maximum number of items in a transaction read
list
- WRITE_LIST_MAX_SIZE: Maximum number of items in a transaction write
list
- READ_SET_MAX_SIZE: Maximum number of unique items in a transaction read
set
- WRITE_SET_MAX_SIZE: Maximum number of unique items in a transaction
write set
- MAX_NESTING: Maximum level of nesting
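Several of the averages in this list appear to be ratios of the raw totals. The sketch below shows one plausible reading, in which each average is the corresponding total divided by TOTAL_COMMITS. That relationship, the struct and the function names are assumptions for illustration; stats.h and the library source define the actual computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a few of the raw totals listed above. */
typedef struct {
    uint64_t total_commits;     /* TOTAL_COMMITS   */
    uint64_t total_retries;     /* TOTAL_RETRIES   */
    uint64_t read_set_sizes;    /* READ_SET_SIZES  */
    uint64_t write_set_sizes;   /* WRITE_SET_SIZES */
} sample_counts_t;

/* Divide, returning 0 when there is nothing to average over. */
static double ratio(uint64_t num, uint64_t den)
{
    return den ? (double)num / (double)den : 0.0;
}

/* Assumed reading of AVG_RETRIES_PER_TXN. */
static double avg_retries_per_txn(const sample_counts_t *c)
{
    assert(c);
    return ratio(c->total_retries, c->total_commits);
}

/* Assumed reading of AVG_READ_SET_SIZE. */
static double avg_read_set_size(const sample_counts_t *c)
{
    assert(c);
    return ratio(c->read_set_sizes, c->total_commits);
}

/* Assumed reading of AVG_WRITE_SET_SIZE. */
static double avg_write_set_size(const sample_counts_t *c)
{
    assert(c);
    return ratio(c->write_set_sizes, c->total_commits);
}
```

For example, with 4 commits, 2 retries, 20 read set items and 8 write set items, these helpers give 0.5 retries per transaction and average read and write set sizes of 5 and 2.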