Stream

Stream is a Go library for online statistical algorithms. Provided statistics can be computed globally over an entire stream, or over a rolling window.

Installation

Use go get:

go get github.com/alexander-yu/stream

Example Usage

In-depth examples are provided in the examples directory, but a small taste is provided below:

// tracks the autocorrelation over a
// rolling window of size 15 and lag of 5
autocorr, err := joint.NewAutocorr(5, 15)
// handle err

// all metrics in the joint package must be passed
// through joint.Init in order to consume values
err = joint.Init(autocorr)
// handle err

// tracks the global median using a pair of heaps
median, err := quantile.NewGlobalHeapMedian()
// handle err

for i := 0.; i < 100; i++ {
    err = autocorr.Push(i)
    // handle err

    err = median.Push(i)
    // handle err
}

autocorrVal, err := autocorr.Value()
// handle err

medianVal, err := median.Value()
// handle err

fmt.Printf("%s: %f\n", autocorr.String(), autocorrVal)
fmt.Printf("%s: %f\n", median.String(), medianVal)

Statistics

For time/space complexity details on the algorithms listed below, see here.

Quantile

Quantile keeps track of the quantiles of a stream, either globally or over a rolling window. You can configure which implementation to use as the underlying data structure, as well as which interpolation method to use when a quantile lies between two elements. Currently, skip lists and order statistic trees (specifically, modified forms of AVL trees and red-black trees) are supported.
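
To make the role of the interpolation method concrete, here is a minimal standalone sketch. It is not the library's implementation: it sorts a copy of a slice instead of maintaining a skip list or order statistic tree, and the names Interpolation, Linear, Lower, Higher, Midpoint and quantile are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Interpolation selects how to resolve a quantile that falls between two
// elements. These names are illustrative, not the library's identifiers.
type Interpolation int

const (
	Linear Interpolation = iota
	Lower
	Higher
	Midpoint
)

// quantile returns the q-th quantile (0 <= q <= 1) of a non-empty slice.
func quantile(vals []float64, q float64, method Interpolation) float64 {
	sorted := append([]float64(nil), vals...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)

	pos := q * float64(len(sorted)-1) // fractional index of the quantile
	lo, hi := int(math.Floor(pos)), int(math.Ceil(pos))

	switch method {
	case Lower:
		return sorted[lo]
	case Higher:
		return sorted[hi]
	case Midpoint:
		return (sorted[lo] + sorted[hi]) / 2
	default: // Linear
		return sorted[lo] + (pos-float64(lo))*(sorted[hi]-sorted[lo])
	}
}

func main() {
	vals := []float64{4, 1, 3, 2}
	fmt.Println(quantile(vals, 0.5, Midpoint)) // median of an even-length sample: 2.5
	fmt.Println(quantile(vals, 0.25, Linear))  // first quartile: 1.75
}
```

Sorting on every call costs O(n log n); an ordered structure such as the ones listed above avoids that by keeping elements sorted as they arrive.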

Median

Median keeps track of the median of a stream; it is a convenience wrapper over Quantile that sets the quantile to 0.5 and the interpolation method to the midpoint method.

IQR

IQR keeps track of the interquartile range of a stream; it is a convenience wrapper over Quantile that retrieves the 1st and 3rd quartiles and sets the interpolation method to the midpoint method.

HeapMedian

HeapMedian keeps track of the median of a stream with a pair of heaps. In particular, it uses a max-heap and a min-heap to keep track of elements below and above the median, respectively. HeapMedian can calculate the global median of a stream, or over a rolling window.
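
The two-heap idea can be sketched for the global case using only the standard library's container/heap. This is an illustration under assumptions, not the library's code: the types floatHeap and heapMedian are hypothetical, the max-heap is simulated by storing negated values in a min-heap, and the rolling-window case (which also needs removal of expired elements) is left out.

```go
package main

import (
	"container/heap"
	"fmt"
)

// floatHeap is a min-heap of float64 values.
type floatHeap []float64

func (h floatHeap) Len() int           { return len(h) }
func (h floatHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h floatHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *floatHeap) Push(x any)        { *h = append(*h, x.(float64)) }
func (h *floatHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// heapMedian tracks the global median of the values pushed so far.
type heapMedian struct {
	low  floatHeap // max-heap of the lower half, stored as negated values
	high floatHeap // min-heap of the upper half
}

// Push adds x and rebalances so that low holds as many elements as high,
// or exactly one more.
func (m *heapMedian) Push(x float64) {
	if m.low.Len() == 0 || x <= -m.low[0] {
		heap.Push(&m.low, -x)
	} else {
		heap.Push(&m.high, x)
	}
	if m.low.Len() > m.high.Len()+1 {
		heap.Push(&m.high, -heap.Pop(&m.low).(float64))
	} else if m.high.Len() > m.low.Len() {
		heap.Push(&m.low, -heap.Pop(&m.high).(float64))
	}
}

// Value returns the median; it must only be called after at least one Push.
func (m *heapMedian) Value() float64 {
	if m.low.Len() > m.high.Len() {
		return -m.low[0]
	}
	return (-m.low[0] + m.high[0]) / 2
}

func main() {
	var m heapMedian
	for _, x := range []float64{5, 1, 3, 4} {
		m.Push(x)
	}
	fmt.Println(m.Value()) // 3.5
}
```

Each push costs O(log n), and reading the median only inspects the two heap roots.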

Min

Min keeps track of the minimum of a stream; it can track either the global minimum, or over a rolling window.
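
One common way to track a rolling minimum is a monotonic deque, sketched below. Whether the library uses this technique is not stated here, so treat it as an assumption; the types entry and rollingMin are hypothetical. The same structure with the comparison reversed gives a rolling maximum.

```go
package main

import "fmt"

// entry pairs a value with the position at which it was pushed.
type entry struct {
	index int
	value float64
}

// rollingMin tracks the minimum of the last `window` values (window >= 1).
type rollingMin struct {
	window int
	count  int     // number of values pushed so far
	deque  []entry // candidate minima, oldest first, values strictly increasing
}

// Push adds x to the window.
func (m *rollingMin) Push(x float64) {
	// Any earlier value >= x can never be the minimum again, because x
	// is smaller or equal and will stay in the window longer.
	for len(m.deque) > 0 && m.deque[len(m.deque)-1].value >= x {
		m.deque = m.deque[:len(m.deque)-1]
	}
	m.deque = append(m.deque, entry{m.count, x})

	// Drop candidates that have fallen out of the window. The entry just
	// appended is always inside it, so the deque never becomes empty.
	for m.deque[0].index <= m.count-m.window {
		m.deque = m.deque[1:]
	}
	m.count++
}

// Value returns the current minimum; call it only after at least one Push.
func (m *rollingMin) Value() float64 { return m.deque[0].value }

func main() {
	m := &rollingMin{window: 3}
	for _, x := range []float64{4, 2, 5, 6, 7} {
		m.Push(x)
		fmt.Println(m.Value()) // prints 4, 2, 2, 2, 5 on successive lines
	}
}
```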

Max

Max keeps track of the maximum of a stream; it can track either the global maximum, or over a rolling window.

Mean

Mean keeps track of the mean of a stream; it can track either the global mean, or over a rolling window.
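
As a rough illustration of the global and rolling modes, the sketch below updates a mean incrementally. It is not the library's code: the type mean and its fields are hypothetical, and a window of 0 is assumed to mean "global", mirroring the convention used in the CoreConfig examples in this README.

```go
package main

import "fmt"

// mean tracks a running mean, either globally (window == 0) or over the
// last `window` values.
type mean struct {
	window int       // 0 means global
	buf    []float64 // ring buffer of the last `window` values (unused when global)
	pos    int       // index of the oldest value in buf once it is full
	n      int       // number of values contributing to the mean
	value  float64
}

// Push adds x to the mean.
func (m *mean) Push(x float64) {
	if m.window > 0 && len(m.buf) == m.window {
		// Window is full: replace the oldest value and shift the mean
		// by the difference, keeping the count fixed.
		old := m.buf[m.pos]
		m.buf[m.pos] = x
		m.pos = (m.pos + 1) % m.window
		m.value += (x - old) / float64(m.window)
		return
	}
	if m.window > 0 {
		m.buf = append(m.buf, x)
	}
	m.n++
	m.value += (x - m.value) / float64(m.n)
}

func main() {
	global := &mean{}
	rolling := &mean{window: 3}
	for _, x := range []float64{1, 2, 3, 4} {
		global.Push(x)
		rolling.Push(x)
	}
	fmt.Println(global.value)  // mean of 1,2,3,4: 2.5
	fmt.Println(rolling.value) // mean of 2,3,4: 3
}
```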

EWMA

EWMA keeps track of the global exponentially weighted moving average.
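
A minimal sketch of an exponentially weighted moving average follows. Two assumptions are made that this README does not confirm: the decay factor weights the newest observation (value = decay*x + (1-decay)*previous), and the average is seeded with the first observation. The type ewma is hypothetical.

```go
package main

import "fmt"

// ewma tracks an exponentially weighted moving average.
type ewma struct {
	decay float64 // weight given to the newest observation, in (0, 1]
	value float64
	seen  bool // whether at least one value has been pushed
}

// Push folds x into the average.
func (e *ewma) Push(x float64) {
	if !e.seen {
		// Assumption: seed the average with the first observation.
		e.value = x
		e.seen = true
		return
	}
	e.value = e.decay*x + (1-e.decay)*e.value
}

func main() {
	e := &ewma{decay: 0.5}
	for _, x := range []float64{2, 4, 8} {
		e.Push(x)
	}
	fmt.Println(e.value) // 0.5*8 + 0.5*(0.5*4 + 0.5*2) = 5.5
}
```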

Moment

Moment keeps track of the k-th sample central moment; it can track either the global moment, or over a rolling window.

EWMMoment

EWMMoment keeps track of the global k-th exponentially weighted moving sample central moment. This uses the exponentially weighted moving average as its center of mass, and uses the same exponential weights for its power terms.

Std

Std keeps track of the sample standard deviation of a stream; it can track either the global standard deviation, or over a rolling window. To track the sample variance instead, you should use Moment, i.e.

variance := New(2, window)

EWMStd

EWMStd keeps track of the global exponentially weighted moving standard deviation. To track the exponentially weighted moving variance instead, you should use EWMMoment, i.e.

variance := NewEWMMoment(2, decay)

Skewness

Skewness keeps track of the sample skewness of a stream (in particular, the adjusted Fisher-Pearson standardized moment coefficient); it can track either the global skewness, or over a rolling window.
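
For reference, the adjusted Fisher-Pearson coefficient can be computed in batch form as G1 = sqrt(n(n-1))/(n-2) * m3/m2^1.5, where m2 and m3 are the biased second and third central moments. The sketch below applies that formula to a slice; it is a two-pass illustration rather than the library's streaming implementation, and the function name skewness is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// skewness returns the adjusted Fisher-Pearson standardized moment
// coefficient of vals. It requires at least 3 values that are not all equal.
func skewness(vals []float64) float64 {
	n := float64(len(vals))

	var mean float64
	for _, v := range vals {
		mean += v
	}
	mean /= n

	// Biased (divide-by-n) second and third central moments.
	var m2, m3 float64
	for _, v := range vals {
		d := v - mean
		m2 += d * d
		m3 += d * d * d
	}
	m2 /= n
	m3 /= n

	g1 := m3 / math.Pow(m2, 1.5)             // unadjusted coefficient
	return math.Sqrt(n*(n-1)) / (n - 2) * g1 // small-sample adjustment
}

func main() {
	fmt.Println(skewness([]float64{1, 2, 3})) // symmetric sample: 0
	fmt.Println(skewness([]float64{1, 2, 4})) // right-skewed sample: positive
}
```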

Kurtosis

Kurtosis keeps track of the sample kurtosis of a stream (in particular, the sample excess kurtosis); it can track either the global kurtosis, or over a rolling window.

Core (Univariate)

Core is the struct powering all of the statistics in the stream/moment subpackage. It keeps track of a pre-configured set of centralized k-th power sums of a stream in an efficient, numerically stable way, and can track either the global sums or the sums over a rolling window.

To configure which sums to track, you'll need to instantiate a CoreConfig struct and provide it to NewCore:

config := &moment.CoreConfig{
    Sums: moment.SumsConfig{
        2: true, // tracks the sum of squared differences
        3: true, // tracks the sum of cubed differences
    },
    Window: stream.IntPtr(0),    // tracks global sums
    Decay: stream.FloatPtr(0.3), // tracks exponentially weighted sums with a decay factor of 0.3
}
core, err := moment.NewCore(config)
// handle err

See the godoc entry for more details on Core's methods.
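
To give a feel for what "centralized power sums" tracked in a numerically stable way can look like, the sketch below maintains the sums of squared and cubed differences from the mean with a one-pass update in the style of Welford's algorithm. It covers only the global, unweighted case and is not taken from the library; the type powerSums and its fields are hypothetical.

```go
package main

import "fmt"

// powerSums tracks the running mean together with
// m2 = sum((x - mean)^2) and m3 = sum((x - mean)^3).
type powerSums struct {
	n    int
	mean float64
	m2   float64
	m3   float64
}

// Push updates the mean and both centralized sums in O(1) without
// storing past values.
func (s *powerSums) Push(x float64) {
	prev := float64(s.n) // count before this value
	s.n++
	n := float64(s.n)

	delta := x - s.mean
	deltaN := delta / n
	term := delta * deltaN * prev

	s.mean += deltaN
	// m3 must be updated before m2, because it uses the old m2.
	s.m3 += term*deltaN*(n-2) - 3*deltaN*s.m2
	s.m2 += term
}

func main() {
	var s powerSums
	for _, x := range []float64{1, 2, 4} {
		s.Push(x)
	}
	fmt.Printf("mean=%.4f m2=%.4f m3=%.4f\n", s.mean, s.m2, s.m3)
	fmt.Printf("sample variance=%.4f\n", s.m2/float64(s.n-1))
}
```

Statistics such as the sample variance then fall out of the sums directly, here as m2/(n-1).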

Cov

Cov keeps track of the sample covariance of a stream; it can track either the global covariance, or over a rolling window.
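
A one-pass update for the global sample covariance can be sketched as follows. This is an illustration only, not the library's code: the type cov is hypothetical, it takes the two variables as explicit arguments, and it omits the rolling-window case.

```go
package main

import "fmt"

// cov tracks the running means of x and y and the co-moment
// c = sum((x - meanX) * (y - meanY)).
type cov struct {
	n            int
	meanX, meanY float64
	c            float64
}

// Push adds one (x, y) observation.
func (s *cov) Push(x, y float64) {
	s.n++
	n := float64(s.n)
	dx := x - s.meanX // deviation from the old mean of x
	s.meanX += dx / n
	s.meanY += (y - s.meanY) / n
	s.c += dx * (y - s.meanY) // uses the updated mean of y
}

// Value returns the sample covariance; it needs at least two observations.
func (s *cov) Value() float64 { return s.c / float64(s.n-1) }

func main() {
	var s cov
	xs := []float64{1, 2, 3}
	ys := []float64{2, 4, 7}
	for i := range xs {
		s.Push(xs[i], ys[i])
	}
	fmt.Println(s.Value()) // approximately 2.5
}
```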

EWMCov

EWMCov keeps track of the global exponentially weighted sample covariance of a stream. This uses the exponentially weighted moving average as its center of mass, and uses the same exponential weights for its power terms.

Corr

Corr keeps track of the sample correlation of a stream (in particular, the sample Pearson correlation coefficient); it can track either the global correlation, or over a rolling window.

EWMCorr

EWMCorr keeps track of the global exponentially weighted sample correlation of a stream (in particular, the exponentially weighted sample Pearson correlation coefficient). This uses the exponentially weighted moving average as its center of mass, and uses the same exponential weights for its power terms.

Autocorr

Autocorr keeps track of the sample autocorrelation of a stream for a given lag; it can track either the global autocorrelation, or over a rolling window.

Autocov

Autocov keeps track of the sample autocovariance of a stream for a given lag; it can track either the global autocovariance, or over a rolling window.
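
One way to picture the role of the lag is to pair each value with the value that arrived lag steps earlier and take the ordinary sample covariance of those pairs. The batch sketch below does exactly that. It assumes this pairing definition, which may differ from the estimator the library uses (other definitions center both series on the overall mean), and the functions sampleCov and autocov are hypothetical.

```go
package main

import "fmt"

// sampleCov returns the sample covariance of two equal-length slices
// with at least two elements each.
func sampleCov(xs, ys []float64) float64 {
	n := float64(len(xs))
	var meanX, meanY float64
	for i := range xs {
		meanX += xs[i]
		meanY += ys[i]
	}
	meanX /= n
	meanY /= n

	var c float64
	for i := range xs {
		c += (xs[i] - meanX) * (ys[i] - meanY)
	}
	return c / (n - 1)
}

// autocov pairs vals[t] with vals[t+lag] and returns the sample covariance
// of the pairs. It requires len(vals)-lag >= 2.
func autocov(vals []float64, lag int) float64 {
	return sampleCov(vals[:len(vals)-lag], vals[lag:])
}

func main() {
	vals := []float64{1, 2, 3, 4, 5}
	fmt.Println(autocov(vals, 0)) // lag 0 is the sample variance: 2.5
	fmt.Println(autocov(vals, 1)) // pairs (1,2),(2,3),(3,4),(4,5)
}
```

Dividing by the product of the two paired series' standard deviations would turn this into a lagged correlation under the same pairing assumption.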

Core (Multivariate)

Core is the struct powering all of the statistics in the stream/joint subpackage. It keeps track of a pre-configured set of joint centralized power sums of a stream in an efficient, numerically stable way, and can track either the global sums or the sums over a rolling window.

To configure which sums to track, you'll need to instantiate a CoreConfig struct and provide it to NewCore:

config := &joint.CoreConfig{
    Sums: joint.SumsConfig{
        {1, 1}, // tracks the joint sum of differences
        {2, 0}, // tracks the sum of squared differences of variable 1
    },
    Vars: stream.IntPtr(2),      // declares that there are 2 variables to track (optional if Sums is set)
    Window: stream.IntPtr(0),    // tracks global sums
    Decay: stream.FloatPtr(0.3), // tracks exponentially weighted sums with a decay factor of 0.3
}
core, err := joint.NewCore(config)
// handle err

See the godoc entry for more details on Core's methods.

SimpleAggregateMetric

SimpleAggregateMetric is a convenience wrapper that stores multiple univariate metrics and will push a value to all metrics simultaneously; instead of returning a single scalar, it returns a map of metrics to their corresponding values.
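
The fan-out pattern described above can be sketched as follows. The Metric interface here is inferred from the Push, Value and String calls in the example usage earlier in this README, and everything else (the aggregate, maxMetric and countMetric types, and keying the result map by each metric's String()) is a hypothetical stand-in rather than the library's API.

```go
package main

import (
	"errors"
	"fmt"
)

// Metric is an assumed minimal interface for a univariate metric.
type Metric interface {
	Push(x float64) error
	Value() (float64, error)
	String() string
}

// maxMetric tracks the global maximum.
type maxMetric struct {
	max  float64
	seen bool
}

func (m *maxMetric) Push(x float64) error {
	if !m.seen || x > m.max {
		m.max, m.seen = x, true
	}
	return nil
}

func (m *maxMetric) Value() (float64, error) {
	if !m.seen {
		return 0, errors.New("no values seen yet")
	}
	return m.max, nil
}

func (m *maxMetric) String() string { return "max" }

// countMetric counts how many values have been pushed.
type countMetric struct{ n int }

func (c *countMetric) Push(float64) error      { c.n++; return nil }
func (c *countMetric) Value() (float64, error) { return float64(c.n), nil }
func (c *countMetric) String() string          { return "count" }

// aggregate pushes every value to all of its metrics.
type aggregate struct{ metrics []Metric }

func (a *aggregate) Push(x float64) error {
	for _, m := range a.metrics {
		if err := m.Push(x); err != nil {
			return fmt.Errorf("pushing to %s: %w", m.String(), err)
		}
	}
	return nil
}

// Values returns one value per metric, keyed by the metric's String().
func (a *aggregate) Values() (map[string]float64, error) {
	out := make(map[string]float64, len(a.metrics))
	for _, m := range a.metrics {
		v, err := m.Value()
		if err != nil {
			return nil, fmt.Errorf("reading %s: %w", m.String(), err)
		}
		out[m.String()] = v
	}
	return out, nil
}

func main() {
	agg := &aggregate{metrics: []Metric{&maxMetric{}, &countMetric{}}}
	for _, x := range []float64{3, 1, 2} {
		if err := agg.Push(x); err != nil {
			panic(err)
		}
	}
	vals, err := agg.Values()
	if err != nil {
		panic(err)
	}
	fmt.Println(vals["max"], vals["count"]) // 3 3
}
```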

SimpleJointAggregateMetric

SimpleJointAggregateMetric is a convenience wrapper that stores multiple multivariate metrics and will push a value to all metrics simultaneously; instead of returning a single scalar, it returns a map of metrics to their corresponding values.