MeanThreshold, MedianThreshold, and other threshold support in GenericUnivariateSelect #21699

@charlesbmi

Describe the workflow you want to enable

I would like to select features by thresholding their mean value (i.e., the mean across samples), similar to how `VarianceThreshold` selects features by thresholding their variance across samples.
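To make the request concrete, here is a rough sketch of what such a selector might look like. `MeanThreshold` does not exist in scikit-learn; the class name, the `threshold` parameter, and the `means_` / `support_` attributes are assumptions for illustration, loosely modelled on the `VarianceThreshold` interface. It keeps features whose mean is strictly greater than the threshold.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanThreshold(TransformerMixin, BaseEstimator):
    """Hypothetical selector: keep features whose mean across samples
    is strictly greater than `threshold`. Not part of scikit-learn."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Per-feature mean, computed across samples (rows).
        self.means_ = X.mean(axis=0)
        # Boolean mask of the features to keep.
        self.support_ = self.means_ > self.threshold
        return self

    def get_support(self):
        return self.support_

    def transform(self, X):
        # Drop the columns that did not pass the threshold.
        return np.asarray(X)[:, self.support_]


# Toy data: 3 samples, 3 features with means 1/3, 3 and 9.
X = np.array([[0, 2, 9],
              [0, 4, 9],
              [1, 3, 9]])

selector = MeanThreshold(threshold=1.0)
X_reduced = selector.fit_transform(X)  # the first feature is dropped
```

A real implementation would presumably build on the same selector base class as `VarianceThreshold` and add input validation and sparse-matrix support; this sketch skips all of that.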

Describe your proposed solution

Two possible options:

Describe alternatives you've considered, if relevant

Another alternative, although this seems counter to how these functions are designed

Additional context

Setting a MeanThreshold would be useful when working with non-negative features, such as pixel intensity in images. For example, we might want to exclude pixels that are regularly saturated in our dataset, as they may be less informative.
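For the saturated-pixel case the comparison runs the other way (drop features whose mean is too high), which can be sketched with plain NumPy. The data and the `saturation_cutoff` value below are made up for illustration.

```python
import numpy as np

# Hypothetical 8-bit pixel data: rows are images, columns are pixels.
images = np.array([
    [255, 10, 200],
    [255, 30, 180],
    [250, 20, 220],
])

saturation_cutoff = 240.0  # assumed cutoff, chosen only for this example

# Mean intensity of each pixel across all images.
pixel_means = images.mean(axis=0)

# Keep pixels whose mean intensity stays below the cutoff.
keep = pixel_means < saturation_cutoff
filtered = images[:, keep]  # the first, regularly saturated pixel is dropped
```

This suggests that a general threshold selector might want to support both a lower and an upper bound.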

Specifically, in my research field of neuroscience (single-neuron recordings), our "features" are the (non-negative) action-potential counts for each neuron. We often exclude neurons with very low firing rates to minimize discretization error. Here are a few examples of neuroscience papers that set a MeanThreshold per neuron (i.e., feature):
