Our implementation allows users to explore the relationship between the proportion of selected variables ($\frac{q_\lambda}{p}$) and the selection probability of each variable:
- Approximate limits for the $\lambda$ grid are set based on a preliminary run over the whole path.
- For each subsampling draw, a separate regularization path is fitted.
- At each $\lambda$ value, we iterate through the selection states of all paths. During each iteration:
  - The average selection probability of each variable is computed across all models.
  - To keep each variable's selection probability monotonically decreasing as we progress through the $\lambda$ values, we take the element-wise minimum between the current selection probabilities and the new averages.
- As a result, the selection probability curve of each variable is monotonically decreasing across the $\lambda$ values.
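The iteration above can be sketched as follows (array names and shapes are assumptions for illustration, not LassoNet's internal API):

```python
import numpy as np

def monotone_selection_probabilities(paths: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the averaging step.

    `paths` has shape (n_draws, n_lambdas, p): paths[b, j, k] is 1 if variable
    k is selected by subsampling draw b at the j-th lambda value, else 0.
    Returns an (n_lambdas, p) array of monotone selection probabilities.
    """
    n_draws, n_lambdas, p = paths.shape
    pi = np.empty((n_lambdas, p))
    current = np.ones(p)  # running element-wise minimum
    for j in range(n_lambdas):
        avg = paths[:, j, :].mean(axis=0)   # average over subsampling draws
        current = np.minimum(current, avg)  # enforce monotonicity
        pi[j] = current
    return pi
```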
- Subsequently, we plot $\pi^\lambda_{1}, \dots, \pi^\lambda_{p}$ against $\frac{q_\lambda}{p}$.
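A minimal plotting sketch; the function name, and the use of the row sums of $\pi$ as an estimate of $q_\lambda$, are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_stability_paths(pi: np.ndarray) -> np.ndarray:
    """Plot each variable's selection probability against q_lambda / p.

    `pi` has shape (n_lambdas, p); q_lambda is approximated here by the
    expected number of selected variables, i.e. the row sums of `pi`.
    """
    n_lambdas, p = pi.shape
    q_over_p = pi.sum(axis=1) / p  # one x value per lambda on the grid
    for k in range(p):
        plt.plot(q_over_p, pi[:, k])
    plt.xlabel(r"$q_\lambda / p$")
    plt.ylabel(r"$\pi^\lambda$")
    return q_over_p
```

Calling `plt.show()` (or `plt.savefig(...)`) afterwards renders the figure.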
This approach follows the guidance provided in the original stability selection paper (Meinshausen and Bühlmann, 2010), which states:
> To do this, we need knowledge about $q_{\Lambda}$. This can be easily achieved by regularization of the selection procedure $\hat{S}=\hat{S}^q$ in terms of the number of selected variables $q$, i.e., the domain $\Lambda$ for the regularization parameter $\lambda$ determines the number $q$ of selected variables, i.e., $q=q(\Lambda)$. For example, with $\ell_1$-norm penalization, the number $q$ is given by the variables which enter first in the regularization path when varying from a maximal value $\lambda_{\max}$ to some minimal value $\lambda_{\min}$. Mathematically, $\lambda_{\min}$ is such that $\left|\bigcup_{\lambda_{\max} \geq \lambda \geq \lambda_{\min}} \hat{S}^\lambda\right| \leq q$.
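The construction in the quote can be sketched as follows (function and argument names are assumptions for illustration): walk the path from $\lambda_{\max}$ towards smaller values, accumulate the union of selected sets, and stop before the union exceeds $q$ variables.

```python
import numpy as np

def lambda_min_for_q(selected: np.ndarray, q: int, lambdas: np.ndarray) -> float:
    """Return the smallest lambda such that the union of selected sets
    from lambda_max down to it contains at most q variables.

    `selected` is a boolean (n_lambdas, p) array of selection sets, rows
    ordered from lambda_max down; `lambdas` is the matching grid.
    """
    union = np.zeros(selected.shape[1], dtype=bool)
    lam_min = lambdas[0]
    for lam, row in zip(lambdas, selected):
        union |= row              # grow the union of selected sets
        if union.sum() > q:       # adding this lambda would exceed q
            break
        lam_min = lam
    return lam_min
```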
Note: recall the bound on the expected number $V$ of falsely selected variables under the assumptions of Theorem 1 in Meinshausen and Bühlmann (2010):

$$\mathbb{E}(V) \leq \frac{1}{2\pi_{\mathrm{thr}} - 1} \cdot \frac{q_\Lambda^2}{p}.$$
Dividing both sides of this bound by $p$ gives $\frac{\mathbb{E}(V)}{p} \leq \frac{1}{2\pi_{\mathrm{thr}} - 1}\left(\frac{q_\Lambda}{p}\right)^2$.

Let us define, by solving this bound at equality for a fixed target error level $e = \frac{\mathbb{E}(V)}{p}$, the threshold curves $\pi_e(x) = \frac{1}{2}\left(1 + \frac{x^2}{e}\right)$, where $x = \frac{q_\Lambda}{p}$.

Finally, we can add these curves to the plot of the selection probabilities against $\frac{q_\lambda}{p}$.
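As a hedged illustration (the rearrangement is ours, not a formula taken verbatim from the paper), solving the Theorem 1 bound at equality after dividing by $p$ gives one threshold curve over $x = q_\Lambda / p$ per target error level $e = \mathbb{E}(V)/p$:

```python
import numpy as np

def threshold_curve(x: np.ndarray, e: float) -> np.ndarray:
    """Pi threshold as a function of x = q_Lambda / p, from solving
    E[V]/p <= (1 / (2*pi_thr - 1)) * x**2 at equality for pi_thr,
    with e = E[V]/p the target expected false-positive proportion."""
    return 0.5 * (1.0 + x**2 / e)
```

Variables whose curves stay above $\pi_e$ at a given $x$ satisfy the corresponding error guarantee.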
The code is integrated in LassoNet.
```bash
pip install lassonet
```
See `main.py` for an example of how to run the stability selection algorithm.
- `generate.py`: generates the data.
- `main.py`: runs the stability selection algorithm.
Meixide, C. G., Matabuena, M., Abraham, L., & Kosorok, M. R. (2024). Neural interval‐censored survival regression with feature selection. Statistical Analysis and Data Mining: The ASA Data Science Journal, 17(4), e11704. https://doi.org/10.1002/sam.11704