SLEP022: Fixing randomness ambiguities #88
base: main
Conversation
I'd like to take on this SLEP if it is OK with you.

Sure, happy for you to take over @betatim, thanks!
Abstract
========
Mention that we make these changes in behaviour together with the switch to the new NumPy random infrastructure, combined with the plan to transition the argument name from `random_state=` to `rng=`.
if 'random_state' not in est.get_params():
    raise ValueError("This estimator isn't random and can only have exact clones")

def cross_val_score(est, X, y, cv, use_exact_clones=True):
Rename `use_exact_clones` so that statistical clones are allowed but it is fine to have deterministic estimators as well. Maybe `allow_statistical_clone=`?
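A hedged sketch of how the renamed parameter could look; `allow_statistical_clone` and `statistical_clone` are hypothetical names following the suggestion above, not part of the scikit-learn API:

from sklearn.base import clone

def statistical_clone(est):
    # Copy the estimator but give the copy fresh randomness.
    if 'random_state' not in est.get_params():
        raise ValueError("This estimator isn't random and can only have exact clones")
    return clone(est).set_params(random_state=None)

def cross_val_score(est, X, y, cv, allow_statistical_clone=False):
    scores = []
    for train, test in cv.split(X, y):
        if allow_statistical_clone and 'random_state' in est.get_params():
            fold_est = statistical_clone(est)  # each fold draws its own randomness
        else:
            fold_est = clone(est)  # exact clone; also fine for deterministic estimators
        scores.append(fold_est.fit(X[train], y[train]).score(X[test], y[test]))
    return scores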
Quick notes based on our drafting meeting discussion:
This means that ideally, calling `est.fit(X, y)` should yield the same model
twice. We have a check for that in the `check_estimator()` suite:
`check_fit_idempotent()`. Clearly, this fit-idempotency property is violated
when None is used.
Suggested change: replace "when None is used." with "when None or a `np.random.RandomState` instance is used."
Note: we could improve our docs and state that scikit-learn compatible estimators' fit results should be at least "statistically idempotent" (or "idempotent in expectation") unless `random_state` is passed as an integer seed, in which case they should be strictly idempotent.
I think this is a sufficient requirement.
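A minimal sketch (assuming NumPy and scikit-learn are installed) of the distinction drawn here: an integer seed gives strict fit-idempotency, while a shared `np.random.RandomState` instance is consumed by `fit` and is at best statistically idempotent:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.RandomState(0).rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

est = RandomForestClassifier(n_estimators=10, random_state=42)
a = est.fit(X, y).feature_importances_
b = est.fit(X, y).feature_importances_
print(np.allclose(a, b))   # True: integer seed, strictly idempotent

rs = np.random.RandomState(42)
est = RandomForestClassifier(n_estimators=10, random_state=rs)
a = est.fit(X, y).feature_importances_
b = est.fit(X, y).feature_importances_
print(np.allclose(a, b))   # typically False: the instance is consumed across fits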
are using.

This also introduces a private attribute, so we would need more intrusive
changes to `set_params`, `get_params`, and `clone`.
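For illustration, a hedged sketch of the "seed in `__init__`, store a private attribute" alternative this passage refers to (the attribute name `_seed` is made up here); it is this extra attribute that would force `set_params`, `get_params`, and `clone` to become aware of it:

import numpy as np
from sklearn.utils import check_random_state

class SeedInInitEstimator:
    def __init__(self, random_state=None):
        self.random_state = random_state
        # Randomness is resolved once, at construction time, and stashed privately.
        self._seed = check_random_state(random_state).randint(np.iinfo(np.int32).max)

    def fit(self, X, y=None):
        rng = np.random.RandomState(self._seed)  # every fit replays the same seed
        self.offset_ = rng.uniform()
        return self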
I think we could alternatively explore the possibility to do stateless consumption in `fit` instead of seeding in `__init__`:

def fit(self):  # or split()
    rng = check_random_state(self.random_state, copy=True)
    ...

This way we would still enforce the stateless fit semantics for the `est.random_state` attribute.
I will try to find the time to explore this option in an alternative notebook in a gist.
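A hedged sketch of that idea; note that `copy=True` is not an existing `check_random_state` parameter, so a hypothetical helper is spelled out here instead:

import copy
import numpy as np
from sklearn.utils import check_random_state

def check_random_state_copy(random_state):
    # Hypothetical helper: like check_random_state, but never lets fit() mutate
    # a RandomState instance that the user passed in.
    if isinstance(random_state, np.random.RandomState):
        return copy.deepcopy(random_state)
    return check_random_state(random_state)

class StatelessFitEstimator:
    def __init__(self, random_state=None):
        self.random_state = random_state  # stored verbatim, never consumed directly

    def fit(self, X, y=None):
        rng = check_random_state_copy(self.random_state)
        self.offset_ = rng.uniform()  # all randomness comes from the local copy
        return self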
Thanks for working on this! I think these clarifications in the docs and the updated behavior would be really helpful to reduce ambiguity. I noticed that in scikit-learn/scikit-learn#26148 (comment) point 3 there was a mention that the new NumPy Generators would support a different behavior than the old RandomState objects (if I understood correctly). I didn't see a mention of that distinction in this proposal; is this difference still planned to be implemented?
This SLEP is NOT the same as its predecessor #24.
Disclaimer: I won't be able to champion this SLEP.
I'm opening it here now because I hope it can help better frame the discussions happening in scikit-learn/scikit-learn#26148.