DOC readability and clarity on permutation_test_score in userguide and example #30351

Merged

merged 13 commits into scikit-learn:main from StefanieSenger:permutation_test_scores on Jan 13, 2025

Conversation

Contributor

@StefanieSenger StefanieSenger commented Nov 26, 2024

Here, I suggest some clarity improvements in the documentation for permutation_test_score.

Changes include:

  • clarify in the User Guide that permutation_test_score can be used on more than just classifiers
  • mention n_jobs after the sentence on the brute-force procedure
  • improve readability of some difficult-to-understand sentences
  • in the example, clearly distinguish between the original data and the data without permutations (the term "original data" was used in two different contexts for different things)
  • add conclusions on the results from permutation_test_score in the example: what does the result imply for the null hypothesis?
  • call the p_value a proportion instead of a percentage
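
For context, here is a minimal sketch of the function this PR documents (the dataset and classifier choices are illustrative only, not taken from the docs):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", random_state=0)

# Returns the score on the unpermuted labels, the scores on permuted labels,
# and the empirical p-value for the null hypothesis that features and labels
# are independent. n_jobs parallelizes the (brute-force) permutation fits.
score, permutation_scores, pvalue = permutation_test_score(
    clf, X, y, n_permutations=100, n_jobs=-1, random_state=0
)
print(f"score: {score:.3f}, p-value: {pvalue:.3f}")
```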

@lucyleeow, since you have worked on these docs before, would you like to have a look?


github-actions bot commented Nov 26, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b9f9ba6.

@StefanieSenger
Contributor Author

This doesn't need a changelog entry.

Comment on lines 105 to 108
# because the permutation always destroys any feature-label dependency present.
# The score obtained on the randomized data without label permutation in this case
# though, is very poor. This results in a large p-value, confirming that there was no
# feature-label dependency in the randomized data before labels were permuted.
Contributor Author


Here I'm trying not to use the word "original", because it can be confused with the iris data that is called "original data" in other parts of the example.

Member


Yeah, good point. Maybe we could also amend the legends in the plots, which both say 'original'; we could change them to 'original iris data' and 'original random data'?
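
Something like this is what I mean for the first plot (a rough sketch with stand-in values, since I don't have the example's variables at hand):

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for the example's outputs of permutation_test_score:
rng = np.random.default_rng(0)
permutation_scores = rng.normal(loc=0.33, scale=0.05, size=100)
score = 0.97

fig, ax = plt.subplots()
ax.hist(permutation_scores, bins=20, density=True)
# Legend label amended to name the dataset instead of just saying 'original':
ax.axvline(score, linestyle="--", color="r", label="Score on original iris data")
ax.legend()
ax.set_xlabel("Accuracy score")
plt.show()
```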

# distribution for the null hypothesis which states there is no dependency
# between the features and labels. An empirical p-value is then calculated as
# the percentage of permutations for which the score obtained is greater
# than the score obtained using the original data.
# the proportion of permutations for which the score obtained by the model trained on
Contributor Author

@StefanieSenger StefanieSenger Nov 26, 2024


Substituting "percentage" with "proportion", because the range of possible values is from 1/(n_permutations + 1) to 1.0, not from 0.0 to 1.0.
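
To spell that out: the empirical p-value counts the unpermuted score itself in both numerator and denominator. A sketch with made-up scores, following the formula used by permutation_test_score (Ojala & Garriga, 2010):

```python
import numpy as np

def empirical_pvalue(score, permutation_scores):
    # The +1 terms count the unpermuted score itself, so even if no
    # permutation reaches `score`, the result is 1 / (n_permutations + 1),
    # never 0.0.
    permutation_scores = np.asarray(permutation_scores)
    return (np.sum(permutation_scores >= score) + 1) / (len(permutation_scores) + 1)

print(empirical_pvalue(0.97, [0.30, 0.35, 0.33, 0.31]))  # 1/5 = 0.2, the minimum here
```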

@lucyleeow
Member

This is on my to-do, I'll try and get to it soon!

Member

@lucyleeow lucyleeow left a comment


Sorry this took me so long.

Good pick up on the user guide's use of 'classifier' vs 'estimator' (it does sound like permutation_test_score can be used with any estimator that you can score on, including regressors and clusterers).

Some nitpicks:

Comment on lines 967 to 968
between features and targets (there is no systematic relationship between these two and
any observed patterns are likely due to random chance) or because the estimator was not
Member


Suggested change
between features and targets (there is no systematic relationship between these two and
any observed patterns are likely due to random chance) or because the estimator was not
between features and targets (i.e., there is no systematic relationship between these two and
any observed patterns are likely due to random chance) or because the estimator was not

This sentence is really long but I am not sure how to improve it. I love bullet points (because I find walls of text difficult to read) but I'm not sure they would suit here, WDYT?

Contributor Author


Yes, let me do an experiment involving bullet points, and also highlighting the logic with "and both" vs. negation with "either one of".

What do you think?

However, I can't manage to render this without a greyish box. Relying on your expertise here! 🙏

Comment on lines 970 to 971
latter case, using a more appropriate estimator that is able to utilize the structure in
the data, would result in a lower p-value.
Member


I think the key thing users may want to know here is how to tell the difference between 'no structure' and 'the model was not able to use the structure'. I don't think there is a guaranteed way to tell, but it sounds like one way would be to try a more suitable model and see if you get a lower p-value. This is essentially what we say here, but maybe we could make it more explicit?
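
E.g. something like this (a sketch on made-up nonlinear data; the model choices are only illustrative):

```python
import numpy as np
from sklearn.model_selection import permutation_test_score
from sklearn.svm import SVC

# Made-up data whose labels depend nonlinearly on the features, so a linear
# model cannot use the structure while an RBF kernel can:
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

for clf in (SVC(kernel="linear"), SVC(kernel="rbf")):
    _, _, pvalue = permutation_test_score(clf, X, y, n_permutations=100, random_state=0)
    print(clf.kernel, pvalue)
# A much lower p-value for the RBF model would suggest the data does have
# structure that the linear model simply could not exploit.
```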

the data, would result in a lower p-value.

Cross-validation provides information about how well an estimator generalizes
by estimating the range of its expected errors. However, an
Member


Not sure about the terminology: any 'metric' can be used with permutation_test_score, right? Does the term 'error' only cover a subset of all metrics, e.g., ones where lower is better?

Contributor Author


Oh true, cross_validation and also permutation_test_score should only take metrics where "higher is better". I'll change "errors" into "scores" then. I don't think it was previously wrong, though.
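
(For regressors this works through the negated error scorers, which are "higher is better" by convention; a hypothetical sketch:)

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import permutation_test_score

X, y = load_diabetes(return_X_y=True)
# "neg_mean_squared_error" negates the error so that larger values mean
# better fits, which is what permutation_test_score needs to compare scores.
score, _, pvalue = permutation_test_score(
    Ridge(), X, y, scoring="neg_mean_squared_error", n_permutations=100, random_state=0
)
print(score, pvalue)
```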


Contributor Author

@StefanieSenger StefanieSenger left a comment


Thanks for your review, @lucyleeow!

I have addressed your suggestions. Do you want to have another look?



Comment on lines 967 to 970
- a lack of dependency between features and targets (i.e., there is no systematic
relationship and any observed patterns are likely due to random chance)
- **or** because the estimator was not able to use the dependency in the data (for
instance because it underfit).
Member


I think a blank line at the top may fix it? If it's still not working, the bullets may not need to be indented?

Contributor Author


Doing both in combination worked. 🎆

Contributor Author

@StefanieSenger StefanieSenger left a comment


Thank you, @lucyleeow! I have implemented your suggestions.
Do you think this PR is now ready to be merged?


Member

@lucyleeow lucyleeow left a comment


Thank you! This looks great!

@lucyleeow lucyleeow added the Waiting for Second Reviewer ("First reviewer is done, need a second one!") label Dec 18, 2024
Contributor Author

@StefanieSenger StefanieSenger left a comment


Thanks, @adrinjalali!
I have applied your suggestions and answered one with a new idea on how to describe the pvalue param.

@adrinjalali adrinjalali removed the Waiting for Second Reviewer ("First reviewer is done, need a second one!") label Jan 9, 2025
@glemaitre glemaitre self-requested a review January 13, 2025 18:48
Member

@glemaitre glemaitre left a comment


LGTM. Thanks @StefanieSenger, everything looks good.

I just pushed two tiny nitpicks. I'll mark this PR as auto-merge.

@glemaitre glemaitre enabled auto-merge (squash) January 13, 2025 18:59
@StefanieSenger
Contributor Author

StefanieSenger commented Jan 13, 2025

Oh nice, @glemaitre, then let me quickly push another little thing that I had not seen, which Adrin had approved a few days before.

Edit: Done, can you enable auto-merge again?

auto-merge was automatically disabled January 13, 2025 19:26

Head branch was pushed to by a user without write access

@glemaitre glemaitre enabled auto-merge (squash) January 13, 2025 19:47
@glemaitre glemaitre merged commit 36ad7b3 into scikit-learn:main Jan 13, 2025
29 checks passed
@StefanieSenger StefanieSenger deleted the permutation_test_scores branch January 13, 2025 20:20