Unexpected behavior of sklearn.feature_selection.mutual_info_regression if copy=False #28793
Comments
It kind of reminds me of #27307 and #27691. In general, I think there are two complementary ways to improve the situation:
Regarding the first way. In this case, I guess the docstring should qualify that only I am a bit biased here as I want |
It would help a lot if you could explain your use case a bit 🙏. In particular how does it affect you that |
This issue may now go in a slightly different direction :) Let me know if I should separate it into a new one. I am updating a third-party sklearn plug-in for feature selection, where we want to add the possibility of feature selection using conditional mutual information based on the paper. Note that unlike the paper itself, our solution will use a proprietary solver, but the plug-in itself is open-source. Mathematically we will aim to solve where The conditional mutual information algorithm will mostly follow this implementation, which is in turn based on the Essentially, I want to compute the values
The second option would also allow for using the problem formulation described in the mRMR paper. Similar to an existing issue #8889 but using our solver. I was actually considering making the second option (making I am happy to submit a PR on some of these issues. |
Thanks for the details! So my understanding is that you do something like this:
mutual_info_1 = mutual_info_regression(X, y, copy=False)
# make assumptions on how X and y have been modified and call `mutual_info_regression` again
mutual_info_2 = mutual_info_regression(..., copy=False)
How much do you care about using My current understanding that may not be fully accurate: when using Even if Then there is the question on
Thanks for the response! Example code would be something like:
# We rescale X and y in the sklearn method computing I(x_i; y) for all i:
mutual_info = mutual_info_regression(X, y, copy=False)
# We re-use the rescaled X and y in our method computing I(x_i; y | x_j)
# for all i and j, where we do not modify X or y
conditional_mutual_info = conditional_mutual_info(X, y)
Note that the mutual_info methods rescale and add noise to But if I understood you correctly using On another note, I think importing the private
Yes, I asked other maintainers for input and basically there seems to be agreement on this: if you use I am going to label this issue as Documentation because I think the most reasonable thing to do would be to document this in the glossary. I guess adding a link to the glossary entry in all the functions/classes that use
Many thanks! |
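Since the thread concludes that the in-place effects of copy=False should not be relied on, one alternative is to do the rescaling explicitly and pass the result to whatever downstream method needs it. The following is a minimal sketch under stated assumptions: `rescale_with_jitter` is a hypothetical helper, and its noise level is an illustrative choice, not a reproduction of scikit-learn's private preprocessing.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.preprocessing import scale


def rescale_with_jitter(a, rng, noise=1e-10):
    """Hypothetical helper: scale to unit variance (no centering) and add
    tiny noise. Always returns a new array; the input is left untouched."""
    scaled = scale(np.asarray(a, dtype=np.float64), with_mean=False)  # copy=True by default
    magnitude = np.maximum(1.0, np.mean(np.abs(scaled), axis=0))
    return scaled + noise * magnitude * rng.standard_normal(size=scaled.shape)


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(size=200)
X_original = X.copy()

# Explicitly preprocessed copies, usable by any downstream estimator
# (for example a user-written conditional mutual information routine).
X_scaled = rescale_with_jitter(X, rng)
y_scaled = rescale_with_jitter(y, rng)

# Plain call with the default copy=True; no assumptions about side effects.
mutual_info = mutual_info_regression(X, y, random_state=0)
```

With this pattern the rescaled arrays are produced by code the caller controls, so the result does not depend on whether a given scikit-learn function happens to modify its inputs.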
Describe the bug
The parameter copy of the function mutual_info_regression is described as follows:
scikit-learn/sklearn/feature_selection/_mutual_info.py
Lines 381 to 383 in d1d1596
I read it as both X and y should be modified if copy=False and X has continuous features. However, y is always copied. I think the lines
scikit-learn/sklearn/feature_selection/_mutual_info.py
Lines 309 to 310 in d1d1596
Similarly to the treatment of X
scikit-learn/sklearn/feature_selection/_mutual_info.py
Lines 295 to 299 in d1d1596
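The behaviour described above can be checked with a short script. This is a minimal sketch and not the reporter's original reproduction code; the data, seed, and variable names are assumptions. Whether X or y is actually modified may depend on the scikit-learn version and on the dtype and memory layout of the inputs, so the script only reports what it observes.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # float64, all features continuous
y = X[:, 0] + 0.1 * rng.normal(size=200)  # target depends on the first feature

# Keep reference copies so any in-place change can be detected afterwards.
X_before, y_before = X.copy(), y.copy()

mi = mutual_info_regression(X, y, copy=False, random_state=0)

x_changed = not np.array_equal(X, X_before)
y_changed = not np.array_equal(y, y_before)

print("MI estimates:", mi)
print("X modified in place:", x_changed)
print("y modified in place:", y_changed)
```

Comparing against saved copies, rather than assuming a particular outcome, keeps the check valid even if the handling of copy changes between releases.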
Steps/Code to Reproduce
Expected Results
The result should be
since both X and y should be modified in place by the function mutual_info_regression.
Actual Results
Versions