Example of binning of continuous variables for chi2 #30430

Open
bykhov opened this issue Dec 8, 2024 · 2 comments · May be fixed by #30473
Labels
Documentation, Needs Triage

Comments

bykhov commented Dec 8, 2024

Describe the issue linked to the documentation

chi2 doesn't work on continuous variables. This has been discussed many times, e.g. here.

The MATLAB counterpart, fscchi2, solves this issue by automatically binning the data. I believe that an example of chi2 feature selection with pre-binning would be beneficial.
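
To make the idea of testing a binned feature concrete, here is a minimal sketch that bins one continuous feature into equal-width bins and runs a chi-squared test of independence against the class labels with scipy.stats.chi2_contingency. The helper name binned_chi2_pvalue, the choice of 10 equal-width bins, and the synthetic data are assumptions made for illustration; this is not claimed to reproduce how fscchi2 chooses its bins.

```python
import numpy as np
from scipy.stats import chi2_contingency


def binned_chi2_pvalue(x, y, n_bins=10):
    """Chi-squared independence test between one binned feature and class labels.

    Hypothetical helper for illustration only.
    """
    # Equal-width bin edges over the observed range of x (an illustrative choice).
    edges = np.histogram_bin_edges(x, bins=n_bins)
    # Digitize against the interior edges so bin ids run from 0 to n_bins - 1.
    bin_ids = np.digitize(x, edges[1:-1])
    classes, class_ids = np.unique(y, return_inverse=True)
    # Contingency table: rows are bins, columns are classes.
    table = np.zeros((n_bins, classes.size), dtype=int)
    np.add.at(table, (bin_ids, class_ids), 1)
    # Drop empty bins, since zero expected frequencies are not allowed.
    table = table[table.sum(axis=1) > 0]
    stat, p_value, dof, expected = chi2_contingency(table)
    return stat, p_value


# Synthetic data: one feature whose mean depends on the class, one pure-noise feature.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
x_informative = rng.normal(loc=2.0 * y, scale=1.0)
x_noise = rng.normal(size=500)

stat_informative, p_informative = binned_chi2_pvalue(x_informative, y)
stat_noise, p_noise = binned_chi2_pvalue(x_noise, y)
print(f"informative: p = {p_informative:.3g}, noise: p = {p_noise:.3g}")
```

Features could then be ranked by p-value, smallest first. The number and placement of bins affects the result, so any ranking obtained this way should be read with that in mind.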

Suggest a potential alternative/fix

No response

bykhov added the Documentation and Needs Triage labels Dec 8, 2024
@hugoboulenger

Hello!
I would like to work on this issue.

I am wondering whether it would be better to add an "automatic_binning" parameter to the chi2 function or to clarify the documentation by explicitly stating that continuous data must be pre-binned before using chi2.

In the second case, it might also be helpful to include an example demonstrating how to bin continuous data before applying chi2. What do you think would be the preferred approach?
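
For the documentation route, one possible shape for such an example is sketched below, using scikit-learn's KBinsDiscretizer to one-hot encode the bins before calling chi2. The iris dataset, the choice of 5 uniform bins, and the step of summing per-bin scores back to one score per original feature are assumptions of this sketch, not an agreed design.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_iris(return_X_y=True)  # four continuous features

# Turn each continuous feature into indicator (0/1) columns, one per bin.
binner = KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="uniform")
X_binned = binner.fit_transform(X)

# chi2 scores every indicator column separately.
scores, p_values = chi2(X_binned, y)

# Sum the per-bin statistics back to one score per original feature.
# binner.n_bins_ holds the number of bins actually used for each feature.
# nansum guards against empty bins, for which chi2 yields NaN.
splits = np.cumsum(binner.n_bins_)[:-1]
feature_scores = np.array([np.nansum(s) for s in np.split(scores, splits)])

print(feature_scores)
```

For indicator columns, the sum of the per-bin statistics of one feature equals the Pearson chi-squared statistic of that feature's bins-by-classes contingency table, which is why the sum is used as the per-feature score here.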

bykhov commented Dec 10, 2024

It is great to know that this issue is relevant.

As an immediate solution, a basic binning example is recommended.

In the long run, some binning technique should be implemented. It is important to note that multivariate binning is non-trivial. For example, there are different binning strategies (e.g. R binning for Chi2), and infinite values have to be handled.

Personally, I am not proficient enough in statistics to propose and theoretically justify any particular binning strategy and its influence on the feature selection performance.
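
To illustrate how much the binning strategy can matter, the sketch below compares the three strategies offered by scikit-learn's KBinsDiscretizer ("uniform", "quantile", "kmeans") on a skewed feature. Replacing infinite values with the largest or smallest finite value before binning is an assumption of this sketch, one of several possible conventions, and the synthetic data is illustrative only.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(200, 1))  # skewed, strictly positive feature
X[0, 0] = np.inf                  # scikit-learn's input validation rejects infinite values

# Assumption: map +inf / -inf to the largest / smallest finite value.
finite = X[np.isfinite(X)]
X_clean = np.clip(X, finite.min(), finite.max())

counts = {}
for strategy in ("uniform", "quantile", "kmeans"):
    binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
    bin_ids = binner.fit_transform(X_clean).ravel().astype(int)
    # Number of samples that fall into each of the 4 bins.
    counts[strategy] = np.bincount(bin_ids, minlength=4)

for strategy, c in counts.items():
    print(strategy, c)
```

On skewed data, equal-width bins tend to put most samples into the first bin, while quantile bins are roughly balanced; the resulting chi2 scores can differ accordingly.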
