Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN Refactors find binning threshold #18395

Merged
merged 2 commits into from
Oct 14, 2020

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Related to #16909 and #18394

What does this implement/fix? Explain your changes.

This PR is used to make the above PRs easier to review.

CC @NicolasHug

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

@@ -16,76 +16,53 @@
from .common import X_DTYPE, X_BINNED_DTYPE, ALMOST_INF


def _find_binning_thresholds(data, max_bins, subsample, random_state):
def _find_binning_threshold(col_data, max_bins):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still have an s:

Suggested change
def _find_binning_threshold(col_data, max_bins):
def _find_binning_thresholds(col_data, max_bins):

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the following edge case that would need to be handled and tested, LGTM. Because it's unrelated to this PR, let's merge.

missing_mask = np.isnan(col_data)
if missing_mask.any():
col_data = col_data[~missing_mask]
col_data = np.ascontiguousarray(col_data, dtype=X_DTYPE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a problem if a column has 100% missing values. But this problem is already present in master.

@ogrisel ogrisel merged commit cd67347 into scikit-learn:master Oct 14, 2020
amrcode pushed a commit to amrcode/scikit-learn that referenced this pull request Oct 19, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants