Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT Applies black formatting to most of the code base #18948

Merged
merged 14 commits into from
Jun 17, 2021

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Dec 1, 2020

Reference Issues/PRs

Related to #11336

What does this implement/fix? Explain your changes.

  1. This runs black on most of the codebase and fixes the flake8 issues as well.
  2. The configuration for flake8 is in setup.cfg.
  3. The configuration for black is in pyproject.toml.
  4. Here is a reference on how we would setup git blame with black. It requires merging this PR and then having another PR that adds a .git-blame-ignore-revs. I would say that git blame is an advanced feature of git, which is often used to track down the PR that created the diff.
  5. Makes the max-line-length 88, which is the default of black.

Any other comments?

For reference, django created a enhancement proposal for why they introduced using black for formatting.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thomasjpfan !

Here is a reference on how we would setup git blame with black.

Aww that's interesting, it's unfortunate that Github still doesn't support that.

A few comments below. Could you please check that we are running black on the externals dir?
Also please add it to the .pre-commit-config.yaml

-   repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
    -   id: black
        types: [file, python]

That's the recommended way of not forgetting to apply black on each commit..

pyproject.toml Outdated
\.eggs # exclude a few common directories in the
| \.git # root of the project
| \.mypy_cache
| examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use black for code, it might make sense to use it for examples as well maybe? Because if we don't use consistent formatting for all python it can be confusing for contributors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In yesterday's meeting we discussed that applying black to examples could potentially lead to big /scary code sections. No strong opinion on my side, although it's true that we've been following slightly different conventions for examples. Typically in the notebook-style ones, we write the import at the beginning of each cell instead of at the top of the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it comes down to a UX thing. When reading an example and the code block uses up a significant portion of the vertical space, does this make the example harder to read?

pyproject.toml Outdated
| build
| dist
| doc
| build_tools
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there python files in there, that we should avoid to format?

pyproject.toml Outdated
| examples
| build
| dist
| doc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config.py can be formatted at least, I think.

setup.cfg Outdated
dist,
examples,
sklearn/externals,
doc,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs and examples should still be linted by flake8 I think.

@rth
Copy link
Member

rth commented Dec 1, 2020

Do we plan to then apply black automatically on PRs or do we ask contributors to do that manually?

@NicolasHug
Copy link
Member

NicolasHug commented Dec 1, 2020

Thanks @thomasjpfan

Other stuff we'll need to do reasonably fast after this is merged (EDIT: I think we should actually do them before):

  • have a black CI check
  • update our contributing guidelines to indicate to run black. Also mention the pre-commit hook as a possibility (but not a requirement)
  • make a pinned issue (sort of like Lots of new untracked files after merging with master? #17341) with instructions for PR authors on how to apply black and then solve potential merge conflicts.

Should we be doing all this (including this PR) in another branch, and merge into master in one shot?

Do we plan to then apply black automatically on PRs or do we ask contributors to do that manually?

Thing is, applying black automatically to PRs will likely force contributors to git pull and git merge each time. If they're not tech savy enough to read our docs and apply black themselves, they'll likely be confused by that process as well.

Also @thomasjpfan do you know if we can use black on Cython files as well?

@thomasjpfan
Copy link
Member Author

Should we be doing all this (including this PR) in another branch, and merge into master in one shot?

Given the size of this PR, I wouldn't want to do anymore than apply black in this PR. I can get those PRs which can quickly be merged after this one.

Thing is, applying black automatically to PRs will likely force contributors to git pull and git merge each time. If they're not tech savy enough to read our docs and apply black themselves, they'll likely be confused by that process as well.

I think the best way to do this is with a script they can use locally. This script would:

  1. Make sure the local repo is clean.
  2. Fetch upstream/master
  3. Cherry pick the pyproject.toml and setup.cfg from upstream/master
  4. Run black and commit.
  5. Try to merge with upstream/master.
  6. Ask for help if there is an issue with the merging. I think 90% of the time the merge will be not too bad.

Also @thomasjpfan do you know if we can use black on Cython files as well?

Not supported yet: psf/black#359

@NicolasHug
Copy link
Member

Given the size of this PR, I wouldn't want to do anymore than apply black in this PR.

I agree. I'm suggesting to merge this PR and the other black-related ones into a new branch rather than on master. This way, we can then merge all black-related changes in one shot into master, and avoid any asynchronicity.

I think the best way to do this is with a script they can use locally. This script would...

This is quite complex and we'd need to support 2 of these (unix and windows). It's probably simpler for now to have a black CI and to document how to apply black locally, mentioning commit hooks as an option.

@thomasjpfan thomasjpfan force-pushed the black_formating branch 2 times, most recently from 66be950 to 5ea7d64 Compare December 13, 2020 01:38
@glemaitre
Copy link
Member

Regarding the timing, I think that just after the release of 0.24 looks like a good idea.

Regarding the pre-commit, we could use a documentation likewise pandas: https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#python-pep8-black

It shows how to do it manually and refers to pre-commit if people want to. Since we have a section pre-commit, we can add additional stuff there.
Pandas as a CI for the style as well.

@NicolasHug
Copy link
Member

No opinion from my side but if we do that after 0.24, we'll need to make sure the stable docs contain the black-related info, which probably means cherry-picking the relevant PRs and pushing a new doc release.

Regardless, I still think we should do all this in a separate branch. And this PR should actually be the last one to be merged: it will introduce tons of merge conflicts and we need to point contributors to docs indicating how to apply black. So we should focus on the docs PRs first.

@glemaitre
Copy link
Member

No opinion from my side but if we do that after 0.24, we'll need to make sure the stable docs contain the black-related info, which probably means cherry-picking the relevant PRs and pushing a new doc release.

Good point. It could avoid updating the documentation for that matter.

Regardless, I still think we should do all this in a separate branch. And this PR should actually be the last one to be merged: it will introduce tons of merge conflicts and we need to point contributors to docs indicating how to apply black. So we should focus on the docs PRs first.

Looks like a good plan. I can create a PR and we can always change the targeted branch at the last minute.

@thomasjpfan
Copy link
Member Author

I opened #19031 to apply style changes that would make it so this PR can just be "apply black". I think this would simplify things because this PR would just be for applying black.

Regardless, I still think we should do all this in a separate branch. And this PR should actually be the last one to be merged: it will introduce tons of merge conflicts and we need to point contributors to docs indicating how to apply black. So we should focus on the docs PRs first.

I think I am missing something. What kind of work do you see us doing on this separate branch?

@NicolasHug
Copy link
Member

I think I am missing something. What kind of work do you see us doing on this separate branch?

Basically what I listed in #18948 (comment). I find it safer and simpler to have everything ready in one branch and just merge the whole branch in one shot than do incremental changes. When this PR is merged, we need to have the docs already, the CI check, and ideally a sticky issue pointing to conflict resolution instructions. Otherwise, current PR authors will be lost in what to do.

Base automatically changed from master to main January 22, 2021 10:53
@lorentzenchr lorentzenchr added this to the 1.0 milestone Mar 19, 2021
@rth
Copy link
Member

rth commented Jun 12, 2021

Any way we could move forward on this PR? #19031 is waiting for a second review.

It would be good to have documentation on how to deal with existing PRs, but just a pinned issue could do too. If contributors encounter difficulties in the PRs they will say so/open issues and we can move forward from there. Which is still better than being stuck here discussing how many PRs we need for this :)

@NicolasHug
Copy link
Member

A pinned issue could do without the docs, as long as this situation doesn't last too long. That is, we should release soon, with up-to-date docs. We will need to have these docs at some point anyway, so I don't see why we can't write them now.

At the very very least we'll need a CI check, otherwise we'll have to submit re-formatting PRs every now and then and this will keep creating more conflicts.

We can do all that here, or in a separate branch as suggested above, which would be IMHO much smoother

@thomasjpfan
Copy link
Member Author

Was there a discussion on moving to the 88 line length limit somewhere? (it is black's default)

I'll begin work on the separate branch this weekend (now 😅).

@NicolasHug
Copy link
Member

Was there a discussion on moving to the 88 line length limit somewhere? (it is black's default)

Yes, but I think people ended up being fine with it. IMHO using black with a smaller line length would cause even more vertical space being used, so I would stick to the default

@thomasjpfan thomasjpfan force-pushed the black_formating branch 2 times, most recently from 160f7c8 to 1db386c Compare June 14, 2021 14:00
@thomasjpfan
Copy link
Member Author

I updated this PR to build on top of #20260

This PR does two things:

  1. Enables black check in CI.
  2. Runs black .

@thomasjpfan
Copy link
Member Author

BTW @thomasjpfan maybe we should have used extend-ignore in flake8 config instead of re-listing the default exceptions?

I prefer the way we have it now because it allows us to write the human readable for each ignored violation. Secondly, the flake8 default may change with different versions of flake8. By explicitly setting them, we have more control over ignored violations.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the only commit here is applying black + enabling CI once #20260 is merged LTGM.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as @rth. @thomasjpfan feel free to update the PR and merge them in the right order.

@thomasjpfan thomasjpfan merged commit 82df489 into scikit-learn:main Jun 17, 2021
@ogrisel
Copy link
Member

ogrisel commented Jun 17, 2021

Thanks @thomasjpfan!

@rth
Copy link
Member

rth commented Jun 17, 2021

Glad that this happened!

@rth rth mentioned this pull request Jun 17, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jul 23, 2021
ensure SVDD passes numpydoc validation (scikit-learn#20463)
check for svdd in `test_sparse.py:check_svm_model_equal` to avoid calling `.predict_proba`
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 10, 2021
ensure SVDD passes numpydoc validation (scikit-learn#20463)
check for svdd in `test_sparse.py:check_svm_model_equal` to avoid calling `.predict_proba`
ivannz added a commit to ivannz/scikit-learn that referenced this pull request May 15, 2022
ensure SVDD passes numpydoc validation (scikit-learn#20463)
check for svdd in `test_sparse.py:check_svm_model_equal` to avoid calling `.predict_proba`
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 14, 2022
ensure SVDD passes numpydoc validation (scikit-learn#20463)
check for svdd in `test_sparse.py:check_svm_model_equal` to avoid calling `.predict_proba`
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Aug 29, 2022
ensure SVDD passes numpydoc validation (scikit-learn#20463)
check for svdd in `test_sparse.py:check_svm_model_equal` to avoid calling `.predict_proba`
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Sep 5, 2022
ensure SVDD passes numpydoc validation (scikit-learn#20463)
check for svdd in `test_sparse.py:check_svm_model_equal` to avoid calling `.predict_proba`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants