
Conversation

@glemaitre
Member

partially addresses #27662

Avoid using np.asarray, which creates a reference to the memoryview and does not seem to be garbage collected in PyPy.
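For context, a minimal sketch (not the scikit-learn code itself) of why wrapping a buffer with np.asarray keeps the memory alive: the resulting array shares the underlying memory, and PyPy does not release it as promptly as CPython's reference counting would.

```python
import numpy as np

# Illustrative only: np.asarray on a memoryview creates a NumPy array that
# shares (and therefore references) the underlying buffer.
buf = bytearray(8 * 4)                 # room for 4 float64 values
view = memoryview(buf).cast("d")       # typed memoryview over the buffer
arr = np.asarray(view)                 # no copy; `arr` keeps `buf` alive
arr[:] = 1.0                           # writes go straight through to `buf`

# On CPython, `arr` is freed promptly by reference counting once it goes out
# of scope; on PyPy, collection is deferred to the GC, so buffers wrapped this
# way can stay alive much longer than expected.
```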

@github-actions

github-actions bot commented Oct 26, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 32e9f17. Link to the linter CI: here

@lesteve
Member

lesteve commented Oct 26, 2023

The PyPy tests now get to 98%, almost there!

@betatim
Member

betatim commented Oct 27, 2023

Is the plan to merge this even before we get to 100%? From a quick look at the diff it looks uncontroversial. Technically it changes the API of the loss functions (they now return None), but they are internal, so I don't think it is something we need to worry about.

@lorentzenchr do you want to have a look and think about this (I think you wrote a lot of this code?)?

@lorentzenchr
Member

We can remove those return statements; I guess the only uses are testing and the Python loss.py submodule.
I had them there because I don't like Fortran-style coding, and they made using those functions more Pythonic.

The docstrings need to be modified, too.
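For illustration, a hypothetical before/after of the return-statement change being discussed. The function name, signature, and loss here are made up; the real code is Cython in sklearn/_loss.

```python
import numpy as np

def gradient_before(y_true, raw_prediction, gradient_out):
    # Fills the caller-provided buffer, then also returns a NumPy wrapper
    # around it; that wrapper is what holds on to the memory under PyPy.
    np.subtract(raw_prediction, y_true, out=gradient_out)  # toy (half) squared-error gradient
    return np.asarray(gradient_out)

def gradient_after(y_true, raw_prediction, gradient_out):
    # Fills the caller-provided buffer in place and returns None.
    np.subtract(raw_prediction, y_true, out=gradient_out)

y = np.zeros(3)
pred = np.arange(3.0)
grad = np.empty(3)
gradient_after(y, pred, grad)   # `grad` now holds the result; nothing is returned
```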

@betatim
Member

betatim commented Oct 30, 2023

I assume the Fortran style is there to make sure we reuse the same array/memory each time, which results in a significant performance improvement. Is that the right assumption? I don't know enough about the history of this code and the discussions around it, so if this has been discussed before I don't want to restart an old discussion, just check that someone has thought about this topic, because I agree that a more Pythonic API would be nicer.

@lorentzenchr
Member

I assume the Fortran style is there to make sure we reuse the same array/memory each time, which results in a significant performance improvement.

We want to achieve ufunc-like behaviour, e.g. np.add(a, b, out=c): the caller provides the array c instead of it being created on each call (c can be the element-wise loss or a gradient). This saves memory and the time to allocate that memory each time. It also fits perfectly with our fitting algorithms, where we usually only need the current gradient (and sometimes the previous one, but not more).
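A minimal NumPy-only sketch of that out= pattern (the names and the loop are just for illustration, not scikit-learn code):

```python
import numpy as np

# ufunc-style in-place output: the caller allocates `c` once and every call
# writes into it, instead of allocating a new result array per call.
a = np.arange(5, dtype=np.float64)
b = np.ones(5)
c = np.empty(5)

for _ in range(3):            # e.g. successive fitting iterations
    np.add(a, b, out=c)       # result written into `c`; no new allocation
    # ... use `c` (the current gradient / element-wise loss) here ...
```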

@glemaitre
Member Author

This saves memory and the time to allocate that memory each time

I also like the fact that we usually allocate the memory via NumPy in Python and just interact with it; we don't have to free the buffer ourselves.

lesteve merged commit a5fed0d into scikit-learn:main on Oct 31, 2023
@lesteve
Member

lesteve commented Oct 31, 2023

Merging, thanks! Let's see if the PyPy build manages to complete from time to time with this fix 🤞 !
