
PERF float32 propagation in GaussianMixture #30415

Merged: 13 commits merged from gmm-float32 into scikit-learn:main on Jan 17, 2025

Conversation

ogrisel (Member) commented Dec 5, 2024

This is a draft PR to explore how much work is needed to ensure floating-point precision propagation in GaussianMixture, as part of the discussion in #30382.

There are still some failing tests to fix.

github-actions bot commented Dec 5, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8fd0b71.

@ogrisel ogrisel marked this pull request as draft December 5, 2024 17:53

OmarManzoor (Contributor) commented Jan 8, 2025

@ogrisel I added a small adjustment and the float32 tests seem to pass now. Do we have any way to invoke them in the CI as well?
Edit: I also updated two tests and adjusted the tolerance of the check for normalized weights, because in test_gaussian_mixture_fit_predict it was failing due to the weights summing to around 1.0000001.
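
For illustration, a minimal sketch of the kind of dtype-aware check involved (the data, variable names, and tolerance value here are assumptions, not the PR's actual test code):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical check: fit on float32 data and verify the mixture weights
# sum to 1 only up to a small relative tolerance, since float32 accumulation
# can produce a value such as 1.0000001.
X, _ = make_blobs(n_samples=5_000, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X.astype(np.float32))
np.testing.assert_allclose(gmm.weights_.sum(), 1.0, rtol=1e-6)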

ogrisel (Member, Author) commented Jan 8, 2025

Do we have any way to invoke them in the CI as well?

Yes, by pushing a commit with the [float32] marker in the commit message:

https://scikit-learn.org/stable/developers/contributing.html#commit-message-markers

OmarManzoor (Contributor) commented Jan 8, 2025

Yes, by pushing a commit with the [float32] marker in the commit message:

https://scikit-learn.org/stable/developers/contributing.html#commit-message-markers

Okay, nice. Let me see if the CI passes first before trying out float32, because in the initial pipelines np.log did not accept a dtype argument.
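
For reference, a small illustration of dtype propagation through np.log when the input array is already float32 (a sketch, not code from this PR):

import numpy as np

x = np.array([1.0, 2.0, 4.0], dtype=np.float32)
# Ufuncs such as np.log preserve the input dtype, so casting the input
# up front keeps the computation in float32.
assert np.log(x).dtype == np.float32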

ogrisel (Member, Author) commented Jan 8, 2025

I confirm that the tests now pass locally. Let me add a changelog entry.

@ogrisel ogrisel marked this pull request as ready for review January 8, 2025 10:58
ogrisel changed the title from "WIP float32 propagation in GaussianMixture" to "PERF float32 propagation in GaussianMixture" on Jan 8, 2025

ogrisel (Member, Author) commented Jan 8, 2025

Let me do a quick benchmark to see the impact of this change.

ogrisel (Member, Author) commented Jan 8, 2025

On my Apple M1 with accelerate (mamba install "libblas=*=*accelerate") I get the following numbers:

  • on this branch: 7.8 s with a 1 GB memory usage peak
  • on main: 23.1 s with a 2 GB memory usage peak

So a nice ~3X speed up and a 2X memory efficiency improvement.

Benchmark script:

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs
import numpy as np
from time import perf_counter

X, y = make_blobs(n_samples=int(1e6), n_features=50, centers=10, random_state=0)
X = X.astype(np.float32)

tic = perf_counter()
GaussianMixture(n_components=10, covariance_type="full", random_state=0).fit(X)
toc = perf_counter()
print(f"Elapsed time: {toc - tic:.1f} s")

I used the scalene profiler to report memory usage (scalene bench_gmm.py with the script above to reproduce).

ogrisel (Member, Author) commented Jan 8, 2025

I also tried with OpenBLAS (mamba install "libblas=*=*openblas") and got a 1.9x speed up instead of nearly 3x. I suspect that Accelerate can use the Apple M1 GPU for some operations when working with float32 data while OpenBLAS is limited to the CPU vector instruction set.

Memory usage is the same for OpenBLAS and Accelerate.

ogrisel added the float32 label (Issues related to support for 32bit data) on Jan 8, 2025

OmarManzoor (Contributor) commented:

I also tried with OpenBLAS (mamba install "libblas=*=*openblas") and got a 1.9x speed up instead of nearly 3x. I suspect that Accelerate can use the Apple M1 GPU for some operations when working with float32 data while OpenBLAS is limited to the CPU vector instruction set.

Even then there is a nice speedup.

ogrisel (Member, Author) commented Jan 8, 2025

Once this is merged, I think this estimator would be a nice candidate for array API support.

mekleo (Contributor) commented Jan 8, 2025

On my Apple M1 with accelerate (mamba install "libblas=*=*accelerate") I get the following numbers:

  • on this branch: 7.8 s with a 1 GB memory usage peak
  • on main: 23.1 s with a 2 GB memory usage peak

So a nice ~3X speed up and a 2X memory efficiency improvement.

Great work! Beyond performance enhancement, in your tests, do you obtain identical results after downcasting X to np.float32?

ogrisel (Member, Author) commented Jan 8, 2025

Great work! Beyond performance enhancement, in your tests, do you obtain identical results after downcasting X to np.float32?

The existing tests still pass when we run them with float32 (up to some tolerance level that can depend on the choice of the dtype).
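
As an illustration of such a dtype-dependent comparison, a rough sketch (the dataset, parameters, and tolerance are assumptions, not the PR's actual tests):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X64, _ = make_blobs(n_samples=10_000, n_features=5, centers=3, random_state=0)
X32 = X64.astype(np.float32)

gmm64 = GaussianMixture(n_components=3, random_state=0).fit(X64)
gmm32 = GaussianMixture(n_components=3, random_state=0).fit(X32)

# In a test this difference would typically be asserted with assert_allclose,
# using a looser rtol for float32 than for float64.
print(np.abs(gmm32.means_ - gmm64.means_).max())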

betatim (Member) left a comment

Looks reasonable to me. A few little comments.

One question: is there a unit test that checks that the dtype of gmm.predict(X) matches the dtype of X, or should it match the dtype of the X that was used for fitting? Or do we not need that kind of test for some reason?

ogrisel (Member, Author) commented Jan 13, 2025

One question: is there a unit test that checks that the dtype of gmm.predict(X) matches the dtype of X or should it match the dtype of X that was used for fitting?

gmm.predict(X) returns the integer index of the component with the highest likelihood. So it is unrelated to the dtype of X.
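
A quick sketch of that behaviour (an illustration, not a test from this PR):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=1_000, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X.astype(np.float32))

# predict returns component indices, so its dtype is integral regardless
# of whether X is float32 or float64.
labels = gmm.predict(X.astype(np.float32))
print(labels.dtype)  # an integer dtype, e.g. int64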

ogrisel added the Waiting for Second Reviewer label (First reviewer is done, need a second one!) on Jan 15, 2025
@@ -120,15 +120,15 @@ def _initialize_parameters(self, X, random_state):
resp[np.arange(n_samples), label] = 1
elif self.init_params == "random":
resp = random_state.uniform(size=(n_samples, self.n_components))
- resp /= resp.sum(axis=1)[:, np.newaxis]
+ resp /= resp.sum(axis=1)[:, np.newaxis].astype(X.dtype)

lesteve (Member) commented Jan 15, 2025

I am not sure it matters that much, but it seems like resp is always going to be float64 (apparently .uniform does not let you specify a dtype). In particular the .astype(X.dtype) is not going to affect resp.dtype?

Member:

Indeed! TIL. If resp.dtype is not float32, then dividing it in place by a float32 won't change its dtype :-/

I guess we'd have to add a resp = resp.astype(X.dtype) and maybe we can then remove the astype from resp /= resp.sum(axis=1)[:, np.newaxis].astype(X.dtype)?
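
A quick standalone illustration of the dtype behaviour being discussed (independent of the PR code):

import numpy as np

rng = np.random.default_rng(0)
resp = rng.uniform(size=(4, 3))  # float64; uniform has no dtype argument
resp /= resp.sum(axis=1)[:, np.newaxis].astype(np.float32)
print(resp.dtype)  # still float64: in-place ops keep the left-hand operand's dtype

resp = resp.astype(np.float32)  # an explicit cast is needed instead
print(resp.dtype)  # float32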

OmarManzoor (Contributor) commented Jan 16, 2025

Yes, this was a valid point. Thanks for catching it, @lesteve.
I updated this and also added global_dtype in a test that uses this particular initialization.

ogrisel (Member, Author) commented Jan 16, 2025

Thanks, @lesteve and @OmarManzoor for the latest round of fixes / testing.

Comment on lines 122 to 124
resp = random_state.uniform(size=(n_samples, self.n_components)).astype(
X.dtype
)

Member:

I am not sure this makes a big difference but we may as well avoid a copy if X.dtype is float64 (which I guess is the most common use case)?

Suggested change:

- resp = random_state.uniform(size=(n_samples, self.n_components)).astype(
-     X.dtype
- )
+ resp = np.asarray(
+     random_state.uniform(size=(n_samples, self.n_components)), dtype=X.dtype
+ )
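
For context, a small sketch of the copy behaviour motivating this suggestion (not code from the PR itself):

import numpy as np

a = np.random.uniform(size=(3, 2))  # float64

b = a.astype(np.float64)  # astype copies by default, even when the dtype already matches
print(b is a)             # False

c = np.asarray(a, dtype=np.float64)  # asarray returns the same array for a matching dtype
print(c is a)             # True

d = np.asarray(a, dtype=np.float32)  # a cast (and copy) only happens when the dtype differs
print(d is a, d.dtype)    # False float32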

@lesteve lesteve enabled auto-merge (squash) January 17, 2025 08:25

lesteve (Member) left a comment

Thanks, I have enabled auto-merge!

@lesteve lesteve merged commit c9f9b04 into scikit-learn:main Jan 17, 2025
29 checks passed
@ogrisel ogrisel deleted the gmm-float32 branch January 20, 2025 13:38
Labels: float32, module:mixture, Performance, Waiting for Second Reviewer