
PERF float32 propagation in GaussianMixture #30415

Merged: 13 commits merged from gmm-float32 into scikit-learn:main on Jan 17, 2025

Conversation

ogrisel (Member) commented Dec 5, 2024

This is a draft PR to explore how much work is needed to ensure floating-point precision propagation in GaussianMixture, as part of the discussion in #30382.

There are still some failing tests to fix.

github-actions bot commented Dec 5, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8fd0b71.

@ogrisel ogrisel marked this pull request as draft December 5, 2024 17:53

OmarManzoor (Contributor) commented Jan 8, 2025

@ogrisel I added a small adjustment and the float32 tests seem to pass now. Do we have any way to invoke them in the CI as well?
Edit: I also updated two tests and adjusted the tolerance of the check for normalized weights, because in test_gaussian_mixture_fit_predict it was failing due to the weights summing to around 1.0000001.
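
For illustration, a minimal sketch of the kind of dtype-aware check involved (the data, variable names, and tolerance value here are assumptions, not the PR's actual test code):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical check: fit on float32 data and verify the mixture weights
# sum to 1 only up to a small relative tolerance, since float32 accumulation
# can produce a value such as 1.0000001.
X, _ = make_blobs(n_samples=5_000, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X.astype(np.float32))
np.testing.assert_allclose(gmm.weights_.sum(), 1.0, rtol=1e-6)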

ogrisel (Member, Author) commented Jan 8, 2025

Do we have any way to invoke them in the CI as well?

Yes, by pushing a commit with the [float32] marker in the commit message:

https://scikit-learn.org/stable/developers/contributing.html#commit-message-markers

OmarManzoor (Contributor) commented Jan 8, 2025

Yes, by pushing a commit with the [float32] marker in the commit message:

https://scikit-learn.org/stable/developers/contributing.html#commit-message-markers

Okay, nice. Let me see if the CI passes first before trying out float32, because in the initial pipelines np.log did not accept a dtype argument.
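
For reference, a small illustration of dtype propagation through np.log when the input array is already float32 (a sketch, not code from this PR):

import numpy as np

x = np.array([1.0, 2.0, 4.0], dtype=np.float32)
# Ufuncs such as np.log preserve the input dtype, so casting the input
# up front keeps the computation in float32.
assert np.log(x).dtype == np.float32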

ogrisel (Member, Author) commented Jan 8, 2025

I confirm that the tests now pass locally. Let me add a changelog entry.

@ogrisel ogrisel marked this pull request as ready for review January 8, 2025 10:58
ogrisel changed the title from "WIP float32 propagation in GaussianMixture" to "PERF float32 propagation in GaussianMixture" on Jan 8, 2025

ogrisel (Member, Author) commented Jan 8, 2025

Let me do a quick benchmark to see the impact of this change.

ogrisel (Member, Author) commented Jan 8, 2025

On my Apple M1 with accelerate (mamba install "libblas=*=*accelerate") I get the following numbers:

  • on this branch: 7.8 s with a 1 GB memory usage peak
  • on main: 23.1 s with a 2 GB memory usage peak

So a nice ~3X speed up and a 2X memory efficiency improvement.

Benchmark script:

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs
import numpy as np
from time import perf_counter

X, y = make_blobs(n_samples=int(1e6), n_features=50, centers=10, random_state=0)
X = X.astype(np.float32)

tic = perf_counter()
GaussianMixture(n_components=10, covariance_type="full", random_state=0).fit(X)
toc = perf_counter()
print(f"Elapsed time: {toc - tic:.1f} s")

I used the scalene profiler to report memory usage (scalene bench_gmm.py with the script above to reproduce).

ogrisel (Member, Author) commented Jan 8, 2025

I also tried with OpenBLAS (mamba install "libblas=*=*openblas") and got a 1.9x speed up instead of nearly 3x. I suspect that Accelerate can use the Apple M1 GPU for some operations when working with float32 data while OpenBLAS is limited to the CPU vector instruction set.

Memory usage is the same for OpenBLAS and Accelerate.

ogrisel added the float32 label (Issues related to support for 32bit data) on Jan 8, 2025

OmarManzoor (Contributor) commented:

I also tried with OpenBLAS (mamba install "libblas=*=*openblas") and got a 1.9x speed up instead of nearly 3x. I suspect that Accelerate can use the Apple M1 GPU for some operations when working with float32 data while OpenBLAS is limited to the CPU vector instruction set.

Even then there is a nice speedup.

ogrisel (Member, Author) commented Jan 8, 2025

Once this is merged, I think this estimator would be a nice candidate for array API support.

mekleo (Contributor) commented Jan 8, 2025

On my Apple M1 with accelerate (mamba install "libblas=*=*accelerate") I get the following numbers:

  • on this branch: 7.8 s with a 1 GB memory usage peak
  • on main: 23.1 s with a 2 GB memory usage peak

So a nice ~3X speed up and a 2X memory efficiency improvement.

Great work! Beyond performance enhancement, in your tests, do you obtain identical results after downcasting X to np.float32?

ogrisel (Member, Author) commented Jan 8, 2025

Great work! Beyond performance enhancement, in your tests, do you obtain identical results after downcasting X to np.float32?

The existing tests still pass when we run them with float32 (up to some tolerance level that can depend on the choice of the dtype).
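
As an illustration of such a dtype-dependent comparison, a rough sketch (the dataset, parameters, and tolerance are assumptions, not the PR's actual tests):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X64, _ = make_blobs(n_samples=10_000, n_features=5, centers=3, random_state=0)
X32 = X64.astype(np.float32)

gmm64 = GaussianMixture(n_components=3, random_state=0).fit(X64)
gmm32 = GaussianMixture(n_components=3, random_state=0).fit(X32)

# In a test this difference would typically be asserted with assert_allclose,
# using a looser rtol for float32 than for float64.
print(np.abs(gmm32.means_ - gmm64.means_).max())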

betatim (Member) left a comment

Looks reasonable to me. A few little comments.

One question: is there a unit test that checks that the dtype of gmm.predict(X) matches the dtype of X, or should it match the dtype of the X that was used for fitting? Or do we not need that kind of test for some reason?

ogrisel (Member, Author) commented Jan 13, 2025

One question: is there a unit test that checks that the dtype of gmm.predict(X) matches the dtype of X or should it match the dtype of X that was used for fitting?

gmm.predict(X) returns the integer index of the component with the highest likelihood. So it is unrelated to the dtype of X.
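
A quick sketch of that behaviour (an illustration, not a test from this PR):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=1_000, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X.astype(np.float32))

# predict returns component indices, so its dtype is integral regardless
# of whether X is float32 or float64.
labels = gmm.predict(X.astype(np.float32))
print(labels.dtype)  # an integer dtype, e.g. int64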

ogrisel added the Waiting for Second Reviewer label (First reviewer is done, need a second one!) on Jan 15, 2025
@@ -120,15 +120,15 @@ def _initialize_parameters(self, X, random_state):
resp[np.arange(n_samples), label] = 1
elif self.init_params == "random":
resp = random_state.uniform(size=(n_samples, self.n_components))
- resp /= resp.sum(axis=1)[:, np.newaxis]
+ resp /= resp.sum(axis=1)[:, np.newaxis].astype(X.dtype)

lesteve (Member) commented Jan 15, 2025

I am not sure it matters that much, but it seems like resp is always going to be float64 (apparently .uniform does not let you specify a dtype). In particular the .astype(X.dtype) is not going to affect resp.dtype?

Member:

Indeed! TIL. If resp.dtype is not float32, then dividing it in place by a float32 won't change its dtype :-/

I guess we'd have to add a resp = resp.astype(X.dtype) and maybe we can then remove the astype from resp /= resp.sum(axis=1)[:, np.newaxis].astype(X.dtype)?
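
A quick standalone illustration of the dtype behaviour being discussed (independent of the PR code):

import numpy as np

rng = np.random.default_rng(0)
resp = rng.uniform(size=(4, 3))  # float64; uniform has no dtype argument
resp /= resp.sum(axis=1)[:, np.newaxis].astype(np.float32)
print(resp.dtype)  # still float64: in-place ops keep the left-hand operand's dtype

resp = resp.astype(np.float32)  # an explicit cast is needed instead
print(resp.dtype)  # float32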

OmarManzoor (Contributor) commented Jan 16, 2025

Yes, this was a valid point. Thanks for catching it, @lesteve.
I updated this and also added global_dtype in a test that uses this particular initialization.

ogrisel (Member, Author) commented Jan 16, 2025

Thanks, @lesteve and @OmarManzoor for the latest round of fixes / testing.

Comment on lines 122 to 124
resp = random_state.uniform(size=(n_samples, self.n_components)).astype(
X.dtype
)

Member:

I am not sure this makes a big difference but we may as well avoid a copy if X.dtype is float64 (which I guess is the most common use case)?

Suggested change:

- resp = random_state.uniform(size=(n_samples, self.n_components)).astype(
-     X.dtype
- )
+ resp = np.asarray(
+     random_state.uniform(size=(n_samples, self.n_components)), dtype=X.dtype
+ )
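
For context, a small sketch of the copy behaviour motivating this suggestion (not code from the PR itself):

import numpy as np

a = np.random.uniform(size=(3, 2))  # float64

b = a.astype(np.float64)  # astype copies by default, even when the dtype already matches
print(b is a)             # False

c = np.asarray(a, dtype=np.float64)  # asarray returns the same array for a matching dtype
print(c is a)             # True

d = np.asarray(a, dtype=np.float32)  # a cast (and copy) only happens when the dtype differs
print(d is a, d.dtype)    # False float32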

@lesteve lesteve enabled auto-merge (squash) January 17, 2025 08:25

lesteve (Member) left a comment

Thanks, I have enabled auto-merge!

@lesteve lesteve merged commit c9f9b04 into scikit-learn:main Jan 17, 2025
29 checks passed
@ogrisel ogrisel deleted the gmm-float32 branch January 20, 2025 13:38
Labels: float32, module:mixture, Performance, Waiting for Second Reviewer