
DOC: clarify the documentation for the loss functions used in GBRT, and Absolute Error in particular. #30339

Open
AhmedThahir opened this issue Nov 23, 2024 · 13 comments

Comments

@AhmedThahir

AhmedThahir commented Nov 23, 2024

Describe the bug

From my understanding, there is currently no way to minimize the MAE (Mean Absolute Error). Quantile regression with quantile=0.5 optimizes for the Median Absolute Error, which differs from optimizing the MAE when the conditional distribution of the response variable is not symmetric.

if sample_weight is None:
    return np.median(y_true, axis=0)
else:
    return _weighted_percentile(y_true, sample_weight, 50)

What I expect

  • Using HistGradientBoostingRegressor(loss="absolute_error") should optimize for the mean of absolute errors.
  • Using HistGradientBoostingRegressor(loss="quantile", quantile=0.5) should optimize for the median of absolute errors.
        if sample_weight is None:
            return np.mean(y_true, axis=0)
        else:
            return _weighted_mean(y_true, sample_weight)

What happens
Both give the same results

  • Using HistGradientBoostingRegressor(loss="absolute_error") optimizes for the median of absolute errors
  • Using HistGradientBoostingRegressor(loss="quantile", quantile=0.5) optimizes for the median of absolute errors

Suggested Actions

If this is intended behavior:

  • Feel free to close this issue as resolved.
  • Kindly add a note in the documentation that "Absolute Error optimizes for Median Absolute Error, not Mean Absolute Error", as "absolute_error" on its own is not very clear.
  • I would appreciate more explanation on using custom loss functions (Custom Loss function? #21614). This way, we could optimize for Mean Absolute Error, Median Absolute Error, Log-Cosh, etc. as required.

Note
I have tried my best to go through the documentation prior to creating this issue. I am a fresh graduate in Computer Science, and if you believe this issue is not well-framed due to a misunderstanding of my concepts, kindly advise me and I'll work on it.

Steps/Code to Reproduce

# Imports
from sklearn.ensemble import HistGradientBoostingRegressor
import numpy as np

# Dataset Generation
x = np.linspace(start=0, stop=10, num=100)

n_repeat = 100 # number of repeated samples for each x
X = np.repeat(x, n_repeat)[:, np.newaxis]
y_true_mean = 1 * np.repeat(x, n_repeat)
noise = np.random.RandomState(0).lognormal(mean=0, sigma=1, size=y_true_mean.shape[0])
y_noisy = y_true_mean + noise

# Model Creation
mae = HistGradientBoostingRegressor(loss="absolute_error") # should be mean of absolute errors
quantile = HistGradientBoostingRegressor(loss="quantile", quantile=0.5) # should be median of absolute errors

# Fit & Prediction
y_pred_mae = mae.fit(X, y_noisy).predict(X)
y_pred_quantile = quantile.fit(X, y_noisy).predict(X)

# Prediction Comparison
print((y_pred_mae - y_pred_quantile).sum()) # both give same results

Expected Results

Median and mean of absolute errors should give different results for a log-normally distributed response. Hence, the predictions should differ from each other, and the difference of their predictions should sum to a non-zero value.

Actual Results

Predictions by both models are the same, which can be seen in the difference of their predictions summing to 0.

0.

Versions

System:
    python: 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
executable: /usr/bin/python3
   machine: Linux-6.1.85+-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.5.2
          pip: 24.1.2
   setuptools: 75.1.0
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: 3.0.11
       pandas: 2.2.2
   matplotlib: 3.8.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 2
         prefix: libopenblas
       filepath: /usr/local/lib/python3.10/dist-packages/scipy.libs/libopenblasp-r0-01191904.3.27.so
        version: 0.3.27
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 2
         prefix: libgomp
       filepath: /usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
@AhmedThahir AhmedThahir added Bug Needs Triage Issue requires triage labels Nov 23, 2024
@AhmedThahir AhmedThahir changed the title How to optimize for Mean Absolute Error Absolute Error optimizing for Median Absolute Error, not Mean Absolute Error Nov 23, 2024
@glemaitre
Member

The part of the code that you are pointing out is the computation of the intercept. The estimator of a model minimizing the mean absolute error is the median. So the computation of the intercept looks logical to me.

@AhmedThahir
Author

AhmedThahir commented Nov 25, 2024

From my understanding, optimizing Mean Absolute Error and Median Absolute Error should give different results, but I believe that either I have not been able to communicate my question clearly, or I have not provided sufficient backing for it, or I have misunderstood the intercept code above.

Any resources would be much appreciated.

I would appreciate more explanation on using custom loss functions (Custom Loss function? #21614). This way, we could optimize for Mean Absolute Error, Median Absolute Error, Log-Cosh, etc. as required.

Aside from that, I will try my best to research further and formulate a better question in the future. You may close this issue. Thank you!

@ogrisel
Member

ogrisel commented Dec 2, 2024

The median is the minimizer of the mean absolute error. Let's sample some data distributed in such a way that the mean and the median are different:

import numpy as np

data = np.random.lognormal(mean=0, sigma=1, size=1000)
np.median(data).round(4), np.mean(data).round(4)
(np.float64(0.9989), np.float64(1.6216))

Let's find the minimizer of MAE using scipy.optimize:

from scipy.optimize import fmin


def mae(x, data):
    return np.mean(np.abs(data - x))

x = np.zeros(shape=1, dtype=np.float64)
fmin(mae, x, args=(data,), disp=False)
array([0.99875])
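For contrast, the same fmin approach applied to the mean squared error recovers the sample mean rather than the median (a small sketch along the same lines, not part of the original comment's code):

```python
import numpy as np
from scipy.optimize import fmin

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0, sigma=1, size=1000)

def mse(x, data):
    # Mean squared error of a constant prediction x.
    return np.mean((data - x) ** 2)

x0 = np.zeros(shape=1, dtype=np.float64)
x_opt = fmin(mse, x0, args=(data,), disp=False)
# The minimizer of the MSE is the sample mean, not the median.
print(x_opt[0], np.mean(data))
```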

@ogrisel
Member

ogrisel commented Dec 2, 2024

Using HistGradientBoostingRegressor(loss="absolute_error") optimizes for the median of absolute errors

No, HistGradientBoostingRegressor(loss="absolute_error") minimizes the sum (or equivalently the mean) of absolute errors of the conditional estimates (as does the quantile loss with q=0.5). Both choices lead to an estimate of the conditional median of Y|X as a result.

@ogrisel
Member

ogrisel commented Dec 2, 2024

Please point us to the specific part of the documentation that was not clear and suggest a way to improve it. Otherwise I think we can close.

@AhmedThahir
Author

The median is the minimizer of the mean absolute error.

Noted. You are correct; I replicated your code, thanks!

@AhmedThahir
Author

AhmedThahir commented Dec 2, 2024

No, HistGradientBoostingRegressor(loss="absolute_error") minimizes the (conditional) mean absolute error (as does the quantile loss with q=0.5) and both estimate the conditional median of Y|X as a result.

But the code for quantile = 0.5 (i.e., the median) gives me a different result from that of the mean; this is my point:

def quantileae(x, data):
    return np.quantile(np.abs(data - x), 0.5)

x = np.zeros(shape=1, dtype=np.float64)
fmin(quantileae, x, args=(data,), disp=False)
array([0.58675])

def medianae(x, data):
    return np.median(np.abs(data - x))

x = np.zeros(shape=1, dtype=np.float64)
fmin(medianae, x, args=(data,), disp=False)
array([0.58675])

@AhmedThahir
Author

AhmedThahir commented Dec 2, 2024

Please point us to the specific part of the documentation that was not clear and suggest a way to improve it. Otherwise I think we can close.

Problem
The terms "squared_error" and "absolute_error" are not very clear: i.e., is it Mean Squared Error or Median Squared Error; Mean Absolute Error or Median Absolute Error? The documentation should make it clear that these are Mean Squared Error and Mean Absolute Error.

Specific part of the documentation
https://scikit-learn.org/1.5/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html

loss{‘squared_error’, ‘absolute_error’, ‘gamma’, ‘poisson’, ‘quantile’}, default=’squared_error’
The loss function to use in the boosting process. Note that the “squared error”, “gamma” and “poisson” losses actually implement “half least squares loss”, “half gamma deviance” and “half poisson deviance” to simplify the computation of the gradient. Furthermore, “gamma” and “poisson” losses internally use a log-link, “gamma” requires y > 0 and “poisson” requires y >= 0. “quantile” uses the pinball loss.

Suggestion

The loss function to use in the boosting process. Note that the “squared error”, “gamma” and “poisson” losses actually implement “half least squares loss”, “half gamma deviance” and “half poisson deviance” to simplify the computation of the gradient. Furthermore, “gamma” and “poisson” losses internally use a log-link, “gamma” requires y > 0 and “poisson” requires y >= 0. “quantile” uses the pinball loss. “absolute_error” refers to Mean Absolute Error; for other variants of absolute error, you need to write your own loss function. “squared_error” refers to Mean Squared Error; for other variants of squared error, you need to write your own loss function.

@ogrisel
Member

ogrisel commented Dec 3, 2024

But the code for quantile = 0.5 (i.e., the median) gives me a different result from that of the mean; this is my point

The quantile loss of HistGradientBoostingRegressor is not equivalent to the code you wrote above. The quantile loss is also known as the pinball loss, and it is documented here:

https://scikit-learn.org/stable/modules/model_evaluation.html#pinball-loss

Here is the equivalent Python code:

def quantile_loss(x, data, q=0.5):
    return np.mean(np.maximum(q * (data - x), (q - 1) * (data - x)))

x = np.zeros(shape=1, dtype=np.float64)
fmin(quantile_loss, x, args=(data,), disp=False)
array([0.99875])
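One way to see why this shares a minimizer with loss="absolute_error": with q = 0.5, the pinball loss above is point-wise identical to half the absolute error (a quick check based on the formula in the snippet above):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0, sigma=1, size=1000)
x = 0.7  # any candidate prediction
d = data - x

# Pinball loss with q = 0.5 versus half the absolute error:
pinball = np.maximum(0.5 * d, (0.5 - 1) * d)
half_abs = 0.5 * np.abs(d)

# max(0.5*d, -0.5*d) == 0.5*|d|, so the two losses coincide point-wise
# and therefore have the same minimizer.
same = np.allclose(pinball, half_abs)
print(same)
```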

What we could do is expand the following section:

https://scikit-learn.org/stable/modules/ensemble.html#loss-functions

to link to the matching metric functions from https://scikit-learn.org/stable/modules/model_evaluation.html to make that more explicit, and to reference this section from the docstring.

@ogrisel ogrisel changed the title Absolute Error optimizing for Median Absolute Error, not Mean Absolute Error DOC: clarify the documentation for the loss functions used in GBRT, and Absolute Error in particular. Dec 3, 2024
@ogrisel ogrisel added Documentation and removed Bug Needs Triage Issue requires triage labels Dec 3, 2024
@ogrisel
Member

ogrisel commented Dec 3, 2024

“absolute_error” refers to Mean Absolute Error; for other variants of absolute error, you need to write your own loss function, “squared_error” refers to Mean Squared Error; for other variants of squared error, you need to write your own loss function

I would not write that: for all loss functions, the objective is always computed as the sum (or equivalently the mean) of the individual loss evaluated on each data point in the training set. This is the case for absolute and squared error, but also for quantile, Poisson, Gamma... Furthermore, we should not advertise custom loss functions in the docs, as there is no official public API for them in scikit-learn.

@AhmedThahir
Author

we should not advertise custom loss functions in the docs, as there is no official public API for them in scikit-learn.

Noted. That is right.

@AhmedThahir
Author

for all loss functions, the objective is always computed as the sum (or equivalently the mean) of the individual loss evaluated on each data point in the training set

This is the point I'm not able to understand. Why is that so? I'm assuming it is implied, or an unspoken rule in Machine Learning, that the mean is the standard aggregate function for losses. Is that right?

So far, I've always specified mean/median whenever mentioning a loss function, as I was not aware of this. If you could share any resource I could read, I would appreciate it.

@glemaitre
Member

This is the point I'm not able to understand. Why is that so? I'm assuming it is implied, or an unspoken rule in Machine Learning, that the mean is the standard aggregate function for losses. Is that right?

You probably want to refer to:
https://en.wikipedia.org/wiki/Empirical_risk_minimization

You are interested in minimizing the risk, which is defined as the expectation of the loss function, and thus the mean (or equivalently the sum) aggregate.
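A minimal numpy sketch of this idea (the helper names are illustrative, not scikit-learn API): the empirical risk is the mean of a per-sample loss, and minimizing it over constant predictions recovers the median for absolute error and the mean for squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0, sigma=1, size=5000)

def empirical_risk(theta, loss, data):
    # Empirical risk: the mean of the per-sample loss over the data.
    return np.mean(loss(data, theta))

absolute = lambda y, t: np.abs(y - t)
squared = lambda y, t: (y - t) ** 2

# Minimize each empirical risk over a grid of constant predictions.
grid = np.linspace(0.0, 5.0, 2001)
theta_abs = grid[np.argmin([empirical_risk(t, absolute, data) for t in grid])]
theta_sq = grid[np.argmin([empirical_risk(t, squared, data) for t in grid])]

# Absolute error -> median; squared error -> mean.
print(theta_abs, np.median(data))
print(theta_sq, np.mean(data))
```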
