
DOC add LaTeX to various linear models #30322

Draft
wants to merge 62 commits into base: main

Conversation

Contributor

@virchan virchan commented Nov 21, 2024

Reference Issues/PRs

Towards the Documentation Improvement Project.

What does this implement/fix? Explain your changes.

This PR enhances the doc-strings of the following linear model estimators by adding LaTeX-formatted equations:

  • Lasso
  • LassoCV
  • Ridge
  • ElasticNet
  • ElasticNetCV
  • MultiTaskElasticNet
  • MultiTaskElasticNetCV

For example, once merged, the HTML documentation for ElasticNet would render as follows:

[screenshot: demo (rendered ElasticNet docstring)]

and the enet_path function would appear as:

[screenshot: demo2 (rendered enet_path docstring)]

These improvements aim to make the documentation more user-friendly and accessible, whether viewed in the HTML documentation or directly in the source code.
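
As a rough, hypothetical illustration (not the exact wording of this PR), an addition of this kind could look like the sketch below, reusing the objective already stated in plain text in the current ElasticNet docstring:

# Hypothetical sketch only; the actual layout and wording in the PR may differ.
class ElasticNetDocstringSketch:
    r"""Linear regression with combined L1 and L2 priors as regularizer.

    Minimizes the objective function:

    .. math::

        \frac{1}{2 n_{\mathrm{samples}}} \Vert y - Xw \Vert_2^2
        + \alpha \, \mathrm{l1\_ratio} \, \Vert w \Vert_1
        + \frac{\alpha \, (1 - \mathrm{l1\_ratio})}{2} \Vert w \Vert_2^2
    """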

Any other comments?

Cc @adrinjalali, @glemaitre in advance.


github-actions bot commented Nov 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fc34ae3. Link to the linter CI: here

Member

@adrinjalali adrinjalali left a comment


Also wondering what @lucyleeow thinks here.

Comment on lines +236 to +242
.. math::
\\frac{1}{2n_{\\operatorname{samples}}}
\\vert \\vert Y- XW \\vert \\vert^2_F +
\\alpha \\vert \\vert W \\vert \\vert_{2,1}

where :math:`\\vert\\vert W \\vert\\vert_F` is the Frobenius norm of :math:`W`,
and::
Member

I don't love that these are basically rendered twice. Ideally we'd have only one version of them, not both, but I do see that the LaTeX form is less readable unrendered.

Maybe a good solution would be to rst-comment (with ..) the "code form" and only have the LaTeX form rendered.
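
For what it's worth, a minimal hypothetical sketch of that idea (class name and wording made up): an anonymous rst comment, a line containing only .., hides the indented block from the rendered HTML while keeping it readable in the source:

# Hypothetical sketch of rst-commenting the plain-text form of the objective.
class MultiTaskObjectiveSketch:
    r"""Example docstring.

    The optimization objective is:

    .. math::

        \frac{1}{2 n_{\mathrm{samples}}} \Vert Y - XW \Vert_F^2
        + \alpha \Vert W \Vert_{2,1}

    ..
        Plain-text form, kept for readers of the source code:

        (1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21
    """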

Member

I recall that this is a long-standing debate. I was always more on the side of no LaTeX, because I don't find it readable when looking at the docstring in my IDE. I don't know if modern IDEs actually translate LaTeX to an HTML view nowadays?

Member

Yeah, I agree that it doesn't need to be here twice. Less opinionated on whether it's the code or the LaTeX one that we should keep.

As @glemaitre said, I guess one view is for no LaTeX in the docstring, and have this stuff in the user guide instead?

@adrinjalali
Member

For consistency's sake: we already do have math notation in a few places in the codebase:

$ git grep -A5 -B5 -p '\.\. math' "sklearn/*.py" > /tmp/log.txt
sklearn/cluster/_agglomerative.py=def ward_tree(X, *, connectivity=None, n_clusters=None, return_distance=False):
--
sklearn/cluster/_agglomerative.py-        distance. Distances are updated in the following way
sklearn/cluster/_agglomerative.py-        (from scipy.hierarchy.linkage):
sklearn/cluster/_agglomerative.py-
sklearn/cluster/_agglomerative.py-        The new entry :math:`d(u,v)` is computed as follows,
sklearn/cluster/_agglomerative.py-
sklearn/cluster/_agglomerative.py:        .. math::
sklearn/cluster/_agglomerative.py-
sklearn/cluster/_agglomerative.py-           d(u,v) = \\sqrt{\\frac{|v|+|s|}
sklearn/cluster/_agglomerative.py-                               {T}d(v,s)^2
sklearn/cluster/_agglomerative.py-                        + \\frac{|v|+|t|}
sklearn/cluster/_agglomerative.py-                               {T}d(v,t)^2
--
sklearn/decomposition/_nmf.py=def non_negative_factorization(
--
sklearn/decomposition/_nmf.py-    negative matrix X. This factorization can be used for example for
sklearn/decomposition/_nmf.py-    dimensionality reduction, source separation or topic extraction.
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-    The objective function is:
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py:    .. math::
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-        L(W, H) &= 0.5 * ||X - WH||_{loss}^2
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-                &+ alpha\\_W * l1\\_ratio * n\\_features * ||vec(W)||_1
sklearn/decomposition/_nmf.py-
--
sklearn/decomposition/_nmf.py=class NMF(_BaseNMF):
--
sklearn/decomposition/_nmf.py-    whose product approximates the non-negative matrix X. This factorization can be used
sklearn/decomposition/_nmf.py-    for example for dimensionality reduction, source separation or topic extraction.
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-    The objective function is:
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py:    .. math::
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-        L(W, H) &= 0.5 * ||X - WH||_{loss}^2
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-                &+ alpha\\_W * l1\\_ratio * n\\_features * ||vec(W)||_1
sklearn/decomposition/_nmf.py-
--
sklearn/decomposition/_nmf.py=class MiniBatchNMF(_BaseNMF):
--
sklearn/decomposition/_nmf.py-    factorization can be used for example for dimensionality reduction, source
sklearn/decomposition/_nmf.py-    separation or topic extraction.
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-    The objective function is:
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py:    .. math::
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-        L(W, H) &= 0.5 * ||X - WH||_{loss}^2
sklearn/decomposition/_nmf.py-
sklearn/decomposition/_nmf.py-                &+ alpha\\_W * l1\\_ratio * n\\_features * ||vec(W)||_1
sklearn/decomposition/_nmf.py-
--
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py=class Sum(KernelOperator):
sklearn/gaussian_process/kernels.py-    """The `Sum` kernel takes two kernels :math:`k_1` and :math:`k_2`
sklearn/gaussian_process/kernels.py-    and combines them via
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k_{sum}(X, Y) = k_1(X, Y) + k_2(X, Y)
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    Note that the `__add__` magic method is overridden, so
sklearn/gaussian_process/kernels.py-    `Sum(RBF(), RBF())` is equivalent to using the + operator
sklearn/gaussian_process/kernels.py-    with `RBF() + RBF()`.
--
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py=class Product(KernelOperator):
sklearn/gaussian_process/kernels.py-    """The `Product` kernel takes two kernels :math:`k_1` and :math:`k_2`
sklearn/gaussian_process/kernels.py-    and combines them via
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k_{prod}(X, Y) = k_1(X, Y) * k_2(X, Y)
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    Note that the `__mul__` magic method is overridden, so
sklearn/gaussian_process/kernels.py-    `Product(RBF(), RBF())` is equivalent to using the * operator
sklearn/gaussian_process/kernels.py-    with `RBF() * RBF()`.
--
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py=class Exponentiation(Kernel):
sklearn/gaussian_process/kernels.py-    """The Exponentiation kernel takes one base kernel and a scalar parameter
sklearn/gaussian_process/kernels.py-    :math:`p` and combines them via
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k_{exp}(X, Y) = k(X, Y) ^p
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    Note that the `__pow__` magic method is overridden, so
sklearn/gaussian_process/kernels.py-    `Exponentiation(RBF(), 2)` is equivalent to using the ** operator
sklearn/gaussian_process/kernels.py-    with `RBF() ** 2`.
--
sklearn/gaussian_process/kernels.py=class ConstantKernel(StationaryKernelMixin, GenericKernelMixin, Kernel):
--
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    Can be used as part of a product-kernel where it scales the magnitude of
sklearn/gaussian_process/kernels.py-    the other factor (kernel) or as part of a sum-kernel, where it modifies
sklearn/gaussian_process/kernels.py-    the mean of the Gaussian process.
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k(x_1, x_2) = constant\\_value \\;\\forall\\; x_1, x_2
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    Adding a constant kernel is equivalent to adding a constant::
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-            kernel = RBF() + ConstantKernel(constant_value=2)
--
sklearn/gaussian_process/kernels.py=class WhiteKernel(StationaryKernelMixin, GenericKernelMixin, Kernel):
--
sklearn/gaussian_process/kernels.py-    The main use-case of this kernel is as part of a sum-kernel where it
sklearn/gaussian_process/kernels.py-    explains the noise of the signal as independently and identically
sklearn/gaussian_process/kernels.py-    normally-distributed. The parameter noise_level equals the variance of this
sklearn/gaussian_process/kernels.py-    noise.
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k(x_1, x_2) = noise\\_level \\text{ if } x_i == x_j \\text{ else } 0
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    Read more in the :ref:`User Guide <gp_kernels>`.
sklearn/gaussian_process/kernels.py-
--
sklearn/gaussian_process/kernels.py=class RBF(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
--
sklearn/gaussian_process/kernels.py-    "squared exponential" kernel. It is parameterized by a length scale
sklearn/gaussian_process/kernels.py-    parameter :math:`l>0`, which can either be a scalar (isotropic variant
sklearn/gaussian_process/kernels.py-    of the kernel) or a vector with the same number of dimensions as the inputs
sklearn/gaussian_process/kernels.py-    X (anisotropic variant of the kernel). The kernel is given by:
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k(x_i, x_j) = \\exp\\left(- \\frac{d(x_i, x_j)^2}{2l^2} \\right)
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    where :math:`l` is the length scale of the kernel and
sklearn/gaussian_process/kernels.py-    :math:`d(\\cdot,\\cdot)` is the Euclidean distance.
sklearn/gaussian_process/kernels.py-    For advice on how to set the length scale parameter, see e.g. [1]_.
--
sklearn/gaussian_process/kernels.py=class Matern(RBF):
--
sklearn/gaussian_process/kernels.py-    :math:`\\nu=1.5` (once differentiable functions)
sklearn/gaussian_process/kernels.py-    and :math:`\\nu=2.5` (twice differentiable functions).
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    The kernel is given by:
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-         k(x_i, x_j) =  \\frac{1}{\\Gamma(\\nu)2^{\\nu-1}}\\Bigg(
sklearn/gaussian_process/kernels.py-         \\frac{\\sqrt{2\\nu}}{l} d(x_i , x_j )
sklearn/gaussian_process/kernels.py-         \\Bigg)^\\nu K_\\nu\\Bigg(
sklearn/gaussian_process/kernels.py-         \\frac{\\sqrt{2\\nu}}{l} d(x_i , x_j )\\Bigg)
sklearn/gaussian_process/kernels.py-
--
sklearn/gaussian_process/kernels.py=class RationalQuadratic(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
--
sklearn/gaussian_process/kernels.py-    parameterized by a length scale parameter :math:`l>0` and a scale
sklearn/gaussian_process/kernels.py-    mixture parameter :math:`\\alpha>0`. Only the isotropic variant
sklearn/gaussian_process/kernels.py-    where length_scale :math:`l` is a scalar is supported at the moment.
sklearn/gaussian_process/kernels.py-    The kernel is given by:
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k(x_i, x_j) = \\left(
sklearn/gaussian_process/kernels.py-        1 + \\frac{d(x_i, x_j)^2 }{ 2\\alpha  l^2}\\right)^{-\\alpha}
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    where :math:`\\alpha` is the scale mixture parameter, :math:`l` is
sklearn/gaussian_process/kernels.py-    the length scale of the kernel and :math:`d(\\cdot,\\cdot)` is the
--
sklearn/gaussian_process/kernels.py=class ExpSineSquared(StationaryKernelMixin, NormalizedKernelMixin, Kernel):
--
sklearn/gaussian_process/kernels.py-    themselves exactly. It is parameterized by a length scale
sklearn/gaussian_process/kernels.py-    parameter :math:`l>0` and a periodicity parameter :math:`p>0`.
sklearn/gaussian_process/kernels.py-    Only the isotropic variant where :math:`l` is a scalar is
sklearn/gaussian_process/kernels.py-    supported at the moment. The kernel is given by:
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k(x_i, x_j) = \text{exp}\left(-
sklearn/gaussian_process/kernels.py-        \frac{ 2\sin^2(\pi d(x_i, x_j)/p) }{ l^ 2} \right)
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    where :math:`l` is the length scale of the kernel, :math:`p` the
sklearn/gaussian_process/kernels.py-    periodicity of the kernel and :math:`d(\cdot,\cdot)` is the
--
sklearn/gaussian_process/kernels.py=class DotProduct(Kernel):
--
sklearn/gaussian_process/kernels.py-    It is parameterized by a parameter sigma_0 :math:`\sigma`
sklearn/gaussian_process/kernels.py-    which controls the inhomogenity of the kernel. For :math:`\sigma_0^2 =0`,
sklearn/gaussian_process/kernels.py-    the kernel is called the homogeneous linear kernel, otherwise
sklearn/gaussian_process/kernels.py-    it is inhomogeneous. The kernel is given by
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py:    .. math::
sklearn/gaussian_process/kernels.py-        k(x_i, x_j) = \sigma_0 ^ 2 + x_i \cdot x_j
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    The DotProduct kernel is commonly combined with exponentiation.
sklearn/gaussian_process/kernels.py-
sklearn/gaussian_process/kernels.py-    See [1]_, Chapter 4, Section 4.2, for further details regarding the
--
sklearn/manifold/_t_sne.py=def trustworthiness(X, X_embedded, *, n_neighbors=5, metric="euclidean"):
sklearn/manifold/_t_sne.py-    r"""Indicate to what extent the local structure is retained.
sklearn/manifold/_t_sne.py-
sklearn/manifold/_t_sne.py-    The trustworthiness is within [0, 1]. It is defined as
sklearn/manifold/_t_sne.py-
sklearn/manifold/_t_sne.py:    .. math::
sklearn/manifold/_t_sne.py-
sklearn/manifold/_t_sne.py-        T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1}
sklearn/manifold/_t_sne.py-            \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))
sklearn/manifold/_t_sne.py-
sklearn/manifold/_t_sne.py-    where for each sample i, :math:`\mathcal{N}_{i}^{k}` are its k nearest
--
sklearn/metrics/_classification.py=def cohen_kappa_score(y1, y2, *, labels=None, weights=None, sample_weight=None):
--
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    This function computes Cohen's kappa [1]_, a score that expresses the level
sklearn/metrics/_classification.py-    of agreement between two annotators on a classification problem. It is
sklearn/metrics/_classification.py-    defined as
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py:    .. math::
sklearn/metrics/_classification.py-        \kappa = (p_o - p_e) / (1 - p_e)
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    where :math:`p_o` is the empirical probability of agreement on the label
sklearn/metrics/_classification.py-    assigned to any sample (the observed agreement ratio), and :math:`p_e` is
sklearn/metrics/_classification.py-    the expected agreement when both annotators assign labels randomly.
--
sklearn/metrics/_classification.py=def f1_score(
--
sklearn/metrics/_classification.py-    The F1 score can be interpreted as a harmonic mean of the precision and
sklearn/metrics/_classification.py-    recall, where an F1 score reaches its best value at 1 and worst score at 0.
sklearn/metrics/_classification.py-    The relative contribution of precision and recall to the F1 score are
sklearn/metrics/_classification.py-    equal. The formula for the F1 score is:
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py:    .. math::
sklearn/metrics/_classification.py-        \\text{F1} = \\frac{2 * \\text{TP}}{2 * \\text{TP} + \\text{FP} + \\text{FN}}
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    Where :math:`\\text{TP}` is the number of true positives, :math:`\\text{FN}` is the
sklearn/metrics/_classification.py-    number of false negatives, and :math:`\\text{FP}` is the number of false positives.
sklearn/metrics/_classification.py-    F1 is by default
--
sklearn/metrics/_classification.py=def fbeta_score(
--
sklearn/metrics/_classification.py-    Asymptotically, `beta -> +inf` considers only recall, and `beta -> 0`
sklearn/metrics/_classification.py-    only precision.
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    The formula for F-beta score is:
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py:    .. math::
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-       F_\\beta = \\frac{(1 + \\beta^2) \\text{tp}}
sklearn/metrics/_classification.py-                        {(1 + \\beta^2) \\text{tp} + \\text{fp} + \\beta^2 \\text{fn}}
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    Where :math:`\\text{tp}` is the number of true positives, :math:`\\text{fp}` is the
--
sklearn/metrics/_classification.py=def log_loss(y_true, y_pred, *, normalize=True, sample_weight=None, labels=None):
--
sklearn/metrics/_classification.py-    The log loss is only defined for two or more labels.
sklearn/metrics/_classification.py-    For a single sample with true label :math:`y \in \{0,1\}` and
sklearn/metrics/_classification.py-    a probability estimate :math:`p = \operatorname{Pr}(y = 1)`, the log
sklearn/metrics/_classification.py-    loss is:
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py:    .. math::
sklearn/metrics/_classification.py-        L_{\log}(y, p) = -(y \log (p) + (1 - y) \log (1 - p))
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    Read more in the :ref:`User Guide <log_loss>`.
sklearn/metrics/_classification.py-
sklearn/metrics/_classification.py-    Parameters
--
sklearn/metrics/_ranking.py=def average_precision_score(
--
sklearn/metrics/_ranking.py-
sklearn/metrics/_ranking.py-    AP summarizes a precision-recall curve as the weighted mean of precisions
sklearn/metrics/_ranking.py-    achieved at each threshold, with the increase in recall from the previous
sklearn/metrics/_ranking.py-    threshold used as the weight:
sklearn/metrics/_ranking.py-
sklearn/metrics/_ranking.py:    .. math::
sklearn/metrics/_ranking.py-        \\text{AP} = \\sum_n (R_n - R_{n-1}) P_n
sklearn/metrics/_ranking.py-
sklearn/metrics/_ranking.py-    where :math:`P_n` and :math:`R_n` are the precision and recall at the nth
sklearn/metrics/_ranking.py-    threshold [1]_. This implementation is not interpolated and is different
sklearn/metrics/_ranking.py-    from computing the area under the precision-recall curve with the
--
sklearn/metrics/cluster/_supervised.py=def mutual_info_score(labels_true, labels_pred, *, contingency=None):
--
sklearn/metrics/cluster/_supervised.py-    of the same data. Where :math:`|U_i|` is the number of the samples
sklearn/metrics/cluster/_supervised.py-    in cluster :math:`U_i` and :math:`|V_j|` is the number of the
sklearn/metrics/cluster/_supervised.py-    samples in cluster :math:`V_j`, the Mutual Information
sklearn/metrics/cluster/_supervised.py-    between clusterings :math:`U` and :math:`V` is given as:
sklearn/metrics/cluster/_supervised.py-
sklearn/metrics/cluster/_supervised.py:    .. math::
sklearn/metrics/cluster/_supervised.py-
sklearn/metrics/cluster/_supervised.py-        MI(U,V)=\\sum_{i=1}^{|U|} \\sum_{j=1}^{|V|} \\frac{|U_i\\cap V_j|}{N}
sklearn/metrics/cluster/_supervised.py-        \\log\\frac{N|U_i \\cap V_j|}{|U_i||V_j|}
sklearn/metrics/cluster/_supervised.py-
sklearn/metrics/cluster/_supervised.py-    This metric is independent of the absolute values of the labels:
--
sklearn/metrics/pairwise.py=def nan_euclidean_distances(
--
sklearn/metrics/pairwise.py-
sklearn/metrics/pairwise.py-        weight = Total # of coordinates / # of present coordinates
sklearn/metrics/pairwise.py-
sklearn/metrics/pairwise.py-    For example, the distance between ``[3, na, na, 6]`` and ``[1, na, 4, 5]`` is:
sklearn/metrics/pairwise.py-
sklearn/metrics/pairwise.py:    .. math::
sklearn/metrics/pairwise.py-        \\sqrt{\\frac{4}{2}((3-1)^2 + (6-5)^2)}
sklearn/metrics/pairwise.py-
sklearn/metrics/pairwise.py-    If all the coordinates are missing or if there are no common present
sklearn/metrics/pairwise.py-    coordinates then NaN is returned for that pair.
sklearn/metrics/pairwise.py-
--
sklearn/metrics/pairwise.py=def haversine_distances(X, Y=None):
--
sklearn/metrics/pairwise.py-    The Haversine (or great circle) distance is the angular distance between
sklearn/metrics/pairwise.py-    two points on the surface of a sphere. The first coordinate of each point
sklearn/metrics/pairwise.py-    is assumed to be the latitude, the second is the longitude, given
sklearn/metrics/pairwise.py-    in radians. The dimension of the data must be 2.
sklearn/metrics/pairwise.py-
sklearn/metrics/pairwise.py:    .. math::
sklearn/metrics/pairwise.py-       D(x, y) = 2\\arcsin[\\sqrt{\\sin^2((x_{lat} - y_{lat}) / 2)
sklearn/metrics/pairwise.py-                                + \\cos(x_{lat})\\cos(y_{lat})\\
sklearn/metrics/pairwise.py-                                sin^2((x_{lon} - y_{lon}) / 2)}]
sklearn/metrics/pairwise.py-
sklearn/metrics/pairwise.py-    Parameters
--
sklearn/preprocessing/_data.py=class KernelCenterer(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
sklearn/preprocessing/_data.py-    r"""Center an arbitrary kernel matrix :math:`K`.
sklearn/preprocessing/_data.py-
sklearn/preprocessing/_data.py-    Let define a kernel :math:`K` such that:
sklearn/preprocessing/_data.py-
sklearn/preprocessing/_data.py:    .. math::
sklearn/preprocessing/_data.py-        K(X, Y) = \phi(X) . \phi(Y)^{T}
sklearn/preprocessing/_data.py-
sklearn/preprocessing/_data.py-    :math:`\phi(X)` is a function mapping of rows of :math:`X` to a
sklearn/preprocessing/_data.py-    Hilbert space and :math:`K` is of shape `(n_samples, n_samples)`.
sklearn/preprocessing/_data.py-
sklearn/preprocessing/_data.py-    This class allows to compute :math:`\tilde{K}(X, Y)` such that:
sklearn/preprocessing/_data.py-
sklearn/preprocessing/_data.py:    .. math::
sklearn/preprocessing/_data.py-        \tilde{K(X, Y)} = \tilde{\phi}(X) . \tilde{\phi}(Y)^{T}
sklearn/preprocessing/_data.py-
sklearn/preprocessing/_data.py-    :math:`\tilde{\phi}(X)` is the centered mapped data in the Hilbert
sklearn/preprocessing/_data.py-    space.
sklearn/preprocessing/_data.py-

And when it comes to the rendered version, I quite like encountering these equations in the rendered API pages.

So I'd be okay with rst-commenting out the Python version and including the math notation here. Not sure what others think, though.

@@ -224,14 +224,29 @@ def lasso_path(

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

.. math::
\\frac{1}{2n_{\\operatorname{samples}}}
Member

We could probably avoid the double backslashes by using raw docstrings.
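
For example, a sketch (assuming nothing else in the docstring relies on escape sequences): with a raw docstring, the LaTeX can be written with single backslashes:

# Sketch only: inside r""" the backslashes are taken literally,
# so \frac, \Vert, etc. need no doubling.
def lasso_objective_sketch():
    r"""Illustrative docstring fragment.

    .. math::

        \frac{1}{2 n_{\mathrm{samples}}} \Vert y - Xw \Vert_2^2
        + \alpha \Vert w \Vert_1
    """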

@@ -224,14 +224,29 @@ def lasso_path(

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

.. math::
\\frac{1}{2n_{\\operatorname{samples}}}
\\vert \\vert y - Xw \\vert \\vert^2_2 +
Member

Isn't there a less verbose version of the Euclidean norm that is understood by sphinx?

Member

\Vert (with a capital "V") should display a double bar, no?

Member

Alternatively, we could just use ||, no? That might make such expressions a lot more readable when reading the docstring in an IDE (even if slightly less correct from a typographical point of view).
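
For reference, a sketch of a few notations that should be equivalent here (Sphinx's MathJax setup generally accepts all of them, though the rendering would need to be double-checked):

% \Vert typesets a double bar; plain || also works in MathJax and is easier
% to read in the unrendered source.
\frac{1}{2 n_{\mathrm{samples}}} \Vert y - Xw \Vert_2^2 + \alpha \Vert w \Vert_1
% or, with plain characters:
\frac{1}{2 n_{\mathrm{samples}}} ||y - Xw||_2^2 + \alpha ||w||_1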

@adrinjalali
Member

In a meeting we discussed this, and the conclusion is to use r""" raw docstrings here, keep only the LaTeX version in the docstring, and move the ASCII versions further down into the code instead if necessary.

@virchan virchan marked this pull request as draft November 28, 2024 08:51
glemaitre and others added 20 commits December 9, 2024 12:22
Co-authored-by: Adrin Jalali <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Loïc Estève <[email protected]>
scikit-learn-bot and others added 29 commits December 9, 2024 12:22
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: adrinjalali <[email protected]>
Co-authored-by: Loïc Estève <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Contributor Author

@virchan virchan left a comment


TL;DR: I updated the enet_path documentation using the raw doc-string format and discovered some trade-offs compared to the current implementation.

The updated doc-string renders as follows:

[screenshot: version_011 (rendered enet_path docstring, raw-docstring version)]

Here, only the LaTeX-formatted equations appear in the HTML file, while the code-formatted ones are rst-commented out.

I noticed (at least) two issues with this formatting approach:

  1. Indentation adjustments: We have to modify the indentation in various parts of the docstring to accommodate the Sphinx extension numpydoc. Even then, the HTML may not render correctly in all cases. For example, see my comment on the precompute parameter section.

  2. IDE compatibility: The LaTeX-formatted equations don't render correctly in IDEs. Here's what it looks like in PyCharm on my local machine, with no third-party LaTeX rendering extensions installed:

[screenshot: version_009_IDE (docstring as displayed in PyCharm)]

This aligns with @glemaitre's concern:

I recall that this is a long-standing debate. I was always more on the side of no LaTeX, because I don't find it readable when looking at the docstring in my IDE. I don't know if modern IDEs actually translate LaTeX to an HTML view nowadays?

To conclude, using the raw doc-string approach to include LaTeX introduces a different set of issues tied to numpydoc. Specifically, we’d need to verify the spacing and indentation manually and wait for numpydoc updates to ensure proper HTML rendering.

In light of this, I agree more with @lucyleeow’s suggestion:

I guess one view is for no LaTeX in the docstring, and have this stuff in the user guide instead?

That is, keep the current docstring format and move the LaTeX equations to the user guide instead.

Comment on lines +501 to +502
precompute : 'auto', bool or array-like of shape (n_features, n_features),
default='auto'
Contributor Author

When using the raw doc-string, the description for the precompute parameter renders incorrectly.

I’ve tried several "obvious fixes," including:

[screenshot: version_009_precompute (attempted fixes)]

However, it seems we have to exceed the PEP 8 line-length limit to get it right:

[screenshot: version_010 (correct rendering with the longer line)]

In particular, it appears this is an ongoing issue with numpydoc.
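
To make the workaround concrete, here is an illustrative sketch (not the PR's actual text) with the whole parameter signature kept on one line so that numpydoc parses the type and default correctly:

# Sketch: the full "name : type, default=..." signature stays on one line,
# exceeding the usual line-length limit, so numpydoc renders it as intended.
def enet_path_precompute_sketch():
    """Illustrative docstring fragment.

    Parameters
    ----------
    precompute : 'auto', bool or array-like of shape (n_features, n_features), default='auto'
        Whether to use a precomputed Gram matrix to speed up calculations.
    """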

Member

Naive question: does using \\ work?

Contributor Author

I previously tried that, but Sphinx didn't render it correctly:

[screenshot: version_012 (incorrect Sphinx rendering)]

@@ -516,7 +552,7 @@ def enet_path(
See Also
--------
MultiTaskElasticNet : Multi-task ElasticNet model trained with L1/L2 mixed-norm \
as regularizer.
as regularizer.
Contributor Author

This indentation seems unavoidable: the Sphinx extension numpydoc raises an error otherwise, and Sphinx will not output the HTML files.
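
For context, a sketch of one shape numpydoc accepts (illustrative only; the exact indentation in the PR may differ): the trailing backslash continues the string onto the next line inside the non-raw docstring, and the continuation is indented under the description:

# Sketch: the trailing backslash joins the lines of the string literal;
# numpydoc is picky about how that continuation is indented.
def see_also_sketch():
    """Illustrative docstring fragment.

    See Also
    --------
    MultiTaskElasticNet : Multi-task ElasticNet model trained with L1/L2 mixed-norm \
        as regularizer.
    """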
