
Improving the runtime efficiency of the SVC for a precomputed kernel during inference #22363

@vbelis

Description


Describe the workflow you want to enable

When one wants to use a custom kernel with the svm.SVC class (docs), there are two options: either define a custom function and pass it as a callable to svm.SVC, or pre-compute the kernel matrix and use the kernel='precomputed' argument of svm.SVC. From my understanding of the class inheritance of svm.SVC (source), these two options are computationally equivalent. Please correct me if this assumption is wrong, because all the following points stem from it.
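To make the two options concrete, here is a minimal sketch with a plain linear kernel (the dataset and the final equality check are only illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
import numpy as np

X, y = load_breast_cancer(return_X_y=True)

def linear_kernel(A, B):
    # A custom kernel as a callable: returns the (n_A, n_B) Gram matrix.
    return np.dot(A, B.T)

# Option 1: pass the kernel function itself; SVC evaluates it internally.
svc_callable = SVC(kernel=linear_kernel).fit(X, y)

# Option 2: pre-compute the (n_train, n_train) Gram matrix up front.
svc_precomputed = SVC(kernel='precomputed').fit(linear_kernel(X, X), y)

# Both routes should yield the same fitted model.
assert np.allclose(svc_callable.dual_coef_, svc_precomputed.dual_coef_)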

At inference time, i.e. when calling SVC.predict(X_test) with kernel='precomputed', the expected input matrix has shape (n_test, n_train), where n_test and n_train are the number of test and train samples, respectively. I believe there is an inefficiency here: the pre-computed matrix X_test contains more dot products than are needed to classify the test points, since the SVM model uses only the support vectors (a subset of the training vectors) to predict a label. To elaborate, a trained (fitted) SVM model classifies a test data point x in the following way (omitting the bias term):

$$\hat{y}(x) = \operatorname{sign}\left( \sum_{i=1}^{n_{SV}} \alpha_i y_i K(x_i, x) \right),$$

where $\alpha_i$ are the dual coefficients, $y_i$ are the labels of the support vectors, $n_{SV}$ is the number of support vectors identified during training, and $K$ is our kernel. The equation can be rewritten in matrix form to accommodate classification of a set of test data points X_test, as mentioned above.
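Spelled out (the notation here is mine), the matrix form would be

$$\hat{\mathbf{y}} = \operatorname{sign}\left( K_{\text{test}} \, (\boldsymbol{\alpha} \odot \mathbf{y}_{SV}) \right), \qquad (K_{\text{test}})_{ji} = K(x_i, x^{\text{test}}_j),$$

where $K_{\text{test}}$ only needs shape $(n_{\text{test}}, n_{SV})$ and $\odot$ denotes the element-wise product.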

All the dual coefficients that correspond to non-support vectors are zero; hence, the sum above runs only over the support vectors. However, X_test has some columns that, in general, don't correspond to support vectors.
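This can be verified numerically: the decision function reconstructed from the fitted attributes touches only the support-vector columns of the kernel matrix. A minimal check with a linear kernel (the variable names are mine):

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
import numpy as np

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train = X[:100], X[100:], y[:100]

svc = SVC(kernel='linear').fit(X_train, y_train)

# dual_coef_ holds alpha_i * y_i for the support vectors only, so the
# decision function needs kernel values against the support vectors alone.
K_sv = np.dot(X_test, X_train[svc.support_, :].T)  # shape (n_test, n_SV)
manual = K_sv @ svc.dual_coef_.ravel() + svc.intercept_
assert np.allclose(manual, svc.decision_function(X_test))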

Wouldn't it be more efficient to compute X_test with a shape of (n_test, n_SV), where n_SV is the total number of support vectors (the sum of svm.SVC.n_support_)? The only references to this issue that I was able to find are quite old (2012), by @amueller and @mblondel in #790 and #789 (comment). Nevertheless, no possible solution was discussed or implemented.

Can such an improvement be implemented? If not, I would really appreciate any information regarding the design choices that restrict such an implementation :)

Describe your proposed solution

I wasn't able to pinpoint which parts of the source code would need to change to accommodate the above proposal. Instead, I provide a simple example of how the svm.SVC class could make use of the enhancement if it were implemented.

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.utils import shuffle
import numpy as np

dataset = load_breast_cancer()
X, y = shuffle(dataset.data, dataset.target)
X_train, X_test = X[:100, :], X[100:, :]
y_train, y_test = y[:100], y[100:]

svc = SVC(kernel='precomputed')

kernel_train = np.dot(X_train, X_train.T)  # simple example of a linear pre-computed kernel
svc.fit(kernel_train, y_train)

# What the current implementation requires: kernel values against all training points.
kernel_test = np.dot(X_test, X_train.T)
# What would be more efficient: kernel values against the support vectors only.
kernel_test_ef = np.dot(X_test, X_train[svc.support_, :].T)

y_pred = svc.predict(kernel_test)  # predict() raises an error if kernel_test_ef is passed instead.

An example of the shape difference between kernel_test_ef and kernel_test:

svc.n_support_: [7 5]
kernel_test_ef.shape: (469, 12)
kernel_test.shape: (469, 100)

This shape difference can translate into a significant runtime gap when scaling to large datasets.
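For completeness, a possible stop-gap under the current API, assuming libsvm indeed reads only the support-vector columns at prediction time (the stored support_ indices suggest so): compute the expensive kernel entries only against the support vectors, then scatter them into a zero matrix of the full shape that predict() expects. This would save the kernel-evaluation cost, though not the memory; the variables continue the example above.

# Fill only the support-vector columns; if the remaining columns are never
# read at prediction time (an assumption about the libsvm internals),
# zeros act as safe placeholders.
kernel_test_padded = np.zeros((X_test.shape[0], X_train.shape[0]))
kernel_test_padded[:, svc.support_] = np.dot(X_test, X_train[svc.support_, :].T)
assert np.array_equal(svc.predict(kernel_test_padded), svc.predict(kernel_test))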

Describe alternatives you've considered, if relevant

No response

Additional context

No response
