Description
Describe the workflow you want to enable
When one wants to use a custom kernel with the svm.SVC class (docs), there are two options: either define a custom function and pass it as a callable to svm.SVC, or pre-compute the kernel matrix and use the kernel='precomputed' argument of svm.SVC. From my understanding, following the class inheritance of svm.SVC (source), these two options are computationally equivalent. Please correct me if this assumption is wrong, because all the following points stem from it.
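For concreteness, the equivalence of the two options can be checked empirically. This is only a minimal sketch, assuming a linear kernel and a deterministic train/test split (no shuffling, so the run is reproducible):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

def linear_kernel(A, B):
    # Simple linear kernel; any positive semi-definite kernel works the same way.
    return A @ B.T

# Option 1: pass the kernel as a callable; SVC computes the Gram matrix internally.
svc_callable = SVC(kernel=linear_kernel).fit(X_train, y_train)
pred_callable = svc_callable.predict(X_test)

# Option 2: pre-compute the Gram matrix and pass kernel='precomputed'.
svc_precomp = SVC(kernel='precomputed').fit(linear_kernel(X_train, X_train), y_train)
pred_precomp = svc_precomp.predict(linear_kernel(X_test, X_train))

print(np.array_equal(pred_callable, pred_precomp))  # expected: True
```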
At inference time, i.e. when calling `SVC.predict(X_test)` with kernel='precomputed', the expected input matrix has shape (n_test, n_train), where n_test and n_train are the numbers of test and train samples, respectively. I believe there is an inefficiency here, because this pre-computed matrix X_test includes more dot products than are needed to classify the test data points: the SVM model uses only the support vectors (a subset of the training vectors) to predict a label. To elaborate, a trained (fitted) SVM model classifies a test data point $x$ in the following way (omitting the bias term):

$$\hat{y} = \operatorname{sign}\left(\sum_{i=1}^{n_{SV}} \alpha_i y_i K(x_i, x)\right),$$

where $\alpha_i$ are the dual coefficients, $y_i$ are the labels of the support vectors $x_i$, $n_{SV}$ is the number of support vectors identified during training, and $K$ is our kernel. The equation can be rewritten in matrix form to accommodate classification of a set of test data points X_test, as mentioned above.
All the dual coefficients that correspond to non-support training vectors are zero, hence the above sum runs only over the support vectors. However, X_test in general has columns that don't correspond to support vectors.
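The claim that prediction depends only on the support-vector columns can be verified numerically. A sketch, assuming a linear kernel and a deterministic split; note that the fitted `dual_coef_` attribute already stores the products $\alpha_i y_i$ for the support vectors only:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test = X[:100], X[100:]
y_train = y[:100]

svc = SVC(kernel='precomputed').fit(X_train @ X_train.T, y_train)

# Full (n_test, n_train) matrix, as the current API requires:
K_full = X_test @ X_train.T
# Reduced (n_test, n_SV) matrix using only the support-vector columns:
K_sv = X_test @ X_train[svc.support_].T

# Reproduce the decision function from the reduced matrix alone:
manual = K_sv @ svc.dual_coef_.ravel() + svc.intercept_
print(np.allclose(manual, svc.decision_function(K_full)))  # expected: True
```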
Wouldn't it be more efficient to compute X_test with shape (n_test, sum(svm.SVC.n_support_)), i.e. one column per support vector? The only references to this issue that I was able to find are quite old (2012), by @amueller and @mblondel in #790 and #789 (comment). Nevertheless, no solution was discussed or implemented there.
Can such an improvement be implemented? If not, I would really appreciate any information regarding the design choices that restrict such an implementation :)
Describe your proposed solution
I wasn't able to pinpoint which parts of the source code would need to change to accommodate the above proposal. Instead, here is a simple example of how the svm.SVC class could make use of the enhancement if it were implemented.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.utils import shuffle
import numpy as np

dataset = load_breast_cancer()
X, y = shuffle(dataset.data, dataset.target)
X_train, X_test = X[:100, :], X[100:, :]
y_train, y_test = y[:100], y[100:]

svc = SVC(kernel='precomputed')
kernel_train = np.dot(X_train, X_train.T)  # simple example of a linear pre-computed kernel
svc.fit(kernel_train, y_train)

# What the current implementation requires:
kernel_test = np.dot(X_test, X_train.T)
# What would be more efficient:
kernel_test_ef = np.dot(X_test, X_train[svc.support_, :].T)

y_pred = svc.predict(kernel_test)  # raises an error if kernel_test_ef is used instead
```
An example of the shape difference between `kernel_test_ef` and `kernel_test`:

```
svc.n_support_: [7 5]
kernel_test_ef.shape: (469, 12)
kernel_test.shape: (469, 100)
```

This difference can translate into a significant runtime difference when scaling to large datasets.
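As a possible workaround under the current API, one could compute only the support-vector dot products and scatter them into a zero-padded (n_test, n_train) matrix, since the non-support columns are ignored by the model anyway. This is a sketch under the same assumptions as above (linear kernel, deterministic split); the memory cost of the full matrix remains, but the number of kernel evaluations drops to n_test * n_SV:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test = X[:100], X[100:]
y_train = y[:100]

svc = SVC(kernel='precomputed').fit(X_train @ X_train.T, y_train)

# Only n_test * n_SV dot products instead of n_test * n_train:
kernel_test_ef = X_test @ X_train[svc.support_].T

# Zero-pad to the (n_test, n_train) shape that predict() expects:
kernel_test_padded = np.zeros((X_test.shape[0], X_train.shape[0]))
kernel_test_padded[:, svc.support_] = kernel_test_ef

y_pred = svc.predict(kernel_test_padded)
print(np.array_equal(y_pred, svc.predict(X_test @ X_train.T)))  # expected: True
```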
Describe alternatives you've considered, if relevant
No response
Additional context
No response