1.3. Kernel ridge regression

The form of the model learned by KRR is identical to support vector regression (SVR). However, different loss functions are used: KRR uses squared error loss while support vector regression uses ε-insensitive loss, both combined with l2 regularization. In contrast to SVR, fitting KRR can be done in closed form and is typically faster for medium-sized datasets. On the other hand, the learned model is non-sparse and thus slower at prediction time than SVR, which learns a sparse model for ε > 0.
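The closed-form fit mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation: the helper names (rbf_kernel, krr_fit, krr_predict), the RBF kernel choice, the synthetic sine data, and the values of alpha and gamma are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def krr_fit(X, y, alpha=1.0, gamma=1.0):
    """Closed-form fit: solve (K + alpha * I) c = y for the dual coefficients c."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)


def krr_predict(X_train, dual_coef, X_new, gamma=1.0):
    """Predict f(x) = sum_i c_i * k(x, x_i); every training sample contributes."""
    return rbf_kernel(X_new, X_train, gamma) @ dual_coef


# Hypothetical toy data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

dual_coef = krr_fit(X, y, alpha=0.1, gamma=0.5)
y_pred = krr_predict(X, dual_coef, X, gamma=0.5)

print("non-zero dual coefficients:", np.count_nonzero(dual_coef), "of", len(X))
print("training MSE:", np.mean((y_pred - y) ** 2))
```

Fitting reduces to a single linear solve, which is why it is fast for moderate sample counts. The solution generally has one non-zero coefficient per training sample, so each prediction evaluates the kernel against the whole training set.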

The next figure compares the time for fitting and prediction of KRR and SVR for different sizes of the training set. Fitting KRR is faster than SVR for medium-sized training sets (less than 1000 samples); however, for larger training sets SVR scales better. With regard to prediction time, SVR is faster than KRR for all sizes of the training set because of the learned sparse solution. Note that the degree of sparsity, and thus the prediction time, depends on the SVR parameters ε and C.
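The dependence of sparsity on ε can be checked directly by counting support vectors. The sketch below assumes scikit-learn's KernelRidge and SVR estimators; the synthetic data and the particular values of alpha, gamma, C, and epsilon are illustrative choices, not recommended settings.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

# Hypothetical toy data: a noisy sine curve.
rng = np.random.default_rng(0)
n_samples = 200
X = np.sort(5 * rng.random((n_samples, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(n_samples)

# KRR keeps one dual coefficient per training sample (dense model).
krr = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
n_krr_terms = np.count_nonzero(krr.dual_coef_)

# SVR only keeps the samples on or outside the epsilon-tube (support vectors).
n_support = {}
for epsilon in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=1.0, gamma=0.5, epsilon=epsilon).fit(X, y)
    n_support[epsilon] = len(svr.support_)

print("KRR terms used at prediction time:", n_krr_terms)
for epsilon, count in n_support.items():
    print(f"SVR epsilon={epsilon}: {count} support vectors")
```

A wider ε-tube leaves more training points inside it, so fewer of them become support vectors and prediction needs fewer kernel evaluations.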

References: