Support Vector Regression (SVR) is a regression technique based on the concept of Support Vector Machines (SVM). It aims to find a function that approximates the training data by minimizing prediction error while tolerating deviations that fall within a specified margin.
Support Vector Regression
Support Vector Regression (SVR) is a technique derived from Support Vector Machines (SVM), which were originally developed for binary classification problems. SVMs were introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1960s and 1970s; the idea was later extended to regression, giving rise to SVR. In SVM the goal is to find a hyperplane that maximizes the margin between classes, while in SVR the goal is to find a function that approximates the training data within a certain margin of tolerance.
SVR involves minimizing a cost function that takes into account both prediction error and model complexity. The objective function can be represented as:

\[
\min_{\mathbf{w},\, b,\, \xi,\, \xi^*} \;\; \frac{1}{2}\,\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)
\]

Under constraints:

\[
\begin{aligned}
y_i - \mathbf{w}^\top \phi(x_i) - b &\le \varepsilon + \xi_i \\
\mathbf{w}^\top \phi(x_i) + b - y_i &\le \varepsilon + \xi_i^* \\
\xi_i,\; \xi_i^* &\ge 0
\end{aligned}
\]

where:
- \( \mathbf{w} \) is the weight vector.
- \( b \) is the bias term.
- \( \phi \) is the mapping function into the feature space.
- \( \varepsilon \) is the margin of tolerance.
- \( \xi_i \) and \( \xi_i^* \) are slack variables that measure the prediction error.
- \( C \) is a regularization parameter that controls the trade-off between the complexity of the model and the penalty for errors.
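The role of \( \varepsilon \) becomes clearer by looking at the ε-insensitive loss that SVR applies to each training point: deviations smaller than \( \varepsilon \) cost nothing, while larger deviations are penalized linearly. With the notation above, the loss can be written as:

\[
L_\varepsilon\bigl(y_i, f(x_i)\bigr) = \max\bigl(0,\; \lvert y_i - f(x_i) \rvert - \varepsilon \bigr),
\qquad f(x_i) = \mathbf{w}^\top \phi(x_i) + b
\]

For example, with \( \varepsilon = 0.1 \), a prediction that misses the target by 0.05 contributes no loss, while a miss of 0.3 contributes 0.2.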
Some additional considerations:
- Kernel Trick: SVR can benefit from the “kernel trick” to handle non-linear relationships, mapping data into a more complex feature space.
- Parameter Tuning: The choice of kernel and of parameters such as C, ε, and γ (for the RBF kernel) requires care to obtain good model performance (a minimal tuning sketch follows this list).
- Robustness: SVR is robust to outliers in the training data due to the presence of slack variables.
- Interpretation: Model interpretability can be a challenging aspect, especially when using complex kernels.
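As a minimal sketch of how such tuning can be set up with scikit-learn (the parameter grid and the toy data below are purely illustrative, not recommended settings), one can wrap SVR in a GridSearchCV:

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
import numpy as np
# Toy data: a noisy sine curve, used only to illustrate the search
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)
# Example grid over C, epsilon and gamma (values chosen arbitrarily)
param_grid = {
    'C': [1, 10, 100],
    'epsilon': [0.01, 0.1, 0.5],
    'gamma': ['scale', 0.1, 1.0],
}
# 5-fold cross-validated grid search minimizing the mean squared error
search = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=5, scoring='neg_mean_squared_error')
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV MSE:", -search.best_score_)

The best combination found this way can then be used to fit the final model on the full training set.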
In summary, SVR is a useful technique for regression that exploits the principles of Support Vector Machines, trying to find a function that approximates the training data within a specified margin, while allowing for a certain degree of error.
If you want to delve deeper into the topic and discover more about the world of Data Science with Python, I recommend you read my book.
Example of a regression problem with Support Vector Regression and scikit-learn
Support Vector Regression (SVR) is included in the scikit-learn library, one of the most widely used libraries for machine learning in Python. Scikit-learn provides an efficient and easy-to-use implementation of Support Vector Regression along with other machine learning techniques. You can use scikit-learn’s SVR class to create, train, and use SVM-based regression models. Here is an example of using SVR in Python with scikit-learn:
from sklearn.svm import SVR
import numpy as np
import matplotlib.pyplot as plt
# Let's generate sample data
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel()
# Let's add noise to the data
y[::5] += 3 * (0.5 - np.random.rand(16))
# We train the SVR model
svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr_rbf.fit(X, y)
# We predict on the training data
y_pred = svr_rbf.predict(X)
# Let's visualize the results
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_pred, color='navy', lw=2, label='SVR (RBF kernel)')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()
In this example, an SVR with a radial basis function (RBF) kernel is used to approximate a sinusoidal function with noisy data. First, the sample X and y data representing a sine curve with added noise are generated. We then move on to creating and training the SVR model.
svr_rbf = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr_rbf.fit(X, y)
An SVR model with an RBF kernel is created and trained on the training data. At this point, predictions are made and the results are visualized with the Matplotlib library. The goal is to show how well the model approximates the input data, given the noise present in the training set.
Running the above code gives you the following graph.
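Note that the predictions above are computed on the training points themselves. For a smoother view of the learned function, a small variation (a sketch that assumes the svr_rbf model fitted in the code above) is to evaluate it on a dense grid of input values:

# Dense grid of input values covering the same range as X
X_plot = np.linspace(0, 5, 200).reshape(-1, 1)
y_plot = svr_rbf.predict(X_plot)
# Plot the noisy samples together with the smooth SVR curve
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X_plot, y_plot, color='navy', lw=2, label='SVR (RBF kernel)')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression on a dense grid')
plt.legend()
plt.show()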
Regression Problem: Predicting house prices with Support Vector Regression and scikit-learn
Let’s now move on to a regression problem on a real dataset, i.e. with values taken from a real-world context. For this purpose, scikit-learn provides ready-made datasets specifically intended for testing and studying models. We can use Support Vector Regression (SVR) for a regression problem using the fetch_california_housing dataset provided by scikit-learn. The dataset contains data on the median price of homes in different districts of California, along with various characteristics of the homes themselves. This dataset is commonly used for regression purposes.
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
housing = fetch_california_housing()
data = pd.DataFrame(data=np.c_[housing['data'], housing['target']], columns=housing['feature_names'] + ['target'])
data.head(10)
Let’s see together what type of data this dataset contains:
The fetch_california_housing dataset is a widely used dataset in the field of machine learning and statistics for regression purposes. It contains data on the median price of homes in various districts of California, along with several characteristics that describe the homes themselves. Here is a description of the main features of the dataset:
- MedInc: Median block income.
- HouseAge: Median age of houses in the block.
- AveRooms: Average rooms per home in the block.
- AveBedrms: Average bedrooms per home in the block.
- Population: Population of the block.
- AveOccup: Average occupancy per dwelling.
- Latitude: Latitude of the block center.
- Longitude: Longitude of the center of the block.
The regression target is MedHouseVal, the median house value for the blocks in the area, expressed in hundreds of thousands of dollars.
In essence, each row of the dataset represents a block (neighborhood or area) of California, and each column represents a characteristic of that block.
This dataset is often used for regression purposes to predict the average house price based on block characteristics, making it a good starting point for machine learning experiments in the context of regression.
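Before modeling, a quick look at the data can be useful. A minimal sketch (reusing the data DataFrame built above) is:

# Summary statistics of features and target
print(data.describe())
# Distribution of the target (median house value, in hundreds of thousands of dollars)
data['target'].hist(bins=50)
plt.xlabel('Median house value')
plt.ylabel('Number of blocks')
plt.show()

A plot like this can also reveal the accumulation of values at the top of the target scale, which we will come back to later.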
Let’s now apply SVR for the regression and thus obtain a model capable of predicting the median house price.
# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)
# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Create the SVR model
svr = SVR(kernel='rbf', C=10, gamma='auto')
# Train the model
svr.fit(X_train_scaled, y_train)
# Make predictions
y_pred = svr.predict(X_test_scaled)
# Calculate the mean square error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
Analyzing the code: once the dataset has been divided into a training set and a test set, the features are standardized using StandardScaler. Standardization is a common practice in machine learning: it rescales each feature to zero mean and unit variance, making features with very different scales comparable, which is particularly important for SVR since it is sensitive to feature scale. Then an SVR model is created using the radial basis function (RBF) kernel. C and gamma are model parameters that can be adjusted to optimize model performance; in this case, C is set to 10 and gamma to 'auto'.
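As a quick sanity check on what StandardScaler produces (a sketch reusing X_train_scaled from the code above), each scaled feature should have approximately zero mean and unit standard deviation:

# Each column of the scaled training set should have mean ~0 and std ~1
print("Means:", X_train_scaled.mean(axis=0).round(3))
print("Std devs:", X_train_scaled.std(axis=0).round(3))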
Running the SVR code above, you get the following value:
Mean Squared Error: 0.3236927921625923
This is a decent result. If we want a graphical view of the model’s predictive capability on the dataset, we can add the following code.
plt.scatter(y_test, y_pred)
plt.plot([0, max(y_test)], [0, max(y_test)], color='red', linestyle='--')  # Add the x = y line
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Actual vs Predicted Prices")
plt.show()
As we can see from the graph, there is a whole series of elements in the dataset with a target value greater than 5 (in reality they all have a value of 5.0001). These look like “off-scale” evaluations, so it is better to remove them from the starting dataset. Let’s modify the previous code to take this into account.
# Remove all rows where the 'target' column value is greater than 5
data_cleaned = data.loc[data['target'] <= 5]
# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(data_cleaned.drop('target', axis=1), data_cleaned['target'], test_size=0.2, random_state=42)
Executing the code this time, with the dataset cleaned of the out-of-scale values, we obtain the following result:
Mean Squared Error: 0.2733577250854457
As we can see, the mean squared error has decreased, so we have moved in the right direction.
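The MSE can be hard to interpret on its own; as a small addition (a sketch reusing y_test and y_pred from the code above), one might also report the mean absolute error and the R² score:

from sklearn.metrics import mean_absolute_error, r2_score
# Complementary metrics on the test set
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))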
When to use SVR in regression problems?
The choice of model depends on several factors, including the nature of the data, the dimensionality of the dataset, the distribution of the data, the presence of linear or non-linear relationships between the features and the target, and the desired performance. Here are some situations where using Support Vector Regression (SVR) might be preferable to other models provided by scikit-learn:
- Nonlinear Data: SVR can effectively handle nonlinear data through the use of nonlinear kernels such as the RBF kernel. If the data has a complex nonlinear structure, SVR may be an appropriate choice.
- Robustness to noisy data: SVMs are known for their robustness to noisy data. If your dataset contains many outliers or noisy data, SVR may be a better choice than models sensitive to noisy data such as linear regression.
- High dimensionality: SVR can effectively handle datasets with a large number of features, especially when only a fraction of them is truly relevant. Furthermore, thanks to its built-in regularization, SVR can help avoid overfitting even in the presence of many features.
- Little a priori knowledge about the data distribution: If you do not have clear a priori knowledge about the data distribution or the relationship between the features and the target, SVR can be a reasonable choice as it does not require specific assumptions about the data distribution.
- Tuning Flexibility: SVR offers several parameters that can be tuned to optimize model performance, such as kernel type, C regularization parameter, and kernel-specific parameters. This flexibility can be useful for adapting the model to the specific needs of the problem.
However, it is important to note that SVR may not always be the best choice. For example, if the regression problem is linear and there is no evidence of non-linear relationships between the features and the target, simpler models such as linear regression may be more appropriate and computationally less expensive. Therefore, it is always a good idea to carefully examine your data and experiment with different models to determine which one works best for your specific regression problem.
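As a minimal sketch of this kind of comparison (reusing the scaled California housing training data from the earlier code; the models and parameters are illustrative only), one could compare SVR with a plain linear regression via cross-validation:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Compare an RBF-kernel SVR and a linear regression on the same scaled training data
for name, model in [("SVR (RBF)", SVR(kernel='rbf', C=10, gamma='auto')),
                    ("Linear regression", LinearRegression())]:
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5,
                             scoring='neg_mean_squared_error')
    print(f"{name}: cross-validated MSE = {-scores.mean():.3f}")

If the linear model performs comparably, it is usually the preferable choice, being faster to train and easier to interpret.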