Support Vector Machines, or SVMs, are supervised learning algorithms used for classification and regression. In scikit-learn, the main classes are SVC and LinearSVC for classification, and SVR and LinearSVR for regression. The official scikit-learn guide also notes that SVMs are effective in high-dimensional spaces, can still work well when the number of features is larger than the number of samples, and use only a subset of the training points in the decision function, called support vectors.
Imagine you have two classes of points, for example two clusters on a scatter plot.
An SVM tries to find a boundary that separates them. But it does not choose just any boundary. It chooses the one with the largest margin, meaning the widest possible gap between the classes.
That is the central idea: maximize the margin.
In practice, this often gives good generalization on unseen data.
SVM is useful when:

- the data is high-dimensional,
- the number of features is larger than the number of samples,
- you want a decision function defined by only a subset of the training points.
Scikit-learn also warns that SVC is based on libsvm and that fit time scales at least quadratically with the number of samples, so it may become impractical beyond tens of thousands of samples. For larger datasets, the docs recommend considering LinearSVC or SGDClassifier, possibly with kernel approximation techniques.
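One documented escape hatch for large datasets is kernel approximation. The sketch below (synthetic data, illustrative sizes) pairs scikit-learn's Nystroem transformer with SGDClassifier, approximating an RBF kernel with an explicit feature map so a fast linear model can be used:

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a dataset too large for SVC
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Approximate the RBF kernel with an explicit feature map,
# then train a linear model on the mapped features
approx_model = Pipeline([
    ("scaler", StandardScaler()),
    ("feature_map", Nystroem(kernel="rbf", n_components=100, random_state=42)),
    ("sgd", SGDClassifier(max_iter=1000, random_state=42)),
])
approx_model.fit(X, y)
print("Training accuracy:", approx_model.score(X, y))
```

The `n_components` value here is arbitrary; more components give a better kernel approximation at higher cost.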
Suppose your data has only 2 features, for example hours studied and hours slept, and each student either passes or fails.
A linear SVM tries to draw a straight line that separates “pass” from “fail”.
If several lines can separate the classes, the SVM picks the one with the largest margin.
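The points that end up closest to that boundary are the support vectors, and scikit-learn exposes them after fitting. A minimal sketch on a tiny hand-made dataset:

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, linearly separable clusters
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the points closest to the boundary become support vectors
print("Support vectors:\n", clf.support_vectors_)
print("Number per class:", clf.n_support_)
```

Points deep inside each cluster do not affect the decision function at all; only the support vectors do.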
There are two margin formulations:

- Hard margin: used when the data is perfectly separable; the model insists on zero classification error.
- Soft margin: used when the data is noisy; the model allows some mistakes but still tries to maximize the margin.

In practice, the soft margin is controlled by the parameter C.
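To see the effect of C directly, this sketch (synthetic blobs, illustrative values) counts support vectors at two extremes:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs so the soft margin matters
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=42)

n_sv = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = int(clf.n_support_.sum())
    print(f"C={C}: {n_sv[C]} support vectors")
```

A small C gives a wide margin that many points fall inside, so more of them become support vectors; a large C narrows the margin.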
- Small C: wider margin, more tolerance for mistakes, stronger regularization.
- Large C: fewer mistakes on training data, narrower margin, weaker regularization.

Real data is often not linearly separable.
Example: points of one class surrounded by a ring of points of the other class. A straight line cannot separate them.
SVM solves this using kernels. A kernel lets the algorithm act as if it mapped data into a higher-dimensional space without explicitly computing that mapping. Scikit-learn’s SVC supports kernels such as linear, polynomial, RBF, and sigmoid.
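The kernel trick can be checked numerically. For a degree-2 polynomial kernel, K(x, z) = (x·z)² equals the dot product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²) — a small sketch with made-up vectors:

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for 2-D input
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

kernel_value = (x @ z) ** 2        # kernel trick: stay in input space
explicit_value = phi(x) @ phi(z)   # same number via the explicit mapping

print(kernel_value, explicit_value)  # both equal 121.0
```

The kernel computes the same quantity without ever building the higher-dimensional vectors, which is what makes kernels efficient.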
Common kernels:

- linear
- poly (polynomial)
- rbf (radial basis function)
- sigmoid
The RBF kernel is often the default and a very strong general-purpose choice.
Important parameter: gamma.

Interpretation:

- Low gamma: smoother, broader influence of each point.
- High gamma: more localized influence, more complex boundary.

So with RBF SVM, the two most important tuning parameters are usually C and gamma.

SVM is not only for classification.
For regression, scikit-learn provides SVR and LinearSVR. The regression version tries to fit a function while allowing an error tolerance band, controlled by epsilon, around the prediction. SVR uses libsvm and nonlinear kernels if desired, while LinearSVR is designed to scale better for larger datasets.
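The epsilon tube has a visible effect on the fitted model: points whose residual fits inside the tube incur no penalty and do not become support vectors. A quick sketch on noisy synthetic data (values illustrative):

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine data
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# A wider epsilon tube tolerates more error without penalty,
# so fewer points end up as support vectors
n_sv = {}
for eps in (0.01, 0.5):
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    n_sv[eps] = len(svr.support_)
    print(f"epsilon={eps}: {n_sv[eps]} support vectors")
```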
Main regression parameters:
- C: regularization
- epsilon: width of the no-penalty tube
- kernel: for nonlinear regression
- gamma: for RBF/poly/sigmoid kernels

SVM is very sensitive to feature scales.
Example: one feature measures age in years while another measures salary in dollars.
Without scaling, salary may dominate the geometry of the model.
That is why standardization is strongly recommended. StandardScaler standardizes features by removing the mean and scaling to unit variance. In scikit-learn, a Pipeline is the recommended way to chain preprocessing and the final estimator so training and prediction use the exact same transformations.
Install the required packages:

pip install numpy pandas matplotlib scikit-learn

We will use the Breast Cancer dataset from scikit-learn.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Pipeline: scaling + linear SVM
model = Pipeline([
("scaler", StandardScaler()),
("svm", SVC(kernel="linear", C=1.0))
])
# Train
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Because Pipeline applies preprocessing and prediction in sequence, it prevents mistakes such as fitting the scaler outside the cross-validation loop or forgetting to scale new data before predicting. That is exactly what the scikit-learn pipeline API is for.
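Because the scaler lives inside the pipeline, it is re-fit on each training fold during cross-validation, so no test-fold information leaks into preprocessing. A short sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X, y = data.data, data.target

model = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC(kernel="linear", C=1.0)),
])

# The scaler is re-fit on each training fold, so preprocessing never
# sees the held-out fold
scores = cross_val_score(model, X, y, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```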
Now let us switch to a nonlinear model.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
rbf_model = Pipeline([
("scaler", StandardScaler()),
("svm", SVC(kernel="rbf", C=1.0, gamma="scale"))
])
rbf_model.fit(X_train, y_train)
y_pred_rbf = rbf_model.predict(X_test)
print("RBF Accuracy:", accuracy_score(y_test, y_pred_rbf))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred_rbf))
print("\nClassification Report:\n", classification_report(y_test, y_pred_rbf))

Why gamma="scale"? That is the modern scikit-learn default for SVC, and it is usually a sensible starting point.
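According to the SVC documentation, gamma="scale" resolves to 1 / (n_features * X.var()). The sketch below verifies this on synthetic data by comparing predictions against a model with the manually computed gamma:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Per the SVC docs, gamma="scale" resolves to 1 / (n_features * X.var())
manual_gamma = 1.0 / (X.shape[1] * X.var())

clf_scale = SVC(kernel="rbf", gamma="scale").fit(X, y)
clf_manual = SVC(kernel="rbf", gamma=manual_gamma).fit(X, y)

# Both models use the same effective gamma, so predictions agree
print("Predictions identical:",
      np.array_equal(clf_scale.predict(X), clf_manual.predict(X)))
```

Because "scale" adapts to the variance of the data, it behaves reasonably even when features have not been standardized, though scaling is still recommended.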
Plotting the decision boundary on 2-D data is a good way to understand the model.
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# Create nonlinear dataset
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
model = Pipeline([
("scaler", StandardScaler()),
("svm", SVC(kernel="rbf", C=1.0, gamma="scale"))
])
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))

import numpy as np
import matplotlib.pyplot as plt
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 400),
        np.linspace(y_min, y_max, 400)
    )
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.title("SVM Decision Boundary")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()
plot_decision_boundary(model, X, y)

This plot helps you see how the RBF kernel creates a curved boundary.

Key SVC parameters:

- kernel: linear, rbf, poly, sigmoid
- C: regularization strength
- gamma: kernel coefficient for nonlinear kernels
- degree: for polynomial kernel

Key SVR parameters:

- C
- epsilon
- gamma
- kernel

Scikit-learn’s GridSearchCV performs an exhaustive search over parameter combinations with cross-validation. The documentation recommends it for hyperparameter tuning when you want a systematic search over a specified grid.
from sklearn.model_selection import GridSearchCV
pipeline = Pipeline([
("scaler", StandardScaler()),
("svm", SVC())
])
param_grid = {
"svm__kernel": ["linear", "rbf"],
"svm__C": [0.1, 1, 10, 100],
"svm__gamma": ["scale", 0.01, 0.1, 1]
}
grid = GridSearchCV(
estimator=pipeline,
param_grid=param_grid,
cv=5,
scoring="accuracy",
n_jobs=-1
)
grid.fit(X_train, y_train)
print("Best Parameters:", grid.best_params_)
print("Best CV Score:", grid.best_score_)
best_model = grid.best_estimator_
test_score = best_model.score(X_test, y_test)
print("Test Accuracy:", test_score)

If kernel="linear", gamma is irrelevant.
Still, it is common in simple tutorials to keep one shared grid. In a production project, you may separate the parameter grid by kernel type.
Example:
param_grid = [
{
"svm__kernel": ["linear"],
"svm__C": [0.1, 1, 10, 100]
},
{
"svm__kernel": ["rbf"],
"svm__C": [0.1, 1, 10, 100],
"svm__gamma": ["scale", 0.01, 0.1, 1]
}
]
This is cleaner.
If the search space is large, RandomizedSearchCV can be more efficient because it samples a fixed number of parameter settings rather than testing all combinations. Scikit-learn documents it as the sampling-based alternative to GridSearchCV.
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform
pipeline = Pipeline([
("scaler", StandardScaler()),
("svm", SVC(kernel="rbf"))
])
param_dist = {
"svm__C": loguniform(1e-2, 1e2),
"svm__gamma": loguniform(1e-3, 1e1)
}
random_search = RandomizedSearchCV(
pipeline,
param_distributions=param_dist,
n_iter=20,
cv=5,
scoring="accuracy",
n_jobs=-1,
random_state=42
)
random_search.fit(X_train, y_train)
print("Best Parameters:", random_search.best_params_)
print("Best CV Score:", random_search.best_score_)
print("Test Accuracy:", random_search.best_estimator_.score(X_test, y_test))

SVC vs LinearSVC

Scikit-learn documents that:
- SVC(kernel="linear") uses libsvm
- LinearSVC uses liblinear
- LinearSVC generally scales better to large datasets

Use:
- SVC(kernel="linear") for small/medium datasets when you want the classic SVM formulation
- LinearSVC for larger linear problems
- SVC(kernel="rbf") when the boundary is nonlinear

Example with LinearSVC:

from sklearn.svm import LinearSVC
linear_svc_model = Pipeline([
("scaler", StandardScaler()),
("svm", LinearSVC(C=1.0, max_iter=10000))
])
linear_svc_model.fit(X_train, y_train)
y_pred = linear_svc_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

SVR

Let us create a synthetic regression problem.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Synthetic data
rng = np.random.RandomState(42)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel()
# Add noise
y[::5] += 0.5 - rng.rand(40)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
model = Pipeline([
("scaler", StandardScaler()),
("svr", SVR(kernel="rbf", C=10, epsilon=0.1, gamma="scale"))
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))
# Plot predictions
X_plot = np.linspace(X.min(), X.max(), 500).reshape(-1, 1)
y_plot = model.predict(X_plot)
plt.figure(figsize=(8, 6))
plt.scatter(X, y, label="Data")
plt.plot(X_plot, y_plot, linewidth=2, label="SVR prediction")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Support Vector Regression")
plt.legend()
plt.show()

LinearSVR

If the relationship is approximately linear and the dataset is large, LinearSVR can be faster. Scikit-learn distinguishes it from SVR in the same way LinearSVC differs from SVC: better scalability for linear settings.
from sklearn.svm import LinearSVR
linear_svr_model = Pipeline([
("scaler", StandardScaler()),
("svr", LinearSVR(C=1.0, epsilon=0.1, max_iter=10000))
])
linear_svr_model.fit(X_train, y_train)
y_pred = linear_svr_model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

This is the workflow you should use in real projects.
import pandas as pd
df = pd.read_csv("your_data.csv")

X = df.drop("target", axis=1)
y = df["target"]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
pipeline = Pipeline([
("scaler", StandardScaler()),
("svm", SVC())
])

from sklearn.model_selection import GridSearchCV
param_grid = [
{
"svm__kernel": ["linear"],
"svm__C": [0.1, 1, 10]
},
{
"svm__kernel": ["rbf"],
"svm__C": [0.1, 1, 10],
"svm__gamma": ["scale", 0.01, 0.1, 1]
}
]
grid = GridSearchCV(
pipeline,
param_grid=param_grid,
cv=5,
scoring="accuracy",
n_jobs=-1
)
grid.fit(X_train, y_train)

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
print("Best Params:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

new_samples = X_test.iloc[:5]
predictions = best_model.predict(new_samples)
print(predictions)

SVM, especially linear SVM, is a classic strong method for text classification because text data often has very high-dimensional sparse features. Scikit-learn’s text tutorial shows the standard pattern of turning text into feature vectors and then training a classifier in a pipeline.
Example:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
texts = [
"I love this product",
"This is amazing",
"Very bad quality",
"I hate it",
"Excellent and wonderful",
"Terrible experience"
]
labels = [1, 1, 0, 0, 1, 0] # 1 = positive, 0 = negative
X_train, X_test, y_train, y_test = train_test_split(
texts, labels, test_size=0.3, random_state=42
)
text_model = Pipeline([
("tfidf", TfidfVectorizer()),
("clf", LinearSVC())
])
text_model.fit(X_train, y_train)
y_pred = text_model.predict(X_test)
print(classification_report(y_test, y_pred))

This is a simple sentiment classification example.
For classification, common metrics are:

- accuracy
- precision
- recall
- F1 score
- confusion matrix
Use accuracy when classes are balanced.
If classes are imbalanced, pay more attention to precision, recall, and F1.
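For imbalanced classes, SVC also accepts class_weight="balanced", which reweights C inversely to class frequencies. A sketch on a synthetic imbalanced problem (sizes and weights are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced problem: roughly 90% negatives, 10% positives
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

recalls = {}
for cw in (None, "balanced"):
    model = Pipeline([
        ("scaler", StandardScaler()),
        ("svm", SVC(kernel="rbf", class_weight=cw)),
    ])
    model.fit(X_train, y_train)
    recalls[cw] = recall_score(y_test, model.predict(X_test))
    print(f"class_weight={cw}: minority recall = {recalls[cw]:.3f}")
```

Balancing the class weights typically trades some precision for better recall on the minority class.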
Example:
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
For regression, common metrics are:

- MAE (mean absolute error)
- MSE (mean squared error)
- RMSE (root mean squared error)
- R² score
Example:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R²:", r2)

Forgetting to scale features is one of the biggest mistakes with SVM.
Bad:
model = SVC(kernel="rbf")
model.fit(X_train, y_train)
Better:
model = Pipeline([
("scaler", StandardScaler()),
("svm", SVC(kernel="rbf"))
])
Because SVM geometry depends strongly on feature magnitudes, scaling is usually essential. StandardScaler and Pipeline are the documented scikit-learn tools for this workflow.
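The difference is easy to measure. This sketch compares an unscaled RBF SVM with a scaled one on the Breast Cancer dataset, whose features span very different magnitudes:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=42
)

# Same model, with and without standardization
unscaled = SVC(kernel="rbf").fit(X_train, y_train)
scaled = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC(kernel="rbf")),
]).fit(X_train, y_train)

print("Without scaling:", unscaled.score(X_test, y_test))
print("With scaling:   ", scaled.score(X_test, y_test))
```

On this dataset the scaled pipeline should do at least as well, because without scaling the largest-magnitude features dominate the kernel geometry.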
Do not choose C and gamma by repeatedly checking the test set.
Correct process: split the data first, tune hyperparameters with cross-validation on the training set only, and evaluate on the test set exactly once at the end.
That is exactly the purpose of GridSearchCV and related model selection tools.
Using SVC on a huge dataset

If your dataset is very large:
- SVC can become slow
- prefer LinearSVC for linear tasks

Scikit-learn explicitly warns that SVC fit time scales at least quadratically with sample count.
Overfitting with a large C and a large gamma

Typical pattern:
a very large C combined with a very large gamma. This may create a model that memorizes training data and generalizes poorly.
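This failure mode can be reproduced on a small synthetic dataset (the parameter values below are deliberately extreme):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-class problem
X, y = make_moons(n_samples=300, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Extreme settings memorize the training set; defaults generalize better
overfit = SVC(kernel="rbf", C=1000, gamma=100).fit(X_train, y_train)
regular = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

for name, m in [("C=1000, gamma=100", overfit), ("C=1, gamma=scale", regular)]:
    print(f"{name}: train={m.score(X_train, y_train):.3f}, "
          f"test={m.score(X_test, y_test):.3f}")
```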
Symptoms: near-perfect accuracy on the training set but much lower accuracy on the test set.
Fix: reduce C and/or gamma.

When to use which estimator:

- LinearSVC when: the problem is linear classification and the dataset is large
- SVC(kernel="rbf") when: the boundary is nonlinear and the dataset is small or medium
- SVR when: you need kernel-based regression
- LinearSVR when: the relationship is approximately linear and the dataset is large

These choices align with the distinctions scikit-learn draws between libsvm-based SVC/SVR and liblinear-based LinearSVC/LinearSVR.
For classification:
Pipeline([
("scaler", StandardScaler()),
("svm", SVC(kernel="rbf", C=1.0, gamma="scale"))
])

For large linear classification:
Pipeline([
("scaler", StandardScaler()),
("svm", LinearSVC(C=1.0, max_iter=10000))
])

For regression:
Pipeline([
("scaler", StandardScaler()),
("svr", SVR(kernel="rbf", C=10, epsilon=0.1, gamma="scale"))
])

Here is a compact but realistic project.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Load
data = load_breast_cancer()
X, y = data.data, data.target
# Split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
# Pipeline
pipeline = Pipeline([
("scaler", StandardScaler()),
("svm", SVC())
])
# Search space
param_grid = [
{
"svm__kernel": ["linear"],
"svm__C": [0.1, 1, 10]
},
{
"svm__kernel": ["rbf"],
"svm__C": [0.1, 1, 10],
"svm__gamma": ["scale", 0.01, 0.1, 1]
}
]
# Grid search
grid = GridSearchCV(
pipeline,
param_grid=param_grid,
cv=5,
scoring="accuracy",
n_jobs=-1
)
grid.fit(X_train, y_train)
# Evaluate
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
SVM is one of the most important classical machine learning algorithms.
The core ideas are: maximize the margin, allow controlled mistakes through C, and handle nonlinear boundaries with kernels.
In Python with scikit-learn, the most important tools are:
- SVC
- LinearSVC
- SVR
- LinearSVR
- StandardScaler
- Pipeline
- GridSearchCV

And the practical rules are:
- always scale features
- tune C and gamma
- use LinearSVC for large linear problems
- use SVC with RBF for nonlinear medium-sized problems

These recommendations match the current scikit-learn documentation on SVMs, preprocessing, pipelines, and hyperparameter tuning.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report
# X, y = your data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
pipeline = Pipeline([
("scaler", StandardScaler()),
("svm", SVC())
])
param_grid = [
{
"svm__kernel": ["linear"],
"svm__C": [0.1, 1, 10]
},
{
"svm__kernel": ["rbf"],
"svm__C": [0.1, 1, 10],
"svm__gamma": ["scale", 0.01, 0.1]
}
]
grid = GridSearchCV(
pipeline,
param_grid=param_grid,
cv=5,
scoring="accuracy",
n_jobs=-1
)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)
print("Test score:", grid.best_estimator_.score(X_test, y_test))
y_pred = grid.best_estimator_.predict(X_test)
print(classification_report(y_test, y_pred))

Exercises:

Train a linear SVM on the Iris dataset and report accuracy.
Train an RBF SVM on make_moons and visualize the boundary.
Tune C and gamma using GridSearchCV.
Compare SVC(kernel="linear") with LinearSVC.
Train an SVR model on a synthetic nonlinear regression dataset.
Further reading: the scikit-learn documentation for SVC, LinearSVC, and SVR.