K-Nearest Neighbors, usually called KNN, is one of the simplest and most intuitive supervised machine learning algorithms. In MATLAB, the main function for KNN classification is fitcknn, which trains a nearest-neighbor classifier from predictor data and class labels. MATLAB’s Classification Learner app also supports nearest-neighbor classifiers and uses fitcknn behind the scenes.
This tutorial explains what KNN is, how it works, when to use it, and how to implement it in MATLAB with practical examples.
KNN is a supervised learning algorithm used mainly for classification. It predicts the class of a new observation by looking at the k nearest training samples and assigning the class based on those neighbors. MathWorks describes nearest neighbors as a kNN classification method where, after training, you can predict labels or estimate posterior probabilities using the trained model and predict.
In simple terms:
- pick a value of k,
- find the k training points closest to the new observation,
- assign the class held by the majority of those k points.

KNN is popular because it is:
- simple to understand and implement,
- non-parametric (it makes no assumptions about the data distribution),
- an effective baseline on small to medium datasets.
It is often used when you want a straightforward classifier without building an explicit parametric model. MATLAB’s nearest-neighbor classifier stores training data and predicts from those stored examples rather than learning coefficients like linear or logistic regression.
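The steps above can be sketched in a few lines of plain MATLAB. This is a minimal illustration only, with made-up toy data; in practice you would use fitcknn as shown later in this tutorial:

```matlab
% Toy training set: 4 points, two classes
Xtrain = [1 1; 1 2; 6 6; 7 7];
Ytrain = [1; 1; 2; 2];
xnew   = [2 1];   % new point to classify
k      = 3;

% Euclidean distance from xnew to every training point
d = sqrt(sum((Xtrain - xnew).^2, 2));

% Indices of the k nearest neighbors
[~, idx] = sort(d);
nearest  = idx(1:k);

% Majority vote among the k neighbors
predicted = mode(Ytrain(nearest));
disp(predicted);   % class 1: two of the three nearest points belong to class 1
```

The whole algorithm is essentially a distance computation, a sort, and a vote, which is why it is so easy to reason about.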
k is the number of neighbors used to classify a new point.
A small k can make the model sensitive to noise, while a large k can make the model smoother but less flexible. MathWorks’ classifier options describe examples such as fine KNN using 1 neighbor and coarse KNN using 100 neighbors.
KNN depends on a distance measure to decide which points are nearest. MATLAB’s KNN classifier lets you change the distance metric.
Common choices include Euclidean (the default), city block (Manhattan), Chebyshev, Minkowski, and cosine distance.
Not all neighbors need to contribute equally. MATLAB supports distance weights such as equal, inverse, and squared inverse.
This means closer neighbors can have more influence than farther ones.
If features have very different scales, KNN can behave poorly because distance becomes dominated by larger-scale variables. MathWorks recommends standardizing when predictors have widely different scales.
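To see why scaling matters, consider two predictors on very different scales. This is a small made-up illustration using pdist and zscore from Statistics and Machine Learning Toolbox:

```matlab
% One predictor in single digits, one in the thousands
X = [1 1000;
     2 1100;
     9 1050];

% Raw pairwise Euclidean distances: column 2 dominates completely
dRaw = pdist(X);
disp(dRaw);

% After z-scoring, both predictors contribute comparably
Xs = zscore(X);
dStd = pdist(Xs);
disp(dStd);
```

In fitcknn, the same effect is achieved by passing 'Standardize', 1, as the later examples do.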
For KNN in MATLAB, the most important tools are:
- fitcknn → train a KNN classifier
- predict → predict labels for new data
- loss → compute classification loss
- Classification Learner → app-based workflow

MathWorks documents fitcknn(X,Y) for training and predict(mdl,X) for predicted class labels. It also documents loss for evaluating a trained ClassificationKNN model.
Let us begin with a small binary dataset.
clc;
clear;
close all;
% Example data
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6];
Y = [1; 1; 1; 1; 2; 2; 2; 2];
% Train KNN model
Mdl = fitcknn(X, Y, 'NumNeighbors', 3);
% Predict on training data
[label, score] = predict(Mdl, X);
disp('Predicted labels:');
disp(label);
disp('Scores:');
disp(score);

Here:
- X contains the predictors,
- Y contains the class labels,
- fitcknn trains the classifier,
- 'NumNeighbors',3 sets k = 3,
- predict returns predicted labels and classification scores.

MathWorks documents both fitcknn and [label,score] = predict(mdl,X) for nearest-neighbor classification.
gscatter(X(:,1), X(:,2), Y, 'rb', 'ox');
xlabel('Feature 1');
ylabel('Feature 2');
title('Training Data');
grid on;

This plot helps you see whether the classes are visually separable.
KNN is often called a lazy learner because it does not build an explicit compact model during training the way regression or SVM often does. Instead, it stores training data and uses it directly during prediction. MATLAB’s ClassificationKNN page notes that the classifier stores training data.
That means training is essentially instant, but every prediction must compare the new point against the stored training data, so prediction cost and memory use grow with the size of the training set.
In real machine learning tasks, we should test the model on unseen data.
clc;
clear;
close all;
% Dataset
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6;
1.5 2.5;
6.5 7.5];
Y = [1;1;1;1;2;2;2;2;1;2];
% Split data
rng(1);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
% Train KNN
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', 3, ...
'Standardize', 1);
% Predict on test set
YPred = predict(Mdl, XTest);
% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);

This example adds two important ideas: splitting the data into training and test sets with cvpartition, and standardizing the predictors with 'Standardize', 1.
MathWorks documents standardization as a configurable option for nearest-neighbor classifiers in Classification Learner.
A confusion matrix helps measure classification performance.
cm = confusionmat(YTest, YPred);
disp('Confusion Matrix:');
disp(cm);
confusionchart(YTest, YPred);
title('Confusion Matrix');
This shows how many observations were correctly or incorrectly classified.
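The confusion matrix can also be broken down into per-class rates. The sketch below uses a small example matrix (the numbers are illustrative, not from the script above); rows are true classes and columns are predicted classes, as confusionmat returns them:

```matlab
% Example confusion matrix (rows = true class, columns = predicted class)
cm = [2 0;
      1 1];

recall    = diag(cm) ./ sum(cm, 2);    % per-class recall: correct / true counts
precision = diag(cm) ./ sum(cm, 1)';   % per-class precision: correct / predicted counts

disp(table(recall, precision));
```

Per-class rates are often more informative than overall accuracy when the classes are imbalanced.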
Part III — Choosing k
10. Testing different values of k
One of the most important choices in KNN is the number of neighbors.
clc;
clear;
close all;
% Dataset
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6;
1.5 2.5;
6.5 7.5];
Y = [1;1;1;1;2;2;2;2;1;2];
kValues = [1 3 5];
accuracies = zeros(size(kValues));
rng(1);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
for i = 1:length(kValues)
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', kValues(i), ...
'Standardize', 1);
YPred = predict(Mdl, XTest);
accuracies(i) = mean(YPred == YTest) * 100;
end
disp(table(kValues', accuracies', ...
'VariableNames', {'k','Accuracy'}));

This compares different choices of k.
k = 1 can fit the training data very closely, while a larger k makes predictions smoother. MathWorks’ nearest-neighbor options page also emphasizes that changing the number of neighbors changes the model from fine to coarse.
plot(kValues, accuracies, '-o', 'LineWidth', 1.5);
xlabel('Number of Neighbors (k)');
ylabel('Accuracy (%)');
title('Accuracy vs k');
grid on;
This helps choose a reasonable value of k.
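With so little data, a single hold-out split can be noisy. A more robust way to compare values of k is cross-validation; the sketch below reuses the same toy dataset and relies on crossval and kfoldLoss from Statistics and Machine Learning Toolbox:

```matlab
X = [1 2; 2 3; 2 1; 3 2; 6 7; 7 8; 8 7; 7 6; 1.5 2.5; 6.5 7.5];
Y = [1;1;1;1;2;2;2;2;1;2];

rng(1);
kValues = [1 3 5];
cvLoss  = zeros(size(kValues));

for i = 1:numel(kValues)
    Mdl   = fitcknn(X, Y, 'NumNeighbors', kValues(i), 'Standardize', 1);
    cvMdl = crossval(Mdl, 'KFold', 5);   % 5-fold cross-validation
    cvLoss(i) = kfoldLoss(cvMdl);        % average misclassification rate
end

disp(table(kValues', cvLoss', 'VariableNames', {'k', 'CVLoss'}));
```

The k with the lowest cross-validated loss is a more defensible choice than one picked from a single split.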
Part IV — Distance Metrics and Weights
12. Using a different distance metric
MATLAB allows you to choose how distance is computed.
clc;
clear;
close all;
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6];
Y = [1;1;1;1;2;2;2;2];
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 3, ...
'Distance', 'cityblock', ...
'Standardize', 1);
YPred = predict(Mdl, X);
disp('Predicted labels:');
disp(YPred);

MATLAB’s ClassificationKNN model supports altering the distance metric, and Classification Learner exposes this as a configurable option.
clc;
clear;
close all;
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6];
Y = [1;1;1;1;2;2;2;2];
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 5, ...
'DistanceWeight', 'inverse', ...
'Standardize', 1);
YPred = predict(Mdl, X);
disp('Predicted labels:');
disp(YPred);

With inverse weighting, closer neighbors have more influence than farther ones. MATLAB’s Classification Learner options list Equal, Inverse, and Squared Inverse weighting choices.
KNN is not limited to binary classification. It can also handle multiclass problems. MathWorks documentation and examples show KNN being used on fisheriris, which has three flower classes.
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
% Train 5-nearest neighbors classifier
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 5, ...
'Standardize', 1);
% Predict
YPred = predict(Mdl, X);
% Accuracy
accuracy = mean(strcmp(YPred, Y)) * 100;
fprintf('Training Accuracy = %.2f%%\n', accuracy);
% Confusion chart
confusionchart(Y, YPred);
title('Iris KNN Classification');

This is a classic multiclass example:
- meas contains flower measurements,
- species contains the class labels,
- fitcknn accepts matrix predictors and class labels directly, including table-based and multiclass workflows.
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
rng(2);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', 5, ...
'Standardize', 1);
YPred = predict(Mdl, XTest);
accuracy = mean(strcmp(YPred, YTest)) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);
confusionchart(YTest, YPred);
title('Iris Test Results with KNN');

This is closer to a realistic machine learning workflow than training and testing on the same data.
MATLAB provides a loss function for ClassificationKNN models. Smaller loss generally means better performance.
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 5, ...
'Standardize', 1);
L = loss(Mdl, X, Y);
fprintf('Classification Loss = %.4f\n', L);
fprintf('Approximate Accuracy = %.2f%%\n', (1-L)*100);

Here, loss measures the prediction error on the supplied data. MathWorks notes that better classifiers generally yield smaller classification loss values.
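Evaluating loss on the same data used for training gives an optimistic estimate. ClassificationKNN models also support resubLoss for this training-set loss and crossval for a less biased figure; a short sketch on the same iris data:

```matlab
load fisheriris
Mdl = fitcknn(meas, species, 'NumNeighbors', 5, 'Standardize', 1);

Lresub = resubLoss(Mdl);            % loss on the training data (optimistic)
Lcv    = kfoldLoss(crossval(Mdl));  % 10-fold cross-validated loss

fprintf('Resubstitution loss: %.4f\n', Lresub);
fprintf('Cross-validated loss: %.4f\n', Lcv);
```

Expect the cross-validated loss to be at least as large as the resubstitution loss.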
Although the core KNN workflow is often introduced with train/test splits, MATLAB’s Classification Learner app supports validation schemes and hyperparameter optimization, which makes it useful for comparing KNN settings more systematically.
A simple manual comparison strategy is to try several values of k and more than one distance metric, and keep the combination with the best held-out accuracy.

MATLAB’s Classification Learner app lets you:
- train several KNN presets quickly,
- compare validation accuracy across models,
- export the best model to the workspace.
To open it:
classificationLearner

Nearest Neighbor classifiers in Classification Learner use fitcknn, and the app exposes KNN options such as number of neighbors, distance metric, distance weight, and standardization.
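Outside the app, fitcknn can also search over its own hyperparameters. Recent MATLAB releases provide the 'OptimizeHyperparameters' option, which by default tunes the number of neighbors and the distance metric; a minimal sketch (the optimization prints iteration output and can take a while):

```matlab
load fisheriris
rng(1);  % for reproducibility of the optimization

% Search over NumNeighbors and Distance automatically
Mdl = fitcknn(meas, species, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', struct('ShowPlots', false));

disp(Mdl.NumNeighbors);
disp(Mdl.Distance);
```

This is a programmatic alternative to comparing KNN presets interactively in the app.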
Here is a small complete KNN project.
clc;
clear;
close all;
% Example student dataset
StudyHours = [1;2;2;3;4;5;5;6;7;8];
Attendance = [50;55;60;65;70;75;80;85;90;95];
Result = categorical([0;0;0;0;0;1;1;1;1;1]);
X = [StudyHours Attendance];
Y = Result;
% Train/test split
rng(2);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
% Train KNN model
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', 3, ...
'Distance', 'euclidean', ...
'Standardize', 1);
% Predict
YPred = predict(Mdl, XTest);
% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);
% Confusion chart
confusionchart(YTest, YPred);
title('Pass/Fail KNN Classification');
% Predict a new student
newStudent = [6 82];
newClass = predict(Mdl, newStudent);
disp('Predicted class for new student:');
disp(newClass);

This project includes:
- a labeled dataset with two predictors,
- a train/test split,
- model training with standardization,
- accuracy and a confusion chart,
- a prediction for a new, unseen observation.
That is a strong beginner workflow.
KNN depends on distances, so different feature scales can strongly distort results. MathWorks specifically notes that standardizing can improve the fit when predictor scales differ widely.
A random choice of k can hurt performance. It is better to compare several values.
A model that looks perfect on training data may generalize poorly.
Euclidean distance is common, but another metric may work better depending on the data.
Because KNN stores training data and uses it during prediction, prediction cost can grow with dataset size. MATLAB’s documentation explicitly notes that the ClassificationKNN classifier stores training data. From that, it follows that large stored training sets can make prediction more computationally heavy.
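For larger datasets, MATLAB can speed up the neighbor search with a Kd-tree instead of exhaustive pairwise distances. fitcknn exposes this through the 'NSMethod' option; the sketch below uses synthetic data, and 'kdtree' applies when the distance metric supports it (such as Euclidean or city block):

```matlab
rng(0);
% Synthetic data: 10,000 points in 3 dimensions, two classes
X = [randn(5000, 3); randn(5000, 3) + 2];
Y = [ones(5000, 1); 2*ones(5000, 1)];

% Kd-tree neighbor search instead of exhaustive search
Mdl = fitcknn(X, Y, ...
    'NumNeighbors', 5, ...
    'NSMethod', 'kdtree');

YPred = predict(Mdl, X(1:5, :));
disp(YPred);
```

Kd-trees pay off mainly in low dimensions; in high-dimensional spaces exhaustive search often remains competitive.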
KNN is a good choice when the dataset is small to medium sized, the decision boundary is irregular, and you want a quick, interpretable baseline.
You might avoid KNN when the dataset is very large, the data has many features, or fast prediction is required.
These are practical inferences from how KNN works and how MATLAB represents the classifier as a stored training-data model.
KNN is one of the simplest classification algorithms and one of the best starting points in machine learning. In MATLAB, the main training function is fitcknn, predictions are made with predict, performance can be assessed with confusion matrices and loss, and the Classification Learner app offers a visual workflow for training and comparing nearest-neighbor models.
A good practical workflow is:
- choose a value of k,
- Mdl = fitcknn(X, Y, 'NumNeighbors', 5, 'Standardize', 1);
- [label, score] = predict(Mdl, Xnew);
- L = loss(Mdl, X, Y);
- classificationLearner for an interactive comparison.

These commands match the documented MATLAB KNN workflow.
To practice, try the following:
- Try k = 1, k = 3, and k = 5.
- Use the fisheriris dataset to build a multiclass KNN classifier.
- Train a basic KNN classifier and compute test accuracy.
- Compare several values of k.
- Compare different distance metrics.
- Complete the mini-project with a train/test split, a confusion matrix, and a prediction for a new observation.