0. Introduction

K-Nearest Neighbors, usually called KNN, is one of the simplest and most intuitive supervised machine learning algorithms. In MATLAB, the main function for KNN classification is fitcknn, which trains a nearest-neighbor classifier from predictor data and class labels. MATLAB’s Classification Learner app also supports nearest-neighbor classifiers and uses fitcknn behind the scenes.

This tutorial explains what KNN is, how it works, when to use it, and how to implement it in MATLAB with practical examples.

1. What is KNN?

KNN is a supervised learning algorithm used mainly for classification. It predicts the class of a new observation by looking at the k nearest training samples and assigning the class based on those neighbors. MathWorks describes this as kNN classification: after training, you can predict labels or estimate posterior probabilities by passing the trained model to predict.

In simple terms: to classify a new point, KNN finds the k training points closest to it and assigns the class that the majority of those neighbors belong to.
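The steps above can be written out by hand. The sketch below (a from-scratch illustration, not how fitcknn works internally) classifies one query point by majority vote among its k nearest training points:

% Toy training data: two classes
X = [1 2; 2 3; 6 7; 7 8];
Y = [1; 1; 2; 2];
query = [2 2];
k = 3;

% Euclidean distance from the query to every training point
d = sqrt(sum((X - query).^2, 2));

% Sort by distance and vote among the k nearest labels
[~, idx] = sort(d);
predicted = mode(Y(idx(1:k)))   % -> 1, the majority class among the 3 nearest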

2. Why use KNN?

KNN is popular because it is:

- simple to understand and explain,
- quick to set up, with no training phase beyond storing the data,
- naturally able to handle multiclass problems,
- free of strong assumptions about the shape of the data.

It is often used when you want a straightforward classifier without building an explicit parametric model. MATLAB’s nearest-neighbor classifier stores training data and predicts from those stored examples rather than learning coefficients like linear or logistic regression.

3. Main concepts behind KNN

3.1 The value of k

k is the number of neighbors used to classify a new point. A small k gives flexible decision boundaries that can overfit noisy data, while a large k smooths the boundary but can blur genuine class structure.

MathWorks’ classifier options describe examples such as fine KNN using 1 neighbor and coarse KNN using 100 neighbors.
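The effect of k is easy to demonstrate on a toy dataset that contains one stray class-2 point near the class-1 cluster (a small illustrative contrast; the fine/coarse presets use a much larger spread of values):

X = [1 2; 2 3; 3 2; 2 2.5; 2 1; 7 8; 8 7; 7 6];
Y = [1; 1; 1; 1; 2; 2; 2; 2];   % the point [2 1] is a class-2 outlier

MdlFine   = fitcknn(X, Y, 'NumNeighbors', 1);
MdlCoarse = fitcknn(X, Y, 'NumNeighbors', 5);

predict(MdlFine,   [2.2 1.1])   % -> 2, follows the single nearest (outlier) point
predict(MdlCoarse, [2.2 1.1])   % -> 1, the majority of the 5 nearest neighbors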

3.2 Distance metric

KNN depends on a distance measure to decide which points are nearest. MATLAB’s KNN classifier lets you alter the distance metric.

Common choices include:

- 'euclidean' (the default), straight-line distance,
- 'cityblock', the Manhattan or taxicab distance,
- 'chebychev', the maximum coordinate difference,
- 'minkowski', a generalization of the Euclidean and city block distances,
- 'cosine', based on the angle between observation vectors.
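You can compare metrics directly with pdist2 before committing to one in fitcknn; for two points the differences are easy to read off:

a = [0 0];
b = [3 4];

pdist2(a, b)                 % Euclidean: sqrt(3^2 + 4^2) = 5
pdist2(a, b, 'cityblock')    % city block (Manhattan): |3| + |4| = 7
pdist2(a, b, 'chebychev')    % Chebyshev: max(|3|, |4|) = 4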

3.3 Distance weighting

Not all neighbors need to contribute equally. MATLAB supports distance weights such as:

- 'equal' (the default), where every neighbor counts the same,
- 'inverse', where each vote is weighted by 1/distance,
- 'squaredinverse', where each vote is weighted by 1/distance^2.

This means closer neighbors can have more influence than farther ones.
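As an illustration of the weighting rule itself (not MATLAB's internal code), with 'inverse' weighting each neighbor's vote is scaled by 1/distance, so a single close neighbor can outvote two distant ones:

d      = [0.5; 2; 2];      % distances of three neighbors
labels = [1; 2; 2];        % their classes

w = 1 ./ d;                     % inverse-distance weights: [2; 0.5; 0.5]
votes1 = sum(w(labels == 1))    % 2.0
votes2 = sum(w(labels == 2))    % 1.0, so class 1 wins despite being outnumbered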

3.4 Standardization

If features have very different scales, KNN can behave poorly because distance becomes dominated by larger-scale variables. MathWorks recommends standardizing when predictors have widely different scales.
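A quick demonstration of the problem: when one predictor is measured in the thousands, it dominates Euclidean distance until the columns are standardized (zscore is used here to make the effect visible; the 'Standardize' option of fitcknn performs the same centering and scaling for you):

X = [1 1000;
     2 1100;
     3  900];

% Raw distance between rows 1 and 2 is driven almost entirely by column 2
pdist2(X(1,:), X(2,:))    % sqrt(1^2 + 100^2), approximately 100

% After z-scoring, both columns contribute comparably
Z = zscore(X);
pdist2(Z(1,:), Z(2,:))    % sqrt(2), approximately 1.41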

4. MATLAB functions and tools you need to know

For KNN in MATLAB, the most important tools are:

- fitcknn, which trains the classifier,
- predict, which returns labels (and scores) for new data,
- loss, which computes classification loss,
- confusionmat and confusionchart, for confusion matrices,
- cvpartition, for train/test and cross-validation splits,
- the Classification Learner app, for an interactive workflow.

MathWorks documents fitcknn(X,Y) for training and predict(mdl,X) for predicting class labels. It also documents loss for evaluating a trained ClassificationKNN model.

Part I — First KNN Classification Example

5. Simple KNN example

Let us begin with a small binary dataset.

clc;
clear;
close all;

% Example data
X = [1 2;
     2 3;
     2 1;
     3 2;
     6 7;
     7 8;
     8 7;
     7 6];

Y = [1; 1; 1; 1; 2; 2; 2; 2];

% Train KNN model
Mdl = fitcknn(X, Y, 'NumNeighbors', 3);

% Predict on training data
[label, score] = predict(Mdl, X);

disp('Predicted labels:');
disp(label);

disp('Scores:');
disp(score);

Explanation

Here:

- X is an 8-by-2 matrix of predictors, with one observation per row,
- Y holds the class label of each observation,
- fitcknn trains a 3-nearest-neighbor model,
- predict returns the predicted label for each observation plus a score for each class, where the scores behave like posterior probabilities.

MathWorks documents both fitcknn and [label,score] = predict(mdl,X) for nearest-neighbor classification.

6. Visualizing the data

gscatter(X(:,1), X(:,2), Y, 'rb', 'ox');
xlabel('Feature 1');
ylabel('Feature 2');
title('Training Data');
grid on;

This plot helps you see whether the classes are visually separable.

7. Why KNN is called a lazy learner

KNN is often called a lazy learner because it does not build an explicit compact model during training the way regression or SVM often does. Instead, it stores training data and uses it directly during prediction. MATLAB’s ClassificationKNN page notes that the classifier stores training data.

That means:

- training is essentially instant, since nothing is fitted,
- the memory footprint grows with the training set,
- prediction cost grows with the number of stored observations.

Part II — Train/Test Workflow

8. Splitting data into training and test sets

In real machine learning tasks, we should test the model on unseen data.

clc;
clear;
close all;

% Dataset
X = [1 2;
     2 3;
     2 1;
     3 2;
     6 7;
     7 8;
     8 7;
     7 6;
     1.5 2.5;
     6.5 7.5];

Y = [1;1;1;1;2;2;2;2;1;2];

% Split data
rng(1);
cv = cvpartition(Y, 'HoldOut', 0.3);

XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);

XTest = X(test(cv), :);
YTest = Y(test(cv), :);

% Train KNN
Mdl = fitcknn(XTrain, YTrain, ...
    'NumNeighbors', 3, ...
    'Standardize', 1);

% Predict on test set
YPred = predict(Mdl, XTest);

% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);

Explanation

This example adds two important ideas:

- a hold-out split with cvpartition, so accuracy is measured on data the model never saw,
- standardization via the 'Standardize' option, so both features contribute comparably to the distance.

MathWorks documents standardization as a configurable option for nearest-neighbor classifiers in Classification Learner.

9. Confusion matrix

A confusion matrix helps measure classification performance.

cm = confusionmat(YTest, YPred);
disp('Confusion Matrix:');
disp(cm);

confusionchart(YTest, YPred);
title('Confusion Matrix');

This shows how many observations were correctly or incorrectly classified.
Part III — Choosing k

10. Testing different values of k

One of the most important choices in KNN is the number of neighbors.

clc;
clear;
close all;

% Dataset
X = [1 2;
     2 3;
     2 1;
     3 2;
     6 7;
     7 8;
     8 7;
     7 6;
     1.5 2.5;
     6.5 7.5];

Y = [1;1;1;1;2;2;2;2;1;2];

kValues = [1 3 5];
accuracies = zeros(size(kValues));

rng(1);
cv = cvpartition(Y, 'HoldOut', 0.3);

XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);

XTest = X(test(cv), :);
YTest = Y(test(cv), :);

for i = 1:length(kValues)
    Mdl = fitcknn(XTrain, YTrain, ...
        'NumNeighbors', kValues(i), ...
        'Standardize', 1);
    
    YPred = predict(Mdl, XTest);
    accuracies(i) = mean(YPred == YTest) * 100;
end

disp(table(kValues', accuracies', ...
    'VariableNames', {'k','Accuracy'}));

Explanation

This compares different choices of k.

MathWorks’ nearest-neighbor options page also emphasizes that changing the number of neighbors changes the model from fine to coarse.

11. Plotting accuracy vs k

plot(kValues, accuracies, '-o', 'LineWidth', 1.5);
xlabel('Number of Neighbors (k)');
ylabel('Accuracy (%)');
title('Accuracy vs k');
grid on;

This helps choose a reasonable value of k.
Part IV — Distance Metrics and Weights

12. Using a different distance metric

MATLAB allows you to choose how distance is computed.

clc;
clear;
close all;

X = [1 2;
     2 3;
     2 1;
     3 2;
     6 7;
     7 8;
     8 7;
     7 6];

Y = [1;1;1;1;2;2;2;2];

Mdl = fitcknn(X, Y, ...
    'NumNeighbors', 3, ...
    'Distance', 'cityblock', ...
    'Standardize', 1);

YPred = predict(Mdl, X);

disp('Predicted labels:');
disp(YPred);

MATLAB’s ClassificationKNN model supports altering the distance metric, and Classification Learner exposes this as a configurable option.

13. Using distance weighting

clc;
clear;
close all;

X = [1 2;
     2 3;
     2 1;
     3 2;
     6 7;
     7 8;
     8 7;
     7 6];

Y = [1;1;1;1;2;2;2;2];

Mdl = fitcknn(X, Y, ...
    'NumNeighbors', 5, ...
    'DistanceWeight', 'inverse', ...
    'Standardize', 1);

YPred = predict(Mdl, X);

disp('Predicted labels:');
disp(YPred);

With inverse weighting, closer neighbors have more influence than farther ones. MATLAB’s Classification Learner options list Equal, Inverse, and Squared Inverse weighting choices.

Part V — Real MATLAB Dataset Example

14. Multiclass KNN with the iris dataset

KNN is not limited to binary classification. It can also handle multiclass problems. MathWorks documentation and examples show KNN being used on fisheriris, which has three flower classes.

clc;
clear;
close all;

load fisheriris

X = meas;
Y = species;

% Train 5-nearest neighbors classifier
Mdl = fitcknn(X, Y, ...
    'NumNeighbors', 5, ...
    'Standardize', 1);

% Predict
YPred = predict(Mdl, X);

% Accuracy
accuracy = mean(strcmp(YPred, Y)) * 100;
fprintf('Training Accuracy = %.2f%%\n', accuracy);

% Confusion chart
confusionchart(Y, YPred);
title('Iris KNN Classification');

Explanation

This is a classic multiclass example:

- fisheriris has 150 observations with 4 numeric predictors (meas),
- the labels in species name three flower classes: setosa, versicolor, and virginica,
- because species is a cell array of character vectors, accuracy is computed with strcmp rather than ==.

fitcknn accepts matrix predictors and class labels directly, including table-based and multiclass workflows.

15. Train/test split with iris

clc;
clear;
close all;

load fisheriris

X = meas;
Y = species;

rng(2);
cv = cvpartition(Y, 'HoldOut', 0.3);

XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);

XTest = X(test(cv), :);
YTest = Y(test(cv), :);

Mdl = fitcknn(XTrain, YTrain, ...
    'NumNeighbors', 5, ...
    'Standardize', 1);

YPred = predict(Mdl, XTest);

accuracy = mean(strcmp(YPred, YTest)) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);

confusionchart(YTest, YPred);
title('Iris Test Results with KNN');

This is closer to a realistic machine learning workflow than training and testing on the same data.

Part VI — Model Evaluation in MATLAB

16. Using classification loss

MATLAB provides a loss function for ClassificationKNN models. Smaller loss generally means better performance.

clc;
clear;
close all;

load fisheriris

X = meas;
Y = species;

Mdl = fitcknn(X, Y, ...
    'NumNeighbors', 5, ...
    'Standardize', 1);

L = loss(Mdl, X, Y);

fprintf('Classification Loss = %.4f\n', L);
fprintf('Approximate Accuracy = %.2f%%\n', (1-L)*100);

Explanation

Here:

- loss returns the classification error, which by default is the fraction of misclassified observations,
- 1 - L therefore gives an approximate accuracy.

MathWorks notes that better classifiers generally yield smaller classification loss values.

17. Cross-validation idea

Although the core KNN workflow is often introduced with train/test splits, MATLAB’s Classification Learner app supports validation schemes and hyperparameter optimization, which makes it useful for comparing KNN settings more systematically.

A simple manual comparison strategy is:

  1. fix a data partition (or use k-fold cross-validation),
  2. train one model per candidate setting of k, distance metric, or weighting,
  3. compute the loss or accuracy of each model on the held-out data,
  4. keep the setting with the best validation performance.
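The same comparison can be scripted with crossval and kfoldLoss, which perform 10-fold cross-validation on a trained ClassificationKNN model:

load fisheriris

rng(1);
for k = [1 3 5 7]
    Mdl = fitcknn(meas, species, 'NumNeighbors', k, 'Standardize', 1);
    CVMdl = crossval(Mdl);    % 10-fold cross-validation by default
    fprintf('k = %d, CV loss = %.4f\n', k, kfoldLoss(CVMdl));
end

The k with the smallest cross-validated loss is the most promising candidate.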

Part VII — Using Classification Learner

18. App-based KNN workflow

MATLAB’s Classification Learner app lets you:

- import data from the workspace or a file,
- train and compare many classifier types, including several KNN presets,
- inspect validation accuracy, confusion matrices, and ROC curves,
- export the best model back to the workspace.

To open it:

classificationLearner

Nearest Neighbor classifiers in Classification Learner use fitcknn, and the app exposes KNN options such as number of neighbors, distance metric, distance weight, and standardization.

Part VIII — End-to-End Mini Project

19. Project: classify students as pass/fail

Here is a small complete KNN project.

clc;
clear;
close all;

% Example student dataset
StudyHours = [1;2;2;3;4;5;5;6;7;8];
Attendance = [50;55;60;65;70;75;80;85;90;95];
Result = categorical([0;0;0;0;0;1;1;1;1;1]);

X = [StudyHours Attendance];
Y = Result;

% Train/test split
rng(2);
cv = cvpartition(Y, 'HoldOut', 0.3);

XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);

XTest = X(test(cv), :);
YTest = Y(test(cv), :);

% Train KNN model
Mdl = fitcknn(XTrain, YTrain, ...
    'NumNeighbors', 3, ...
    'Distance', 'euclidean', ...
    'Standardize', 1);

% Predict
YPred = predict(Mdl, XTest);

% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);

% Confusion chart
confusionchart(YTest, YPred);
title('Pass/Fail KNN Classification');

% Predict a new student
newStudent = [6 82];
newClass = predict(Mdl, newStudent);

disp('Predicted class for new student:');
disp(newClass);

What this project teaches

This project includes:

- building a small labeled dataset,
- a train/test split with cvpartition,
- training a standardized KNN model,
- evaluation with accuracy and a confusion chart,
- prediction for a brand-new observation.

That is a strong beginner workflow.

Part IX — Common mistakes beginners make

20. Forgetting to standardize features

KNN depends on distances, so different feature scales can strongly distort results. MathWorks specifically notes that standardizing can improve the fit when predictor scales differ widely.

21. Choosing k without testing

A random choice of k can hurt performance. It is better to compare several values.

22. Evaluating only on training data

A model that looks perfect on training data may generalize poorly.

23. Ignoring distance metric choices

Euclidean distance is common, but another metric may work better depending on the data.

24. Using KNN on very large datasets without caution

Because KNN stores training data and searches it during prediction, prediction cost grows with dataset size. MATLAB’s documentation explicitly notes that the ClassificationKNN classifier stores the training data, so a large stored training set makes every prediction more computationally heavy.
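One mitigation MATLAB provides is a Kd-tree neighbor searcher, requested through the 'NSMethod' option of fitcknn (a sketch; the Kd-tree search is intended for numeric data with a small number of predictors):

load fisheriris

% Build a Kd-tree search structure instead of using exhaustive search
Mdl = fitcknn(meas, species, ...
    'NumNeighbors', 5, ...
    'NSMethod', 'kdtree');

label = predict(Mdl, [5.0 3.5 1.4 0.2]);
disp(label);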

Part X — When should you use KNN?

KNN is a good choice when:

- the dataset is small to medium sized,
- the decision boundary is irregular and hard to model parametrically,
- you want a quick, assumption-free baseline.

You might avoid KNN when:

- the dataset is very large, since prediction requires searching the stored data,
- there are many predictors, where distances become less informative,
- many features are irrelevant, or the features have wildly different scales and cannot be standardized.

These are practical inferences from how KNN works and how MATLAB represents the classifier as a stored training-data model.

Part XI — Summary

KNN is one of the simplest classification algorithms and one of the best starting points in machine learning. In MATLAB, the main training function is fitcknn, predictions are made with predict, performance can be assessed with confusion matrices and loss, and the Classification Learner app offers a visual workflow for training and comparing nearest-neighbor models.

A good practical workflow is:

  1. prepare and inspect the data,
  2. standardize the predictors,
  3. choose k,
  4. train the KNN model,
  5. test it on unseen data,
  6. compare distance metrics and weights,
  7. evaluate with accuracy, confusion matrix, or loss,
  8. use the model for prediction.

Part XII — MATLAB cheat sheet

Train a KNN classifier

Mdl = fitcknn(X, Y, 'NumNeighbors', 5, 'Standardize', 1);

Predict labels

[label, score] = predict(Mdl, Xnew);

Compute classification loss

L = loss(Mdl, X, Y);

Open Classification Learner

classificationLearner

These commands match the documented MATLAB KNN workflow.

Practice exercises

Exercise 1

Train a KNN classifier on a small binary dataset and compute test accuracy

Exercise 2

Compare the results for k = 1, k = 3, and k = 5

Exercise 3

Train a KNN classifier with two different distance metrics and compare the predictions

Exercise 4

Use the fisheriris dataset to build a multiclass KNN classifier

Exercise 5

Build a small end-to-end KNN project with train/test split, confusion matrix, and prediction for a new observation
