K-Nearest Neighbors, usually called KNN, is one of the simplest and most intuitive supervised machine learning algorithms. In MATLAB, the main function for KNN classification is fitcknn, which trains a nearest-neighbor classifier from predictor data and class labels. MATLAB’s Classification Learner app also supports nearest-neighbor classifiers and uses fitcknn behind the scenes.
This tutorial explains what KNN is, how it works, when to use it, and how to implement it in MATLAB with practical examples.
KNN is a supervised learning algorithm used mainly for classification. It predicts the class of a new observation by looking at the k nearest training samples and assigning the class based on those neighbors. MathWorks describes nearest neighbors as a kNN classification method where, after training, you can predict labels or estimate posterior probabilities using the trained model and predict.
In simple terms:
- pick a value of k,
- find the k training points closest to the new observation,
- assign the class held by the majority of those k points.

KNN is popular because it is:
- simple to understand and implement,
- non-parametric (it makes no assumptions about the data distribution),
- an effective baseline on small to medium datasets.
It is often used when you want a straightforward classifier without building an explicit parametric model. MATLAB’s nearest-neighbor classifier stores training data and predicts from those stored examples rather than learning coefficients like linear or logistic regression.
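The steps above can be sketched in a few lines of plain MATLAB. This is a minimal illustration only, with made-up toy data; in practice you would use fitcknn as shown later in this tutorial:

```matlab
% Toy training set: 4 points, two classes
Xtrain = [1 1; 1 2; 6 6; 7 7];
Ytrain = [1; 1; 2; 2];
xnew   = [2 1];   % new point to classify
k      = 3;

% Euclidean distance from xnew to every training point
d = sqrt(sum((Xtrain - xnew).^2, 2));

% Indices of the k nearest neighbors
[~, idx] = sort(d);
nearest  = idx(1:k);

% Majority vote among the k neighbors
predicted = mode(Ytrain(nearest));
disp(predicted);   % class 1: two of the three nearest points belong to class 1
```

The whole algorithm is essentially a distance computation, a sort, and a vote, which is why it is so easy to reason about.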
k is the number of neighbors used to classify a new point.
A small k can make the model sensitive to noise, while a large k can make the model smoother but less flexible. MathWorks’ classifier options describe examples such as fine KNN using 1 neighbor and coarse KNN using 100 neighbors.
KNN depends on a distance measure to decide which points are nearest. MATLAB’s KNN classifier lets you change the distance metric.
Common choices include Euclidean (the default), city block (Manhattan), Chebyshev, Minkowski, and cosine distance.
Not all neighbors need to contribute equally. MATLAB supports distance weights such as equal, inverse, and squared inverse.
This means closer neighbors can have more influence than farther ones.
If features have very different scales, KNN can behave poorly because distance becomes dominated by larger-scale variables. MathWorks recommends standardizing when predictors have widely different scales.
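To see why scaling matters, consider two predictors on very different scales. This is a small made-up illustration using pdist and zscore from Statistics and Machine Learning Toolbox:

```matlab
% One predictor in single digits, one in the thousands
X = [1 1000;
     2 1100;
     9 1050];

% Raw pairwise Euclidean distances: column 2 dominates completely
dRaw = pdist(X);
disp(dRaw);

% After z-scoring, both predictors contribute comparably
Xs = zscore(X);
dStd = pdist(Xs);
disp(dStd);
```

In fitcknn, the same effect is achieved by passing 'Standardize', 1, as the later examples do.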
For KNN in MATLAB, the most important tools are:
- fitcknn → train a KNN classifier
- predict → predict labels for new data
- loss → compute classification loss
- Classification Learner → app-based workflow

MathWorks documents fitcknn(X,Y) for training and predict(mdl,X) for predicted class labels. It also documents loss for evaluating a trained ClassificationKNN model.
Let us begin with a small binary dataset.
clc;
clear;
close all;
% Example data
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6];
Y = [1; 1; 1; 1; 2; 2; 2; 2];
% Train KNN model
Mdl = fitcknn(X, Y, 'NumNeighbors', 3);
% Predict on training data
[label, score] = predict(Mdl, X);
disp('Predicted labels:');
disp(label);
disp('Scores:');
disp(score);

Here:
- X contains the predictors,
- Y contains the class labels,
- fitcknn trains the classifier,
- 'NumNeighbors',3 sets k = 3,
- predict returns predicted labels and classification scores.

MathWorks documents both fitcknn and [label,score] = predict(mdl,X) for nearest-neighbor classification.
gscatter(X(:,1), X(:,2), Y, 'rb', 'ox');
xlabel('Feature 1');
ylabel('Feature 2');
title('Training Data');
grid on;

This plot helps you see whether the classes are visually separable.
KNN is often called a lazy learner because it does not build an explicit compact model during training the way regression or SVM often does. Instead, it stores training data and uses it directly during prediction. MATLAB’s ClassificationKNN page notes that the classifier stores training data.
That means training is essentially instant, but every prediction must compare the new point against the stored training data, so prediction cost and memory use grow with the size of the training set.
In real machine learning tasks, we should test the model on unseen data.
clc;
clear;
close all;
% Dataset
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6;
1.5 2.5;
6.5 7.5];
Y = [1;1;1;1;2;2;2;2;1;2];
% Split data
rng(1);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
% Train KNN
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', 3, ...
'Standardize', 1);
% Predict on test set
YPred = predict(Mdl, XTest);
% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);

This example adds two important ideas: splitting the data into training and test sets with cvpartition, and standardizing the predictors with 'Standardize', 1.
MathWorks documents standardization as a configurable option for nearest-neighbor classifiers in Classification Learner.
A confusion matrix helps measure classification performance.
cm = confusionmat(YTest, YPred);
disp('Confusion Matrix:');
disp(cm);
confusionchart(YTest, YPred);
title('Confusion Matrix');
This shows how many observations were correctly or incorrectly classified.
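The confusion matrix can also be broken down into per-class rates. The sketch below uses a small example matrix (the numbers are illustrative, not from the script above); rows are true classes and columns are predicted classes, as confusionmat returns them:

```matlab
% Example confusion matrix (rows = true class, columns = predicted class)
cm = [2 0;
      1 1];

recall    = diag(cm) ./ sum(cm, 2);    % per-class recall: correct / true counts
precision = diag(cm) ./ sum(cm, 1)';   % per-class precision: correct / predicted counts

disp(table(recall, precision));
```

Per-class rates are often more informative than overall accuracy when the classes are imbalanced.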
Part III — Choosing k
10. Testing different values of k
One of the most important choices in KNN is the number of neighbors.
clc;
clear;
close all;
% Dataset
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6;
1.5 2.5;
6.5 7.5];
Y = [1;1;1;1;2;2;2;2;1;2];
kValues = [1 3 5];
accuracies = zeros(size(kValues));
rng(1);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
for i = 1:length(kValues)
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', kValues(i), ...
'Standardize', 1);
YPred = predict(Mdl, XTest);
accuracies(i) = mean(YPred == YTest) * 100;
end
disp(table(kValues', accuracies', ...
'VariableNames', {'k','Accuracy'}));

This compares different choices of k.
k = 1 can fit the training data very closely, while a larger k makes predictions smoother. MathWorks’ nearest-neighbor options page also emphasizes that changing the number of neighbors changes the model from fine to coarse.
plot(kValues, accuracies, '-o', 'LineWidth', 1.5);
xlabel('Number of Neighbors (k)');
ylabel('Accuracy (%)');
title('Accuracy vs k');
grid on;
This helps choose a reasonable value of k.
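With so little data, a single hold-out split can be noisy. A more robust way to compare values of k is cross-validation; the sketch below reuses the same toy dataset and relies on crossval and kfoldLoss from Statistics and Machine Learning Toolbox:

```matlab
X = [1 2; 2 3; 2 1; 3 2; 6 7; 7 8; 8 7; 7 6; 1.5 2.5; 6.5 7.5];
Y = [1;1;1;1;2;2;2;2;1;2];

rng(1);
kValues = [1 3 5];
cvLoss  = zeros(size(kValues));

for i = 1:numel(kValues)
    Mdl   = fitcknn(X, Y, 'NumNeighbors', kValues(i), 'Standardize', 1);
    cvMdl = crossval(Mdl, 'KFold', 5);   % 5-fold cross-validation
    cvLoss(i) = kfoldLoss(cvMdl);        % average misclassification rate
end

disp(table(kValues', cvLoss', 'VariableNames', {'k', 'CVLoss'}));
```

The k with the lowest cross-validated loss is a more defensible choice than one picked from a single split.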
Part IV — Distance Metrics and Weights
12. Using a different distance metric
MATLAB allows you to choose how distance is computed.
clc;
clear;
close all;
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6];
Y = [1;1;1;1;2;2;2;2];
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 3, ...
'Distance', 'cityblock', ...
'Standardize', 1);
YPred = predict(Mdl, X);
disp('Predicted labels:');
disp(YPred);

MATLAB’s ClassificationKNN model supports altering the distance metric, and Classification Learner exposes this as a configurable option.
clc;
clear;
close all;
X = [1 2;
2 3;
2 1;
3 2;
6 7;
7 8;
8 7;
7 6];
Y = [1;1;1;1;2;2;2;2];
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 5, ...
'DistanceWeight', 'inverse', ...
'Standardize', 1);
YPred = predict(Mdl, X);
disp('Predicted labels:');
disp(YPred);

With inverse weighting, closer neighbors have more influence than farther ones. MATLAB’s Classification Learner options list Equal, Inverse, and Squared Inverse weighting choices.
KNN is not limited to binary classification. It can also handle multiclass problems. MathWorks documentation and examples show KNN being used on fisheriris, which has three flower classes.
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
% Train 5-nearest neighbors classifier
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 5, ...
'Standardize', 1);
% Predict
YPred = predict(Mdl, X);
% Accuracy
accuracy = mean(strcmp(YPred, Y)) * 100;
fprintf('Training Accuracy = %.2f%%\n', accuracy);
% Confusion chart
confusionchart(Y, YPred);
title('Iris KNN Classification');

This is a classic multiclass example:
- meas contains flower measurements,
- species contains the class labels,
- fitcknn accepts matrix predictors and class labels directly, including table-based and multiclass workflows.
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
rng(2);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', 5, ...
'Standardize', 1);
YPred = predict(Mdl, XTest);
accuracy = mean(strcmp(YPred, YTest)) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);
confusionchart(YTest, YPred);
title('Iris Test Results with KNN');

This is closer to a realistic machine learning workflow than training and testing on the same data.
MATLAB provides a loss function for ClassificationKNN models. Smaller loss generally means better performance.
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
Mdl = fitcknn(X, Y, ...
'NumNeighbors', 5, ...
'Standardize', 1);
L = loss(Mdl, X, Y);
fprintf('Classification Loss = %.4f\n', L);
fprintf('Approximate Accuracy = %.2f%%\n', (1-L)*100);

Here, loss measures the prediction error on the supplied data. MathWorks notes that better classifiers generally yield smaller classification loss values.
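Evaluating loss on the same data used for training gives an optimistic estimate. ClassificationKNN models also support resubLoss for this training-set loss and crossval for a less biased figure; a short sketch on the same iris data:

```matlab
load fisheriris
Mdl = fitcknn(meas, species, 'NumNeighbors', 5, 'Standardize', 1);

Lresub = resubLoss(Mdl);            % loss on the training data (optimistic)
Lcv    = kfoldLoss(crossval(Mdl));  % 10-fold cross-validated loss

fprintf('Resubstitution loss: %.4f\n', Lresub);
fprintf('Cross-validated loss: %.4f\n', Lcv);
```

Expect the cross-validated loss to be at least as large as the resubstitution loss.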
Although the core KNN workflow is often introduced with train/test splits, MATLAB’s Classification Learner app supports validation schemes and hyperparameter optimization, which makes it useful for comparing KNN settings more systematically.
A simple manual comparison strategy is to try several values of k and more than one distance metric, and keep the combination with the best held-out accuracy.

MATLAB’s Classification Learner app lets you:
- train several KNN presets quickly,
- compare validation accuracy across models,
- export the best model to the workspace.
To open it:
classificationLearner

Nearest Neighbor classifiers in Classification Learner use fitcknn, and the app exposes KNN options such as number of neighbors, distance metric, distance weight, and standardization.
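Outside the app, fitcknn can also search over its own hyperparameters. Recent MATLAB releases provide the 'OptimizeHyperparameters' option, which by default tunes the number of neighbors and the distance metric; a minimal sketch (the optimization prints iteration output and can take a while):

```matlab
load fisheriris
rng(1);  % for reproducibility of the optimization

% Search over NumNeighbors and Distance automatically
Mdl = fitcknn(meas, species, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', struct('ShowPlots', false));

disp(Mdl.NumNeighbors);
disp(Mdl.Distance);
```

This is a programmatic alternative to comparing KNN presets interactively in the app.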
Here is a small complete KNN project.
clc;
clear;
close all;
% Example student dataset
StudyHours = [1;2;2;3;4;5;5;6;7;8];
Attendance = [50;55;60;65;70;75;80;85;90;95];
Result = categorical([0;0;0;0;0;1;1;1;1;1]);
X = [StudyHours Attendance];
Y = Result;
% Train/test split
rng(2);
cv = cvpartition(Y, 'HoldOut', 0.3);
XTrain = X(training(cv), :);
YTrain = Y(training(cv), :);
XTest = X(test(cv), :);
YTest = Y(test(cv), :);
% Train KNN model
Mdl = fitcknn(XTrain, YTrain, ...
'NumNeighbors', 3, ...
'Distance', 'euclidean', ...
'Standardize', 1);
% Predict
YPred = predict(Mdl, XTest);
% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);
% Confusion chart
confusionchart(YTest, YPred);
title('Pass/Fail KNN Classification');
% Predict a new student
newStudent = [6 82];
newClass = predict(Mdl, newStudent);
disp('Predicted class for new student:');
disp(newClass);

This project includes:
- a labeled dataset with two predictors,
- a train/test split,
- model training with standardization,
- accuracy and a confusion chart,
- a prediction for a new, unseen observation.
That is a strong beginner workflow.
KNN depends on distances, so different feature scales can strongly distort results. MathWorks specifically notes that standardizing can improve the fit when predictor scales differ widely.
A random choice of k can hurt performance. It is better to compare several values.
A model that looks perfect on training data may generalize poorly.
Euclidean distance is common, but another metric may work better depending on the data.
Because KNN stores training data and uses it during prediction, prediction cost can grow with dataset size. MATLAB’s documentation explicitly notes that the ClassificationKNN classifier stores training data. From that, it follows that large stored training sets can make prediction more computationally heavy.
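For larger datasets, MATLAB can speed up the neighbor search with a Kd-tree instead of exhaustive pairwise distances. fitcknn exposes this through the 'NSMethod' option; the sketch below uses synthetic data, and 'kdtree' applies when the distance metric supports it (such as Euclidean or city block):

```matlab
rng(0);
% Synthetic data: 10,000 points in 3 dimensions, two classes
X = [randn(5000, 3); randn(5000, 3) + 2];
Y = [ones(5000, 1); 2*ones(5000, 1)];

% Kd-tree neighbor search instead of exhaustive search
Mdl = fitcknn(X, Y, ...
    'NumNeighbors', 5, ...
    'NSMethod', 'kdtree');

YPred = predict(Mdl, X(1:5, :));
disp(YPred);
```

Kd-trees pay off mainly in low dimensions; in high-dimensional spaces exhaustive search often remains competitive.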
KNN is a good choice when the dataset is small to medium sized, the decision boundary is irregular, and you want a quick, interpretable baseline.
You might avoid KNN when the dataset is very large, the data has many features, or fast prediction is required.
These are practical inferences from how KNN works and how MATLAB represents the classifier as a stored training-data model.
KNN is one of the simplest classification algorithms and one of the best starting points in machine learning. In MATLAB, the main training function is fitcknn, predictions are made with predict, performance can be assessed with confusion matrices and loss, and the Classification Learner app offers a visual workflow for training and comparing nearest-neighbor models.
A good practical workflow is:
- choose a value of k,
- Mdl = fitcknn(X, Y, 'NumNeighbors', 5, 'Standardize', 1);
- [label, score] = predict(Mdl, Xnew);
- L = loss(Mdl, X, Y);
- classificationLearner for an interactive comparison.

These commands match the documented MATLAB KNN workflow.
To practice, try the following:
- Try k = 1, k = 3, and k = 5.
- Use the fisheriris dataset to build a multiclass KNN classifier.
- Train a basic KNN classifier and compute test accuracy.
- Compare several values of k.
- Compare different distance metrics.
- Complete the mini-project with a train/test split, a confusion matrix, and a prediction for a new observation.