Logistic Regression is one of the most important supervised machine learning algorithms for classification. Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an observation belongs to a class, then converts that probability into a class label. In MATLAB, the standard documented path for binary GLM logistic regression is fitglm, while efficient logistic regression in Classification Learner uses fitclinear for binary data and fitcecoc for multiclass data. MATLAB also provides the Classification Learner app for GUI-based workflows.
This tutorial explains what logistic regression is, when to use it, how it works conceptually, and how to implement it in MATLAB with practical examples for binary and multiclass classification. The MATLAB-specific parts below are aligned with current MathWorks documentation on fitglm, Classification Learner, and classifier options.
Logistic Regression is a supervised learning algorithm used mainly for classification problems. It models class probabilities from a linear combination of predictors, and in MATLAB’s Classification Learner, logistic regression classifiers are described as modeling class probabilities as a function of that linear combination. Binary GLM logistic regression there is based on fitglm.
In simple terms: instead of predicting a number, logistic regression estimates how likely an observation is to belong to a particular class.
Logistic Regression is popular because it is simple, fast to train, and easy to interpret.
MathWorks specifically describes logistic regression as a popular classifier to try because it is easy to interpret. It also distinguishes standard binary GLM logistic regression from more efficient logistic regression options for larger datasets.
It is especially useful when the response is categorical (for example, a yes/no outcome), when you want probability estimates rather than just labels, and when model interpretability matters.
Instead of predicting a raw numeric value like linear regression, logistic regression predicts a probability between 0 and 1 for a class. MATLAB’s logistic regression workflows are built around class probability modeling.
A probability is converted into a label using a threshold. In common binary classification practice that is often 0.5, and in Classification Learner the predicted class is assigned according to the class probabilities. This means logistic regression produces a classification boundary in predictor space.
Like linear regression, logistic regression starts from a weighted sum of predictors:
$z = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n$
Then it passes this through a logistic transformation to convert it into a probability.
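That transformation is the logistic (sigmoid) function:

$p = \frac{1}{1 + e^{-z}}$

Large positive values of $z$ give probabilities near 1, and large negative values give probabilities near 0, which produces the characteristic S-shaped curve plotted later in this tutorial.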
In MATLAB, fitglm creates a generalized linear model, and logistic regression is one of the supported GLM families. The typical binary logistic regression setup uses a binomial distribution and logit link. This is part of the fitglm workflow documented by MathWorks.
For logistic regression in MATLAB, the most useful tools are:
fitglm → binary GLM logistic regression
predict → predicted probabilities or fitted responses from the model
fitclinear → efficient logistic regression for binary classification in larger problems
fitcecoc → multiclass classification using binary learners
Classification Learner → GUI-based classification workflow

MathWorks’ classifier options page explicitly states that binary GLM logistic regression in Classification Learner uses fitglm, and efficient logistic regression uses fitclinear for binary data and fitcecoc for multiclass data.
Let us begin with a small binary classification example.
clc;
clear;
close all;
% Example data
X = [1; 2; 3; 4; 5; 6; 7; 8];
Y = [0; 0; 0; 0; 1; 1; 1; 1];
% Fit logistic regression model
mdl = fitglm(X, Y, 'Distribution', 'binomial');
% Display model
disp(mdl);

This code fits a binary generalized linear model using fitglm with a binomial distribution, which is the standard MATLAB route for GLM logistic regression. fitglm(X, Y) returns a generalized linear regression model fit to predictors X and response Y, and additional options can be specified as name-value arguments.
A major strength of logistic regression is that it predicts probabilities, not only labels.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6; 7; 8];
Y = [0; 0; 0; 0; 1; 1; 1; 1];
mdl = fitglm(X, Y, 'Distribution', 'binomial');
% Predicted probabilities
p = predict(mdl, X);
disp('Predicted probabilities:');
disp(p);

With logistic regression, the output of predict is interpreted as the model’s fitted response, which for binomial logistic regression is the estimated probability of the positive class. This fits with how MathWorks describes logistic classifiers as modeling class probabilities.
To turn probabilities into final class predictions, we compare them to a threshold.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6; 7; 8];
Y = [0; 0; 0; 0; 1; 1; 1; 1];
mdl = fitglm(X, Y, 'Distribution', 'binomial');
p = predict(mdl, X);
% Convert probabilities to labels
YPred = double(p >= 0.5);
disp(table(X, Y, p, YPred));

This shows the typical idea of logistic classification: estimate the class probability and then map it to a label. In MATLAB app-based workflows, the class with the larger assigned probability is selected as the predicted class.
A useful visualization for one predictor is the S-shaped probability curve.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6; 7; 8];
Y = [0; 0; 0; 0; 1; 1; 1; 1];
mdl = fitglm(X, Y, 'Distribution', 'binomial');
xFine = linspace(min(X), max(X), 100)';
pFine = predict(mdl, xFine);
plot(X, Y, 'o');
hold on;
plot(xFine, pFine, '-', 'LineWidth', 2);
xlabel('X');
ylabel('Probability / Class');
title('Logistic Regression Curve');
legend('Training Data', 'Predicted Probability');
grid on;

This graph helps you see how the predicted probability changes as the predictor increases. It is one of the clearest ways to understand logistic regression with a single feature.
In machine learning, we should evaluate the model on unseen data.
clc;
clear;
close all;
% Example dataset
X = [1; 2; 3; 4; 5; 6; 7; 8; 2.5; 6.5];
Y = [0; 0; 0; 0; 1; 1; 1; 1; 0; 1];
% Random split
rng(1);
idx = randperm(length(X));
trainIdx = idx(1:7);
testIdx = idx(8:end);
XTrain = X(trainIdx);
YTrain = Y(trainIdx);
XTest = X(testIdx);
YTest = Y(testIdx);
% Train model
mdl = fitglm(XTrain, YTrain, 'Distribution', 'binomial');
% Predict probabilities
pTest = predict(mdl, XTest);
% Convert to labels
YPred = double(pTest >= 0.5);
% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);
disp(table(XTest, YTest, pTest, YPred));

This basic workflow mirrors what MATLAB’s Classification Learner also automates: import data, choose validation, train the model, and evaluate validation performance. The app’s default validation scheme is 5-fold cross-validation.
A confusion matrix shows the number of correct and incorrect predictions.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6; 7; 8; 2.5; 6.5];
Y = [0; 0; 0; 0; 1; 1; 1; 1; 0; 1];
mdl = fitglm(X, Y, 'Distribution', 'binomial');
p = predict(mdl, X);
YPred = double(p >= 0.5);
cm = confusionmat(Y, YPred);
disp('Confusion Matrix:');
disp(cm);
confusionchart(Y, YPred);
title('Confusion Matrix');

Confusion matrices are a standard way to evaluate classification models, especially when raw accuracy is not enough.
Real classification tasks usually depend on more than one variable. For example, predicting whether a patient has a condition might depend on age, weight, blood pressure, and other features.
clc;
clear;
close all;
% Two predictors
X1 = [1;2;3;4;5;6;7;8];
X2 = [8;7;6;5;4;3;2;1];
% Response
Y = [0;0;0;0;1;1;1;1];
% Build table
tbl = table(X1, X2, Y);
% Fit logistic regression
mdl = fitglm(tbl, 'Y ~ X1 + X2', 'Distribution', 'binomial');
disp(mdl);

MathWorks documents formula-based modeling with fitglm, including table workflows where you can specify predictors and response using model formulas.
clc;
clear;
close all;
X1 = [1;2;3;4;5;6;7;8];
X2 = [8;7;6;5;4;3;2;1];
Y = [0;0;0;0;1;1;1;1];
tbl = table(X1, X2, Y);
mdl = fitglm(tbl, 'Y ~ X1 + X2', 'Distribution', 'binomial');
newData = table([2;6], [7;3], 'VariableNames', {'X1','X2'});
pNew = predict(mdl, newData);
YPredNew = double(pNew >= 0.5);
disp(table(newData.X1, newData.X2, pNew, YPredNew, ...
'VariableNames', {'X1','X2','Probability','PredictedClass'}));

This is the practical prediction workflow you would use after training a binary logistic regression model.
MathWorks provides an example showing Classification Learner with the patients dataset for binary GLM logistic regression. The example uses predictors like Age, Diastolic, Height, Systolic, and Weight, and response Gender.
Below is a code-based version:
clc;
clear;
close all;
load patients
% Predictors
tbl = table(Age, Diastolic, Height, Systolic, Weight, Gender);
% Remove missing values if needed
tbl = rmmissing(tbl);
% Convert the text response to a two-level categorical for fitglm
tbl.Gender = categorical(tbl.Gender);
% Fit binary logistic regression
mdl = fitglm(tbl, 'Gender ~ Age + Diastolic + Height + Systolic + Weight', ...
'Distribution', 'binomial');
disp(mdl);
% Predicted probabilities
p = predict(mdl, tbl);
% Convert probabilities into class labels
% Depending on class coding, inspect probabilities first in practice
genderCats = unique(tbl.Gender);
YPred = strings(height(tbl), 1);
YPred(p >= 0.5) = string(genderCats(2));
YPred(p < 0.5) = string(genderCats(1));
disp('First few predicted probabilities:');
disp(p(1:10));

This example is mainly pedagogical. In real practice, when the response is categorical text labels, you should verify how MATLAB encodes the response internally before manually assigning labels from probabilities.
Accuracy is the percentage of correctly classified examples.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6; 7; 8; 2.5; 6.5];
Y = [0; 0; 0; 0; 1; 1; 1; 1; 0; 1];
mdl = fitglm(X, Y, 'Distribution', 'binomial');
p = predict(mdl, X);
YPred = double(p >= 0.5);
accuracy = mean(YPred == Y) * 100;
fprintf('Accuracy = %.2f%%\n', accuracy);

Accuracy is easy to understand, but it can be misleading when classes are imbalanced.
For many classification tasks, especially imbalanced ones, accuracy is not enough. MATLAB’s recent Classification Learner metrics include precision, recall, and F1 score among model performance metrics in the app.
You can compute them manually:
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6; 7; 8; 2.5; 6.5];
Y = [0; 0; 0; 0; 1; 1; 1; 1; 0; 1];
mdl = fitglm(X, Y, 'Distribution', 'binomial');
p = predict(mdl, X);
YPred = double(p >= 0.5);
TP = sum((Y == 1) & (YPred == 1));
TN = sum((Y == 0) & (YPred == 0));
FP = sum((Y == 0) & (YPred == 1));
FN = sum((Y == 1) & (YPred == 0));
precision = TP / (TP + FP);
recall = TP / (TP + FN);
f1 = 2 * (precision * recall) / (precision + recall);
fprintf('Precision = %.4f\n', precision);
fprintf('Recall = %.4f\n', recall);
fprintf('F1-score = %.4f\n', f1);

These metrics often tell a more useful story than accuracy alone.
MathWorks distinguishes standard binary GLM logistic regression from efficient logistic regression. According to the classifier options page, efficient logistic regression uses fitclinear for binary class data and is recommended when you have many predictors or many observations, trading some accuracy for faster training.
Example:
clc;
clear;
close all;
% Example matrix predictors
X = [1 5;
2 4;
3 3;
4 2;
5 1;
6 2;
7 3;
8 4];
Y = [0;0;0;0;1;1;1;1];
% Efficient logistic regression
mdl = fitclinear(X, Y, ...
'Learner', 'logistic', ...
'Regularization', 'ridge');
[label, score] = predict(mdl, X);
disp('Predicted labels:');
disp(label);
disp('Scores:');
disp(score);

This is a good option when scaling up beyond small demonstration datasets.
Logistic regression can be extended to more than two classes, but multiclass classification is handled differently from simple binary logistic regression. In MATLAB’s classifier options, efficient logistic regression for multiclass data uses fitcecoc.
A simple practical way is to use fitcecoc with linear learners:
clc;
clear;
close all;
load fisheriris
X = meas;
Y = species;
% Multiclass linear classification via ECOC
t = templateLinear('Learner', 'logistic');
Mdl = fitcecoc(X, Y, 'Learners', t);
YPred = predict(Mdl, X);
accuracy = mean(strcmp(YPred, Y)) * 100;
fprintf('Training Accuracy = %.2f%%\n', accuracy);
confusionchart(Y, YPred);
title('Multiclass Logistic-Style ECOC Classification');

This is a practical multiclass workflow in MATLAB when you want a linear classifier with logistic-style learners.
MATLAB’s Classification Learner app lets you import data, choose predictors and response, set validation schemes, train logistic regression classifiers, compare models, assess results, and investigate predictor contributions.
Open it with:
classificationLearner

Typical steps: import your data, select the predictors and response, choose a validation scheme, train a logistic regression model, and assess the results.
MathWorks’ example explicitly shows training a binary GLM logistic regression model in Classification Learner, with 5-fold cross-validation as the default validation choice.
Here is a small complete project.
clc;
clear;
close all;
% Example student data
StudyHours = [1;2;2;3;4;5;5;6;7;8];
Attendance = [50;55;60;65;70;75;80;85;90;95];
Pass = [0;0;0;0;0;1;1;1;1;1];
tbl = table(StudyHours, Attendance, Pass);
% Train/test split
rng(2);
idx = randperm(height(tbl));
trainIdx = idx(1:7);
testIdx = idx(8:end);
trainTbl = tbl(trainIdx, :);
testTbl = tbl(testIdx, :);
% Train logistic regression
mdl = fitglm(trainTbl, 'Pass ~ StudyHours + Attendance', ...
'Distribution', 'binomial');
disp(mdl);
% Predict probabilities on test set
pTest = predict(mdl, testTbl(:, {'StudyHours','Attendance'}));
% Convert probabilities to labels
YPred = double(pTest >= 0.5);
YTest = testTbl.Pass;
% Accuracy
accuracy = mean(YPred == YTest) * 100;
fprintf('Test Accuracy = %.2f%%\n', accuracy);
% Confusion matrix
confusionchart(YTest, YPred);
title('Pass/Fail Logistic Regression');
% Predict a new student
newStudent = table(6, 82, 'VariableNames', {'StudyHours','Attendance'});
pNew = predict(mdl, newStudent);
classNew = double(pNew >= 0.5);
fprintf('Predicted probability of passing = %.4f\n', pNew);
fprintf('Predicted class = %d\n', classNew);

This project includes dataset creation, train/test split, model fitting, probability prediction, class conversion, accuracy calculation, and inference for a new example.
Logistic regression is for classification, not for predicting continuous numeric values.
The raw output is usually a probability, not the final class label.
For imbalanced classes, precision, recall, and F1-score can matter more than accuracy. MATLAB’s app now exposes these metrics directly.
A model can look good on training data and still generalize poorly. Classification Learner’s default 5-fold cross-validation exists to help protect against overfitting.
For a small and interpretable binary GLM, fitglm is a natural choice. For many predictors or many observations, MathWorks suggests considering efficient logistic regression with fitclinear.
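The 5-fold cross-validation mentioned above can also be run manually in code. Below is a minimal sketch, assuming the small example data used earlier; cvpartition, training, and test are standard Statistics and Machine Learning Toolbox functions.

% Sketch: manual 5-fold cross-validation for a fitglm logistic model
X = [1; 2; 3; 4; 5; 6; 7; 8; 2.5; 6.5];
Y = [0; 0; 0; 0; 1; 1; 1; 1; 0; 1];
rng(1);
cvp = cvpartition(length(Y), 'KFold', 5);
acc = zeros(cvp.NumTestSets, 1);
for k = 1:cvp.NumTestSets
    trIdx = training(cvp, k);   % logical index of the training fold
    teIdx = test(cvp, k);       % logical index of the held-out fold
    mdl = fitglm(X(trIdx), Y(trIdx), 'Distribution', 'binomial');
    p = predict(mdl, X(teIdx));
    acc(k) = mean(double(p >= 0.5) == Y(teIdx));
end
fprintf('Mean 5-fold CV accuracy = %.2f%%\n', mean(acc) * 100);

With such a tiny dataset some folds may contain only one class and fitglm may warn, but the pattern scales directly to realistic data.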
Logistic Regression is one of the best starting points for classification. In MATLAB, binary GLM logistic regression is commonly handled with fitglm, efficient logistic regression with fitclinear, and multiclass workflows with fitcecoc. The Classification Learner app provides a no-code route to import data, validate models, compare classifiers, and export the trained result.
A good practical workflow is:
For binary GLM logistic regression with fitglm:

mdl = fitglm(X, Y, 'Distribution', 'binomial');
p = predict(mdl, Xnew);
YPred = double(p >= 0.5);

For efficient binary logistic regression with fitclinear:

mdl = fitclinear(X, Y, 'Learner', 'logistic');
[label, score] = predict(mdl, Xnew);

For multiclass classification with logistic learners via fitcecoc:

t = templateLinear('Learner', 'logistic');
mdl = fitcecoc(X, Y, 'Learners', t);
YPred = predict(mdl, Xnew);

For the app-based workflow, open Classification Learner with:

classificationLearner

These code patterns match the documented MATLAB ecosystem for logistic regression and related classifier workflows.
As a practice exercise, load the patients dataset or another binary dataset, train a classification model using logistic regression, and then predict the class of new observations.
With categorical responses like Gender, MATLAB may internally choose which category is treated as the positive class. So in real practice, after training, you should verify carefully which class corresponds to higher predicted probabilities before assigning final names.
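One way to verify this empirically is sketched below; it assumes the patients dataset, converts the response to categorical, and uses groupsummary (available in R2018a and later) to compare mean predicted probabilities per true class.

% Sketch: check which Gender category corresponds to high predicted
% probabilities before assigning final label names.
load patients
G = categorical(Gender);
tbl = table(Age, Weight, G);
mdl = fitglm(tbl, 'G ~ Age + Weight', 'Distribution', 'binomial');
p = predict(mdl, tbl);
% Mean predicted probability per true class: the class whose mean
% probability is closer to 1 is the one the model scores as "positive".
disp(groupsummary(table(G, p), 'G', 'mean', 'p'));

Once you see which class sits near probability 1, you can safely map thresholded probabilities back to class names.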
Binary logistic regression with one predictor and a plotted logistic curve.
Logistic regression with two predictors and predicted probabilities.
Train/test split and accuracy calculation.
Confusion matrix, precision, recall, and F1-score.
Real dataset example using patients and prediction for new observations.