Linear Regression is one of the most important machine learning algorithms for predicting continuous values. In MATLAB, the main function for fitting linear regression models is fitlm, which returns a LinearModel object you can inspect, evaluate, and use for prediction. MATLAB also provides related tools such as predict, stepwiselm, regress, and the Regression Learner app for GUI-based workflows.
This tutorial explains the theory of linear regression, how it works, when to use it, and how to implement it in MATLAB with practical examples.
Linear Regression is a supervised learning method used to model the relationship between a response variable and one or more predictor variables. The goal is to predict a continuous output such as price, salary, temperature, or sales. MathWorks defines a linear regression model as one that describes the relationship between a dependent variable y and one or more independent variables X.
In simple terms, the model fits a straight line (or, with several predictors, a hyperplane) that best captures the trend in the data.
A general linear regression equation looks like this:
$y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n$
where:
- y is the predicted output,
- b0 is the intercept,
- b1, b2, ... are the coefficients,
- x1, x2, ... are the predictor values.

Linear Regression is popular because it is simple, fast to train, and easy to interpret. It is especially useful when the target variable is numeric and when the relationship between predictors and the response is approximately linear.
Response variable: this is the output you want to predict, such as house price.
Predictor variables: these are the inputs used to explain or predict the response.
Intercept: the predicted value of y when all predictors are zero.
Coefficients: each coefficient tells you how much the response changes when the corresponding predictor changes by one unit, while the other predictors stay fixed.
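As a quick illustration, here is a minimal sketch (using toy data, the same pattern as the examples later in this tutorial) showing that the intercept and coefficients can be read from the Estimate column of a fitted model's coefficient table:

```matlab
% Fit a simple model and read the intercept and slope estimates
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
b0 = mdl.Coefficients.Estimate(1);  % intercept (first row of the table)
b1 = mdl.Coefficients.Estimate(2);  % coefficient of the single predictor
fprintf('Intercept b0 = %.4f, slope b1 = %.4f\n', b0, b1);
```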
Residuals are the differences between real values and predicted values:
$\text{Residual} = y_{\text{actual}} - y_{\text{predicted}}$
Residual analysis is an important part of assessing model quality, and MathWorks recommends checking residual plots and goodness-of-fit measures such as R^2 and adjusted R^2.
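To make the residual definition concrete, here is a minimal sketch (toy data) showing that residuals computed by hand match the ones stored on the LinearModel object:

```matlab
% Compare hand-computed residuals with the model's stored residuals
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
resByHand = Y - predict(mdl, X);       % actual minus predicted
disp([resByHand, mdl.Residuals.Raw]);  % the two columns should agree
```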
The main MATLAB functions and tools for linear regression are:
- fitlm → fit linear regression models
- predict → predict outputs for new data
- stepwiselm → build models with stepwise variable selection
- regress → classical multiple linear regression function
- Regression Learner app → visual interface for regression modeling
- LinearModel object → inspect coefficients, fitted values, residuals, and model statistics

Let us begin with a simple dataset where one variable predicts another.
clc;
clear;
close all;
% Example data
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
% Fit linear regression model
mdl = fitlm(X, Y);
% Display model
disp(mdl);

Here:

- X contains one predictor,
- Y contains the output values,
- fitlm(X,Y) fits a linear regression model.

According to MathWorks, fitlm(X,y) returns a linear regression model fit to predictor matrix X and response y.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
% Scatter plot of data
plot(X, Y, 'o');
hold on;
% Predicted values
YPred = predict(mdl, X);
% Regression line
plot(X, YPred, '-');
xlabel('X');
ylabel('Y');
title('Simple Linear Regression');
legend('Data', 'Regression Line');
grid on;

This is often the first useful visualization: the points show the real data and the line shows the fitted linear trend.
When you display mdl, MATLAB reports information such as the coefficient estimates with their standard errors, t-statistics, and p-values, the number of observations, and R-squared. MathWorks provides dedicated documentation for interpreting linear regression outputs and understanding the coefficient table and goodness-of-fit statistics.
Example:
disp(mdl.Coefficients);

This displays the coefficient table.
Once the model is trained, you can predict outputs for new observations using predict.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
% New predictor values
Xnew = [7; 8; 9];
% Predict
YPredNew = predict(mdl, Xnew);
disp('Predicted values:');
disp(YPredNew);
MathWorks documents predict(mdl,Xnew) as the standard way to return predicted responses from a linear regression model.
Multiple linear regression uses more than one predictor.
Example idea:
- x1 = size of a house
- x2 = number of rooms
- x3 = age of the house
- y = price

The model becomes:
$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3$
clc;
clear;
close all;
% Predictor matrix: columns are features
X = [1 5;
2 4;
3 6;
4 8;
5 7;
6 9];
% Response
Y = [10; 12; 13; 16; 17; 20];
% Fit multiple linear regression
mdl = fitlm(X, Y);
disp(mdl);
Here, fitlm automatically handles multiple predictors when X has multiple columns.

MATLAB also works very well with tables, which often makes code more readable.
clc;
clear;
close all;
Size = [50; 60; 70; 80; 90; 100];
Rooms = [2; 3; 3; 4; 4; 5];
Price = [100; 120; 140; 160; 180; 210];
tbl = table(Size, Rooms, Price);
mdl = fitlm(tbl, 'Price ~ Size + Rooms');
disp(mdl);

MathWorks documents fitlm(tbl) and formula-based workflows such as fitlm(tbl,modelspec) for table inputs. It also notes that when you pass a table without specifying otherwise, the last variable is treated as the response.
clc;
clear;
close all;
Size = [50; 60; 70; 80; 90; 100];
Rooms = [2; 3; 3; 4; 4; 5];
Price = [100; 120; 140; 160; 180; 210];
tbl = table(Size, Rooms, Price);
mdl = fitlm(tbl, 'Price ~ Size + Rooms');
newData = table([75; 95], [3; 5], 'VariableNames', {'Size','Rooms'});
YPred = predict(mdl, newData);
disp('Predicted prices:');
disp(YPred);

One of the most common measures of regression quality is R-squared. It measures how much of the variability in the response is explained by the model.
In MATLAB:
mdl.Rsquared

MathWorks highlights R^2 and adjusted R^2 as core goodness-of-fit measures in linear regression analysis.
Example:
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
disp('R-squared:');
disp(mdl.Rsquared.Ordinary);
disp('Adjusted R-squared:');
disp(mdl.Rsquared.Adjusted);

Beyond R-squared, other very common regression metrics are MAE (mean absolute error), MSE (mean squared error), and RMSE (root mean squared error):
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
YPred = predict(mdl, X);
MAE = mean(abs(Y - YPred));
MSE = mean((Y - YPred).^2);
RMSE = sqrt(MSE);
fprintf('MAE = %.4f\n', MAE);
fprintf('MSE = %.4f\n', MSE);
fprintf('RMSE = %.4f\n', RMSE);

These metrics are especially useful when comparing several regression models.
Residual plots help you understand whether the model assumptions are reasonable.
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
figure;
plotResiduals(mdl, 'fitted');
title('Residuals vs Fitted Values');
figure;
plotResiduals(mdl, 'probability');
title('Normal Probability Plot of Residuals');
MathWorks specifically recommends examining residuals and looking for patterns when assessing the quality of a linear regression fit.
A model may fit the training data well but still perform poorly on new unseen data. That is why a train/test split is important in machine learning.
clc;
clear;
close all;
% Example dataset
X = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10];
Y = [1.2; 2.1; 2.9; 4.1; 5.2; 5.9; 7.1; 8.2; 8.8; 10.1];
% Split index
rng(1);
idx = randperm(length(X));
trainIdx = idx(1:7);
testIdx = idx(8:end);
XTrain = X(trainIdx);
YTrain = Y(trainIdx);
XTest = X(testIdx);
YTest = Y(testIdx);
% Train model
mdl = fitlm(XTrain, YTrain);
% Predict on test set
YPred = predict(mdl, XTest);
% Test RMSE
RMSE = sqrt(mean((YTest - YPred).^2));
fprintf('Test RMSE = %.4f\n', RMSE);
% Compare real and predicted
disp(table(XTest, YTest, YPred));
This simple workflow is a good starting point for practical machine learning projects.
MATLAB includes several built-in datasets. One well-known example for regression is carsmall.
clc;
clear;
close all;
load carsmall
% Create table
tbl = table(Horsepower, Weight, MPG);
% Remove missing values
tbl = rmmissing(tbl);
% Fit model: predict MPG using Horsepower and Weight
mdl = fitlm(tbl, 'MPG ~ Horsepower + Weight');
disp(mdl);
% Predicted MPG
YPred = predict(mdl, tbl(:, {'Horsepower','Weight'}));
% Plot actual vs predicted
figure;
plot(tbl.MPG, YPred, 'o');
xlabel('Actual MPG');
ylabel('Predicted MPG');
title('Actual vs Predicted MPG');
grid on;
This is a more realistic example of multiple linear regression.
Sometimes the effect of one predictor depends on another. In that case, you can include interaction terms.
clc;
clear;
close all;
load carsmall
tbl = table(Horsepower, Weight, MPG);
tbl = rmmissing(tbl);
mdl = fitlm(tbl, 'MPG ~ Horsepower * Weight');
disp(mdl);

Formula syntax in fitlm supports model specifications and interactions, which is one of the reasons the LinearModel workflow is flexible.
When you have many predictors, not all of them may be useful. Stepwise regression is a variable selection method that adds or removes predictors iteratively.
MathWorks describes stepwise regression as a dimensionality-reduction method where less important variables are successively removed in an automatic iterative process, and it can be done with stepwiselm, stepwisefit, or the Regression Learner app.
clc;
clear;
close all;
load carsmall
tbl = table(Horsepower, Weight, Acceleration, MPG);
tbl = rmmissing(tbl);
mdl = stepwiselm(tbl, 'MPG ~ 1', 'Upper', 'linear');
disp(mdl);

This allows MATLAB to test which predictors improve the model.
MATLAB also has the function regress, but MathWorks notes that fitlm is usually preferable when you want a richer model object and more analysis tools.
Example:
clc;
clear;
close all;
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
% Add intercept column
Xreg = [ones(size(X)) X];
[b, bint, r, rint, stats] = regress(Y, Xreg);
disp('Coefficients:');
disp(b);
disp('Stats [R2 F p error variance]:');
disp(stats);

This is more classical and lower-level than fitlm.
MATLAB provides the Regression Learner app, which lets you import data, choose validation schemes, train models, optimize hyperparameters, compare results, and inspect predictor contributions.
To open it, type:

regressionLearner

Typical steps are to import your data, choose a validation scheme, train one or more models, compare the results, and export the best model for use in code. This is very useful for beginners.
Let us build a small end-to-end linear regression project.
clc;
clear;
close all;
% Example house dataset
Size = [50; 60; 70; 80; 90; 100; 110; 120];
Rooms = [2; 3; 3; 4; 4; 5; 5; 6];
Age = [20; 18; 15; 12; 10; 8; 5; 3];
Price = [100; 120; 135; 150; 170; 190; 210; 230];
tbl = table(Size, Rooms, Age, Price);
% Train/test split
rng(2);
idx = randperm(height(tbl));
trainIdx = idx(1:6);
testIdx = idx(7:8);
trainTbl = tbl(trainIdx, :);
testTbl = tbl(testIdx, :);
% Train model
mdl = fitlm(trainTbl, 'Price ~ Size + Rooms + Age');
disp(mdl);
% Predict
YPred = predict(mdl, testTbl(:, {'Size','Rooms','Age'}));
% Evaluate
YTest = testTbl.Price;
RMSE = sqrt(mean((YTest - YPred).^2));
MAE = mean(abs(YTest - YPred));
fprintf('Test RMSE = %.4f\n', RMSE);
fprintf('Test MAE = %.4f\n', MAE);
% Show results
disp(table(YTest, YPred));
% Predict a new house
newHouse = table(95, 4, 7, 'VariableNames', {'Size','Rooms','Age'});
newPrice = predict(mdl, newHouse);
fprintf('Predicted price for new house = %.2f\n', newPrice);

This project covers data preparation, a train/test split, model training, evaluation with RMSE and MAE, and prediction for new data. That is a strong beginner workflow.
Some common pitfalls to keep in mind:

- Linear Regression is for continuous outputs, not class labels.
- A high R^2 is not enough; residual plots can reveal model problems.
- Including too many predictors can lead to unnecessary complexity. Stepwise methods or thoughtful feature selection can help.
- Extrapolation beyond the range of the training data can produce unreliable predictions.
- A useful predictive relationship does not automatically imply a causal relationship.
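The extrapolation caveat can be illustrated with a quick sketch (toy data; the values are illustrative):

```matlab
% Toy data, same pattern as the earlier examples
X = [1; 2; 3; 4; 5; 6];
Y = [2; 4; 5; 4; 5; 7];
mdl = fitlm(X, Y);
% Interpolation (inside the training range) vs extrapolation (far outside it)
YPred = predict(mdl, [3.5; 100]);
fprintf('At x = 3.5 (interpolation):  %.2f\n', YPred(1));
fprintf('At x = 100 (extrapolation): %.2f\n', YPred(2));
% The second prediction simply extends the fitted line; nothing in the
% model checks whether the linear trend still holds that far out.
```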
Linear Regression is a good choice when the response is numeric, the relationship between predictors and response is approximately linear, and you want an interpretable model. You may choose another model when the relationship is strongly nonlinear, when there are complex interactions that are hard to specify by hand, or when predictive accuracy matters more than interpretability.
Linear Regression is one of the best starting points in machine learning. In MATLAB, the standard workflow uses fitlm to fit the model, predict to generate outputs, residual plots and R^2 to evaluate quality, and tools such as stepwiselm or Regression Learner for model improvement and exploration. MathWorks also positions fitlm as a central least-squares workflow for linear regression analysis and documents it as the typical training path for linear regression models.
A practical workflow is:

- Fit a model with fitlm: mdl = fitlm(X, Y); or, with tables, mdl = fitlm(tbl, 'Y ~ X1 + X2');
- Predict with predict: YPred = predict(mdl, Xnew);
- Select predictors with stepwiselm: mdl = stepwiselm(tbl, 'Y ~ 1', 'Upper', 'linear');
- Use the classical alternative where needed: [b, bint, r, rint, stats] = regress(Y, Xreg);
- Explore interactively with the Regression Learner app: regressionLearner

To practice, try:

- the carsmall dataset to predict MPG from Horsepower and Weight,
- stepwiselm to select useful predictors for a regression problem.
The worked examples in this tutorial included:

- Simple linear regression with one predictor and a regression line plot.
- Multiple linear regression with two predictors and coefficient display.
- Train/test split and RMSE calculation.
- A real regression example using the carsmall dataset.
- Automatic predictor selection using stepwiselm.