0) Introduction

Linear Regression is one of the most important and widely used machine learning algorithms for regression tasks. In scikit-learn, the main estimator is LinearRegression, which fits a linear model to minimize the residual sum of squares between the observed targets and the predictions made by the linear approximation.

1) What Linear Regression does

Linear Regression models the relationship between one or more input features and a continuous target value using a linear (weighted-sum) formula.

In mathematical form:

$\hat{y} = w_0 + w_1x_1 + w_2x_2 + \dots + w_px_p$

Scikit-learn’s linear models guide describes this exactly: the predicted value is a linear combination of the features, where the coefficients are stored in coef_ and the intercept is stored in intercept_.
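As a quick sanity check, the minimal sketch below (with made-up numbers) verifies that predict() is exactly this linear combination of coef_ and intercept_:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative dataset (values are made up for this sketch)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([10.0, 8.0, 20.0, 18.0])

model = LinearRegression().fit(X, y)

# predict() is exactly the linear combination intercept_ + X @ coef_
manual = model.intercept_ + X @ model.coef_
print(np.allclose(manual, model.predict(X)))  # True
```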

Example intuition

Suppose you want to predict a house price using features such as the size in square meters, the number of bedrooms, and the distance to the city center.

A linear regression model might learn something like: the price rises with size, rises slightly with the number of bedrooms, and falls with distance to the center.

Then it combines those effects into one numeric prediction.

2) Why Linear Regression is useful

Linear Regression is popular because it is simple to understand, fast to train, and easy to interpret.

It is often the first model people try when the target is numeric, because it gives clear coefficient-based explanations and trains quickly. Scikit-learn documents LinearRegression as ordinary least squares linear regression.

3) Simple linear regression vs multiple linear regression

Simple linear regression

Uses one input feature.

Example: predicting a house price from its size alone.

Formula:

$\hat{y} = w_0 + w_1x$

Multiple linear regression

Uses more than one input feature.

Example: predicting a house price from its size, number of bedrooms, and age.

Formula:

$\hat{y} = w_0 + w_1x_1 + w_2x_2 + w_3x_3$

Both are handled by the same LinearRegression estimator in scikit-learn.

4) What the coefficients mean

After training, a linear regression model gives you one coefficient (weight) per feature and a single intercept term.

Scikit-learn’s linear model guide explains that coef_ stores the feature weights and intercept_ stores the independent term.

Interpretation

If a coefficient is positive: increasing that feature increases the predicted value.

If a coefficient is negative: increasing that feature decreases the predicted value.

Example: a coefficient of 2.5 for a size feature.

This means: each additional unit of size adds 2.5 units to the prediction, holding the other features fixed.

This interpretation works best when the model assumptions are reasonably satisfied.

5) What “ordinary least squares” means

Scikit-learn states that LinearRegression fits a model by minimizing the residual sum of squares.

Residual:

$\text{residual} = \text{actual value} - \text{predicted value}$

So the model tries to make the squared errors as small as possible.

Why square them? Squaring keeps positive and negative errors from canceling out and penalizes large errors more heavily than small ones.

This is why ordinary least squares is often abbreviated as OLS.
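To make "minimizing the residual sum of squares" concrete, the sketch below (on synthetic data with made-up coefficients) solves the same least-squares problem directly with NumPy and checks that it matches what LinearRegression learns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.randn(50)

model = LinearRegression().fit(X, y)

# The same least-squares problem solved directly with NumPy:
# prepend a column of ones so the intercept is part of the solution
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(w[0], model.intercept_))  # True
print(np.allclose(w[1:], model.coef_))      # True
```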

6) When Linear Regression works well

Linear Regression works best when the relationship between features and target is roughly linear, the features are not strongly correlated with each other, and the residuals show no systematic pattern.

It can still be useful as a baseline even when the relationship is not perfectly linear, because it is quick and interpretable.

7) Main assumptions of Linear Regression

In practical machine learning, Linear Regression is often used even when assumptions are not perfectly met. But it is still helpful to know the classic assumptions: linearity of the relationship, independence of the errors, constant error variance (homoscedasticity), approximately normal errors, and no perfect multicollinearity among the features.

These assumptions are part of the standard statistical interpretation of linear regression. Scikit-learn focuses more on prediction than formal inference, but the linear structure of the model remains the same.

Important note

Scikit-learn’s LinearRegression is aimed at prediction, not full statistical inference like p-values or confidence intervals.

Part I — First Example

8) Install required libraries

pip install numpy pandas matplotlib scikit-learn

9) A simple Linear Regression example

We will use a synthetic dataset.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Create synthetic data
rng = np.random.RandomState(42)
X = 2 * rng.rand(200, 1)
y = 4 + 3 * X[:, 0] + rng.randn(200)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Build model
model = LinearRegression()

# Train
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

train_test_split is scikit-learn’s standard utility for splitting data into random train and test subsets, LinearRegression fits the least-squares model, and mean_squared_error plus r2_score are standard regression metrics.

10) Plot the regression line

plt.figure(figsize=(8, 6))
plt.scatter(X_test, y_test, label="Actual data")
plt.plot(X_test, y_pred, linewidth=2, label="Regression line")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression Example")
plt.legend()
plt.show()

This lets you see the fitted straight-line relationship between the feature and the target.

Part II — A Real Dataset Example

11) Linear Regression on the California housing dataset

A common workflow is to use a real regression dataset with multiple features.

import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

The LinearRegression estimator supports multiple features directly. In the train/test workflow, always split before any preprocessing to avoid leakage; scikit-learn’s common pitfalls guide warns about exactly this.

12) Inspect coefficients

coef_table = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": model.coef_
})

print(coef_table.sort_values(by="Coefficient", key=abs, ascending=False))
print("Intercept:", model.intercept_)

This is one of the biggest advantages of Linear Regression: it is easy to inspect and explain.

Part III — Evaluation Metrics

13) Mean Squared Error (MSE)

Scikit-learn defines mean_squared_error as the mean squared error regression loss.

Formula:

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Interpretation:

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)

14) Root Mean Squared Error (RMSE)

Scikit-learn provides root_mean_squared_error, added in version 1.4.

Interpretation: RMSE is in the same units as the target, which makes it easier to communicate than MSE.

from sklearn.metrics import root_mean_squared_error

rmse = root_mean_squared_error(y_test, y_pred)
print("RMSE:", rmse)
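If you are on a scikit-learn version older than 1.4, an equivalent fallback (shown here with made-up numbers) is to take the square root of mean_squared_error yourself:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Fallback RMSE for scikit-learn < 1.4: square root of the MSE
y_true = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

rmse = np.sqrt(mean_squared_error(y_true, y_hat))
print(round(rmse, 4))  # 0.9354
```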

15) R² score

Scikit-learn documents r2_score as the coefficient of determination, where the best possible score is 1.0, a constant-mean predictor gets 0.0 in the usual non-constant-target case, and values can be negative if the model is worse than that baseline.

Interpretation: the closer R² is to 1, the more of the target's variance the model explains.

from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)
print("R²:", r2)

16) MAE

You can also use Mean Absolute Error.

from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(y_test, y_pred)
print("MAE:", mae)

MAE is often easier to interpret because it uses absolute errors instead of squared errors. Scikit-learn’s model evaluation guide includes multiple regression metrics for exactly these tradeoffs.
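A small made-up example shows the tradeoff: with a single large error among ten predictions, RMSE reacts much more strongly than MAE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.zeros(10)
y_pred = np.zeros(10)
y_pred[0] = 10.0  # one large error, nine perfect predictions

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print("MAE:", mae)    # 1.0
print("RMSE:", rmse)  # ~3.16
```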

Part IV — Train/Test Split and Good Practice

17) Why train/test split matters

Scikit-learn’s train_test_split utility is the standard way to create training and test subsets.

The model should be trained on one part of the data and evaluated on separate unseen data. This gives a better estimate of real-world performance.

Bad practice: training and evaluating on the same data, which produces an overly optimistic score.

Better practice: evaluating on a held-out test set the model never saw during training.

Scikit-learn’s common pitfalls guide explicitly warns against data leakage and recommends splitting before preprocessing.

Part V — Feature Scaling and Linear Regression

18) Does Linear Regression need scaling?

Plain LinearRegression does not require scaling to work. The least-squares solution is still valid without scaling. But scaling can help when you want to compare coefficient magnitudes across features, or when the linear model is combined with regularization or gradient-based solvers.

This is a practical guideline based on how linear models and preprocessing work in scikit-learn. The OLS solution itself does not depend on distance geometry the way KNN or SVM does.

19) Example with scaling in a pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("lr", LinearRegression())
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

print("R²:", r2_score(y_test, y_pred))

Using a Pipeline keeps preprocessing and prediction together and helps avoid leakage mistakes. Scikit-learn recommends this style for safe workflows.

Part VI — Multiple Linear Regression Workflow

20) Full workflow on a CSV file

Step 1: Load data

import pandas as pd

df = pd.read_csv("your_data.csv")

Step 2: Separate features and target

X = df.drop("target", axis=1)
y = df["target"]

Step 3: Split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Step 4: Train

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

Step 5: Predict

y_pred = model.predict(X_test)

Step 6: Evaluate

from sklearn.metrics import mean_squared_error, r2_score

print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

Step 7: Interpret coefficients

coef_table = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": model.coef_
})
print(coef_table)
print("Intercept:", model.intercept_)

This is the standard scikit-learn regression workflow built from train_test_split, LinearRegression, and regression metrics.

Part VII — Visual Diagnostics

21) Plot actual vs predicted

plt.figure(figsize=(7, 7))
plt.scatter(y_test, y_pred)
plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("Actual vs Predicted")
plt.show()

If the model fits well, the points should lie roughly around a diagonal trend.

22) Residual plot

Residuals are:

$\text{residual} = y_{\text{true}} - y_{\text{pred}}$

residuals = y_test - y_pred

plt.figure(figsize=(8, 6))
plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()

A good linear model often shows residuals scattered around zero without a strong visible pattern.

Part VIII — Common Problems

23) Underfitting

Linear Regression is a simple model. If the true relationship is strongly nonlinear, the model may underfit.

Signs: low scores on both the training data and the test data.

In such cases, you may need polynomial or interaction features, or a more flexible model.

This is a practical modeling inference, not a limitation specific to scikit-learn’s implementation.
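One common remedy is to feed polynomial features into the same linear model. The sketch below uses synthetic quadratic data; the degree and pipeline layout are illustrative choices, not the only option:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(6 * rng.rand(100, 1) - 3, axis=0)
y = X[:, 0] ** 2 + 0.2 * rng.randn(100)  # clearly nonlinear target

linear = LinearRegression().fit(X, y)
poly = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("lr", LinearRegression())
]).fit(X, y)

# The plain linear fit underfits badly; the quadratic expansion does not
print("linear R²:", r2_score(y, linear.predict(X)))
print("poly R²:  ", r2_score(y, poly.predict(X)))
```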

24) Multicollinearity

If two or more features are highly correlated, the coefficients can become unstable and harder to interpret.

The model may still predict reasonably well, but coefficient interpretation becomes weaker.

This is one reason people often move from plain Linear Regression to regularized variants like Ridge or Lasso.
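The sketch below constructs two nearly identical features on purpose (all values are synthetic): Ridge splits the weight between them stably, while plain OLS may not:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
x = rng.rand(100)
# Two almost-identical columns: a textbook multicollinearity case
X = np.column_stack([x, x + 1e-8 * rng.randn(100)])
y = 3 * x + 0.01 * rng.randn(100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS weights can be unstable here; Ridge splits the effect evenly
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```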

25) Outliers

Linear Regression can be sensitive to outliers because OLS minimizes squared error, which heavily penalizes large residuals. That follows directly from scikit-learn’s description of minimizing residual sum of squares.

If outliers are a major issue, consider robust estimators such as HuberRegressor or RANSACRegressor, removing or capping extreme values, or reporting MAE alongside squared-error metrics.
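As one sketch of a robust alternative (the data and the injected outliers are synthetic), scikit-learn's HuberRegressor downweights large residuals instead of squaring them:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 1)
y = 4 + 3 * X[:, 0] + 0.1 * rng.randn(100)
y[:5] += 30  # inject a few large outliers

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)

# The robust fit stays much closer to the true line y = 4 + 3x
print("OLS:  ", ols.intercept_, ols.coef_[0])
print("Huber:", huber.intercept_, huber.coef_[0])
```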

Part IX — Linear Regression vs Regularized Models

26) Ridge and Lasso context

Scikit-learn’s linear model examples and guide place ordinary least squares next to regularized alternatives such as Ridge.

LinearRegression: plain ordinary least squares with no penalty on the coefficients.

Ridge: adds an L2 penalty that shrinks the coefficients and stabilizes them when features are correlated.

Lasso: adds an L1 penalty that can drive some coefficients exactly to zero, effectively selecting features.

This tutorial is about plain Linear Regression, but it is useful to know where it fits in the larger family.
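A minimal side-by-side sketch (synthetic data; the alpha values are arbitrary choices) shows the practical difference: Lasso tends to be the only one with exactly-zero coefficients:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data where only 3 of 10 features are truly informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

zeros = {}
for est in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
    est.fit(X, y)
    zeros[type(est).__name__] = sum(abs(c) < 1e-6 for c in est.coef_)

# Count of coefficients driven (essentially) to zero per model
print(zeros)
```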

Part X — Strengths and Weaknesses

27) Strengths of Linear Regression

Linear Regression is strong because it is fast to train, easy to interpret, and a dependable baseline.

These strengths follow directly from the OLS model structure in scikit-learn’s linear model docs.

28) Weaknesses of Linear Regression

Its main limitations are that it can only capture linear relationships, it is sensitive to outliers, and strongly correlated features make its coefficients unstable.

These are standard implications of using a linear functional form.

Part XI — Practical Advice

29) When should you use Linear Regression?

Use it when the target is continuous, you want an interpretable baseline, and a roughly linear relationship is plausible.

30) When should you avoid it?

Be cautious when the relationship is strongly nonlinear, the data contains heavy outliers, or the features are highly correlated.

31) A good default starting point

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

Scikit-learn’s examples show this exact basic usage pattern for ordinary least squares.

Part XII — Mini Project Example

32) Predicting sales from advertising spend

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Example dataframe
df = pd.DataFrame({
    "tv": [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2],
    "radio": [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6],
    "newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6],
    "sales": [22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2]
})

X = df.drop("sales", axis=1)
y = df["sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

This is a classic kind of business-style regression problem: predict a continuous outcome from numeric inputs.

Part XIII — Summary

33) What you should remember

Linear Regression is one of the most important machine learning algorithms for numeric prediction.

Its core idea is simple: predict the target as a weighted sum of the features plus an intercept.

Scikit-learn defines LinearRegression as ordinary least squares regression and stores the learned weights in coef_ and intercept_.

The most important practical rules are to split before preprocessing, evaluate on held-out data, and check the residuals for patterns.

These recommendations align with the current scikit-learn documentation and examples for linear models and regression metrics.

34) Final ready-to-use template

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# X, y = your data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2_score(y_test, y_pred))

35) Practice exercises

Exercise 1

Train a LinearRegression model on a synthetic dataset and report MSE and R².

Exercise 2

Use a real multi-feature regression dataset and inspect the learned coefficients.

Exercise 3

Plot the regression line for a simple one-feature dataset.

Exercise 4

Create an actual-vs-predicted plot and a residual plot.

Exercise 5

Compare plain LinearRegression with a scaled pipeline version.

Final summary

Linear Regression is a fast, interpretable baseline for numeric prediction: fit it with LinearRegression, evaluate on held-out data, and inspect the coefficients and residuals.

What each exercise teaches: Exercise 1 the basic fit-and-evaluate loop, Exercise 2 coefficient interpretation, Exercise 3 visualizing a fitted line, Exercise 4 diagnostic plots, and Exercise 5 the effect of scaling inside a pipeline.