Normal Equation in Linear Regression Simply Explained

What is Linear Regression?

Linear Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It aims to find the best-fit line (or, with multiple features, hyperplane) that minimizes the error between the actual values of the dependent variable and the values the model predicts. Linear regression helps us understand how different factors (independent variables) influence the outcome (dependent variable).
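
In the simplest case of a single independent variable, the model is just a line with an intercept (bias) and a slope (weight):

\[\hat{y} = \theta_0 + \theta_1 x\]

With more independent variables, each feature \(x_i\) gets its own weight \(\theta_i\).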

What is the Normal Equation?

The Normal Equation is a direct method for finding the optimal parameters (weights and bias) in linear regression. Unlike iterative approaches like gradient descent, the Normal Equation computes the parameters analytically, eliminating the need for multiple iterations or trial-and-error processes.

Mathematical Representation:

\[\theta = (X^T X)^{-1} X^T y\]

Where:

  • \(\theta\) (theta): Vector of parameters (weights and bias)
  • \(X\): Matrix of input features (including a column of ones for the bias term)
  • \(y\): Vector of target values
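
The formula maps almost one-to-one onto NumPy. Below is a minimal sketch of that direct translation; the tiny feature matrix and target vector are made-up placeholders used only for illustration:

import numpy as np

# Placeholder data: 3 samples, 1 feature (illustrative only)
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

# Add a column of ones so theta includes the bias term
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# theta = (X^T X)^{-1} X^T y, written out directly
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta)  # approximately [0.0, 2.0] for this placeholder data

In practice, explicitly forming the inverse is avoided; the full example later in this post uses np.linalg.solve, which solves the same system more stably.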

Advantages of Using the Normal Equation

  1. Direct Solution:
    • The Normal Equation computes the best parameters directly without requiring iterative updates.
  2. Ease of Implementation:
    • It is simple and straightforward to understand and implement for smaller datasets.

Disadvantages of Using the Normal Equation

  1. Computational Cost:
    • For large datasets, calculating \((X^T X)^{-1}\) is computationally expensive, with a time complexity of \(O(n^3)\), where \(n\) is the number of features.
  2. Numerical Instability:
    • If \(X^T X\) is not invertible or poorly conditioned, the computation can become numerically unstable. Techniques like regularization (e.g., Ridge Regression) can mitigate this issue, as sketched below.
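
As a rough sketch of that regularization idea (not a full Ridge Regression implementation), adding a small multiple of the identity matrix to \(X^T X\) before solving makes the system invertible. The regularization strength alpha below is an arbitrary illustrative value:

import numpy as np

def ridge_normal_equation(X, y, alpha=1.0):
    # Add a column of ones for the bias term
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    n = X_b.shape[1]
    # Regularized normal equation: theta = (X^T X + alpha * I)^{-1} X^T y
    # (Standard Ridge leaves the bias unpenalized; it is penalized here for brevity.)
    A = X_b.T @ X_b + alpha * np.eye(n)
    return np.linalg.solve(A, X_b.T @ y)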

Example in Python

Here is an example implementation of the Normal Equation for linear regression in Python:

import numpy as np

# Input Data
X = np.array([[1], [2], [3], [4]])
y = np.array([2.5, 4.5, 6.5, 8.5])

# Normal Linear Regression
class NormalLinear:

    def fit(self, X, y):
        # Adding a column of ones for the bias term
        X_add_bias = np.c_[np.ones((X.shape[0], 1)), X]
        # Compute theta using the Normal Equation
        X_transpose = X_add_bias.T
        X_transpose_X = X_transpose.dot(X_add_bias)
        X_transpose_y = X_transpose.dot(y)
        theta = np.linalg.solve(X_transpose_X, X_transpose_y)
        return theta

    def predict(self, X, theta):
        # Adding a column of ones for the bias term
        X_with_bias = np.c_[np.ones((X.shape[0], 1)), X]
        # Predicting values
        return X_with_bias.dot(theta)

# Create model
model = NormalLinear()

# Fit the model
theta = model.fit(X, y)

# Test Data
X_test = np.array([[5], [6]])

# Predictions
predictions = model.predict(X_test, theta)
print("Predictions:", predictions)

Step-by-Step Explanation

  1. Adding a Bias Term:
    • A column of ones is added to the input matrix \(X\) to account for the bias term (\(\theta_0\)). This ensures that the computed \(\theta\) includes both the bias and weights.
  2. Matrix Operations (see the cross-check sketch after this list):
    • \(X^T X\): Computes the product of the transpose of \(X\) and \(X\).
    • \((X^T X)^{-1}\): Computes the inverse of the resulting matrix.
    • \(X^T y\): Computes the product of the transpose of \(X\) and the target vector \(y\).
  3. Prediction:
    • \(\hat{y} = X_{\text{with bias}} \cdot \theta\): Multiplies the input matrix (including the bias term) by the parameter vector \(\theta\) to generate predictions.
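
As a quick cross-check of these matrix operations, NumPy's built-in least-squares solver should recover the same parameters. The sketch below assumes the X, y, and theta from the example above are still in scope:

# Cross-check: np.linalg.lstsq solves the same least-squares problem
X_b = np.c_[np.ones((X.shape[0], 1)), X]
theta_lstsq, residuals, rank, _ = np.linalg.lstsq(X_b, y, rcond=None)
print(theta_lstsq)  # should be close to the theta from the Normal Equation (about [0.5, 2.0])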

Practical Notes

  • Efficiency: The Normal Equation is suitable for small to medium-sized datasets. For very large datasets, iterative methods like gradient descent (sketched after this list) are preferred due to lower computational costs.
  • Regularization: If \(X^T X\) is not invertible, adding a small regularization term (e.g., Ridge Regression) can stabilize the computation.
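
For comparison, here is a minimal batch gradient descent sketch for the same problem; the learning rate and iteration count are arbitrary illustrative choices rather than tuned values:

import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=5000):
    # Add a column of ones for the bias term
    X_b = np.c_[np.ones((X.shape[0], 1)), X]
    m, n = X_b.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        # Gradient of the mean squared error with respect to theta
        gradients = (2.0 / m) * (X_b.T @ (X_b @ theta - y))
        theta -= lr * gradients
    return theta

# With the toy data from the example above, this should approach roughly [0.5, 2.0]
# theta_gd = gradient_descent(X, y)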

Final Thoughts

The Normal Equation is an elegant and powerful tool for linear regression when dealing with small datasets. Its simplicity makes it a great starting point for understanding linear models. However, for real-world scenarios with large datasets or high-dimensional features, alternative approaches like gradient descent or regularized regression are often more practical.

Reference

For more details, refer to this excellent guide on GeeksforGeeks.