Explanation
The implementation keeps the math visible instead of hiding it behind abstractions.
The repository splits the problem into two parts. The core model lives in `src/linear_regression.py`, where the class implements both batch gradient descent and a closed-form solver. The evaluation layer lives in `src/demo.py`, where synthetic data is generated, all model variants are trained, and the final comparisons are plotted.
The gradient-descent path starts with zero-initialized weights and bias, standardizes the feature matrix, runs a fully vectorized forward pass, computes mean squared error, derives gradients with matrix multiplication, and updates the parameters for 1,500 iterations. After optimization, the learned weights are mapped back into the original feature space so `predict()` works on raw inputs without requiring the caller to re-standardize data by hand.
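A hypothetical usage sketch of that interface follows; the class name, constructor arguments, and data below are illustrative assumptions, not the repo's actual API:

```python
import numpy as np
# Assumed import: the class in src/linear_regression.py may use a different name.
from src.linear_regression import LinearRegression

rng = np.random.default_rng(42)
X_train = rng.normal(50.0, 10.0, size=(200, 1))                  # raw, unscaled feature
y_train = 3.0 * X_train[:, 0] + 7.0 + rng.normal(0.0, 5.0, 200)  # noisy linear target

# Assumed constructor arguments mirroring the described setup (fixed lr, 1,500 iterations).
model = LinearRegression(learning_rate=0.01, n_iterations=1500)
model.fit(X_train, y_train)      # standardizes internally, runs gradient descent
y_pred = model.predict(X_train)  # accepts raw inputs; no manual re-standardization
```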
The closed-form path uses the normal equation on an augmented design matrix with a bias column. That gives an exact solution when `X^T X` is invertible. Keeping both implementations side by side makes the repo useful as a learning artifact: one path shows how optimization behaves iteratively, while the other shows the direct analytical answer.
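A minimal, self-contained sketch of that closed-form path (the synthetic data and variable names are illustrative, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(50.0, 10.0, size=(200, 1))            # raw feature matrix
y = 3.0 * X[:, 0] + 7.0 + rng.normal(0.0, 5.0, 200)  # noisy linear target

# Augment the design matrix with a bias column of ones.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: theta = (X^T X)^{-1} X^T y, solved without forming an explicit inverse.
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
bias, weights = theta[0], theta[1:]
```

Using `np.linalg.solve` rather than an explicit inverse is the numerically safer route; like the normal equation itself, it fails when `X^T X` is singular.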
Standardize the training feature
The model computes feature mean and standard deviation, guards against zero variance, and trains in normalized space for stable gradients.
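A sketch of that step, assuming NumPy arrays; substituting 1.0 for a zero standard deviation is one common guard, not necessarily the repo's exact choice:

```python
import numpy as np

def standardize(X):
    """Column-wise standardization with a guard against zero-variance features."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # avoid division by zero on constant columns
    return (X - mu) / sigma, mu, sigma
```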
Run a vectorized training loop
Each iteration computes predictions with `X @ w + b`, logs MSE, derives `dw` and `db` with matrix ops, then updates parameters with a fixed learning rate.
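One iteration of that loop might look like the sketch below, where `X` is the standardized feature matrix and the learning rate is a placeholder value (some implementations fold the factor of 2 into it):

```python
import numpy as np

def gradient_descent_step(X, y, w, b, lr=0.01):
    """Single vectorized MSE update: forward pass, loss, gradients, parameter update."""
    n = X.shape[0]
    y_hat = X @ w + b                 # forward pass
    error = y_hat - y
    mse = np.mean(error ** 2)         # logged each iteration
    dw = (2.0 / n) * (X.T @ error)    # gradient w.r.t. weights
    db = (2.0 / n) * np.sum(error)    # gradient w.r.t. bias
    return w - lr * dw, b - lr * db, mse
```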
Transform weights back to raw space
After training, the normalized coefficients are rescaled so inference works directly on original inputs instead of standardized ones.
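The rescaling follows from substituting the standardized input back into the fitted model: if the normalized-space prediction is w·(x − μ)/σ + b, the raw-space parameters are w/σ and b − Σ wμ/σ. A sketch, assuming the μ and σ saved during standardization:

```python
def unstandardize_params(w_norm, b_norm, mu, sigma):
    """Map parameters learned on standardized features back to raw-feature space."""
    w_raw = w_norm / sigma
    b_raw = b_norm - (w_norm * mu / sigma).sum()
    return w_raw, b_raw
```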
Cross-check against exact and library baselines
The repo compares gradient descent against a normal-equation solve and scikit-learn's `LinearRegression` to verify both correctness and parity.
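A sketch of such a parity check, here comparing only the normal-equation solve against scikit-learn on illustrative data (the gradient-descent model would slot into the same comparison):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(50.0, 10.0, size=(300, 1))
y = 3.0 * X[:, 0] + 7.0 + rng.normal(0.0, 5.0, 300)

# Exact solve via the normal equation on a bias-augmented design matrix.
X_aug = np.hstack([np.ones((len(X), 1)), X])
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# Library baseline.
sk = LinearRegression().fit(X, y)

# Both routes should agree on parameters and error to tight numerical tolerance.
assert np.allclose(theta[1:], sk.coef_) and np.isclose(theta[0], sk.intercept_)
assert np.isclose(mean_squared_error(y, X_aug @ theta),
                  mean_squared_error(y, sk.predict(X)))
```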
Results
The from-scratch model lands on exactly the same validation metrics as the baselines.
The repo evaluates three paths on the same synthetic regression setup: gradient descent, a closed-form solve, and scikit-learn. All three produce an identical validation MSE of 246.121793 and an R² of 0.968085. That is the strongest signal in the project because it confirms that the custom implementation is not merely approximately correct but numerically aligned with a trusted reference.
The gradient-descent run also behaves cleanly during optimization. Training loss starts at 6521.7356, drops to 224.5569 by iteration 100, and is unchanged at that displayed precision by the final iteration. The held-out test MSE settles at 246.1218, which lines up with the saved convergence plot in the repository.
Fit comparison
The learned regression line from the custom gradient-descent model overlaps the scikit-learn baseline, with test points distributed tightly around the same trend.
Loss convergence
Most of the optimization progress happens early. The curve collapses quickly and then flattens, which is exactly what a well-conditioned convex objective should do.
Intuition
The value of the project is not the algorithm alone, but what it teaches about learning systems.
Linear regression looks simple on paper, but it quietly teaches almost every habit that modern ML systems still rely on. You define a model, measure error, compute gradients, and move parameters in the direction that lowers loss. That forward -> loss -> gradient -> update loop is the same structure that later scales into neural nets.
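Written out, that loop is just the mean-squared-error objective and its gradient; a sketch in standard notation, with learning rate α (the factor of 2 is sometimes folded into α):

```latex
\begin{align*}
\hat{y} &= Xw + b, &
L(w,b) &= \tfrac{1}{n}\,\lVert \hat{y} - y \rVert^{2} \\
\frac{\partial L}{\partial w} &= \tfrac{2}{n}\,X^{\top}(\hat{y}-y), &
\frac{\partial L}{\partial b} &= \tfrac{2}{n}\sum_{i}(\hat{y}_i - y_i) \\
w &\leftarrow w - \alpha\,\frac{\partial L}{\partial w}, &
b &\leftarrow b - \alpha\,\frac{\partial L}{\partial b}
\end{align*}
```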
Feature standardization is the most important practical idea in this repo. Without scaling, one feature can dominate the geometry of the loss surface and make gradient descent unstable or painfully slow. Standardizing the input makes the optimization landscape easier to move across, so a reasonable learning rate behaves predictably.
The evaluation result is the real takeaway: the from-scratch optimizer, the closed-form solver, and scikit-learn all land on the same answer. That confirms the math, but it also builds intuition. If three routes converge to the same fit, then the implementation is not just visually plausible; it is numerically aligned with a trusted baseline.
Core takeaway
Rebuilding a simple model from scratch is useful because it forces the whole learning process into view. Once the mechanics are visible here, larger model training loops become much less mysterious.