Logistic Regression From Scratch — What It Actually Does (Without Skipping the Thinking)
Breaking down Logistic Regression from first principles: why it exists to express confidence in binary outcomes, how the sigmoid function transforms linear scores into probabilities, and a minimal from-scratch implementation.
Why I'm Writing This
I've used Logistic Regression many times through libraries, but I realized that using a model and understanding a model are very different things.
This post is my attempt to explain Logistic Regression from first principles—focusing on why it exists and how it works internally, not just how to call it from sklearn.
The Real Problem Logistic Regression Solves
The problem is not classification.
The real problem is:
How do we express confidence when the outcome is binary?
In many real-world systems (spam detection, fraud detection, medical risk scoring), we don't just want a class label. We want to know how confident the model is.
Linear Regression fails here because:
- It produces unbounded outputs
- Its output cannot be interpreted as a probability
Logistic Regression exists to fix this exact issue.
Why Linear Regression Fails for Classification
Linear Regression predicts a number like:
y = w·x + b
But for classification:
- Predictions like −3 or 2.7 make no sense
- There's no concept of probability
- Squared loss measures error in a way that doesn't fit 0/1 labels
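To make the unbounded-output problem concrete, here is a quick sketch (the weights and inputs below are made up purely for illustration) of a linear score drifting outside the [0, 1] range:

import numpy as np

# Hypothetical weights and inputs, chosen only to show the scale problem
w = np.array([0.5, -0.5])
b = 0.1
inputs = np.array([[8.0, 1.0],   # yields a large positive score
                   [1.0, 9.0]])  # yields a large negative score

scores = inputs @ w + b
print(scores)  # [ 3.6 -3.9] -- neither value can be read as a probability

Scores like 3.6 and −3.9 can still rank examples, but they cannot be interpreted as probabilities.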
So instead of predicting a value, we need to predict a belief.
The Core Idea Behind Logistic Regression
Logistic Regression keeps the linear model, but changes how we interpret its output.
Instead of saying:
"This number is the prediction"
We say:
"This number represents confidence before converting it into probability"
That linear score is then passed through a sigmoid function, which converts any real number into a value between 0 and 1.
This final output represents:
The probability that the input belongs to class 1
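Concretely, the standard sigmoid function is:
sigmoid(z) = 1 / (1 + e^(−z))
A large positive score maps close to 1, a large negative score maps close to 0, and a score of exactly 0 maps to 0.5.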
Minimal Logistic Regression (From Scratch)
Below is a minimal forward pass of Logistic Regression implemented from scratch:
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the (0, 1) range
    return 1 / (1 + np.exp(-z))

def logistic_forward(x, w, b):
    # Linear score first, then map it to a probability
    z = np.dot(w, x) + b
    p = sigmoid(z)
    return p

x = np.array([2.0, 3.0])   # input features
w = np.array([0.5, -0.5])  # weights
b = 0.1                    # bias

probability = logistic_forward(x, w, b)
print(probability)
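For the numbers above, the linear score is z = (0.5)(2.0) + (−0.5)(3.0) + 0.1 = −0.4, and sigmoid(−0.4) ≈ 0.401, so the script prints a probability of roughly 0.40 for class 1.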
This small piece of code captures the entire model logic:
- A linear score
- A probability mapping
- No magic
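To turn that probability into a class label, a threshold is applied on top; 0.5 is the common default, though it can be tuned:

predicted_class = int(probability >= 0.5)  # 1 when the model leans toward class 1
print(predicted_class)  # prints 0 here, since 0.40 < 0.5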
Why Cross-Entropy Loss Is Used
Since the model outputs a probability, using squared error is incorrect.
Instead, Logistic Regression uses log loss, which:
- Rewards confident correct predictions
- Heavily penalizes confident wrong predictions
This makes the model statistically sound for binary outcomes.
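As a rough sketch of those two properties (using the standard binary cross-entropy formula, with a throwaway log_loss helper written just for this post), here is how the numbers behave:

import numpy as np

def log_loss(y_true, p):
    # Binary cross-entropy for a single prediction:
    # -[y*log(p) + (1 - y)*log(1 - p)]
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(log_loss(1, 0.9))  # confident and correct -> about 0.105
print(log_loss(1, 0.1))  # confident and wrong   -> about 2.303
print(log_loss(1, 0.5))  # unsure                -> about 0.693

The jump from roughly 0.105 to roughly 2.303 is the "heavily penalizes confident wrong predictions" behaviour in action.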
Where This Shows Up in Real Systems
Even today, Logistic Regression is widely used in:
- Credit risk modeling
- Medical diagnosis systems
- Baseline classifiers for large ML pipelines
- Calibration layers in deep learning systems
Understanding it deeply goes a long way toward understanding modern ML models, many of which end in exactly this kind of linear-score-plus-sigmoid structure.