Building Your First Image Classifier with PyTorch: Loss Function and Optimizer, Training, Evaluation, and Model Improvement
Building Your First Image Classifier with PyTorch: A Step-by-Step Guide Using the MNIST Dataset - II
Content:
🧮 Section 5: Defining the Loss Function and Optimizer
Now that we’ve built the neural network model, it’s time to teach it how to learn — and that’s where loss functions and optimizers come into play.
In this section, we’ll discuss how to measure errors, optimize weights, and prepare the model for training on the MNIST dataset.
🔹 5.1 What Is a Loss Function?
A loss function (also called a cost function) quantifies how well or poorly the model is performing.
It measures the difference between the model’s predictions and the true target values.
During training:
- The model makes predictions.
- The loss function calculates the error.
- The optimizer adjusts the model's weights to minimize that loss.
Mathematically:
\[
\text{Loss} = f(y_{\text{true}}, y_{\text{pred}})
\]
The smaller the loss, the better the model’s predictions.
🔹 5.2 Choosing a Loss Function for Classification
Since MNIST is a multi-class classification problem (digits 0–9), the ideal loss function is Cross-Entropy Loss.
In PyTorch, this is implemented as:
nn.CrossEntropyLoss()
Cross-entropy measures the distance between the predicted probability distribution and the true labels.
\[
L = -\sum_{i} y_i \log(\hat{y}_i)
\]
Where:
- \( y_i \) = 1 if the true class is \( i \), else 0
- \( \hat{y}_i \) = predicted probability for class \( i \)
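To see the formula in action, here is a small self-contained check with made-up logits and targets. Note that nn.CrossEntropyLoss takes raw logits and applies log-softmax internally, so the manual computation below should agree with it:

```python
import torch
import torch.nn as nn

# Made-up logits for a batch of 2 samples over 3 classes.
# nn.CrossEntropyLoss expects raw logits; it applies log-softmax internally.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.3]])
targets = torch.tensor([0, 1])  # true class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Manual equivalent: mean of -log(predicted probability of the true class)
manual = -torch.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(loss.item(), manual.item())  # the two values agree
```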
🔹 5.3 What Is an Optimizer?
An optimizer updates the weights of the network to reduce the loss.
It uses the gradients computed during backpropagation to make small adjustments in the direction that minimizes error.
Common optimizers include:
- SGD (Stochastic Gradient Descent)
- Adam (Adaptive Moment Estimation)
- RMSprop
For most tasks (including MNIST), Adam performs exceptionally well because it adapts the learning rate for each parameter automatically.
🔹 5.4 Setting Up the Loss and Optimizer in Code
Let’s add these to our PyTorch setup.
import torch.optim as optim
# Define the loss function
criterion = nn.CrossEntropyLoss()
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
Here’s what happens:
- criterion calculates how far off the model's predictions are from the true labels.
- optimizer updates the parameters of model to reduce that error over time.
- lr (learning rate) controls how big each update step should be: too high and the model might overshoot; too low and training becomes very slow.
🔹 5.5 A Quick Peek Under the Hood: How Optimizers Work
During training, these steps repeat for each batch:
- Forward Pass: Model predicts output.
- Loss Computation: Compute the loss between prediction and target.
- Backward Pass: Calculate gradients with respect to loss.
- Weight Update: Optimizer updates weights.
Mathematically, a simple weight update (SGD) looks like:
\[
w := w - \eta \frac{\partial L}{\partial w}
\]
Where:
- \( w \) = weight
- \( \eta \) = learning rate
- \( \frac{\partial L}{\partial w} \) = gradient of the loss with respect to the weight
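The update rule above can be reproduced with autograd on a single toy weight. The loss \( L = (wx - y)^2 \) and all the numbers here are made up for illustration:

```python
import torch

# Reproduce the SGD rule on a single toy weight.
# The loss L = (w*x - y)^2 and the numbers are made up for illustration.
w = torch.tensor(0.5, requires_grad=True)
x, y = 2.0, 3.0
eta = 0.1  # learning rate

loss = (w * x - y) ** 2
loss.backward()  # dL/dw = 2*(w*x - y)*x = 2*(1.0 - 3.0)*2 = -8.0

with torch.no_grad():
    w -= eta * w.grad  # w := w - eta * dL/dw = 0.5 - 0.1*(-8.0) = 1.3

print(w.item())  # approximately 1.3
```

Since the gradient is negative, the update moves the weight upward, toward the value that reduces the loss.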
🔹 5.6 Summary
✅ We've Defined:
- Loss Function: nn.CrossEntropyLoss() → measures prediction error
- Optimizer: optim.Adam(model.parameters(), lr=0.001) → updates model weights
✅ Next Step:
Now that our model knows how to learn, we’ll start the training process — iterating over batches of data, computing losses, and optimizing weights.
⚙️ Section 6: Training the Neural Network
Now that we’ve built the model and defined both the loss function and optimizer, it’s time to bring our neural network to life — through training.
This is where the model learns from data, adjusts its weights, and gradually improves its ability to recognize handwritten digits from the MNIST dataset.
🔹 6.1 What Happens During Training?
Training a neural network involves several iterative steps, typically repeated over epochs (full passes through the dataset).
Let’s break this process down:
🧩 The Training Cycle (Per Epoch)
- Forward Pass: The model processes a batch of input images and produces predictions.
- Compute Loss: The loss function measures how far off the predictions are from the true labels.
- Backward Pass: Using backpropagation, the model computes gradients, the direction and magnitude of the changes needed to reduce the loss.
- Update Weights: The optimizer adjusts model parameters (weights) using the computed gradients.
- Repeat: Continue for all batches → then for all epochs → until the loss stops decreasing.
Sponsor Key-Word
"This Content Sponsored by SBO Digital Marketing.
Mobile-Based Part-Time Job Opportunity by SBO!
Earn money online by doing simple content publishing and sharing tasks. Here's how:
Job Type: Mobile-based part-time work
Work Involves:
Content publishing
Content sharing on social media
Time Required: As little as 1 hour a day
Earnings: ₹300 or more daily
Requirements:
Active Facebook and Instagram account
Basic knowledge of using mobile and social media
For more details:
WhatsApp your Name and Qualification to 9994104160
a.Online Part Time Jobs from Home
b.Work from Home Jobs Without Investment
c.Freelance Jobs Online for Students
d.Mobile Based Online Jobs
e.Daily Payment Online Jobs
Keyword & Tag: #OnlinePartTimeJob #WorkFromHome #EarnMoneyOnline #PartTimeJob #jobs #jobalerts #withoutinvestmentjob"
🔹 6.2 Setting Up the Training Loop
We'll now define our training loop in PyTorch, which involves:
- Iterating through train_loader
- Zeroing the gradients
- Performing forward and backward passes
- Updating weights
- Tracking loss and accuracy
🧠 Full Training Loop Code
import torch

# Set device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Training on: {device}")

# Move model to device
model.to(device)

# Training parameters
epochs = 5  # You can increase this for better accuracy

for epoch in range(epochs):
    running_loss = 0.0
    correct = 0
    total = 0

    # Set model to training mode
    model.train()

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        # 1️⃣ Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # 2️⃣ Backward pass
        optimizer.zero_grad()  # Reset gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights

        # 3️⃣ Track statistics
        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # Calculate average loss and accuracy for the epoch
    epoch_loss = running_loss / len(train_loader)
    accuracy = 100 * correct / total
    print(f"Epoch [{epoch+1}/{epochs}] - Loss: {epoch_loss:.4f}, Accuracy: {accuracy:.2f}%")
🧾 Example Output
Training on: cuda
Epoch [1/5] - Loss: 0.3567, Accuracy: 89.23%
Epoch [2/5] - Loss: 0.1805, Accuracy: 94.21%
Epoch [3/5] - Loss: 0.1312, Accuracy: 96.11%
Epoch [4/5] - Loss: 0.1057, Accuracy: 96.89%
Epoch [5/5] - Loss: 0.0894, Accuracy: 97.35%
As training progresses:
- Loss decreases (model predictions improve)
- Accuracy increases (model classifies digits correctly)
🔹 6.3 Visualizing the Loss Curve
Visualizing how the loss changes over time helps you understand whether your model is learning efficiently or overfitting.
import matplotlib.pyplot as plt

# Example: storing losses across epochs
train_losses = []

for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    train_losses.append(running_loss / len(train_loader))

# Plot
plt.plot(range(1, epochs+1), train_losses, marker='o')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss Curve")
plt.show()
This curve should decline smoothly as training progresses — a healthy indicator that the model is learning effectively.
🔹 6.4 Understanding Overfitting and Underfitting
- Underfitting: The model is too simple or hasn't trained enough → both training and validation accuracy are low.
- Overfitting: The model performs well on training data but poorly on unseen data → it memorizes instead of generalizing.
✅ Solution Tips:
- Increase training data (data augmentation)
- Add regularization (dropout, weight decay)
- Use early stopping
🔹 6.5 Saving the Trained Model
Once your model achieves good accuracy, save it for reuse without retraining:
torch.save(model.state_dict(), 'mnist_model.pth')
print("Model saved successfully!")
To load it later:
model.load_state_dict(torch.load('mnist_model.pth'))
model.eval() # Set model to evaluation mode
🔹 6.6 Summary
✅ In this section, we covered:
- How to implement a full training loop in PyTorch
- How to track loss and accuracy during training
- How to visualize the loss curve for learning analysis
- How to save and reload models for future use
Your model now understands handwritten digits! 🎉
🧾 Section 7: Evaluating Model Performance on Test Data
After successfully training our neural network, the next step is to evaluate how well it performs on unseen data.
Training accuracy alone isn’t enough — our goal is to ensure the model can generalize to new, unseen images.
In this section, we’ll:
- Evaluate the model on the test dataset
- Measure accuracy, precision, recall, and F1-score
- Visualize a confusion matrix
- Display sample predictions
🔹 7.1 Why Evaluation Matters
When a model performs well on training data but poorly on test data, it’s overfitting — meaning it memorized patterns rather than learning general ones.
Evaluating on a separate test set helps verify:
- How well the model generalizes
- Which digits are misclassified
- Whether further tuning is needed (architecture, epochs, learning rate, etc.)
🔹 7.2 Switching to Evaluation Mode
Before testing, we must set the model to evaluation mode using:
model.eval()
This turns off features like dropout or batch normalization updates, ensuring stable inference behavior.
We’ll also disable gradient computation using:
with torch.no_grad():
This saves memory and speeds up the evaluation since gradients aren’t needed during inference.
🔹 7.3 Evaluating Model Accuracy on the Test Set
Let’s compute the overall accuracy on the MNIST test data.
from torch.utils.data import DataLoader

# Evaluation mode
model.eval()

# Initialize counters
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy:.2f}%")
Example Output:
Test Accuracy: 97.85%
That’s a strong performance — indicating the model generalizes well to unseen data.
🔹 7.4 Generating a Confusion Matrix
A confusion matrix gives a detailed breakdown of how the model performs across each class.
It shows which digits the model confuses with others.
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

y_true = []
y_pred = []

# Collect true and predicted labels
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        y_true.extend(labels.cpu().numpy())
        y_pred.extend(predicted.cpu().numpy())

# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix for MNIST Classifier")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
This visualization clearly shows which digits are misclassified.
For example, the model may confuse ‘4’ with ‘9’ or ‘3’ with ‘8’ due to their similar shapes.
🔹 7.5 Classification Report (Precision, Recall, F1-Score)
Let’s generate a more detailed report:
from sklearn.metrics import classification_report
print("Classification Report:")
print(classification_report(y_true, y_pred))
Sample Output:
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       980
           1       0.99      0.99      0.99      1135
           2       0.98      0.98      0.98      1032
           3       0.97      0.97      0.97      1010
           4       0.98      0.98      0.98       982
           5       0.97      0.97      0.97       892
           6       0.98      0.98      0.98       958
           7       0.98      0.98      0.98      1028
           8       0.97      0.97      0.97       974
           9       0.97      0.97      0.97      1009

    accuracy                           0.98     10000
   macro avg       0.98      0.98      0.98     10000
weighted avg       0.98      0.98      0.98     10000
Interpretation:
- Precision: The fraction of predicted positives that are correct
- Recall: The fraction of actual positives that are correctly identified
- F1-score: The harmonic mean of precision and recall, a balanced measure of performance
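These three metrics can be checked on a tiny made-up example, treating one digit as the positive class:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up binary labels: treat digit "1" as the positive class.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]  # 2 true positives, 1 false positive, 1 false negative

p = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
r = recall_score(y_true, y_pred)     # TP / (TP + FN) = 2/3
f1 = f1_score(y_true, y_pred)        # 2*p*r / (p + r) = 2/3
print(p, r, f1)
```

classification_report simply computes these per class and then averages them (macro and weighted).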
🔹 7.6 Visualizing Predictions on Sample Images
Let’s visualize some model predictions on the test set to understand its behavior better.
import numpy as np

# Get a batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# Predict
model.eval()
with torch.no_grad():
    images = images.to(device)
    outputs = model(images)
    _, preds = torch.max(outputs, 1)

# Plot first 8 test images with predictions
fig, axes = plt.subplots(1, 8, figsize=(15, 2))
for i in range(8):
    ax = axes[i]
    ax.imshow(images[i].cpu().squeeze(), cmap='gray')
    ax.set_title(f"Pred: {preds[i].item()}\nTrue: {labels[i].item()}")
    ax.axis('off')
plt.show()
This visualization helps you quickly see where the model succeeds — and where it might misclassify certain digits.
🔹 7.7 Summary
✅ In this section, we accomplished:
- Evaluated the model on test data
- Computed overall accuracy
- Visualized a confusion matrix
- Generated a classification report
- Displayed sample predictions
✅ Outcome:
Our model achieves over 97% test accuracy, with only a few confusions between similar-looking digits.
This indicates a strong, generalizable classifier built from scratch using PyTorch.
Section 8: Improving the Model — Regularization, Dropout, and Batch Normalization
Now that our MNIST classifier achieves high accuracy (~97–98%), the next step is to make it more robust and generalizable.
Even though our model performs well on the test data, it could still overfit — meaning it memorizes training patterns rather than learning true underlying features.
In this section, we’ll enhance our model using three key regularization techniques widely used in deep learning:
- Regularization (L2 weight decay)
- Dropout
- Batch Normalization
These techniques help models generalize better and avoid overfitting, especially when scaling to more complex datasets beyond MNIST.
🔹 8.1 What Is Overfitting?
Overfitting occurs when the model performs exceptionally well on the training data but poorly on unseen data.
Symptoms:
- High training accuracy but low test accuracy
- The model memorizes patterns rather than learning general ones
Example Analogy:
Imagine studying for an exam by memorizing the exact questions — you might do great on a practice test, but fail the real one.
Goal:
Encourage the model to learn general features, not memorize specific patterns.
🔹 8.2 Regularization via Weight Decay (L2 Regularization)
Regularization adds a penalty for large weight values to the loss function, preventing the model from relying too heavily on any single feature.
Mathematically, the new loss function becomes:
\[
L' = L + \lambda \sum_{i} w_i^2
\]
Where:
- \( L \) = original loss
- \( \lambda \) = regularization strength (hyperparameter)
- \( w_i \) = model weights
In PyTorch, you can add L2 regularization directly through the optimizer:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
Here, weight_decay acts as the \( \lambda \) term.
It encourages smaller weight magnitudes → smoother decision boundaries → less overfitting.
🔹 8.3 Dropout: Randomly Turning Off Neurons
Dropout is one of the most effective and widely used regularization techniques.
During training, dropout randomly disables a fraction of neurons in each layer.
Mathematically, for each neuron output \( y_i \):
\[
y_i' =
\begin{cases}
0 & \text{with probability } p \\
\frac{y_i}{1-p} & \text{otherwise}
\end{cases}
\]
Where \( p \) is the dropout probability (commonly 0.2–0.5).
This forces the network to not depend on specific neurons, encouraging redundancy and improving generalization.
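Both cases in the formula can be observed directly: in training mode nn.Dropout zeroes entries and rescales the survivors by \( 1/(1-p) \); in evaluation mode it is a no-op. The input tensor here is a made-up example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()          # training mode: some values zeroed, rest scaled by 1/(1-p)
out_train = drop(x)   # surviving entries become 1 / (1 - 0.5) = 2.0

drop.eval()           # evaluation mode: dropout is a no-op
out_eval = drop(x)

print(out_train)      # mix of 0.0 and 2.0
print(out_eval)       # all ones, identical to the input
```

This is why calling model.train() and model.eval() at the right times matters: leaving dropout active at inference would corrupt predictions.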
🧠 Updated Model with Dropout
Let’s modify our previous model to include dropout layers.
import torch.nn as nn
import torch.nn.functional as F

class MNISTModel_Improved(nn.Module):
    def __init__(self):
        super(MNISTModel_Improved, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.dropout1 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(256, 128)
        self.dropout2 = nn.Dropout(0.3)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x
🔹 8.4 Batch Normalization
Batch Normalization (BatchNorm) standardizes the activations of a layer for each mini-batch, stabilizing learning and improving convergence.
\[
\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
\]
Where:
- \( \mu_B \), \( \sigma_B^2 \) = mean and variance of the batch
- \( \epsilon \) = small constant for numerical stability
Benefits:
- Faster training
- Higher learning rates possible
- Reduced dependence on initialization
- Acts as a mild regularizer
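The normalization itself is easy to verify on a made-up batch: after nn.BatchNorm1d in training mode (with its default learnable scale and shift at 1 and 0), each feature's mean is close to 0 and its standard deviation close to 1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)   # 4 features
bn.train()               # use batch statistics, update running averages

x = torch.randn(32, 4) * 5 + 10   # made-up batch with mean ~10, std ~5
y = bn(x)

# After normalization, each feature has mean ~0 and std ~1
print(y.mean(dim=0))  # close to zero
print(y.std(dim=0))   # close to one
```

At evaluation time (bn.eval()), the layer switches to the running averages accumulated during training instead of per-batch statistics.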
🧠 Model with Dropout + Batch Normalization
Here’s how to integrate both dropout and batch normalization:
class MNISTModel_Advanced(nn.Module):
    def __init__(self):
        super(MNISTModel_Advanced, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.bn1 = nn.BatchNorm1d(256)
        self.dropout1 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.dropout2 = nn.Dropout(0.3)
        self.fc3 = nn.Linear(128, 64)
        self.bn3 = nn.BatchNorm1d(64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = F.relu(self.bn1(self.fc1(x)))
        x = self.dropout1(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout2(x)
        x = F.relu(self.bn3(self.fc3(x)))
        x = self.fc4(x)
        return x
🔹 8.5 Comparing Models: Before vs After Regularization
| Feature | Baseline Model | Improved Model |
|---|---|---|
| Layers | 3 (128 → 64 → 10) | 4 (256 → 128 → 64 → 10) |
| Regularization | None | Dropout (0.3), BatchNorm |
| Weight Decay | 0 | 1e-5 |
| Accuracy | ~97% | ~98.3% |
| Overfitting | Moderate | Significantly reduced |
This demonstrates how simple regularization methods can make your model more robust and generalizable — a crucial aspect when working with larger, more complex datasets.
🔹 8.6 Practical Tips for Regularization
- Start simple: add dropout only where needed.
- Avoid overdoing dropout: too much can make the model underfit.
- Use BatchNorm early: it helps stabilize deeper models.
- Experiment with weight_decay: common values range from 1e-4 to 1e-6.
- Monitor validation accuracy: use early stopping if the validation loss stops improving.
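The last tip can be sketched as a small patience-based loop. The validation losses below are simulated placeholders; in a real run they would come from evaluating the model on a held-out set at the end of each epoch:

```python
# Simulated per-epoch validation losses (illustrative values only).
val_losses = [0.40, 0.30, 0.25, 0.26, 0.27, 0.28]

patience = 2          # stop after 2 epochs without improvement
best_loss = float('inf')
epochs_without_improvement = 0
stopped_at = None

for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best_loss:
        best_loss = val_loss
        epochs_without_improvement = 0
        # In a real loop: torch.save(model.state_dict(), 'best_model.pth')
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch
            break

print(f"Best val loss: {best_loss:.2f}, stopped at epoch {stopped_at}")
# → Best val loss: 0.25, stopped at epoch 5
```

Saving the best checkpoint inside the improvement branch ensures you keep the model from the epoch with the lowest validation loss, not the last one trained.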
🔹 8.7 Summary
✅ What We Learned:
- Regularization reduces overfitting by penalizing complex models.
- Dropout randomly disables neurons during training to improve generalization.
- Batch Normalization stabilizes and accelerates training.
- Combining all three techniques leads to faster convergence and better generalization.