Image Classifier - VI: Model Checkpointing, Visualizing Training, Performing Error Analysis, and Data Augmentation to Improve Training
Building Your First Image Classifier with PyTorch: A Step-by-Step Guide Using the MNIST Dataset - VI
Section 21: Adding Model Checkpointing and Saving Best Weights
Training a model is often computationally expensive. You might spend minutes, hours, or even days training a neural network, especially when working with large datasets or complex architectures.
So what happens if your training process stops unexpectedly?
- Power failure
- Colab runtime disconnect
- GPU timeouts
- Code crashes
Without saving your model checkpoints, you would lose all progress.
That’s why checkpointing is one of the most important steps in deep learning workflows.
✅ Why Model Checkpoints Matter
Model checkpointing allows you to save:
- The current weights
- The optimizer state
- The training progress (epoch count, metrics)
- The best-performing model
This makes it possible to:
✔ Resume training exactly from where you stopped
✔ Store multiple versions for experimentation
✔ Automatically save only the models that perform better
✔ Deploy the trained model without re-training
✔ Share the final .pt or .pth file with others
In production, saving best weights is a must for reproducibility.
21.1 – PyTorch Model Saving Basics
PyTorch allows two common saving patterns:
A) Save the entire model
torch.save(model, "mnist_model.pth")
✔ Saves architecture + weights
✘ Not recommended for long-term use (architecture dependency)
B) Save only state_dict (Recommended)
torch.save(model.state_dict(), "mnist_weights.pth")
✔ Lightweight
✔ Architecture-independent
✔ Best for deployment and research
21.2 – Saving and Loading a Trained Model
Saving:
torch.save(model.state_dict(), "mnist_cnn_best.pth")
Loading:
model = CNNModel()
model.load_state_dict(torch.load("mnist_cnn_best.pth"))
model.eval()
.eval() is required because:
- It turns off dropout
- It uses the running mean/variance in BatchNorm
- It ensures stable inference
21.3 – Implementing Automatic Checkpointing During Training
The most popular pattern is:
Save the model only when validation accuracy improves.
Let's integrate this in the MNIST training loop.
🔧 Code: Model Checkpointing with Best Accuracy Saving
best_accuracy = 0.0

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Evaluate on test set
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss:.4f}, Accuracy: {accuracy:.2f}%")

    # --- Save best model ---
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        torch.save(model.state_dict(), "best_mnist_model.pth")
        print("Model improved! Saved checkpoint.")
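Stripped of the PyTorch specifics, the save-best pattern is just a running maximum over validation scores. A minimal sketch with hypothetical per-epoch accuracies:

```python
best_accuracy = 0.0
saved = []  # stands in for checkpoint files written to disk

# hypothetical validation accuracies for four epochs
for epoch, accuracy in enumerate([97.1, 98.4, 98.0, 99.1]):
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        saved.append((epoch, accuracy))  # real loop: torch.save(model.state_dict(), ...)

print(saved)  # [(0, 97.1), (1, 98.4), (3, 99.1)] -- epoch 2 (98.0) did not improve
```

Note how the non-improving epoch leaves no checkpoint behind, which is exactly why only the best weights accumulate on disk.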
21.4 – Saving Additional Training Metadata
You might want to save:
- Epoch number
- Optimizer states
- Best accuracy
- Training loss
- Learning rate
PyTorch allows you to save them as a dictionary.
🔧 Code: Saving Full Training State
checkpoint = {
    "epoch": epoch,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "best_accuracy": best_accuracy
}
torch.save(checkpoint, "mnist_checkpoint.pth")
🔧 Code: Loading Full Training State (Resume Training)
checkpoint = torch.load("mnist_checkpoint.pth")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1
best_accuracy = checkpoint["best_accuracy"]
print("Training resumed from epoch:", start_epoch)
This allows seamless continuation of training — even after system crashes.
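Under the hood the checkpoint is an ordinary Python dictionary serialized to disk (torch.save uses pickle). A stdlib-only sketch of the same save/resume round trip, with hypothetical metadata values:

```python
import os
import pickle
import tempfile

# hypothetical metadata; the real checkpoint also holds the state_dicts
checkpoint = {"epoch": 4, "best_accuracy": 99.1}

path = os.path.join(tempfile.mkdtemp(), "mnist_checkpoint.pkl")
with open(path, "wb") as f:
    pickle.dump(checkpoint, f)   # torch.save(checkpoint, path) in the real loop

with open(path, "rb") as f:
    restored = pickle.load(f)    # torch.load(path) in the real loop

start_epoch = restored["epoch"] + 1
print("Training resumed from epoch:", start_epoch)  # 5
```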
21.5 – Best Practices for Model Saving
✔ Use descriptive filenames
best_mnist_cnn_acc99.pth
mnist_epoch10_lr0.001.pth
mnist_experiment_dropout0.3.pth
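A tiny helper can generate such names consistently across experiments. The format below is just one possible convention, not a standard:

```python
def checkpoint_name(model="mnist_cnn", epoch=0, acc=0.0, lr=0.001):
    # e.g. "mnist_cnn_epoch10_acc99.12_lr0.001.pth"
    return f"{model}_epoch{epoch}_acc{acc:.2f}_lr{lr}.pth"

print(checkpoint_name(epoch=10, acc=99.123))  # mnist_cnn_epoch10_acc99.12_lr0.001.pth
```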
✔ Save models in a dedicated directory
/models
best.pth
last.pth
checkpoint_epoch_3.pth
✔ Keep training metadata in JSON/YAML for reproducibility
✔ Never store large model files inside Git repositories
Use:
- Hugging Face Hub
- AWS S3
- Google Cloud
- A model registry
21.6 – Example Folder Structure for MNIST Project
mnist-project/
│
├── data/
├── models/
│ ├── best_model.pth
│ ├── checkpoint_epoch_5.pth
├── scripts/
│ ├── train.py
│ ├── evaluate.py
│ ├── predict.py
│
├── utils/
│ ├── dataset.py
│ ├── transforms.py
│ ├── visualizer.py
│
└── README.md
This structure scales well when adding more experiments.
21.7 – Conclusion of Section 21
In this chapter, you learned:
✅ Why checkpointing is necessary
✅ How to save and load PyTorch models
✅ How to automatically save best-performing weights
✅ How to resume training from a saved checkpoint
✅ Recommended practices for real-world ML workflows
These techniques ensure that your model development process is safe, repeatable, and production-ready.
Section 22: Visualizing Training with TensorBoard
As your deep learning projects grow, understanding your model’s learning behavior becomes crucial. Simply printing loss values in the console is not enough — you need powerful visualization tools that help answer important questions:
- Is the model overfitting?
- Is the learning rate too high or too low?
- Are gradients exploding?
- Are weights updating correctly?
- Is validation accuracy improving consistently?
This is exactly where TensorBoard, one of the most popular visualization tools in the ML world, comes into play.
22.1 – What Is TensorBoard?
TensorBoard is a visualization toolkit originally developed for TensorFlow, but now fully supported by PyTorch as well. It helps track and visualize:
- Training & validation loss curves
- Accuracy curves
- Histograms of weights
- Learning rate schedules
- Images and prediction samples
- Computational graph (for TensorFlow; partially for PyTorch)
Think of TensorBoard as a fitness tracker for your model — it tells you how your neural network is evolving in real time.
This content is sponsored by SBO Digital Marketing.
22.2 – Why TensorBoard Is Essential for Deep Learning
TensorBoard provides insights such as:
✔ Detect Overfitting
If training loss decreases but validation loss increases, you know the model is overfitting.
✔ Check for Underfitting
Both losses stagnate — model isn't learning properly.
✔ Track Gradient Flow
Helps identify vanishing/exploding gradients.
✔ Compare Experiments
Run multiple models and compare results visually.
✔ Visualize Images
Useful for MNIST, CIFAR, medical images, etc.
22.3 – Installing TensorBoard
TensorBoard ships with TensorFlow; for PyTorch, install it separately:
pip install tensorboard
To launch TensorBoard:
tensorboard --logdir=runs
If you don't pass a directory, PyTorch's SummaryWriter defaults to ./runs.
22.4 – Integrating TensorBoard into Your MNIST Training Loop
Let’s add TensorBoard to the MNIST classifier.
PyTorch provides a built-in utility:
from torch.utils.tensorboard import SummaryWriter
Initialize Writer
writer = SummaryWriter("runs/mnist_experiment_1")
You can give each experiment a unique name:
runs/
mnist_experiment_1/
mnist_experiment_dropout/
mnist_experiment_lr0.001/
22.5 – Logging Training Loss and Accuracy
Inside the training loop:
writer.add_scalar("Training Loss", running_loss, epoch)
writer.add_scalar("Validation Accuracy", accuracy, epoch)
This creates two plots on TensorBoard:
- Loss vs. Epoch
- Accuracy vs. Epoch
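Conceptually, each add_scalar call appends one (step, value) point to a named series, and TensorBoard plots each series as a curve. A toy stand-in (TinyWriter is a hypothetical name, not a PyTorch class) makes the bookkeeping explicit:

```python
from collections import defaultdict

class TinyWriter:
    """Toy stand-in for SummaryWriter's scalar bookkeeping."""
    def __init__(self):
        self.scalars = defaultdict(list)

    def add_scalar(self, tag, value, step):
        self.scalars[tag].append((step, value))

w = TinyWriter()
# hypothetical per-epoch values
for epoch, (loss, acc) in enumerate([(0.9, 91.0), (0.4, 96.5), (0.2, 98.1)]):
    w.add_scalar("Training Loss", loss, epoch)
    w.add_scalar("Validation Accuracy", acc, epoch)

print(w.scalars["Validation Accuracy"])  # [(0, 91.0), (1, 96.5), (2, 98.1)]
```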
22.6 – Logging Images (MNIST Samples)
Before training or during training, you can visualize sample inputs:
import torchvision

images, labels = next(iter(train_loader))
img_grid = torchvision.utils.make_grid(images)
writer.add_image("MNIST Images", img_grid)
TensorBoard will display a grid of MNIST images.
22.7 – Logging the Model Graph
You can visualize the network architecture:
writer.add_graph(model, images.to(device))
This helps ensure the architecture is correct.
22.8 – Logging Weight Histograms
Histograms help monitor weight changes across epochs:
for name, param in model.named_parameters():
    writer.add_histogram(name, param, epoch)
Useful to detect:
- Dead neurons
- Gradient problems
- Diverging weights
22.9 – Full TensorBoard-Integrated Training Loop
Here is a simplified standalone version:
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/mnist_tensorboard")
best_accuracy = 0

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Evaluate
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total

    # ---- Logging ----
    writer.add_scalar("Loss/train", running_loss, epoch)
    writer.add_scalar("Accuracy/test", accuracy, epoch)
    for name, param in model.named_parameters():
        writer.add_histogram(name, param, epoch)

    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {running_loss:.4f}, Acc: {accuracy:.2f}%")

writer.close()
22.10 – Launch TensorBoard
Run:
tensorboard --logdir=runs
Then open the link:
http://localhost:6006/
You will see tabs like:
- Scalars
- Graphs
- Histograms
- Images
- Distributions
- Projector
22.11 – TensorBoard Visualizations You Will See
✔ Smooth loss curve
Shows how well model is learning.
✔ Accuracy curve
Indicates generalization to unseen data.
✔ Weight histograms
Spot unstable layers.
✔ Image grid
Useful to confirm preprocessing is correct.
22.12 – Best Practices for TensorBoard
✔ Create a new run folder for each experiment
runs/exp1/
runs/exp2_dropout/
runs/exp3_batchsize128/
✔ Avoid mixing multiple experiments into the same folder
Hard to compare results.
✔ Use meaningful experiment names
Example:
mnist_lr0.001_bs64/
mnist_dropout0.3/
mnist_no_batchnorm/
✔ Track both train and validation values
It helps identify overfitting clearly.
22.13 – Conclusion of Section 22
In this section, you learned:
✅ What TensorBoard is
✅ Why it's essential for visualization
✅ How to set up TensorBoard for PyTorch
✅ How to log losses, accuracies, images, histograms
✅ How to visualize the model graph
✅ How to organize experiments
✅ Best practices for ML experiment tracking
TensorBoard turns your training process into a visual, trackable, analyzable workflow — critical for both beginners and advanced deep learning engineers.
Section 23: Performing Error Analysis — Visualizing Misclassified Digits
Even after training a strong MNIST classifier with good accuracy (usually above 98–99%), the model will still misclassify some images.
Why?
Because:
- Some digits look ambiguous
- Some images are poorly written
- The model may be biased or insufficiently trained
- The dataset has noisy samples
Performing error analysis helps you understand why the model fails and gives insights to improve it.
23.1 – What Is Error Analysis?
Error analysis means systematically examining:
- Which samples are misclassified
- What mistakes the model makes
- Whether specific digits (e.g., 5 vs. 3) are confused with each other
- Whether the model performs poorly on specific variations (slanted, faint, or bold digits)
Error analysis is an essential practice in both research and industry.
23.2 – Why Misclassification Analysis Matters
Here are the reasons:
✔ Finds weaknesses the accuracy metric hides
Even a 1% error rate means 600 misclassified samples across the 60,000 training images (or 100 on the 10,000-image test set).
✔ Helps improve the model
By identifying patterns in mistakes:
- Add data augmentation
- Tune the learning rate
- Use better architectures
- Balance the dataset
✔ Helps explain predictions
Critical in healthcare, finance, defense.
✔ Builds intuition for what the model “sees”
Deep learning is a black box; visualization helps us peek inside.
23.3 – How to Extract Misclassified Samples
To analyze errors, we first need to capture:
- The predicted labels
- The true labels
- The images
Let's write the code.
🔧 Code: Get Misclassified Images
misclassified_images = []
misclassified_preds = []
misclassified_labels = []

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)

        # Collect samples where prediction != actual
        mismatch = predicted != labels
        mis_images = images[mismatch]
        mis_preds = predicted[mismatch]
        mis_labels = labels[mismatch]

        misclassified_images.extend(mis_images.cpu())
        misclassified_preds.extend(mis_preds.cpu())
        misclassified_labels.extend(mis_labels.cpu())
This stores misclassified examples.
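The filtering step itself reduces to comparing two label sequences. With plain lists standing in for tensors (the values below are made up):

```python
true_labels = [5, 0, 4, 1, 9, 2]
predictions = [5, 0, 9, 1, 7, 2]  # hypothetical model outputs

# keep (index, predicted, true) wherever the two disagree
misclassified = [
    (i, pred, true)
    for i, (pred, true) in enumerate(zip(predictions, true_labels))
    if pred != true
]
print(misclassified)  # [(2, 9, 4), (4, 7, 9)]
```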
23.4 – Visualizing Misclassified Digits
Let’s display a grid of wrongly predicted samples.
🔧 Code: Plot First 25 Misclassified Samples
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for i in range(min(25, len(misclassified_images))):
    plt.subplot(5, 5, i+1)
    img = misclassified_images[i].squeeze(0)  # drop the channel dimension
    plt.imshow(img, cmap="gray")
    plt.title(f"Pred: {misclassified_preds[i].item()}\nTrue: {misclassified_labels[i].item()}")
    plt.axis("off")
plt.tight_layout()
plt.show()
23.5 – What You Will Observe in the Plots
The visual patterns usually include:
1. Confusing digits
- A sloppy 5 looks like a 3
- A thick 9 looks like a 4
- A tilted 7 looks like a 1
2. Low contrast images
Some digits are faint, making them harder for the model.
3. Overlapping strokes
Such as an 8 written in a single stroke, which ends up looking like a 3.
4. Noisy or broken digits
Some samples contain noise, smudges, or distortions.
5. Unusual handwriting styles
Some digits are drawn uniquely (especially 2s, 7s, and 9s).
These insights lead to improvements.
23.6 – Example Interpretation of Misclassifications
Let’s say the model misclassified:
| Predicted | True | Reason |
|---|---|---|
| 8 | 3 | Overlapping circles confused the model |
| 1 | 7 | Slanted handwriting |
| 9 | 4 | Classic MNIST ambiguity |
| 0 | 6 | Open-loop 6 resembles 0 |
| 5 | 8 | Similar structure |
These confusions are common in MNIST and even humans may hesitate.
23.7 – Confusion Matrix for Deeper Insights
A confusion matrix shows:
- How many 5s were classified as 3
- How many 8s were classified as 0
- Which digit has the worst recall
🔧 Code: Generate Confusion Matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

all_preds = []
all_labels = []

model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images.to(device))
        _, predicted = torch.max(outputs, 1)
        all_preds.extend(predicted.cpu())
        all_labels.extend(labels)

cm = confusion_matrix(all_labels, all_preds)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("True Label")
plt.title("MNIST Confusion Matrix")
plt.show()
23.8 – Insights from Confusion Matrix
You will observe:
- “5 → 3” and “3 → 5” mistakes are common
- “4 → 9” also occurs frequently
- “7 → 1” and “1 → 7” are often confused
It helps identify which digits need more attention.
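The matrix itself is just pairwise counting of (true, predicted) pairs. A stdlib sketch over made-up labels shows the idea:

```python
from collections import Counter

true_labels = [3, 5, 3, 7, 1, 5]  # hypothetical
predictions = [5, 5, 3, 1, 1, 3]  # hypothetical

# cm[(true, pred)] is one cell of the confusion matrix
cm = Counter(zip(true_labels, predictions))

print(cm[(3, 5)])  # 1 -> one "3" misread as "5"
print(cm[(5, 3)])  # 1 -> one "5" misread as "3"
```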
23.9 – Using Error Analysis to Improve the Model
Once you identify weaknesses, you can improve:
✔ Add Data Augmentation
Examples:
- Rotation
- Shear
- Zoom
- Elastic distortions
This helps with slanted or weirdly shaped digits.
✔ Increase Training Epochs
Model may simply need more learning.
✔ Use a Better Architecture
Try:
- A CNN with more filters
- BatchNorm
- Dropout
- ResNet-like blocks
✔ Hyperparameter tuning
- Reduce the learning rate
- Increase the batch size
- Try different optimizers
✔ Clean noisy samples
Remove extreme cases if necessary.
23.10 – Summary of What You Learned in Section 23
✔ How to identify misclassified MNIST samples
✔ How to visualize mistakes using matplotlib
✔ How to interpret wrong predictions
✔ How confusion matrix strengthens analysis
✔ Why some digits are commonly confused
✔ How to use error insights to improve the model
Error analysis is a powerful phase that transforms a good model into a great one.
Section 24: Improving MNIST Accuracy with Data Augmentation
Even though MNIST is considered an “easy” dataset, boosting accuracy beyond 98–99% often requires more than just increasing epochs or changing optimizers.
A powerful technique that consistently improves generalization is:
✨ Data Augmentation
Data augmentation artificially expands your training dataset by applying random transformations to the images during training — helping your model generalize to variations it hasn’t seen before.
This is one of the most effective ways to reduce:
- Overfitting
- Sensitivity to handwriting styles
- Reliance on pixel-perfect shapes
- Sensitivity to noise
24.1 – Why Data Augmentation Works (Intuition)
MNIST images contain huge diversity:
- Some digits are large, some are small
- Some are slanted left, others right
- Some are thick; others are faint
- Some digits overlap or break
- Different handwriting styles
- Noisy or distorted images
A neural network that trains only on the original images may become too sensitive to particular forms.
Augmentation teaches the model to be invariant to:
- Rotation
- Translation
- Shearing
- Zooming
- Elastic deformation
This leads to:
✔ Better generalization
✔ Higher test accuracy
✔ More robust predictions
24.2 – Popular Augmentation Techniques for MNIST
Below are common transformations and why they matter.
1. Random Rotation
Digits can be slightly rotated depending on handwriting.
transforms.RandomRotation(10)
Rotation within ±10 degrees is reasonable.
2. Random Shift (Translation)
Moves the digit left/right or up/down.
transforms.RandomAffine(0, translate=(0.1, 0.1))
3. Random Scaling (Zoom In/Out)
transforms.RandomAffine(0, scale=(0.9, 1.1))
4. Random Shear (Tilt/Streak)
Shear introduces slanted strokes.
transforms.RandomAffine(0, shear=10)
5. Gaussian Blur
Slightly blurring the digits simulates faint or smudged handwriting (true additive Gaussian noise would need a custom transform).
transforms.GaussianBlur(kernel_size=3)
6. Elastic Distortion (Advanced)
This is used in state-of-the-art MNIST models.
It simulates natural handwriting variations.
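Each transform draws fresh random parameters every time an image is loaded, so the network rarely sees the exact same pixels twice. A sketch of that per-sample sampling, with ranges matching the transforms described above (sample_augmentation is a hypothetical helper, not a torchvision API):

```python
import random

random.seed(0)  # only to make this sketch reproducible

def sample_augmentation():
    """Draw one random parameter set, mirroring the transform ranges above."""
    return {
        "rotation_deg": random.uniform(-10, 10),
        "translate_frac": (random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)),
        "scale": random.uniform(0.9, 1.1),
    }

# two passes over the same image yield two different views
view1, view2 = sample_augmentation(), sample_augmentation()
print(view1 != view2)  # True
```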
24.3 – PyTorch Transform Pipeline With Augmentation
Let’s build a complete transform pipeline.
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(0, translate=(0.1, 0.1)),
    transforms.RandomAffine(0, scale=(0.9, 1.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
We apply augmentation only on training data.
24.4 – Rebuilding the Dataloaders
train_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=train_transform
)

test_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=test_transform
)

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=64, shuffle=True
)

test_loader = torch.utils.data.DataLoader(
    test_dataset, batch_size=64, shuffle=False
)
24.5 – Visualizing Augmented Images
Understanding how augmentations look is essential.
🔧 Code: Visualize Augmented Samples
import matplotlib.pyplot as plt

images, labels = next(iter(train_loader))

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.imshow(images[i].squeeze(), cmap="gray")
    plt.title(f"Label: {labels[i].item()}")
    plt.axis("off")
plt.show()
You will see rotated, shifted, and zoomed digits compared to the original dataset.
24.6 – How Augmentation Affects Training
The model will observe a new variation of each digit every epoch.
This prevents memorization of exact images and forces the model to learn shape-based features (edges, strokes, and curves) rather than specific pixel positions.
The model becomes:
- More robust
- Less overfitted
- Better at generalization
24.7 – Accuracy Gains with Data Augmentation
Typical improvements:
| Model Type | Accuracy (No Aug) | Accuracy (With Aug) |
|---|---|---|
| Simple MLP | ~97% | ~98.5% |
| Simple CNN | ~98.5% | ~99.2% |
| Advanced CNN | ~99.3% | ~99.5%+ |
State-of-the-art MNIST models reach 99.7–99.8% using aggressive augmentation and deeper architectures.
24.8 – A Better CNN Model With Augmentation
Here’s a good MNIST CNN architecture:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.fc = nn.Linear(64 * 7 * 7, 10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
Add dropout for further robustness; define it in __init__ and apply it in forward() just before the final layer:
self.dropout = nn.Dropout(0.3)   # in __init__
x = self.dropout(x)              # in forward(), before self.fc
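The 64 * 7 * 7 input size of the final linear layer follows directly from the architecture: the padding=1 convolutions preserve the 28x28 spatial size, and each MaxPool2d(2) halves it. The arithmetic:

```python
size = 28                    # MNIST input resolution
for _ in range(2):           # two MaxPool2d(2) layers
    size = size // 2         # the padding=1 convs keep the size unchanged
channels = 64                # output channels of layer2

print(channels * size * size)  # 3136 = 64 * 7 * 7
```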
24.9 – Using Augmentation + Better Model = High Accuracy
Training with augmentation + improved architecture will push accuracy towards 99%+.
24.10 – When Not to Use Augmentation
Data augmentation is powerful, but avoid:
❌ Excessive rotation
Digits like 6/9/0 become confusing.
❌ Extreme distortions
Too much shear might create unrealistic digits.
❌ Applying augmentation to validation/test sets
Augmentation must only be used during training.
24.11 – Summary
In this chapter, you learned:
✔ Why data augmentation improves model robustness
✔ Different augmentation techniques for handwritten digits
✔ How to implement augmentation with PyTorch transforms
✔ How to visualize augmented images
✔ How augmentation reduces overfitting
✔ How augmentation improves model accuracy
✔ Recommended augmentation pipeline for MNIST
✔ Combined effect of augmentation + stronger CNN architecture
Data augmentation is one of the simplest yet most powerful tools in deep learning to push performance further.


