Building Your First Image Classifier with PyTorch: A Step-by-Step Guide Using the MNIST Dataset - Part III
Evaluating, Improving, and Visualizing Model Performance
Content:
9. Visualizing Learned Features and Activations
10. Evaluating Model Performance and Confusion Matrix
11. Improving Model Performance — Dropout, Regularization & Learning Rate Tuning
12. Visualizing Model Performance — Accuracy & Loss Curves
Section 9: Visualizing Learned Features and Activations
Deep learning models—especially neural networks—are often described as “black boxes” because it’s not immediately clear how they make their predictions.
However, by visualizing what the model has learned internally (its weights, filters, and activations), we can gain valuable insights into how it processes and recognizes patterns, even in simple datasets like MNIST.
In this section, we’ll focus on understanding the model’s inner workings through feature visualization and activation analysis.
🔹 9.1 Why Visualize Neural Networks?
Visualization helps us:
- Interpret the model’s learning – what features it focuses on.
- Detect overfitting – e.g., if weights look noisy or unstructured.
- Debug training issues – such as vanishing gradients or dead neurons.
- Build trust – visualize how and why predictions are made.
In the MNIST example, visualizations reveal how neurons learn to detect:
- Edges (vertical, horizontal strokes)
- Curves
- Combinations of strokes forming digits
🔹 9.2 Visualizing the Learned Weights
Each neuron in the first fully connected layer (fc1) receives input from the 28×28 pixel image.
Thus, its weights can be reshaped into a 28×28 image to visualize what patterns it detects.
Let’s extract and plot them.
import matplotlib.pyplot as plt
# Get the first layer weights
weights = model.fc1.weight.data
# Plot a few of them as images
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, ax in enumerate(axes.flat):
    weight = weights[i].reshape(28, 28).cpu().numpy()
    ax.imshow(weight, cmap='viridis')
    ax.set_title(f'Neuron {i+1}')
    ax.axis('off')
plt.suptitle("Learned Weights from First Layer")
plt.show()
Interpretation:
- You’ll see faint outlines of digits or edges.
- Some filters respond to horizontal lines, others to curves or dark spots.
- These act like primitive feature detectors (similar to the human visual cortex).
🔹 9.3 Visualizing Activations (Feature Maps)
Weights tell us what the model has learned, but activations show how the model reacts to a specific input.
When we pass an image (say, the digit “8”) through the model, we can inspect the activations (outputs of each layer) to see what patterns get activated.
Let’s visualize activations for the first layer.
import torch
import matplotlib.pyplot as plt

def visualize_activations(model, image):
    model.eval()
    with torch.no_grad():
        x = image.view(1, -1)   # flatten the 28x28 image to (1, 784)
        x = model.fc1(x)
    activations = x.squeeze().cpu().numpy()
    plt.figure(figsize=(12, 4))
    plt.plot(activations)
    plt.title("Activation Values (fc1 layer)")
    plt.xlabel("Neuron index")
    plt.ylabel("Activation strength")
    plt.show()
Call the function with a test image:
sample_image, label = next(iter(test_loader))
visualize_activations(model, sample_image[0])
Interpretation:
- High activation values → neurons strongly respond to that digit.
- Different neurons specialize in detecting specific stroke types or orientations.
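To see which neurons those are for a given input, you can rank the activations with `torch.topk`. A small self-contained sketch (using a freshly initialized layer and a random input as stand-ins for the trained `model.fc1` and a real MNIST digit):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(28 * 28, 128)      # stand-in for the trained model.fc1
image = torch.rand(28, 28)         # stand-in for an MNIST digit

with torch.no_grad():
    activations = fc1(image.view(1, -1)).squeeze()

# The 5 most strongly activated neurons for this input
top_values, top_indices = torch.topk(activations, k=5)
print("Most active neurons:", top_indices.tolist())
```

On a trained model, the same few lines reveal which feature detectors fire for a given digit.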
🔹 9.4 Visualizing Activation Maps for Convolutional Layers (Optional)
Though our MNIST model currently uses fully connected layers, most real-world image classifiers use convolutional layers (CNNs).
For educational purposes, let’s see what such activations might look like.
Example for CNN models:
# Example activation visualization for a CNN layer
# (assumes `activation_map` holds a conv layer's output captured with a
#  forward hook; shape: [batch, channels, height, width])
act = activation_map[0].cpu().numpy()
fig, axes = plt.subplots(2, 4, figsize=(10, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(act[i], cmap='gray')
    ax.set_title(f'Filter {i+1}')
    ax.axis('off')
plt.suptitle("Example CNN Feature Maps")
plt.show()
These feature maps show localized features such as edges, corners, and textures — early layers learn simple patterns, while deeper layers learn more abstract ones.
🔹 9.5 Visualizing Misclassified Images
Visualization is also helpful in understanding model errors.
Let’s identify images the model got wrong and plot them.
incorrect_samples = []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        for img, pred, label in zip(images, preds, labels):
            if pred != label:
                incorrect_samples.append((img, pred, label))
        if len(incorrect_samples) >= 10:
            break
# Display misclassified examples
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for i, (img, pred, label) in enumerate(incorrect_samples[:10]):
    ax = axes.flat[i]
    ax.imshow(img.squeeze(), cmap='gray')
    ax.set_title(f"True: {label.item()}, Pred: {pred.item()}")
    ax.axis('off')
plt.suptitle("Misclassified MNIST Digits")
plt.show()
Observations:
- The model may confuse digits like 4 ↔ 9 or 3 ↔ 8 due to shape similarities.
- Helps identify ambiguous data or training limitations.
🔹 9.6 Visualizing Training Progress (Loss & Accuracy Curves)
Plotting loss and accuracy curves across epochs gives a quick overview of model performance trends.
plt.plot(train_losses, label='Train Loss')
plt.plot(test_losses, label='Test Loss')
plt.title("Loss vs Epochs")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(test_accuracies, label='Test Accuracy')
plt.title("Accuracy vs Epochs")
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.legend()
plt.show()
Interpretation:
- Smoothly decreasing loss and stabilizing accuracy → good convergence.
- Large gap between train and test accuracy → overfitting (add regularization).
- Oscillations → learning rate might be too high.
🔹 9.7 Summary of Visualization Techniques
| Visualization Type | Purpose | Example |
|---|---|---|
| Weight Visualization | Understand learned patterns | Visualize fc1 weights |
| Activation Visualization | See how neurons respond to input | Plot activations for each layer |
| Misclassified Images | Debug and analyze model errors | Identify confusing digits |
| Loss/Accuracy Curves | Monitor training performance | Track convergence and overfitting |
✅ Key Takeaways
- Visualization helps open the black box of neural networks.
- Weight and activation maps reveal what the model “sees.”
- Misclassification analysis helps improve dataset quality and model design.
- Monitoring training curves ensures healthy convergence and model balance.
🧠 Section 10: Evaluating Model Performance and Confusion Matrix
After training and visualizing our MNIST classifier, the next crucial step is to evaluate how well it performs on unseen data.
Evaluation tells us whether our model generalizes well or merely memorizes training samples.
In this section, we’ll dive into key performance metrics, understand the confusion matrix, and learn to interpret model strengths and weaknesses through numbers and visuals.
🔹 10.1 Why Evaluation Matters
Training accuracy alone is not enough — your model might perform well on training data but fail on new images.
Proper evaluation ensures:
- Generalization: Can the model handle unseen data?
- Fairness: Does it perform equally across all classes?
- Reliability: Are the predictions trustworthy?
For MNIST, although it’s a relatively simple dataset, these evaluation techniques form the foundation for real-world deep learning tasks.
🔹 10.2 Core Evaluation Metrics
Let’s look at the most common metrics used to evaluate classification models.
| Metric | Formula | Meaning |
|---|---|---|
| Accuracy | ( \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} ) | Percentage of total correct predictions |
| Precision | ( \text{Precision} = \frac{TP}{TP + FP} ) | Of the predicted positives, how many were correct |
| Recall (Sensitivity) | ( \text{Recall} = \frac{TP}{TP + FN} ) | Of the actual positives, how many were detected |
| F1-Score | ( F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} ) | Balance between precision and recall |
For multiclass datasets like MNIST, we compute these metrics per class and then average them (macro/micro averaging).
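The difference between the two averaging modes is easy to see with scikit-learn's `precision_score` on a toy label set (six hand-picked labels for illustration, not MNIST output):

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Macro: compute precision per class, then average the three values equally
macro = precision_score(y_true, y_pred, average='macro')

# Micro: pool all predictions first, then compute one global precision
micro = precision_score(y_true, y_pred, average='micro')

print(f"macro={macro:.4f}, micro={micro:.4f}")  # macro=0.7222, micro=0.6667
```

Macro averaging weights every class equally, so rare classes count as much as common ones; micro averaging weights every sample equally, so it matches overall accuracy for single-label classification.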
🔹 10.3 Implementing Evaluation in PyTorch
We’ll evaluate the trained MNIST classifier using PyTorch and scikit-learn’s metrics utilities.
🔸 Step 1: Generate Predictions and Collect Labels
from sklearn.metrics import classification_report, confusion_matrix
import torch
y_true = []
y_pred = []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        y_true.extend(labels.cpu().numpy())
        y_pred.extend(preds.cpu().numpy())
Now we have all true and predicted labels for the test dataset.
🔸 Step 2: Print Detailed Classification Report
from sklearn.metrics import classification_report
print("Classification Report:\n")
print(classification_report(y_true, y_pred))
Sample Output:
precision recall f1-score support
0 0.99 0.99 0.99 980
1 0.98 0.99 0.98 1135
2 0.98 0.97 0.98 1032
3 0.97 0.97 0.97 1010
4 0.98 0.98 0.98 982
5 0.97 0.97 0.97 892
6 0.99 0.99 0.99 958
7 0.98 0.98 0.98 1028
8 0.97 0.97 0.97 974
9 0.98 0.97 0.97 1009
accuracy 0.98 10000
macro avg 0.98 0.98 0.98 10000
weighted avg 0.98 0.98 0.98 10000
Interpretation:
- Each row represents a digit (0–9).
- precision, recall, and f1-score close to 1.0 indicate excellent performance.
- support = number of test samples per class.
- The overall accuracy of ~98% is typical for a well-trained simple MNIST model.
🔹 10.4 Confusion Matrix: The Ultimate Insight Tool
The confusion matrix shows exactly where your model goes wrong:
- Each row = actual digit
- Each column = predicted digit
- Diagonal entries = correct predictions
- Off-diagonal entries = misclassifications
Let’s generate and visualize it.
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix for MNIST Classifier")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
🧩 How to Interpret:
- Bright diagonal line → good performance.
- If you see confusion between, say, “4” and “9” → the model finds those digits similar.
- Helps in identifying class imbalance or difficult digits.
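A handy follow-up is to rank the off-diagonal entries of `cm` to find the worst confusion pair. A sketch on a small synthetic matrix (the real `cm` from above works exactly the same way):

```python
import numpy as np

# Toy 3x3 confusion matrix: rows = true class, columns = predicted class
cm = np.array([[50, 2, 1],
               [0, 47, 6],
               [3, 1, 40]])

# Zero out the diagonal so only the errors remain
errors = cm.copy()
np.fill_diagonal(errors, 0)

# Locate the largest off-diagonal entry
true_cls, pred_cls = np.unravel_index(np.argmax(errors), errors.shape)
count = errors[true_cls, pred_cls]
print(f"Most common confusion: true {true_cls} predicted as {pred_cls} ({count} times)")
```

For a 10x10 MNIST matrix, sorting the flattened `errors` array instead of taking a single argmax gives you the full ranking of confusion pairs.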
🔹 10.5 Per-Class Error Analysis
Let’s compute which digits the model struggles with the most.
import numpy as np
cm_sum = np.sum(cm, axis=1, keepdims=True)
cm_norm = cm / cm_sum
errors_per_class = 1 - np.diag(cm_norm)
for i, err in enumerate(errors_per_class):
    print(f"Digit {i}: Error Rate = {err:.3f}")
Example Output:
Digit 0: Error Rate = 0.011
Digit 1: Error Rate = 0.007
Digit 5: Error Rate = 0.025
Digit 8: Error Rate = 0.031
➡️ Digits like “5” and “8” may have higher error rates due to similar shapes.
🔹 10.6 Visualizing Misclassified Examples Again (Detailed)
We can now focus only on specific confusion cases.
# Show examples where the model confused 4 with 9
# (assumes test_loader was created with shuffle=False, so that
#  y_true/y_pred line up with test_loader.dataset.data)
target_digit = 4
confused_with = 9

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
count = 0
for img, label, pred in zip(test_loader.dataset.data, y_true, y_pred):
    if label == target_digit and pred == confused_with:
        ax = axes[count]
        ax.imshow(img, cmap='gray')
        ax.set_title(f"T:{label} P:{pred}")
        ax.axis('off')
        count += 1
        if count == 5:
            break
plt.suptitle("Examples of 4 Misclassified as 9")
plt.show()
🧠 You’ll notice that the misclassified digits might have blurry handwriting or incomplete strokes, showing how real-world data noise affects accuracy.
🔹 10.7 Computing Overall Test Accuracy
Finally, let’s print the overall accuracy directly.
correct = np.sum(np.array(y_true) == np.array(y_pred))
total = len(y_true)
accuracy = correct / total * 100
print(f"Test Accuracy: {accuracy:.2f}%")
Output:
Test Accuracy: 98.21%
That’s an impressive performance for a simple feed-forward neural network built in PyTorch!
🔹 10.8 Key Takeaways
| Concept | Description | Why It Matters |
|---|---|---|
| Accuracy | Fraction of correctly predicted samples | General performance |
| Precision | How many predicted positives were correct | Avoids false positives |
| Recall | How many actual positives were detected | Avoids false negatives |
| F1-Score | Harmonic mean of precision and recall | Balances both |
| Confusion Matrix | Shows correct vs. incorrect classifications | Detailed error analysis |
✅ Summary
By applying these metrics, we’ve:
- Quantified our model’s performance beyond just accuracy.
- Identified the digits that the model confuses most.
- Learned to interpret the confusion matrix and per-class statistics.
- Built the foundation for model improvement through data analysis.
🧠 Section 11: Improving Model Performance — Dropout, Regularization & Learning Rate Tuning
By now, your MNIST image classifier is performing impressively — achieving around 98% accuracy.
But deep learning isn’t just about high accuracy; it’s about building robust models that generalize well to unseen data.
In this section, we’ll explore techniques that help you prevent overfitting, stabilize training, and boost model generalization, focusing on three key areas:
- Dropout
- Regularization (L1 & L2 Weight Decay)
- Learning Rate Tuning
🔹 11.1 Understanding Overfitting
Before diving into techniques, let’s recall what overfitting means.
🔸 What Is Overfitting?
Overfitting occurs when the model performs well on training data but poorly on unseen test data.
It learns not only useful patterns but also noise and random variations in the training set.
🔸 Signs of Overfitting
- Training loss keeps decreasing, but test loss starts increasing.
- Accuracy on training data is much higher than on test data.
- Model memorizes examples instead of learning general patterns.
Let’s visualize a simple scenario:
| Model Type | Behavior |
|---|---|
| Underfitting | Too simple — can’t capture patterns |
| Good Fit | Balanced learning — performs well on both |
| Overfitting | Too complex — memorizes data noise |
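One common guard against sliding from the middle row into the bottom row is early stopping: halt training once validation loss has not improved for a few epochs. A minimal sketch of the bookkeeping (the validation losses below are made up for illustration):

```python
# Stop when validation loss hasn't improved for `patience` epochs
val_losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.53, 0.54]
patience = 3

best_loss = float('inf')
epochs_without_improvement = 0
stopped_at = None

for epoch, loss in enumerate(val_losses, start=1):
    if loss < best_loss:
        best_loss = loss                 # new best: reset the counter
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch           # plateau detected: stop training
            break

print(f"Best val loss {best_loss:.2f}, stopped at epoch {stopped_at}")
```

In a real training loop you would also save the model weights at each new best, so the final model is the one from the best epoch rather than the last one.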
🔹 11.2 Dropout: A Simple Yet Powerful Regularization Technique
🔸 What is Dropout?
Dropout randomly “drops” (sets to zero) some neurons during training.
This forces the network to not rely on specific neurons, encouraging it to learn redundant and more general features.
Mathematically:
[
h_i' =
\begin{cases}
0, & \text{with probability } p \\
h_i, & \text{with probability } 1-p
\end{cases}
]
where ( p ) is the dropout rate (typically 0.2–0.5).
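A PyTorch detail worth knowing: `nn.Dropout(p)` zeroes each unit with probability `p` and rescales the survivors by 1/(1-p) during training (so-called inverted dropout), so no rescaling is needed at evaluation time. A quick sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()            # training mode: units are randomly zeroed
y_train = dropout(x)
dropout.eval()             # eval mode: dropout becomes a no-op
y_eval = dropout(x)

# During training, surviving units are scaled by 1/(1-p) = 2.0
print(sorted(y_train.unique().tolist()))   # [0.0, 2.0]
print(torch.equal(y_eval, x))              # True
```

The rescaling keeps the expected activation the same in both modes, which is why you can switch between `model.train()` and `model.eval()` without touching the weights.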
🔸 Implementing Dropout in PyTorch
Let’s add dropout layers to our existing MNIST model.
import torch.nn as nn
import torch.nn.functional as F

class MNISTModelWithDropout(nn.Module):
    def __init__(self):
        super(MNISTModelWithDropout, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(p=0.3)  # 30% dropout rate

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)  # apply dropout after activation
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x
🔸 Key Points:
- Dropout is only active during training.
- During evaluation (model.eval()), dropout is automatically disabled.
- Improves generalization by preventing co-adaptation of neurons.
🔸 Comparing Models (With vs Without Dropout)
| Model | Train Accuracy | Test Accuracy | Overfitting |
|---|---|---|---|
| Without Dropout | 99.9% | 98.1% | High |
| With Dropout (p=0.3) | 98.7% | 98.5% | Reduced |
✅ Result: Slightly lower training accuracy but improved test accuracy — better generalization!
🔹 11.3 Weight Regularization (L1 and L2 Penalties)
Regularization discourages large weight values that can cause the model to fit noise.
🔸 L2 Regularization (Weight Decay)
L2 adds a penalty proportional to the square of the weight magnitude:
[
L_{total} = L_{data} + \lambda \sum_i w_i^2
]
This keeps weights small and smooth.
In PyTorch, you can apply it using the weight_decay parameter in the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
🔸 L1 Regularization
L1 adds a penalty proportional to the absolute value of the weight:
[
L_{total} = L_{data} + \lambda \sum_i |w_i|
]
This tends to drive some weights to zero → sparse models (useful for feature selection).
In PyTorch, L1 can be applied manually:
l1_lambda = 0.001
l1_norm = sum(p.abs().sum() for p in model.parameters())
loss = loss_fn(outputs, labels) + l1_lambda * l1_norm
🔸 Comparison of Regularization Types
| Regularization | Formula | Effect |
|---|---|---|
| L1 | ( \lambda \sum_i \lvert w_i \rvert ) | Sparse weights, feature selection |
| L2 | ( \lambda \sum_i w_i^2 ) | Smooth weights, general stability |
| Dropout | Randomly drop neurons | Prevents co-adaptation |
🔹 11.4 Learning Rate and Its Importance
The learning rate (LR) controls how big a step the optimizer takes during gradient descent.
Choosing the right LR is critical — too high causes divergence, too low slows learning.
🔸 Learning Rate Schedules
A scheduler adjusts the LR over time:
- Start high → learn fast
- Gradually lower → fine-tune learning
Example with PyTorch:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(20):
    train_one_epoch()   # your per-epoch training loop
    scheduler.step()    # halve the LR every 5 epochs
    print(f"Epoch {epoch+1}, LR: {scheduler.get_last_lr()}")
Other schedulers:
- ExponentialLR
- ReduceLROnPlateau
- CosineAnnealingLR
- OneCycleLR (for fast convergence)
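`ReduceLROnPlateau` is worth a closer look because, unlike `StepLR`, it reacts to a monitored metric instead of a fixed schedule. A sketch with a dummy model and made-up validation losses:

```python
import torch

model = torch.nn.Linear(10, 2)   # dummy model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Halve the LR once the metric has failed to improve for 2 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=2)

fake_val_losses = [0.9, 0.8, 0.8, 0.8, 0.8, 0.8]
for val_loss in fake_val_losses:
    scheduler.step(val_loss)     # pass the metric being monitored
    print("current LR:", optimizer.param_groups[0]['lr'])
```

Here the LR drops from 0.01 to 0.005 once the loss has plateaued for more than two epochs; in a real loop you would pass your actual validation loss each epoch.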
🔹 11.5 Batch Normalization (Bonus Tip)
Although not exactly regularization, Batch Normalization (BatchNorm) stabilizes training and improves generalization.
It normalizes activations in each layer, keeping them in a stable range.
Just add an nn.BatchNorm1d or nn.BatchNorm2d layer between your linear and activation layers:
self.bn1 = nn.BatchNorm1d(256)        # in __init__
x = F.relu(self.bn1(self.fc1(x)))     # in forward
Benefits:
- Faster convergence
- Reduces internal covariate shift
- Works synergistically with dropout
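Putting the pieces together, here is one possible way (a sketch, not the tutorial's canonical model) to combine BatchNorm with the dropout layers from Section 11.2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MNISTModelBNDropout(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.bn1 = nn.BatchNorm1d(256)    # normalize fc1 outputs
        self.fc2 = nn.Linear(256, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.fc3 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = F.relu(self.bn1(self.fc1(x)))  # linear -> batchnorm -> activation
        x = self.dropout(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout(x)
        return self.fc3(x)

model = MNISTModelBNDropout()
out = model(torch.rand(4, 1, 28, 28))   # batch of 4 fake images
print(out.shape)                        # torch.Size([4, 10])
```

Note the ordering used here (linear, then BatchNorm, then ReLU, then dropout); other orderings exist in the literature, but this one matches the snippet above.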
🔹 11.6 Visualizing the Effects of Regularization
You can compare training and validation losses for models with and without dropout or L2 weight decay.
# valid_losses_baseline / valid_losses_dropout: validation-loss histories
# from two runs of the training loop, without and with dropout
plt.plot(valid_losses_baseline, label="Validation Loss (No Dropout)")
plt.plot(valid_losses_dropout, label="Validation Loss (With Dropout)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Regularization Impact on Training Stability")
plt.legend()
plt.show()
✅ You should see:
- Overfitted model: training loss ↓, validation loss ↑
- Regularized model: both losses ↓ steadily
🔹 11.7 Summary of Model Improvement Techniques
| Technique | Description | Effect |
|---|---|---|
| Dropout | Randomly disables neurons | Reduces overfitting |
| L2 Regularization (Weight Decay) | Penalizes large weights | Encourages smaller, smoother weights |
| L1 Regularization | Penalizes absolute weight size | Creates sparse models |
| Learning Rate Scheduling | Dynamically adjusts LR | Stabilizes convergence |
| Batch Normalization | Normalizes activations | Accelerates and stabilizes training |
✅ Key Takeaways
- Overfitting is the enemy of generalization.
- Dropout, regularization, and LR tuning help build robust, reliable models.
- A proper combination of these techniques leads to better test accuracy and training stability.
- Monitoring loss curves and learning rates helps fine-tune the network effectively.
🧩 Section 12: Visualizing Model Performance — Accuracy & Loss Curves
After training your model, it’s crucial to visualize its performance to understand how well it’s learning over time. Visualization helps identify underfitting, overfitting, and learning rate issues — giving you deep insight into your model’s training dynamics.
Why Visualization Matters
While accuracy and loss numbers can show how your model is performing, graphs tell the story visually:
- Training Loss Curve – Helps you see whether the model’s error is consistently decreasing or plateauing.
- Validation Accuracy Curve – Helps you understand whether the model generalizes well to unseen data.
- ⚠️ Gap Between Training and Validation Curves – Indicates overfitting if validation accuracy stops improving while training accuracy keeps increasing.
A well-trained model should show:
- Gradual decrease in loss (both training and validation).
- Gradual increase in accuracy (both training and validation).
- Small gap between the training and validation curves.
🧠 Example Code — Plotting Accuracy and Loss Curves
We’ll use Matplotlib to visualize how our model’s accuracy and loss change across epochs.
import matplotlib.pyplot as plt
# Example: history dictionary from our training loop
history = {
"train_loss": [0.8, 0.6, 0.45, 0.35, 0.3],
"train_acc": [0.70, 0.80, 0.86, 0.90, 0.92],
"val_loss": [0.75, 0.55, 0.50, 0.40, 0.38],
"val_acc": [0.72, 0.82, 0.87, 0.89, 0.91]
}
epochs = range(1, len(history["train_loss"]) + 1)
# Plotting Loss
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs, history["train_loss"], 'bo-', label='Training Loss')
plt.plot(epochs, history["val_loss"], 'ro-', label='Validation Loss')
plt.title('Training & Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
# Plotting Accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, history["train_acc"], 'bo-', label='Training Accuracy')
plt.plot(epochs, history["val_acc"], 'ro-', label='Validation Accuracy')
plt.title('Training & Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Interpreting the Graphs
Let’s understand what different patterns in the graphs mean:
| Pattern | Description | Possible Fix |
|---|---|---|
| 🔼 Training loss decreases but validation loss increases | Overfitting | Add regularization, dropout, or early stopping |
| 🔽 Both training and validation loss remain high | Underfitting | Increase model capacity, train longer, or lower learning rate |
| ⚖️ Training and validation loss decrease together | Good training | Keep the current setup or fine-tune learning rate |
| ⚡ Sudden spikes in loss | Learning rate too high | Reduce learning rate or use learning rate scheduler |
🧮 Mathematical Insight — Accuracy & Loss
1️⃣ Accuracy Formula
[
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Predictions}}
]
In PyTorch, this can be implemented as:
correct = (y_pred.argmax(1) == y_true).type(torch.float).sum().item()
accuracy = correct / len(y_true)
2️⃣ Loss Function (Cross-Entropy)
[
\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log(\hat{y}_i)
]
Here:
- ( y_i ): True label (0 or 1 in one-hot encoding)
- ( \hat{y}_i ): Predicted probability
- ( N ): Number of samples
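In PyTorch you rarely implement this formula by hand: `F.cross_entropy` applies `log_softmax` to the raw logits and averages the negative log-probabilities of the true classes. A quick sketch verifying the equivalence on a toy batch:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.3]])   # raw model outputs (no softmax)
targets = torch.tensor([0, 1])             # true class indices

# Built-in cross-entropy, computed directly on logits
loss = F.cross_entropy(logits, targets)

# Manual equivalent: log-softmax -> pick the true-class terms -> negate -> mean
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())  # the two values match
```

This is also why the models in this series return raw logits from the last layer: `CrossEntropyLoss` expects unnormalized scores, not probabilities.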
🧩 Adding Real-World Context
When Google or Meta trains large image models like Inception, ResNet, or Vision Transformers, they also use similar curves — except scaled to millions of data points. The concept remains identical: monitoring loss and accuracy over time to ensure optimal model learning.
For example:
- In Google’s ImageNet training, validation loss helps decide when to stop training (early stopping).
- In Tesla’s autonomous driving models, training curves ensure convergence without overfitting to specific weather or lighting conditions.
🧠 Pro Tip — Save and Compare Training Histories
When tuning hyperparameters (like learning rate or batch size), you can store training histories in a dictionary or CSV file to compare later.
Example:
import pandas as pd
# Save training history
df = pd.DataFrame(history)
df.to_csv("training_history.csv", index=False)
# Later, reload and visualize
history_loaded = pd.read_csv("training_history.csv")
print(history_loaded.head())