First Image Classifier with PyTorch - IV: Hyperparameter Tuning & Optimization Techniques, Model Evaluation & Confusion Matrix Analysis, and Saving, Loading, and Deploying the PyTorch Model

Building Your First Image Classifier with PyTorch: A Step-by-Step Guide Using the MNIST Dataset - IV

Contents:

13: Hyperparameter Tuning & Optimization Techniques in PyTorch

14: Model Evaluation & Confusion Matrix Analysis in PyTorch

15: Model Saving, Loading, and Deployment in PyTorch

16: Transfer Learning with Pretrained Models in PyTorch


⚙️ Section 13: Hyperparameter Tuning & Optimization Techniques in PyTorch

Once you have your model training successfully, the next step is to optimize its performance.
Even a small change in learning rate, batch size, or optimizer can dramatically improve accuracy.
This section explores Hyperparameter Tuning — the art and science of finding the best configuration for your neural network.


🎯 What Are Hyperparameters?

Hyperparameters are the external configurations of your model — they are not learned during training but directly influence how your model learns.

Type | Examples | Description
Model Hyperparameters | Number of layers, neurons per layer, activation functions | Define the model’s architecture
Training Hyperparameters | Learning rate, batch size, epochs | Control how the model is trained
Regularization Hyperparameters | Dropout rate, L2 penalty | Prevent overfitting
Optimizer Hyperparameters | Momentum, beta values (for Adam) | Fine-tune optimizer behavior

🧠 Why Hyperparameter Tuning Matters

A model’s accuracy can rise from 85% to 95% simply by adjusting hyperparameters — no architecture change needed.

Example:
For MNIST:

  • Learning rate = 0.01 → accuracy = 89%

  • Learning rate = 0.001 → accuracy = 96%

Just tuning one number led to a 7-percentage-point accuracy boost!

Sponsor Key-Word

"This Content Sponsored by SBO Digital Marketing.

Mobile-Based Part-Time Job Opportunity by SBO!

Earn money online by doing simple content publishing and sharing tasks. Here's how:

Job Type: Mobile-based part-time work

Work Involves:

Content publishing

Content sharing on social media

Time Required: As little as 1 hour a day

Earnings: ₹300 or more daily

Requirements:

Active Facebook and Instagram account

Basic knowledge of using mobile and social media

For more details:

WhatsApp your Name and Qualification to 9994104160

a.Online Part Time Jobs from Home

b.Work from Home Jobs Without Investment

c.Freelance Jobs Online for Students

d.Mobile Based Online Jobs

e.Daily Payment Online Jobs

Keyword & Tag: #OnlinePartTimeJob #WorkFromHome #EarnMoneyOnline #PartTimeJob #jobs #jobalerts #withoutinvestmentjob"


🧮 Key Hyperparameters in PyTorch

Let’s discuss the most critical ones in detail:

1️⃣ Learning Rate (lr)

Controls how big a step the optimizer takes in each iteration.

  • Too high → model overshoots minima.

  • Too low → model converges very slowly.

Example:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Visualization Concept:

  • Learning rate too high = bouncing around the minima.

  • Learning rate too low = slow crawl to the minima.
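Both failure modes can be reproduced on a toy one-dimensional problem. The function f(x) = x² and the step sizes below are illustrative choices, not part of the MNIST setup:

```python
def gradient_descent(lr, steps=20, x=5.0):
    """Minimize f(x) = x**2 by plain gradient descent; the gradient is 2x."""
    for _ in range(steps):
        x = x - lr * 2 * x  # one update step
    return x

print(abs(gradient_descent(lr=0.1)))  # small step size: x ends up near the minimum at 0
print(abs(gradient_descent(lr=1.5)))  # oversized step size: x bounces away and diverges
```

With lr = 0.1 each step shrinks x by a constant factor, while lr = 1.5 overshoots the minimum so badly that the iterate grows without bound.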

You can also use Learning Rate Schedulers to dynamically adjust it during training:

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
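StepLR multiplies the learning rate by gamma every step_size epochs. Here is a plain-Python sketch of that schedule (the function name and numbers are illustrative):

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate in effect after `epoch` epochs of a StepLR schedule."""
    return base_lr * (gamma ** (epoch // step_size))

for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(0.01, epoch))  # drops to 0.001 at epoch 10, 0.0001 at epoch 20
```

In a real training loop you call scheduler.step() once per epoch and PyTorch applies this decay for you.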

2️⃣ Batch Size

Defines how many samples are used to compute each gradient update.

Batch Size | Description
Small (e.g., 16–32) | Noisier gradients; often generalizes better
Large (e.g., 256–512) | Stable gradients, but needs more memory

train_loader = DataLoader(dataset=train_data, batch_size=64, shuffle=True)

3️⃣ Number of Epochs

One epoch = one complete pass through the dataset.

  • Too few → underfitting

  • Too many → overfitting

Use Early Stopping to avoid training too long:

if val_loss < best_val_loss:
    best_val_loss = val_loss
    trigger_times = 0  # improvement: reset the patience counter
else:
    trigger_times += 1
    if trigger_times >= patience:
        print("Early stopping triggered!")
        break

4️⃣ Optimizer Choice

PyTorch offers several optimizers. Each suits different learning patterns.

Optimizer | Description | Code Example
SGD | Basic gradient descent | torch.optim.SGD(model.parameters(), lr=0.01)
Adam | Adaptive learning, widely used | torch.optim.Adam(model.parameters(), lr=0.001)
RMSprop | Great for RNNs | torch.optim.RMSprop(model.parameters(), lr=0.001)

5️⃣ Activation Functions

These define nonlinearities and help models learn complex relationships.

Activation | Use Case | PyTorch Example
ReLU | Most CNNs, general networks | torch.nn.ReLU()
Sigmoid | Binary classification | torch.nn.Sigmoid()
Tanh | Normalized outputs (-1 to 1) | torch.nn.Tanh()
LeakyReLU | Avoids “dead ReLU” issue | torch.nn.LeakyReLU(0.01)

6️⃣ Regularization (Dropout, Weight Decay)

Prevents overfitting by adding constraints.

  • Dropout: Randomly drops neurons during training.

torch.nn.Dropout(p=0.3)
  • Weight Decay: Penalizes large weights (L2 regularization).

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
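For plain SGD, weight decay simply adds weight_decay * w to each parameter's gradient before the update, which steadily pulls weights toward zero (PyTorch's Adam folds the same L2 term into its adaptive update). A one-line sketch with made-up numbers:

```python
def sgd_step(w, grad, lr=0.1, weight_decay=0.0):
    """One SGD update; weight decay contributes an extra gradient of wd * w."""
    return w - lr * (grad + weight_decay * w)

print(sgd_step(2.0, grad=0.5))                    # plain update
print(sgd_step(2.0, grad=0.5, weight_decay=0.1))  # decay shrinks w slightly more
```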

🔬 Manual Hyperparameter Search (Grid Search)

You can loop through combinations manually:

learning_rates = [0.1, 0.01, 0.001]
batch_sizes = [32, 64, 128]

for lr in learning_rates:
    for batch in batch_sizes:
        print(f"Training with lr={lr}, batch={batch}")
        model = Net()  # re-initialize so each combination starts fresh
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        train_loader = DataLoader(train_data, batch_size=batch, shuffle=True)
        train_model(model, train_loader, optimizer)

This approach is simple but computationally expensive for larger models.


🤖 Automated Hyperparameter Tuning

For larger projects, libraries like Optuna, Ray Tune, or Weights & Biases (W&B) automate the search.

Example (Optuna):

import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float('dropout', 0.1, 0.5)
    
    model = Net(dropout=dropout)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    
    acc = train_and_validate(model, optimizer)
    return acc

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)

This allows you to automatically find the best hyperparameters.


📈 Visualization: Learning Rate vs Accuracy

You can visualize the effect of hyperparameters:

import matplotlib.pyplot as plt

lrs = [0.1, 0.01, 0.001, 0.0001]
accuracy = [0.80, 0.88, 0.95, 0.91]

plt.plot(lrs, accuracy, 'bo-')
plt.xscale('log')
plt.xlabel('Learning Rate')
plt.ylabel('Validation Accuracy')
plt.title('Learning Rate vs Accuracy')
plt.show()

💡 Best Practices for Hyperparameter Tuning

  1. Start with defaults – frameworks like PyTorch choose reasonable defaults.

  2. Tune one parameter at a time – isolate effects.

  3. Use validation sets – never tune on test data.

  4. Track experiments – log results systematically.

  5. Leverage GPU resources – use CUDA to parallelize training.

  6. Use early stopping – avoid unnecessary computation.


🧠 Real-World Example:

In Google’s BERT model, the choice of learning rate (2e-5 to 5e-5) drastically affected convergence.
In ResNet training, batch size scaling (32 → 512) required learning rate warmup and adaptive scheduling.
This shows how hyperparameter tuning is an integral part of every production-grade AI system.


Conclusion of Section 13

Hyperparameter tuning transforms a working model into a high-performing one.
It’s not guesswork — it’s a process of systematic experimentation, observation, and optimization.


📊 Section 14: Model Evaluation & Confusion Matrix Analysis in PyTorch

After successfully training and tuning your model, the next crucial step is to evaluate its performance.
Model evaluation ensures that your neural network not only performs well on the training data but also generalizes effectively to unseen (test) data.

In this section, you’ll learn:
✅ How to measure model performance using metrics like accuracy, precision, recall, F1-score
✅ How to generate and visualize a confusion matrix
✅ How to interpret evaluation metrics for real-world decision-making


🧠 1️⃣ The Need for Model Evaluation

Training accuracy alone doesn’t tell the full story.
A model can have high training accuracy but low test accuracy, meaning it’s memorizing rather than learning — a classic case of overfitting.

Hence, we use evaluation metrics to:

  • Understand model performance on unseen data

  • Identify biases or misclassifications

  • Compare multiple models systematically


⚙️ 2️⃣ Basic Evaluation Metrics

Let’s go through key metrics used in deep learning model evaluation.

Metric | Formula | Description
Accuracy | (TP + TN) / (TP + TN + FP + FN) | Overall correctness of the model
Precision | TP / (TP + FP) | How many predicted positives are actually positive
Recall (Sensitivity) | TP / (TP + FN) | How many actual positives were correctly predicted
F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Balance between precision and recall

Where:

  • TP = True Positives

  • TN = True Negatives

  • FP = False Positives

  • FN = False Negatives
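The four formulas can be checked directly in plain Python; the counts below are invented for illustration:

```python
def metrics(tp, tn, fp, fn):
    """Compute the four basic classification metrics from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=90, tn=50, fp=10, fn=5)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```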


🔍 3️⃣ Confusion Matrix Explained

A Confusion Matrix gives a detailed breakdown of correct and incorrect predictions.

                 | Predicted: Positive | Predicted: Negative
Actual: Positive | True Positive (TP)  | False Negative (FN)
Actual: Negative | False Positive (FP) | True Negative (TN)

For example:

  • In a digit classifier, predicting 7 when the true label is 7 → ✅ (TP)

  • Predicting 3 when the true label is 7 → ❌ (a false negative for class 7 and, at the same time, a false positive for class 3 — in multiclass settings each error is counted per class)
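The bookkeeping becomes concrete if you build the matrix by hand: row index = actual class, column index = predicted class. A pure-Python sketch with made-up labels:

```python
def build_confusion_matrix(y_true, y_pred, num_classes):
    """Row = actual class, column = predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm

y_true = [7, 7, 3, 7, 3]
y_pred = [7, 3, 3, 7, 3]
cm = build_confusion_matrix(y_true, y_pred, num_classes=10)
print(cm[7][7], cm[7][3])  # correctly classified 7s, and 7s misread as 3
```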


🧩 4️⃣ Implementing Model Evaluation in PyTorch

Let’s evaluate our MNIST image classifier trained earlier.

import torch
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Switch model to evaluation mode
model.eval()

y_true = []
y_pred = []

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        y_true.extend(labels.cpu().numpy())
        y_pred.extend(predicted.cpu().numpy())

# Print classification report
print(classification_report(y_true, y_pred))

# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(10,8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix for MNIST Classifier")
plt.show()

🧮 5️⃣ Example Output

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      980
           1       0.98      0.98      0.98     1135
           2       0.96      0.97      0.97     1032
           3       0.97      0.95      0.96     1010
           4       0.98      0.97      0.98      982
           5       0.95      0.95      0.95      892
           6       0.98      0.99      0.99      958
           7       0.97      0.97      0.97     1028
           8       0.96      0.95      0.96      974
           9       0.96      0.97      0.97     1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000

Here, the macro average gives equal weight to each class, while weighted average considers class imbalance.
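The distinction is easiest to see with two imbalanced classes; the per-class F1 scores and supports below are invented for illustration:

```python
f1_scores = [0.90, 0.60]  # per-class F1
supports = [900, 100]     # number of samples in each class

macro = sum(f1_scores) / len(f1_scores)
weighted = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)
print(macro, weighted)  # macro treats classes equally; weighted favors the large class
```

Here the weighted average (0.87) is pulled toward the well-represented class, while the macro average (0.75) exposes the weak minority class.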


📈 6️⃣ Visualizing Confusion Matrix

A heatmap representation makes misclassifications visible.

Example visualization:

🟦 Diagonal cells = correct predictions (TP)
🟥 Off-diagonal cells = misclassifications

If most of the heatmap’s intensity is concentrated along the diagonal, your model is performing well.

This visual cue helps you spot systematic errors — e.g., the model confusing 3 with 8, or 5 with 6.


🧠 7️⃣ Real-World Analogy

Think of a COVID-19 test:

Prediction / Reality | Positive | Negative
Predicted Positive | True Positive (correctly identified infected) | False Positive (healthy person incorrectly flagged)
Predicted Negative | False Negative (infected person missed) | True Negative (correctly identified healthy)

Depending on the use case, you might prioritize:

  • Recall → For medical tests (catch all positives)

  • Precision → For spam filters (avoid false alarms)

Similarly, in AI systems:

  • Fraud detection → prioritize recall

  • Email classification → prioritize precision


🧪 8️⃣ Practical Tip — Normalize Confusion Matrix

For clearer visual comparison between classes:

import numpy as np

cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
sns.heatmap(cm_normalized, annot=True, fmt='.2f', cmap='Greens')
plt.title("Normalized Confusion Matrix")
plt.show()

This displays percentage-based performance per class — helpful when class counts differ.



💡 9️⃣ Save Metrics for Reporting

You can log evaluation metrics for reproducibility or further analysis:

import pandas as pd

metrics = classification_report(y_true, y_pred, output_dict=True)
df_metrics = pd.DataFrame(metrics).transpose()
df_metrics.to_csv("mnist_metrics.csv", index=True)

This can later be integrated into dashboards or monitoring systems for continuous evaluation.


🧠 10️⃣ Beyond Accuracy: Advanced Metrics

In advanced use cases (like medical imaging or NLP), you might use:

  • AUC-ROC Curve → Tradeoff between True Positive Rate & False Positive Rate

  • PR Curve (Precision-Recall) → Especially useful for imbalanced data

  • Top-k Accuracy → For multi-class classification (e.g., ImageNet)

Example:

from sklearn.metrics import roc_curve, auc

# ROC analysis needs continuous scores (e.g., predicted probabilities for the
# positive class) and binary ground-truth labels, not hard 0/1 predictions
fpr, tpr, _ = roc_curve(y_true_binary, y_scores)
roc_auc = auc(fpr, tpr)
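Top-k accuracy counts a prediction as correct when the true label appears among the k highest-scoring classes. A plain-Python sketch with made-up score vectors:

```python
def top_k_accuracy(scores, labels, k=2):
    """scores: one list of per-class scores per sample; labels: true class indices."""
    hits = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [2, 0, 1]
print(top_k_accuracy(scores, labels, k=2))  # 2 of the 3 true labels land in the top 2
```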

Conclusion of Section 14

Model evaluation is where training meets reality — it’s how you confirm that your neural network truly understands patterns rather than memorizing data.

You now know:

  • How to compute precision, recall, and F1-score

  • How to generate and interpret confusion matrices

  • How to visualize and export evaluation metrics for real-world reporting


🚀 Section 15: Model Saving, Loading, and Deployment in PyTorch

You’ve trained, tuned, and evaluated your PyTorch model — now it’s time to make it useful in the real world.
Model deployment bridges the gap between training and inference — turning your research code into a production-ready AI application.

In this section, we’ll explore how to:
✅ Save and load PyTorch models safely
✅ Perform inference on new (unseen) data
✅ Export models for deployment (TorchScript / ONNX)
✅ Integrate with APIs and real-world applications


🧠 1️⃣ Why Model Saving and Loading Matters

Training a model can take hours or even days, depending on data and architecture.
If you don’t save your model, you’ll lose all your learned weights!

By saving the model:

  • You can pause and resume training anytime.

  • You can share the model with other developers.

  • You can deploy it for inference on different platforms.


💾 2️⃣ Saving a PyTorch Model

There are two main methods to save models in PyTorch:

A) Save Only Model Weights (Recommended)

This is the most common and flexible approach.

import torch

# Save model weights
torch.save(model.state_dict(), "mnist_model_weights.pth")
print("Model weights saved successfully!")

This saves all the parameters (weights and biases) learned during training — not the model class itself.


B) Save the Entire Model

You can also save both the model structure and weights together (not recommended for production due to serialization issues).

torch.save(model, "mnist_full_model.pth")

However, this approach makes it harder to reload the model across PyTorch versions or systems.


🔄 3️⃣ Loading the Model

To reuse a model for inference or further training, you load it as follows:

If you saved weights only:

model = Net()  # recreate the model architecture
# map_location lets a CPU-only machine load weights that were saved on a GPU
state_dict = torch.load("mnist_model_weights.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to evaluation mode
print("Model loaded successfully!")

If you saved the full model:

model = torch.load("mnist_full_model.pth")
model.eval()

The .eval() method is crucial — it disables dropout and switches batch normalization to its stored running statistics, ensuring consistent predictions during inference.
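The effect is easy to verify with a standalone Dropout layer: in training mode it zeroes random entries (and rescales the survivors), while in eval mode it passes the input through unchanged:

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

dropout.train()  # training mode: entries are randomly zeroed, survivors scaled by 1/(1-p)
print(dropout(x))

dropout.eval()   # evaluation mode: dropout becomes a no-op
print(torch.equal(dropout(x), x))  # True
```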



🔍 4️⃣ Making Predictions (Inference Mode)

Once your model is loaded, you can use it to make predictions on new data:

import torch
from torchvision import transforms
from PIL import Image

# Load image and preprocess
image = Image.open("digit_sample.png").convert("L")
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

image = transform(image).unsqueeze(0)  # add batch dimension

# Predict
model.eval()
with torch.no_grad():
    output = model(image)
    predicted_label = output.argmax(1).item()

print(f"Predicted Digit: {predicted_label}")

Output Example:

Predicted Digit: 7

⚙️ 5️⃣ Deploying PyTorch Models Efficiently

When deploying in production, performance and portability matter.
Two key export formats make deployment easier:


🔹 A) TorchScript — Deployment within the PyTorch Ecosystem

TorchScript allows you to convert PyTorch models into a format that runs without Python, enabling deployment on mobile or embedded systems.

# Convert to TorchScript
traced_model = torch.jit.trace(model, torch.randn(1, 1, 28, 28))
traced_model.save("mnist_traced_model.pt")
print("Model exported to TorchScript format!")

To load and run:

loaded_model = torch.jit.load("mnist_traced_model.pt")
output = loaded_model(torch.randn(1, 1, 28, 28))

Benefits:

  • Faster inference

  • Runs in C++ environments

  • No need for Python runtime


🔹 B) ONNX — Open Neural Network Exchange Format

ONNX is an open standard for AI models, compatible with multiple frameworks like TensorFlow, Caffe2, or even edge devices.

Export to ONNX:

dummy_input = torch.randn(1, 1, 28, 28)
torch.onnx.export(model, dummy_input, "mnist_model.onnx", 
                  input_names=['input'], output_names=['output'])
print("Model exported to ONNX format!")

You can then load the ONNX model using frameworks like ONNX Runtime, TensorRT, or deploy it on Azure, AWS, or NVIDIA Jetson.


🧩 6️⃣ Real-World Deployment Example — Flask API

You can deploy your trained PyTorch model as a REST API using Flask.

Example:

from flask import Flask, request, jsonify
import torch

app = Flask(__name__)
model = Net()
model.load_state_dict(torch.load("mnist_model_weights.pth"))
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json['image']  # expects a 28x28 nested list of pixel values
    tensor = torch.tensor(data).float().unsqueeze(0).unsqueeze(0)  # shape (1, 1, 28, 28)
    with torch.no_grad():
        output = model(tensor)
        pred = output.argmax(1).item()
    return jsonify({'prediction': pred})

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5000)  # bind all interfaces so the API also works inside Docker

You can now send a POST request with image data, and the API returns a predicted label in JSON.


🧠 7️⃣ Model Deployment Options

Depending on your needs, here are popular deployment paths:

Platform | Description | Example
Local Inference | Run directly on CPU/GPU locally | Ideal for testing
Flask/FastAPI | Lightweight REST API | Quick deployment
Docker | Containerized deployment | For production
ONNX Runtime | Fast cross-platform inference | Cloud or edge devices
TorchServe | Official PyTorch model server | Enterprise-grade deployment

🧮 8️⃣ Example: Dockerizing Your PyTorch Model

Docker enables your model to run consistently across environments.

Dockerfile Example:

FROM pytorch/pytorch:latest
COPY mnist_model_weights.pth /app/
COPY app.py /app/
WORKDIR /app
RUN pip install flask torchvision
CMD ["python", "app.py"]

Then build and run:

docker build -t mnist-api .
docker run -p 5000:5000 mnist-api

Your model is now live inside a containerized Flask API!



💡 9️⃣ Model Versioning and Monitoring

In real-world ML systems:

  • You’ll retrain models frequently as new data arrives.

  • You’ll track metrics like accuracy drift or prediction latency.

Use tools like:

  • Weights & Biases (wandb) – Track experiments and versions.

  • MLflow – Manage model lifecycle (training → serving).

  • TorchServe + Prometheus – Monitor inference performance.


Conclusion of Section 15

You’ve now learned how to:

  • Save and load models efficiently

  • Perform inference on new data

  • Export to TorchScript or ONNX for deployment

  • Serve your model using Flask and Docker

With this, your PyTorch model transitions from an academic project to a production-ready AI service.


🚀 Section 16: Transfer Learning with Pretrained Models in PyTorch

Training deep neural networks from scratch requires huge datasets and extensive computational resources.
However, in most real-world scenarios, we can leverage existing models that have already learned useful features from massive datasets (like ImageNet).

This is where Transfer Learning comes in — a game-changing technique that helps you build accurate models quickly and efficiently.


🧠 1️⃣ What is Transfer Learning?

Transfer Learning is the process of taking a pre-trained model (already trained on a large dataset) and adapting it to a new, but related task.

Think of it like this:

“Why reinvent the wheel when you can fine-tune it for your own car?”

A pre-trained model already understands features like edges, shapes, and textures, which can be reused for new image classification tasks.


🔹 Key Benefits of Transfer Learning:

Benefit | Description
Faster Training | You start from pre-learned weights, not random initialization.
Better Accuracy | Beneficial when you have limited data.
Reduced Computational Cost | Requires less training time and resources.
Proven Architectures | Leverage powerful models like ResNet, VGG, or MobileNet.

🧩 2️⃣ Types of Transfer Learning

There are two main strategies:

A) Feature Extraction

You freeze all layers of the pretrained model and only train the final classifier layer.

  • Used when your dataset is small.

  • The pretrained model acts as a fixed feature extractor.

B) Fine-Tuning

You unfreeze some layers and retrain them along with the final layer.

  • Used when your dataset is large or similar to the original dataset.

  • The model adjusts its internal features to fit the new task.
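Which regime you are in is easy to check by counting trainable parameters. A sketch with a tiny stand-in network (the layer sizes are illustrative, not a real pretrained backbone):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 4),  # pretend "pretrained" backbone: 8*4 + 4 = 36 parameters
    nn.ReLU(),
    nn.Linear(4, 2),  # new classifier head: 4*2 + 2 = 10 parameters
)

# Feature extraction: freeze the backbone, leave only the head trainable
for param in model[0].parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{trainable} of {total} parameters are trainable")
```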


🧠 3️⃣ Common Pretrained Models in PyTorch

PyTorch provides a wide range of pretrained models through the torchvision.models library, including:

Model | Description
ResNet | Deep residual network with skip connections
VGG | Deep network with large convolutional layers
AlexNet | Classic CNN for image classification
DenseNet | Uses dense connections between layers
MobileNet | Lightweight model for mobile applications
EfficientNet | Scalable model balancing accuracy and efficiency

⚙️ 4️⃣ Example: Transfer Learning with ResNet18

Let’s train a model to classify cats and dogs using a pretrained ResNet18 model.


Step 1: Import Libraries

import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

Step 2: Load Pretrained Model

# Load pretrained ResNet18 (the weights= argument replaces the deprecated pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all layers (Feature Extraction Mode)
for param in model.parameters():
    param.requires_grad = False

# Modify the final layer for 2 output classes (cat/dog)
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)

Step 3: Define Transforms and Load Data

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_data = datasets.ImageFolder('data/train', transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

Step 4: Train the Modified Layer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)

for epoch in range(5):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

Step 5: Save and Test

torch.save(model.state_dict(), "resnet18_cats_dogs.pth")
print("Fine-tuned ResNet18 model saved successfully!")

🧩 5️⃣ Fine-Tuning Selected Layers

If you have more data, you can unfreeze some deeper layers:

# Unfreeze the last residual block (layer4)
for name, param in model.named_parameters():
    if "layer4" in name:
        param.requires_grad = True

Then adjust the optimizer to train both new and unfrozen parameters:

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.0001)

This allows the model to adapt to your dataset while retaining its powerful pretrained knowledge.


🧠 6️⃣ Evaluating Transfer Learning Performance

After training, evaluate the model on the test set:

correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')

Output Example:

Accuracy: 97.85%

🔍 7️⃣ Visualizing Model Predictions

Visualizations make it easier to understand how your model performs.

import matplotlib.pyplot as plt
import numpy as np

def imshow(inp, title=None):
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title:
        plt.title(title)
    plt.show()

inputs, classes = next(iter(test_loader))
with torch.no_grad():
    outputs = model(inputs)
_, preds = torch.max(outputs, 1)

imshow(inputs[0], title=f"Predicted: {preds[0].item()}")

8️⃣ Transfer Learning in NLP

Transfer learning isn’t limited to computer vision — it’s also dominant in Natural Language Processing (NLP).

Popular pretrained models include:

  • BERT (Bidirectional Encoder Representations from Transformers)

  • GPT (Generative Pretrained Transformer)

  • RoBERTa

  • DistilBERT

Using Hugging Face’s Transformers library:

from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

Fine-tuning BERT for text classification follows a process similar to fine-tuning ResNet for images.


๐ŸŒ 9️⃣ Real-World Applications of Transfer Learning

Domain | Application | Pretrained Model
Healthcare | Medical image classification | ResNet, DenseNet
Retail | Product recommendation | EfficientNet
Finance | Fraud detection | TabNet
NLP | Sentiment analysis | BERT, RoBERTa
Autonomous Vehicles | Object detection | YOLO, Faster R-CNN

🧠 🔟 Tips for Effective Transfer Learning

✅ Choose a model pretrained on a similar domain
✅ Use smaller learning rates when fine-tuning
✅ Normalize images properly to match pretrained model stats
✅ Freeze early layers when your dataset is small
✅ Always evaluate before and after fine-tuning to measure improvement


Conclusion of Section 16

You’ve learned how to:

  • Reuse pretrained models like ResNet18

  • Apply feature extraction and fine-tuning strategies

  • Perform training and evaluation efficiently

  • Extend transfer learning to NLP models like BERT

  • Understand real-world deployment scenarios

Transfer learning allows you to achieve state-of-the-art performance without massive datasets or compute power.
