Building Neural Networks with torch.nn, CNN, RNN, Model Optimization and Deployment in PyTorch

PyTorch - Part II

Contents

5. Building Neural Networks with torch.nn 

6. Optimization, Loss Functions, and Regularization 

7. Convolutional Neural Networks (CNNs)

8. Recurrent Neural Networks (RNNs) — Deep Learning for Sequential Data

9. Model Optimization and Deployment in PyTorch


🧱 Section 5: Building Neural Networks with torch.nn

Now that you understand tensors, autograd, and computational graphs, it’s time to bring everything together and build neural networks efficiently using PyTorch’s torch.nn module.

This section will show you how to define models, use layers and activation functions, and perform forward and backward propagation automatically.


🧠 5.1 Introduction to torch.nn

torch.nn is a high-level abstraction built on top of tensors and autograd.
It helps you define neural network layers, activation functions, and loss functions easily.

You no longer need to manually track weights, biases, or gradient updates — PyTorch’s nn.Module does it all for you.


⚙️ 5.2 Anatomy of a Neural Network

A neural network consists of:

  • Input Layer: Accepts data (features)

  • Hidden Layers: Perform transformations using learned weights and activations

  • Output Layer: Produces predictions

Each layer performs a linear transformation followed by a non-linear activation.

Mathematically:
[
y = f(Wx + b)
]
where:

  • ( W ): weights

  • ( b ): bias

  • ( f ): activation function (e.g., ReLU, Sigmoid)
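
As a concrete illustration, here is a minimal sketch of one such layer computed by hand with plain tensors (the sizes 3 and 2 are arbitrary choices for this example):

import torch

x = torch.tensor([1.0, 2.0, 3.0])   # input features
W = torch.randn(2, 3)               # weights: 2 outputs, 3 inputs
b = torch.randn(2)                  # one bias per output
y = torch.relu(W @ x + b)           # linear transformation followed by ReLU
print(y)                            # two non-negative values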


🧩 5.3 The nn.Module Class

In PyTorch, every model inherits from the base class torch.nn.Module.

Structure of a custom model:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # Define layers
        self.linear1 = nn.Linear(3, 4)
        self.linear2 = nn.Linear(4, 1)
    
    def forward(self, x):
        # Define forward pass
        x = torch.relu(self.linear1(x))
        x = self.linear2(x)
        return x

When you create an instance of MyModel, all parameters (weights and biases) are automatically registered and tracked.


🔢 5.4 Understanding nn.Linear

nn.Linear(in_features, out_features)
Performs a linear transformation:
[
y = xW^T + b
]

Example:

layer = nn.Linear(3, 2)  # input size 3, output size 2
x = torch.tensor([[1.0, 2.0, 3.0]])
output = layer(x)
print(output)

Output (random weights):

tensor([[0.4231, -0.5713]], grad_fn=<AddmmBackward>)

Each layer internally stores:

  • weight: shape (out_features, in_features)

  • bias: shape (out_features,)

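
You can verify these shapes directly on the layer defined above:

print(layer.weight.shape)  # torch.Size([2, 3])  -> (out_features, in_features)
print(layer.bias.shape)    # torch.Size([2])     -> (out_features,)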

⚡ 5.5 Activation Functions

Activation functions introduce non-linearity — allowing neural networks to model complex patterns.

Common Activation Functions in PyTorch:

| Activation | Function | PyTorch Equivalent |
|---|---|---|
| ReLU | ( f(x) = \max(0, x) ) | nn.ReLU() |
| Sigmoid | ( f(x) = 1 / (1 + e^{-x}) ) | nn.Sigmoid() |
| Tanh | ( f(x) = \tanh(x) ) | nn.Tanh() |
| LeakyReLU | ( f(x) = \max(0.01x, x) ) | nn.LeakyReLU() |
| Softmax | Converts logits to probabilities | nn.Softmax(dim=1) |

Example:

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu = nn.ReLU()
print(relu(x))

Output:

tensor([0., 0., 0., 1., 2.])

🧮 5.6 Building a Simple Feedforward Neural Network

Let’s create a neural network for a simple regression task.

Step 1: Define the Model

import torch
import torch.nn as nn
import torch.optim as optim

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

Step 2: Initialize Model, Loss, and Optimizer

model = NeuralNet(1, 8, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

Step 3: Training Loop

# Sample data: y = 2x + 1
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

for epoch in range(1000):
    outputs = model(X)
    loss = criterion(outputs, Y)
    
    optimizer.zero_grad()  # Clear old gradients
    loss.backward()        # Compute new gradients
    optimizer.step()       # Update weights
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/1000], Loss: {loss.item():.4f}')

Output:

Epoch [1000/1000], Loss: 0.0001

✅ The model successfully learns the linear relationship ( y = 2x + 1 ).


🔍 5.7 Exploring Model Parameters

You can inspect and print model parameters easily:

for name, param in model.named_parameters():
    print(name, param.data)

Example output (abridged; the exact values depend on the random initialization):

fc1.weight tensor([[ 0.8451], [-0.3190], ..., [ 0.1127]])   # shape (8, 1)
fc1.bias tensor([ 0.9945, -0.1203, ..., 0.4411])            # shape (8,)
fc2.weight tensor([[ 1.9912, 0.0341, ..., -0.2215]])        # shape (1, 8)
fc2.bias tensor([1.0023])                                   # shape (1,)

⚙️ 5.8 Saving and Loading Models

Training can be time-consuming — PyTorch allows you to save and reload models effortlessly.

Save model:

torch.save(model.state_dict(), 'model.pth')

Load model:

model = NeuralNet(1, 8, 1)
model.load_state_dict(torch.load('model.pth'))
model.eval()

📉 5.9 Visualizing Training Progress (Optional)

You can visualize the loss curve to monitor convergence.

import matplotlib.pyplot as plt

losses = []

for epoch in range(300):
    outputs = model(X)
    loss = criterion(outputs, Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss Curve')
plt.show()

📊 The loss decreases steadily, confirming that the model is learning effectively.


🧠 5.10 Using Predefined Layers and Sequential API

For simpler models, PyTorch offers nn.Sequential — a compact way to stack layers.

model = nn.Sequential(
    nn.Linear(1, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)

This is functionally equivalent to defining a custom nn.Module, but more concise.


🧩 5.11 Adding Batch Normalization and Dropout

To improve performance and reduce overfitting, include:

  • Batch Normalization: Normalizes layer inputs.

  • Dropout: Randomly disables neurons during training.

Example:

model = nn.Sequential(
    nn.Linear(1, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(16, 1)
)
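
Both layers behave differently during training and inference, so switch modes explicitly. A minimal sketch using the model defined above:

model.train()   # Dropout is active, BatchNorm uses batch statistics
y_train = model(torch.randn(4, 1))

model.eval()    # Dropout is disabled, BatchNorm uses running statistics
with torch.no_grad():
    y_eval = model(torch.randn(4, 1))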

🚀 5.12 Summary of Section 5

You’ve learned how to:

  • Define neural networks with torch.nn.Module

  • Use layers, activations, and optimizers

  • Train models using automatic differentiation

  • Save, load, and visualize your models

This section establishes the core workflow of deep learning in PyTorch.


⚙️ Section 6: Training Deep Neural Networks — Optimization, Loss Functions, and Regularization


🎯 6.1 What Happens During Training?

Training a neural network involves adjusting weights so that the model’s predictions get closer to the actual values.

At a high level, each training iteration consists of these steps:

  1. Forward Pass: Feed inputs through the model to generate predictions.

  2. Loss Calculation: Measure how far predictions are from actual labels.

  3. Backward Pass: Use backpropagation to compute gradients.

  4. Optimization Step: Update weights using an optimizer (like SGD or Adam).

This process repeats for many epochs until the loss converges.


🧮 6.2 The Mathematics Behind Learning

Let’s define:

  • ( x ): input

  • ( y ): true label

  • ( \hat{y} ): predicted output

  • ( L(y, \hat{y}) ): loss function

Each iteration:
[
w_{new} = w_{old} - \eta \cdot \frac{\partial L}{\partial w}
]

Where:

  • ( \eta ): learning rate

  • ( \frac{\partial L}{\partial w} ): gradient of the loss with respect to weight

This is gradient descent — the core mechanism of learning in neural networks.
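
Here is a minimal sketch of this update rule applied by hand to a single weight, with autograd supplying the gradient (the toy data point and learning rate are arbitrary choices):

import torch

w = torch.tensor(0.0, requires_grad=True)    # single weight
x, y = torch.tensor(2.0), torch.tensor(4.0)  # toy data point on the line y = 2x
lr = 0.1                                     # learning rate (eta)

for _ in range(50):
    loss = (w * x - y) ** 2                  # squared error
    loss.backward()                          # compute dL/dw
    with torch.no_grad():
        w -= lr * w.grad                     # w_new = w_old - eta * gradient
        w.grad.zero_()                       # reset the gradient for the next step

print(w.item())  # approaches 2.0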


📉 6.3 Understanding Loss Functions

A loss function quantifies how far the model’s predictions are from actual values.

Common Loss Functions in PyTorch:

| Problem Type | Loss Function | PyTorch Class | Description |
|---|---|---|---|
| Regression | Mean Squared Error | nn.MSELoss() | Penalizes squared differences |
| Regression | Mean Absolute Error | nn.L1Loss() | Penalizes absolute differences |
| Binary Classification | Binary Cross Entropy | nn.BCELoss() | Measures binary prediction error |
| Multi-Class Classification | Cross Entropy | nn.CrossEntropyLoss() | For multi-class outputs |
| Probabilistic Models | KL Divergence | nn.KLDivLoss() | Compares two distributions |

Example — Using MSE Loss:

criterion = nn.MSELoss()
y_pred = torch.tensor([2.5, 0.8, 1.4])
y_true = torch.tensor([3.0, 1.0, 1.3])
loss = criterion(y_pred, y_true)
print(loss)

Output:

tensor(0.1000)
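
For classification, nn.CrossEntropyLoss expects raw logits and integer class labels. A minimal sketch with made-up logits for three classes:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1],   # raw, unnormalized scores per class
                       [0.2, 1.5, 0.3]])
labels = torch.tensor([0, 1])             # correct class indices
loss = criterion(logits, labels)
print(loss)                               # scalar cross-entropy value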

⚡ 6.4 Introduction to Optimizers

Optimizers decide how model parameters are updated based on computed gradients.

1️⃣ Stochastic Gradient Descent (SGD)

Updates weights with a constant learning rate:
[
w = w - \eta \cdot \frac{\partial L}{\partial w}
]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

2️⃣ Momentum

Adds inertia to updates to escape local minima:
[
v = \beta v - \eta \nabla_w L
]
[
w = w + v
]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

3️⃣ Adam (Adaptive Moment Estimation)

The most popular optimizer — adaptive learning rates for each parameter.

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Adam combines the benefits of both Momentum and RMSProp, providing faster convergence and better stability.


🧠 6.5 Complete Training Loop Example

Let’s implement a training loop that combines forward pass, backward pass, and optimization.

import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.ReLU(),
    nn.Linear(10, 1)
)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Data
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

# Training
for epoch in range(500):
    # Forward pass
    y_pred = model(X)
    loss = criterion(y_pred, Y)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch {epoch+1}/500, Loss = {loss.item():.6f}')

Output:

Epoch 500/500, Loss = 0.000045

✅ The model has learned the underlying relationship successfully.


🔍 6.6 Monitoring Model Performance

You can track loss over epochs to visualize learning progress.

import matplotlib.pyplot as plt

losses = []
for epoch in range(200):
    y_pred = model(X)
    loss = criterion(y_pred, Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss over Time')
plt.show()

📊 A steadily decreasing loss indicates proper learning.


🧩 6.7 Learning Rate — The Most Critical Hyperparameter

The learning rate (η) determines how fast your model learns.

| Learning Rate | Behavior |
|---|---|
| Too high | Model diverges or oscillates |
| Too low | Training becomes painfully slow |
| Just right | Smooth, steady convergence |

You can experiment or use learning rate schedulers to adjust automatically:

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
for epoch in range(200):
    ...
    scheduler.step()

🧮 6.8 Regularization: Avoiding Overfitting

Overfitting occurs when your model memorizes training data instead of learning general patterns.
Regularization helps control model complexity.

🧱 1. L2 Regularization (Weight Decay)

Adds penalty for large weights.

optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.001)

Mathematically:
[
L_{total} = L + \lambda \sum w^2
]


☁️ 2. Dropout

Randomly “drops” neurons during training to improve generalization.

model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 1)
)

⚖️ 3. Early Stopping

Stops training when validation loss stops improving — prevents overfitting.

Conceptually (best_val_loss and the patience counters are initialized before the loop):

best_val_loss = float('inf')
patience, patience_counter = 10, 0

for epoch in range(num_epochs):
    # ... training step, then compute val_loss on a held-out validation set ...
    if val_loss > best_val_loss:
        patience_counter += 1
        if patience_counter > patience:
            print("Early stopping triggered")
            break
    else:
        best_val_loss = val_loss
        patience_counter = 0

🧠 6.9 Advanced Optimization Techniques

| Technique | Description |
|---|---|
| Batch Normalization | Normalizes activations to stabilize training |
| Gradient Clipping | Prevents exploding gradients |
| Learning Rate Warm-up | Gradually increases the learning rate at the start of training |
| Adaptive Gradient Clipping (AGC) | Scales gradients relative to weights |
| Cosine Annealing Scheduler | Smooth cyclic learning rate decay |

Example — Gradient Clipping:

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
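
The cosine annealing scheduler listed above is also available out of the box; a minimal sketch of wiring it into a loop (T_max, the length of one cosine cycle in epochs, is an arbitrary choice here):

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
for epoch in range(100):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # decay the learning rate along a cosine curve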

📦 6.10 Putting It All Together — Modular Training Function

Here’s a clean reusable function that trains any model:

def train_model(model, criterion, optimizer, X, Y, epochs=300):
    losses = []
    for epoch in range(epochs):
        model.train()
        y_pred = model(X)
        loss = criterion(y_pred, Y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

Usage:

losses = train_model(model, criterion, optimizer, X, Y)
plt.plot(losses)

✅ 6.11 Summary of Section 6

By now, you understand:

  • The mathematics behind training (gradient descent)

  • The purpose of loss functions

  • How to choose optimizers (SGD, Adam, RMSProp)

  • Methods to regularize models and prevent overfitting

  • How to visualize and fine-tune training performance

This section bridges the gap between building a model and mastering the training process.


🧠 Section 7: Convolutional Neural Networks (CNNs) with PyTorch

Convolutional Neural Networks (CNNs) are the foundation of modern computer vision. They are used in applications like image recognition, object detection, facial recognition, and even medical imaging.

In this section, you’ll learn:

  • The intuition and mathematics behind CNNs

  • Core building blocks (convolution, pooling, activation, fully connected layers)

  • Implementing a CNN using PyTorch

  • Training a CNN on real image data (CIFAR-10 or MNIST)

  • Using pre-trained CNNs like ResNet for transfer learning


🧩 7.1. What is a CNN?

A Convolutional Neural Network (CNN) is a special type of neural network designed to process data with grid-like topology, such as images (2D grids of pixels).

Instead of connecting every neuron to every pixel (as in dense networks), CNNs use convolutional filters that slide over the image to detect local patterns like:

  • Edges

  • Corners

  • Textures

  • Complex features (like eyes, faces, or objects)

🧠 Analogy:
Think of a CNN filter as a “pattern detector” that scans an image — much like how our brain identifies shapes and edges.


🔢 7.2. Key Components of a CNN

1️⃣ Convolution Layer

Performs a mathematical operation that multiplies and sums pixel values with a small filter (kernel).

Mathematically:
[
O(i,j) = \sum_m \sum_n I(i+m, j+n) \times K(m, n)
]
Where:

  • ( I ) = input image

  • ( K ) = kernel (filter)

  • ( O ) = output feature map

In PyTorch:

torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
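
A quick shape check clarifies what this layer produces (a batch of one 3-channel 32×32 image, matching CIFAR-10):

import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
print(conv(x).shape)            # torch.Size([1, 16, 32, 32]) -- padding=1 keeps H and W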

2️⃣ Activation Layer (ReLU)

Applies non-linearity to make the network capable of learning complex patterns.

[
f(x) = \max(0, x)
]

3️⃣ Pooling Layer

Reduces spatial size (height × width) while keeping the important features.

torch.nn.MaxPool2d(kernel_size=2, stride=2)

4️⃣ Fully Connected Layer

Connects flattened features to output classes for classification.

5️⃣ Softmax

Converts final scores to probabilities.
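
Putting the last two pieces together, a minimal sketch of flattening feature maps into a fully connected layer and converting its scores to probabilities (the 16×16×16 feature-map size is just an illustrative assumption):

import torch
import torch.nn as nn

features = torch.randn(1, 16, 16, 16)   # (batch, channels, H, W) from earlier layers
flat = features.view(1, -1)             # flatten to (batch, 16*16*16)
fc = nn.Linear(16 * 16 * 16, 10)        # fully connected layer -> 10 class scores
probs = torch.softmax(fc(flat), dim=1)  # softmax turns the scores into probabilities
print(probs.sum())                      # ~1.0 -- probabilities sum to one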


🧱 7.3. Visual Intuition

🧩 CNN Layers Flow:

Image → Convolution → ReLU → Pooling → Flatten → Dense → Softmax

| Layer | Function | Output Example |
|---|---|---|
| Conv2D | Extracts features | 32×32×16 |
| MaxPool | Downsamples | 16×16×16 |
| Conv2D | Deeper features | 16×16×32 |
| Flatten + Dense | Classification | 10 classes (e.g., digits) |

🧪 7.4. Implementing a CNN from Scratch in PyTorch

Let’s build a CNN to classify CIFAR-10 images — a dataset of 60,000 color images across 10 classes (cat, dog, airplane, etc.).

Step 1️⃣: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

Step 2️⃣: Load Dataset

# Transform: convert to tensor and normalize each RGB channel
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

Step 3️⃣: Define CNN Architecture

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleCNN()
print(model)

✅ Output:

SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), padding=(1, 1))
  (fc1): Linear(in_features=2048, out_features=128)
  (fc2): Linear(in_features=128, out_features=10)
)

Step 4️⃣: Define Loss and Optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Step 5️⃣: Train the Model

for epoch in range(5):  # loop over dataset multiple times
    running_loss = 0.0
    for images, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")

Step 6️⃣: Evaluate Accuracy

correct, total = 0, 0
with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")

✅ Typical result (accuracy varies with initialization and number of epochs):

Test Accuracy: ~70–75%

🧠 7.5. Visualizing Feature Maps

To understand what CNNs “see”, visualize intermediate activations.

import matplotlib.pyplot as plt

def visualize_features(model, image):
    layer = model.conv1
    with torch.no_grad():
        features = layer(image.unsqueeze(0))
    fig, axes = plt.subplots(1, 6, figsize=(12, 4))
    for i in range(6):
        axes[i].imshow(features[0][i].detach().numpy(), cmap='gray')
        axes[i].axis('off')
    plt.show()

🖼️ This helps explain how early CNN layers detect edges and deeper layers detect patterns like eyes, fur, or wheels.


🔁 7.6. Transfer Learning with Pretrained CNNs

Instead of training from scratch, you can use pretrained CNNs like ResNet, VGG, or MobileNet.

from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # Freeze base layers

# Replace final layer for custom classification
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)

Advantages:

  • Faster training

  • Better accuracy

  • Works even with smaller datasets


📊 7.7. Common CNN Architectures

| Architecture | Year | Key Innovation |
|---|---|---|
| LeNet-5 | 1998 | First CNN for handwritten digits |
| AlexNet | 2012 | Deep CNN that won ImageNet |
| VGGNet | 2014 | Uniform 3×3 filters |
| ResNet | 2015 | Skip connections to fight vanishing gradients |
| EfficientNet | 2019 | Parameter-efficient scaling |

Each evolution made CNNs deeper, faster, and more accurate.


🌍 7.8. Real-World Use Cases of CNNs

| Industry | Application | Description |
|---|---|---|
| 🏥 Healthcare | Tumor Detection | Identify cancerous cells in MRI scans |
| 🚗 Automotive | Self-Driving Cars | Detect pedestrians and traffic signs |
| 📱 Mobile | Face Recognition | Unlock devices using CNN-based models |
| 🛒 E-commerce | Visual Search | Suggest similar products from images |
| 🎥 Media | Video Analytics | Detect scenes, objects, or logos in videos |

🧾 7.9. Key Takeaways

  • CNNs learn spatial hierarchies automatically — from pixels to patterns.

  • Layers like convolution, pooling, and ReLU are the backbone of vision models.

  • Transfer learning saves time and improves accuracy on limited data.

  • Tools like TorchVision make image preprocessing and model loading easy.

  • Real-world applications range from healthcare to autonomous driving.


🔁 Section 8: Recurrent Neural Networks (RNNs) — Deep Learning for Sequential Data


🧠 8.1 What Are RNNs?

A Recurrent Neural Network (RNN) is a special type of neural network designed to process sequences of data, where each input depends on previous ones.

Unlike feedforward networks that treat all inputs independently, RNNs retain a hidden state that captures information from previous steps.

Intuitive Example:

Think of predicting the next word in a sentence:

“I am going to the ___.”

The prediction “market” depends on earlier words — that’s sequence awareness, which RNNs excel at.


🧮 8.2 How RNNs Work (Mathematical Intuition)

For a sequence ( x_1, x_2, ..., x_t ):

At each time step:
[
h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_t + b_h)
]
[
y_t = W_y \cdot h_t + b_y
]

Where:

  • ( h_t ): hidden state (memory)

  • ( x_t ): input at time step ( t )

  • ( y_t ): output

  • ( f ): activation (usually tanh or ReLU)

So the hidden state is recursively updated, carrying information from past inputs — this gives RNNs their “memory”.
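
A minimal sketch of one such update written out with plain tensors (the sizes are arbitrary illustrative choices):

import torch

hidden_size, input_size = 4, 3
W_h = torch.randn(hidden_size, hidden_size)
W_x = torch.randn(hidden_size, input_size)
b_h = torch.randn(hidden_size)

h_prev = torch.zeros(hidden_size)                   # h_{t-1}, initially zero
x_t = torch.randn(input_size)                       # input at time step t
h_t = torch.tanh(W_h @ h_prev + W_x @ x_t + b_h)    # recurrent update
print(h_t.shape)                                    # torch.Size([4])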


🧩 8.3 RNNs vs Feedforward Networks

| Feature | Feedforward NN | RNN |
|---|---|---|
| Input type | Independent samples | Sequential data |
| Memory | No memory | Retains hidden states |
| Weight sharing | Different weights per input | Same weights across time steps |
| Use case | Images, tabular data | Text, audio, time series |

🧱 8.4 Implementing a Simple RNN from Scratch

Let’s build a small RNN to predict the next number in a sequence.

Step 1: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim

Step 2: Create Data

We’ll use a simple sequence ( [0, 1, 2, 3, 4, 5, ...] ) and try to predict the next number.

seq = torch.arange(0, 10, dtype=torch.float32)
X = seq[:-1].unsqueeze(1)  # inputs
Y = seq[1:].unsqueeze(1)   # targets

Step 3: Define RNN Model

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x, hidden):
        out, hidden = self.rnn(x, hidden)
        out = self.fc(out)
        return out, hidden

Step 4: Initialize

model = SimpleRNN(input_size=1, hidden_size=10, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Step 5: Train Model

X = X.unsqueeze(0)  # batch dimension
Y = Y.unsqueeze(0)

hidden = None
for epoch in range(300):
    output, hidden = model(X, hidden)
    loss = criterion(output, Y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    hidden = hidden.detach()  # detach so gradients don't backpropagate through earlier epochs

    if (epoch + 1) % 50 == 0:
        print(f'Epoch [{epoch+1}/300], Loss: {loss.item():.6f}')

Output:

Epoch [300/300], Loss: 0.000012

✅ The model learns to predict the next number in the sequence!


🧮 8.5 Using nn.RNN Directly (Built-in Simplicity)

PyTorch makes it simple to create RNNs with its built-in layer:

rnn = nn.RNN(input_size=5, hidden_size=10, num_layers=2, batch_first=True)

Parameters:

  • input_size: features per timestep

  • hidden_size: hidden layer dimension

  • num_layers: stack multiple RNN layers

  • batch_first=True: input shape as (batch, seq, features)
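
A quick shape check shows what the layer returns (a batch of 2 sequences of length 7, matching the parameters above):

x = torch.randn(2, 7, 5)   # (batch, seq_len, input_size)
output, hidden = rnn(x)
print(output.shape)        # torch.Size([2, 7, 10]) -- hidden state at every time step
print(hidden.shape)        # torch.Size([2, 2, 10]) -- final state for each of the 2 layers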


🧩 8.6 Limitations of Vanilla RNNs

Despite their simplicity, RNNs struggle with long sequences due to:

  • Vanishing gradients (earlier information fades)

  • Exploding gradients (gradients become too large)

  • Limited long-term memory

To overcome this, we use LSTM and GRU architectures.


⚙️ 8.7 Long Short-Term Memory (LSTM)

LSTMs (Long Short-Term Memory networks) were designed to remember information over longer time intervals.

They introduce gates that control information flow:

  • Forget Gate: Decides what to discard

  • Input Gate: Decides what to update

  • Output Gate: Decides what to output

Equations:
[
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
]
[
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
]
[
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
]
[
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)
]
[
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
]
[
h_t = o_t * \tanh(C_t)
]


💻 8.8 Implementing an LSTM in PyTorch

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out)
        return out

Usage:

model = LSTMModel(1, 32, 1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

The LSTM retains memory over longer sequences and performs better than vanilla RNNs in most real-world problems.


🔁 8.9 GRU: Gated Recurrent Unit

GRUs are a simplified version of LSTMs that merge the forget and input gates.
They’re faster to train but perform comparably.

gru = nn.GRU(input_size=1, hidden_size=32, batch_first=True)

GRUs are widely used for speech, time-series forecasting, and chatbots where long-term dependencies are moderate.


🌍 8.10 Real-World Use Cases of RNNs

| Domain | Application | Model Type |
|---|---|---|
| 📊 Finance | Stock price prediction | LSTM |
| 🗣️ NLP | Text generation, translation | GRU/LSTM |
| 🎶 Audio | Speech recognition | LSTM |
| 🕒 IoT | Sensor data forecasting | GRU |
| 💬 Chatbots | Context understanding | LSTM/Transformer hybrid |

💬 8.11 Example: Character-Level Text Generation

Let’s create a mini text generator that learns sequences of characters.

import torch.nn.functional as F

chars = list("hello")
char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for ch, i in char2idx.items()}

seq = torch.tensor([[char2idx['h'], char2idx['e'], char2idx['l'], char2idx['l']]])
target = torch.tensor([[char2idx['e'], char2idx['l'], char2idx['l'], char2idx['o']]])

model = nn.Sequential(
    nn.Embedding(len(chars), 8),
    nn.RNN(8, 16, batch_first=True),
    nn.Linear(16, len(chars))
)

The model can then learn to predict “e”, “l”, “l”, “o” given “h”, “e”, “l”, “l”.
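
A minimal training sketch for this toy model (200 epochs and lr=0.01 are arbitrary choices):

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    logits = model(seq)                                   # shape (1, 4, vocab_size)
    loss = criterion(logits.view(-1, len(chars)), target.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

pred = model(seq).argmax(dim=2)                           # most likely character at each position
print(''.join(idx2char[i.item()] for i in pred[0]))       # should converge to "ello"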


🧮 8.12 Visualizing Hidden States

You can visualize how the hidden state evolves over the sequence (using the SimpleRNN model and the X tensor from Section 8.4):

import matplotlib.pyplot as plt

hidden_states = []
hidden = None
for i in range(len(X[0])):
    out, hidden = model(X[:, i:i+1], hidden)
    hidden_states.append(hidden[0].detach().numpy().flatten())

plt.plot(hidden_states)
plt.title("Hidden State Evolution Over Time")
plt.show()

This shows how memory updates through the sequence.


🧠 8.13 Tips for Training RNNs

  • Normalize or scale sequential data

  • Use gradient clipping to prevent exploding gradients

  • Initialize hidden states properly (hidden = torch.zeros(...)); see the combined sketch after this list

  • Use LSTMs/GRUs for complex tasks

  • Experiment with sequence lengths (shorter sequences → faster training)
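
A minimal sketch combining two of these tips (explicit hidden-state initialization and gradient clipping), assuming the SimpleRNN setup (model, X, Y, criterion, optimizer) from Section 8.4:

hidden = torch.zeros(1, 1, 10)   # (num_layers, batch, hidden_size)
for epoch in range(100):
    output, hidden = model(X, hidden)
    loss = criterion(output, Y)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip before the update
    optimizer.step()
    hidden = hidden.detach()     # stop gradients from flowing into earlier epochs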


✅ 8.14 Summary of Section 8

You now understand:

  • What makes RNNs special for sequential data

  • How LSTMs and GRUs solve memory challenges

  • How to build, train, and visualize RNNs in PyTorch

  • Real-world use cases in text, audio, and forecasting


🚀 Up Next:

We’ll continue with Section 9: Model Optimization and Deployment in PyTorch, where we’ll cover:

  • Optimization techniques: quantization, pruning, and knowledge distillation

  • Exporting models with TorchScript and ONNX

  • Serving models with TorchServe and PyTorch Mobile

  • Cloud deployment and post-deployment monitoring


🧩 Section 9: Model Optimization and Deployment in PyTorch

Building and training a model is only part of the deep learning journey. Once you have a working model, the next big steps are:

  • Optimizing it for speed, memory, and accuracy.

  • Deploying it in a scalable way to real-world applications — from cloud servers to mobile devices.

PyTorch provides an excellent ecosystem for both these stages — with tools like TorchScript, ONNX, Quantization, and TorchServe.


⚙️ 9.1. Model Optimization: The Key to Efficiency

Model optimization in PyTorch focuses on reducing computational cost and improving inference speed without sacrificing accuracy.

🚀 Common Optimization Techniques

| Optimization Type | Description | Example Use |
|---|---|---|
| Quantization | Converts model parameters from float32 to int8 to reduce size and speed up inference | Mobile & edge AI |
| Pruning | Removes weights or neurons that contribute little to the output | Compressing large models |
| Knowledge Distillation | Trains a smaller model (student) using the outputs of a large model (teacher) | Efficient model serving |
| Mixed Precision Training | Uses float16 and float32 together for faster GPU training | NVIDIA Ampere GPUs |
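
Mixed precision training, for example, is available through torch.cuda.amp. A minimal sketch, assuming a CUDA device and that model, criterion, optimizer, and dataloader are already set up and moved to the GPU:

scaler = torch.cuda.amp.GradScaler()        # scales the loss to avoid float16 underflow

for data, target in dataloader:
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # forward pass runs in mixed precision
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()           # backward on the scaled loss
    scaler.step(optimizer)                  # unscale gradients, then take the optimizer step
    scaler.update()                         # adjust the scale factor for the next iteration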

🧮 9.2. Quantization Example

Quantization helps deploy models to mobile or embedded devices by reducing memory footprint.

import torch
from torchvision import models

# Load a pretrained model
model = models.resnet18(pretrained=True)
model.eval()

# Static quantization preparation
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# In practice, run a few batches of representative data through the model here
# so the inserted observers can calibrate activation ranges before conversion.
torch.quantization.convert(model, inplace=True)

# Compare the on-disk size before and after by saving the state_dict
torch.save(model.state_dict(), "resnet18_quantized.pth")

Result: You’ll see up to 4× reduction in model size and improved inference time.
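
If you only need to quantize the linear (and recurrent) layers, which is common for NLP models, dynamic quantization is a simpler alternative. A minimal sketch, starting again from a float model:

float_model = models.resnet18(pretrained=True).eval()
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8   # quantize only the Linear layers to int8
)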


🔪 9.3. Model Pruning Example

Model pruning eliminates unnecessary weights.

import torch.nn.utils.prune as prune

# Prune 30% of connections in linear layer
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)

Outcome: Model becomes smaller and faster with minimal accuracy drop.


⚗️ 9.4. Knowledge Distillation (Student–Teacher Learning)

Knowledge Distillation allows training a compact student model by learning from a larger, pre-trained teacher model.

teacher_model = models.resnet50(pretrained=True).eval()   # teacher is frozen during distillation
student_model = models.resnet18()

criterion = torch.nn.KLDivLoss(reduction='batchmean')
optimizer = torch.optim.Adam(student_model.parameters(), lr=0.001)

T = 5  # temperature used to soften both distributions

for data, target in dataloader:
    optimizer.zero_grad()
    with torch.no_grad():
        # Teacher provides soft targets as probabilities
        teacher_probs = torch.nn.functional.softmax(teacher_model(data) / T, dim=1)
    # KLDivLoss expects log-probabilities from the student and probabilities as targets
    student_log_probs = torch.nn.functional.log_softmax(student_model(data) / T, dim=1)
    loss = criterion(student_log_probs, teacher_probs)
    loss.backward()
    optimizer.step()

Real-world Example:
Companies like Google and Meta use distillation to deploy small transformer models on mobile devices (e.g., BERT → TinyBERT).


🚀 9.5. Deployment: From Research to Production

Once optimized, models need to be served efficiently in production environments — APIs, web services, or mobile apps.

PyTorch offers multiple deployment options:

  • TorchScript (Convert model to static graph)

  • ONNX (Export model for cross-framework compatibility)

  • TorchServe (Production-grade model server)

  • PyTorch Mobile (For mobile/edge devices)


🧱 9.5.1. TorchScript

TorchScript converts dynamic PyTorch models into a serialized form that runs without Python — ideal for production environments.

# Convert model to TorchScript
traced_model = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
torch.jit.save(traced_model, "resnet18_traced.pt")

# Load and run TorchScript model
loaded = torch.jit.load("resnet18_traced.pt")
output = loaded(torch.randn(1, 3, 224, 224))

Benefits:

  • Faster inference

  • Portable to C++ runtime

  • No dependency on Python at inference time


🔗 9.5.2. ONNX (Open Neural Network Exchange)

ONNX enables exporting models to other frameworks like TensorFlow, Caffe2, or OpenVINO.

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx", export_params=True)

Benefits:

  • Framework interoperability

  • Deploy on non-PyTorch systems

  • Optimized inference with ONNX Runtime
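
For example, the exported file can be run with ONNX Runtime (a separate onnxruntime package; shown here as a sketch):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})   # list of output arrays
print(outputs[0].shape)                            # (1, 1000) for the ImageNet classification head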


🧰 9.5.3. TorchServe (Model Serving)

TorchServe is an official tool by AWS & Facebook for serving PyTorch models in production.

Steps:

  1. Save model file (.mar)

  2. Launch TorchServe server

  3. Expose REST API for inference

torch-model-archiver --model-name resnet18 \
--version 1.0 --serialized-file resnet18_traced.pt \
--handler image_classifier --export-path model_store

torchserve --start --ncs --model-store model_store --models resnet=resnet18.mar

Benefits:

  • Handles multiple models

  • Supports batching and metrics

  • Scalable for enterprise deployments


📱 9.5.4. PyTorch Mobile

For mobile or edge deployment, convert model using TorchScript and integrate with:

  • Android (Java API)

  • iOS (Swift API)

# Convert to TorchScript
scripted_model = torch.jit.script(model)
scripted_model.save("mobile_model.pt")

Use Case Example:
AI-based camera filters, speech recognition, and on-device pose estimation models.


🌩️ 9.6. Deploying Models to the Cloud

Cloud platforms make large-scale deployment easier:

  • AWS Sagemaker → Native PyTorch support

  • Google Cloud AI Platform → Scalable REST APIs

  • Azure ML → Preconfigured PyTorch containers

Each allows automatic scaling, version control, and CI/CD pipelines.


📊 9.7. Monitoring and Logging

Post-deployment, it’s crucial to track model performance using metrics such as:

  • Latency

  • Accuracy drift

  • Resource utilization

Tools for monitoring:

  • Weights & Biases

  • MLflow

  • Prometheus + Grafana


🧠 9.8. Real-World Example: Image Classification API

Here’s how a simple Flask-based API might use a deployed PyTorch model:

from flask import Flask, request, jsonify
import torch
from torchvision import models, transforms
from PIL import Image

app = Flask(__name__)
model = torch.jit.load("resnet18_traced.pt")
model.eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics expected by ResNet
                         std=[0.229, 0.224, 0.225]),
])

@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['file']
    image = Image.open(file).convert('RGB')  # ensure a 3-channel image
    img_tensor = transform(image).unsqueeze(0)
    with torch.no_grad():
        preds = model(img_tensor)
    predicted = torch.argmax(preds, 1).item()
    return jsonify({"class": int(predicted)})

if __name__ == '__main__':
    app.run(debug=True)

Use Case:
Deploying your PyTorch model as a REST API for an image recognition service.
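
Once the server is running, the endpoint can be exercised from Python (sample.jpg is just an example file name, and requests is an extra dependency):

import requests

response = requests.post("http://127.0.0.1:5000/predict",
                         files={"file": open("sample.jpg", "rb")})
print(response.json())   # e.g. {"class": 207}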


🏁 9.9. Summary

| Concept | Purpose | Tools/Methods |
|---|---|---|
| Quantization | Reduce model size | torch.quantization |
| Pruning | Remove redundant weights | torch.nn.utils.prune |
| Distillation | Compress model via teacher-student learning | KLDivLoss |
| Deployment | Run in production | TorchScript, ONNX, TorchServe |
| Monitoring | Track and analyze performance | MLflow, W&B |

🧠 Final Insight

Model optimization and deployment are where research becomes reality.
With PyTorch, you can go from a prototype on your laptop to a production-grade AI model running in the cloud or on a mobile device — all using the same ecosystem.
