Building Neural Networks with torch.nn, CNN, RNN, Model Optimization and Deployment in PyTorch

PyTorch - Part II

Contents

5. Building Neural Networks with torch.nn 

6. Optimization, Loss Functions, and Regularization 

7. Convolutional Neural Networks (CNNs)

8. Recurrent Neural Networks (RNNs) — Deep Learning for Sequential Data

9. Model Optimization and Deployment in PyTorch


🧱 Section 5: Building Neural Networks with torch.nn

Now that you understand tensors, autograd, and computational graphs, it’s time to bring everything together and build neural networks efficiently using PyTorch’s torch.nn module.

This section will show you how to define models, use layers and activation functions, and perform forward and backward propagation automatically.


🧠 5.1 Introduction to torch.nn

torch.nn is a high-level abstraction built on top of tensors and autograd.
It helps you define neural network layers, activation functions, and loss functions easily.

You no longer need to manually track weights, biases, or gradient updates — PyTorch’s nn.Module does it all for you.


⚙️ 5.2 Anatomy of a Neural Network

A neural network consists of:

  • Input Layer: Accepts data (features)

  • Hidden Layers: Perform transformations using learned weights and activations

  • Output Layer: Produces predictions

Each layer performs a linear transformation followed by a non-linear activation.

Mathematically:
[
y = f(Wx + b)
]
where:

  • ( W ): weights

  • ( b ): bias

  • ( f ): activation function (e.g., ReLU, Sigmoid)
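
As a concrete illustration, here is a minimal sketch of one such layer computed by hand with plain tensors (the sizes 3 and 2 are arbitrary choices for this example):

import torch

x = torch.tensor([1.0, 2.0, 3.0])   # input features
W = torch.randn(2, 3)               # weights: 2 outputs, 3 inputs
b = torch.randn(2)                  # one bias per output
y = torch.relu(W @ x + b)           # linear transformation followed by ReLU
print(y)                            # two non-negative values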


🧩 5.3 The nn.Module Class

In PyTorch, every model inherits from the base class torch.nn.Module.

Structure of a custom model:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # Define layers
        self.linear1 = nn.Linear(3, 4)
        self.linear2 = nn.Linear(4, 1)
    
    def forward(self, x):
        # Define forward pass
        x = torch.relu(self.linear1(x))
        x = self.linear2(x)
        return x

When you create an instance of MyModel, all parameters (weights and biases) are automatically registered and tracked.


🔢 5.4 Understanding nn.Linear

nn.Linear(in_features, out_features)
Performs a linear transformation:
[
y = xW^T + b
]

Example:

layer = nn.Linear(3, 2)  # input size 3, output size 2
x = torch.tensor([[1.0, 2.0, 3.0]])
output = layer(x)
print(output)

Output (random weights):

tensor([[0.4231, -0.5713]], grad_fn=<AddmmBackward>)

Each layer internally stores:

  • weight: shape (out_features, in_features)

  • bias: shape (out_features,)

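
You can verify these shapes directly on the layer defined above:

print(layer.weight.shape)  # torch.Size([2, 3])  -> (out_features, in_features)
print(layer.bias.shape)    # torch.Size([2])     -> (out_features,)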

⚡ 5.5 Activation Functions

Activation functions introduce non-linearity — allowing neural networks to model complex patterns.

Common Activation Functions in PyTorch:

| Activation | Function | PyTorch Equivalent |
|---|---|---|
| ReLU | ( f(x) = \max(0, x) ) | nn.ReLU() |
| Sigmoid | ( f(x) = 1 / (1 + e^{-x}) ) | nn.Sigmoid() |
| Tanh | ( f(x) = \tanh(x) ) | nn.Tanh() |
| LeakyReLU | ( f(x) = \max(0.01x, x) ) | nn.LeakyReLU() |
| Softmax | Converts logits to probabilities | nn.Softmax(dim=1) |

Example:

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu = nn.ReLU()
print(relu(x))

Output:

tensor([0., 0., 0., 1., 2.])

🧮 5.6 Building a Simple Feedforward Neural Network

Let’s create a neural network for a simple regression task.

Step 1: Define the Model

import torch
import torch.nn as nn
import torch.optim as optim

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

Step 2: Initialize Model, Loss, and Optimizer

model = NeuralNet(1, 8, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

Step 3: Training Loop

# Sample data: y = 2x + 1
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

for epoch in range(1000):
    outputs = model(X)
    loss = criterion(outputs, Y)
    
    optimizer.zero_grad()  # Clear old gradients
    loss.backward()        # Compute new gradients
    optimizer.step()       # Update weights
    
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/1000], Loss: {loss.item():.4f}')

Output:

Epoch [1000/1000], Loss: 0.0001

✅ The model successfully learns the linear relationship ( y = 2x + 1 ).


🔍 5.7 Exploring Model Parameters

You can inspect and print model parameters easily:

for name, param in model.named_parameters():
    print(name, param.data)

Example output (abridged; the exact values depend on the random initialization):

fc1.weight tensor([[ 0.8451], [-0.3190], ..., [ 0.1127]])   # shape (8, 1)
fc1.bias tensor([ 0.9945, -0.1203, ..., 0.4411])            # shape (8,)
fc2.weight tensor([[ 1.9912, 0.0341, ..., -0.2215]])        # shape (1, 8)
fc2.bias tensor([1.0023])                                   # shape (1,)

⚙️ 5.8 Saving and Loading Models

Training can be time-consuming — PyTorch allows you to save and reload models effortlessly.

Save model:

torch.save(model.state_dict(), 'model.pth')

Load model:

model = NeuralNet(1, 8, 1)
model.load_state_dict(torch.load('model.pth'))
model.eval()

📉 5.9 Visualizing Training Progress (Optional)

You can visualize the loss curve to monitor convergence.

import matplotlib.pyplot as plt

losses = []

for epoch in range(300):
    outputs = model(X)
    loss = criterion(outputs, Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss Curve')
plt.show()

📊 The loss decreases steadily, confirming that the model is learning effectively.


🧠 5.10 Using Predefined Layers and Sequential API

For simpler models, PyTorch offers nn.Sequential — a compact way to stack layers.

model = nn.Sequential(
    nn.Linear(1, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
)

This is functionally equivalent to defining a custom nn.Module, but more concise.


🧩 5.11 Adding Batch Normalization and Dropout

To improve performance and reduce overfitting, include:

  • Batch Normalization: Normalizes layer inputs.

  • Dropout: Randomly disables neurons during training.

Example:

model = nn.Sequential(
    nn.Linear(1, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(16, 1)
)
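
Both layers behave differently during training and inference, so switch modes explicitly. A minimal sketch using the model defined above:

model.train()   # Dropout is active, BatchNorm uses batch statistics
y_train = model(torch.randn(4, 1))

model.eval()    # Dropout is disabled, BatchNorm uses running statistics
with torch.no_grad():
    y_eval = model(torch.randn(4, 1))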

🚀 5.12 Summary of Section 5

You’ve learned how to:

  • Define neural networks with torch.nn.Module

  • Use layers, activations, and optimizers

  • Train models using automatic differentiation

  • Save, load, and visualize your models

This section establishes the core workflow of deep learning in PyTorch.


⚙️ Section 6: Training Deep Neural Networks — Optimization, Loss Functions, and Regularization


🎯 6.1 What Happens During Training?

Training a neural network involves adjusting weights so that the model’s predictions get closer to the actual values.

At a high level, each training iteration consists of these steps:

  1. Forward Pass: Feed inputs through the model to generate predictions.

  2. Loss Calculation: Measure how far predictions are from actual labels.

  3. Backward Pass: Use backpropagation to compute gradients.

  4. Optimization Step: Update weights using an optimizer (like SGD or Adam).

This process repeats for many epochs until the loss converges.


🧮 6.2 The Mathematics Behind Learning

Let’s define:

  • ( x ): input

  • ( y ): true label

  • ( \hat{y} ): predicted output

  • ( L(y, \hat{y}) ): loss function

Each iteration:
[
w_{new} = w_{old} - \eta \cdot \frac{\partial L}{\partial w}
]

Where:

  • ( \eta ): learning rate

  • ( \frac{\partial L}{\partial w} ): gradient of the loss with respect to weight

This is gradient descent — the core mechanism of learning in neural networks.
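
Here is a minimal sketch of this update rule applied by hand to a single weight, with autograd supplying the gradient (the toy data point and learning rate are arbitrary choices):

import torch

w = torch.tensor(0.0, requires_grad=True)    # single weight
x, y = torch.tensor(2.0), torch.tensor(4.0)  # toy data point on the line y = 2x
lr = 0.1                                     # learning rate (eta)

for _ in range(50):
    loss = (w * x - y) ** 2                  # squared error
    loss.backward()                          # compute dL/dw
    with torch.no_grad():
        w -= lr * w.grad                     # w_new = w_old - eta * gradient
        w.grad.zero_()                       # reset the gradient for the next step

print(w.item())  # approaches 2.0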


📉 6.3 Understanding Loss Functions

A loss function quantifies how far the model’s predictions are from actual values.

Common Loss Functions in PyTorch:

| Problem Type | Loss Function | PyTorch Class | Description |
|---|---|---|---|
| Regression | Mean Squared Error | nn.MSELoss() | Penalizes squared differences |
| Regression | Mean Absolute Error | nn.L1Loss() | Penalizes absolute differences |
| Binary Classification | Binary Cross Entropy | nn.BCELoss() | Measures binary prediction error |
| Multi-Class Classification | Cross Entropy | nn.CrossEntropyLoss() | For multi-class outputs |
| Probabilistic Models | KL Divergence | nn.KLDivLoss() | Compares two distributions |

Example — Using MSE Loss:

criterion = nn.MSELoss()
y_pred = torch.tensor([2.5, 0.8, 1.4])
y_true = torch.tensor([3.0, 1.0, 1.3])
loss = criterion(y_pred, y_true)
print(loss)

Output:

tensor(0.1000)
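
For classification, nn.CrossEntropyLoss expects raw logits and integer class labels. A minimal sketch with made-up logits for three classes:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1],   # raw, unnormalized scores per class
                       [0.2, 1.5, 0.3]])
labels = torch.tensor([0, 1])             # correct class indices
loss = criterion(logits, labels)
print(loss)                               # scalar cross-entropy value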

⚡ 6.4 Introduction to Optimizers

Optimizers decide how model parameters are updated based on computed gradients.

1️⃣ Stochastic Gradient Descent (SGD)

Updates weights with a constant learning rate:
[
w = w - \eta \cdot \frac{\partial L}{\partial w}
]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

2️⃣ Momentum

Adds inertia to updates to escape local minima:
[
v = \beta v - \eta \nabla_w L
]
[
w = w + v
]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

3️⃣ Adam (Adaptive Moment Estimation)

The most popular optimizer — adaptive learning rates for each parameter.

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Adam combines the benefits of both Momentum and RMSProp, providing faster convergence and better stability.


🧠 6.5 Complete Training Loop Example

Let’s implement a training loop that combines forward pass, backward pass, and optimization.

import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.ReLU(),
    nn.Linear(10, 1)
)

# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Data
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

# Training
for epoch in range(500):
    # Forward pass
    y_pred = model(X)
    loss = criterion(y_pred, Y)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch {epoch+1}/500, Loss = {loss.item():.6f}')

Output:

Epoch 500/500, Loss = 0.000045

✅ The model has learned the underlying relationship successfully.


🔍 6.6 Monitoring Model Performance

You can track loss over epochs to visualize learning progress.

import matplotlib.pyplot as plt

losses = []
for epoch in range(200):
    y_pred = model(X)
    loss = criterion(y_pred, Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss over Time')
plt.show()

📊 A steadily decreasing loss indicates proper learning.


🧩 6.7 Learning Rate — The Most Critical Hyperparameter

The learning rate (η) determines how fast your model learns.

| Learning Rate | Behavior |
|---|---|
| Too high | Model diverges or oscillates |
| Too low | Training becomes painfully slow |
| Just right | Smooth, steady convergence |

You can experiment or use learning rate schedulers to adjust automatically:

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
for epoch in range(200):
    ...
    scheduler.step()

🧮 6.8 Regularization: Avoiding Overfitting

Overfitting occurs when your model memorizes training data instead of learning general patterns.
Regularization helps control model complexity.

🧱 1. L2 Regularization (Weight Decay)

Adds penalty for large weights.

optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.001)

Mathematically:
[
L_{total} = L + \lambda \sum w^2
]


☁️ 2. Dropout

Randomly “drops” neurons during training to improve generalization.

model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 1)
)

⚖️ 3. Early Stopping

Stops training when validation loss stops improving — prevents overfitting.

Conceptually (best_val_loss and the patience counters are initialized before the loop):

best_val_loss = float('inf')
patience, patience_counter = 10, 0

for epoch in range(num_epochs):
    # ... training step, then compute val_loss on a held-out validation set ...
    if val_loss > best_val_loss:
        patience_counter += 1
        if patience_counter > patience:
            print("Early stopping triggered")
            break
    else:
        best_val_loss = val_loss
        patience_counter = 0

🧠 6.9 Advanced Optimization Techniques

| Technique | Description |
|---|---|
| Batch Normalization | Normalizes activations to stabilize training |
| Gradient Clipping | Prevents exploding gradients |
| Learning Rate Warm-up | Gradually increases the learning rate at the start of training |
| Adaptive Gradient Clipping (AGC) | Scales gradients relative to weights |
| Cosine Annealing Scheduler | Smooth cyclic learning rate decay |

Example — Gradient Clipping:

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
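
The cosine annealing scheduler listed above is also available out of the box; a minimal sketch of wiring it into a loop (T_max, the length of one cosine cycle in epochs, is an arbitrary choice here):

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
for epoch in range(100):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # decay the learning rate along a cosine curve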

📦 6.10 Putting It All Together — Modular Training Function

Here’s a clean reusable function that trains any model:

def train_model(model, criterion, optimizer, X, Y, epochs=300):
    losses = []
    for epoch in range(epochs):
        model.train()
        y_pred = model(X)
        loss = criterion(y_pred, Y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

Usage:

losses = train_model(model, criterion, optimizer, X, Y)
plt.plot(losses)

✅ 6.11 Summary of Section 6

By now, you understand:

  • The mathematics behind training (gradient descent)

  • The purpose of loss functions

  • How to choose optimizers (SGD, Adam, RMSProp)

  • Methods to regularize models and prevent overfitting

  • How to visualize and fine-tune training performance

This section bridges the gap between building a model and mastering the training process.


🧠 Section 7: Convolutional Neural Networks (CNNs) with PyTorch

Convolutional Neural Networks (CNNs) are the foundation of modern computer vision. They are used in applications like image recognition, object detection, facial recognition, and even medical imaging.

In this section, you’ll learn:

  • The intuition and mathematics behind CNNs

  • Core building blocks (convolution, pooling, activation, fully connected layers)

  • Implementing a CNN using PyTorch

  • Training a CNN on real image data (CIFAR-10 or MNIST)

  • Using pre-trained CNNs like ResNet for transfer learning


🧩 7.1. What is a CNN?

A Convolutional Neural Network (CNN) is a special type of neural network designed to process data with grid-like topology, such as images (2D grids of pixels).

Instead of connecting every neuron to every pixel (as in dense networks), CNNs use convolutional filters that slide over the image to detect local patterns like:

  • Edges

  • Corners

  • Textures

  • Complex features (like eyes, faces, or objects)

🧠 Analogy:
Think of a CNN filter as a “pattern detector” that scans an image — much like how our brain identifies shapes and edges.


🔢 7.2. Key Components of a CNN

1️⃣ Convolution Layer

Performs a mathematical operation that multiplies and sums pixel values with a small filter (kernel).

Mathematically:
[
O(i,j) = \sum_m \sum_n I(i+m, j+n) \times K(m, n)
]
Where:

  • ( I ) = input image

  • ( K ) = kernel (filter)

  • ( O ) = output feature map

In PyTorch:

torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
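
A quick shape check clarifies what this layer produces (a batch of one 3-channel 32×32 image, matching CIFAR-10):

import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
print(conv(x).shape)            # torch.Size([1, 16, 32, 32]) -- padding=1 keeps H and W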

2️⃣ Activation Layer (ReLU)

Applies non-linearity to make the network capable of learning complex patterns.

[
f(x) = \max(0, x)
]

3️⃣ Pooling Layer

Reduces spatial size (height × width) while keeping the important features.

torch.nn.MaxPool2d(kernel_size=2, stride=2)

4️⃣ Fully Connected Layer

Connects flattened features to output classes for classification.

5️⃣ Softmax

Converts final scores to probabilities.
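
Putting the last two pieces together, a minimal sketch of flattening feature maps into a fully connected layer and converting its scores to probabilities (the 16×16×16 feature-map size is just an illustrative assumption):

import torch
import torch.nn as nn

features = torch.randn(1, 16, 16, 16)   # (batch, channels, H, W) from earlier layers
flat = features.view(1, -1)             # flatten to (batch, 16*16*16)
fc = nn.Linear(16 * 16 * 16, 10)        # fully connected layer -> 10 class scores
probs = torch.softmax(fc(flat), dim=1)  # softmax turns the scores into probabilities
print(probs.sum())                      # ~1.0 -- probabilities sum to one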


🧱 7.3. Visual Intuition

🧩 CNN Layers Flow:

Image → Convolution → ReLU → Pooling → Flatten → Dense → Softmax

| Layer | Function | Output Example |
|---|---|---|
| Conv2D | Extracts features | 32×32×16 |
| MaxPool | Downsamples | 16×16×16 |
| Conv2D | Deeper features | 16×16×32 |
| Flatten + Dense | Classification | 10 classes (e.g., digits) |

🧪 7.4. Implementing a CNN from Scratch in PyTorch

Let’s build a CNN to classify CIFAR-10 images — a dataset of 60,000 color images across 10 classes (cat, dog, airplane, etc.).

Step 1️⃣: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

Step 2️⃣: Load Dataset

# Transform: convert to tensor and normalize each RGB channel
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

Step 3️⃣: Define CNN Architecture

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleCNN()
print(model)

✅ Output:

SimpleCNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), padding=(1, 1))
  (fc1): Linear(in_features=2048, out_features=128)
  (fc2): Linear(in_features=128, out_features=10)
)

Step 4️⃣: Define Loss and Optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Step 5️⃣: Train the Model

for epoch in range(5):  # loop over dataset multiple times
    running_loss = 0.0
    for images, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")

Step 6️⃣: Evaluate Accuracy

correct, total = 0, 0
with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")

✅ Typical result (accuracy varies with initialization and number of epochs):

Test Accuracy: ~70–75%

🧠 7.5. Visualizing Feature Maps

To understand what CNNs “see”, visualize intermediate activations.

import matplotlib.pyplot as plt

def visualize_features(model, image):
    layer = model.conv1
    with torch.no_grad():
        features = layer(image.unsqueeze(0))
    fig, axes = plt.subplots(1, 6, figsize=(12, 4))
    for i in range(6):
        axes[i].imshow(features[0][i].detach().numpy(), cmap='gray')
        axes[i].axis('off')
    plt.show()

🖼️ This helps explain how early CNN layers detect edges and deeper layers detect patterns like eyes, fur, or wheels.


🔁 7.6. Transfer Learning with Pretrained CNNs

Instead of training from scratch, you can use pretrained CNNs like ResNet, VGG, or MobileNet.

from torchvision import models

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # Freeze base layers

# Replace final layer for custom classification
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)

Advantages:

  • Faster training

  • Better accuracy

  • Works even with smaller datasets


📊 7.7. Common CNN Architectures

| Architecture | Year | Key Innovation |
|---|---|---|
| LeNet-5 | 1998 | First CNN for handwritten digits |
| AlexNet | 2012 | Deep CNN that won ImageNet |
| VGGNet | 2014 | Uniform 3×3 filters |
| ResNet | 2015 | Skip connections to fight vanishing gradients |
| EfficientNet | 2019 | Parameter-efficient scaling |

Each evolution made CNNs deeper, faster, and more accurate.


🌍 7.8. Real-World Use Cases of CNNs

| Industry | Application | Description |
|---|---|---|
| 🏥 Healthcare | Tumor Detection | Identify cancerous cells in MRI scans |
| 🚗 Automotive | Self-Driving Cars | Detect pedestrians and traffic signs |
| 📱 Mobile | Face Recognition | Unlock devices using CNN-based models |
| 🛒 E-commerce | Visual Search | Suggest similar products from images |
| 🎥 Media | Video Analytics | Detect scenes, objects, or logos in videos |

🧾 7.9. Key Takeaways

  • CNNs learn spatial hierarchies automatically — from pixels to patterns.

  • Layers like convolution, pooling, and ReLU are the backbone of vision models.

  • Transfer learning saves time and improves accuracy on limited data.

  • Tools like TorchVision make image preprocessing and model loading easy.

  • Real-world applications range from healthcare to autonomous driving.


🔁 Section 8: Recurrent Neural Networks (RNNs) — Deep Learning for Sequential Data


🧠 8.1 What Are RNNs?

A Recurrent Neural Network (RNN) is a special type of neural network designed to process sequences of data, where each input depends on previous ones.

Unlike feedforward networks that treat all inputs independently, RNNs retain a hidden state that captures information from previous steps.

Intuitive Example:

Think of predicting the next word in a sentence:

“I am going to the ___.”

The prediction “market” depends on earlier words — that’s sequence awareness, which RNNs excel at.


🧮 8.2 How RNNs Work (Mathematical Intuition)

For a sequence ( x_1, x_2, ..., x_t ):

At each time step:
[
h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_t + b_h)
]
[
y_t = W_y \cdot h_t + b_y
]

Where:

  • ( h_t ): hidden state (memory)

  • ( x_t ): input at time step ( t )

  • ( y_t ): output

  • ( f ): activation (usually tanh or ReLU)

So the hidden state is recursively updated, carrying information from past inputs — this gives RNNs their “memory”.
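
A minimal sketch of one such update written out with plain tensors (the sizes are arbitrary illustrative choices):

import torch

hidden_size, input_size = 4, 3
W_h = torch.randn(hidden_size, hidden_size)
W_x = torch.randn(hidden_size, input_size)
b_h = torch.randn(hidden_size)

h_prev = torch.zeros(hidden_size)                   # h_{t-1}, initially zero
x_t = torch.randn(input_size)                       # input at time step t
h_t = torch.tanh(W_h @ h_prev + W_x @ x_t + b_h)    # recurrent update
print(h_t.shape)                                    # torch.Size([4])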


🧩 8.3 RNNs vs Feedforward Networks

| Feature | Feedforward NN | RNN |
|---|---|---|
| Input type | Independent samples | Sequential data |
| Memory | No memory | Retains hidden states |
| Weight sharing | Different weights per input | Same weights across time steps |
| Use case | Images, tabular data | Text, audio, time series |

🧱 8.4 Implementing a Simple RNN from Scratch

Let’s build a small RNN to predict the next number in a sequence.

Step 1: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim

Step 2: Create Data

We’ll use a simple sequence ( [0, 1, 2, 3, 4, 5, ...] ) and try to predict the next number.

seq = torch.arange(0, 10, dtype=torch.float32)
X = seq[:-1].unsqueeze(1)  # inputs
Y = seq[1:].unsqueeze(1)   # targets

Step 3: Define RNN Model

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x, hidden):
        out, hidden = self.rnn(x, hidden)
        out = self.fc(out)
        return out, hidden

Step 4: Initialize

model = SimpleRNN(input_size=1, hidden_size=10, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

Step 5: Train Model

X = X.unsqueeze(0)  # batch dimension
Y = Y.unsqueeze(0)

hidden = None
for epoch in range(300):
    output, hidden = model(X, hidden)
    loss = criterion(output, Y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    hidden = hidden.detach()  # detach so gradients don't backpropagate through earlier epochs

    if (epoch + 1) % 50 == 0:
        print(f'Epoch [{epoch+1}/300], Loss: {loss.item():.6f}')

Output:

Epoch [300/300], Loss: 0.000012

✅ The model learns to predict the next number in the sequence!


🧮 8.5 Using nn.RNN Directly (Built-in Simplicity)

PyTorch makes it simple to create RNNs with its built-in layer:

rnn = nn.RNN(input_size=5, hidden_size=10, num_layers=2, batch_first=True)

Parameters:

  • input_size: features per timestep

  • hidden_size: hidden layer dimension

  • num_layers: stack multiple RNN layers

  • batch_first=True: input shape as (batch, seq, features)
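
A quick shape check shows what the layer returns (a batch of 2 sequences of length 7, matching the parameters above):

x = torch.randn(2, 7, 5)   # (batch, seq_len, input_size)
output, hidden = rnn(x)
print(output.shape)        # torch.Size([2, 7, 10]) -- hidden state at every time step
print(hidden.shape)        # torch.Size([2, 2, 10]) -- final state for each of the 2 layers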


🧩 8.6 Limitations of Vanilla RNNs

Despite their simplicity, RNNs struggle with long sequences due to:

  • Vanishing gradients (earlier information fades)

  • Exploding gradients (gradients become too large)

  • Limited long-term memory

To overcome this, we use LSTM and GRU architectures.


⚙️ 8.7 Long Short-Term Memory (LSTM)

LSTMs (Long Short-Term Memory networks) were designed to remember information over longer time intervals.

They introduce gates that control information flow:

  • Forget Gate: Decides what to discard

  • Input Gate: Decides what to update

  • Output Gate: Decides what to output

Equations:
[
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
]
[
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
]
[
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
]
[
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)
]
[
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
]
[
h_t = o_t * \tanh(C_t)
]


💻 8.8 Implementing an LSTM in PyTorch

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out)
        return out

Usage:

model = LSTMModel(1, 32, 1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

The LSTM retains memory over longer sequences and performs better than vanilla RNNs in most real-world problems.


🔁 8.9 GRU: Gated Recurrent Unit

GRUs are a simplified version of LSTMs that merge the forget and input gates.
They’re faster to train but perform comparably.

gru = nn.GRU(input_size=1, hidden_size=32, batch_first=True)

GRUs are widely used for speech, time-series forecasting, and chatbots where long-term dependencies are moderate.


🌍 8.10 Real-World Use Cases of RNNs

| Domain | Application | Model Type |
|---|---|---|
| 📊 Finance | Stock price prediction | LSTM |
| 🗣️ NLP | Text generation, translation | GRU/LSTM |
| 🎶 Audio | Speech recognition | LSTM |
| 🕒 IoT | Sensor data forecasting | GRU |
| 💬 Chatbots | Context understanding | LSTM/Transformer hybrid |

💬 8.11 Example: Character-Level Text Generation

Let’s create a mini text generator that learns sequences of characters.

import torch.nn.functional as F

chars = list("hello")
char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for ch, i in char2idx.items()}

seq = torch.tensor([[char2idx['h'], char2idx['e'], char2idx['l'], char2idx['l']]])
target = torch.tensor([[char2idx['e'], char2idx['l'], char2idx['l'], char2idx['o']]])

model = nn.Sequential(
    nn.Embedding(len(chars), 8),
    nn.RNN(8, 16, batch_first=True),
    nn.Linear(16, len(chars))
)

The model can then learn to predict “e”, “l”, “l”, “o” given “h”, “e”, “l”, “l”.
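
A minimal training sketch for this toy model (200 epochs and lr=0.01 are arbitrary choices):

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    logits = model(seq)                                   # shape (1, 4, vocab_size)
    loss = criterion(logits.view(-1, len(chars)), target.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

pred = model(seq).argmax(dim=2)                           # most likely character at each position
print(''.join(idx2char[i.item()] for i in pred[0]))       # should converge to "ello"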


🧮 8.12 Visualizing Hidden States

You can visualize how the hidden state evolves over the sequence (using the SimpleRNN model and the X tensor from Section 8.4):

import matplotlib.pyplot as plt

hidden_states = []
hidden = None
for i in range(len(X[0])):
    out, hidden = model(X[:, i:i+1], hidden)
    hidden_states.append(hidden[0].detach().numpy().flatten())

plt.plot(hidden_states)
plt.title("Hidden State Evolution Over Time")
plt.show()

This shows how memory updates through the sequence.


🧠 8.13 Tips for Training RNNs

  • Normalize or scale sequential data

  • Use gradient clipping to prevent exploding gradients

  • Initialize hidden states properly (hidden = torch.zeros(...)); see the combined sketch after this list

  • Use LSTMs/GRUs for complex tasks

  • Experiment with sequence lengths (shorter sequences → faster training)
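
A minimal sketch combining two of these tips (explicit hidden-state initialization and gradient clipping), assuming the SimpleRNN setup (model, X, Y, criterion, optimizer) from Section 8.4:

hidden = torch.zeros(1, 1, 10)   # (num_layers, batch, hidden_size)
for epoch in range(100):
    output, hidden = model(X, hidden)
    loss = criterion(output, Y)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip before the update
    optimizer.step()
    hidden = hidden.detach()     # stop gradients from flowing into earlier epochs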


✅ 8.14 Summary of Section 8

You now understand:

  • What makes RNNs special for sequential data

  • How LSTMs and GRUs solve memory challenges

  • How to build, train, and visualize RNNs in PyTorch

  • Real-world use cases in text, audio, and forecasting


🚀 Up Next:

We’ll continue with Section 9: Model Optimization and Deployment in PyTorch, where we’ll cover:

  • Optimization techniques: quantization, pruning, and knowledge distillation

  • Exporting models with TorchScript and ONNX

  • Serving models with TorchServe and PyTorch Mobile

  • Cloud deployment and post-deployment monitoring


🧩 Section 9: Model Optimization and Deployment in PyTorch

Building and training a model is only part of the deep learning journey. Once you have a working model, the next big steps are:

  • Optimizing it for speed, memory, and accuracy.

  • Deploying it in a scalable way to real-world applications — from cloud servers to mobile devices.

PyTorch provides an excellent ecosystem for both these stages — with tools like TorchScript, ONNX, Quantization, and TorchServe.


⚙️ 9.1. Model Optimization: The Key to Efficiency

Model optimization in PyTorch focuses on reducing computational cost and improving inference speed without sacrificing accuracy.

🚀 Common Optimization Techniques

| Optimization Type | Description | Example Use |
|---|---|---|
| Quantization | Converts model parameters from float32 to int8 to reduce size and speed up inference | Mobile & edge AI |
| Pruning | Removes weights or neurons that contribute little to the output | Compressing large models |
| Knowledge Distillation | Trains a smaller model (student) using the outputs of a large model (teacher) | Efficient model serving |
| Mixed Precision Training | Uses float16 and float32 together for faster GPU training | NVIDIA Ampere GPUs |
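
Mixed precision training, for example, is available through torch.cuda.amp. A minimal sketch, assuming a CUDA device and that model, criterion, optimizer, and dataloader are already set up and moved to the GPU:

scaler = torch.cuda.amp.GradScaler()        # scales the loss to avoid float16 underflow

for data, target in dataloader:
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # forward pass runs in mixed precision
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()           # backward on the scaled loss
    scaler.step(optimizer)                  # unscale gradients, then take the optimizer step
    scaler.update()                         # adjust the scale factor for the next iteration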

🧮 9.2. Quantization Example

Quantization helps deploy models to mobile or embedded devices by reducing memory footprint.

import torch
from torchvision import models

# Load a pretrained model
model = models.resnet18(pretrained=True)
model.eval()

# Static quantization preparation
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# In practice, run a few batches of representative data through the model here
# so the inserted observers can calibrate activation ranges before conversion.
torch.quantization.convert(model, inplace=True)

# Compare the on-disk size before and after by saving the state_dict
torch.save(model.state_dict(), "resnet18_quantized.pth")

Result: You’ll see up to 4× reduction in model size and improved inference time.
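
If you only need to quantize the linear (and recurrent) layers, which is common for NLP models, dynamic quantization is a simpler alternative. A minimal sketch, starting again from a float model:

float_model = models.resnet18(pretrained=True).eval()
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8   # quantize only the Linear layers to int8
)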


🔪 9.3. Model Pruning Example

Model pruning eliminates unnecessary weights.

import torch.nn.utils.prune as prune

# Prune 30% of connections in linear layer
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)

Outcome: Model becomes smaller and faster with minimal accuracy drop.


⚗️ 9.4. Knowledge Distillation (Student–Teacher Learning)

Knowledge Distillation allows training a compact student model by learning from a larger, pre-trained teacher model.

teacher_model = models.resnet50(pretrained=True).eval()   # teacher is frozen during distillation
student_model = models.resnet18()

criterion = torch.nn.KLDivLoss(reduction='batchmean')
optimizer = torch.optim.Adam(student_model.parameters(), lr=0.001)

T = 5  # temperature used to soften both distributions

for data, target in dataloader:
    optimizer.zero_grad()
    with torch.no_grad():
        # Teacher provides soft targets as probabilities
        teacher_probs = torch.nn.functional.softmax(teacher_model(data) / T, dim=1)
    # KLDivLoss expects log-probabilities from the student and probabilities as targets
    student_log_probs = torch.nn.functional.log_softmax(student_model(data) / T, dim=1)
    loss = criterion(student_log_probs, teacher_probs)
    loss.backward()
    optimizer.step()

Real-world Example:
Companies like Google and Meta use distillation to deploy small transformer models on mobile devices (e.g., BERT → TinyBERT).


🚀 9.5. Deployment: From Research to Production

Once optimized, models need to be served efficiently in production environments — APIs, web services, or mobile apps.

PyTorch offers multiple deployment options:

  • TorchScript (Convert model to static graph)

  • ONNX (Export model for cross-framework compatibility)

  • TorchServe (Production-grade model server)

  • PyTorch Mobile (For mobile/edge devices)


🧱 9.5.1. TorchScript

TorchScript converts dynamic PyTorch models into a serialized form that runs without Python — ideal for production environments.

# Convert model to TorchScript
traced_model = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
torch.jit.save(traced_model, "resnet18_traced.pt")

# Load and run TorchScript model
loaded = torch.jit.load("resnet18_traced.pt")
output = loaded(torch.randn(1, 3, 224, 224))

Benefits:

  • Faster inference

  • Portable to C++ runtime

  • No dependency on Python at inference time


🔗 9.5.2. ONNX (Open Neural Network Exchange)

ONNX enables exporting models to other frameworks like TensorFlow, Caffe2, or OpenVINO.

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx", export_params=True)

Benefits:

  • Framework interoperability

  • Deploy on non-PyTorch systems

  • Optimized inference with ONNX Runtime
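
For example, the exported file can be run with ONNX Runtime (a separate onnxruntime package; shown here as a sketch):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})   # list of output arrays
print(outputs[0].shape)                            # (1, 1000) for the ImageNet classification head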


🧰 9.5.3. TorchServe (Model Serving)

TorchServe is an official tool by AWS & Facebook for serving PyTorch models in production.

Steps:

  1. Save model file (.mar)

  2. Launch TorchServe server

  3. Expose REST API for inference

torch-model-archiver --model-name resnet18 \
--version 1.0 --serialized-file resnet18_traced.pt \
--handler image_classifier --export-path model_store

torchserve --start --ncs --model-store model_store --models resnet=resnet18.mar

Benefits:

  • Handles multiple models

  • Supports batching and metrics

  • Scalable for enterprise deployments


📱 9.5.4. PyTorch Mobile

For mobile or edge deployment, convert model using TorchScript and integrate with:

  • Android (Java API)

  • iOS (Swift API)

# Convert to TorchScript
scripted_model = torch.jit.script(model)
scripted_model.save("mobile_model.pt")

Use Case Example:
AI-based camera filters, speech recognition, and on-device pose estimation models.


🌩️ 9.6. Deploying Models to the Cloud

Cloud platforms make large-scale deployment easier:

  • AWS Sagemaker → Native PyTorch support

  • Google Cloud AI Platform → Scalable REST APIs

  • Azure ML → Preconfigured PyTorch containers

Each allows automatic scaling, version control, and CI/CD pipelines.


📊 9.7. Monitoring and Logging

Post-deployment, it’s crucial to track model performance using metrics such as:

  • Latency

  • Accuracy drift

  • Resource utilization

Tools for monitoring:

  • Weights & Biases

  • MLflow

  • Prometheus + Grafana


🧠 9.8. Real-World Example: Image Classification API

Here’s how a simple Flask-based API might use a deployed PyTorch model:

from flask import Flask, request, jsonify
import torch
from torchvision import models, transforms
from PIL import Image

app = Flask(__name__)
model = torch.jit.load("resnet18_traced.pt")
model.eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics expected by ResNet
                         std=[0.229, 0.224, 0.225]),
])

@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['file']
    image = Image.open(file).convert('RGB')  # ensure a 3-channel image
    img_tensor = transform(image).unsqueeze(0)
    with torch.no_grad():
        preds = model(img_tensor)
    predicted = torch.argmax(preds, 1).item()
    return jsonify({"class": int(predicted)})

if __name__ == '__main__':
    app.run(debug=True)

Use Case:
Deploying your PyTorch model as a REST API for an image recognition service.
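
Once the server is running, the endpoint can be exercised from Python (sample.jpg is just an example file name, and requests is an extra dependency):

import requests

response = requests.post("http://127.0.0.1:5000/predict",
                         files={"file": open("sample.jpg", "rb")})
print(response.json())   # e.g. {"class": 207}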


🏁 9.9. Summary

| Concept | Purpose | Tools/Methods |
|---|---|---|
| Quantization | Reduce model size | torch.quantization |
| Pruning | Remove redundant weights | torch.nn.utils.prune |
| Distillation | Compress model via teacher-student learning | KLDivLoss |
| Deployment | Run in production | TorchScript, ONNX, TorchServe |
| Monitoring | Track and analyze performance | MLflow, W&B |

🧠 Final Insight

Model optimization and deployment are where research becomes reality.
With PyTorch, you can go from a prototype on your laptop to a production-grade AI model running in the cloud or on a mobile device — all using the same ecosystem.
