PyTorch - Part II
Contents
5. Building Neural Networks with torch.nn
6. Optimization, Loss Functions, and Regularization
7. Convolutional Neural Networks (CNNs)
8. Recurrent Neural Networks (RNNs) — Deep Learning for Sequential Data
9. Model Optimization and Deployment in PyTorch
🧱 Section 5: Building Neural Networks with torch.nn
Now that you understand tensors, autograd, and computational graphs, it’s time to bring everything together and build neural networks efficiently using PyTorch’s torch.nn module.
This section will show you how to define models, use layers and activation functions, and perform forward and backward propagation automatically.
🧠 5.1 Introduction to torch.nn
torch.nn is a high-level abstraction built on top of tensors and autograd.
It helps you define neural network layers, activation functions, and loss functions easily.
You no longer need to manually track weights, biases, or gradient updates — PyTorch’s nn.Module does it all for you.
⚙️ 5.2 Anatomy of a Neural Network
A neural network consists of:
- Input Layer: Accepts data (features)
- Hidden Layers: Perform transformations using learned weights and activations
- Output Layer: Produces predictions
Each layer performs a linear transformation followed by a non-linear activation.
Mathematically:
[
y = f(Wx + b)
]
where:
- ( W ): weights
- ( b ): bias
- ( f ): activation function (e.g., ReLU, Sigmoid)
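Written out with plain tensors (the sizes here are arbitrary, just for illustration), one layer’s computation looks like this:
import torch

W = torch.randn(4, 3)              # weights: 4 outputs, 3 inputs
b = torch.randn(4)                 # bias
x = torch.tensor([1.0, 2.0, 3.0])  # input features

y = torch.relu(W @ x + b)          # linear transformation followed by ReLU activation
print(y.shape)                     # torch.Size([4])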
🧩 5.3 The nn.Module Class
In PyTorch, every model inherits from the base class torch.nn.Module.
Structure of a custom model:
import torch
import torch.nn as nn
class MyModel(nn.Module):
def __init__(self):
super(MyModel, self).__init__()
# Define layers
self.linear1 = nn.Linear(3, 4)
self.linear2 = nn.Linear(4, 1)
def forward(self, x):
# Define forward pass
x = torch.relu(self.linear1(x))
x = self.linear2(x)
return x
When you create an instance of MyModel, all parameters (weights and biases) are automatically registered and tracked.
🔢 5.4 Understanding nn.Linear
nn.Linear(in_features, out_features)
Performs a linear transformation:
[
y = xW^T + b
]
Example:
layer = nn.Linear(3, 2) # input size 3, output size 2
x = torch.tensor([[1.0, 2.0, 3.0]])
output = layer(x)
print(output)
Output (random weights):
tensor([[0.4231, -0.5713]], grad_fn=<AddmmBackward>)
Each layer internally stores:
- weight: shape (out_features, in_features)
- bias: shape (out_features,)
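You can check this directly:
layer = nn.Linear(3, 2)
print(layer.weight.shape)  # torch.Size([2, 3]) -> (out_features, in_features)
print(layer.bias.shape)    # torch.Size([2])    -> (out_features,)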
⚡ 5.5 Activation Functions
Activation functions introduce non-linearity — allowing neural networks to model complex patterns.
Common Activation Functions in PyTorch:
| Activation | Function | PyTorch Equivalent |
|---|---|---|
| ReLU | ( f(x) = \max(0, x) ) | nn.ReLU() |
| Sigmoid | ( f(x) = 1 / (1 + e^{-x}) ) | nn.Sigmoid() |
| Tanh | ( f(x) = \tanh(x) ) | nn.Tanh() |
| LeakyReLU | ( f(x) = \max(0.01x, x) ) | nn.LeakyReLU() |
| Softmax | Converts logits to probabilities | nn.Softmax(dim=1) |
Example:
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu = nn.ReLU()
print(relu(x))
Output:
tensor([0., 0., 0., 1., 2.])
🧮 5.6 Building a Simple Feedforward Neural Network
Let’s create a neural network for a simple regression task.
Step 1: Define the Model
import torch
import torch.nn as nn
import torch.optim as optim
class NeuralNet(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(NeuralNet, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.relu = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, output_size)
def forward(self, x):
out = self.fc1(x)
out = self.relu(out)
out = self.fc2(out)
return out
Step 2: Initialize Model, Loss, and Optimizer
model = NeuralNet(1, 8, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
Step 3: Training Loop
# Sample data: y = 2x + 1
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])
for epoch in range(1000):
outputs = model(X)
loss = criterion(outputs, Y)
optimizer.zero_grad() # Clear old gradients
loss.backward() # Compute new gradients
optimizer.step() # Update weights
if (epoch+1) % 100 == 0:
print(f'Epoch [{epoch+1}/1000], Loss: {loss.item():.4f}')
Output:
Epoch [1000/1000], Loss: 0.0001
✅ The model successfully learns the linear relationship ( y = 2x + 1 ).
🔍 5.7 Exploring Model Parameters
You can inspect and print model parameters easily:
for name, param in model.named_parameters():
print(name, param.data)
Example output (abridged; for NeuralNet(1, 8, 1), fc1.weight actually has shape (8, 1) and fc2.weight shape (1, 8)):
fc1.weight tensor([[0.8451]])
fc1.bias tensor([0.9945])
fc2.weight tensor([[1.9912]])
fc2.bias tensor([1.0023])
⚙️ 5.8 Saving and Loading Models
Training can be time-consuming — PyTorch allows you to save and reload models effortlessly.
Save model:
torch.save(model.state_dict(), 'model.pth')
Load model:
model = NeuralNet(1, 8, 1)
model.load_state_dict(torch.load('model.pth'))
model.eval()
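state_dict() stores only the weights. To resume training later, a common pattern is to bundle the optimizer state and the current epoch as well (a sketch; 'checkpoint.pth' and epoch are placeholder names):
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# Later, to resume training:
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])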
📉 5.9 Visualizing Training Progress (Optional)
You can visualize the loss curve to monitor convergence.
import matplotlib.pyplot as plt
losses = []
for epoch in range(300):
outputs = model(X)
loss = criterion(outputs, Y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
losses.append(loss.item())
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss Curve')
plt.show()
📊 The loss decreases steadily, confirming that the model is learning effectively.
🧠 5.10 Using Predefined Layers and Sequential API
For simpler models, PyTorch offers nn.Sequential — a compact way to stack layers.
model = nn.Sequential(
nn.Linear(1, 8),
nn.ReLU(),
nn.Linear(8, 1)
)
This is functionally equivalent to defining a custom nn.Module, but more concise.
🧩 5.11 Adding Batch Normalization and Dropout
To improve performance and reduce overfitting, include:
- Batch Normalization: Normalizes layer inputs.
- Dropout: Randomly disables neurons during training.
Example:
model = nn.Sequential(
nn.Linear(1, 16),
nn.BatchNorm1d(16),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(16, 1)
)
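Both layers behave differently in training and inference: Dropout is active only during training, and BatchNorm switches from batch statistics to running statistics at evaluation time. Remember to toggle modes explicitly; a short sketch (X_test is a placeholder batch):
model.train()                 # Dropout active, BatchNorm uses batch statistics
# ... run the training loop here ...

model.eval()                  # Dropout disabled, BatchNorm uses running statistics
with torch.no_grad():
    predictions = model(X_test)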
🚀 5.12 Summary of Section 5
You’ve learned how to:
- Define neural networks with torch.nn.Module
- Use layers, activations, and optimizers
- Train models using automatic differentiation
- Save, load, and visualize your models
This section establishes the core workflow of deep learning in PyTorch.
⚙️ Section 6: Training Deep Neural Networks — Optimization, Loss Functions, and Regularization
🎯 6.1 What Happens During Training?
Training a neural network involves adjusting weights so that the model’s predictions get closer to the actual values.
At a high level, each training iteration consists of these steps:
1. Forward Pass: Feed inputs through the model to generate predictions.
2. Loss Calculation: Measure how far predictions are from actual labels.
3. Backward Pass: Use backpropagation to compute gradients.
4. Optimization Step: Update weights using an optimizer (like SGD or Adam).
This process repeats for many epochs until the loss converges.
🧮 6.2 The Mathematics Behind Learning
Let’s define:
- ( x ): input
- ( y ): true label
- ( \hat{y} ): predicted output
- ( L(y, \hat{y}) ): loss function
Each iteration:
[
w_{new} = w_{old} - \eta \cdot \frac{\partial L}{\partial w}
]
Where:
- ( \eta ): learning rate
- ( \frac{\partial L}{\partial w} ): gradient of the loss with respect to the weight
This is gradient descent — the core mechanism of learning in neural networks.
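A single hand-rolled update on one weight (with illustrative numbers) shows the rule in action:
import torch

w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(9.0)
lr = 0.1                       # learning rate (eta)

loss = (w * x - y) ** 2        # squared error
loss.backward()                # dL/dw = 2 * (w*x - y) * x = -18
with torch.no_grad():
    w -= lr * w.grad           # w_new = w_old - eta * dL/dw
print(w.item())                # 3.8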
📉 6.3 Understanding Loss Functions
A loss function quantifies how far the model’s predictions are from actual values.
Common Loss Functions in PyTorch:
| Problem Type | Loss Function | PyTorch Class | Description |
|---|---|---|---|
| Regression | Mean Squared Error | nn.MSELoss() | Penalizes squared differences |
| Regression | Mean Absolute Error | nn.L1Loss() | Penalizes absolute differences |
| Binary Classification | Binary Cross Entropy | nn.BCELoss() | Measures binary prediction error |
| Multi-Class Classification | Cross Entropy | nn.CrossEntropyLoss() | For multi-class outputs |
| Probabilistic Models | KL Divergence | nn.KLDivLoss() | Compares two distributions |
Example — Using MSE Loss:
criterion = nn.MSELoss()
y_pred = torch.tensor([2.5, 0.8, 1.4])
y_true = torch.tensor([3.0, 1.0, 1.3])
loss = criterion(y_pred, y_true)
print(loss)
Output:
tensor(0.1000)
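For classification, nn.CrossEntropyLoss() takes raw logits and integer class labels directly (no softmax needed beforehand); a minimal sketch:
criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1],   # 2 samples, 3 classes
                       [0.2, 1.5, 2.5]])
labels = torch.tensor([0, 2])             # true class indices
print(criterion(logits, labels))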
⚡ 6.4 Introduction to Optimizers
Optimizers decide how model parameters are updated based on computed gradients.
1️⃣ Stochastic Gradient Descent (SGD)
Updates weights with a constant learning rate:
[
w = w - \eta \cdot \frac{\partial L}{\partial w}
]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
2️⃣ Momentum
Adds inertia to updates to escape local minima:
[
v = \beta v - \eta \nabla_w L
]
[
w = w + v
]
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
3️⃣ Adam (Adaptive Moment Estimation)
The most popular optimizer — adaptive learning rates for each parameter.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Adam combines the benefits of both Momentum and RMSProp, providing faster convergence and better stability.
🧠 6.5 Complete Training Loop Example
Let’s implement a training loop that combines forward pass, backward pass, and optimization.
import torch
import torch.nn as nn
import torch.optim as optim
# Simple model
model = nn.Sequential(
nn.Linear(1, 10),
nn.ReLU(),
nn.Linear(10, 1)
)
# Loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Data
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = torch.tensor([[3.0], [5.0], [7.0], [9.0]])
# Training
for epoch in range(500):
# Forward pass
y_pred = model(X)
loss = criterion(y_pred, Y)
# Backward pass
optimizer.zero_grad()
loss.backward()
optimizer.step()
if (epoch + 1) % 100 == 0:
print(f'Epoch {epoch+1}/500, Loss = {loss.item():.6f}')
Output:
Epoch 500/500, Loss = 0.000045
✅ The model has learned the underlying relationship successfully.
🔍 6.6 Monitoring Model Performance
You can track loss over epochs to visualize learning progress.
import matplotlib.pyplot as plt
losses = []
for epoch in range(200):
y_pred = model(X)
loss = criterion(y_pred, Y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
losses.append(loss.item())
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss over Time')
plt.show()
📊 A steadily decreasing loss indicates proper learning.
🧩 6.7 Learning Rate — The Most Critical Hyperparameter
The learning rate (η) determines how fast your model learns.
| Learning Rate | Behavior |
|---|---|
| Too high | Model diverges or oscillates |
| Too low | Training becomes painfully slow |
| Just right | Smooth, steady convergence |
You can experiment or use learning rate schedulers to adjust automatically:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
for epoch in range(200):
...
scheduler.step()
🧮 6.8 Regularization: Avoiding Overfitting
Overfitting occurs when your model memorizes training data instead of learning general patterns.
Regularization helps control model complexity.
🧱 1. L2 Regularization (Weight Decay)
Adds penalty for large weights.
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.001)
Mathematically:
[
L_{total} = L + \lambda \sum w^2
]
☁️ 2. Dropout
Randomly “drops” neurons during training to improve generalization.
model = nn.Sequential(
nn.Linear(10, 64),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(64, 1)
)
⚖️ 3. Early Stopping
Stops training when validation loss stops improving — prevents overfitting.
Conceptually:
if val_loss > best_val_loss:
patience_counter += 1
if patience_counter > patience:
print("Early stopping triggered")
break
else:
best_val_loss = val_loss
patience_counter = 0
🧠 6.9 Advanced Optimization Techniques
| Technique | Description |
|---|---|
| Batch Normalization | Normalizes activations to stabilize training |
| Gradient Clipping | Prevents exploding gradients |
| Learning Rate Warm-up | Gradually increases LR at start |
| Adaptive Gradient Clipping (AGC) | Scales gradients relative to weights |
| Cosine Annealing Scheduler | Smooth cyclic learning rate decay |
Example — Gradient Clipping:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
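Clipping has to happen after loss.backward() and before optimizer.step(); a sketch of the placement inside a generic loop (model, criterion, optimizer, X, Y assumed from earlier):
for epoch in range(300):
    optimizer.zero_grad()
    loss = criterion(model(X), Y)
    loss.backward()
    # Clip after backward() so the rescaled gradients are what the optimizer applies
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()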
📦 6.10 Putting It All Together — Modular Training Function
Here’s a clean reusable function that trains any model:
def train_model(model, criterion, optimizer, X, Y, epochs=300):
losses = []
for epoch in range(epochs):
model.train()
y_pred = model(X)
loss = criterion(y_pred, Y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
losses.append(loss.item())
return losses
Usage:
losses = train_model(model, criterion, optimizer, X, Y)
plt.plot(losses)
✅ 6.11 Summary of Section 6
By now, you understand:
- The mathematics behind training (gradient descent)
- The purpose of loss functions
- How to choose optimizers (SGD, Adam, RMSProp)
- Methods to regularize models and prevent overfitting
- How to visualize and fine-tune training performance
This section bridges the gap between building a model and mastering the training process.
🧠 Section 7: Convolutional Neural Networks (CNNs) with PyTorch
Convolutional Neural Networks (CNNs) are the foundation of modern computer vision. They are used in applications like image recognition, object detection, facial recognition, and even medical imaging.
In this section, you’ll learn:
- The intuition and mathematics behind CNNs
- Core building blocks (convolution, pooling, activation, fully connected layers)
- Implementing a CNN using PyTorch
- Training a CNN on real image data (CIFAR-10 or MNIST)
- Using pre-trained CNNs like ResNet for transfer learning
🧩 7.1. What is a CNN?
A Convolutional Neural Network (CNN) is a special type of neural network designed to process data with grid-like topology, such as images (2D grids of pixels).
Instead of connecting every neuron to every pixel (as in dense networks), CNNs use convolutional filters that slide over the image to detect local patterns like:
- Edges
- Corners
- Textures
- Complex features (like eyes, faces, or objects)
🧠 Analogy:
Think of a CNN filter as a “pattern detector” that scans an image — much like how our brain identifies shapes and edges.
🔢 7.2. Key Components of a CNN
1️⃣ Convolution Layer
Performs a mathematical operation that multiplies and sums pixel values with a small filter (kernel).
Mathematically:
[
O(i,j) = \sum_m \sum_n I(i+m, j+n) \times K(m, n)
]
Where:
- ( I ) = input image
- ( K ) = kernel (filter)
- ( O ) = output feature map
In PyTorch:
torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
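A quick way to sanity-check the layer is to pass a dummy batch through it; with kernel_size=3, stride=1, and padding=1 the spatial size is preserved:
conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
dummy = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width), e.g. one CIFAR-10 image
print(conv(dummy).shape)            # torch.Size([1, 16, 32, 32])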
2️⃣ Activation Layer (ReLU)
Applies non-linearity to make the network capable of learning complex patterns.
[
f(x) = \max(0, x)
]
3️⃣ Pooling Layer
Reduces spatial size (height × width) while keeping the important features.
torch.nn.MaxPool2d(kernel_size=2, stride=2)
4️⃣ Fully Connected Layer
Connects flattened features to output classes for classification.
5️⃣ Softmax
Converts final scores to probabilities.
🧱 7.3. Visual Intuition
🧩 CNN Layers Flow:
Image → Convolution → ReLU → Pooling → Flatten → Dense → Softmax
| Layer | Function | Output Example |
|---|---|---|
| Conv2D | Extracts features | 32×32×16 |
| MaxPool | Downsamples | 16×16×16 |
| Conv2D | Deeper features | 16×16×32 |
| Flatten + Dense | Classification | 10 classes (e.g., digits) |
🧪 7.4. Implementing a CNN from Scratch in PyTorch
Let’s build a CNN to classify CIFAR-10 images — a dataset of 60,000 color images across 10 classes (cat, dog, airplane, etc.).
Step 1️⃣: Import Libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
Step 2️⃣: Load Dataset
# Transform: Normalize and convert to tensor
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # per-channel mean/std for RGB CIFAR-10 images
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)
classes = ('plane', 'car', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck')
Step 3️⃣: Define CNN Architecture
class SimpleCNN(nn.Module):
def __init__(self):
super(SimpleCNN, self).__init__()
self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
self.fc1 = nn.Linear(32 * 8 * 8, 128)
self.fc2 = nn.Linear(128, 10)
self.relu = nn.ReLU()
def forward(self, x):
x = self.pool(self.relu(self.conv1(x)))
x = self.pool(self.relu(self.conv2(x)))
x = x.view(-1, 32 * 8 * 8)
x = self.relu(self.fc1(x))
x = self.fc2(x)
return x
model = SimpleCNN()
print(model)
✅ Output:
SimpleCNN(
(conv1): Conv2d(3, 16, kernel_size=(3, 3), padding=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2)
(conv2): Conv2d(16, 32, kernel_size=(3, 3), padding=(1, 1))
(fc1): Linear(in_features=2048, out_features=128)
(fc2): Linear(in_features=128, out_features=10)
)
Step 4️⃣: Define Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
Step 5️⃣: Train the Model
for epoch in range(5): # loop over dataset multiple times
running_loss = 0.0
for images, labels in trainloader:
optimizer.zero_grad()
outputs = model(images)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()
print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")
Step 6️⃣: Evaluate Accuracy
correct, total = 0, 0
with torch.no_grad():
for images, labels in testloader:
outputs = model(images)
_, predicted = torch.max(outputs, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
print(f"Test Accuracy: {100 * correct / total:.2f}%")
✅ Output:
Test Accuracy: ~70–75%
🧠 7.5. Visualizing Feature Maps
To understand what CNNs “see”, visualize intermediate activations.
import matplotlib.pyplot as plt
def visualize_features(model, image):
layer = model.conv1
with torch.no_grad():
features = layer(image.unsqueeze(0))
fig, axes = plt.subplots(1, 6, figsize=(12, 4))
for i in range(6):
axes[i].imshow(features[0][i].detach().numpy(), cmap='gray')
axes[i].axis('off')
plt.show()
🖼️ This helps explain how early CNN layers detect edges and deeper layers detect patterns like eyes, fur, or wheels.
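For example, assuming the model and trainset defined earlier in this section:
image, label = trainset[0]          # one normalized CIFAR-10 tensor of shape [3, 32, 32]
visualize_features(model, image)    # displays 6 of conv1's 16 feature maps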
🔁 7.6. Transfer Learning with Pretrained CNNs
Instead of training from scratch, you can use pretrained CNNs like ResNet, VGG, or MobileNet.
from torchvision import models
model = models.resnet18(pretrained=True)
for param in model.parameters():
param.requires_grad = False # Freeze base layers
# Replace final layer for custom classification
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)
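Since the base is frozen, only the new head needs an optimizer; a minimal sketch matching the snippet above:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # train only the replaced final layer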
✅ Advantages:
- Faster training
- Better accuracy
- Works even with smaller datasets
📊 7.7. Common CNN Architectures
| Architecture | Year | Key Innovation |
|---|---|---|
| LeNet-5 | 1998 | First CNN for handwritten digits |
| AlexNet | 2012 | Deep CNN, won ImageNet |
| VGGNet | 2014 | Uniform 3×3 filters |
| ResNet | 2015 | Skip connections to fight vanishing gradients |
| EfficientNet | 2019 | Parameter-efficient scaling |
Each evolution made CNNs deeper, faster, and more accurate.
🌍 7.8. Real-World Use Cases of CNNs
| Industry | Application | Description |
|---|---|---|
| 🏥 Healthcare | Tumor Detection | Identify cancerous cells in MRI scans |
| 🚗 Automotive | Self-Driving Cars | Detect pedestrians and traffic signs |
| 📱 Mobile | Face Recognition | Unlock devices using CNN-based models |
| 🛒 E-commerce | Visual Search | Suggest similar products from images |
| 🎥 Media | Video Analytics | Detect scenes, objects, or logos in videos |
🧾 7.9. Key Takeaways
- CNNs learn spatial hierarchies automatically — from pixels to patterns.
- Layers like convolution, pooling, and ReLU are the backbone of vision models.
- Transfer learning saves time and improves accuracy on limited data.
- Tools like TorchVision make image preprocessing and model loading easy.
- Real-world applications range from healthcare to autonomous driving.
🔁 Section 8: Recurrent Neural Networks (RNNs) — Deep Learning for Sequential Data
🧠 8.1 What Are RNNs?
A Recurrent Neural Network (RNN) is a special type of neural network designed to process sequences of data, where each input depends on previous ones.
Unlike feedforward networks that treat all inputs independently, RNNs retain a hidden state that captures information from previous steps.
Intuitive Example:
Think of predicting the next word in a sentence:
“I am going to the ___.”
The prediction “market” depends on earlier words — that’s sequence awareness, which RNNs excel at.
🧮 8.2 How RNNs Work (Mathematical Intuition)
For a sequence ( x_1, x_2, ..., x_t ):
At each time step:
[
h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_t + b_h)
]
[
y_t = W_y \cdot h_t + b_y
]
Where:
- ( h_t ): hidden state (memory)
- ( x_t ): input at time step ( t )
- ( y_t ): output
- ( f ): activation function (usually tanh or ReLU)
So the hidden state is recursively updated, carrying information from past inputs — this gives RNNs their “memory”.
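A single recurrence step written out with plain tensors (sizes chosen only for illustration) makes the update concrete:
import torch

W_x = torch.randn(3, 4)      # input-to-hidden weights (hidden_size=3, input_size=4)
W_h = torch.randn(3, 3)      # hidden-to-hidden weights
b_h = torch.randn(3)

h_prev = torch.zeros(3)      # initial hidden state
x_t = torch.randn(4)         # input at the current time step

h_t = torch.tanh(W_h @ h_prev + W_x @ x_t + b_h)   # h_t = f(W_h h_{t-1} + W_x x_t + b_h)
print(h_t)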
🧩 8.3 RNNs vs Feedforward Networks
| Feature | Feedforward NN | RNN |
|---|---|---|
| Input type | Independent samples | Sequential data |
| Memory | No memory | Retains hidden states |
| Weight sharing | Different weights per input | Same weights across time |
| Use case | Images, tabular data | Text, audio, time series |
🧱 8.4 Implementing a Simple RNN from Scratch
Let’s build a small RNN to predict the next number in a sequence.
Step 1: Import Libraries
import torch
import torch.nn as nn
import torch.optim as optim
Step 2: Create Data
We’ll use a simple sequence ( [0, 1, 2, 3, 4, 5, ...] ) and try to predict the next number.
seq = torch.arange(0, 10, dtype=torch.float32)
X = seq[:-1].unsqueeze(1) # inputs
Y = seq[1:].unsqueeze(1) # targets
Step 3: Define RNN Model
class SimpleRNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(SimpleRNN, self).__init__()
self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
self.fc = nn.Linear(hidden_size, output_size)
def forward(self, x, hidden):
out, hidden = self.rnn(x, hidden)
out = self.fc(out)
return out, hidden
Step 4: Initialize
model = SimpleRNN(input_size=1, hidden_size=10, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
Step 5: Train Model
X = X.unsqueeze(0) # batch dimension
Y = Y.unsqueeze(0)
hidden = None
for epoch in range(300):
    output, hidden = model(X, hidden)
    loss = criterion(output, Y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    hidden = hidden.detach()  # detach so the next iteration does not backprop into the already-freed graph
    if (epoch + 1) % 50 == 0:
        print(f'Epoch [{epoch+1}/300], Loss: {loss.item():.6f}')
Output:
Epoch [300/300], Loss: 0.000012
✅ The model learns to predict the next number in the sequence!
🧮 8.5 Using nn.RNN Directly (Built-in Simplicity)
PyTorch makes it simple to create RNNs with its built-in layer:
rnn = nn.RNN(input_size=5, hidden_size=10, num_layers=2, batch_first=True)
Parameters:
- input_size: features per time step
- hidden_size: hidden state dimension
- num_layers: stack multiple RNN layers
- batch_first=True: input shape is (batch, seq, features)
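A quick shape check (a sketch) clarifies what the layer returns:
x = torch.randn(3, 7, 5)     # (batch=3, seq_len=7, features=5)
output, hidden = rnn(x)      # the hidden state defaults to zeros when omitted
print(output.shape)          # torch.Size([3, 7, 10]) -- hidden state at every time step
print(hidden.shape)          # torch.Size([2, 3, 10]) -- final state for each of the 2 layers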
🧩 8.6 Limitations of Vanilla RNNs
Despite their simplicity, RNNs struggle with long sequences due to:
- Vanishing gradients (earlier information fades)
- Exploding gradients (gradients become too large)
- Limited long-term memory
To overcome this, we use LSTM and GRU architectures.
⚙️ 8.7 Long Short-Term Memory (LSTM)
LSTMs (Long Short-Term Memory networks) were designed to remember information over longer time intervals.
They introduce gates that control information flow:
- Forget Gate: Decides what to discard
- Input Gate: Decides what to update
- Output Gate: Decides what to output
Equations:
[
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
]
[
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
]
[
\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)
]
[
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
]
[
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
]
[
h_t = o_t * \tanh(C_t)
]
💻 8.8 Implementing an LSTM in PyTorch
class LSTMModel(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(LSTMModel, self).__init__()
self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
self.fc = nn.Linear(hidden_size, output_size)
def forward(self, x):
out, _ = self.lstm(x)
out = self.fc(out)
return out
Usage:
model = LSTMModel(1, 32, 1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
The LSTM retains memory over longer sequences and performs better than vanilla RNNs in most real-world problems.
🔁 8.9 GRU: Gated Recurrent Unit
GRUs are a simplified version of LSTMs that merge the forget and input gates.
They’re faster to train but perform comparably.
gru = nn.GRU(input_size=1, hidden_size=32, batch_first=True)
GRUs are widely used for speech, time-series forecasting, and chatbots where long-term dependencies are moderate.
🌍 8.10 Real-World Use Cases of RNNs
| Domain | Application | Model Type |
|---|---|---|
| 📊 Finance | Stock price prediction | LSTM |
| 🗣️ NLP | Text generation, translation | GRU/LSTM |
| 🎶 Audio | Speech recognition | LSTM |
| 🕒 IoT | Sensor data forecasting | GRU |
| 💬 Chatbots | Context understanding | LSTM/Transformer hybrid |
💬 8.11 Example: Character-Level Text Generation
Let’s create a mini text generator that learns sequences of characters.
import torch.nn.functional as F
chars = sorted(set("hello"))                            # unique characters: ['e', 'h', 'l', 'o']
char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for ch, i in char2idx.items()}
seq = torch.tensor([[char2idx[c] for c in "hell"]])     # input characters
target = torch.tensor([[char2idx[c] for c in "ello"]])  # next-character targets
class CharRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=8, hidden_size=16):
        super(CharRNN, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)  # nn.RNN returns (output, hidden), so it can't sit inside nn.Sequential
        self.fc = nn.Linear(hidden_size, vocab_size)
    def forward(self, x):
        out, _ = self.rnn(self.embed(x))
        return self.fc(out)                              # logits for every position in the sequence
model = CharRNN(len(chars))
The model can then learn to predict “e”, “l”, “l”, “o” given “h”, “e”, “l”, “l”.
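A minimal training sketch for this model (the 200-epoch count and the greedy decoding at the end are illustrative choices):
optimizer = optim.Adam(model.parameters(), lr=0.01)
for epoch in range(200):
    logits = model(seq)                                   # shape (1, 4, vocab_size)
    loss = F.cross_entropy(logits.view(-1, len(chars)),   # flatten to (4, vocab_size)
                           target.view(-1))               # targets as (4,) class indices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# Greedy decoding: pick the most likely next character at each position
pred = model(seq).argmax(dim=-1)
print(''.join(idx2char[i.item()] for i in pred[0]))       # should converge to "ello"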
🧮 8.12 Visualizing Hidden States
You can visualize how hidden states evolve:
import matplotlib.pyplot as plt
hidden_states = []
hidden = None
# Step through the sequence from Section 8.4 one time step at a time using the SimpleRNN
for i in range(len(X[0])):
    out, hidden = model(X[:, i:i+1], hidden)
    hidden_states.append(hidden[0].detach().numpy().flatten())
plt.plot(hidden_states)
plt.title("Hidden State Evolution Over Time")
plt.show()
This shows how memory updates through the sequence.
🧠 8.13 Tips for Training RNNs
- Normalize or scale sequential data
- Use gradient clipping to prevent exploding gradients
- Initialize hidden states properly (hidden = torch.zeros(...))
- Use LSTMs/GRUs for complex tasks
- Experiment with sequence lengths (shorter sequences → faster training)
✅ 8.14 Summary of Section 8
You now understand:
- What makes RNNs special for sequential data
- How LSTMs and GRUs solve memory challenges
- How to build, train, and visualize RNNs in PyTorch
- Real-world use cases in text, audio, and forecasting
🚀 Up Next:
We’ll continue with Section 9: Model Optimization and Deployment in PyTorch, where we’ll cover:
- Quantization, pruning, and knowledge distillation
- Exporting models with TorchScript and ONNX
- Serving models with TorchServe and PyTorch Mobile
- Monitoring deployed models in production
🧩 Section 9: Model Optimization and Deployment in PyTorch
Building and training a model is only part of the deep learning journey. Once you have a working model, the next big steps are:
- Optimizing it for speed, memory, and accuracy.
- Deploying it in a scalable way to real-world applications — from cloud servers to mobile devices.
PyTorch provides an excellent ecosystem for both these stages — with tools like TorchScript, ONNX, Quantization, and TorchServe.
⚙️ 9.1. Model Optimization: The Key to Efficiency
Model optimization in PyTorch focuses on reducing computational cost and improving inference speed without sacrificing accuracy.
🚀 Common Optimization Techniques
| Optimization Type | Description | Example |
|---|---|---|
| Quantization | Converts model parameters from float32 to int8 to reduce size and speed up inference | Mobile & Edge AI |
| Pruning | Removes weights or neurons that contribute little to output | Compress large models |
| Knowledge Distillation | Trains a smaller model (student) using the outputs of a large model (teacher) | Efficient model serving |
| Mixed Precision Training | Uses float16 and float32 together for faster GPU training | NVIDIA Ampere GPUs |
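Of these, mixed precision training is often the easiest win on modern GPUs. A minimal sketch with torch.cuda.amp (assuming an existing model, criterion, optimizer, and dataloader already on the GPU):
scaler = torch.cuda.amp.GradScaler()          # scales the loss to avoid float16 underflow

for data, target in dataloader:
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # forward pass runs in mixed precision
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # unscales gradients, then steps the optimizer
    scaler.update()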
🧮 9.2. Quantization Example
Quantization helps deploy models to mobile or embedded devices by reducing memory footprint.
import torch
from torchvision import models
# Load a pretrained model
model = models.resnet18(pretrained=True)
model.eval()
# Static (post-training) quantization
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)
# Calibration: run a few batches of representative data through the model here
# so that activation ranges are observed before conversion.
torch.quantization.convert(model, inplace=True)
print("Parameter count:", sum(p.numel() for p in model.parameters()))  # numel counts parameters, not bytes
✅ Result: You’ll see up to 4× reduction in model size and improved inference time.
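If full static quantization feels heavy, dynamic quantization is a one-liner that stores weights as int8 and quantizes activations on the fly. It mainly benefits nn.Linear and recurrent layers, so gains on a convolution-heavy ResNet are limited; a sketch:
import torch
from torchvision import models

float_model = models.resnet18(pretrained=True).eval()
quantized = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8   # quantizes the final fc layer in ResNet-18
)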
🔪 9.3. Model Pruning Example
Model pruning eliminates unnecessary weights.
import torch.nn.utils.prune as prune
# Prune 30% of connections in linear layer
for name, module in model.named_modules():
if isinstance(module, torch.nn.Linear):
prune.l1_unstructured(module, name="weight", amount=0.3)
✅ Outcome: Model becomes smaller and faster with minimal accuracy drop.
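Pruning in PyTorch works through a reparametrization (a weight_orig tensor plus a mask). To make the pruning permanent before saving or exporting, call prune.remove; a sketch:
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.remove(module, "weight")   # folds the mask into the weight tensor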
⚗️ 9.4. Knowledge Distillation (Student–Teacher Learning)
Knowledge Distillation allows training a compact student model by learning from a larger, pre-trained teacher model.
teacher_model = models.resnet50(pretrained=True)
teacher_model.eval()                                   # the teacher is frozen during distillation
student_model = models.resnet18()
criterion = torch.nn.KLDivLoss(reduction='batchmean')  # expects log-probabilities as input and probabilities as target
optimizer = torch.optim.Adam(student_model.parameters(), lr=0.001)
T = 5  # temperature: softens both distributions
for data, target in dataloader:                        # assumes an existing DataLoader
    with torch.no_grad():
        teacher_probs = torch.nn.functional.softmax(teacher_model(data) / T, dim=1)
    student_log_probs = torch.nn.functional.log_softmax(student_model(data) / T, dim=1)
    loss = criterion(student_log_probs, teacher_probs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
✅ Real-world Example:
Companies like Google and Meta use distillation to deploy small transformer models on mobile devices (e.g., BERT → TinyBERT).
🚀 9.5. Deployment: From Research to Production
Once optimized, models need to be served efficiently in production environments — APIs, web services, or mobile apps.
PyTorch offers multiple deployment options:
- TorchScript (convert the model to a static graph)
- ONNX (export the model for cross-framework compatibility)
- TorchServe (production-grade model server)
- PyTorch Mobile (for mobile/edge devices)
🧱 9.5.1. TorchScript
TorchScript converts dynamic PyTorch models into a serialized form that runs without Python — ideal for production environments.
# Convert model to TorchScript
traced_model = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
torch.jit.save(traced_model, "resnet18_traced.pt")
# Load and run TorchScript model
loaded = torch.jit.load("resnet18_traced.pt")
output = loaded(torch.randn(1, 3, 224, 224))
✅ Benefits:
- Faster inference
- Portable to a C++ runtime
- No dependency on Python at inference time
🔗 9.5.2. ONNX (Open Neural Network Exchange)
ONNX enables exporting models to other frameworks like TensorFlow, Caffe2, or OpenVINO.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx", export_params=True)
✅ Benefits:
- Framework interoperability
- Deploy on non-PyTorch systems
- Optimized inference with ONNX Runtime
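To confirm the exported file runs outside PyTorch, you can load it with ONNX Runtime (a separate package, pip install onnxruntime); a minimal sketch:
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)              # (1, 1000) for the ImageNet classification head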
🧰 9.5.3. TorchServe (Model Serving)
TorchServe is an official tool by AWS & Facebook for serving PyTorch models in production.
Steps:
1. Save the model archive file (.mar)
2. Launch the TorchServe server
3. Expose a REST API for inference
torch-model-archiver --model-name resnet18 \
--version 1.0 --serialized-file resnet18.pt \
--handler image_classifier --export-path model_store
torchserve --start --ncs --model-store model_store --models resnet=resnet18.mar
✅ Benefits:
- Handles multiple models
- Supports batching and metrics
- Scalable for enterprise deployments
📱 9.5.4. PyTorch Mobile
For mobile or edge deployment, convert the model using TorchScript and integrate it with:
- Android (Java API)
- iOS (Swift API)
# Convert to TorchScript
scripted_model = torch.jit.script(model)
scripted_model.save("mobile_model.pt")
✅ Use Case Example:
AI-based camera filters, speech recognition, and on-device pose estimation models.
🌩️ 9.6. Deploying Models to the Cloud
Cloud platforms make large-scale deployment easier:
- AWS SageMaker → native PyTorch support
- Google Cloud AI Platform → scalable REST APIs
- Azure ML → preconfigured PyTorch containers
Each allows automatic scaling, version control, and CI/CD pipelines.
📊 9.7. Monitoring and Logging
Post-deployment, it’s crucial to track model performance using metrics such as:
- Latency
- Accuracy drift
- Resource utilization
Tools for monitoring:
- Weights & Biases
- MLflow
- Prometheus + Grafana
🧠 9.8. Real-World Example: Image Classification API
Here’s how a simple Flask-based API might use a deployed PyTorch model:
from flask import Flask, request, jsonify
import torch
from torchvision import models, transforms
from PIL import Image
app = Flask(__name__)
model = torch.jit.load("resnet18_traced.pt")
model.eval()
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
])
@app.route('/predict', methods=['POST'])
def predict():
file = request.files['file']
    image = Image.open(file).convert('RGB')  # ensure a 3-channel image for the ResNet input
img_tensor = transform(image).unsqueeze(0)
with torch.no_grad():
preds = model(img_tensor)
predicted = torch.argmax(preds, 1).item()
return jsonify({"class": int(predicted)})
if __name__ == '__main__':
app.run(debug=True)
✅ Use Case:
Deploying your PyTorch model as a REST API for an image recognition service.
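A client could then call the endpoint like this (a sketch; 'cat.jpg' is a placeholder path and the Flask development server defaults to port 5000):
import requests

with open("cat.jpg", "rb") as f:
    response = requests.post("http://localhost:5000/predict", files={"file": f})
print(response.json())               # {"class": <predicted index>}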
🏁 9.9. Summary
| Concept | Purpose | Tools/Methods |
|---|---|---|
| Quantization | Reduce model size | torch.quantization |
| Pruning | Remove redundant weights | torch.nn.utils.prune |
| Distillation | Compress model via teacher-student learning | KLDivLoss |
| Deployment | Run in production | TorchScript, ONNX, TorchServe |
| Monitoring | Track and analyze performance | MLflow, W&B |
🧠 Final Insight
Model optimization and deployment are where research becomes reality.
With PyTorch, you can go from a prototype on your laptop to a production-grade AI model running in the cloud or on a mobile device — all using the same ecosystem.