Autoencoders Explained: A Complete Guide - II

Contents:

5. Building Your First Autoencoder in PyTorch (Full Code + Explanation)
6. Building a Denoising Autoencoder (Theory + Full PyTorch Code)
7. Types of Autoencoders
8. Building an Autoencoder — Step-by-Step (Conceptual Walkthrough)

📘 Section 5: Building Your First Autoencoder in PyTorch (Full Code + Explanation)

Now that we understand the theory and math, it’s time to build a real autoencoder using PyTorch.
In this section, we’ll walk step-by-step through:

✔ Preparing the dataset
✔ Writing the Autoencoder class
✔ Training the model
✔ Evaluating reconstructions
✔ Visualizing outputs

This is your first working autoencoder, and it becomes the foundation for all advanced versions (Denoising AE, VAE, CVAE, etc.).


🔶 1. Import Dependencies

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
  • torch.nn → layers and activations for building the model

  • torch.optim → optimizers (we use Adam)

  • torchvision.datasets → the MNIST dataset

  • matplotlib → visualizing reconstructions


🔶 2. Preparing the MNIST Dataset

We use MNIST's 28×28 grayscale handwritten digits, a dataset that is perfect for beginners.

transform = transforms.Compose([
    transforms.ToTensor()
])

train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_data  = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
test_loader  = DataLoader(test_data, batch_size=128, shuffle=False)

✔ Pixel values converted to tensors in the range 0–1
✔ No extra normalization required: ToTensor already scales pixels to [0, 1], matching the Sigmoid output


🔶 3. Define the Autoencoder Class

Here’s a simple fully connected (dense) autoencoder:

Architecture:

  • Input: 784 (flattened 28×28)

  • Hidden: 256 → 64 → bottleneck = 16

  • Decoder: reverse of encoder

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, 16)   # bottleneck
        )
        
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(16, 64),
            nn.ReLU(),
            nn.Linear(64, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Sigmoid()       # outputs 0–1
        )
    
    def forward(self, x):
        x = x.view(-1, 784)  # flatten
        z = self.encoder(x)
        out = self.decoder(z)
        out = out.view(-1, 1, 28, 28)  # reshape back
        return out, z

encoder compresses the input into a 16-dimensional latent vector
decoder reconstructs the input from that vector
✔ Sigmoid output keeps pixels in [0, 1], matching the input range
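
A quick sanity check (a minimal snippet, assuming the class above and the imports from step 1) confirms that shapes round-trip correctly:

# Sanity check: a dummy batch of 4 images should reconstruct to the
# same shape, with one 16-dimensional latent vector per image
dummy = torch.rand(4, 1, 28, 28)
out, z = Autoencoder()(dummy)
print(out.shape)  # torch.Size([4, 1, 28, 28])
print(z.shape)    # torch.Size([4, 16])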


🔶 4. Initialize Model, Loss Function & Optimizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Autoencoder().to(device)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
  • MSE works well for pixel-level reconstruction

  • Adam is stable and efficient


🔶 5. Training Loop

num_epochs = 10

for epoch in range(num_epochs):
    total_loss = 0
    
    for images, _ in train_loader:
        images = images.to(device)
        
        optimizer.zero_grad()
        
        outputs, latent = model(images)
        loss = criterion(outputs, images)
        
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(train_loader):.4f}")

✔ Forward pass
✔ Compute reconstruction loss
✔ Backpropagation
✔ Parameter update


🔶 6. Reconstruct Images (Qualitative Evaluation)

def show_images(original, reconstructed, n=10):
    plt.figure(figsize=(15, 4))
    
    for i in range(n):
        # Original
        plt.subplot(2, n, i+1)
        plt.imshow(original[i].squeeze().cpu().numpy(), cmap='gray')
        plt.axis('off')
        
        # Reconstructed
        plt.subplot(2, n, i+1+n)
        plt.imshow(reconstructed[i].squeeze().cpu().detach().numpy(), cmap='gray')
        plt.axis('off')

    plt.show()

Now test:

model.eval()

test_images, _ = next(iter(test_loader))
test_images = test_images.to(device)

with torch.no_grad():
    reconstructed, _ = model(test_images)

show_images(test_images, reconstructed)

🔶 7. Inspecting the Latent Space (Optional)

The latent vector (size 16) is accessible via:

_, latent_vectors = model(test_images)
print(latent_vectors.shape)

Output:

torch.Size([128, 16])

Each image is now compressed from 784 → 16 dimensions.
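
To see whether the latent space is organized by digit class, here is a minimal optional sketch that projects the 16-dimensional latent vectors to 2-D with PCA (via torch.pca_lowrank) and colors each point by its label; the variable names are our own:

# Encode one test batch and project the latents to 2-D with PCA
test_images, test_labels = next(iter(test_loader))

model.eval()
with torch.no_grad():
    _, latents = model(test_images.to(device))

latents = latents.cpu()
U, S, V = torch.pca_lowrank(latents, q=2)             # top-2 principal directions
coords = (latents - latents.mean(dim=0)) @ V[:, :2]   # centered projection

plt.scatter(coords[:, 0], coords[:, 1], c=test_labels, cmap='tab10', s=10)
plt.colorbar(label='digit')
plt.title('Latent space (2-D PCA projection)')
plt.show()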


🔶 8. Example Reconstruction Results

You will typically see:

✔ Blurry but recognizable digit reconstructions
✔ Clear retention of the original shapes
✔ Mild smoothing of noise, even though the model was not trained as a denoiser

This confirms that the autoencoder has learned the essential features.


🔶 Section 5 Summary

In this section, we:

✔ Loaded MNIST
✔ Built a fully connected autoencoder
✔ Defined encoder + decoder architecture
✔ Trained it for 10 epochs
✔ Reconstructed test images
✔ Saw latent vectors representing compressed data

This is the foundational model upon which all advanced autoencoders are built.


📘 Section 6: Building a Denoising Autoencoder (Theory + Full PyTorch Code)

So far, we have built a vanilla autoencoder that learns to reconstruct images.
But real-world data is often corrupted, noisy, or incomplete.

To handle this, researchers introduced a powerful variation:

🔶 Denoising Autoencoder (DAE)

A model trained to remove noise and recover original clean data.

This turns the autoencoder into a robust feature extractor.


🔶 1. What Is a Denoising Autoencoder?

A Denoising Autoencoder works like this:

  1. Start with a clean input \( x \)

  2. Add noise to get a corrupted input \( \tilde{x} \)

  3. Feed \( \tilde{x} \) into the encoder

  4. The decoder reconstructs a clean output \( \hat{x} \approx x \)

Formally:

\[
\tilde{x} = x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)
\]

\[
\hat{x} = \text{Decoder}(\text{Encoder}(\tilde{x}))
\]

The loss is still the Mean Squared Error (MSE), measured against the clean input:

\[
\mathcal{L} = \| x - \hat{x} \|^2
\]

✔ Why is the DAE powerful?

  • Learns robust, noise-invariant features

  • Avoids learning a trivial identity function

  • Generalizes better than a vanilla AE



🔶 2. Adding Noise to Images

The most common way:

Gaussian Noise

\[
\tilde{x} = x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \quad \text{e.g. } \sigma = 0.1
\]

Salt & Pepper Noise

Random black/white pixels.

In this tutorial, we use Gaussian noise.
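
For comparison, here is a minimal salt-and-pepper transform (not used in this tutorial); the class name and default corruption probability are our own choices, and it assumes torch is imported as in the next step:

# Randomly set a fraction of pixels to pure black or pure white
class AddSaltPepperNoise(object):
    def __init__(self, prob=0.05):
        self.prob = prob  # fraction of pixels to corrupt

    def __call__(self, tensor):
        mask = torch.rand(tensor.size())
        tensor = tensor.clone()
        tensor[mask < self.prob / 2] = 0.0        # "pepper": black pixels
        tensor[mask > 1 - self.prob / 2] = 1.0    # "salt": white pixels
        return tensor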


🔶 3. Dataset with Added Noise

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

# Add Gaussian Noise
class AddGaussianNoise(object):
    def __init__(self, mean=0., std=0.2):
        self.mean = mean
        self.std = std
        
    def __call__(self, tensor):
        noise = torch.randn(tensor.size()) * self.std + self.mean
        return torch.clamp(tensor + noise, 0., 1.)

The training input is the noisy image; the training target is the clean image:

train_transform = transforms.Compose([
    transforms.ToTensor(),
    AddGaussianNoise(0., 0.3)
])

clean_transform = transforms.ToTensor()

train_data_noisy = datasets.MNIST(root='./data', train=True, download=True, transform=train_transform)
train_data_clean = datasets.MNIST(root='./data', train=True, download=True, transform=clean_transform)

# Pair each noisy sample with its clean counterpart at the same index;
# each dataset element becomes ((noisy_img, label), (clean_img, label))
train_loader = DataLoader(list(zip(train_data_noisy, train_data_clean)), batch_size=128, shuffle=True)

🔶 4. Denoising Autoencoder Architecture

We’ll reuse the same autoencoder structure from Section 5.

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super(DenoisingAutoencoder, self).__init__()
        
        self.encoder = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, 16),
        )
        
        self.decoder = nn.Sequential(
            nn.Linear(16, 64),
            nn.ReLU(),
            nn.Linear(64, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        x = x.view(-1, 784)
        z = self.encoder(x)
        out = self.decoder(z)
        return out.view(-1, 1, 28, 28)

🔶 5. Training the Denoising Autoencoder

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = DenoisingAutoencoder().to(device)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

num_epochs = 10

for epoch in range(num_epochs):
    total_loss = 0
    
    for (noisy_imgs, _), (clean_imgs, _) in train_loader:  # unpack (image, label) pairs
        noisy_imgs = noisy_imgs.to(device)
        clean_imgs = clean_imgs.to(device)
        
        optimizer.zero_grad()
        
        reconstructed = model(noisy_imgs)
        loss = criterion(reconstructed, clean_imgs)
        
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(train_loader):.4f}")

✔ The model learns to map noisy → clean
✔ Loss steadily decreases over epochs


🔶 6. Testing the Denoising Model

test_data = datasets.MNIST(root='./data', train=False, download=True,
                           transform=transforms.ToTensor())

test_loader = DataLoader(test_data, batch_size=10, shuffle=True)

# Add noise manually for testing
noise = AddGaussianNoise(0., 0.3)

images, _ = next(iter(test_loader))
noisy_images = noise(images)

images = images.to(device)
noisy_images = noisy_images.to(device)

model.eval()
with torch.no_grad():
    output = model(noisy_images)

🔶 7. Visualizing Results

def show_denoising(original, noisy, reconstructed):
    plt.figure(figsize=(15, 5))
    
    for i in range(10):
        # Original
        plt.subplot(3, 10, i+1)
        plt.imshow(original[i].cpu().squeeze(), cmap='gray')
        plt.axis('off')

        # Noisy
        plt.subplot(3, 10, i+11)
        plt.imshow(noisy[i].cpu().squeeze(), cmap='gray')
        plt.axis('off')

        # Reconstructed
        plt.subplot(3, 10, i+21)
        plt.imshow(reconstructed[i].detach().cpu().squeeze(), cmap='gray')
        plt.axis('off')

    plt.show()

show_denoising(images, noisy_images, output)

🔶 8. Expected Result

Your output visuals will show:

  • Top row → original digits

  • Middle row → noisy corrupted digits

  • Bottom row → cleaned images produced by the autoencoder

A Denoising Autoencoder can remove:

✔ Additive Gaussian noise
✔ Random pixel noise
✔ Light distortions


🔶 Section 6 Summary

You now have:

✔ Full theory of denoising autoencoders
✔ Noise injection pipeline
✔ Complete PyTorch implementation
✔ Training + visualization
✔ Reconstructed clean images

This model is extremely useful and forms the basis for:

  • Deepfake cleaners

  • Speech denoisers

  • Image restoration tools

  • Medical image cleanup

  • Real-world preprocessing pipelines



📘 Section 7: Types of Autoencoders

Autoencoders come in several variants, each designed to improve reconstruction quality, generalization, or latent space structure. Below are the most important types used in modern AI.


7.1 Undercomplete Autoencoder

Definition

An autoencoder where the latent space (bottleneck) has fewer dimensions than the input.

Why it exists

To force the model to learn the most important features, not memorize.

Use Cases

  • Feature extraction

  • Dimensionality reduction (PCA alternative)

  • Noise removal

Diagram (conceptually)

Input → Encoder → Small bottleneck → Decoder → Output


7.2 Overcomplete Autoencoder

Definition

Latent space has more dimensions than the input.

Why it exists

For tasks where richer representations are needed.

Risk

The model may memorize the data.

Fix

Use regularization:

  • Sparse autoencoder

  • Denoising autoencoder

  • Contractive autoencoder


7.3 Sparse Autoencoder

Definition

Uses a sparsity constraint (such as L1 regularization or a KL-divergence penalty) so that only a small number of neurons activate at once.

Why it exists

Mimics biological neurons → leads to features like edge detection.

Use Cases

  • Extracting meaningful features

  • Pretraining deep networks

  • Speech feature extraction
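
To make this concrete, here is a minimal sketch of the L1 variant; it plugs into the Section 5 training loop (model, images, and criterion as defined there), and sparsity_weight is a hypothetical hyperparameter:

sparsity_weight = 1e-4  # hypothetical value; tune per task

outputs, latent = model(images)
recon_loss = criterion(outputs, images)

# L1 penalty pushes most latent activations toward zero,
# so only a few neurons stay active for each input
l1_penalty = latent.abs().mean()

loss = recon_loss + sparsity_weight * l1_penalty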


7.4 Denoising Autoencoder (DAE)

Definition

The model removes noise:
Noisy Input → Autoencoder → Clean Output

Why it exists

To create a robust encoder that can recover from corrupted data.

Use Cases

  • Noise removal in images

  • Improving robustness

  • Pretraining deep networks



7.5 Contractive Autoencoder (CAE)

Definition

Adds a penalty on the encoder gradients to ensure the mapping is stable.

Why it exists

To make the representation less sensitive to small changes in input.

Use Cases

  • Semantic feature extraction

  • Smooth latent space learning
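
A minimal sketch of the penalty, assuming the Section 5 model and training loop: it adds the squared Frobenius norm of the encoder Jacobian, computed one latent unit at a time with autograd, and contractive_weight is a hypothetical hyperparameter:

contractive_weight = 1e-4  # hypothetical value; tune per task

x = images.view(-1, 784).requires_grad_(True)
z = model.encoder(x)

# Summing squared input-gradients over all latent units gives the
# squared Frobenius norm of the encoder Jacobian (summed over the batch)
jacobian_penalty = 0.0
for j in range(z.size(1)):
    grads = torch.autograd.grad(z[:, j].sum(), x,
                                create_graph=True, retain_graph=True)[0]
    jacobian_penalty = jacobian_penalty + (grads ** 2).sum()

recon = model.decoder(z).view(-1, 1, 28, 28)
loss = criterion(recon, images) + contractive_weight * jacobian_penalty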


7.6 Variational Autoencoder (VAE)

Definition

A probabilistic autoencoder that learns a distribution of latent variables instead of fixed values.

Why it exists

For generating new images, not just reconstructing old ones.

Use Cases

  • Image generation

  • Synthetic dataset creation

  • Style transfer

  • Anomaly detection
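
To make this concrete, here is a minimal sketch of the reparameterization trick at the heart of a VAE; the layer sizes mirror Section 5, the class name is our own, and the usual torch / torch.nn imports are assumed:

class VAEEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, 16)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, 16)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.body(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        eps = torch.randn_like(mu)              # noise ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps  # differentiable sample
        return z, mu, logvar

# The VAE loss adds a KL term to the reconstruction loss:
# kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())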


7.7 Convolutional Autoencoder

Definition

Uses Conv2D layers instead of dense layers → works well for images.

Why it exists

Captures spatial structure.

Use Cases

  • Image compression

  • Image denoising

  • Feature extraction from images
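
Here is a minimal convolutional autoencoder sketch for 28×28 MNIST images; the channel counts and strides are our own choices:

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))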


7.8 Sequence Autoencoder

Definition

Uses LSTM/GRU for sequence-to-sequence autoencoding.

Why it exists

To handle sequential data.

Use Cases

  • Text embedding

  • Speech compression

  • Time-series anomaly detection
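
A minimal sketch, treating each 28×28 image as a sequence of 28 rows; the hidden size and the "repeat the latent code" decoding scheme are our own illustrative choices:

class SeqAutoencoder(nn.Module):
    def __init__(self, input_size=28, hidden_size=64, seq_len=28):
        super().__init__()
        self.seq_len = seq_len
        self.encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, input_size)

    def forward(self, x):                       # x: (batch, seq_len, input_size)
        _, (h, _) = self.encoder(x)             # h: (1, batch, hidden_size)
        z = h[-1]                               # one latent code per sequence
        dec_in = z.unsqueeze(1).repeat(1, self.seq_len, 1)  # feed it at every step
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)                # reconstructed sequence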


7.9 Multimodal Autoencoders

Definition

Autoencoders that process multiple data types (e.g., image + text).

Use Cases

  • Vision + Language models

  • Cross-modal retrieval

  • Multimodal representation learning


✔ Summary Table

| Autoencoder Type | Key Idea                   | Best For                 |
| ---------------- | -------------------------- | ------------------------ |
| Undercomplete    | Latent space < input       | Compact feature learning |
| Overcomplete     | Latent space > input       | Rich representations     |
| Sparse           | Few active neurons         | Meaningful features      |
| Denoising        | Recover from noise         | Image denoising          |
| Contractive      | Stable mapping             | Smooth latent space      |
| Convolutional    | Conv layers                | Image tasks              |
| VAE              | Probabilistic latent space | Generative models        |
| Sequence         | LSTM-based                 | NLP, time series         |
| Multimodal       | Multi-input                | Vision–Language          |

📘 Section 8: Building an Autoencoder — Step-by-Step (Conceptual Walkthrough)

In this section, we break down exactly how an autoencoder is built, from data preparation to training. This gives you a clear roadmap before jumping into the code.

Autoencoders follow a simple pipeline:
Input → Encoder → Latent Space → Decoder → Output (Reconstruction)

Here is the complete conceptual workflow:


8.1 Step 1 — Choose Your Dataset

Autoencoders work best on:

  • Images (MNIST, CIFAR-10, Fashion MNIST)

  • Tabular data

  • Text (for sequence autoencoders)

  • Time-series data

For our running example, we use:
👉 the MNIST Handwritten Digits dataset
because it is simple, clean, and widely used for autoencoder demos.


8.2 Step 2 — Normalize the Data

Autoencoders are sensitive to scale.

For image datasets (0–255 pixel values):

x = x / 255.0

Why normalization?

  • Faster training

  • More stable gradients

  • Better reconstruction quality


8.3 Step 3 — Define the Encoder Architecture

The encoder compresses data into a small representation.

Typical choices:

  • Dense layers (for simple examples)

  • Conv2D layers (for image autoencoders)

Example encoder (conceptually):

Input (28×28)
→ Dense(128)
→ Dense(64)
→ Dense(32)  ← latent dimension

Key rule:
Each layer gets smaller, funneling down into the bottleneck.


8.4 Step 4 — Define the Latent Space (Bottleneck)

This is the heart of the autoencoder.

Latent space determines:

  • How much the model compresses

  • What features it learns

  • How well it can reconstruct inputs

Typically:

  • 16, 32, 64 dimensions

  • Much smaller than input (784 for MNIST)

Good rule of thumb:
👉 Start with 16 or 32 latent dims for MNIST.


8.5 Step 5 — Define the Decoder Architecture

The decoder mirrors the encoder.

Conceptually:

Latent vector (32)
→ Dense(64)
→ Dense(128)
→ Dense(784) → Reshape to 28×28

Key idea:
The decoder gradually expands compressed data back to original shape.



8.6 Step 6 — Choose a Loss Function

For autoencoders, the most common is:

Mean Squared Error (MSE)

Measures pixel-by-pixel difference:

MSE = mean((original - reconstructed)^2)

Ideal when:

  • Output is continuous

  • Data is normalized

Alternative losses:

  • Binary Cross-Entropy (BCE)

  • MAE (L1 loss)

  • SSIM (for better image quality)
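
In PyTorch, the two most common choices are one-liners. A minimal sketch with placeholder tensors (both in [0, 1], as a Sigmoid output layer would produce):

import torch
import torch.nn as nn

original = torch.rand(8, 1, 28, 28)       # placeholder target batch
reconstructed = torch.rand(8, 1, 28, 28)  # placeholder model output

mse = nn.MSELoss()(reconstructed, original)  # pixel-wise squared error
bce = nn.BCELoss()(reconstructed, original)  # per-pixel Bernoulli loss
print(mse.item(), bce.item())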


8.7 Step 7 — Choose Optimizer

Common choices:

  • Adam (best for beginners)

  • RMSprop

  • SGD (slower but stable)

Recommended (in PyTorch, matching the earlier sections):

optimizer = optim.Adam(model.parameters(), lr=0.001)

8.8 Step 8 — Train the Autoencoder

Key hyperparameters:

  • Batch size: 32–128

  • Epochs: 20–50

  • Validation split: 10–20%

Training goal:
Minimize reconstruction loss
Learn essential features automatically


8.9 Step 9 — Evaluate Reconstruction Quality

You should check:

  • Loss value

  • Reconstructed images

  • Latent space distribution

  • Underfitting or overfitting

For images:
👉 Visual comparison is the best evaluator.


8.10 Step 10 — Use the Autoencoder for Applications

After training, autoencoders can be used for:

  • Denoising images

  • Dimensionality reduction

  • Anomaly detection (compare reconstruction error)

  • Feature extraction

  • Image compression

This is where the model becomes practical.
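
For example, anomaly detection via reconstruction error can be sketched as follows, assuming a trained model like the one from Section 5 and a batch of images; the threshold heuristic is our own illustrative choice:

model.eval()
with torch.no_grad():
    reconstructed, _ = model(images)

# Mean squared error per image, averaged over channel and pixels
errors = ((reconstructed - images) ** 2).mean(dim=(1, 2, 3))

# Flag images whose error is far above the batch average
threshold = errors.mean() + 2 * errors.std()
anomalies = errors > threshold
print(f"Flagged {anomalies.sum().item()} of {len(errors)} images as anomalous")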


✔ Section 8 Summary

| Step | Description                   |
| ---- | ----------------------------- |
| 1    | Select dataset                |
| 2    | Normalize data                |
| 3    | Build encoder                 |
| 4    | Define latent space           |
| 5    | Build decoder                 |
| 6    | Choose loss                   |
| 7    | Choose optimizer              |
| 8    | Train model                   |
| 9    | Evaluate reconstruction       |
| 10   | Apply the trained autoencoder |

