AeVP: Interactive Autoencoder Visualization

Ram Vikas Mishra (246101011)

PhD (CSE), IIT Guwahati

DA623: Computing with Signals (Even Semester, Batch 2025)

Instructor: Dr. Neeraj Sharma

Motivation

  • Understanding the "Black Box": Deep learning models can be opaque. How do they *really* work internally?
  • Hands-on Learning: Inspired by Dr. Sharma's interactive teaching style, which builds intuition by doing.
  • Visualizing Core Concepts: Autoencoders are fundamental for dimensionality reduction, feature learning, and generative models.
  • Bridging Theory and Practice: Create a tool to directly manipulate and observe AE components (layers, bottleneck).
  • Making Learning Engaging: Move beyond static examples to a dynamic, configurable laboratory environment.

Connection to Multimodal Learning

Autoencoders are crucial precursors and components in multimodal systems.

  • Representation Learning: Core task in ML. AEs learn compressed, meaningful representations (the *latent code*) from input data (like images).
  • Shared Latent Spaces: In multimodal learning, variants of AEs help map data from different modalities (e.g., images and text) into a common latent space.
  • Cross-Modal Generation: Decoding from this shared space allows one modality to be generated from another (e.g., captions from images).
  • Dimensionality Reduction: Essential for handling high-dimensional multimodal data efficiently.

(Generic diagram showing modality mapping via latent space)

This project focuses on visualizing the *unimodal* representation learning aspect, a building block for more complex multimodal architectures.

Learning Journey & Takeaways

  • Deep Dive into Autoencoders: Solidified understanding of the encoder/decoder structure, convolutional and transposed-convolutional layers, and the bottleneck's role.
  • TensorFlow/Keras Implementation: Practical experience defining, training, and saving models using the Keras API.
  • Backend Development (Flask): Building API endpoints to handle model training, status updates, and inference requests. Managing background tasks (threading).
  • Frontend Interaction (JS): Capturing user input (canvas, sliders, uploads), making asynchronous API calls (`fetch`), updating the DOM dynamically (status, charts, images).
  • Data Handling: Image preprocessing (resizing, normalization), Base64 encoding/decoding, handling NumPy arrays between backend and frontend (via JSON).
  • Visualization Techniques: Using Chart.js for real-time plotting, formatting activation maps for interpretability.
  • Debugging Challenges: Tackling state management issues, frontend/backend synchronization, JavaScript errors, and Colab environment quirks.
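The data-handling bullet above involves moving image data between backend and frontend through JSON. A minimal, dependency-free sketch of one way such a round trip could work is given here. The helper names and the payload keys `side` and `pixels_b64` are hypothetical, and the project itself may rely on NumPy and Pillow instead.

```python
import base64
import json


def encode_image(pixels):
    """Pack a square grid of floats in [0, 1] into a JSON string.

    Hypothetical helper: values are clamped, quantized to 8 bits, flattened
    row by row, and Base64-encoded so they fit inside a JSON payload.
    """
    side = len(pixels)
    flat = bytes(round(max(0.0, min(1.0, v)) * 255) for row in pixels for v in row)
    return json.dumps({"side": side, "pixels_b64": base64.b64encode(flat).decode("ascii")})


def decode_image(payload):
    """Inverse of encode_image: JSON string -> grid of floats in [0, 1]."""
    data = json.loads(payload)
    side = data["side"]
    flat = base64.b64decode(data["pixels_b64"])
    return [[flat[r * side + c] / 255 for c in range(side)] for r in range(side)]
```

Quantizing to 8 bits keeps the payload small (one byte per pixel before Base64) at the cost of a rounding error of at most 1/510 per pixel.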

Project Core: Configuration & Training

Configuration

  • Select Dataset (MNIST/Fashion-MNIST)
  • Define Bottleneck Size (e.g., 2x2, 3x3)
  • Set Filter Counts for Conv Layers
  • Specify Number of Training Epochs

Real-time Training

  • Initiate training via API call.
  • Backend trains model in background thread.
  • Frontend polls for status updates.
  • Live Training/Validation Loss chart (Chart.js).
  • Clear status messages (Epoch progress, final loss).
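The train-in-background, poll-for-status pattern listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the training loop is a stand-in that trains nothing, the loss values are made up, and `start_training` / `get_status` are hypothetical names for what the Flask endpoints might call.

```python
import threading
import time

# Shared training state. A lock guards it because the worker thread writes
# while status requests read.
_status = {"state": "idle", "epoch": 0, "total_epochs": 0, "loss": None}
_lock = threading.Lock()


def _train(total_epochs):
    """Stand-in for the real training loop (no model is trained here)."""
    for epoch in range(1, total_epochs + 1):
        time.sleep(0.01)         # placeholder for one epoch of work
        fake_loss = 1.0 / epoch  # made-up, decreasing value for illustration
        with _lock:
            _status.update(epoch=epoch, loss=fake_loss)
    with _lock:
        _status["state"] = "done"


def start_training(total_epochs):
    """Launch the worker and return immediately, as a 'train' endpoint might."""
    with _lock:
        if _status["state"] == "running":
            return None  # refuse to start a second run while one is active
        _status.update(state="running", epoch=0, total_epochs=total_epochs, loss=None)
    worker = threading.Thread(target=_train, args=(total_epochs,), daemon=True)
    worker.start()
    return worker


def get_status():
    """Return a snapshot copy of the shared state, as a 'status' endpoint might."""
    with _lock:
        return dict(_status)
```

A polling client would call `get_status()` at a fixed interval and stop once `state` is no longer `"running"`. Returning a copy keeps callers from mutating the shared dictionary outside the lock.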

(Screenshot of Config Area & Chart)

Project Core: Interactive Inference (1/2)

Input

  • Draw directly on the canvas.
  • Upload custom images (PNG/JPG).
  • Image is preprocessed (28x28 grayscale, normalized).
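The preprocessing step above (28x28 grayscale, normalized) might look roughly like the following pure-Python sketch. It assumes a square RGB input whose side is a multiple of 28 (for example a 280x280 canvas, a size chosen here only for illustration), so resizing reduces to block averaging. The project itself may use Pillow's resize instead, and `preprocess` is a hypothetical name.

```python
def preprocess(rgb_rows, target=28):
    """Convert a square RGB image to a target x target grayscale grid in [0, 1].

    rgb_rows: list of rows, each a list of (r, g, b) tuples with values 0-255.
    Assumes the side length is a multiple of `target`, so resizing is just
    the mean over non-overlapping blocks.
    """
    side = len(rgb_rows)
    if side % target != 0 or any(len(row) != side for row in rgb_rows):
        raise ValueError("expected a square image whose side is a multiple of target")
    block = side // target

    # ITU-R BT.601 luma weights, the same ones Pillow uses for mode "L".
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_rows]

    out = []
    for i in range(target):
        row = []
        for j in range(target):
            total = sum(
                gray[y][x]
                for y in range(i * block, (i + 1) * block)
                for x in range(j * block, (j + 1) * block)
            )
            row.append(total / (block * block) / 255.0)  # block mean, scaled to [0, 1]
        out.append(row)
    return out
```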

(Screenshot of Canvas Input Area)

Encoder Visualization

  • Input image passed through the trained encoder.
  • Visualize activation maps from each Conv layer.
  • Shows features extracted at different stages (edges, patterns).
  • Uses colormaps (like Viridis) for clarity.
  • Hover to zoom on individual feature maps.
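Before a colormap such as Viridis can be applied, each activation map has to be scaled to a fixed range. The sketch below shows one plausible approach, per-channel min-max scaling to 0-255. It is an assumption about how the maps could be prepared, not a description of the project's code, and both function names are hypothetical.

```python
def normalize_map(feature_map, eps=1e-8):
    """Min-max scale one 2D activation map to integers 0-255 for colormapping.

    Each channel is scaled independently, so a weakly activated channel is
    still visible; the trade-off is that brightness is not comparable
    across channels.
    """
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span < eps:  # constant map (e.g. an inactive channel): render as all zeros
        return [[0 for _ in row] for row in feature_map]
    return [[round((v - lo) / span * 255) for v in row] for row in feature_map]


def split_channels(activation):
    """Turn an H x W x C activation (nested lists) into C separate H x W maps."""
    channels = len(activation[0][0])
    return [[[px[c] for px in row] for row in activation] for c in range(channels)]
```

The resulting integer grids can be used as indices into a 256-entry colormap lookup table.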

(Screenshot of Encoder Activations)

Project Core: Interactive Inference (2/2)

Bottleneck

  • View the compressed latent vector values.
  • **Crucially:** Manipulate values using sliders.
  • Observe real-time changes in the decoded output.
  • Provides intuition for the latent space structure.
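The slider interaction above amounts to overriding individual entries of the latent vector before decoding. A minimal sketch follows, assuming the latent vector arrives as a flat list and is displayed as a square grid (e.g. 9 values as 3x3). The names `apply_slider_edits` and `as_grid` are hypothetical.

```python
import math


def apply_slider_edits(latent, edits):
    """Return a copy of `latent` with slider overrides applied.

    latent: list of floats produced by the encoder.
    edits:  dict mapping index -> new value (one entry per moved slider).
    The original vector is left untouched so the user can reset to it.
    """
    modified = list(latent)
    for index, value in edits.items():
        if not 0 <= index < len(modified):
            raise IndexError(f"slider index {index} out of range")
        modified[index] = float(value)
    return modified


def as_grid(latent):
    """Arrange a latent vector of length n*n as an n x n grid for display."""
    side = math.isqrt(len(latent))
    if side * side != len(latent):
        raise ValueError("latent length must be a perfect square")
    return [latent[r * side:(r + 1) * side] for r in range(side)]
```

The modified vector would then be sent to the decoder in place of the encoder's output, which is what makes the real-time comparison possible.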

(Screenshot of Bottleneck Grid and Output of Decoder)

Decoder & Output

  • Latent vector (original or modified) fed to decoder.
  • Visualize activations from the transposed-convolution (Conv2DTranspose) layers.
  • Shows how features are gradually reconstructed.
  • View the final 28x28 reconstructed image.
  • Compare input vs. output.
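The input-versus-output comparison can also be put into a single number. The slides do not state which metric or loss the project uses, so the sketch below assumes mean squared error over pixel values in [0, 1] purely as an illustration.

```python
def reconstruction_mse(original, reconstructed):
    """Mean squared error between two equally sized grids of floats."""
    if len(original) != len(reconstructed) or any(
        len(a) != len(b) for a, b in zip(original, reconstructed)
    ):
        raise ValueError("images must have the same shape")
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(original, reconstructed)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)
```

A value of 0 means a pixel-perfect reconstruction. Recomputing it after a slider edit gives a rough measure of how far the decoded image has moved from the input.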

(Screenshot of Decoder Activations)

Architecture Snippet (Encoder Example)

Simplified Keras code demonstrating the encoder structure:


from tensorflow.keras import layers, models, Input

def build_encoder(input_shape, filters1, filters2, latent_dim):
    encoder_inputs = Input(shape=input_shape, name='encoder_input')

    # Conv 1 -> 14x14
    x = layers.Conv2D(filters1, (3, 3), activation='relu', padding='same',
                      strides=2, name='encoder_conv1')(encoder_inputs)
    x = layers.BatchNormalization(name='encoder_bn1')(x)

    # Conv 2 -> 7x7
    x = layers.Conv2D(filters2, (3, 3), activation='relu', padding='same',
                      strides=2, name='encoder_conv2')(x)
    x = layers.BatchNormalization(name='encoder_bn2')(x)

    x = layers.Flatten(name='encoder_flatten')(x)

    # Bottleneck
    encoder_outputs = layers.Dense(latent_dim, name='bottleneck')(x)
    encoder = models.Model(encoder_inputs, encoder_outputs, name='encoder')
    return encoder

# Usage (keyword names must match the function signature):
# encoder = build_encoder((28, 28, 1), filters1=8, filters2=4, latent_dim=9)

Key components: Strided convolutional layers (stride 2) for spatial reduction, Batch Normalization for training stability, Flattening, and a Dense layer for the final bottleneck.
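The shape comments in the snippet (28 to 14 to 7) follow from the rule that a convolution with `padding='same'` produces an output of size ceil(input / stride). The short sketch below traces those shapes without needing TensorFlow. The helper names are hypothetical, and the default filter counts are the example values from the usage comment.

```python
import math


def same_conv_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


def encoder_shapes(input_side=28, filters1=8, filters2=4, latent_dim=9):
    """Trace tensor shapes through two stride-2 convolutions, Flatten, and Dense."""
    s1 = same_conv_output(input_side, 2)
    s2 = same_conv_output(s1, 2)
    flat = s2 * s2 * filters2
    return {
        "encoder_conv1": (s1, s1, filters1),
        "encoder_conv2": (s2, s2, filters2),
        "encoder_flatten": (flat,),
        "bottleneck": (latent_dim,),
        # Dense parameters: one weight per (input, output) pair plus one bias per output.
        "bottleneck_params": flat * latent_dim + latent_dim,
    }
```

With these example values the 784 input pixels pass through a 196-value flattened feature vector and are compressed to 9 latent values.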

Reflections

What Surprised Me?

  • **Sensitivity:** How much reconstruction quality depends on bottleneck size, filter counts, and epochs. Small changes have big impacts.
  • **Latent Space:** The bottleneck values aren't always easily interpretable, but modifying them clearly affects specific output features.
  • **Complexity:** Integrating backend (TF, Flask), frontend (JS, Canvas), and real-time updates involves many moving parts and potential points of failure.
  • **Visualization Power:** Seeing activations directly provides much more insight than just looking at loss curves.

Scope for Improvement

  • **Architectures:** Implement Variational AEs (VAEs) or Denoising AEs.
  • **Datasets:** Add support for more complex datasets (e.g., CIFAR-10, CelebA), which would likely need deeper models.
  • **Viz:** More advanced techniques (t-SNE/UMAP on bottleneck), Grad-CAM for layer importance.
  • **UI/UX:** More configuration options, smoother animations, ability to save/load specific bottleneck states.
  • **Deployment:** Package as a standalone web app (e.g., using Docker, cloud services).

References & Tools Used

Key Concepts / Inspirations:

  • Convolutional Neural Networks (CNNs)
  • Autoencoder Principles (Encoder, Decoder, Bottleneck)
  • Dimensionality Reduction & Feature Learning
  • Keras Documentation & Examples
  • Dr. Sharma's Course Lectures (DA623)

Technologies & Libraries:

  • **Backend:** Python, TensorFlow/Keras, Flask, NumPy, Pillow
  • **Frontend:** HTML5, CSS3, JavaScript (ES6+)
  • **Visualization:** Chart.js, Matplotlib (backend map generation)
  • **Environment:** Google Colaboratory
  • **Assistance:** Generative AI (ChatGPT/Claude/Gemini) for code generation, debugging, and content suggestions.
  • **Styling/Animation:** Font Awesome, Animate.css

Thank You!

Questions?

Project Colab Link: AeVP Notebook

Author: Ram Vikas Mishra | rvmishra.in