
# Shoe Pair Classification using PyTorch

This project implements and evaluates custom Convolutional Neural Network (CNN) architectures for image similarity detection. The core objective is to classify whether a pair of images belongs to the same shoe or different shoes.

Beyond basic classification, this project explores architectural trade-offs and hardware-constrained optimization, specifically addressing VRAM limitations and memory management efficiency.

## 🚀 Project Overview

The system processes image pairs of size $224 \times 224 \times 3$. The challenge lies in learning spatial correlations between two distinct images to determine identity. Two primary architectures were benchmarked:

  1. Standard CNN: A modular 4-block architecture utilizing Batch Normalization and Dropout for regularization.
  2. CNNChannel (Optimized): A specialized approach that concatenates input images along the channel dimension. Instead of treating images as separate entities, the model processes a 6-channel tensor: $$X_{input} \in \mathbb{R}^{N \times 6 \times 224 \times 224}$$ This allows the initial convolutional layers to learn filters that directly compare corresponding spatial features between the two images, leading to better convergence.
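The channel-concatenation idea can be sketched as a small PyTorch module. This is an illustrative toy, not the repository's exact `CNNChannel` definition; the class name and layer sizes here are assumptions:

```python
import torch
import torch.nn as nn

class ChannelPairCNN(nn.Module):
    """Toy CNNChannel-style model: the two images are stacked along the
    channel axis, so the first conv filters see both images at once."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=3, padding=1),  # 6 = 2 images x 3 channels
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 224 -> 112
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(16 * 112 * 112, 2),  # logits: same pair vs. different pair
        )

    def forward(self, img_a, img_b):
        # Concatenate along the channel dimension: (N, 6, 224, 224)
        x = torch.cat([img_a, img_b], dim=1)
        return self.head(self.features(x))

a = torch.randn(4, 3, 224, 224)
b = torch.randn(4, 3, 224, 224)
logits = ChannelPairCNN()(a, b)
print(logits.shape)  # torch.Size([4, 2])
```

Because the two images share spatial coordinates inside one tensor, each filter in the first layer can subtract or compare corresponding pixels directly, which is what gives this variant its edge over processing the images separately.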

## 🛠️ Engineering Challenges & Solutions

### 1. Memory Management (VRAM & OOM)

Training deep models on consumer-grade hardware (e.g., NVIDIA GTX 1660 Ti) presented significant Out-of-Memory (OOM) challenges during the training of multi-layer CNNs.

  • Solution: Implemented dynamic batch-size scaling and efficient data loading through PyTorch DataLoaders.
  • Optimization: Applied mean-subtraction and normalization as a pre-processing step, improving the conditioning of the inputs to the first layer and accelerating convergence.
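Both ideas can be sketched as follows. The batch-size candidates and normalization statistics are placeholders (the stats shown are the common ImageNet values), and `find_max_batch_size` is one plausible way to implement dynamic batch-size scaling, not the repository's exact mechanism:

```python
import torch

def find_max_batch_size(model, sample, candidates=(128, 64, 32, 16, 8)):
    """Try batch sizes from large to small until a forward/backward pass fits.

    `sample` is a single example of shape (1, C, H, W); CUDA OOM errors
    surface as RuntimeError with "out of memory" in the message.
    """
    for bs in candidates:
        try:
            x = sample.expand(bs, *sample.shape[1:]).clone()
            model(x).sum().backward()
            model.zero_grad()
            return bs
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()  # release cached blocks before retrying
    raise RuntimeError("no candidate batch size fits in memory")

# Illustrative per-channel statistics (ImageNet means/stds as a stand-in).
MEAN = torch.tensor([0.485, 0.456, 0.406])
STD = torch.tensor([0.229, 0.224, 0.225])

def preprocess(x):
    """Mean-subtraction and scaling applied before the first conv layer."""
    return (x - MEAN[:, None, None]) / STD[:, None, None]
```

In practice the probing step runs once at startup, and the returned batch size is then passed to a `torch.utils.data.DataLoader`.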

### 2. Overfitting & Generalization

Initial tests showed a significant gap between Validation Accuracy (~88%) and Test Accuracy (~70%) in basic models.

  • Regularization: Integrated Dropout ($p=0.5$) and Batch Normalization across all convolutional blocks to stabilize training.
  • Early Stopping: Monitored validation loss to prevent the model from memorizing training set noise.
  • Result: The CNNChannel model proved significantly more stable, achieving an average accuracy of 86.5% on unseen test sets.
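The early-stopping criterion above can be expressed as a small helper. The class name and patience value are illustrative, not taken from the repository:

```python
class EarlyStopper:
    """Stop training when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `step` once per epoch with the current validation loss keeps the check decoupled from the training loop itself.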

## 📊 Performance Benchmarking

| Model        | Val Accuracy | Test (Male) | Test (Female) | Status     |
|--------------|--------------|-------------|---------------|------------|
| Standard CNN | 84.0%        | 71.2%       | 75.0%         | Overfitted |
| CNNChannel   | 88.2%        | 86.1%       | 87.2%         | Optimal    |

## 📁 Repository Structure

```
├── data/                 # Dataset (Ignored by Git)
├── models/
│   ├── shoe_models.py    # PyTorch architectures (CNN & CNNChannel)
│   └── weights/          # Saved model checkpoints (.pk files)
├── notebooks/
│   └── shoe_classification.ipynb # Training loops & Performance Analysis
├── requirements.txt      # Dependency list
└── README.md             # Project documentation
```