
# Shoe Pair Classification using PyTorch

This project implements and evaluates custom Convolutional Neural Network (CNN) architectures for image similarity detection. The core objective is to classify whether a pair of images belongs to the same shoe or different shoes.

Beyond basic classification, this project explores architectural trade-offs and hardware-constrained optimization, specifically addressing VRAM limitations and memory management efficiency.

## 🚀 Project Overview

The system processes image pairs of size $224 \times 224 \times 3$. The challenge lies in learning spatial correlations between two distinct images to determine identity. Two primary architectures were benchmarked:

  1. Standard CNN: A modular 4-block architecture utilizing Batch Normalization and Dropout for regularization.
  2. CNNChannel (Optimized): A specialized approach that concatenates input images along the channel dimension. Instead of treating images as separate entities, the model processes a 6-channel tensor: $$X_{input} \in \mathbb{R}^{N \times 6 \times 224 \times 224}$$ This allows the initial convolutional layers to learn filters that directly compare corresponding spatial features between the two images, leading to better convergence.
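The channel-concatenation idea can be sketched as a small PyTorch module. This is an illustrative toy, not the repository's exact `CNNChannel` definition; the class name and layer sizes here are assumptions:

```python
import torch
import torch.nn as nn

class ChannelPairCNN(nn.Module):
    """Toy CNNChannel-style model: the two images are stacked along the
    channel axis, so the first conv filters see both images at once."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=3, padding=1),  # 6 = 2 images x 3 channels
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 224 -> 112
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(16 * 112 * 112, 2),  # logits: same pair vs. different pair
        )

    def forward(self, img_a, img_b):
        # Concatenate along the channel dimension: (N, 6, 224, 224)
        x = torch.cat([img_a, img_b], dim=1)
        return self.head(self.features(x))

a = torch.randn(4, 3, 224, 224)
b = torch.randn(4, 3, 224, 224)
logits = ChannelPairCNN()(a, b)
print(logits.shape)  # torch.Size([4, 2])
```

Because the two images share spatial coordinates inside one tensor, each filter in the first layer can subtract or compare corresponding pixels directly, which is what gives this variant its edge over processing the images separately.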

## 🛠️ Engineering Challenges & Solutions

### 1. Memory Management (VRAM & OOM)

Training deep models on consumer-grade hardware (e.g., NVIDIA GTX 1660 Ti) presented significant Out-of-Memory (OOM) challenges during the training of multi-layer CNNs.

  • Solution: Implemented dynamic batch-size scaling and efficient data loading through PyTorch DataLoaders.
  • Optimization: Applied mean-subtraction and normalization as a pre-processing step, improving the conditioning of the inputs to the first layer and accelerating convergence.
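Both ideas can be sketched as follows. The batch-size candidates and normalization statistics are placeholders (the stats shown are the common ImageNet values), and `find_max_batch_size` is one plausible way to implement dynamic batch-size scaling, not the repository's exact mechanism:

```python
import torch

def find_max_batch_size(model, sample, candidates=(128, 64, 32, 16, 8)):
    """Try batch sizes from large to small until a forward/backward pass fits.

    `sample` is a single example of shape (1, C, H, W); CUDA OOM errors
    surface as RuntimeError with "out of memory" in the message.
    """
    for bs in candidates:
        try:
            x = sample.expand(bs, *sample.shape[1:]).clone()
            model(x).sum().backward()
            model.zero_grad()
            return bs
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()  # release cached blocks before retrying
    raise RuntimeError("no candidate batch size fits in memory")

# Illustrative per-channel statistics (ImageNet means/stds as a stand-in).
MEAN = torch.tensor([0.485, 0.456, 0.406])
STD = torch.tensor([0.229, 0.224, 0.225])

def preprocess(x):
    """Mean-subtraction and scaling applied before the first conv layer."""
    return (x - MEAN[:, None, None]) / STD[:, None, None]
```

In practice the probing step runs once at startup, and the returned batch size is then passed to a `torch.utils.data.DataLoader`.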

### 2. Overfitting & Generalization

Initial tests showed a significant gap between Validation Accuracy (~88%) and Test Accuracy (~70%) in basic models.

  • Regularization: Integrated Dropout ($p=0.5$) and Batch Normalization across all convolutional blocks to stabilize training.
  • Early Stopping: Monitored validation loss to prevent the model from memorizing training set noise.
  • Result: The CNNChannel model proved significantly more stable, achieving an average accuracy of 86.5% on unseen test sets.
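The early-stopping criterion above can be expressed as a small helper. The class name and patience value are illustrative, not taken from the repository:

```python
class EarlyStopper:
    """Stop training when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `step` once per epoch with the current validation loss keeps the check decoupled from the training loop itself.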

## 📊 Performance Benchmarking

| Model        | Val Accuracy | Test (Male) | Test (Female) | Status     |
|--------------|--------------|-------------|---------------|------------|
| Standard CNN | 84.0%        | 71.2%       | 75.0%         | Overfitted |
| CNNChannel   | 88.2%        | 86.1%       | 87.2%         | Optimal    |

## 📁 Repository Structure

```
├── data/                 # Dataset (Ignored by Git)
├── models/
│   ├── shoe_models.py    # PyTorch architectures (CNN & CNNChannel)
│   └── weights/          # Saved model checkpoints (.pk files)
├── notebooks/
│   └── shoe_classification.ipynb # Training loops & Performance Analysis
├── requirements.txt      # Dependency list
└── README.md             # Project documentation
```