All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Text Generation Utilities: `TextVocabulary`, `CharacterEmbedding`, sampling functions (`sample_with_temperature`, `sample_top_k`, `sample_nucleus`), `argmax`/`softmax` helpers, and a `text_utils_example.rs` demonstration (see the sampling sketch below)
- `text_generation_advanced.rs` refactored to use new text utilities
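The sampling helpers turn raw logits into a probability distribution before drawing a character. Below is a minimal, dependency-free sketch of temperature-scaled sampling; the crate's actual `sample_with_temperature` signature may differ, and the uniform draw `u` is passed in explicitly here rather than coming from a random-number generator.

```rust
/// Numerically stable softmax with a temperature parameter.
fn softmax_with_temperature(logits: &[f64], temperature: f64) -> Vec<f64> {
    let t = temperature.max(1e-8);
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|x| ((x - max) / t).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Sample an index from the temperature-scaled distribution using an
/// externally supplied uniform draw `u` in [0, 1).
fn sample_with_temperature(logits: &[f64], temperature: f64, u: f64) -> usize {
    let probs = softmax_with_temperature(logits, temperature);
    let mut cumulative = 0.0;
    for (i, p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    probs.len() - 1 // guard against floating-point round-off
}

fn main() {
    let logits = [2.0, 1.0, 0.1];
    // Lower temperature -> sharper distribution, higher -> closer to uniform.
    println!("{:?}", softmax_with_temperature(&logits, 0.5));
    println!("sampled index: {}", sample_with_temperature(&logits, 1.0, 0.42));
}
```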
- Linear (Dense) Layer: Fully connected layer for classification and output projections
- Forward and backward passes with gradient computation
- Xavier/Glorot weight initialization
- Works with all optimizers (SGD, Adam, RMSprop)
- `linear_layer_example.rs` demonstration
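As a rough illustration of what the layer computes, the sketch below shows the Xavier/Glorot-uniform limit and a plain dense forward pass (`y = Wx + b`); the names and storage layout are illustrative, not the crate's actual `Linear` API.

```rust
/// Xavier/Glorot uniform limit: sqrt(6 / (fan_in + fan_out)).
fn xavier_limit(fan_in: usize, fan_out: usize) -> f64 {
    (6.0 / (fan_in + fan_out) as f64).sqrt()
}

/// y = W x + b, with W stored row-major as `out x in`.
fn linear_forward(w: &[Vec<f64>], b: &[f64], x: &[f64]) -> Vec<f64> {
    w.iter()
        .zip(b)
        .map(|(row, &bias)| row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + bias)
        .collect()
}

fn main() {
    let (fan_in, fan_out) = (4, 2);
    let limit = xavier_limit(fan_in, fan_out);
    // Weights would normally be drawn uniformly from [-limit, limit];
    // fixed values keep this sketch deterministic.
    let w = vec![vec![0.1, -0.2, 0.05, 0.3], vec![-0.15, 0.25, 0.1, -0.05]];
    let b = vec![0.0, 0.1];
    let x = vec![1.0, 0.5, -1.0, 2.0];
    println!("xavier limit = {limit:.4}, output = {:?}", linear_forward(&w, &b, &x));
}
```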
- Text Generation: Fixed broken text generation in advanced example
- Implemented temperature-based sampling for character selection
- Added direct network access for more efficient inference
- Text generation now runs continuously for the specified output length
- Early Stopping: Complete early stopping implementation for all trainers (see the sketch after this list)
- Configurable Patience: Stop training after N epochs without improvement
- Multiple Metrics: Monitor validation loss or training loss
- Best Weight Restoration: Automatically restore best weights when stopping
- Flexible Configuration: Customizable min_delta threshold and monitoring options
- Universal Support: Works with LSTMTrainer, ScheduledLSTMTrainer, and LSTMBatchTrainer
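The bookkeeping behind patience, `min_delta`, and best-epoch tracking can be sketched as follows; field names are illustrative, and the crate's `TrainingConfig` options may be named differently.

```rust
struct EarlyStopping {
    patience: usize,   // epochs to wait without improvement
    min_delta: f64,    // minimum decrease that counts as an improvement
    best_loss: f64,
    epochs_without_improvement: usize,
    best_epoch: usize,
}

impl EarlyStopping {
    fn new(patience: usize, min_delta: f64) -> Self {
        Self { patience, min_delta, best_loss: f64::INFINITY, epochs_without_improvement: 0, best_epoch: 0 }
    }

    /// Returns true when training should stop. The caller would snapshot the
    /// model weights whenever a new best epoch is recorded.
    fn update(&mut self, epoch: usize, monitored_loss: f64) -> bool {
        if monitored_loss < self.best_loss - self.min_delta {
            self.best_loss = monitored_loss;
            self.best_epoch = epoch;
            self.epochs_without_improvement = 0;
        } else {
            self.epochs_without_improvement += 1;
        }
        self.epochs_without_improvement >= self.patience
    }
}

fn main() {
    let mut stopper = EarlyStopping::new(3, 1e-4);
    for (epoch, &loss) in [0.90, 0.70, 0.65, 0.66, 0.66, 0.67].iter().enumerate() {
        let stop = stopper.update(epoch, loss);
        let marker = if stopper.best_epoch == epoch { " *" } else { "" };
        println!("epoch {epoch}: loss {loss}{marker}");
        if stop {
            println!("early stopping triggered; best epoch was {}", stopper.best_epoch);
            break;
        }
    }
}
```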
- Enhanced Training Features:
- Visual Best Epoch Indicators: "*" markers in training logs for best epochs
- Automatic Overfitting Prevention: Stop training before performance degrades
- Comprehensive Logging: Detailed early stopping trigger information
- Weight Management: Optional best weight restoration for optimal model recovery
- Examples and Documentation:
- Complete Early Stopping Example: Demonstration of all early stopping configurations
- Multiple Scenarios: Validation loss, training loss, and custom patience examples
- Integration Examples: Shows usage with different trainer types
- Comprehensive Testing: Full test suite for early stopping functionality
- All Trainers: Added early stopping support to LSTMTrainer, ScheduledLSTMTrainer, and LSTMBatchTrainer
- Training Configuration: Extended TrainingConfig with optional early stopping settings
- Training Loops: Enhanced all training loops with early stopping logic and best epoch tracking
- Batch Processing System: Complete implementation of batch processing capabilities
- Batch Forward/Backward Passes: Efficient batch processing in LSTM cells and networks
- LSTMBatchTrainer: New trainer specifically designed for batch training
- Batch Prediction Support: Process multiple sequences simultaneously for inference
- LSTMCellBatchCache and LSTMNetworkBatchCache: Caching structures for batch training
- Batch Loss Computation: Enhanced loss functions with batch processing support
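Batch loss computation amounts to averaging the per-sequence loss over the batch, as in this illustrative MSE sketch; the shape conventions here are simplified to flat vectors and may not match the crate's API.

```rust
fn mse(prediction: &[f64], target: &[f64]) -> f64 {
    prediction
        .iter()
        .zip(target)
        .map(|(p, t)| (p - t).powi(2))
        .sum::<f64>()
        / prediction.len() as f64
}

fn mse_batch(predictions: &[Vec<f64>], targets: &[Vec<f64>]) -> f64 {
    let total: f64 = predictions.iter().zip(targets).map(|(p, t)| mse(p, t)).sum();
    total / predictions.len() as f64
}

fn main() {
    let predictions = vec![vec![0.1, 0.9], vec![0.4, 0.6]];
    let targets = vec![vec![0.0, 1.0], vec![0.5, 0.5]];
    println!("batch MSE = {:.4}", mse_batch(&predictions, &targets));
}
```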
- Performance Improvements:
- 4-5x Training Speedup: Significant performance gains through batch processing
- Memory Efficient: Optimized memory usage for large batch sizes
- Scalable Architecture: Handles varying dataset sizes efficiently
- Configurable Batch Sizes: Flexible batch size configuration for different use cases
- New Training Infrastructure:
- Batch Training Functions: `create_basic_batch_trainer()`, `create_adam_batch_trainer()`
- Batch Sequence Processing: `forward_batch_sequences()` for multiple sequence handling
- Enhanced Gradient Computation: Batch-optimized gradient calculations
- Batch Evaluation: Efficient evaluation with batch processing
- Examples and Documentation:
- Comprehensive Batch Processing Example: Complete demonstration with performance benchmarks
- Scalability Tests: Examples showing performance across different dataset sizes
- Performance Comparisons: Side-by-side comparison of single vs batch processing
- Updated Library Documentation: Enhanced documentation for new batch features
- LSTM Cell: Added `forward_batch()`, `forward_batch_with_cache()`, and `backward_batch()` methods
- LSTM Network: Extended with batch processing capabilities and multi-sequence support
- Loss Functions: Enhanced MSE, MAE, and CrossEntropy with batch computation methods
- Training System: Improved training pipeline with batch processing integration
- 4.3x speedup with batch size 8
- 5.4x speedup with batch size 16
- Efficient memory utilization across different batch sizes
- Scales well from small (50 sequences) to large (500+ sequences) datasets
- Backward Compatible: All existing APIs remain unchanged
- Drop-in Replacement: Existing code continues to work without modifications
- Progressive Enhancement: Users can opt into batch processing when beneficial
- Advanced Learning Rate Scheduling: Comprehensive expansion of learning rate scheduling capabilities (see the scheduler sketch after this list)
- PolynomialLR: Polynomial decay with configurable power for smooth learning rate transitions
- CyclicalLR: Cyclical learning rates with triangular, triangular2, and exponential range modes
- WarmupScheduler: Generic warmup wrapper that can be applied to any base scheduler
- LRScheduleVisualizer: ASCII visualization tool for learning rate schedules
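The new schedulers follow standard formulas: polynomial decay toward a final learning rate, a triangular cyclical schedule oscillating between bounds, and a linear warmup phase wrapped around a base schedule. The sketch below uses those textbook definitions; the crate's exact parameterization may differ.

```rust
/// Polynomial decay: lr = (base - end) * (1 - step/total)^power + end
fn polynomial_lr(base: f64, end: f64, power: f64, step: usize, total_steps: usize) -> f64 {
    let progress = step.min(total_steps) as f64 / total_steps as f64;
    (base - end) * (1.0 - progress).powf(power) + end
}

/// Triangular cyclical LR oscillating between `min_lr` and `max_lr`.
fn cyclical_lr(min_lr: f64, max_lr: f64, step_size: usize, step: usize) -> f64 {
    let cycle = (1.0 + step as f64 / (2.0 * step_size as f64)).floor();
    let x = (step as f64 / step_size as f64 - 2.0 * cycle + 1.0).abs();
    min_lr + (max_lr - min_lr) * (1.0 - x).max(0.0)
}

/// Linear warmup for the first `warmup_steps`, then the base schedule.
fn with_warmup(base_lr_at: impl Fn(usize) -> f64, warmup_steps: usize, step: usize) -> f64 {
    if step < warmup_steps {
        base_lr_at(warmup_steps) * (step + 1) as f64 / warmup_steps as f64
    } else {
        base_lr_at(step)
    }
}

fn main() {
    for step in [0, 250, 500, 750, 1000] {
        let poly = polynomial_lr(0.01, 1e-4, 2.0, step, 1000);
        let cyc = cyclical_lr(1e-4, 0.01, 250, step);
        let warm = with_warmup(|s| polynomial_lr(0.01, 1e-4, 2.0, s, 1000), 100, step);
        println!("step {step:>4}: poly {poly:.5}  cyclical {cyc:.5}  warmup+poly {warm:.5}");
    }
}
```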
- Enhanced Scheduler Integration:
- Convenience factory methods for new schedulers in `ScheduledOptimizer`
- Helper functions: `polynomial`, `cyclical`, `cyclical_triangular2`, `cyclical_exp_range`
- Complete integration with existing training infrastructure
- Comprehensive test coverage for all new schedulers
- Learning Rate Visualization:
- ASCII-based schedule visualization with customizable dimensions
- Schedule generation utilities for analysis and debugging
- Visual comparison tools for different scheduler behaviors
- Integration examples showing visualization usage
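The idea behind the ASCII visualization can be shown in a few lines: sample the schedule and print one marker per step, scaled to a fixed width. This is only a toy sketch, not the `LRScheduleVisualizer` implementation.

```rust
fn visualize(schedule: &[f64], width: usize) {
    let max = schedule.iter().cloned().fold(f64::MIN, f64::max);
    for (step, lr) in schedule.iter().enumerate() {
        // Scale each value to a column in [0, width) and mark it with '*'.
        let col = ((lr / max) * (width - 1) as f64).round() as usize;
        println!("{step:>3} | {}*", " ".repeat(col));
    }
}

fn main() {
    // Example: simple exponential decay sampled at 10 points.
    let schedule: Vec<f64> = (0..10).map(|i| 0.01 * 0.7f64.powi(i)).collect();
    visualize(&schedule, 40);
}
```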
- Advanced Training Examples:
- `advanced_lr_scheduling.rs`: Comprehensive demonstration of new schedulers
- Warmup + cyclical learning rate combinations
- Best practices example with dropout + gradient clipping + advanced scheduling
- Performance comparison between different scheduling strategies
- Extended scheduler trait system to support generic warmup wrapper
- Robust cyclical learning rate computation with proper cycle handling
- Polynomial decay implementation with numerical stability
- Comprehensive error handling and edge case management
- Enhanced documentation with visual examples and mathematical formulations
- More sophisticated learning rate control for better training quality
- Modern scheduling techniques used in state-of-the-art deep learning
- Visualization capabilities for schedule analysis and debugging
- Flexible warmup support for any existing scheduler
- Production-ready implementations with comprehensive testing
- Learning Rate Scheduling System: Comprehensive scheduling framework with multiple schedulers
- StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, OneCycleLR, ReduceLROnPlateau, LinearLR
- Convenient trainer factory functions: `create_step_lr_trainer`, `create_one_cycle_trainer`, etc.
- Automatic scheduler stepping during training with optional logging
- Support for both validation-based and epoch-based scheduling
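For reference, two of the listed schedulers in their standard textbook forms; parameter names are illustrative rather than the crate's exact API.

```rust
use std::f64::consts::PI;

/// StepLR: multiply the base LR by `gamma` every `step_size` epochs.
fn step_lr(base_lr: f64, gamma: f64, step_size: usize, epoch: usize) -> f64 {
    base_lr * gamma.powi((epoch / step_size) as i32)
}

/// Cosine annealing from `base_lr` down to `min_lr` over `t_max` epochs.
fn cosine_annealing_lr(base_lr: f64, min_lr: f64, t_max: usize, epoch: usize) -> f64 {
    min_lr + 0.5 * (base_lr - min_lr) * (1.0 + (PI * epoch as f64 / t_max as f64).cos())
}

fn main() {
    for epoch in [0, 10, 25, 50] {
        println!(
            "epoch {epoch:>2}: step_lr {:.5}, cosine {:.5}",
            step_lr(0.01, 0.5, 10, epoch),
            cosine_annealing_lr(0.01, 1e-5, 50, epoch),
        );
    }
}
```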
- GRU (Gated Recurrent Unit) Implementation: Complete GRU architecture as alternative to LSTM
- Multi-layer GRU networks with configurable architecture
- Full dropout support (input, recurrent, output dropout and zoneout)
- Forward and backward propagation with gradient computation
- Training integration with existing optimizer and loss function systems
- Comprehensive example demonstrating GRU usage and comparison with LSTM
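For reference, one GRU time step using the standard update/reset gate equations looks roughly like the sketch below; the weight layout (dense matrices over `[x_t, h_{t-1}]`, biases omitted) is illustrative, not the crate's internal representation.

```rust
fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }

fn matvec(w: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    w.iter().map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>()).collect()
}

/// h_t = (1 - z) * h_{t-1} + z * h~, with
///   z = sigmoid(W_z [x, h]),  r = sigmoid(W_r [x, h]),  h~ = tanh(W_h [x, r*h])
fn gru_step(w_z: &[Vec<f64>], w_r: &[Vec<f64>], w_h: &[Vec<f64>], x: &[f64], h_prev: &[f64]) -> Vec<f64> {
    let concat: Vec<f64> = x.iter().chain(h_prev).cloned().collect();
    let z: Vec<f64> = matvec(w_z, &concat).into_iter().map(sigmoid).collect();
    let r: Vec<f64> = matvec(w_r, &concat).into_iter().map(sigmoid).collect();
    let gated: Vec<f64> = x.iter().cloned()
        .chain(r.iter().zip(h_prev).map(|(ri, hi)| ri * hi))
        .collect();
    let h_tilde: Vec<f64> = matvec(w_h, &gated).into_iter().map(f64::tanh).collect();
    z.iter().zip(&h_tilde).zip(h_prev)
        .map(|((zi, ht), hp)| (1.0 - zi) * hp + zi * ht)
        .collect()
}

fn main() {
    // Hidden size 2, input size 1: each gate weight matrix is 2 x 3.
    let w = vec![vec![0.1, 0.2, -0.1], vec![0.0, 0.3, 0.1]];
    let h = gru_step(&w, &w, &w, &[0.5], &[0.0, 0.0]);
    println!("h_t = {h:?}");
}
```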
- Bidirectional LSTM Networks: Complete BiLSTM implementation with flexible output combination
- Multiple combine modes: concatenation, sum, and average of forward/backward outputs
- Multi-layer BiLSTM with proper input/output size handling
- Full dropout compatibility (input, recurrent, output dropout and zoneout)
- Sequence processing with both past and future context
- Text classification example comparing BiLSTM vs unidirectional LSTM
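The three combine modes reduce to simple element-wise operations on the aligned forward and backward outputs, as in this illustrative sketch; the crate's BiLSTM API may expose them differently.

```rust
enum CombineMode { Concat, Sum, Average }

/// Combine per-step forward and backward outputs (backward already reversed
/// so the time steps line up).
fn combine(forward: &[f64], backward: &[f64], mode: &CombineMode) -> Vec<f64> {
    match mode {
        CombineMode::Concat => forward.iter().chain(backward).cloned().collect(),
        CombineMode::Sum => forward.iter().zip(backward).map(|(f, b)| f + b).collect(),
        CombineMode::Average => forward.iter().zip(backward).map(|(f, b)| (f + b) / 2.0).collect(),
    }
}

fn main() {
    let (fwd, bwd) = (vec![0.2, -0.4], vec![0.1, 0.3]);
    for mode in [CombineMode::Concat, CombineMode::Sum, CombineMode::Average] {
        println!("{:?}", combine(&fwd, &bwd, &mode));
    }
}
```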
- Enhanced Dropout Regularization: Comprehensive dropout system with extensive testing
- Variational dropout with consistent masks across time steps
- Layer-specific dropout configuration for fine-grained control
- Zoneout implementation for RNN-specific regularization
- Automatic train/eval mode switching
- Comprehensive dropout demonstration example
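Variational dropout differs from standard dropout in that a single mask is drawn per sequence and reused at every time step. The sketch below shows that reuse with a hand-fixed mask; real code would draw the mask from a Bernoulli(1 − p) distribution and apply inverted-dropout scaling.

```rust
fn apply_mask(activations: &[f64], mask: &[f64]) -> Vec<f64> {
    activations.iter().zip(mask).map(|(a, m)| a * m).collect()
}

fn main() {
    let keep_prob = 0.5;
    // Same mask for every time step of this sequence (inverted dropout scaling).
    let mask = vec![1.0 / keep_prob, 0.0, 1.0 / keep_prob];
    let sequence = vec![vec![0.3, 0.7, -0.2], vec![0.1, 0.4, 0.5]];
    for (t, step) in sequence.iter().enumerate() {
        println!("t={t}: {:?}", apply_mask(step, &mask));
    }
}
```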
- Model Persistence System: Complete model saving and loading infrastructure
- JSON and binary serialization formats
- Model metadata tracking (architecture, training info, timestamps)
- State preservation for weights, biases, and optimizer states
- Model inspection utilities for debugging and analysis
- Comprehensive persistence testing
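JSON persistence of weights and metadata can be sketched with the `serde`/`serde_json` crates; the struct and fields below are purely illustrative, and the crate's actual persistence format and metadata are richer.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical saved-model layout, for illustration only.
#[derive(Serialize, Deserialize, Debug)]
struct SavedModel {
    architecture: String,
    input_size: usize,
    hidden_size: usize,
    weights: Vec<Vec<f64>>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model = SavedModel {
        architecture: "lstm".into(),
        input_size: 1,
        hidden_size: 2,
        weights: vec![vec![0.1, -0.2], vec![0.3, 0.05]],
    };
    // Serialize to JSON, write to disk, then round-trip it back.
    let json = serde_json::to_string_pretty(&model)?;
    std::fs::write("model.json", &json)?;
    let restored: SavedModel = serde_json::from_str(&std::fs::read_to_string("model.json")?)?;
    println!("restored: {restored:?}");
    Ok(())
}
```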
- Enhanced Examples and Documentation:
- Learning rate scheduling comprehensive example
- GRU vs LSTM comparison example
- Bidirectional LSTM demonstration
- Enhanced dropout regularization examples
- Model inspection and analysis utilities
- Real-world applications (stock prediction, weather forecasting, text classification)
- Modular scheduler architecture with trait-based design
- Efficient bidirectional sequence processing implementation
- Enhanced gradient computation for GRU cells
- Improved testing coverage with comprehensive test suites
- Better code organization and documentation
- Performance optimizations for training workflows
- More flexible training with advanced learning rate strategies
- Alternative RNN architecture (GRU) for different use cases
- Better sequence modeling with bidirectional context
- Robust regularization options for preventing overfitting
- Reliable model persistence for production deployments
- Comprehensive examples for real-world applications
- Complete Training System: Full backpropagation through time (BPTT) implementation
- Multiple Optimizers: SGD, Adam, and RMSprop with configurable parameters
- Loss Functions: MSE, MAE, and Cross-Entropy with numerically stable implementations
- Training Infrastructure:
- LSTMTrainer with configurable training loops
- Gradient clipping to prevent exploding gradients
- Validation support with metrics tracking
- Training configuration system
- Advanced Features:
- Gradient accumulation across time steps
- Parameter state management for optimizers
- Comprehensive metrics tracking and history
- Enhanced Network Architecture:
- Network-level backward propagation
- Multi-layer gradient computation
- Sequence processing with caching
- Examples and Documentation:
- Complete training example with sine wave prediction
- Optimizer comparison demonstrations
- Comprehensive API documentation
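Gradient clipping (mentioned above) is typically done by rescaling all gradients when their global norm exceeds a threshold, as in this illustrative sketch; the crate's trainer may expose the threshold differently.

```rust
fn clip_by_global_norm(gradients: &mut [Vec<f64>], max_norm: f64) {
    // Global L2 norm over every gradient element.
    let norm: f64 = gradients
        .iter()
        .flat_map(|g| g.iter().map(|x| x * x))
        .sum::<f64>()
        .sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in gradients.iter_mut() {
            for x in g.iter_mut() {
                *x *= scale;
            }
        }
    }
}

fn main() {
    let mut grads = vec![vec![3.0, 4.0], vec![0.0, 12.0]];
    clip_by_global_norm(&mut grads, 5.0); // global norm here is 13.0
    println!("{grads:?}");
}
```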
- LSTM Cell: Added backward propagation with full gradient computation
- LSTM Network: Enhanced with training capabilities and caching
- Architecture: Restructured for better separation of concerns
- Performance: Optimized gradient computation and memory usage
- Implemented mathematically correct BPTT algorithm
- Added bias correction for Adam optimizer
- Implemented numerically stable softmax function
- Enhanced multi-layer gradient flow
- Added proper parameter update mechanisms
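Two of the numerical details called out here, in standard textbook form: a single Adam update with bias correction, and a numerically stable softmax that subtracts the maximum logit before exponentiating. The sketch is illustrative and does not mirror the crate's internal state handling.

```rust
/// One Adam step for a single parameter. `m` and `v` are the running first
/// and second moment estimates, `t` is the 1-based step count.
fn adam_step(param: &mut f64, grad: f64, m: &mut f64, v: &mut f64, t: u32,
             lr: f64, beta1: f64, beta2: f64, eps: f64) {
    *m = beta1 * *m + (1.0 - beta1) * grad;
    *v = beta2 * *v + (1.0 - beta2) * grad * grad;
    let m_hat = *m / (1.0 - beta1.powi(t as i32)); // bias correction
    let v_hat = *v / (1.0 - beta2.powi(t as i32));
    *param -= lr * m_hat / (v_hat.sqrt() + eps);
}

/// Stable softmax: subtract the max before exponentiating.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let (mut w, mut m, mut v) = (0.5, 0.0, 0.0);
    for t in 1..=3 {
        adam_step(&mut w, 0.2, &mut m, &mut v, t, 0.01, 0.9, 0.999, 1e-8);
    }
    println!("w after 3 Adam steps: {w:.5}");
    println!("softmax: {:?}", softmax(&[1000.0, 1001.0, 1002.0]));
}
```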
- Core LSTM Implementation:
- Basic LSTM cell with forward propagation
- Standard LSTM equations implementation
- Multi-layer LSTM network support
- Peephole LSTM Variant:
- Peephole connections for enhanced performance
- Direct cell state to gate connections
- Basic Architecture:
- Modular design with separate layers and models
- Clean separation between cell and network levels
- Configurable network architecture (input size, hidden size, layers)
- Utility Functions:
- Sigmoid and tanh activation functions
- Random weight initialization
- Examples:
- Basic usage demonstrations
- Multi-layer network examples
- Time series prediction examples
- Text generation examples
- Peephole LSTM usage
- Forward-pass only implementation
- Xavier-uniform weight initialization
- Support for variable sequence lengths
- Proper hidden and cell state management
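For reference, the standard LSTM cell equations that the forward pass implements (input, forget, and output gates plus the candidate cell state) look roughly like the sketch below; the weight layout over `[x_t, h_{t-1}]` with biases omitted is illustrative only.

```rust
fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }

fn matvec(w: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    w.iter().map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>()).collect()
}

fn gate(w: &[Vec<f64>], z: &[f64]) -> Vec<f64> {
    matvec(w, z).into_iter().map(sigmoid).collect()
}

/// One LSTM step over z = [x_t, h_{t-1}]:
///   i = sigmoid(W_i z), f = sigmoid(W_f z), o = sigmoid(W_o z), g = tanh(W_g z)
///   c_t = f * c_{t-1} + i * g,   h_t = o * tanh(c_t)
fn lstm_step(
    w_i: &[Vec<f64>], w_f: &[Vec<f64>], w_o: &[Vec<f64>], w_g: &[Vec<f64>],
    x: &[f64], h_prev: &[f64], c_prev: &[f64],
) -> (Vec<f64>, Vec<f64>) {
    let z: Vec<f64> = x.iter().chain(h_prev).cloned().collect();
    let (i, f, o) = (gate(w_i, &z), gate(w_f, &z), gate(w_o, &z));
    let g: Vec<f64> = matvec(w_g, &z).into_iter().map(f64::tanh).collect();
    let c: Vec<f64> = (0..c_prev.len()).map(|k| f[k] * c_prev[k] + i[k] * g[k]).collect();
    let h: Vec<f64> = (0..c.len()).map(|k| o[k] * c[k].tanh()).collect();
    (h, c)
}

fn main() {
    // Hidden size 2, input size 1: each gate weight matrix is 2 x 3 (biases omitted).
    let w = vec![vec![0.1, 0.2, -0.1], vec![0.0, 0.3, 0.1]];
    let (h, c) = lstm_step(&w, &w, &w, &w, &[0.5], &[0.0, 0.0], &[0.0, 0.0]);
    println!("h_t = {h:?}, c_t = {c:?}");
}
```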
- Forward-pass only LSTM cells
- Basic multi-layer networks
- Simple examples and demonstrations
- Full backward propagation (BPTT)
- Multiple optimization algorithms
- Comprehensive loss functions
- Training infrastructure with validation
- Advanced gradient management
- Cleaned and optimized documentation
- Enhanced code clarity
- Comprehensive examples
- Performance optimizations
- Enhanced Architectures:
- Bidirectional LSTM
- GRU (Gated Recurrent Unit)
- Attention mechanisms
- Performance Optimizations:
- GPU acceleration support
- Batch processing capabilities
- SIMD optimizations
- Advanced Training Features:
- Learning rate scheduling
- Regularization techniques (dropout, weight decay)
- Early stopping
- Data Handling:
- Built-in data preprocessing
- Common dataset loaders
- Data augmentation
- Visualization & Monitoring:
- Training progress visualization
- Loss curve plotting
- Model analysis tools
When contributing to this project, please:
- Update the CHANGELOG.md file with your changes
- Follow the established coding style
- Add tests for new functionality
- Update documentation as needed
- v0.1.0: Initial LSTM implementation with forward pass
- v0.2.0: Complete training system with BPTT and optimizers
- v0.3.0: Learning rate scheduling, GRU implementation, BiLSTM, enhanced dropout, and model persistence
- v0.4.0: Advanced learning rate scheduling with 12 different schedulers, warmup support, cyclical rates, and visualization