A hyper-modern, over-engineered, cutting-edge AutoML platform that combines the latest technologies in machine learning, hyperparameter optimization, and distributed computing. Built for Python 3.13+ with the uv packaging ecosystem.
State-of-the-Art Optimization
- Optuna with TPE, CMA-ES, NSGA-II samplers
- Hyperband and Successive Halving pruning
- Multi-objective optimization support
- Bayesian optimization with Gaussian Processes
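The pruning idea behind Successive Halving can be sketched in a few lines of plain Python. This is a minimal illustration, not this project's implementation: the function names, the `eta` default, and the toy objective are all assumptions made for the example. Optuna ships its own pruners (for example `SuccessiveHalvingPruner` and `HyperbandPruner`), which are what a real run would normally use.

```python
from typing import Callable


def successive_halving(
    candidates: list[dict],
    evaluate: Callable[[dict, int], float],
    min_budget: int = 1,
    eta: int = 3,
) -> dict:
    """Keep the top 1/eta of candidates each round while multiplying the budget by eta."""
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor at the current budget (higher is better).
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget *= eta  # the remaining candidates earn a larger budget
    return survivors[0]


def toy_score(params: dict, budget: int) -> float:
    # Hypothetical objective: best at learning_rate = 0.1, improving with budget.
    return -abs(params["learning_rate"] - 0.1) - 1.0 / budget


learning_rates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 0.5, 0.8, 1.0]
candidates = [{"learning_rate": lr} for lr in learning_rates]
best = successive_halving(candidates, toy_score)
print(best)
```

In a real search, `budget` would map to something like boosting rounds or training epochs, so weak configurations are dropped after a cheap evaluation.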
Advanced Model Support
- XGBoost, LightGBM, CatBoost (GPU-enabled)
- Scikit-learn ensemble methods
- Neural Architecture Search (NAS)
- Automated ensemble & stacking
- Custom model registration
High-Performance Computing
- GPU acceleration for gradient boosting
- Distributed optimization with Ray Tune
- Parallel cross-validation
- Multi-process hyperparameter search
Feature Engineering
- Polynomial features & interactions
- Target encoding for categoricals
- Robust & power transformations
- Automated feature selection
- Missing value imputation strategies
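As an illustration of target encoding for categoricals, the sketch below maps each category to a smoothed mean of the target. It is a generic version of the technique under stated assumptions, not the code in this repository: the function names and the `smoothing` parameter are hypothetical.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Return ({category: smoothed target mean}, global mean)."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Rare categories are pulled toward the global mean by the smoothing term.
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, default):
    """Encode categories, falling back to `default` for unseen values."""
    return [encoding.get(cat, default) for cat in categories]


colors = ["a", "a", "b", "b", "b", "c"]
labels = [1, 1, 0, 0, 1, 1]
encoding, global_mean = fit_target_encoding(colors, labels, smoothing=2.0)
encoded = apply_target_encoding(["a", "b", "unseen"], encoding, default=global_mean)
print(encoded)
```

To avoid target leakage, the encoding should be fitted on training folds only and then applied to the held-out fold.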
Model Explainability
- SHAP values for any model
- LIME for local interpretability
- Feature importance analysis
- Model interpretation dashboards
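Feature importance analysis can be done in a model-agnostic way with permutation importance: shuffle one column and measure how much the score drops. The sketch below is a plain-Python illustration of that technique, not the SHAP or LIME integration listed above; the function names and the toy model are assumptions for the example.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(X):
    # Hypothetical model that only looks at the first feature.
    return [1 if row[0] > 0.5 else 0 for row in X]


X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
y = toy_predict(X)
importances = permutation_importance(toy_predict, X, y, accuracy)
print(importances)
```

A feature the model ignores gets an importance of zero, because shuffling it never changes the predictions.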
Beautiful CLI & UI
- Rich terminal interface with progress bars
- Real-time optimization monitoring
- Configuration validation
- Experiment tracking integration
Distributed Computing
- Ray Tune for distributed hyperparameter search (1000s of trials in parallel)
- Dask for out-of-core computation (datasets larger than RAM)
- Cloud storage integration (S3, GCS, Azure Blob)
- Horizontal scaling across clusters
Experiment Tracking & Registry
- MLflow for experiment tracking and model versioning
- Centralized artifact storage (S3/GCS)
- Model lifecycle management (Staging → Production)
- A/B testing support
Production Model Serving
- FastAPI REST API with async support
- Redis distributed caching (10-100x speedup on repeated requests served from the cache)
- Batch prediction endpoints
- Horizontal autoscaling with Kubernetes
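The caching idea can be sketched without a running Redis server. In the example below an in-memory dict stands in for Redis, and the class name, key prefix, and hit/miss counters are assumptions for illustration only; with redis-py the dict reads and writes would become `client.get(key)` and `client.set(key, value)`.

```python
import hashlib
import json


class PredictionCache:
    """Cache predictions keyed by a hash of the input features."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._store: dict[str, str] = {}  # stand-in for a Redis client
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(features: list[float]) -> str:
        # Serialize deterministically so equal inputs produce equal keys.
        payload = json.dumps(features, separators=(",", ":"))
        return "pred:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def predict(self, features: list[float]):
        key = self._key(features)
        cached = self._store.get(key)
        if cached is not None:
            self.hits += 1
            return json.loads(cached)
        self.misses += 1
        result = self._predict_fn(features)
        self._store[key] = json.dumps(result)
        return result


cache = PredictionCache(lambda features: sum(features))
first = cache.predict([1.0, 2.0])
second = cache.predict([1.0, 2.0])
print(first, second, cache.hits, cache.misses)
```

Any speedup depends on how often identical inputs recur and on how expensive the model call is compared with a cache lookup.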
Monitoring & Observability
- Prometheus metrics collection
- Grafana dashboards for visualization
- Model drift detection (KS test)
- Performance degradation alerts
- Real-time health checks
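Drift detection with the KS test compares the distribution of a feature at training time with its distribution in production. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in plain Python and compares it with the usual large-sample critical value; the function names are assumptions, and a production setup would more likely call `scipy.stats.ks_2samp`.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # The empirical CDFs only change at sample points, so checking those suffices.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / n
        cdf_cur = bisect_right(cur, x) / m
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def drift_detected(reference, current, c_alpha=1.358):
    """Flag drift at roughly the 5% level (c_alpha = 1.358 is the asymptotic coefficient)."""
    n, m = len(reference), len(current)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
print(drift_detected(reference, reference), drift_detected(reference, shifted))
```

The test is applied per feature, so monitoring many features at once calls for some multiple-comparison correction.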
Data Processing at Scale
- Streaming data support for unbounded datasets
- Chunked processing for exabyte-scale files
- Incremental model training
- Parquet/Arrow for columnar efficiency
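Chunked and incremental processing both rest on keeping only a small running state while the data streams past. The sketch below shows that pattern with Welford's online algorithm for mean and variance; the helper names are hypothetical and the example does not use this project's data loaders.

```python
from typing import Iterable, Iterator


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[list[float]]:
    """Yield fixed-size chunks so only one chunk is held in memory at a time."""
    chunk: list[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


class RunningStats:
    """Welford's online algorithm for mean and population variance."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk: list[float]) -> None:
        for x in chunk:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
for chunk in read_in_chunks(range(1, 11), chunk_size=3):
    stats.update(chunk)
print(stats.count, stats.mean, stats.variance)
```

The same loop shape applies to incremental training: each chunk updates the model state and is then discarded.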
Extensible Architecture
- Plugin system for custom components
- Event-driven instrumentation
- Type-safe configuration with Pydantic
- Thread-safe component registry
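A thread-safe component registry can be as small as a dict guarded by a lock. The sketch below is one possible shape under stated assumptions: the class and method names are hypothetical, and the project's own registry in `core/registry.py` may look quite different.

```python
import threading
from typing import Any, Callable


class ComponentRegistry:
    """Name-to-factory mapping that is safe to use from multiple threads."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._factories: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, factory: Callable[..., Any], *, overwrite: bool = False) -> None:
        with self._lock:
            if name in self._factories and not overwrite:
                raise KeyError(f"component {name!r} is already registered")
            self._factories[name] = factory

    def create(self, name: str, **params: Any) -> Any:
        with self._lock:
            if name not in self._factories:
                raise KeyError(f"unknown component {name!r}")
            factory = self._factories[name]
        # Call the factory outside the lock so slow constructors do not block others.
        return factory(**params)

    def names(self) -> list[str]:
        with self._lock:
            return sorted(self._factories)


registry = ComponentRegistry()
threads = [
    threading.Thread(target=registry.register, args=(f"model_{i}", dict))
    for i in range(20)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(registry.names()))
```

Here `dict` is used as a stand-in factory; a plugin would register a callable that builds its own preprocessor or model.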
# Clone the repository
git clone https://github.com/Jainam1673/AutoML.git
cd AutoML
# Install with uv (fastest, recommended)
uv pip install -r requirements.txt
# Or install with pip
pip install -r requirements.txt
# Verify installation
python -c "import sys; sys.path.insert(0, 'src'); from automl import __version__; print(f'✅ AutoML {__version__}')"

Detailed installation guide: See INSTALL.md for complete instructions, troubleshooting, and GPU setup.
What gets installed: 164 packages including scikit-learn, xgboost, lightgbm, catboost, optuna, mlflow, streamlit, fastapi, and more.
Requirements: Python 3.13+
# Run AutoML with a configuration file
python -m automl run --config configs/iris_classification.yaml
# Or use the CLI directly
automl run --config configs/iris_classification.yaml
# Validate configuration
automl validate configs/iris_classification.yaml
# Display system information
automl info
# Show version
automl version

from automl.core.engine import default_engine
from automl.core.config import (
    AutoMLConfig,
    DatasetConfig,
    PipelineConfig,
    ModelConfig,
    OptimizerConfig,
    PreprocessorConfig,
)

# Create configuration
config = AutoMLConfig(
    run_name="iris_classification",
    dataset=DatasetConfig(name="iris"),
    pipeline=PipelineConfig(
        preprocessors=[
            PreprocessorConfig(name="standard_scaler"),
            PreprocessorConfig(name="pca", params={"n_components": 3}),
        ],
        model=ModelConfig(
            name="xgboost_classifier",
            base_params={"n_estimators": 100},
            search_space=[
                {"max_depth": 5, "learning_rate": 0.1},
                {"max_depth": 10, "learning_rate": 0.05},
            ],
        ),
    ),
    optimizer=OptimizerConfig(
        name="optuna",
        cv_folds=5,
        scoring="accuracy",
        params={"n_trials": 50, "sampler": "tpe"},
    ),
)

# Run optimization
engine = default_engine()
results = engine.run(config)
print(f"Best Score: {results['best_score']:.4f}")
print(f"Best Parameters: {results['best_params']}")

- NumPy 2.1+ - High-performance numerical computing
- Pandas 2.2+ - Data manipulation and analysis
- Scikit-learn 1.5+ - Machine learning algorithms
- Optuna 4.1+ - Hyperparameter optimization
- XGBoost 2.1+ - Gradient boosting framework
- LightGBM 4.5+ - Fast gradient boosting
- CatBoost 1.2+ - Gradient boosting with categorical support
- Pydantic 2.10+ - Data validation and settings
- Rich 13.9+ - Beautiful terminal output
- Typer 0.15+ - Modern CLI framework
# GPU acceleration
uv sync --extra gpu # PyTorch, CUDA, cuML
# Distributed computing
uv sync --extra distributed # Ray, Dask
# Vision tasks
uv sync --extra vision # timm, torchvision, albumentations
# NLP tasks
uv sync --extra nlp # transformers, sentence-transformers
# Time series
uv sync --extra timeseries # Prophet, NeuralProphet, Darts
# AutoGluon integration
uv sync --extra autogluon
# REST API
uv sync --extra api # FastAPI, Redis, Celery
# Everything
uv sync --extra all

automl/
├── src/automl/
│   ├── core/                      # Core engine and orchestration
│   │   ├── engine.py              # Main AutoML engine
│   │   ├── config.py              # Pydantic configuration models
│   │   ├── events.py              # Event system for monitoring
│   │   └── registry.py            # Component registry
│   ├── datasets/                  # Dataset providers
│   │   ├── base.py                # Dataset abstractions
│   │   └── builtin.py             # Built-in datasets
│   ├── models/                    # Model factories
│   │   ├── sklearn.py             # Scikit-learn models
│   │   ├── boosting.py            # XGBoost, LightGBM, CatBoost
│   │   └── ensemble.py            # Ensemble strategies
│   ├── optimizers/                # Hyperparameter optimizers
│   │   ├── random_search.py       # Random search
│   │   └── optuna_optimizer.py    # Optuna-based optimizers
│   ├── pipelines/                 # Pipeline builders
│   │   ├── sklearn.py             # Scikit-learn preprocessing
│   │   └── advanced.py            # Advanced feature engineering
│   ├── explainability/            # Model interpretation
│   │   └── __init__.py            # SHAP, LIME, feature importance
│   └── cli.py                     # Command-line interface
├── configs/                       # Example configurations
│   ├── iris_classification.yaml
│   ├── advanced_ensemble.yaml
│   └── gpu_accelerated.yaml
├── examples/                      # Usage examples
│   └── complete_workflow.py
├── tests/                         # Unit and integration tests
├── docs/                          # Documentation
└── pyproject.toml                 # Project metadata
run_name: "iris_classification"
dataset:
  name: "iris"
pipeline:
  preprocessors:
    - name: "standard_scaler"
  model:
    name: "xgboost_classifier"
    base_params:
      n_estimators: 100
    search_space:
      - max_depth: 5
        learning_rate: 0.1
      - max_depth: 10
        learning_rate: 0.05
optimizer:
  name: "optuna"
  cv_folds: 5
  scoring: "accuracy"
  params:
    n_trials: 50

run_name: "gpu_boosting"
pipeline:
  model:
    name: "catboost_classifier"
    base_params:
      task_type: "GPU"
      devices: "0"
      iterations: 1000
optimizer:
  params:
    n_trials: 200
    sampler: "cmaes"
    n_jobs: 4

from automl.explainability import create_explainer
# Create SHAP explainer
explainer = create_explainer(
    model=trained_model,
    method="shap",
    background_data=X_train,
)

# Explain single prediction
explanation = explainer.explain_instance(X_test[0])

# Get global feature importance
global_importance = explainer.explain_global()

from automl.core.events import CandidateEvaluated

def on_candidate_evaluated(event: CandidateEvaluated):
    print(f"Trial {event.candidate_index}: {event.score:.4f}")

engine.instrumentation.events.subscribe(
    CandidateEvaluated,
    on_candidate_evaluated
)

# Register custom preprocessor
engine.register_preprocessor(
    "my_scaler",
    my_scaler_factory,
    description="Custom scaling strategy"
)

# Register custom model
engine.register_model(
    "my_model",
    my_model_factory,
    description="Custom model implementation"
)

# Install development dependencies
uv sync --group dev
# Run tests
uv run pytest
# Format code
uv run ruff format src/
# Type checking
uv run mypy src/
# Build documentation
uv sync --group docs
uv run mkdocs serve

Coming soon: Performance comparisons with AutoGluon, TPOT, and H2O AutoML.
Contributions are welcome! Please read our contributing guidelines and code of conduct.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with love using cutting-edge technologies:
- Optuna - Hyperparameter optimization framework
- XGBoost, LightGBM, CatBoost - Gradient boosting frameworks
- Ray - Distributed computing
- SHAP - Model explainability
- Rich - Beautiful terminal UI
- Pydantic - Data validation
- uv - Blazing fast Python package manager
For questions, issues, or suggestions, please open an issue on GitHub.
Made with ❤️ and over-engineering