# Customer Churn Prediction MLOps Pipeline

A production-grade machine learning operations (MLOps) pipeline for predicting customer churn, featuring automated training, experiment tracking, a model registry, and real-time inference through a web application.
## Table of Contents

- Overview
- Architecture
- Technology Stack
- Project Structure
- Installation
- Pipeline Components
- Usage
- Model Registry
- Deployment
- API Reference
- Contributing
- License
## Overview

This project implements an end-to-end MLOps solution for customer churn prediction. It demonstrates industry best practices for:
- Reproducible ML Pipelines: Orchestrated workflows with ZenML
- Experiment Tracking: Comprehensive logging with MLflow on DagsHub
- Model Versioning: Centralized model registry for governance
- Automated Deployment: Quality-gated production deployments
- Real-time Inference: Interactive web application for predictions
Key features:

- Automated data validation and preprocessing
- Multiple model training with hyperparameter optimization
- Quality gates ensuring only high-performing models reach production
- Real-time single and batch predictions
- Model performance monitoring and drift detection capabilities
## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ MLOps Pipeline Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Data │───▶│ Feature │───▶│ Model │───▶│ Model │ │
│ │ Ingestion │ │ Engineering│ │ Training │ │ Evaluation│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ ZenML Orchestration ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ MLflow Experiment Tracking (DagsHub) ││
│ │ Parameters │ Metrics │ Artifacts │ Model Registry ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ Quality Gate (≥85%) │ │
│ └──────────────────────────┘ │
│ │ │
│ ┌───────────┴───────────┐ │
│ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Deploy │ │ Reject │ │
│ └────────────┘ └────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────┐ │
│ │ Streamlit Web Application │ │
│ │ (Real-time Predictions) │ │
│ └────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Technology Stack

| Category | Technology | Purpose |
|---|---|---|
| ML Framework | scikit-learn | Model training and inference |
| Pipeline Orchestration | ZenML | ML workflow management |
| Experiment Tracking | MLflow | Metrics, parameters, and artifact logging |
| Model Registry | MLflow (DagsHub) | Model versioning and governance |
| Data Processing | Pandas, NumPy | Data manipulation and analysis |
| Web Application | Streamlit | Interactive prediction interface |
| Cloud Platform | DagsHub | MLflow hosting and collaboration |
| Deployment | Streamlit Cloud | Application hosting |
| Version Control | Git, DVC | Code and data versioning |
## Project Structure

```
churn-pipeline/
├── app.py                        # Streamlit web application
├── run_pipeline.py               # Main pipeline entry point
├── run_experiments.py            # Experiment runner for model comparison
├── requirements.txt              # Python dependencies
│
├── pipelines/
│   ├── trainning_pipeline.py     # Training pipeline definition
│   ├── deployement_pipeline.py   # Deployment pipeline with quality gates
│   └── inference_pipeline.py     # Batch inference pipeline
│
├── steps/
│   ├── ingest_data.py            # Data ingestion step
│   ├── clean_data.py             # Data preprocessing step
│   ├── train_model.py            # Model training step
│   ├── evaluate_model.py         # Model evaluation step
│   ├── deployment_steps.py       # Deployment-specific steps
│   └── config.py                 # Model configurations
│
├── src/
│   ├── ingest_util.py            # Data ingestion utilities
│   ├── clean_util.py             # Data cleaning utilities
│   ├── model_util.py             # Model training utilities
│   └── evaluation_util.py        # Evaluation metrics utilities
│
├── data/
│   └── customer_churn_dataset.zip
│
├── models/                       # Local model artifacts
├── mlruns/                       # Local MLflow tracking (development)
│
├── analysis/
│   └── churn_prediction.ipynb    # Exploratory data analysis
│
└── .streamlit/
    └── secrets.toml              # Streamlit secrets (not in git)
```
## Installation

### Prerequisites

- Python 3.10+
- pip or conda package manager
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/Amanuel-1/churn-pipeline.git
   cd churn-pipeline
   ```

2. Create and activate a virtual environment

   ```bash
   python -m venv env
   source env/bin/activate  # On Windows: env\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Initialize ZenML

   ```bash
   zenml init
   ```

5. Configure DagsHub authentication (for remote tracking)

   ```bash
   export DAGSHUB_USER_TOKEN="your-token"
   ```
## Pipeline Components

### Training Pipeline

The training pipeline handles model development and experimentation. It ingests raw data, validates and preprocesses it, trains the specified model, evaluates performance metrics, and logs everything to MLflow for tracking and comparison.
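Stripped of the ZenML `@step`/`@pipeline` decorators, the step chain can be sketched roughly as follows. The column names, toy data, and function bodies here are illustrative, not the project's actual code:

```python
# Standalone sketch of the training pipeline's step chain
# (ingest -> clean -> train -> evaluate), without ZenML orchestration.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def ingest_data() -> pd.DataFrame:
    # Stand-in for reading data/customer_churn_dataset.zip
    return pd.DataFrame({
        "Age":           [25, 40, 33, 58, 47, 29, 61, 36],
        "Tenure":        [3, 24, 12, 48, 36, 6, 55, 18],
        "Support Calls": [7, 1, 3, 0, 2, 8, 1, 4],
        "Churn":         [1, 0, 0, 0, 0, 1, 0, 1],
    })

def clean_data(df: pd.DataFrame):
    # Separate features from the target and hold out a test split
    X = df.drop(columns=["Churn"])
    y = df["Churn"]
    return train_test_split(X, y, test_size=0.25, random_state=42)

def train_model(X_train, y_train):
    model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
    model.fit(X_train, y_train)
    return model

def evaluate_model(model, X_test, y_test) -> float:
    return accuracy_score(y_test, model.predict(X_test))

X_train, X_test, y_train, y_test = clean_data(ingest_data())
model = train_model(X_train, y_train)
accuracy = evaluate_model(model, X_test, y_test)
```

In the real pipeline each of these functions is a ZenML step, so its inputs, outputs, and metrics are cached and logged to MLflow automatically.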
### Deployment Pipeline

The deployment pipeline includes quality gates to ensure only high-performing models reach production. Models must meet a minimum accuracy threshold (default 85%) before being registered in the model registry and deployed.
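The gate itself is a simple threshold check; a minimal sketch (the function and metric names are illustrative, the project's actual step lives in `steps/deployment_steps.py`):

```python
# Quality gate: promote a model only when its accuracy clears the threshold.
def quality_gate(metrics: dict, min_accuracy: float = 0.85) -> str:
    """Return 'deploy' if the model clears the gate, else 'reject'."""
    if metrics.get("accuracy", 0.0) >= min_accuracy:
        return "deploy"
    return "reject"

print(quality_gate({"accuracy": 0.91}))        # deploy
print(quality_gate({"accuracy": 0.80}))        # reject
print(quality_gate({"accuracy": 0.88}, 0.90))  # reject (custom threshold)
```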
### Supported Models

| Model | Configuration Key | Default Hyperparameters |
|---|---|---|
| Random Forest | `RandomForest` | `n_estimators=100, max_depth=None` |
| Logistic Regression | `LogisticRegression` | `C=1.0, max_iter=100` |
| Gradient Boosting | `GradientBoosting` | `n_estimators=100, learning_rate=0.1` |
| Support Vector Machine | `SVM` | `C=1.0, kernel='rbf'` |
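The table above could map to scikit-learn estimators along these lines; the actual mapping lives in `steps/config.py` and may differ:

```python
# Illustrative configuration-key -> estimator factory mapping,
# using the default hyperparameters from the table above.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

MODEL_CONFIGS = {
    "RandomForest": lambda: RandomForestClassifier(n_estimators=100, max_depth=None),
    "LogisticRegression": lambda: LogisticRegression(C=1.0, max_iter=100),
    "GradientBoosting": lambda: GradientBoostingClassifier(n_estimators=100, learning_rate=0.1),
    "SVM": lambda: SVC(C=1.0, kernel="rbf", probability=True),
}

# Look up and instantiate a fresh, unfitted estimator by key
model = MODEL_CONFIGS["RandomForest"]()
```

Using factories (lambdas) rather than shared instances ensures each experiment starts from an unfitted estimator.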
## Usage

### Training

```bash
# Train with default settings (Gradient Boosting)
python run_pipeline.py --mode train

# Train with a specific model
python run_pipeline.py --mode train --model RandomForest
```

### Running Experiments

Compare multiple models and hyperparameter configurations:

```bash
python run_experiments.py
```

This executes predefined experiments including:
- Random Forest (baseline, deep trees, shallow trees)
- Logistic Regression (baseline, high regularization)
- Gradient Boosting (baseline, slow learner, fast learner)
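The `run_pipeline.py` commands in this section suggest a small CLI; a hedged `argparse` sketch (the real entry point may define its flags differently):

```python
# Hypothetical CLI for run_pipeline.py, inferred from the commands
# shown in this README; flag names and defaults are assumptions.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Churn pipeline runner")
    parser.add_argument("--mode", choices=["train", "deploy", "inference"],
                        default="train", help="Which pipeline to run")
    parser.add_argument("--model", default="GradientBoosting",
                        help="Configuration key of the model to train")
    parser.add_argument("--min-accuracy", type=float, default=0.85,
                        help="Quality-gate threshold for deployment")
    return parser

# Parse a sample invocation (equivalent to the deploy command below)
args = build_parser().parse_args(["--mode", "deploy", "--min-accuracy", "0.90"])
```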
### Deployment

```bash
# Deploy with default 85% accuracy threshold
python run_pipeline.py --mode deploy

# Deploy with a custom threshold
python run_pipeline.py --mode deploy --min-accuracy 0.90
```

### Batch Inference

```bash
python run_pipeline.py --mode inference
```

### Web Application

```bash
streamlit run app.py
```

## Model Registry

Models are registered and versioned in MLflow hosted on DagsHub:
- Registry URL: https://dagshub.com/Amanuel-1/churn-pipeline.mlflow
- Model Name: `churn_predictor_model`
The model lifecycle has four stages:

1. Training: Models are trained and logged with metrics
2. Evaluation: Performance is assessed against quality thresholds
3. Registration: Passing models are registered in the model registry
4. Deployment: Registered models are deployed to production
## Deployment

The application is deployed on Streamlit Cloud with the following configuration:
- Repository: Connected to GitHub repository
- Main file: `app.py`
- Requirements: `requirements-streamlit.txt`
### Secrets

Configure the following secrets in Streamlit Cloud:

| Variable | Description |
|---|---|
| `DAGSHUB_USER_TOKEN` | DagsHub authentication token |
| `DAGSHUB_USERNAME` | DagsHub username |
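In the app, these secrets are typically read via `st.secrets` on Streamlit Cloud, with environment variables as a local fallback; a sketch (the helper name is illustrative, not the app's actual code):

```python
# Resolve a credential from st.secrets when running on Streamlit Cloud,
# falling back to environment variables for local development.
import os

def get_secret(name, default=None):
    try:
        import streamlit as st  # available on Streamlit Cloud
        if name in st.secrets:
            return st.secrets[name]
    except Exception:
        pass  # streamlit not installed, or no secrets.toml present
    return os.environ.get(name, default)

token = get_secret("DAGSHUB_USER_TOKEN")
```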
Access the deployed application: https://churn-pipeline-grcgpc5y4pu5glea3r2fwr.streamlit.app/
## API Reference

### Input Features

| Feature | Type | Description | Range |
|---|---|---|---|
| Gender | Categorical | Customer gender | Male, Female |
| Age | Integer | Customer age | 18-80 |
| Tenure | Integer | Months as customer | 1-60 |
| Usage Frequency | Integer | Monthly usage count | 1-30 |
| Support Calls | Integer | Support tickets raised | 0-10 |
| Payment Delay | Integer | Days of payment delay | 0-30 |
| Subscription Type | Categorical | Plan type | Basic, Standard, Premium |
| Contract Length | Categorical | Contract duration | Monthly, Quarterly, Annual |
| Total Spend | Float | Total amount spent ($) | 0-10000 |
| Last Interaction | Integer | Days since last interaction | 1-30 |
### Prediction Output

| Field | Type | Description |
|---|---|---|
| Prediction | String | "Churn" or "No Churn" |
| Churn Probability | Float | Probability score (0.0 - 1.0) |
| Risk Factors | List | Identified risk factors for the customer |
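Assembling a response with these fields might look like the following; the 0.5 decision threshold and the risk-factor rules are assumptions for illustration, not the app's exact logic:

```python
# Build a prediction response with the fields from the table above.
def build_response(churn_probability: float, customer: dict) -> dict:
    risk_factors = []
    # Hypothetical rules tying input features to risk factors
    if customer.get("Support Calls", 0) >= 5:
        risk_factors.append("High number of support calls")
    if customer.get("Payment Delay", 0) >= 15:
        risk_factors.append("Long payment delays")
    if customer.get("Tenure", 0) <= 6:
        risk_factors.append("Short customer tenure")
    return {
        "Prediction": "Churn" if churn_probability >= 0.5 else "No Churn",
        "Churn Probability": round(churn_probability, 3),
        "Risk Factors": risk_factors,
    }

response = build_response(0.82, {"Support Calls": 7, "Payment Delay": 20, "Tenure": 4})
```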
### Model Metrics

- Accuracy: Overall prediction correctness
- Precision: Proportion of predicted churners that actually churned
- Recall: Proportion of actual churners that were correctly identified
- F1 Score: Harmonic mean of precision and recall
- ROC-AUC: Area under the receiver operating characteristic curve
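These five metrics can be computed with scikit-learn; the labels below are toy values for illustration only:

```python
# Compute the pipeline's five evaluation metrics on toy labels.
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                      # actual churn labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                      # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]      # churn probabilities

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is computed from probabilities, not hard predictions
    "roc_auc": roc_auc_score(y_true, y_score),
}
```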
### Experiment Tracking

All experiments are tracked in MLflow with hyperparameters, performance metrics, model artifacts, and training metadata.

View experiments: [MLflow Dashboard](https://dagshub.com/Amanuel-1/churn-pipeline.mlflow)
## Contributing

Contributions are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a Pull Request
Development guidelines:

- Follow PEP 8 style guidelines
- Add unit tests for new functionality
- Update documentation as needed
- Ensure all pipelines pass before submitting PR
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments

- ZenML for ML pipeline orchestration
- MLflow for experiment tracking
- DagsHub for MLflow hosting
- Streamlit for the web application framework
Author: Amanuel
Contact: GitHub
Project Link: https://github.com/Amanuel-1/churn-pipeline