A Flask-based data science application for statistical outlier detection, visualization, and cleaning.
The app provides interactive insights into normal distribution analysis using the Empirical Rule (68-95-99.7) and applies Winsorization for robust outlier treatment.
- Outlier Detection (Z-Score Method):
  - Identifies extreme values beyond ±3σ using normal distribution properties.
  - Provides statistical insights (Mean, Standard Deviation, Standard Error).
- Interactive PDF Plot:
  - Visualizes probability density function with highlighted outliers.
  - Built using Plotly for dynamic graph rendering.
- Winsorization (Robust Data Cleaning):
  - Applies 5% Winsorization on dataset for treating extreme values.
  - Displays original vs cleaned data side by side.
- Web Application (Flask):
  - Clean UI with separate pages for analysis and cleaned dataset view.
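
The curve behind the PDF plot can be sketched without any plotting library. The snippet below is only an illustration, not the app's code: `pdf_curve`, its parameters, and the example mean and standard deviation are hypothetical, and it uses the standard library's `NormalDist` instead of SciPy so that it runs without dependencies.

```python
from statistics import NormalDist


def pdf_curve(mean, std, points=201, span=4.0):
    """Return (x, density) pairs covering mean ± span * std.

    Hypothetical helper: the resulting pairs are what a plotting
    library would draw as the probability density curve.
    """
    dist = NormalDist(mu=mean, sigma=std)
    start = mean - span * std
    step = 2 * span * std / (points - 1)
    xs = [start + i * step for i in range(points)]
    return [(x, dist.pdf(x)) for x in xs]


# Assumed example parameters, not taken from the project's data.
curve = pdf_curve(mean=50.0, std=2.0)
peak_x, peak_y = max(curve, key=lambda point: point[1])
```

Outliers would then be drawn as separate markers at their own `(x, dist.pdf(x))` positions, which places them on the tails of the same curve.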
- Flask → Web framework for routing and UI integration.
- Pandas → Data manipulation and analysis (`outlier.csv`, `dataset.csv`).
- NumPy → Numerical computations, percentiles, and array processing.
- SciPy (stats) → Normal distribution (PDF), SEM, and Z-score calculation.
- Plotly Graph Objects → Interactive probability density function plots.
- Plotly I/O → Conversion of plots into HTML for Flask rendering.
- OS Module → File handling and dataset existence checks.
- Load dataset (`outlier.csv`).
- Compute statistics → Mean, Standard Deviation, Standard Error.
- Apply Empirical Rule (68-95-99.7) to interpret spread.
- Detect outliers where |Z-Score| > 3.
- Visualize via interactive PDF plot with highlighted outliers.
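
The detection steps above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the code in `app.py`: the function names and the sample data are hypothetical, and it uses the sample standard deviation (ddof = 1), which is an assumption about how the app computes it.

```python
import math
import statistics


def summarize(values):
    """Return mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample std (ddof=1); an assumption
    sem = std / math.sqrt(len(values))
    return mean, std, sem


def empirical_bands(mean, std):
    """Intervals expected to hold about 68%, 95%, and 99.7% of normal data."""
    return {k: (mean - k * std, mean + k * std) for k in (1, 2, 3)}


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean, std, _ = summarize(values)
    return [v for v in values if abs((v - mean) / std) > threshold]


# Hypothetical sample: 30 readings near 50 plus one extreme value.
sample = [48.0, 50.0, 52.0] * 10 + [120.0]
mean, std, sem = summarize(sample)
bands = empirical_bands(mean, std)
outliers = zscore_outliers(sample)
```

Note that a single extreme value inflates both the mean and the standard deviation, so in very small samples a Z-score threshold of 3 may never be reached.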
- Load dataset (`dataset.csv`) or generate a dummy dataset if the file is not found.
- Apply 5% Winsorization → clip values below the 5th percentile and above the 95th percentile.
- Display side-by-side comparison: Original vs Winsorized values.
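
The clipping step can be illustrated as follows. This is a sketch under stated assumptions rather than the app's implementation: `winsorize` and the sample list are hypothetical, and the percentiles come from the standard library's `statistics.quantiles` (linear interpolation between data points) instead of NumPy so the snippet runs on its own.

```python
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the [lower_pct, upper_pct] percentile range."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical dummy dataset with one low and one high extreme value.
original = [1] + list(range(10, 29)) + [100]
cleaned = winsorize(original)

# Side-by-side comparison: original vs Winsorized values.
for before, after in zip(original, cleaned):
    print(f"{before:>8} {after:>8}")
```

Unlike trimming, Winsorization keeps every row: extreme values are pulled in to the percentile boundaries, so the dataset length is unchanged.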
├── app.py # Main Flask application
├── templates/
│   ├── index.html # Homepage for outlier detection
│   └── clean_data.html # Winsorized dataset view
├── static/ # CSS, JS, images if required
├── dataset.csv # Input dataset for Winsorization (optional)
├── outlier.csv # Input dataset for outlier detection
└── README.md # Project documentation
- Clone the repository:
  `git clone https://github.com/Imswappy/WinsorWeb.git`
  `cd WinsorWeb`
- Run the Flask app:
  `python app.py`
- Open a browser and navigate to:
  http://127.0.0.1:5000/
- PDF Plot with Outliers Highlighted
- Table of Outliers
This application combines statistics, visualization, and robust data cleaning into a seamless web interface.
It is particularly useful for data preprocessing, anomaly detection, and exploratory data analysis (EDA) in machine learning pipelines.
