A machine learning-based system that analyzes health parameters to predict the risk of heart disease.
This project focuses on analyzing and predicting heart disease risk using healthcare data. It covers the full data pipeline from raw dataset → cleaned dataset → analysis → machine learning model.
- File: Data_Heart Problem_risk.csv
- Contains raw medical and lifestyle data
- May include:
  - Missing values
  - Inconsistent entries
  - Noise
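The issues listed above can be surfaced with a quick quality report before any cleaning. The following is a minimal sketch, not the project's actual code: the helper `summarize_quality` and the column names (`age`, `smoker`, `cholesterol`) are hypothetical, and a tiny synthetic frame stands in for the raw CSV.

```python
import numpy as np
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count, and distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(dropna=True),
    })


# Synthetic stand-in for the raw data; column names are assumptions.
raw = pd.DataFrame({
    "age": [52, 61, np.nan, 45, 52],
    "smoker": ["yes", "Yes", "no", None, "yes"],   # inconsistent casing
    "cholesterol": [210.0, 999.0, 185.0, 240.0, 210.0],  # 999.0 looks like noise
})

report = summarize_quality(raw)
duplicate_rows = int(raw.duplicated().sum())

print(report)
print("duplicate rows:", duplicate_rows)
```

To run the same report on the real file, `raw` could be loaded with `pd.read_csv("Data_Heart Problem_risk.csv")` instead; a `unique` count that is higher than expected for a yes/no column is a hint of inconsistent entries.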
- File: Data_Heart_Problem_Risk_Cleaned.csv
- Data preprocessing steps include:
  - Handling missing values
  - Removing duplicates
  - Encoding categorical variables
  - Feature scaling (if applied)
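The preprocessing steps above might be combined as in the sketch below. This is an illustration under stated assumptions, not the code that produced the cleaned file: the function `clean`, the target column `risk`, the feature columns, and the choice of median and most-frequent imputation are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean(df: pd.DataFrame, target: str, scale: bool = True) -> pd.DataFrame:
    """Normalize text, drop duplicates, impute, one-hot encode, optionally scale."""
    df = df.copy()
    categorical = list(df.select_dtypes(exclude="number").columns)
    numeric = [c for c in df.select_dtypes(include="number").columns if c != target]

    # Normalize text first so "Yes " and "yes" count as the same value.
    for col in categorical:
        df[col] = df[col].str.strip().str.lower()
    df = df.drop_duplicates().reset_index(drop=True)

    # Impute: median for numeric columns, most frequent value for categorical ones.
    for col in numeric:
        df[col] = df[col].fillna(df[col].median())
    for col in categorical:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # One-hot encode categorical columns, dropping one level to avoid redundancy.
    df = pd.get_dummies(df, columns=categorical, drop_first=True, dtype=int)

    # Standardize numeric features (zero mean, unit variance) if requested.
    if scale:
        df[numeric] = StandardScaler().fit_transform(df[numeric])
    return df


# Synthetic stand-in for the raw data; column names are assumptions.
raw = pd.DataFrame({
    "age": [52.0, 61.0, np.nan, 45.0, 52.0],
    "smoker": ["yes", "Yes ", "no", None, "yes"],
    "risk": [1, 1, 0, 0, 1],
})

cleaned = clean(raw, target="risk")
print(cleaned)
```

Scaling is kept behind a flag because the list marks it as optional. When the cleaned data later feeds a model, fitting the scaler on the training split only is usually the safer choice.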
- File: Heart_Problem_Analysis.py
- Includes:
  - Exploratory Data Analysis (EDA)
  - Data visualization
  - Statistical insights
  - Identification of key risk factors
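One simple way to approach the last item above is to rank numeric features by their correlation with the target. The sketch below is an assumption about how that could look, not an excerpt from `Heart_Problem_Analysis.py`: the helper `rank_risk_factors`, the column names, and the synthetic data (where risk is built to rise with age and cholesterol) are all hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data: risk depends on age and cholesterol, not on daily_steps.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(55, 10, n)
cholesterol = rng.normal(220, 30, n)
daily_steps = rng.normal(6000, 2000, n)
logit = 0.08 * (age - 55) + 0.02 * (cholesterol - 220)
risk = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

df = pd.DataFrame({
    "age": age,
    "cholesterol": cholesterol,
    "daily_steps": daily_steps,
    "risk": risk,
})


def rank_risk_factors(df: pd.DataFrame, target: str) -> pd.Series:
    """Correlation of each numeric feature with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranking = rank_risk_factors(df, "risk")
group_means = df.groupby("risk").mean()  # feature averages per risk class

print(ranking)
print(group_means)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a starting point for the visual and statistical analysis rather than a conclusion.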
- File: Heart_Problem-ML.ipynb
- Contains:
  - Data loading
  - Feature selection
  - Model training
  - Model evaluation
Models used:
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
Objectives:
- Analyze heart disease risk factors
- Perform data visualization and insights
- Build predictive machine learning models
- Evaluate model performance and accuracy
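Building and evaluating the four model types named in this README could look roughly like the sketch below. It is not the notebook's code: the data comes from scikit-learn's `make_classification` as a stand-in for the cleaned dataset, and the hyperparameters, the 75/25 split, and the choice of accuracy and F1 as metrics are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the cleaned dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models get a scaler fitted on the training split only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, "
          f"f1={results[name]['f1']:.3f}")
```

Reporting F1 next to accuracy is a deliberate choice here: if the real dataset has far fewer high-risk than low-risk cases, accuracy alone can look good for a model that rarely predicts the high-risk class.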
Install dependencies:
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
Run the analysis script:
python Heart_Problem_Analysis.py
Open the notebook:
jupyter notebook Heart_Problem-ML.ipynb
Workflow:
- Load raw dataset
- Clean and preprocess data
- Perform data analysis
- Train machine learning models
- Evaluate results
- This project is for educational purposes only
- Not intended for medical diagnosis
- Consult professionals for real-world medical decisions
Future improvements:
- Hyperparameter tuning
- Model comparison dashboard
- Deployment using Flask/Streamlit
- Integration with real-time data
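As an illustration of the hyperparameter tuning item in the list above, a grid search with cross-validation is one common starting point. The sketch below assumes a Random Forest and a deliberately small, hypothetical parameter grid, and uses synthetic data from `make_classification` rather than the project's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the cleaned dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Hypothetical grid; real ranges would depend on the dataset.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,                # 3-fold cross-validation per parameter combination
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds the model refitted on all the supplied data with the best parameters, so it can be evaluated on a held-out test split.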
Author: Archita J Laxman
This project is licensed under the MIT License.