Depression among students is a serious concern affecting academic performance and overall well-being.
This project analyzes student data, explores patterns through exploratory data analysis (EDA), and builds predictive models to identify students at risk. Ten classifiers (Logistic Regression, Random Forest, SVM, Decision Tree, KNN, Naive Bayes, Gradient Boosting, AdaBoost, Bagging, and XGBoost) are evaluated to find the most effective approach.
Kaggle Notebook: Student Depression Prediction
- Analyze key factors contributing to student depression
- Build predictive ML models to classify depression risk
- Perform EDA to understand feature relationships
- Evaluate model performance using accuracy, precision, recall, and F1-score
- Identify the best-performing machine learning model for prediction
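The four evaluation metrics listed above can be computed with scikit-learn's `sklearn.metrics` functions. A minimal sketch on illustrative labels (the arrays below are made up for the example, not output from the dataset):

```python
# Hedged sketch: computing the project's four evaluation metrics with
# scikit-learn. The label arrays are illustrative, not real model output.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = Depressed, 0 = Not Depressed
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")   # correct / total
print(f"Precision: {precision_score(y_true, y_pred):.4f}")  # TP / (TP + FP)
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")     # TP / (TP + FN)
print(f"F1-score:  {f1_score(y_true, y_pred):.4f}")         # harmonic mean
```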
- Exploratory Data Analysis (EDA), including:
  - Sleep duration vs depression
  - Gender vs depression
  - Financial stress vs suicidal thoughts
  - Work/study hours vs depression
  - Academic pressure vs depression
  - Pair plot of all features
- Multiple ML models for classification
- Performance metrics for each model: accuracy, precision, recall, F1-score
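The tabular side of an EDA comparison like "sleep duration vs depression" can be sketched with a pandas cross-tabulation. The column names and category labels below are assumptions about the dataset schema, and the rows are toy data:

```python
# Hedged EDA sketch: share of depressed students per sleep-duration bracket.
# "Sleep Duration" / "Depression" column names are assumed, data is synthetic.
import pandas as pd

df = pd.DataFrame({
    "Sleep Duration": ["<5 hours", "7-8 hours", "<5 hours", "7-8 hours", "<5 hours"],
    "Depression":     [1, 0, 1, 0, 0],
})

# normalize="index" turns counts into row-wise proportions
table = pd.crosstab(df["Sleep Duration"], df["Depression"], normalize="index")
print(table)
```

The same table is what a seaborn `countplot` or stacked bar chart would visualize in the notebook.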
- Python
- Pandas & NumPy (Data manipulation)
- Matplotlib & Seaborn (Visualization)
- Scikit-learn (Machine Learning models & evaluation)
- XGBoost & AdaBoost (Advanced boosting techniques)
- Jupyter Notebook (Development environment)
- Source: Student Depression Dataset
- Dataset Name: Student Depression Dataset
- Number of Samples: 5,581
- Target Variable: Depression (0 = Not Depressed, 1 = Depressed)
- Features Include:
  - Sleep duration
  - Gender
  - Financial stress
  - Work/study hours
  - Academic pressure
  - Suicidal thoughts
| Model | Accuracy |
|---|---|
| Logistic Regression | 0.8364 |
| Random Forest | 0.8319 |
| Support Vector Machine | 0.8344 |
| Decision Tree | 0.8034 |
| K-Nearest Neighbors | 0.8169 |
| Naive Bayes | 0.8212 |
| Gradient Boosting | 0.8375 |
| AdaBoost | 0.8389 |
| Bagging Classifier | 0.8233 |
| XGBoost | 0.8208 |
- Best Accuracy Achieved: AdaBoost (83.89%)
- Classification reports for all models show high precision, recall, and F1-scores for both depressed and non-depressed classes.
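The accuracy comparison above can be produced with a single training loop over the candidate models. A sketch on a synthetic dataset (the real numbers in the table come from the Student Depression Dataset; only four of the ten models are shown here to keep the example short):

```python
# Hedged sketch of the model-comparison loop: fit each classifier on the
# same train/test split and record its test accuracy. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.4f}")
```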
- EDA:
  - Sleep duration vs depression
  - Gender vs depression
  - Financial stress vs suicidal thoughts
  - Work/study hours vs depression
  - Academic pressure vs depression
  - Pair plot of all features
- Confusion Matrix (one per model):
  - Logistic Regression
  - Random Forest
  - SVM
  - Decision Tree
  - KNN
  - Naive Bayes
  - Gradient Boosting
  - AdaBoost
  - Bagging
  - XGBoost
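Each per-model confusion matrix follows the same pattern: compute it from the test-set predictions, then plot it. A minimal sketch on illustrative labels (in the notebook this would use each fitted model's predictions, typically rendered with `ConfusionMatrixDisplay` or a seaborn heatmap):

```python
# Hedged sketch: building one model's confusion matrix with scikit-learn.
# The label arrays are illustrative, not real predictions.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, cols = predicted
print(cm)  # layout for binary labels: [[TN, FP], [FN, TP]]
```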
Muqadas Ejaz
BS Computer Science (AI Specialization)
AI/ML Engineer
Data Science & Gen AI Enthusiast
Connect with me on LinkedIn
GitHub: github.com/muqadasejaz
Kaggle: Kaggle Profile
This project is open-source and available under the MIT License.
⭐ If you find this project useful, don't forget to star the repository!