Bike Sharing Demand Prediction (OLS & Linear Regression)
Project Overview
This project builds a regression framework to predict bike rental demand using the Bike Sharing dataset. The goal was a model that not only performs well but is also interpretable, giving clear insight into the factors that influence demand. We pair a statsmodels OLS fit, used for inference and feature selection, with a scikit-learn Linear Regression model, used for prediction, to balance explainability and predictive performance.
Problem Statement
Accurately predicting bike rental demand is critical for:
Optimizing inventory and availability
Improving operational efficiency
Enhancing customer experience
This project aims to model demand patterns using historical data and uncover key drivers behind rental behavior.
Workflow & Methodology
- Data Preprocessing
Converted and processed date features
Extracted meaningful variables (e.g., month, temporal indicators)
Identified categorical vs numerical features
Handled missing values and ensured data consistency
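The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on toy rows, not the project's actual code: the column names (`dteday`, `temp`, `weathersit`, `cnt`), the `preprocess` helper, and the median fill are all assumptions.

```python
import pandas as pd

# Toy rows standing in for the real dataset; column names are assumptions.
raw = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-07-04", None],
    "temp": [0.34, None, 0.73, 0.50],
    "weathersit": [2, 2, 1, 1],
    "cnt": [985, 801, 4840, 1200],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse dates, derive temporal features, fill gaps."""
    out = df.copy()
    # Parse the date column; missing or unparseable dates become NaT.
    out["dteday"] = pd.to_datetime(out["dteday"], errors="coerce")
    out = out.dropna(subset=["dteday"])
    # Derive temporal features from the parsed date.
    out["month"] = out["dteday"].dt.month
    out["weekday"] = out["dteday"].dt.dayofweek  # Monday = 0
    out["is_weekend"] = (out["weekday"] >= 5).astype(int)
    # Fill numeric gaps with the column median (one possible strategy).
    out["temp"] = out["temp"].fillna(out["temp"].median())
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean[["dteday", "month", "is_weekend", "temp"]])
```

Dropping rows without a date and filling numeric gaps with the median are only one reasonable choice; the right strategy depends on how much data is missing and why.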
- Feature Engineering
Applied one-hot encoding to categorical variables so the models can use non-numeric categories
Scaled numerical features to put them on a comparable scale, which makes coefficients easier to compare and improves numerical stability
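As a sketch of how encoding and scaling can be combined in one step, the example below uses scikit-learn's `ColumnTransformer`. The feature names and values are made up for illustration, and dropping the first level of each category is an assumption, chosen here because it avoids perfectly collinear dummy columns in an OLS fit with an intercept.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative feature table; names and values are assumptions.
X = pd.DataFrame({
    "season": ["spring", "summer", "fall", "winter", "summer", "fall"],
    "weathersit": ["clear", "mist", "clear", "light_rain", "clear", "mist"],
    "temp": [8.2, 24.5, 18.1, 3.4, 27.9, 15.0],
    "hum": [80.0, 55.0, 62.0, 70.0, 48.0, 66.0],
})
categorical = ["season", "weathersit"]
numerical = ["temp", "hum"]

encoder = ColumnTransformer(
    [
        # drop="first" removes one level per category (avoids the dummy-variable trap).
        ("cat", OneHotEncoder(drop="first"), categorical),
        # Standardize numeric columns to zero mean and unit variance.
        ("num", StandardScaler(), numerical),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_t = encoder.fit_transform(X)
print(X_t.shape)  # 3 season dummies + 2 weather dummies + 2 scaled columns
```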
- Train-Test Split
Split dataset into training and testing sets
Evaluated on held-out data to get an unbiased estimate of how well the model generalizes
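A minimal sketch of the split, using synthetic data. The 80/20 ratio and the fixed `random_state` are assumptions; the README does not state which values were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and target standing in for the prepared dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows for evaluation; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

To keep the evaluation honest, any scaler or encoder should be fitted on the training portion only and then applied to the test portion.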
- OLS Regression (Statistical Modeling)
Built an initial Ordinary Least Squares (OLS) model
Used p-values to evaluate feature significance
Iteratively removed insignificant variables
Addressed multicollinearity (reduced condition number)
Statistical Validation
Residual analysis performed to validate:
Linearity
Normality of residuals
Homoscedasticity
This step checked that the OLS assumptions hold well enough for the coefficients and p-values to be interpreted with confidence.
- Linear Regression (Machine Learning)
Built a sklearn Linear Regression model using selected features
Focused on predictive performance rather than inference
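A minimal sketch of the scikit-learn step on synthetic data. The two features stand in for whichever columns survived the OLS selection, which the README does not list.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data with known coefficients (3, -2) and intercept 4.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (
    4.0
    + 3.0 * X_train[:, 0]
    - 2.0 * X_train[:, 1]
    + rng.normal(scale=0.1, size=200)
)

# Fit ordinary least squares; the intercept is estimated by default.
model = LinearRegression().fit(X_train, y_train)
print(model.intercept_, model.coef_)
```

`LinearRegression` solves the same least-squares problem as statsmodels OLS, so on identical features the coefficients should match; the difference is the interface, which is geared toward prediction rather than inference.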
Performance
R² score of 0.76: the model explains about 76% of the variance in rental demand
Evaluated using absolute error metrics
Model performs well for most observations, with some larger deviations
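For reference, both kinds of metric can be computed with scikit-learn as sketched below. The numbers are made-up toy values, not the project's results, and mean absolute error is assumed as the absolute error metric.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Toy actual vs. predicted rental counts (not the project's data).
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

r2 = r2_score(y_true, y_pred)              # share of variance explained
mae = mean_absolute_error(y_true, y_pred)  # average miss, in rentals

print(f"R2 = {r2:.3f}, MAE = {mae:.1f}")
```

R² summarizes overall fit, while MAE is expressed in the same unit as the target, which makes it easier to relate to operational decisions.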
Key Insights
Demand is strongly influenced by:
Weather conditions
Seasonality
Temporal patterns (weekends vs working days)
Proper feature selection significantly improves model stability
Removing multicollinearity enhances interpretability
Error Analysis
Residuals show:
Mild heteroscedasticity
Slight skewness
Interpretation:
Model captures overall trends well
Some complex/non-linear patterns remain unmodeled
Larger errors occur in edge cases
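One way to make the edge-case observation concrete is to compare errors on extreme-demand observations with errors on the rest. The sketch below does this on toy numbers; the top-decile cutoff is an arbitrary assumption.

```python
import numpy as np
from scipy.stats import skew

# Toy actual vs. predicted values with one extreme-demand day.
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 300.0])
y_pred = np.array([12.0, 19.0, 33.0, 38.0, 52.0, 57.0, 71.0, 83.0, 88.0, 220.0])

resid = y_true - y_pred
resid_skew = skew(resid)  # > 0 means a long tail of under-predictions

# Treat the top decile of actual demand as "extreme" (an arbitrary cutoff).
threshold = np.quantile(y_true, 0.9)
extreme = y_true > threshold

mae_extreme = np.abs(resid[extreme]).mean()
mae_typical = np.abs(resid[~extreme]).mean()
print(f"skew={resid_skew:.2f}, MAE extreme={mae_extreme:.1f}, MAE typical={mae_typical:.2f}")
```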
Model Strengths
Interpretable (thanks to OLS)
Good predictive performance (Linear Regression)
Statistically validated
Handles multicollinearity effectively
Insights align with real-world expectations
Limitations
Assumes linear relationships
Does not fully capture:
Non-linear patterns
Feature interactions
Some prediction errors persist for extreme cases
Future Improvements
Add interaction terms
Apply non-linear transformations
Experiment with advanced models:
Ridge Regression
Lasso Regression
Random Forest / Tree-based models
Incorporate additional features:
Holidays
External/environmental factors
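As a rough sketch of two of these directions combined, the example below adds pairwise interaction terms and fits Ridge and Lasso on top of them. The data is synthetic and built so that the target depends on an interaction; the `with_interactions` helper and the `alpha` values are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic target that depends on an interaction between two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = X[:, 0] + 2.0 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=300)


def with_interactions(estimator):
    """Hypothetical helper: add pairwise interaction terms, scale, then fit."""
    return make_pipeline(
        PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
        StandardScaler(),
        estimator,
    )


plain = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
ridge = with_interactions(Ridge(alpha=1.0)).fit(X, y)
lasso = with_interactions(Lasso(alpha=0.01)).fit(X, y)

# In-sample R2; a real comparison should use held-out data or cross-validation.
print(plain.score(X, y), ridge.score(X, y), lasso.score(X, y))
```

On real data the regularization strength would normally be chosen by cross-validation, for example with `RidgeCV` or `LassoCV`.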
Conclusion
This project delivers a strong baseline model for predicting bike rental demand. It successfully combines:
Rigorous statistical analysis
Thoughtful feature engineering
Practical machine learning implementation
The result is a model that balances simplicity, interpretability, and performance, making it a solid foundation for further enhancements and real-world deployment.
Tech Stack
Python
Pandas & NumPy
Matplotlib & Seaborn
Statsmodels (OLS)
Scikit-learn