Global Commodity Shocks, International Trade Linkages, and Economic Resilience: Causal Impacts and Predictive Modelling of Sectoral Stress
This repository investigates how global commodity shocks—including energy, food, and metal price volatility—propagate through international trade networks and affect sectoral stress and economic resilience. By analyzing trade linkages with major partners such as the U.S., China, the EU, and Gulf economies, the study identifies causal pathways through which external shocks impact agriculture, manufacturing, energy-intensive industries, and exports. Predictive modeling techniques are applied to quantify vulnerabilities and assess resilience under different shock scenarios.
Global commodity volatility can create cascading effects in domestic economies, especially through trade networks. This project integrates high-frequency commodity price data, detailed bilateral trade flows, causal inference methods, and predictive machine learning models to:
- Understand how external shocks transmit through trade linkages.
- Forecast sector-level vulnerabilities using time-series neural networks and gradient boosting.
- Develop policy-relevant resilience metrics for key economic sectors.
Unlike prior work, which often focuses on aggregate effects or isolated shocks, this study combines multiple data dimensions to provide a granular, integrated view of shock propagation and sectoral resilience.
- Global commodity prices (energy, food, metals): World Bank Pink Sheet
- Bilateral trade flows: UN Comtrade
- Sectoral output and value-added: MOSPI
- Exchange rates and macroeconomic controls: RBI database
- Partner-country macro indicators: OECD.Stat and IMF Direction of Trade Statistics
- Data Collection and Preprocessing
  - Aggregate commodity prices, trade flows, and sectoral output.
  - Identify major trade partners and sector-specific exposures.
- Causal Inference
  - Use instrumental variables and synthetic control methods to identify causal transmission channels.
  - Map the effect of commodity shocks on sectoral performance.
- Predictive Modeling
  - Apply machine learning methods such as tree-based models and gradient boosting to forecast sectoral stress.
  - Evaluate model performance using standard metrics (RMSE, MAE, R², MAPE).
- Resilience Assessment
  - Develop quantitative metrics to assess sectoral resilience under varying shock scenarios.
  - Analyze which sectors are most vulnerable and where trade linkages amplify or dampen shocks.
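The evaluation metrics named above (RMSE, MAE, R², MAPE) can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's evaluation code; the function name `regression_metrics` and the sample values are hypothetical. It also shows why R² can fall far below zero: a model that tracks the shape of the series but sits at the wrong level does worse than predicting the mean.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2, and MAPE (in percent) for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # negative when the model is worse than the mean
    # MAPE is undefined where the actual value is zero, so those rows are skipped.
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2, "mape": mape}

# Hypothetical growth rates (percent); not project data.
actual = [2.0, 4.0, 6.0, 8.0]
good = [2.5, 3.5, 6.5, 7.5]
bad = [12.0, 14.0, 16.0, 18.0]  # right shape, wrong level

metrics_good = regression_metrics(actual, good)
metrics_bad = regression_metrics(actual, bad)
print(metrics_good)
print(metrics_bad)
```

In this toy case the level-shifted forecast has a strongly negative R² even though it moves in step with the actual series, which is the usual signature of a shift between training and test distributions.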
A consolidated results document for the "Global Commodity Shocks and Trade Networks" project summarizes all findings, diagnostics, causal estimates, and visualizations.
Key High-Level Takeaways:
- Team: StatGeeks (Aaron T. Mathew, Preetham VJ, Akarsh T, Anirudh K) — Date: November 2025
- Core Finding: Structural distribution shifts (pre-/post-2020 COVID and Ukraine shock) materially affect predictive generalization; causal and network analyses identify critical chokepoints (Petroleum, Trade) and policy-targetable vulnerable sectors.
- Dataset: Unified master dataset (3,476 rows × 93 variables), production network (~131 sectors, ~3,401 edges).
- Sprint 1 (Data & Network): Constructed complete I-O derived network with technical coefficients, Leontief inverse matrices, and centrality measures.
- Sprint 2 (Causal Analysis): IV analysis estimated that a 10% energy price increase causes a ~4.81% decline in IIP; Betweenness Centrality identified network bottlenecks, and shock multipliers showed more than 2x systemic amplification.
- Sprint 3 (Modeling & Policy): Tree-based models (tuned Random Forest / XGBoost) proved most robust to distribution shifts; Causal ML identified heterogeneous treatment effects across sectors; policy recommendations target Top 5–7 vulnerable sectors for maximum marginal benefit.
For full methodological details, statistical tables, and comprehensive figures, consult the consolidated "Project Results Document" in the deliverables folder or project archive.
global-trade-shocks-analysis/
│
├── README.md
├── requirements.txt
├── .gitignore
│
├── data/
│ ├── raw/ # Original downloaded data (never modify)
│ │ ├── CMO-Historical-Data-Monthly.xlsx
│ │ ├── IMTSTrade.csv
│ │ ├── WITS-Partner.xlsx
│ │ ├── IndexofIndustrialProduction.xlsx
│ │ ├── WholesalePriceIndexMonthlyData.xlsx
│ │ ├── GDP_Constant.xlsx
│ │ ├── GDP_Current.xlsx
│ │ ├── GVA_Current.xlsx
│ │ └── OECD_file.csv
│ │
│ ├── processed/ # Cleaned, transformed data
│ │ ├── proc_cmo_monthly.csv # Commodity prices with shocks
│ │ ├── climate_oni_clean.csv # Climate indices (ONI)
│ │ ├── trade_india_bilateral.csv # Bilateral trade flows
│ │ ├── country_mapping.csv # ISO3 codes and regions
│ │ ├── iso_dataset_enriched.csv # Trade data with ISO codes
│ │ ├── iip_sectoral.csv # Industrial production indices
│ │ ├── wpi_inflation.csv # Wholesale price inflation
│ │ ├── gdp_quarterly.csv # GDP with growth rates
│ │ ├── global_macro.csv # OECD G20 data
│ │ ├── MOSPI Matrix Final - ALL.csv # Input-Output matrix
│ │ ├── MOSPI_Cleaned_non_matrix.xlsx # I-O non-matrix data
│ │ ├── master_dataset.csv # Complete merged dataset
│ │ ├── master_dataset_filtered.csv # Filtered (2010-2024)
│ │ ├── full_ml_dataset.csv # ML-ready dataset with engineered features
│ │ └── master_dataset_columns.csv # Metadata
│ │
│ ├── processed_io_data/ # Network analysis outputs
│ │ ├── technical_coefficients.csv
│ │ ├── leontief_inverse.csv
│ │ ├── production_network_nodes.csv
│ │ ├── production_network_edges.csv
│ │ └── network_metrics.csv
│ │
│ ├── external/ # Third-party datasets (if any)
│ └── data-dictionary.md # Data documentation
│
├── networks/ # Network graph objects
│ ├── trade_network_full.gpickle
│ ├── trade_network_full.graphml
│ ├── trade_network_energy.gpickle
│ ├── trade_network_energy.graphml
│ ├── trade_network_food.gpickle
│ ├── trade_network_food.graphml
│ ├── trade_network_metals.gpickle
│ ├── trade_network_metals.graphml
│ ├── production_network.gpickle
│ ├── production_network.graphml
│ ├── centrality_degree.csv
│ ├── centrality_betweenness.csv
│ ├── centrality_closeness.csv
│ ├── centrality_eigenvector.csv
│ ├── centrality_pagerank.csv
│ ├── centrality_all.csv
│ ├── network_topology_metrics.csv
│ ├── commodity_network_stats.csv
│ └── trade_network.gephi # Gephi project file
│
├── src/ # Source code (Python scripts)
│ ├── __init__.py
│ │
│ ├── data_collection/
│ │ ├── __init__.py
│ │ └── download_worldbank.py # World Bank data fetcher
│ │
│ ├── data_processing/
│ │ ├── __init__.py
│ │ ├── clean_data.py # Complete data cleaning pipeline
│ │ ├── create_master_dataset.py # Master dataset creation
│ │ ├── clean_commodity_prices.py # Commodity price cleaning
│ │ └── README.md # Data processing documentation
│ │
│ ├── network_analysis/
│ │ ├── __init__.py
│ │ ├── process_io_table.py # I-O table processing & network metrics
│ │ ├── build_trade_network.py # Trade network construction
│ │ └── visualize_networks.py # Network visualization utilities
│ │
│ ├── causal_inference/
│ │ ├── __init__.py
│ │ ├── instrumental_variables.py
│ │ ├── synthetic_control.py
│ │ ├── var_granger.py
│ │ └── causal_utils.py
│ │
│ ├── feature_engineering/
│ │ ├── __init__.py
│ │ ├── extract_network_features.py
│ │ ├── create_lag_features.py
│ │ ├── create_volatility_features.py
│ │ ├── create_shock_indicators.py
│ │ ├── create_interaction_features.py
│ │ └── feature_selection.py
│ │
│ ├── models/
│ │ ├── __init__.py
│ │ ├── baseline_models.py
│ │ ├── lstm_model.py
│ │ ├── xgboost_model.py
│ │ ├── gnn_model.py
│ │ ├── ensemble_model.py
│ │ ├── model_evaluation.py
│ │ └── model_utils.py
│ │
│ ├── scenario_analysis/
│ │ ├── __init__.py
│ │ ├── historical_scenarios.py
│ │ ├── counterfactual_scenarios.py
│ │ ├── policy_interventions.py
│ │ └── vulnerability_index.py
│ │
│ ├── visualization/
│ │ ├── __init__.py
│ │ ├── plot_networks.py
│ │ ├── plot_causal_results.py
│ │ ├── plot_model_results.py
│ │ ├── plot_scenarios.py
│ │ └── viz_utils.py
│ │
│ └── dashboard/
│   ├── __init__.py
│   ├── app.py # Main Streamlit app
│   ├── pages/
│   │ ├── 1_Home.py
│   │ ├── 2_Networks.py
│   │ ├── 3_Predictions.py
│   │ └── 4_Scenarios.py
│   └── components/
│     ├── __init__.py
│     ├── network_viz.py
│     ├── prediction_viz.py
│     └── scenario_viz.py
│
├── notebooks/ # Jupyter notebooks for exploration & development
│ ├── README.md # Notebook overview and usage guide
│ ├── s1_DataCleaning.ipynb # Sprint 1: Data cleaning and EDA
│ │ └── Purpose: Exploratory cleaning steps, outlier handling, temporal alignment.
│ │ Outputs: Insights fed into src/data_processing/clean_data.py
│ │
│ ├── s1_IOTableProcessing.ipynb # Sprint 1: I-O table processing
│ │ └── Purpose: Technical coefficients, Leontief inverse, network metrics (degree, betweenness, PageRank).
│ │ Outputs: Network CSV exports, feed into src/network_analysis/process_io_table.py
│ │
│ ├── s1_CreateMasterDataset.ipynb # Sprint 1: Master dataset creation & feature engineering
│ │ └── Purpose: Merge all processed data, I-O sector mapping (22 manufacturing sectors),
│ │ derive interaction terms, lagged variables for econometric analysis.
│ │ Outputs: data/processed/master_dataset.csv (3,476 rows × 93 cols)
│ │
│ ├── s2_CausalAnalysis.ipynb # Sprint 2: Causal inference (IV, Synthetic Control, VAR)
│ │ └── Purpose: Instrumental Variables (2SLS) with ONI & OPEC quotas;
│ │ Synthetic Control for shock events (2008, 2014, 2022);
│ │ VAR/Granger Causality & Impulse Response Analysis.
│ │ Outputs: Causal estimates, IRF plots, robustness checks
│ │
│ ├── s2_NetworkDynamics.ipynb # Sprint 2: Network resilience & bottleneck analysis
│ │ └── Purpose: Shock propagation simulations, centrality-vulnerability linkages,
│ │ production network dynamics under targeted sector failures.
│ │ Outputs: Shock multiplier estimates, network robustness metrics
│ │
│ ├── s3_FeatureEngineering.ipynb # Sprint 3: Advanced feature engineering
│ │ └── Purpose: Create lag features, volatility measures, shock indicators,
│ │ interaction terms; dimensionality reduction (150+ → 50 features).
│ │ Outputs: Feature importance rankings, engineered datasets
│ │
│ ├── s3_TreeBasedModels.ipynb # Sprint 3: Tree-based predictive models
│ │ └── Purpose: End-to-end ML pipeline:
│ │ - Target capping (2σ outlier handling)
│ │ - Train/test split diagnostics (temporal coherence)
│ │ - Feature scaling (StandardScaler)
│ │ - Baseline models (Mean, Linear Regression)
│ │ - Random Forest baseline & hyperparameter tuning (RandomizedSearchCV)
│ │ - XGBoost baseline & tuning with early stopping
│ │ - Feature importance analysis (RF vs XGB comparison)
│ │ - Weighted ensemble optimization
│ │ - Comprehensive error analysis (sector-level, temporal, residuals)
│ │ - Distribution shift diagnostics (KS test, train vs test)
│ │ Outputs: Model artifacts (pkl), comparison tables, diagnostic plots
│ │
│ └── s3_CausalML.ipynb # Sprint 3: Causal Machine Learning (Heterogeneous Effects)
│   └── Purpose: Causal Forests, R-learner, S-learner for heterogeneous treatment effects;
│   vulnerability classification; policy targeting optimization.
│   Outputs: CATE distributions, policy benefit frontier
│
├── models/ # Saved trained models
│ ├── baseline_ols.pkl
│ ├── baseline_rf.pkl
│ ├── lstm_energy.h5
│ ├── lstm_manufacturing.h5
│ ├── lstm_agriculture.h5
│ ├── lstm_services.h5
│ ├── lstm_exports.h5
│ ├── xgboost_main.pkl
│ ├── xgboost_tuned.pkl
│ ├── gnn_production.pt
│ ├── gnn_trade.pt
│ ├── ensemble_stacked.pkl
│ └── model_metadata.json
│
├── outputs/ # All output files (models, figures, tables)
│ ├── models/ # Trained model artifacts
│ │ ├── linear_regression_baseline.pkl
│ │ ├── random_forest_baseline.pkl
│ │ ├── random_forest_tuned.pkl
│ │ ├── xgboost_baseline.pkl
│ │ └── xgboost_tuned.pkl
│ │
│ ├── figures/ # Publication-quality visualizations
│ │ ├── target_distribution_analysis.png
│ │ │ └── Raw vs 2σ-capped histograms & boxplots
│ │ ├── feature_importance_comparison.png
│ │ │ └── Side-by-side top-30 features: Random Forest vs XGBoost
│ │ ├── model_comparison.png
│ │ │ └── Multi-panel comparison (R², RMSE, MAE, MAPE) across all models
│ │ ├── sector_predictions.png
│ │ │ └── Time-series actual vs predicted for top-5 sectors
│ │ ├── sector_error_analysis.png
│ │ │ └── Top sectors by MAE & error vs sample size
│ │ ├── temporal_error_analysis.png
│ │ │ └── MAE and bias trends over time (year-quarter)
│ │ ├── residual_diagnostics.png
│ │ │ └── Residuals vs predicted, histogram + normal overlay, Q-Q, time series
│ │ ├── distribution_shift_analysis.png
│ │ │ └── Train vs Test overlapping histograms, boxplots, CDFs
│ │ └── (additional sector-specific and network plots as generated)
│ │
│ ├── tables/ # CSV, LaTeX, and summary tables
│ │ ├── model_comparison.csv
│ │ │ └── RMSE, MAE, R², MAPE for all models
│ │ ├── feature_importance.csv
│ │ │ └── Feature rankings from RF, XGB, and ensemble
│ │ ├── sector_error_analysis.csv
│ │ │ └── MAE, Std, Max Error, Bias per sector
│ │ ├── temporal_error_analysis.csv
│ │ │ └── Error metrics by year-quarter
│ │ ├── distribution_shift_summary.csv
│ │ │ └── Train vs Test statistics (mean, std, min, max, KS test)
│ │ └── (additional causal, network, and scenario tables)
│ │
│ └── data_quality/ # Data validation reports
│   ├── commodity_prices_validation.txt
│   ├── trade_data_validation.txt
│   ├── master_dataset_summary.txt
│   └── missing_values_report.csv
│
├── sprint_3_output/ # Sprint 3 experiment-specific outputs
│ ├── target_distribution_analysis.png
│ ├── feature_importance_comparison.png
│ └── (other intermediate or exploratory artifacts)
│
├── sprint3_opts/ # Alternative tuning experiment outputs
│ ├── models/ # Model snapshots from different tuning runs
│ └── (other experiment-specific files)
│
├── docs/ # Documentation
│ ├── data_sources.md
│ ├── data_dictionary.xlsx
│ ├── master_dataset_dictionary.xlsx
│ ├── feature_dictionary.xlsx
│ ├── mospi_io_processing_notes.md
│ ├── methodology_notes.md
│ ├── api_usage_guide.md
│ └── troubleshooting.md
│
├── presentations/ # Presentation materials
│ ├── sprint1_review.pptx
│ ├── sprint2_review.pptx
│ ├── sprint3_review.pptx
│ ├── final_presentation.pptx
│ └── poster.pdf # Optional conference poster
│
├── reports/ # Written reports
│ ├── drafts/
│ │ ├── sprint1_summary.docx
│ │ ├── sprint2_causal_analysis.docx
│ │ └── sprint3_model_results.docx
│ ├── final_report.pdf
│ ├── final_report.docx
│ ├── executive_summary.pdf
│ └── policy_brief.pdf
│
├── tests/ # Unit tests (optional but recommended)
│ ├── __init__.py
│ ├── test_data_processing.py
│ ├── test_network_analysis.py
│ ├── test_models.py
│ └── test_utils.py
│
└── logs/ # Log files
  ├── data_download.log
  ├── model_training.log
  └── error.log
- linear_regression_baseline.pkl
- random_forest_baseline.pkl: Baseline Random Forest
- random_forest_tuned.pkl: Best tree-based model (R² ≈ 0.017)
- xgboost_baseline.pkl
- xgboost_tuned.pkl: Tuned XGBoost (R² ≈ 0.011)
Distribution & Target Analysis:
- target_distribution_analysis.png: Raw vs 2σ-capped histograms & boxplots
- distribution_shift_analysis.png: Train vs Test overlap plots, CDF comparison
Model Performance:
- model_comparison.png: Multi-metric bar charts (R², RMSE, MAE, MAPE)
- feature_importance_comparison.png: Top-30 features from RF & XGB side-by-side
Error Diagnostics:
- sector_predictions.png: Time-series actual vs predicted for top-5 sectors
- sector_error_analysis.png: MAE rankings and error vs sample size
- temporal_error_analysis.png: MAE & bias trends by year-quarter
- residual_diagnostics.png: Residuals vs predicted, histogram, Q-Q, time series
- model_comparison.csv: RMSE, MAE, R², MAPE for all models
- feature_importance.csv: RF, XGB, and average importance rankings
- sector_error_analysis.csv: Per-sector MAE, std, bias, sample size
- temporal_error_analysis.csv: Per-quarter MAE, bias, sample size
- distribution_shift_summary.csv: Train/test statistics & KS test results
- sprint_3_output/: Experiment-specific artifacts (e.g., target distribution plots)
- sprint3_opts/models/: Alternative tuning run snapshots
To rebuild all cleaned and processed datasets from raw files:
# Install dependencies
pip install -r requirements.txt
# Run the full cleaning pipeline
python src/data_processing/clean_data.py
# Process I-O tables and calculate network metrics
python src/network_analysis/process_io_table.py
# Create master dataset and export
python src/data_processing/create_master_dataset.py
All outputs are saved to data/processed/ and data/processed_io_data/.
Each notebook is self-contained and documents its purpose in the header:
jupyter notebook notebooks/s3_TreeBasedModels.ipynb
# (or any other notebook)
Notebooks import data from data/processed/ and write outputs to outputs/ and/or sprint_3_output/.
- Model comparison: outputs/tables/model_comparison.csv
- Feature importance: outputs/tables/feature_importance.csv
- Model artifacts: outputs/models/*.pkl
- Visualizations: outputs/figures/
The project identified a significant structural break between the training period (pre-2020: volatile, COVID-affected) and the test period (post-2020: recovery), confirmed by a Kolmogorov-Smirnov test. This explains why:
- Linear models failed (R² ≈ −36)
- Tree-based models were more robust (tuned RF: R² ≈ 0.017)
- Deep learning (LSTM) overfitted: it learned high-volatility patterns that do not apply to the stable test period
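The structural-break check can be illustrated with the two-sample Kolmogorov-Smirnov statistic, which is the largest vertical gap between the empirical CDFs of the training and test targets. The sketch below is a plain-Python illustration with hypothetical values, not the project's diagnostic code; in practice `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:  # the gap can only change at observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical target values; not project data.
train = [-9.0, -6.5, -3.0, 0.5, 4.0, 7.5, 11.0, 14.0]  # volatile period
test = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]        # stable period

d_shifted = ks_statistic(train, test)
d_same = ks_statistic(train, train)
print(d_shifted, d_same)
```

A statistic near zero means the two samples look alike; a large value, as for the volatile and stable samples here, signals that a model fitted on one period is being asked to predict a differently distributed target.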
- Energy shocks: 10% oil price increase → −4.81% IIP (all manufacturing: −8.0%), p < 0.05
- Food shocks: wheat prices showed a negative coefficient (−2.55) that was not statistically significant (p > 0.05)
- Instruments validated: Sargan-Hansen test p-values > 0.05, so the overidentifying restrictions are not rejected (consistent with instrument exogeneity)
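The logic behind the instrumental-variable estimates can be sketched on simulated data. With one endogenous regressor and one instrument, 2SLS reduces to the ratio cov(z, y) / cov(z, x). Everything below is an assumption made for illustration: the data are simulated, the true effect of −0.5 is chosen arbitrarily, and a single instrument stands in for the project's ONI and OPEC-quota instruments. With only one instrument the model is exactly identified, so an overidentification test such as Sargan-Hansen does not apply to this sketch.

```python
import random

def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(0)
TRUE_EFFECT = -0.5  # assumed causal effect of price on output
n = 5000

instrument, price, output = [], [], []
for _ in range(n):
    z = rng.gauss(0, 1)       # instrument: shifts price, unrelated to demand
    demand = rng.gauss(0, 1)  # unobserved confounder raising both price and output
    x = 0.8 * z + demand + 0.5 * rng.gauss(0, 1)
    y = TRUE_EFFECT * x + demand + 0.5 * rng.gauss(0, 1)
    instrument.append(z)
    price.append(x)
    output.append(y)

# OLS slope is contaminated by the confounder; the IV ratio is not.
ols = covariance(price, output) / covariance(price, price)
iv = covariance(instrument, output) / covariance(instrument, price)
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  true: {TRUE_EFFECT}")
```

In this setup the confounder pushes the OLS slope toward zero, so a naive regression would largely miss the negative effect, while the IV ratio lands close to the assumed value.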
- Critical bottlenecks: Petroleum Products, Trade, Electricity (ranked by Betweenness Centrality)
- Shock multiplier: 10% output shock → 2.19x cumulative impact via Leontief propagation
- Scale-free topology: Robust to random failures, vulnerable to targeted attacks on top ~15% central nodes
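Leontief propagation can be sketched with a toy three-sector economy. The Leontief inverse (I − A)⁻¹ equals the series I + A + A² + ..., where each additional term is one more round of upstream demand, so column sums of the inverse give the total output required per unit of final demand. The coefficient matrix and sector labels below are hypothetical, and the project's 2.19x multiplier may be defined differently; this is only an illustration of the mechanism.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def leontief_inverse(coeffs, rounds=200):
    """Approximate (I - A)^-1 with the series I + A + A^2 + ... (one term per round)."""
    n = len(coeffs)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, term = identity, identity
    for _ in range(rounds):
        term = mat_mul(term, coeffs)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical technical coefficients: A[i][j] = input from sector i per unit of sector j's output.
sectors = ["petroleum", "manufacturing", "trade"]
A = [
    [0.10, 0.30, 0.20],
    [0.05, 0.20, 0.10],
    [0.15, 0.25, 0.10],
]

L = leontief_inverse(A)
# Output multiplier of sector j: economy-wide output needed per unit of final demand for j.
multipliers = {s: sum(L[i][j] for i in range(len(A))) for j, s in enumerate(sectors)}
print(multipliers)
```

The series converges here because every column of A sums to less than one. A multiplier above 1 means that a shock to a sector's final demand is amplified once indirect input requirements are counted.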
- Targeting strategy: Policy Benefit Frontier is concave → diminishing returns beyond Top 5–7 sectors
- Expected benefit: Mitigation strategy could preserve 0.46% of aggregate IIP growth during shock
- High-vulnerability sectors: Other Manufacturing, Tobacco, Electrical Equipment
- Most resilient sectors: Motor Vehicles, Pharmaceuticals, Basic Metals
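The diminishing-returns argument behind the targeting strategy can be sketched as follows: if sectors are targeted in descending order of estimated benefit, each added sector contributes no more than the previous one, so the cumulative benefit curve is concave. The sector names and benefit scores below are hypothetical placeholders in arbitrary units, not the project's CATE estimates.

```python
from itertools import accumulate

# Hypothetical per-sector benefit scores (arbitrary units); not project estimates.
sector_benefit = {
    "sector_a": 30, "sector_b": 22, "sector_c": 16, "sector_d": 10,
    "sector_e": 6, "sector_f": 4, "sector_g": 2, "sector_h": 1,
}

# Target sectors in descending order of estimated benefit.
ranked = sorted(sector_benefit.items(), key=lambda kv: kv[1], reverse=True)
marginal = [benefit for _, benefit in ranked]  # gain from adding the k-th sector
frontier = list(accumulate(marginal))          # cumulative benefit of the top-k policy

def sectors_needed(share):
    """Smallest k whose cumulative benefit reaches the given share of the total."""
    target = share * frontier[-1]
    return next(k for k, value in enumerate(frontier, start=1) if value >= target)

k_90 = sectors_needed(0.90)
print(frontier, k_90)
```

With these placeholder scores, the first five sectors capture at least 90% of the total benefit, which is the kind of pattern that motivates concentrating policy on a short list of sectors.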
- Global Supply Chain Reallocation and Shift under Triple Crises: A U.S.-China Perspective
  https://arxiv.org/pdf/2508.06828
- Financial Markets, Financial Institutions, and International Trade: Examining the Causal Links for Indian Economy
  https://arxiv.org/pdf/2112.01749
- The Causal Effects of Commodity Shocks
  https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5219522
- Leontief Model & Input-Output Analysis for Supply Chain Shock Propagation
  https://mitpress.mit.edu/
This project is provided for educational purposes as part of the ADA course project (UE23AM343AB1).
Last Updated: November 2025