EnergyQuantResearch/Optimal-Energy-System-Scheduling-Combining-Mixed-Integer-Programming-and-Deep-Reinforcement-Learning


Optimal Energy System Scheduling Using A Constraint-Aware Reinforcement Learning Algorithm

This repository accompanies the paper Optimal Energy System Scheduling Using A Constraint-Aware Reinforcement Learning Algorithm.

What Is In The Repository

  • MIP_DQN.py Default training entrypoint. It now routes environment interaction through Actor_MIP, which projects actor outputs into the MIP-constrained action space.
  • random_generator_battery.py Single-battery environment used by the default training flow.
  • random_generator_more_battery.py Multi-battery environment variant kept for follow-up experiments.
  • Parameters.py Unit and battery parameter definitions.
  • data/ Historical PV, price, and load time series.

Core Idea

The neural actor proposes a continuous action as usual. Actor_MIP then solves a mixed-integer surrogate problem over the critic network so that the executed action respects the action-space constraint implemented in code:

  • power balance within grid exchange limits

The repository keeps this historical constraint scope; constraints from the paper that were never encoded in code have not been added.
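As a rough illustration of the projection step, the sketch below replaces the Pyomo/OMLT formulation with a simple grid search: it scores candidate battery actions with a stand-in critic and keeps only those whose implied grid exchange satisfies power balance within limits. All names and values here (`project_action`, `GRID_EXCHANGE_MAX`, the sign convention) are illustrative assumptions, not the repository's actual API; the real limits live in `Parameters.py`.

```python
import numpy as np

# Illustrative limits (assumptions, not the values in Parameters.py).
GRID_EXCHANGE_MAX = 100.0   # max grid import/export, kW
BATTERY_POWER_MAX = 50.0    # max battery charge/discharge, kW

def project_action(proposed: float, pv: float, load: float,
                   q_fn=None, n_candidates: int = 201) -> float:
    """Return the feasible battery power with the highest critic value.

    Feasibility (power balance within grid exchange limits), assuming
    positive battery power means discharge:
        grid = load - pv - battery_power,  |grid| <= GRID_EXCHANGE_MAX
    """
    if q_fn is None:
        # Stand-in critic: prefers actions near the actor's proposal.
        q_fn = lambda a: -(a - proposed) ** 2
    candidates = np.linspace(-BATTERY_POWER_MAX, BATTERY_POWER_MAX, n_candidates)
    grid = load - pv - candidates
    feasible = candidates[np.abs(grid) <= GRID_EXCHANGE_MAX]
    scores = np.array([q_fn(a) for a in feasible])
    return float(feasible[int(np.argmax(scores))])
```

In the repository this search is instead posed as a mixed-integer program, with the critic network itself embedded in the model, which is what the pyomo/omlt/onnx dependencies are for.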

Runtime Dependencies

Base training flow:

  • numpy
  • pandas
  • torch

Actor_MIP path:

  • pyomo
  • omlt
  • onnx
  • gurobi

Experiment logging:

  • wandb (optional)

Development tooling:

  • pytest
  • ruff
  • pre-commit

Running The Training Script

The default entrypoint remains:

python MIP_DQN.py

Useful environment overrides for quick checks:

MIP_DQN_RANDOM_SEEDS=1234
MIP_DQN_NUM_EPISODES=1
MIP_DQN_TARGET_STEP=2
MIP_DQN_INITIAL_BUFFER_SIZE=2
MIP_DQN_BATCH_SIZE=1
MIP_DQN_REPEAT_TIMES=1
MIP_DQN_ENABLE_WANDB=0
MIP_DQN_SAVE_NETWORK=0
MIP_DQN_SAVE_RECORDS=0

To bypass the MIP projection path in a lightweight debug run:

MIP_DQN_USE_ACTOR_MIP=0
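These overrides are plain environment variables. The helpers below show one way a script might read them with safe fallbacks; the helper names and default values are illustrative assumptions, not the repository's actual code.

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer override from the environment, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

def env_flag(name: str, default: bool) -> bool:
    """Treat '0'/'1'-style variables as booleans."""
    raw = os.environ.get(name)
    return default if raw is None else raw not in ("0", "false", "False")

# Placeholder defaults, not the script's real values.
num_episodes = env_int("MIP_DQN_NUM_EPISODES", 1000)
use_actor_mip = env_flag("MIP_DQN_USE_ACTOR_MIP", True)
enable_wandb = env_flag("MIP_DQN_ENABLE_WANDB", False)
```

For a quick smoke test, the overrides can be combined on one command line, e.g. `MIP_DQN_NUM_EPISODES=1 MIP_DQN_ENABLE_WANDB=0 python MIP_DQN.py`.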

Development Checks

ruff check .
pytest
pre-commit run --all-files

Common Failure Modes

  • Actor_MIP requires the 'gurobi' solver... The MIP projection path is enabled, but Gurobi is not available in the current environment. Either install Gurobi or set MIP_DQN_USE_ACTOR_MIP=0 for a lightweight run.
  • wandb is not installed... Training will continue without experiment logging when MIP_DQN_ENABLE_WANDB=0 or wandb is absent.
  • Data file path errors... The environments resolve CSV files relative to the module location, so the script can be launched from outside the repository root.
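The module-relative resolution mentioned above typically looks like the sketch below; the actual loader in random_generator_battery.py may differ, and "PV.csv" is an illustrative filename.

```python
from pathlib import Path

# Directory of the module itself, independent of the current working directory.
MODULE_DIR = Path(__file__).resolve().parent

def data_path(filename: str) -> Path:
    """Resolve a CSV under the repository's data/ folder relative to this module."""
    return MODULE_DIR / "data" / filename

# e.g. pd.read_csv(data_path("PV.csv"))
```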

Citation

The preprint is available here: arXiv:2305.05484

  • Hou Shengren
  • Pedro P. Vergara
  • Edgar Mauricio Salazar
  • Peter Palensky

If you use this repository, please cite the paper or preprint.

About

Source code for the paper "Optimal Energy System Scheduling Combining Mixed-Integer Programming and Deep Reinforcement Learning". Keywords: safe reinforcement learning, energy management.
