resturant-dispatch-env is a deterministic reinforcement learning environment designed to simulate restaurant operations under peak-hour conditions. The agent is responsible for dispatching incoming orders while balancing service quality, urgency, and kitchen capacity.
This environment is suitable for policy evaluation, benchmarking, and studying decision-making under constrained, multi-objective scenarios.
Restaurant dispatch is a constrained optimization problem:
- Delays increase the likelihood of customer churn
- Over-prioritizing urgent orders can leave high-value orders unfinished
- Excessive workload reduces kitchen efficiency
This environment captures these competing objectives in a reproducible and controlled setting.
Each order contains:

```json
{
  "id": 1,
  "status": "pending | processing | done | failed",
  "prep_time": 3,
  "priority": "low | medium | high",
  "wait_time": 0,
  "patience": 5,
  "is_vip": false
}
```

Global state includes:

- `step_count`, `max_steps`
- `max_kitchen_capacity`
- `total_completed`, `total_failed`
- `overload_penalty_counter`
- deterministic rush-hour configuration flags
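For reference, the per-order schema can be mirrored as a Python dataclass; the `Order` class below is illustrative only and not part of the environment's API:

```python
from dataclasses import dataclass

@dataclass
class Order:
    """Illustrative mirror of the per-order JSON schema."""
    id: int
    prep_time: int           # steps of kitchen work remaining
    patience: int            # failure threshold for wait_time
    status: str = "pending"  # pending | processing | done | failed
    priority: str = "low"    # low | medium | high
    wait_time: int = 0
    is_vip: bool = False

# A fresh pending order matching the JSON example above
order = Order(id=1, prep_time=3, patience=5)
```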
Each action has the form:

```json
{
  "action_type": "process | prioritize | idle",
  "order_id": 7
}
```

- `process`: Start a pending order if capacity allows
- `prioritize`: Accelerate or rescue a specific order
- `idle`: Take no action for the current step
At each step:

- Rush orders may be injected deterministically (hard mode)
- Agent action is applied
- Pending orders increase `wait_time`
- Orders exceeding patience (`wait_time > patience`) become `failed`
- Processing orders reduce `prep_time`; an order completes when `prep_time <= 0`
- Exceeding capacity incurs overload penalties
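The waiting/processing transitions above can be sketched as a small helper over a list of order dicts; the `tick` function is illustrative, and the real environment additionally handles rush injection, agent actions, and overload penalties:

```python
def tick(orders: list[dict]) -> None:
    """Advance one step of the wait/prep bookkeeping in place."""
    for o in orders:
        if o["status"] == "pending":
            o["wait_time"] += 1
            if o["wait_time"] > o["patience"]:  # patience exceeded
                o["status"] = "failed"
        elif o["status"] == "processing":
            o["prep_time"] -= 1
            if o["prep_time"] <= 0:             # prep finished
                o["status"] = "done"

orders = [
    {"status": "pending", "wait_time": 5, "patience": 5, "prep_time": 3},
    {"status": "processing", "wait_time": 0, "patience": 5, "prep_time": 1},
]
tick(orders)
# first order: wait_time becomes 6 > patience 5, so it fails
# second order: prep_time reaches 0, so it completes
```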
The agent must balance:
- Urgency: Orders close to expiration require immediate attention
- Priority value: High-priority and VIP orders yield higher rewards
- Capacity control: Overloading reduces long-term performance
A missed-opportunity penalty is applied when urgent orders exist but are ignored.
Each step starts with a reward of 0, then applies:

- `+1.0` for order completion
- `+0.5` for high-priority completion
- `+0.3` for VIP completion
- `+0.4` for saving an urgent order
- `+0.2` for a valid processing action
- `-1.0` for a failed order
- `-0.5` for an overload event
- `-0.3` for ignoring urgent orders
- `-0.2` for an invalid action
- `-0.02` step penalty
Final reward is clamped to the range [-1.0, 1.0].
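A minimal sketch of accumulating these terms and clamping the result; the event flags and the `step_reward` helper are illustrative names, not the environment's internals:

```python
def step_reward(completed=0, high_priority=0, vip=0, saved_urgent=0,
                valid_process=False, failed=0, overload=False,
                ignored_urgent=False, invalid=False) -> float:
    """Sum the per-step reward terms listed above, then clamp to [-1, 1]."""
    r = 0.0
    r += 1.0 * completed + 0.5 * high_priority + 0.3 * vip + 0.4 * saved_urgent
    r += 0.2 if valid_process else 0.0
    r -= 1.0 * failed
    r -= 0.5 if overload else 0.0
    r -= 0.3 if ignored_urgent else 0.0
    r -= 0.2 if invalid else 0.0
    r -= 0.02  # per-step penalty
    return max(-1.0, min(1.0, r))

step_reward(completed=1, valid_process=True)  # 1.0 + 0.2 - 0.02 = 1.18, clamped to 1.0
```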
Easy:

- 4 orders
- Kitchen capacity: 2
- Moderate patience

Medium:

- 10 orders
- Mixed priorities and VIP distribution
- Tighter patience constraints

Hard:

- 20 base orders
- Kitchen capacity: 2
- Low patience range (2–5)
- High density of priority and VIP orders
- Deterministic rush injection at step 5
- High failure risk under poor policies
- Step 1: Process a high-value order → positive reward
- Step 2: Urgent queue appears; incorrect action → penalty
- Step 3: Order exceeds patience → strong negative reward
- Step 4: Rescue + completion → positive reward
Example reward trajectory:
0.20 → -0.30 → 0.50 → -1.00
Final score is computed as:

```
score = (completed / total)
      - (failed / total) * 0.7
      - (overload_penalty_counter * 0.05)
```

Score is clamped to the range [0, 1].
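The scoring rule translates directly into Python; the `final_score` name is illustrative:

```python
def final_score(completed: int, failed: int, total: int,
                overload_penalty_counter: int) -> float:
    """Compute the final score per the formula above, clamped to [0, 1]."""
    score = (completed / total
             - (failed / total) * 0.7
             - overload_penalty_counter * 0.05)
    return max(0.0, min(1.0, score))

final_score(completed=8, failed=2, total=10, overload_penalty_counter=1)
# 0.8 - 0.14 - 0.05, approximately 0.61
```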
The environment exposes a small HTTP API:

- `POST /reset` with body `{ "task": "easy | medium | hard" }`
- `POST /step` with body `{ "action_type": "process | prioritize | idle", "order_id": 1 }`
- `GET /state`
- `GET /health`
```bash
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task":"hard"}'

curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"process","order_id":1}'

curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"action_type":"prioritize","order_id":2}'
```

- Deterministic and reproducible environment
- Multi-objective reward design with both incentives and penalties
- Realistic trade-offs between urgency, value, and capacity
- Hard mode introduces dynamic demand spikes and strict constraints
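The curl calls above can also be driven from Python using only the standard library; this sketch builds (but does not send) a `POST /step` request, assuming the same `localhost:8000` address as the curl examples, and `build_step_request` is an illustrative helper:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed server address from the curl examples

def build_step_request(action_type: str, order_id: int) -> urllib.request.Request:
    """Build a POST /step request matching the curl examples above."""
    body = json.dumps({"action_type": action_type, "order_id": order_id}).encode()
    return urllib.request.Request(
        BASE_URL + "/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_step_request("process", 1)
# urllib.request.urlopen(req) would dispatch the action against a running server
```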
- Reinforcement learning experimentation
- Policy benchmarking and comparison
- Simulation of constrained decision systems
- Academic and research-oriented environments