Layered LLM-driven multi-agent simulation engine. Define worlds in YAML, recreate social dynamics, pause and intervene mid-simulation.
Layer-driven multi-entity simulation engine · LLM agents · YAML-defined worlds · pausable / intervenable / replayable for analysis
Polisim is a simulation engine for running multi-entity scenarios where LLM agents make decisions inside a structured, observable, interruptible world.
You describe the world in a YAML file (entities, actions, message types, relations), declare a scenario (specific instances + initial state + breakpoints), and write a small Python rules module (how actions become effects). Polisim then:
- Drives a tick-based simulation where each entity's decision can come from an LLM, a deterministic rule, or a random fallback
- Records every event to an append-only log + per-tick snapshots, so any moment is replayable / inspectable
- Lets you pause at any tick, intervene by injecting messages or forcing actions, then resume
- Generates a dual-track analysis report at the end: a deterministic Phase A track (turning points, entity trajectories, environment dynamics) plus an optional LLM-enhanced Phase C track (situation judgment, narrative summary, action suggestions)
Built for narrative simulations of markets, negotiations, opinion dynamics, and organizational decision-making: any domain where you want to ask "what happens when these agents interact under these rules?"
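The append-only log and replay idea described above can be illustrated with a minimal sketch. It is not Polisim's actual implementation: the event field names (`tick`, `type`, `actor`, `attribute`, `new`) and both helper functions are assumptions made for illustration, and the tick-2 event is invented sample data.

```python
import json
import tempfile
from pathlib import Path


def append_event(log_path, event):
    """Append one event as a single JSON line; earlier lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def replay_attributes(log_path, initial_state, up_to_tick):
    """Rebuild entity attributes at a given tick by folding over the log."""
    # Copy the initial state so replay never mutates the caller's data.
    state = {actor: dict(attrs) for actor, attrs in initial_state.items()}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event["tick"] > up_to_tick:
                break  # relies on events being appended in tick order
            if event["type"] == "attribute_changed":
                state[event["actor"]][event["attribute"]] = event["new"]
    return state


# Hypothetical events, loosely modeled on the sample output in this README.
log_path = Path(tempfile.mkdtemp()) / "events.jsonl"
initial = {"company_b": {"cash": 100, "reputation": 50}}
append_event(log_path, {"tick": 1, "type": "attribute_changed",
                        "actor": "company_b", "attribute": "cash", "new": 80})
append_event(log_path, {"tick": 1, "type": "attribute_changed",
                        "actor": "company_b", "attribute": "reputation", "new": 55})
append_event(log_path, {"tick": 2, "type": "attribute_changed",
                        "actor": "company_b", "attribute": "cash", "new": 60})

state_at_tick_1 = replay_attributes(log_path, initial, up_to_tick=1)
```

Because state is always derived from the log rather than stored in place, inspecting an earlier tick only requires stopping the fold sooner.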
- 6-layer architecture with strict layer boundaries (World / Scenario / Rules / Runtime / Event Log / Analysis)
- Three decision modes coexist: llm (any OpenAI-compatible provider), rule (deterministic), random (sampling)
- YAML-defined worlds with full schema validation (JSON Schema + Pydantic + cross-layer semantic checks)
- Pause / resume / intervene at any tick: force actions, inject messages, modify entity attributes mid-run
- Dual-track analysis: deterministic statistics (always on) + optional LLM-narrated summary
- 670 passing tests, including end-to-end CLI tests for two production-ready scenarios
- Extensible by design: bring your own rules module, your own LLM provider, your own analysis enhancers
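To make the "cross-layer semantic checks" item concrete, here is a rough sketch of what such a check might look like on already-parsed YAML. The top-level keys (`entity_types`, `relation_types`, `entities`, `relations`) come from this README; the inner fields (`id`, `type`, `attributes`, `source`, `target`) and the function itself are assumptions, not the real semantic validator.

```python
def check_scenario_against_world(world, scenario):
    """Return a list of human-readable errors; an empty list means consistent."""
    errors = []
    entity_types = world.get("entity_types", {})
    relation_types = world.get("relation_types", {})
    declared_ids = set()

    for entity in scenario.get("entities", []):
        declared_ids.add(entity["id"])
        etype = entity["type"]
        if etype not in entity_types:
            errors.append(f"entity {entity['id']!r}: unknown entity type {etype!r}")
            continue
        # Every attribute set in the scenario must be declared by the world.
        allowed = set(entity_types[etype].get("attributes", []))
        for attr in entity.get("attributes", {}):
            if attr not in allowed:
                errors.append(f"entity {entity['id']!r}: undeclared attribute {attr!r}")

    for rel in scenario.get("relations", []):
        if rel["type"] not in relation_types:
            errors.append(f"relation: unknown relation type {rel['type']!r}")
        # Both endpoints must refer to entities declared in the scenario.
        for end in ("source", "target"):
            if rel[end] not in declared_ids:
                errors.append(f"relation {rel['type']!r}: unknown {end} {rel[end]!r}")

    return errors


world = {
    "entity_types": {"company": {"attributes": ["cash", "reputation"]}},
    "relation_types": {"competes_with": {}},
}
scenario = {
    "entities": [
        {"id": "company_a", "type": "company", "attributes": {"cash": 100}},
        {"id": "company_b", "type": "startup", "attributes": {}},
    ],
    "relations": [{"type": "competes_with", "source": "company_a", "target": "company_c"}],
}
problems = check_scenario_against_world(world, scenario)
```

Collecting all errors instead of failing on the first one tends to suit YAML authoring, since a single pass can report every broken reference.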
git clone https://github.com/Kaka-cheaper/Polisim.git
cd Polisim
pip install -e ".[dev]"

Requires Python 3.10+.

python -m cli run scenarios/minimal_market/scenario.yaml --ticks 5

Sample output:
[run_id] minimal-market_2026-04-26T10-30-00_a1b2
[t= 1] decision_proposed actor=company_a action=do_nothing mode=llm
[t= 1] decision_proposed actor=company_b action=promote mode=rule
[t= 1] action_executed actor=company_b
[t= 1] attribute_changed actor=company_b cash: 100 -> 80
[t= 1] attribute_changed actor=company_b reputation: 50 -> 55
[t= 1] snapshot_saved tick=1
...
[final] tick=5 events=42 entities=2
[analysis] final.md (Phase A) -> runs/minimal-market_.../analysis/
$env:OPENAI_API_KEY = "sk-..." # or any OpenAI-compatible endpoint
python -m cli run scenarios/three_party_negotiation/scenario.yaml `
--llm-provider openai --ticks 8

python -m cli run scenarios/minimal_market/scenario.yaml `
--llm-enhance --output-language zh-CN --prompt-history-size 3

The final report at runs/<run_id>/analysis/final.md will include four LLM-generated sections in the configured language:
- World overview: plain-language explanation of the initial entities, attributes, relations and scenario goal
- Narrative summary: trajectory described in natural language with tick references
- Situation judgement: final state assessment with cited evidence (specific tick + attribute change)
- Next action suggestions: actionable suggestions, each embedding supporting evidence
--prompt-history-size N controls how many recent decisions (D-016) are injected into each LLM prompt's actor_view.recent_decisions (default 3, range 0-10).
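One simple way to keep "the last N decisions" is a bounded deque. The sketch below assumes that approach; only the default (3), the range (0-10), and the `recent_decisions` key come from this README, while `make_history` and `build_actor_view` are hypothetical helpers rather than Polisim functions.

```python
from collections import deque

MIN_HISTORY, MAX_HISTORY = 0, 10  # range stated for --prompt-history-size


def make_history(size=3):
    """Create a bounded buffer that silently drops the oldest decision."""
    if not MIN_HISTORY <= size <= MAX_HISTORY:
        raise ValueError(f"history size must be in [{MIN_HISTORY}, {MAX_HISTORY}]")
    return deque(maxlen=size)


def build_actor_view(attributes, history):
    """Assemble the part of the prompt context that carries recent decisions."""
    return {"attributes": dict(attributes), "recent_decisions": list(history)}


history = make_history(3)
for tick, action in enumerate(["promote", "do_nothing", "promote", "promote"], start=1):
    history.append({"tick": tick, "action": action})

view = build_actor_view({"cash": 40}, history)
```

With a size of 0 the deque stays empty, which matches the idea of sending no history at all.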
flowchart TB
subgraph Definition["Definition (static, declared once)"]
World["World Definition<br/>(YAML)<br/><i>entity_types / action_types /<br/>message_types / relation_types</i>"]
Scenario["Scenario<br/>(YAML)<br/><i>entities / relations /<br/>scheduled_events / breakpoints</i>"]
end
subgraph Execution["Execution (per-tick orchestration)"]
Rules["Rules<br/>(Python)<br/><i>validate_action /<br/>resolve_effects /<br/>actions_handled</i>"]
Runtime["Runtime<br/>(orchestrator)<br/><i>tick loop / activation /<br/>conflict resolution /<br/>pause+intervene</i>"]
Provider["LLM Provider<br/>(pluggable)<br/><i>OpenAI / Mock /<br/>any compatible API</i>"]
end
subgraph Output["Output (append-only, replay-friendly)"]
EventLog["Event Log<br/>(JSONL)<br/><i>per-tick events +<br/>snapshots</i>"]
Analysis["Analysis<br/>(Markdown + JSON)<br/><i>Phase A: deterministic +<br/>Phase C: LLM-enhanced</i>"]
end
World --> Runtime
Scenario --> Runtime
Rules --> Runtime
Provider -.->|llm decisions| Runtime
Runtime --> EventLog
EventLog --> Analysis
The 6 layers communicate only through declared interfaces, with no cross-layer pollution. This makes each layer independently testable and replaceable: swap rules without touching runtime, swap LLM providers without touching scenarios, swap analysis without touching the event log.
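As an illustration of a declared interface, a pluggable decision provider with a random fallback might look like the sketch below. The class names, the `decide` signature, and `choose_action` are all hypothetical; the real LLMProvider ABC in core/providers/ may differ.

```python
import random
from abc import ABC, abstractmethod


class DecisionProvider(ABC):
    """Hypothetical interface: the runtime only ever calls decide()."""

    @abstractmethod
    def decide(self, actor_view: dict, allowed_actions: list[str]) -> str:
        ...


class ScriptedProvider(DecisionProvider):
    """Stand-in for an LLM that replays a fixed list of answers."""

    def __init__(self, scripted_actions):
        self._actions = iter(scripted_actions)

    def decide(self, actor_view, allowed_actions):
        return next(self._actions)


def choose_action(provider, actor_view, allowed_actions, rng):
    """Ask the provider; fall back to random sampling on errors or invalid output."""
    try:
        action = provider.decide(actor_view, allowed_actions)
    except Exception:
        action = None  # provider failure is treated like an invalid answer
    if action not in allowed_actions:
        return rng.choice(allowed_actions), "random"
    return action, "llm"


provider = ScriptedProvider(["promote", "fly_to_moon"])
actions = ["promote", "do_nothing"]
rng = random.Random(0)
first = choose_action(provider, {}, actions, rng)   # valid scripted answer
second = choose_action(provider, {}, actions, rng)  # invalid answer, falls back
third = choose_action(provider, {}, actions, rng)   # script exhausted, falls back
```

Since the runtime depends only on the abstract method, swapping a real OpenAI-backed provider for a scripted one needs no change on the calling side.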
Two competing companies (company_a LLM-driven, company_b rule-driven) decide between promote and do_nothing based on cash, reputation, and a global market_pressure environment variable. Demonstrates: basic action effects, attribute clamping, scheduled events, snapshot lifecycle.
python -m cli run scenarios/minimal_market/scenario.yaml

→ See scenarios/minimal_market/
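The "action effects" and "attribute clamping" ideas can be sketched as a toy rules module. The function names `validate_action` and `resolve_effects` appear in the architecture diagram, and the deltas (cash 100 -> 80, reputation 50 -> 55) mirror the sample output; the signatures, the bounds, and the affordability check are assumptions, not the shipped rules.

```python
PROMOTE_COST = 20             # inferred from "cash: 100 -> 80" in the sample output
PROMOTE_REPUTATION_GAIN = 5   # inferred from "reputation: 50 -> 55"
ATTRIBUTE_BOUNDS = {"cash": (0, 1000), "reputation": (0, 100)}  # assumed bounds


def clamp(value, low, high):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def validate_action(entity, action):
    """Reject unknown actions and promotions the entity cannot afford."""
    if action == "do_nothing":
        return True
    if action == "promote":
        return entity["cash"] >= PROMOTE_COST
    return False


def resolve_effects(entity, action):
    """Translate an action into (attribute, old, new) changes, clamped to bounds."""
    if action != "promote":
        return []
    deltas = {"cash": -PROMOTE_COST, "reputation": PROMOTE_REPUTATION_GAIN}
    effects = []
    for attr, delta in deltas.items():
        low, high = ATTRIBUTE_BOUNDS[attr]
        new_value = clamp(entity[attr] + delta, low, high)
        if new_value != entity[attr]:  # skip no-op changes
            effects.append((attr, entity[attr], new_value))
    return effects


company_b = {"cash": 100, "reputation": 50}
effects = resolve_effects(company_b, "promote")
```

Returning effects as data instead of mutating the entity keeps the rules side-effect free, so a runtime can log each change before applying it.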
Three negotiators (Alice LLM, Bob rule, Charlie random) propose / accept / reject offers to each other. Trust relations evolve via update_value; reaching trust=80 triggers a breakpoint. Demonstrates: directed messages, relation dynamics, multi-effect actions, pluggable decision modes, breakpoint-driven pause.
python -m cli run scenarios/three_party_negotiation/scenario.yaml

→ See scenarios/three_party_negotiation/
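A breakpoint on a relation value could work roughly as in the sketch below. The name `update_value` and the threshold of 80 come from the description above; the signature, the 0-100 clamp, the `>=` comparison, and `run_until_breakpoint` are assumptions for illustration only.

```python
def update_value(relations, source, target, delta, low=0, high=100):
    """Adjust a directed relation value and clamp it to [low, high]."""
    key = (source, target)
    relations[key] = max(low, min(high, relations.get(key, 0) + delta))
    return relations[key]


def run_until_breakpoint(relations, updates, threshold=80):
    """Apply one update per tick; stop at the first tick where trust reaches the threshold."""
    for tick, (source, target, delta) in enumerate(updates, start=1):
        value = update_value(relations, source, target, delta)
        if value >= threshold:
            return tick, (source, target)  # pause here so a human can intervene
    return None  # no breakpoint was hit


trust = {("alice", "bob"): 60}
updates = [
    ("alice", "bob", 10),   # tick 1: alice -> bob trust becomes 70
    ("bob", "alice", 5),    # tick 2: bob -> alice trust becomes 5
    ("alice", "bob", 10),   # tick 3: alice -> bob trust becomes 80
    ("alice", "bob", 10),   # never applied, the run pauses at tick 3
]
hit = run_until_breakpoint(trust, updates)
```

Keys are ordered pairs, so trust from Alice to Bob and trust from Bob to Alice evolve independently, as a directed relation would.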
Polisim/
├── cli/                         CLI entry point (run / step / replay)
├── core/                        Orchestration layer
│   ├── runtime.py               Tick loop, activation, conflict resolution
│   ├── analysis.py              Phase A + Phase C analysis
│   ├── semantic_validator.py    D-013 cross-layer semantic checks
│   ├── definition_loader.py     World loading + schema validation
│   ├── scenario_loader.py       Scenario loading + cross-file refs
│   ├── rules_loader.py          Dynamic rules module loader (D-010)
│   ├── llm_policy.py            LLM protocol layer
│   ├── events.py                Append-only event log + snapshots
│   ├── errors.py                Unified SimEngineError hierarchy (D-011)
│   └── providers/               LLMProvider ABC + OpenAI / Mock impls
├── models/                      Pydantic models (world / scenario / runtime / config / analysis)
├── rules/                       Rules modules (BaseRules + 2 concrete impls)
├── schemas/                     JSON schemas (single source of truth)
├── scenarios/                   YAML scenarios (minimal_market + three_party_negotiation)
├── tests/                       670 tests across all layers
└── docs/                        Design docs / requirements / pitfalls / progress
All design and process documentation is in docs/:
| Document | What's inside |
|---|---|
AGENTS.md |
Entry point for AI coding assistantsโMUST/MUST NOT rules + doc navigation |
docs/00-overview/progress.md |
Session-to-session progress log; current state and next steps |
docs/00-overview/ๅฆไฝไฝฟ็จ่ฟๅฅๆๆกฃไธ้
็ฝฎไฝ็ณป.md |
Human-facing intro to the doc system |
docs/01-requirements/้ชๆถๆ ๅ.md |
Acceptance criteriaโsource of truth when implementation diverges from docs |
docs/01-requirements/ๆๅฐ็คบไพWalkthrough.md |
Step-by-step walkthrough of the minimal_market scenario |
docs/02-design/ |
8 design docs covering each architectural layer |
docs/03-implementation/pitfalls.md |
Known issues, edge cases, and pitfalls discovered during development |
docs/00-overview/LLM่พ
ๅฉๅปบๆจกๆนๆก.md |
Roadmap for future LLM-assisted modeling (Phase 2) |
pytest tests/ -q

Currently 670 tests passing in ~12 seconds, covering:
- All Pydantic models (world / scenario / runtime / config / analysis)
- All loaders (definition / scenario / rules) with three-tier validation
- Runtime full lifecycle (step / run_until / pause+intervene / breakpoints / snapshot modes)
- Both rules modules (minimal_market + three_party_negotiation)
- LLM protocol (mock + OpenAI provider with mocked HTTP)
- Analysis (Phase A + Phase C with multi-language injection)
- CLI end-to-end (run / step / replay across both scenarios)
- Cross-layer semantic validation (D-013)
Plus a real-OpenAI smoke test at scripts/smoke_openai.py for end-to-end verification with live API calls.
v0.1.1 engine rigorization is complete (D-014 strong action params + D-015 (scoped) AttributeEffect.new_value + D-016 PromptContext / enrich_prompt hook + LLM analysis upgrade with world_overview + evidence citation). Next directions, by priority:
- v0.2 web UI: single-repo monorepo additionโFastAPI WebSocket backend + React real-time situation panel consuming
decision_proposed.payload.prompt_context(D-016 enables this) - D-015 full:
EntityCreate/EntityDestroy/ChainedActioneffect types (deferred from v0.1.1) - Phase B.3: protocol-level retry with exponential backoff (currently relying on OpenAI SDK's built-in retries)
- Phase 2 LLM-assisted modeling:
core/modeling_loop.py+ guided Q&A frontend (D-013 semantic validator already provides the "self-repair loop" infrastructure; seeLLM่พ ๅฉๅปบๆจกๆนๆก.md) - More scenarios: information cascade, opinion dynamics, organizational decision-making
See progress.md section "ไธไธๆญฅ่ฏฅๅไปไน" for the live priority list.
This is currently a single-developer project being shaped session-by-session. If you want to:
- Report a bug or pitfall: open an issue with reproduction steps; pitfalls discovered during development are logged in pitfalls.md
- Propose a new scenario: open a discussion describing the world / entities / actions / what dynamics you want to study
- Contribute code: read AGENTS.md first; it lists the architectural rules that any contributor (human or AI) must follow
The architectural discipline of this project is heavily influenced by:
- Layered orchestration patterns from compiler design and game engines
- Append-only event logs from event sourcing and CQRS
- Multi-agent decision protocols from contemporary LLM agent frameworks (AgentVerse, AutoGen, CrewAI)
- Configuration validation from pydantic and JSON Schema