feat: MuJoCo simulation backend - AgentTool with 50+ actions#85
Conversation
bc6080f to
78719d9
Compare
yinsong1986
left a comment
There was a problem hiding this comment.
All review comments addressed. LGTM.
f461f30 to
4a3fd3c
Compare
|
Rebased
sim = [
"robot_descriptions>=1.11.0,<2.0.0",
]
sim-mujoco = [
"mujoco>=3.0.0,<4.0.0",
]
all = [
"strands-robots[groot-service]",
"strands-robots[lerobot]",
"strands-robots[sim]",
"strands-robots[sim-mujoco]",
]Both |
dda5248 to
696b423
Compare
awsarron
left a comment
There was a problem hiding this comment.
For all comments in this PR, we should examine common themes and include corrections for them in AGENTS.md so that future agent runs benefit from their lessons.
Move _xml, _robot_base_xml, and _tmpdir from SimWorld into a generic _backend_state dict. Each backend stores its format-specific data there instead of polluting the base class with implementation details. Addresses @awsarron review: 'how can we avoid having implementation details (Mujoco) in base classes like this?' The MuJoCo backend (PR strands-labs#85) will store these in world._backend_state['xml'], etc. during rebase.
Review Status SummaryAll 17 review threads are now resolved. ✅ Latest commit CI: ✅ All checks passing @awsarron — this is ready for re-review. Once #84 merges, this can follow immediately. 🤖 Pipeline analysis by AI agent. Strands Agents. Feedback welcome! |
📋 Review Status SummaryHi @awsarron — consolidating the current state of this PR to help with re-review. Thread Resolution: ✅ 17/17 resolvedAll 17 review threads have been addressed and resolved:
Key changes since CHANGES_REQUESTED:
CI: ✅ PassingLatest commit status: SUCCESS Dependency contextThis PR depends on #84 (simulation foundation, also 50/50 resolved) and is a prerequisite for #86 (Robot factory). 🤖 Automated review triage by Strands Agents. Feedback welcome! |
Move _xml, _robot_base_xml, and _tmpdir from SimWorld into a generic _backend_state dict. Each backend stores its format-specific data there instead of polluting the base class with implementation details. Addresses @awsarron review: 'how can we avoid having implementation details (Mujoco) in base classes like this?' The MuJoCo backend (PR strands-labs#85) will store these in world._backend_state['xml'], etc. during rebase.
✅ All review feedback addressed — ready for re-approvalSummary of latest cycle (2026-05-06): @yinsong1986 — your 26-item review from earlier today has been fully addressed in commits
All 88 review threads resolved. CI green. Would you be able to submit a formal Approve to unblock merge? 🙏 🤖 AI agent response. Strands Agents. Feedback welcome! |
There was a problem hiding this comment.
Read this end-to-end — the MjSpec refactor, the lock audit, and the cooperative-stop ordering are great. Five small things inline, none blocking, but (mesh/peer_id), (tool description vs enum), and ([mujoco] vs [sim-mujoco]) all touch the v0.3.9 5-line promise so worth a quick look before merge.
…tmp path Addresses all 4 unresolved comments from @sundargthb (2026-05-06): 1. simulation.py L127: Change mesh=True → mesh=False default to prevent silent behavior-flip when PR strands-labs#98 (Zenoh mesh) lands. The hasattr(self, 'mesh') guard at L2006 is permanently False today — keeping mesh=True as default would auto-enable networking on upgrade without user action. 2. simulation.py L1544: Tool description now lists all 61 actions from the enum (was ~38). Grouped by category for LLM readability. Prevents action discovery gap for flagship features (replace_scene_mjcf, raycast, get_jacobian, inverse_dynamics, eval_policy, etc.). 3. factory.py L177: Error message now prints 'pip install strands-robots [sim-mujoco]' instead of '[mujoco]'. Added backend→extra mapping dict so future backends get correct suggestions too. 4. rendering.py L647: Replaced hard-coded '/tmp/strands_robots/recordings' with tempfile.gettempdir()-based path. Works on Windows, survives Mac reboots, and the tool_spec.json description updated accordingly. All changes: ruff clean, mypy clean, 105 tests pass.
✅ Addressed all 4 @sundargthb review commentsPushed commit
Quality checks: ruff ✅ | mypy ✅ | 105 non-GPU tests pass ✅ 🤖 AI agent response. Strands Agents. Feedback welcome! |
sundargthb
left a comment
There was a problem hiding this comment.
Approved. One minor thing - if the hasattr is permanently False today by design, would self.mesh: Any = None in `init' be cleaner than relying on the missing-attribute case?
Three bugs fixed in one commit: 1. **macOS path validation** (_path_validation.py) validate_save_path() was silently accepting /etc/passwd and other sensitive paths on macOS because os.path.realpath resolves /etc, /var, /tmp to /private/etc, /private/var, /private/tmp via symlinks. The blocked-prefix startswith check then missed them. Fix: on darwin, add /private/-prefixed variants of every Linux blocked prefix to BLOCKED_PREFIXES. Tests: 5 previously-failing tests now pass, plus 2 new regression tests (test_darwin_includes_private_variants, test_linux_excludes_private_variants) pin the platform semantics. 2. **eject_robot state preservation** (scene_ops.py) eject_robot_from_scene rebuilt the scene from scratch, silently resetting surviving robots' joints to qpos=0 and snapping objects back to their spawn pose. Agents calling remove_robot mid-scene lost physics state with no warning. Fix: snapshot per-joint (qpos, qvel) by fully-qualified name BEFORE rebuild, restore by name AFTER fresh compile. Matches the 'Per-name state copy, not flat index' rule from AGENTS.md — flat-index copy is unsafe because body/joint indices shift when a robot is removed. New helpers: _snapshot_joint_state, _restore_joint_state (both module-private, name-based, width-checked per joint type). Tests: 3 new regression tests (test_surviving_robot_joint_state_is_preserved, test_surviving_object_freejoint_pose_is_preserved, test_ejected_robot_state_is_not_restored). 3. **mesh / peer_id explicit init** (simulation.py) — addresses @sundargthb's approval nit. mesh and peer_id constructor params were never assigned to self, making hasattr(self, 'mesh') at L2006 permanently False and masking the future PR strands-labs#98 wire-up. Now explicitly initialised as attributes so the guard becomes a plain truthy check. Future-compatible: when PR strands-labs#98 lands and replaces self.mesh with a real mesh object, the cleanup path will start working without further changes. Tests: 1275 pass, 1 skipped, 0 failures (was 1255 pass + 5 fail). Lint: ruff + mypy clean.
4c3b085 to
0911df2
Compare
yinsong1986
left a comment
There was a problem hiding this comment.
Follow-up review after the 8 new commits (notably d5e3355 lock-discipline pass). Most of my earlier punch list is resolved — lock gaps in physics.py, _dispatch_action serialization, destroy join-before-null, save_state → mjSTATE_FULLPHYSICS, PolicyRunner.run init ordering, DatasetRecorder.save_episode post-failure poisoning, bounded renderer cache, and step() batch release are all in. Nice work.
Three items still open, pinned inline. Leaning toward: fix the depth-units one in this PR (incorrect unit label will bake into downstream consumers), defer the other two to a follow-up if preferred.
…mera validation
P0: Linearize OpenGL depth buffer to metric depth (meters) in render_depth().
MuJoCo renderer returns normalized [0,1] values; now converted via
z = znear*zfar / (zfar - d*(zfar - znear)) using model.vis.map.znear/zfar
scaled by stat.extent.
P1: start_cameras_recording now fails loudly when user-specified camera names
don't resolve (same strict-validation policy as render()/render_depth()),
instead of silently dropping unresolved names.
P1: Plumb video_width/video_height through DatasetRecorder.create() →
_build_features() so the declared LeRobot feature shape matches the
actual render dimensions. Previously hardcoded (3, 480, 640) regardless
of VideoConfig.
P2: Add math.isfinite() guards to set_timestep and set_gravity. Previously
NaN passed the 'timestep <= 0' check (nan <= 0 is False) and inf passed
too — both corrupt model.opt silently.
nit: Expanded mj_model/mj_data property docstrings to document that reads
also race with a running PolicyRunner worker for direct Python consumers
(agent flows are serialized automatically via _dispatch_action lock).
✅ Final 3 items from 2026-05-07 review addressed@yinsong1986 - your follow-up review (2026-05-07 00:28) identified 3 remaining items. All fixed in commit
Bonus: Added All 98 review threads resolved. CI green. sundargthb approved. Ready for re-review at your convenience. 🤖 AI agent status update. Strands Agents. |
yinsong1986
left a comment
There was a problem hiding this comment.
Follow-up after a125d944. All three open items from the previous review are resolved: depth linearization (with correct stat.extent scaling), _build_features shape plumbing, and NaN/inf guards + regression tests. Also nice touch on the mj_model/mj_data docstrings.
One subtle regression flagged inline on the new start_cameras_recording strictness check — it can reject calls that _active_camera_list was designed to resolve via namespace suffix. Plus two small notes. Fix is ~3 lines; otherwise LGTM.
- **P1 regression fix** (start_cameras_recording): `_active_camera_list`
now returns `(resolved, unresolved_inputs)` so strict validation
operates on actual user input, not the already-namespace-resolved list.
Previously `start_cameras_recording(cameras=['side'])` was rejected
when the scene had 'arm1/side' even though suffix resolution succeeded.
Same fix applied to `render_all`.
- **Nit**: document the mujoco>=3.0 `stat.extent` convention — znear/zfar
in `model.vis.map` are fractions of the model's bounding scale, not
absolute meters. Cross-references pyproject mujoco>=3.2 pin.
- **Nit**: extend the ARB_clip_control warning to note that linearized
Min/Max values remain in meters but lose precision on this path —
downstream consumers should treat them as approximate.
- **Nit**: add regression tests for scalar `set_gravity(float('nan'))`
and `set_gravity(float('inf'))` — guards against future refactors
moving the `isfinite` loop ahead of the scalar->[0,0,z] expansion.
- **Regression tests**: `TestCamerasRecordingSuffixResolution` pins the
suffix-resolution fix with 3 cases (short name resolves, bogus name
still errors, mixed resolvable+bogus fails loudly).
✅ All 4 items from 2026-05-07 02:06 review addressed@yinsong1986 — commit
Regression tests
Quality gate
All review threads addressed. Ready for final pass. 🤖 AI agent response. Strands Agents. Feedback welcome! |
|
Gentle ping @yinsong1986 — all items from your May 7 review are addressed in 🤖 AI agent response. Strands Agents. Feedback welcome! |
|
@yinsong1986 Gentle ping — 🤖 Scheduled cycle. Strands Agents. |
yinsong1986
left a comment
There was a problem hiding this comment.
LGTM.
Reviewed across four rounds; every P0/P1 from the original 23-comment review plus all follow-up items are resolved with fixes + regression tests:
- Lock discipline across
physics.pyand_dispatch_action(serialized via RLock) destroyjoins running policy Futures before nulling_world- Depth linearization with correct
stat.extentscaling (MuJoCo >= 3.2 convention documented) start_cameras_recordingstrict validation preserves namespace-suffix resolution (tuple-return from_active_camera_list)DatasetRecorder._build_featuresreceives actualvideo_width/video_heightsave_stateusesmjSTATE_FULLPHYSICS(ctrl + qfrc_applied captured)- NaN/Inf rejected on both vector and scalar
set_gravity/set_timesteppaths, with regression tests guarding the expansion ordering - Renderer cache bounded (4 per thread, LRU-evicted + closed)
step()caps + per-batch lock release sostop_policycan interleavePolicyRunner.runinitialisesstart_time/step_countbeforetryDatasetRecorder.save_episodepoisons the recorder on failure
Nice sustained iteration. Good provenance on the commit messages too (timestamps tying changes to specific review comments).
|
@sundargthb — your earlier approval was dismissed by GitHub’s stale-review protection (new commits pushed after your May 6 review to address @yinsong1986’s follow-up items). All your inline feedback was addressed in @yinsong1986 has now formally approved after 4 review rounds (May 10). The merge is blocked waiting for a second current approval to satisfy branch protection. Could you re-review and re-approve when you get a moment? The only changes since your review are:
All 102 review threads resolved. CI green. Thanks! 🙏 🤖 Scheduled cycle. Strands Agents. |
All review items addressed and verified. PR has two fresh APPROVED reviews from yinsong1986 and sundargthb. CI green. 102/102 threads resolved.
Rebased on main (post-PR strands-labs#85 merge). Changes: - New strands_robots/robot.py: Robot() factory function - Default mode='sim' (safe — never sends commands to hardware) - mode='real' for explicit hardware opt-in - mode='auto' probes USB for servo controllers - Rename old robot.py → hardware_robot.py (HardwareRobot class) - Updated __init__.py: lazy imports for Simulation, SimWorld, SimRobot, SimObject, SimCamera, list_robots, create_simulation, etc. - Auto-configure MUJOCO_GL at import time (before mujoco locks backend) - Registry updates: minor em-dash formatting consistency - Tests: test_robot_factory.py, test_registry.py, test_registry_integrity.py, test_user_registry.py All review threads (14/14) previously resolved. Addresses feedback from @awsarron and @yinsong1986.
… registration Add Newton GPU simulation backend stub implementing the SimEngine ABC. All methods raise NotImplementedError — actual implementations will follow in subsequent PRs (world lifecycle, step/action/obs, objects, diffsim). Changes: - strands_robots/simulation/newton/ — NewtonSimulation, NewtonConfig - tests/simulation/newton/ — 81 tests (config, factory, lazy import, simulation) - pyproject.toml — [newton] extra (warp-lang + newton-sim), excluded from [all] since newton-sim is not yet on PyPI - mypy overrides — added warp.* and newton.* to ignore list - Fixed method signatures to match SimEngine ABC (list_robots, robot_joint_names, get_observation, run_policy) after PR strands-labs#85 updated the base class
| _TOOL_SPEC_PATH = Path(__file__).parent / "tool_spec.json" | ||
|
|
||
|
|
||
| class Simulation( |
There was a problem hiding this comment.
to keep naming consistent, could we consider MuJoCoSimEngine? I think Simulation is too broad when we will introduce more simulation engines like Isaac Sim, Newton, etc.
| "lerobot>=0.5.0,<0.6.0", | ||
| ] | ||
| sim = [ | ||
| sim-mujoco = [ |
There was a problem hiding this comment.
nit: will we add robot_descriptions to every individual sim dependency group? From previous PRs we kept it in sim intentionally as it's broader. sim-mujoco and others like sim-newton could include strands-robots[sim]
|
|
||
| def __init__( | ||
| self, | ||
| tool_name: str = "sim", |
There was a problem hiding this comment.
should all tools have unique names by default in strands-robots, i.e. rename this to mujoco_simulation ?
| # Fail fast: verify MuJoCo is importable at construction time | ||
| # so consumers catch missing-dependency errors immediately. | ||
| self._mj = _ensure_mujoco() | ||
| logger.info("🎮 Simulation tool '%s' initialized", tool_name) |
|
|
||
| # World Management | ||
|
|
||
| def _cheap_robot_count(self) -> int: |
There was a problem hiding this comment.
is this worth moving outside of mujoco so that it's reusable elsewhere? Doesn't seem specific to mujoco simulation
| # J2 · PHYSICS PROBE - every physics introspection method on a live sim | ||
|
|
||
|
|
||
| def test_j2_physics_probe_every_mixin_method(sim): |
There was a problem hiding this comment.
j1, j2, etc. feels superfluous
| r = sim.add_camera(name=name, position=p, target=t) | ||
| assert r["status"] == "success", r | ||
|
|
||
| sim.step(n_steps=20) |
There was a problem hiding this comment.
would be worth having tests that validate the actual sim state, e.g. have a red and blue object, run some policy that grabs one of the objects in sim, validate that the policy actually ran and moved the robot in sim, grabbing the desired object. This validates that everything functioned as expected end-to-end. At the moment a lot of these integ tests set up objects, step the sim, and then check the objects again which is more ideally covered by unit tests instead of integ tests
| @@ -0,0 +1,314 @@ | |||
| """End-to-end MuJoCo simulation test with Policy ABC. | |||
There was a problem hiding this comment.
unit tests ideally mock mujoco (the core dep), integ tests would not mock it
| @@ -0,0 +1,717 @@ | |||
| """T1/T13: AgentTool router contract - unknown kwargs rejected, required args friendly, | |||
There was a problem hiding this comment.
I would have unit tests file map 1:1 with filenames in src dir
| @@ -0,0 +1,717 @@ | |||
| """T1/T13: AgentTool router contract - unknown kwargs rejected, required args friendly, | |||
TL;DR
Complete MuJoCo simulation backend for
strands-robots, shipped as a StrandsAgentToolwith 50+ actions (addedreplace_scene_mjcf+patch_scene_mjcfin the MjSpec refactor). An agent can spin up a physics world, load robots + objects, step physics, render RGB/depth cameras, run policies, record LeRobot-format datasets, and perform advanced physics queries — all via natural language through a single tool.Part 4 of 6 in the MuJoCo-sim PR decomposition (follows #83 build-system, #84 sim foundation).
🎬 Visual proof it actually works
50 deterministic scenarios were run against the current code (commit
ffc3ba0) to producevideos + images for every category of tool action. Artifacts live on the
cagataycali/robots:pr85-demosbranch; the regeneration script is
generate_demos.py(755 LOC, scripted — no LLM calls,reproducible). 50/50 scenarios completed successfully.
Samples (click to open the raw MP4 / PNG)
replace_scene_mjcfescape hatch<tendon>pendulum (can't be expressed viaSimObject)patch_scene_mjcfiterative buildarm/side+arm/top+arm/frontget_mass_matrix+get_contactsoverlaid on rendered sceneAll 50 cycles + findings JSON: cagataycali/robots:pr85-demos.
How to review this PR
There's a lot going on. To keep the review tractable, here's what actually matters vs. what's noise.
✅ 1. Must-read — the simulation backend
These are the ~3–4k lines of real new functionality. Review in this order:
strands_robots/simulation/base.pySimEngineABC — the public contract every backend implementsstrands_robots/simulation/factory.pycreate_simulation()+ runtimeregister_backend()strands_robots/simulation/mujoco/backend.pyimport mujoco+ headless-GL auto-config (osmesa/egldetection)strands_robots/simulation/mujoco/simulation.pySimulation(AgentTool)— the orchestrator. All 37 agent actions live here. Primary review target.strands_robots/simulation/mujoco/tool_spec.jsonstrands_robots/simulation/mujoco/spec_builder.pySpecBuilderusesmujoco.MjSpecAST instead of string-concat MJCF (replaced 273-linemjcf_builder.py)strands_robots/simulation/mujoco/scene_ops.pyMjSpec.recompile()-based scene mutation (down from 980 LOC on the old XML-round-trip path)strands_robots/simulation/mujoco/physics.pyPhysicsMixin— raycast, jacobian, energy, forces, mass matrix, checkpoints, inverse dynamics. Review by feature.strands_robots/simulation/mujoco/rendering.pyRenderingMixin— offscreen RGB + depth cameras, multi-camera capturestrands_robots/simulation/policy_runner.pyPolicyRunner.run()loop with video recording + on_frame hooksstrands_robots/simulation/mujoco/recording.pyRecordingMixin— per-episode LeRobot dataset capturestrands_robots/simulation/mujoco/randomization.pyRandomizationMixin— domain randomization hooks🧪 2. Tests — proves the above works
980 passing tests on the sim+registry suite, 1 skipped (headless-CI video recording). New this round:
tests/simulation/mujoco/test_spec_builder.pyMjModel/MjsBodystructure, not XML stringstests/simulation/mujoco/test_replace_scene_mjcf.pytests/simulation/mujoco/test_patch_scene_mjcf.pytests/simulation/mujoco/test_export_xml_after_replace.pyexport_xml+MjSpecinteraction (surfaced by agent-in-the-loop testing)tests/simulation/test_video_camera_validation.pytests/simulation/mujoco/test_concurrency_lock_audit.pyget_mass_matrix/get_sensor_data/get_contactsall holdself._lockduringmj_forwardtests/simulation/test_no_import_cycle.py📊 3. Coverage
Simulation subpackage: 86% line coverage (2,464/2,868 lines covered). Per-file:
simulation/models.pysimulation/mujoco/randomization.pysimulation/factory.pysimulation/mujoco/spec_builder.pysimulation/policy_runner.pysimulation/base.pysimulation/mujoco/rendering.pysimulation/mujoco/backend.pysimulation/mujoco/simulation.pysimulation/mujoco/recording.pysimulation/mujoco/physics.pysimulation/mujoco/scene_ops.py0%-coverage
strands_robots/tools/*files are physical-hardware adapters not touched by this PR; they're tracked under a separate follow-up issue.🆕 What changed since the last review pass
MjSpec refactor — closed #121-#126 (the "replace string-concat MJCF with
mujoco.MjSpec" umbrella)mujoco>=3.2, addSpecBuilder, replace camera xyaxes mathad1d298spec.attach(), scene inject/eject viaspec.recompile(), deletemjcf_builder.py, cleanupc2e8826patch_scene_mjcf(ops=[...])for structured scene edits with atomic rollback3404809export_xmlprefersspec.to_xml(), dropsmj_saveLastXMLfallback (surfaced by agent testing)80caf82/38442b0Closes umbrella #121 and sub-issues #122–#126.
Test-directory restructure + code hygiene
tests/simulation/…to mirrorstrands_robots/simulation/…(file moves only).# ═════ / # ─────ASCII divider comments (28 banner lines).IDEA.md(its stages 0-7 are all landed).3 P0/P1 bugs surfaced by the AST analysis + E2E agent harness (commit
ffc3ba0)Run independently by
/tmp/ast-analysis-v2/deeper_analysis_v2.ipynb+/tmp/e2e_agentic_test_85/e2e_agentic_test_85.ipynb:P0 — Video recording silently produced 0-byte MP4 on wrong camera name. The LLM
agent called
run_policy(video={"camera": "side"})butadd_robot()compiled theURDF's
sidecamera asarm1/side. Everyrender()inside the loop returnedstatus=error,_extract_frame_ndarray()returnedNone, the rollout ran tocompletion,
writer.close()produced an empty file, and nothing in the result texthinted at the problem.
Fix:
PolicyRunner.runnow pre-validates the camera name once before the steploop and returns a clean error dict with a "cameras are namespaced, e.g. arm1/side"
hint.
P0 — 3 read-only ops called
mj_forwardwithoutself._lock. After sim/mujoco: clarify _policy_threads semantics — per-robot or global? #114 landedconcurrent per-robot policies,
get_mass_matrix/get_sensor_data/get_contactscould race a policy thread's
_apply_sim_action(mj_step) and return torn/NaN reads.Fix: wrap
mj_forward+ data read inwith self._lock:in all three; snapshot thedata arrays under the lock so name resolution can run lock-free.
P1 —
simulation.base ↔ policy_runnerimport cycle. Papered over with threeinline lazy imports inside
SimEnginemethods. The cycle is compile-time only(
policy_runner.pyimportsSimEngineunderTYPE_CHECKING), so the lazy importswere unnecessary scaffolding.
Fix: hoist
PolicyRunner+VideoConfigto module-level inbase.py, delete thethree inline lazy imports. Added a regression test that walks the AST with
TYPE_CHECKINGawareness and asserts zero runtime cycles.Usage
Or imperatively:
Agent-authored scenes (new in this round)
Key design decisions
SimulationextendsAgentTooldirectly —Agent(tools=[Simulation()])just works, no wrapper._ensure_mujoco()only imports the heavy dep when a sim is actually created.MjSpec-backed scene edits — no more XML round-tripping.spec.recompile(model, data)preserves unchanged joint state automatically.PolicyABC for sim and real — a policy trained in sim runs on the real robot with zero code changes.Simulationis standalone — no dependency onRobot(). Addresses Arron's earlier ask.register_backend("my_sim", MySim)at runtime (covered bytest_factory.py).Testing locally
Regenerate the visual artifacts
Depends on #83 (build) and #84 (sim foundation). After this lands,
strands_robots.simulation.Simulationis fully usable as a standaloneAgentTool.