Nav pt5: Dynamic Global Map with Loop Closure Voxel Transform by jeff-hykin · Pull Request #2131 · dimensionalOS/dimos

jeff-hykin · 2026-05-17T14:12:34Z

Problem

We want one map that is global, accurate, and real-time.

Solution

Purple Boxes = Important

ApplyClosure (Graph Transformation)

The video simulates major drift followed by a loop closure.

The GREEN part at the end of the video is the transformation (in slow motion) being applied to the global point cloud. Here's how it's done:

Expose the pose graph from PGO
Use the delta of a loop closure event as a skeleton movement (video game skeleton)
Apply a modified version of linear blend skinning to deform the lidar point cloud as a chronological mesh
Keep track of time using a slightly weird but efficient voxel cloud
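
To make the skinning idea concrete, here is a minimal sketch of standard two-bone linear blend skinning over time. It is not the PR's implementation: the function name, the argument layout, and the choice to weight each point by the two keyframes that bracket its capture time are all assumptions made for illustration. It also assumes at least two keyframes with strictly increasing timestamps.

```python
import numpy as np

def blend_closure_deltas(points, point_times, key_times, key_rotations, key_translations):
    """Deform time-stamped points with per-keyframe rigid deltas (hypothetical helper).

    Each keyframe acts as a "bone". A point is skinned to the two keyframes
    that bracket its capture time, weighted by temporal proximity.
    """
    points = np.asarray(points, dtype=float)
    point_times = np.asarray(point_times, dtype=float)
    key_times = np.asarray(key_times, dtype=float)
    key_rotations = np.asarray(key_rotations, dtype=float)        # shape (K, 3, 3)
    key_translations = np.asarray(key_translations, dtype=float)  # shape (K, 3)

    # Keyframe at or after each point's timestamp, clamped to a valid bone pair.
    upper = np.clip(np.searchsorted(key_times, point_times), 1, len(key_times) - 1)
    lower = upper - 1

    # Blend weight: 0 at the earlier keyframe, 1 at the later one.
    span = key_times[upper] - key_times[lower]
    weight = np.clip((point_times - key_times[lower]) / span, 0.0, 1.0)

    # Apply both bracketing deltas to every point, then blend the results.
    moved_lower = np.einsum("nij,nj->ni", key_rotations[lower], points) + key_translations[lower]
    moved_upper = np.einsum("nij,nj->ni", key_rotations[upper], points) + key_translations[upper]
    return (1.0 - weight)[:, None] * moved_lower + weight[:, None] * moved_upper


# Two keyframes: no correction at t=0, a 1 m shift along x at t=10.
identity = np.eye(3)
moved = blend_closure_deltas(
    points=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    point_times=[0.0, 5.0, 10.0],
    key_times=[0.0, 10.0],
    key_rotations=[identity, identity],
    key_translations=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
)
```

Points captured halfway between the two keyframes receive half of the correction, which is what keeps the deformed cloud continuous along the trajectory.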

loop_closure_transform.mov

Dynamic Point-Clearing

NOTE: I'm scrubbing through the replay in this video (not real-time playback). It shows how well a moving human is handled, with very minimal artifacting.

How? A slightly modified version of Andrew's Rust Raycast module!

raycast_point_clearing.mov

Complete Wiring

How to Test

# tested on alfred
dimos run alfred-nav

# should also work fine onboard a go2 with livox
dimos run unitree-go2-nav

# test the ApplyClosure (rerun visual)
uv run python -m dimos.navigation.nav_stack.modules.apply_closure.demo_closure_scene --step-ms 200

Contributor License Agreement

I have read and approved the CLA.

# Conflicts: # data/.lfs/go2_hongkong_office.db.tar.gz # data/.lfs/go2_short.db.tar.gz

… rrb

Native module (cpp/main.cpp) now publishes two new streams on every keyframe: GraphNodes3D for keyframe optimized poses, LineSegments3D for odometry (traversability=1.0) and loop-closure (0.4) edges. Both wire through SimplePGO::keyPoses() + historyPairs(), so no changes are needed to simple_pgo.{h,cpp} since the accessors already exist. Native binary rebuilt cleanly via nix build .#default --no-write-lock-file.

Python (pgo.py) declares matching pgo_graph_nodes / pgo_graph_edges Out streams so the rerun bridge auto-discovers and logs them.

nav_stack_rerun_config() now picks _agentic_debug_rerun_blueprint when agentic_debug=True: an rrb.Horizontal layout with a 3D pane and a dedicated top-down pane (both Spatial3DView over origin="world", named "3D" and "top_down" so dimos-viewer persists camera state separately).

demo_better_pgo_viz.py composes the cross-wall sim blueprint with agentic_debug=True so the new layout + pose graph render together. Used for manual screenshot validation.

Adds visual_override entries for world/pgo_graph_nodes and world/pgo_graph_edges that mirror the existing FAR pattern: when agentic_debug=True, the PGO pose graph renders at z=_AGENTIC_DEBUG_LIFT (3.0m) instead of the default 1.7m, with slightly larger node radii (0.15) and edge thickness (0.06) so the green keyframe trajectory stands out clearly above the terrain cloud in the top-down pane. Verified visually via demo_better_pgo_viz with the cross-wall sim — green keyframe nodes + edges are now plainly identifiable above terrain in both the 3D and top_down rerun panels.

rerun's Spatial3DView doesn't have a top-down camera API, so the "top_down" pane introduced in a7a9be9 was just a duplicate 3D view. Drop _agentic_debug_rerun_blueprint and use _default_rerun_blueprint unconditionally — the agentic_debug lift on visual_override is what actually makes the pose graph and nav markers readable from any angle.

C++ side (main.cpp): when searchForLoopPairs sets m_cache_pairs (i.e. this keyframe will be incorporated into iSAM2 with a loop factor), snapshot the current global poses before smoothAndUpdate. After the update, build a nav_msgs::Path-encoded LoopClosureDeltas message with r_delta = post.R * pre.R^T: position = post.t - r_delta * pre.t, orientation = quaternion(r_delta). Publish on the new pgo_loop_closure topic. Stderr logs the event count for live observability.

Python side (pgo.py): declare pgo_loop_closure: Out[NavPath] so the new topic is registered alongside corrected_odometry/pgo_tf/etc.

Slow test (test_pgo_loop_closure.py): replays og_nav_60s through the native binary with permissive thresholds (loop_time_thresh=5s, min_loop_detect_duration=1s, loop_search_radius=2m, loop_score_thresh=0.5) so the recording reliably triggers loop closures. Subscribes to pgo_loop_closure, logs each event the moment it arrives (event #, poses_length, frame_id, first delta), and after the run validates that each event has >0 poses, finite translations (<100m), and unit-norm quaternions (drift <0.05).

Stdout from a run shows 19 events, sizes 10..35, max |t|=0.0013m, max ||q|-1|=1e-6, which is exactly the small-nudge profile expected from a self-consistent recording.
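
The delta formula in this commit message can be checked with a short NumPy sketch. The helper names below are hypothetical and the code is not taken from main.cpp. The plausibility check tests rotation orthonormality as a stand-in for the unit-norm quaternion check the slow test performs, and the 100 m and 0.05 bounds are the ones quoted in the message.

```python
import numpy as np

def yaw_matrix(yaw):
    """Rotation about z, used only to build example poses."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def closure_delta(pre_rotation, pre_translation, post_rotation, post_translation):
    """Rigid delta that maps a pre-optimization pose onto its post-optimization pose."""
    r_delta = post_rotation @ pre_rotation.T
    t_delta = post_translation - r_delta @ pre_translation
    return r_delta, t_delta

def is_plausible_delta(r_delta, t_delta, max_translation_m=100.0, tolerance=0.05):
    """Finite, bounded translation and a (near-)orthonormal rotation."""
    finite = bool(np.all(np.isfinite(t_delta)))
    bounded = float(np.linalg.norm(t_delta)) < max_translation_m
    orthonormal = bool(np.allclose(r_delta @ r_delta.T, np.eye(3), atol=tolerance))
    return finite and bounded and orthonormal


# Example keyframe pose before and after a small loop-closure correction.
pre_r, pre_t = yaw_matrix(0.10), np.array([1.00, 2.00, 0.0])
post_r, post_t = yaw_matrix(0.12), np.array([1.01, 2.00, 0.0])
r_delta, t_delta = closure_delta(pre_r, pre_t, post_r, post_t)

# Applying the delta to the old pose reproduces the corrected pose.
recovered_r = r_delta @ pre_r
recovered_t = r_delta @ pre_t + t_delta
```

Because the delta is expressed in the global frame, the same (r_delta, t_delta) pair can be applied directly to map points that were registered with the old pose.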

Replaces the kdtree-on-keyframe-positions loop search with a Scan Context (Kim & Kim 2018) descriptor-based pipeline:

1. addKeyPose now also caches a polar-binned (20 rings × 60 sectors) max-z descriptor + the per-row mean "ring key" for each keyframe. The descriptor is appearance-based and pose-independent, so it keeps working even when odometry has drifted enough that the new keyframe is no longer "near" its old neighbours in pose-space.
2. searchForLoopPairs first asks Scan Context for a candidate: ring-key L2 distance ranks all past keyframes, top-K are scored by column-shifted cosine distance on the full descriptor, the best below the threshold (default 0.4) is the candidate. The winning column shift is also converted to a yaw rotation and used to seed ICP, which dramatically improves convergence on revisits that arrive at a different heading from the original pass.
3. Position-based search is retained as a fallback when SC is disabled or finds nothing, so existing behaviour is preserved.

Replaces ~50 lines of position-search with ~30 lines of SC retrieval in searchForLoopPairs; new scan_context.{h,cpp} (~150 lines, MIT attribution to upstream irapkaist/scancontext concepts but no source copied) implements the descriptor + distance.

Side-effect: this makes on-start relocalization a small follow-up addition, because descriptors + ring-keys + poses are now per-keyframe state that can be serialised, and the SC search path already does "appearance-based pose recovery without an initial pose guess."

Verified via test_pgo_loop_closure.py: 17 loop-closure events fired across the og_nav_60s rosbag (was 19 with naive position search; SC is more selective and rejects two borderline-position matches that weren't actually visual revisits). All events have valid shape + tiny quaternion/translation deltas as expected for a self-consistent bag.
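
For readers unfamiliar with Scan Context, the following is a rough Python sketch of the descriptor, the ring key, and the column-shifted cosine distance described above. It is an illustration, not a port of scan_context.{h,cpp}: the function names mirror ones mentioned in later commits but the bodies are assumptions, the 80 m max range is an assumed default, and empty bins are left at zero, so heights are assumed to be non-negative.

```python
import numpy as np

def make_descriptor(points, n_rings=20, n_sectors=60, max_range_m=80.0):
    """Polar-binned max-z descriptor with shape (n_rings, n_sectors)."""
    points = np.asarray(points, dtype=float)
    ranges = np.hypot(points[:, 0], points[:, 1])
    keep = ranges < max_range_m
    angles = np.arctan2(points[keep, 1], points[keep, 0]) + np.pi  # 0..2*pi
    rings = np.minimum((ranges[keep] / max_range_m * n_rings).astype(int), n_rings - 1)
    sectors = np.minimum((angles / (2.0 * np.pi) * n_sectors).astype(int), n_sectors - 1)
    descriptor = np.zeros((n_rings, n_sectors))
    # Keep the highest point per (ring, sector) bin; empty bins stay at 0.
    np.maximum.at(descriptor, (rings, sectors), points[keep, 2])
    return descriptor

def make_ring_key(descriptor):
    """Per-row mean: unchanged by a pure yaw rotation, so it is cheap to rank with."""
    return descriptor.mean(axis=1)

def best_distance(query, candidate):
    """Minimum column-wise cosine distance over all sector shifts, plus that shift."""
    query_norm = np.linalg.norm(query, axis=0)
    best = (1.0, 0)
    for shift in range(query.shape[1]):
        shifted = np.roll(candidate, shift, axis=1)
        shifted_norm = np.linalg.norm(shifted, axis=0)
        valid = (query_norm > 0) & (shifted_norm > 0)
        if not valid.any():
            continue
        cosine = (query[:, valid] * shifted[:, valid]).sum(axis=0) / (
            query_norm[valid] * shifted_norm[valid]
        )
        distance = 1.0 - float(cosine.mean())
        if distance < best[0]:
            best = (distance, shift)
    return best


# Same scene seen again after turning by exactly 7 sectors of yaw.
rng = np.random.default_rng(0)
scan = np.column_stack([rng.uniform(-30, 30, 5000), rng.uniform(-30, 30, 5000), rng.uniform(0, 3, 5000)])
yaw = 7 * 2.0 * np.pi / 60
c, s = np.cos(yaw), np.sin(yaw)
revisit = scan @ np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]).T

distance, shift = best_distance(make_descriptor(revisit), make_descriptor(scan))
yaw_seed = shift * 2.0 * np.pi / 60  # candidate initial guess for ICP
```

The winning shift is what gets converted to a yaw seed: one sector of shift corresponds to 2π / n_sectors radians.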

…n search misses

Adds CLI args to expose Scan Context config on the native binary (--use_scan_context, --sc_n_rings, --sc_n_sectors, --sc_max_range_m, --sc_top_k, --sc_match_threshold).

New slow test test_pgo_synthetic_drift.py:
- Synthesises a 4-wall point-cloud room with two distinctive interior columns (so the scene isn't rotationally symmetric).
- Generates an out-and-back trajectory: drives east 8m then returns to the origin, heading unchanged.
- Injects DRIFT_AT_REVISIT_M = 5m of additive y-drift into the reported odometry, ramped linearly with travelled distance. The body-frame scan stays byte-identical between first and second visit (same true sensor view of the same scene); the odom pose at revisit is 5m offset.
- Runs the native PGO binary twice over the same input:
  * use_scan_context=true → expect ≥1 loop event
  * use_scan_context=false → expect 0 loop events (drift >> 1m radius)
- Dumps PGO stderr after each run for diagnostics.

Result: SC fires 10 loop closure events on the synthetic trajectory; position-based search fires 0, which is exactly the demonstration of why we swapped to appearance-based place recognition. Both assertions pass.

Verifies the core SC value prop: appearance-based place recognition doesn't depend on the (drifted) pose, so it keeps working when the odometry has wandered far enough that the kdtree-on-positions search no longer finds neighbours.
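
The drift injection in this test can be pictured with a small sketch. Only the 8 m leg length, the 5 m value of DRIFT_AT_REVISIT_M, and the 1 m search radius come from the commit message; the helper names and the 0.5 m step are assumptions, and this is not the code in test_pgo_synthetic_drift.py.

```python
import numpy as np

DRIFT_AT_REVISIT_M = 5.0

def out_and_back(length_m=8.0, step_m=0.5):
    """True 2D positions: drive east, then return to the origin along the same line."""
    outbound = np.arange(0.0, length_m + step_m, step_m)
    xs = np.concatenate([outbound, outbound[-2::-1]])
    return np.column_stack([xs, np.zeros_like(xs)])

def add_ramped_drift(true_xy, drift_at_end_m=DRIFT_AT_REVISIT_M):
    """Add y-drift that grows linearly with travelled distance."""
    steps = np.linalg.norm(np.diff(true_xy, axis=0), axis=1)
    travelled = np.concatenate([[0.0], np.cumsum(steps)])
    reported = true_xy.copy()
    reported[:, 1] += drift_at_end_m * travelled / travelled[-1]
    return reported


true_xy = out_and_back()
reported_xy = add_ramped_drift(true_xy)

# The robot is physically back at the start, but odometry says it is 5 m away,
# which is well outside a 1 m position-based search radius.
revisit_error_m = float(np.linalg.norm(reported_xy[-1] - reported_xy[0]))
```

With this input a search keyed on reported positions has nothing within range at the revisit, while an appearance-based descriptor sees an identical scan.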

Test files now use setup_logger() / logger.info(...) per the fix_nits rule "no print() calls in tests; use logging if diagnostics are genuinely needed." Matches the existing test_pgo_rosbag.py convention. Also drops the now-unused sys import. Also clears a stale docstring on demo_better_pgo_viz.py: it claimed the demo enabled a "horizontal 3D + top-down panes" layout, which was reverted in 1801759 — rerun's Spatial3DView didn't support an initial camera angle (rrb.EyeControls3D existed at the time but wasn't used). The remaining value of agentic_debug=True is the visual override lift, which the new docstring describes accurately. No behavioural change. Tests still pass.

Sweep over names introduced by the better_pgo work that hit fix_nits "expand mod -> module" rule:
- scan_context: cfg -> config (param + 12 call-sites); d (return val) -> descriptor in make_descriptor/make_ring_key/make_sector_key; pt -> point in the descriptor build loop; zf -> point_z (float cast); q_col/c_col -> query_column/candidate_column; q_norm/c_norm -> query_norm/candidate_norm; cj -> shifted_j; d (in best_distance return loop) -> distance with min_distance for the running best.
- simple_pgo: desc -> descriptor on the per-keyframe cache; k -> top_k_count for the partial-sort bound; structured-binding `auto [d, shift]` -> `auto [distance, shift]`.
- main.cpp: kp -> keyframe; ps -> pose_stamped (build_graph_nodes and build_loop_closure_deltas); a/b -> start/end and p1/p2 -> start_pose/end_pose in append_segment; n -> count for the loop bound; lc_msg -> loop_closure_msg at the publish site.
- tests: ps -> pose in the validate loop (test_pgo_loop_closure); c,s -> cos_yaw,sin_yaw in _yaw_rotation (test_pgo_synthetic_drift).

Names that intentionally stay short are the math-convention ones: r/t for SE(3) rotation+translation, q for quaternion, i/j as loop indices, idx as keyframe index, ts as timestamp, dt for time delta, tx/ty/tz/qx/qy/qz/qw for component decomposition. The fix_nits rule calls out mod/lc as the target pattern; expanding the math-notation names would make the code less readable, not more.

Also drops one section-label comment ("# Log each event the moment it arrives.") whose adjacent function name already conveys the same and one in-loop "# node_type 1 = odom/robot" that repeats info already stated in the function-level docstring.

Native binary rebuilt + slow test still passes (17 events, all valid).

Drops in the wiring for evaluating the PGO native module on KITTI-360. Cannot run end-to-end yet: the dataset is gated behind a registered login at cvlibs.net so the data download is a manual user step.

What's here:
- kitti360_loader.py: parses the KITTI-360 directory layout (data_3d_raw + data_poses + calibration); composes per-frame lidar→world pose by chaining cam0_to_world ⊕ inv(velo_to_cam). Exposes a frame iterator + scan_xyz(frame_id).
- loop_groundtruth.py: LCDNet/KITTI-convention groundtruth (≥50 frame gap, ≤4m radius), order-agnostic scoring of detected pairs.
- run_kitti360_benchmark.py: argparse CLI, spawns the native binary on private LCM topics, plays (registered_scan, odometry) from disk, subscribes to pgo_graph_edges to extract loop-closure pairs (via traversability ≈ 0.4 segments) and pgo_loop_closure for delta event counts. Writes JSON.
- README.md: download instructions for the official "Test SLAM 3D" 12 GB package, published SOTA reference numbers from LCDNet + ISC papers (LCDNet 0.91-0.93 AP, Scan Context 0.62-0.78 AP), expected ballpark for our minimal SC port.
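
As a rough illustration of the groundtruth convention named above (≥50 frame gap, ≤4 m radius) and of order-agnostic scoring, here is a small sketch. The function names and the brute-force search are assumptions for illustration and are not taken from loop_groundtruth.py.

```python
import numpy as np

def groundtruth_pairs(positions, min_frame_gap=50, max_distance_m=4.0):
    """All (earlier, later) frame pairs that count as true loop closures."""
    positions = np.asarray(positions, dtype=float)
    pairs = set()
    for later in range(min_frame_gap, len(positions)):
        # Only frames at least min_frame_gap behind the current one qualify.
        earlier = positions[: later - min_frame_gap + 1]
        distances = np.linalg.norm(earlier - positions[later], axis=1)
        for index in np.flatnonzero(distances <= max_distance_m):
            pairs.add((int(index), later))
    return pairs

def score_pairs(detected, groundtruth):
    """Precision and recall, treating (i, j) and (j, i) as the same pair."""
    detected = {tuple(sorted(pair)) for pair in detected}
    groundtruth = {tuple(sorted(pair)) for pair in groundtruth}
    hits = len(detected & groundtruth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(groundtruth) if groundtruth else 0.0
    return precision, recall


# 120 frames at 1 m spacing: 60 frames out along x, then 60 frames back.
xs = np.concatenate([np.arange(60.0), np.arange(59.0, -1.0, -1.0)])
positions = np.column_stack([xs, np.zeros_like(xs), np.zeros_like(xs)])
truth = groundtruth_pairs(positions)

# A detector that reports the start/end revisit in reversed order still scores a hit.
precision, recall = score_pairs([(119, 0)], truth)
```

Sorting each pair before comparison is what makes the scoring order-agnostic, so the benchmark does not depend on which endpoint the detector lists first.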

… jeff/clean/nav4

Conflicts resolved:
- docs/development/conventions.md: kept HEAD's threading bullet + main's foxglove-removed wording
- dimos/hardware/sensors/lidar/fastlio2/fastlio_blueprints.py: kept HEAD's new mid360_fastlio_ray_trace_replay blueprint; accepted main's n_workers=3 -> 5 bump on mid360_fastlio_ray_trace
- dimos/protocol/tf/tf.py: dropped HEAD's same-frame ValueError guard (broke test_same_frame_returns_identity); use main's signature so get_transform's identity branch fires
- dimos/robot/all_blueprints.py: kept HEAD's 4 new fastlio entries (memory/ray-trace-replay/replay/replay-voxels) and HEAD's fastlio-memory + fastlio-replay class entries; dropped HEAD's foxglove-bridge entry (main removed foxglove support)
- dimos/mapping/ray_tracing/rust/.gitignore (NEW): kept HEAD's 'result' and 'result-*' patterns plus shared 'target/'

NOT MERGED (HEAD version retained via git checkout --ours):
- dimos/mapping/ray_tracing/module.py: add/add with 4 divergent sections; HEAD uses DynamicCloud + map_override + sequence_period_secs for loop-closure, main uses PointCloud2 + grace_depth + GlobalPointcloud mixin + tuned min_health = -2. Cannot be merged without deciding which interface shape downstream callers expect.
- dimos/mapping/ray_tracing/rust/src/main.rs: add/add with 114 conflict markers; two genuinely different Rust voxel-map binaries.

Jeff needs to hand-merge those 2: pick HEAD's PR-shaped interface and update main's downstream callers, or pick main's and re-port the PR's map_override/slow-clock features.

Records the 3 greptile review comments + 1 Jeff self-todo:
- 3254780591: apply_closure.py: already fixed in current code (uses self.map_override.publish); greptile was looking at older rev
- 3254780620: rust main.rs:460 DDA cap: lives in the deferred conflict file, defer until Jeff picks Rust binary version
- 3254780663: DynamicCloud.py:182 ts=0.0 doc comment: doc-only nit
- 3254844181: jeff-hykin self-todo on memory2/module.py:309

@rpc

main's PR #2207 (commit 2dd12d1) introduced the @rpc build() method that runs _maybe_build() during build instead of start. HEAD's PR #2131 already had the same change with an explanatory comment about why heavy work belongs in build(); kept HEAD's comment, accepted main's identical build() body.

Conflict on dimos/core/native_module.py: keep the PR-side comment block describing why heavy build work belongs in build() rather than start(). main's version of build() has no such comment; both sides converged on the same body (super().build() + self._maybe_build()).

leshy and others added 30 commits May 6, 2026 14:52

new hong kong office recording

215f337

recordings fixed

8d8b028

new recording

36ebc5f

by default we run go2_short recording

bb3427f

trimmed short recording

bec4833

recording fixes

54dc893

switched example analysis over to new recording

2669672

Merge remote-tracking branch 'origin/dev' into ivan/fix/go2recording

3b3c670

# Conflicts: # data/.lfs/go2_hongkong_office.db.tar.gz # data/.lfs/go2_short.db.tar.gz

recorder module refactor

68cce39

undo memory docs

6e62fd4

undo transform changes

b385513

fastlio memory

961c9e4

fastlio experiments

6d60368

Merge remote-tracking branch 'origin/main' into ivan/feat/livox_recorder

12110e2

Render spheres

0574778

Remove unused code

c118caf

Fix

fd7311e

Mypy

9e02acd

merge main into jeff/feat/better_pgo

e45b228

make model_path optional

1cf0045

add flowbase robot

c9a426a

jeff-hykin added 10 commits May 20, 2026 23:05

add todo

d2b67f2

merge main

bbf481d

hide bad global maps

37c4f2b

fixup pyproject toml

0a066ce

improve visuals

b652aa2

rust native modules build via nix

f8983a8

-

3118917

-

3541e19

add nix helper

2372712

Merge branch 'jeff/clean/nav4' of github.com:dimensionalOS/dimos into…

e4f9848

… jeff/clean/nav4

jeff-hykin force-pushed the jeff/clean/nav4 branch from d70d322 to e4f9848 Compare May 22, 2026 08:17

jeff-hykin added 12 commits May 22, 2026 16:15

merge

5029014

rename pt1

edf0544

rename pt2

910ef9c

merge

d0545f0

remove noise

6bb886b

remove world frame

f118a9d

organize a bit

66f9b36

clean up transform frame

6c96c21

-

b290839

merge

dee87a9

hide go2 stuff by default

c7cbecd

add cancel goal for local planner

7ddd959

put pgo publish on the outside

cecd475

Nav pt5: Dynamic Global Map with Loop Closure Voxel Transform #2131
jeff-hykin wants to merge 227 commits into main from jeff/clean/nav4


4 participants
