Nav pt5: Dynamic Global Map with Loop Closure Voxel Transform#2131
Nav pt5: Dynamic Global Map with Loop Closure Voxel Transform#2131jeff-hykin wants to merge 201 commits into
Conversation
# Conflicts: # data/.lfs/go2_hongkong_office.db.tar.gz # data/.lfs/go2_short.db.tar.gz
… rrb
Native module (cpp/main.cpp) now publishes two new streams on every
keyframe: GraphNodes3D for keyframe optimized poses, LineSegments3D for
odometry (traversability=1.0) and loop-closure (0.4) edges. Both wire
through SimplePGO::keyPoses() + historyPairs() — no changes needed to
simple_pgo.{h,cpp} since the accessors already exist. Native binary
rebuilt cleanly via nix build .#default --no-write-lock-file.
Python (pgo.py) declares matching pgo_graph_nodes / pgo_graph_edges Out
streams so the rerun bridge auto-discovers and logs them.
nav_stack_rerun_config() now picks _agentic_debug_rerun_blueprint when
agentic_debug=True — an rrb.Horizontal layout with a 3D pane and a
dedicated top-down pane (both Spatial3DView over origin="world", named
"3D" and "top_down" so dimos-viewer persists camera state separately).
demo_better_pgo_viz.py composes the cross-wall sim blueprint with
agentic_debug=True so the new layout + pose graph render together. Used
for manual screenshot validation.
Adds visual_override entries for world/pgo_graph_nodes and world/pgo_graph_edges that mirror the existing FAR pattern: when agentic_debug=True, the PGO pose graph renders at z=_AGENTIC_DEBUG_LIFT (3.0m) instead of the default 1.7m, with slightly larger node radii (0.15) and edge thickness (0.06) so the green keyframe trajectory stands out clearly above the terrain cloud in the top-down pane. Verified visually via demo_better_pgo_viz with the cross-wall sim — green keyframe nodes + edges are now plainly identifiable above terrain in both the 3D and top_down rerun panels.
rerun's Spatial3DView doesn't have a top-down camera API, so the "top_down" pane introduced in a7a9be9 was just a duplicate 3D view. Drop _agentic_debug_rerun_blueprint and use _default_rerun_blueprint unconditionally — the agentic_debug lift on visual_override is what actually makes the pose graph and nav markers readable from any angle.
C++ side (main.cpp): when searchForLoopPairs sets m_cache_pairs (i.e. this keyframe will be incorporated into iSAM2 with a loop factor), snapshot the current global poses before smoothAndUpdate. After the update, build a nav_msgs::Path-encoded LoopClosureDeltas message: position = post.t - r_delta * pre.t, orientation = quaternion(post.R * pre.R^T). Publish on the new pgo_loop_closure topic. Stderr logs the event count for live observability. Python side (pgo.py): declare pgo_loop_closure: Out[NavPath] so the new topic is registered alongside corrected_odometry/pgo_tf/etc. Slow test (test_pgo_loop_closure.py): replays og_nav_60s through the native binary with permissive thresholds (loop_time_thresh=5s, min_loop_detect_duration=1s, loop_search_radius=2m, loop_score_thresh=0.5) so the recording reliably triggers loop closures. Subscribes to pgo_loop_closure, logs each event the moment it arrives (event #, poses_length, frame_id, first delta), and after the run validates each event has >0 poses, finite translations (<100m), and unit-norm quaternions (drift <0.05). Stdout from a run shows 19 events, sizes 10..35, max |t|=0.0013m, max |q|-1|=1e-6 — exactly the small-nudge profile expected from a self-consistent recording.
Replaces the kdtree-on-keyframe-positions loop search with a Scan
Context (Kim & Kim 2018) descriptor-based pipeline:
1. addKeyPose now also caches a polar-binned (20 rings × 60 sectors)
max-z descriptor + the per-row mean "ring key" for each keyframe.
The descriptor is appearance-based and pose-independent, so it
keeps working even when odometry has drifted enough that the new
keyframe is no longer "near" its old neighbours in pose-space.
2. searchForLoopPairs first asks Scan Context for a candidate:
ring-key L2 distance ranks all past keyframes, top-K are scored
by column-shifted cosine distance on the full descriptor, the
best below the threshold (default 0.4) is the candidate. The
winning column shift is also converted to a yaw rotation and used
to seed ICP, which dramatically improves convergence on revisits
that arrive at a different heading from the original pass.
3. Position-based search is retained as a fallback when SC is
disabled or finds nothing, so existing behaviour is preserved.
Replaces ~50 lines of position-search with ~30 lines of SC retrieval
in searchForLoopPairs; new scan_context.{h,cpp} (~150 lines, MIT
attribution to upstream irapkaist/scancontext concepts but no source
copied) implements the descriptor + distance.
Side-effect: this makes on-start relocalization a small follow-up
addition — descriptors + ring-keys + poses are now per-keyframe state
that can be serialised, and the SC search path already does
"appearance-based pose recovery without an initial pose guess."
Verified via test_pgo_loop_closure.py: 17 loop-closure events fired
across the og_nav_60s rosbag (was 19 with naive position search; SC
is more selective and rejects two borderline-position matches that
weren't actually visual revisits). All events have valid shape + tiny
quaternion/translation deltas as expected for a self-consistent bag.
…n search misses Adds CLI args to expose Scan Context config on the native binary (--use_scan_context, --sc_n_rings, --sc_n_sectors, --sc_max_range_m, --sc_top_k, --sc_match_threshold). New slow test test_pgo_synthetic_drift.py: - Synthesises a 4-wall point-cloud room with two distinctive interior columns (so the scene isn't rotationally symmetric). - Generates an out-and-back trajectory: drives east 8m then returns to the origin, heading unchanged. - Injects DRIFT_AT_REVISIT_M = 5m of additive y-drift into the reported odometry, ramped linearly with travelled distance. The body-frame scan stays byte-identical between first and second visit (same true sensor view of the same scene); the odom pose at revisit is 5m offset. - Runs the native PGO binary twice over the same input: * use_scan_context=true → expect ≥1 loop event * use_scan_context=false → expect 0 loop events (drift >> 1m radius) - Dumps PGO stderr after each run for diagnostics. Result: SC fires 10 loop closure events on the synthetic trajectory; position-based search fires 0 — exactly the demonstration of why we swapped to appearance-based place recognition. Both assertions pass. Verifies the core SC value prop: appearance-based place recognition doesn't depend on the (drifted) pose, so it keeps working when the odometry has wandered far enough that the kdtree-on-positions search no longer finds neighbours.
Test files now use setup_logger() / logger.info(...) per the fix_nits rule "no print() calls in tests; use logging if diagnostics are genuinely needed." Matches the existing test_pgo_rosbag.py convention. Also drops the now-unused sys import. Also clears a stale docstring on demo_better_pgo_viz.py: it claimed the demo enabled a "horizontal 3D + top-down panes" layout, which was reverted in 1801759 — rerun's Spatial3DView didn't support an initial camera angle (rrb.EyeControls3D existed at the time but wasn't used). The remaining value of agentic_debug=True is the visual override lift, which the new docstring describes accurately. No behavioural change. Tests still pass.
Sweep over names introduced by the better_pgo work that hit fix_nits
"expand mod -> module" rule:
- scan_context: cfg -> config (param + 12 call-sites); d (return val) ->
descriptor in make_descriptor/make_ring_key/make_sector_key; pt -> point
in the descriptor build loop; zf -> point_z (float cast); q_col/c_col
-> query_column/candidate_column; q_norm/c_norm -> query_norm/
candidate_norm; cj -> shifted_j; d (in best_distance return loop) ->
distance with min_distance for the running best.
- simple_pgo: desc -> descriptor on the per-keyframe cache; k ->
top_k_count for the partial-sort bound; structured-binding `auto [d,
shift]` -> `auto [distance, shift]`.
- main.cpp: kp -> keyframe; ps -> pose_stamped (build_graph_nodes and
build_loop_closure_deltas); a/b -> start/end and p1/p2 ->
start_pose/end_pose in append_segment; n -> count for the loop bound;
lc_msg -> loop_closure_msg at the publish site.
- tests: ps -> pose in the validate loop (test_pgo_loop_closure);
c,s -> cos_yaw,sin_yaw in _yaw_rotation (test_pgo_synthetic_drift).
Names that intentionally stay short are the math-convention ones:
r/t for SE(3) rotation+translation, q for quaternion, i/j as loop
indices, idx as keyframe index, ts as timestamp, dt for time delta,
tx/ty/tz/qx/qy/qz/qw for component decomposition. The fix_nits rule
calls out mod/lc as the target pattern; expanding the math-notation
names would make the code less readable, not more.
Also drops one section-label comment ("# Log each event the moment it
arrives.") whose adjacent function name already conveys the same and
one in-loop "# node_type 1 = odom/robot" that repeats info already
stated in the function-level docstring.
Native binary rebuilt + slow test still passes (17 events, all valid).
Drops in the wiring for evaluating the PGO native module on KITTI-360. Cannot run end-to-end yet — the dataset is gated behind a registered login at cvlibs.net so the data download is a manual user step. What's here: - kitti360_loader.py: parses the KITTI-360 directory layout (data_3d_raw + data_poses + calibration); composes per-frame lidar→world pose by chaining cam0_to_world ⊕ inv(velo_to_cam). Exposes a frame iterator + scan_xyz(frame_id). - loop_groundtruth.py: LCDNet/KITTI-convention groundtruth (≥50 frame gap, ≤4m radius), order-agnostic scoring of detected pairs. - run_kitti360_benchmark.py: argparse CLI, spawns the native binary on private LCM topics, plays (registered_scan, odometry) from disk, subscribes to pgo_graph_edges to extract loop-closure pairs (via traversability ≈ 0.4 segments) and pgo_loop_closure for delta event counts. Writes JSON. - README.md: download instructions for the official "Test SLAM 3D" 12 GB package, published SOTA reference numbers from LCDNet + ISC papers (LCDNet 0.91-0.93 AP, Scan Context 0.62-0.78 AP), expected ballpark for our minimal SC port.
PGO's pose_graph was rendering as nodes-only outside agentic_debug mode: with no visual_override registered, the bridge fell back to Graph3D.to_rerun() which intentionally returns just rr.Points3D of the nodes (to_rerun_multi is the path that emits edges too). Bridge fix: in final_convert, prefer to_rerun_multi(base_path=...) when the message exposes it. The bridge already knows the entity_path it would log to, so pass it as base_path. Graph3D consumers now get nodes + edges by default, no per-blueprint visual_override required. Existing visual_overrides still win because they run before final_convert in the pipe. Only Graph3D currently defines to_rerun_multi, so this is effectively a Graph3D-rendering fix; other RerunConvertible types (TFMessage, etc.) fall through to to_rerun() unchanged.
- per-module (registered_scan→lidar) remaps now live inside create_nav_stack, so per-robot blueprints (alfred, g1, mobile.py) stop repeating (FastLio2,"lidar","registered_scan") - ApplyClosure publishes on map_override; auto-connects into RayTracingVoxelMap without an explicit remap - UnityBridgeModule renames registered_scan→lidar to match the new convention - local_planner: switch Bool import to dimos_lcm.std_msgs.Bool
Greptile SummaryThis PR adds a Dynamic Global Map with Loop Closure Voxel Transform to the navigation stack. It introduces a Rust-based
Confidence Score: 5/5The PR is safe to merge. The core LBS warp math, DDA raycast clearing, DynamicCloud wire-format round-trip, and the closure-feedback loop are all correct. The apply-closure math correctly applies R*p + t in both the single-node and multi-node code paths. The DynamicCloud binary format is pinned by cross-language tests. Health-value restoration from map overrides correctly roundtrips for all published voxels. The previously flagged corrected_global_map port-name mismatch is resolved. Only minor documentation and observability improvements remain. No files require special attention beyond the inline suggestions. Important Files Changed
Sequence DiagramsequenceDiagram
participant FastLio2 as FastLio2
participant RT as RayTracingVoxelMap
participant AC as ApplyClosure
participant PGO as PGO
participant SP as SimplePlanner
FastLio2->>RT: lidar PointCloud2
FastLio2->>RT: odometry
Note over RT: DDA raycast clearing + health scoring
RT->>AC: global_map DynamicCloud
RT->>SP: global_map DynamicCloud
AC->>AC: latch latest map
FastLio2->>PGO: registered_scan + odometry
Note over PGO: iSAM2 keyframe + ScanContext loop search
PGO->>AC: loop_closure_event GraphDelta3D
Note over AC: LBS warp via slerp/lerp per voxel timestamp
AC->>RT: map_override DynamicCloud warped
Note over RT: Clear + restore from override
RT->>AC: global_map corrected on next lidar scan
RT->>SP: global_map corrected on next lidar scan
Note over SP: Rebuild costmap from world_positions
Reviews (2): Last reviewed commit: "-" | Re-trigger Greptile |
|
|
||
| struct ExtractError(&'static str); | ||
| impl std::fmt::Display for ExtractError { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| f.write_str(self.0) | ||
| } | ||
| } | ||
|
|
||
| fn extract_xyz(msg: &PointCloud2) -> Result<Vec<(f32, f32, f32)>, ExtractError> { | ||
| let mut x_off: Option<usize> = None; |
There was a problem hiding this comment.
Hardcoded DDA iteration cap may silently truncate rays
max_iter = 4096 is a safety guard against infinite loops, but if a ray must traverse more than 4096 voxels (e.g., max_range = 30 m at voxel_size = 0.1 m gives up to ~520 cells for an axial ray, but a near-axis-aligned diagonal can reach ~900 cells), the loop exits early without marking the remaining voxels as misses. This leaves stale occupied voxels along the far end of long rays. The existing FIXME notes the concern; consider deriving max_iter from ceil(max_range / voxel_size * sqrt(3)) or removing the cap entirely since DDA is guaranteed to terminate.
| quantity_size = num_points * 4 | ||
| if len(data) < offset + voxels_size + quantity_size + _U32_SIZE: | ||
| raise ValueError("DynamicCloud: data too short for voxels + quantity + num_events") |
There was a problem hiding this comment.
ts=0.0 silently encodes as zero nanoseconds
timestamp_nanos = int(self.ts * 1_000_000_000) if self.ts else 0 treats ts=0.0 as "no timestamp" via Python truthiness. The lcm_decode path has a matching asymmetry. This is self-consistent but a brief comment noting the 0 = unset convention would prevent future confusion.
❌ 3 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
Pull request was converted to draft
| ts = getattr(msg, "ts", None) or time.time() | ||
| frame_id = getattr(msg, "frame_id", None) or default_frame_id | ||
| # For msgs that carry a parent→child transform (Odometry, | ||
| # TransformStamped), child_frame_id is the body whose pose we |
Problem
We want 1 map thats global and accurate and real time
Solution
Purple Boxes = Important
ApplyClosure (Graph Transformation)
The vid simulates major drift + loop closure.
The GREEN part at the end of the video is the transformation (in slow-motion) being applied to the global point cloud. Here's how its done:
loop_closure_transform.mov
Dynamic Point-Clearing
NOTE: I'm scrubbing-through the replay in this video (not real time playback). I'm showing how well you can a human with very minimal artifacting.
How? A slightly modified version of Andrew's Rust Raycast module!
raycast_point_clearing.mov
Complete Wiring

How to Test
Contributor License Agreement