
feat: remove rpyc #2094

Open
paul-nechifor wants to merge 1 commit into main from paul/feat/remove-rpyc

Conversation

@paul-nechifor
Contributor

@paul-nechifor paul-nechifor commented May 15, 2026

Problem

  • We don't want RPyC anymore.

Solution

  • Excise it.

Contributor License Agreement

  • I have read and approved the CLA.

@paul-nechifor paul-nechifor marked this pull request as draft May 15, 2026 04:47
@greptile-apps
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR removes RPyC from the dimos codebase and replaces it with the existing LCM-based RPC infrastructure already used for inter-module communication. The coordinator and worker-side RPyC servers are deleted; a new CoordinatorRPC wrapper publishes the coordinator's API over LCM, and RemoteModuleSource / LocalModuleSource are updated to use RPCClient and the new _RemoteProxy fallback instead of RPyC connections.

  • RPyC excised: rpyc_server.py, rpyc_services.py, python_worker.py's StartRpycRequest, and rpyc_port in RunEntry are all removed; CoordinatorRPC and an updated RPCClient.remote() factory replace the TCP socket layer.
  • Blueprint pickle support: __getstate__/__setstate__ are added to Blueprint so MappingProxyType fields survive pickle round-trips over LCM, replacing the previous copyreg workaround in RemoteModuleSource.
  • Forward-compatible registry: RunEntry.load() now filters unknown JSON keys, so registry files written by older dimos versions (containing rpyc_port) load cleanly without crashing.
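The Blueprint pickle point can be illustrated with a minimal sketch. The `Blueprint` class below is a hypothetical stand-in with made-up fields (`name`, `options`), not the dimos class; it only shows the general `__getstate__`/`__setstate__` pattern for a frozen dataclass that holds a `MappingProxyType`, which `pickle` cannot serialize directly.

```python
import pickle
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical stand-in; the real dimos Blueprint has different fields."""

    name: str
    options: Mapping[str, Any] = field(default_factory=lambda: MappingProxyType({}))

    def __getstate__(self) -> dict:
        # mappingproxy objects cannot be pickled, so store a plain dict instead.
        state = dict(self.__dict__)
        state["options"] = dict(self.options)
        return state

    def __setstate__(self, state: dict) -> None:
        # Frozen dataclasses block normal assignment, so go through
        # object.__setattr__ and re-wrap the mapping as read-only.
        for key, value in state.items():
            if key == "options":
                value = MappingProxyType(dict(value))
            object.__setattr__(self, key, value)


original = Blueprint("demo", MappingProxyType({"rate_hz": 10}))
restored = pickle.loads(pickle.dumps(original))
```

The restored object carries a fresh read-only view over a copied dict, so the sender and receiver do not share mutable state.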

Confidence Score: 5/5

The RPyC removal is clean and complete — all call sites, registry fields, worker messages, and server threads are excised. The LCM-based replacement reuses the existing transport layer consistently.

The change is a well-scoped excision of one dependency and its replacement with an already-proven transport. The new CoordinatorRPC wrapper is straightforward, Blueprint pickling is tested, and the forward-compatible RunEntry.load() handles upgrade scenarios gracefully. The only new finding is a cosmetic concern in an unrelated file ('true' as an alias in make_connection). All previously flagged issues are carry-overs from earlier threads.
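As a rough sketch of what a forward-compatible loader can look like: the `RunEntry` below is a hypothetical two-field stand-in, and `from_dict` is an illustrative name rather than the actual `RunEntry.load()` signature. It drops any key that is not a declared dataclass field, so a legacy `rpyc_port` entry is ignored instead of raising `TypeError`.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class RunEntry:
    """Hypothetical subset of fields; the real RunEntry likely has more."""

    run_id: str
    pid: int

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RunEntry":
        known = {f.name for f in fields(cls)}
        # Drop keys this version no longer knows about, e.g. a legacy rpyc_port.
        return cls(**{k: v for k, v in data.items() if k in known})


# A registry record as an older version might have written it (values made up).
legacy = json.loads('{"run_id": "abc", "pid": 1234, "rpyc_port": 18812}')
entry = RunEntry.from_dict(legacy)
```

The trade-off of silent filtering is that a misspelled key is dropped without warning; logging the ignored keys would be one way to soften that.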

The make_connection change in dimos/robot/unitree/go2/connection.py introduces an undocumented 'true' synonym for the mujoco connection type that deserves a second look.
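One way to make such an alias explicit is a small lookup table, sketched below. Everything here is hypothetical: `normalize_connection_type`, `_ALIASES`, and `_KNOWN` are illustrative names, and `"webrtc"` is an assumed example value; only `"mujoco"` and its `"true"` alias come from the review text.

```python
# Hypothetical alias table: only "mujoco" and the "true" alias come from the review.
_ALIASES = {"true": "mujoco"}
_KNOWN = {"mujoco", "webrtc"}  # "webrtc" is an assumed example value


def normalize_connection_type(raw: str) -> str:
    """Lower-case the value, resolve documented aliases, reject anything else."""
    value = raw.strip().lower()
    value = _ALIASES.get(value, value)
    if value not in _KNOWN:
        raise ValueError(f"unknown connection type: {raw!r}")
    return value
```

Keeping aliases in one table makes them visible to readers and easy to document, rather than being buried in a conditional.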

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/core/coordination/coordinator_rpc.py | New file: wraps an LCMRPC instance to publish/consume the singleton Coordinator @rpc service; includes a startup probe to prevent duplicate services. |
| dimos/core/coordination/module_coordinator.py | Replaces RpycServer with CoordinatorRPC; adds rpcs property, ping, list_modules, and load_blueprint_by_name as RPC-exposed methods; restart_module_by_class_name return type changed to None. |
| dimos/porcelain/remote_module_source.py | Replaces RPyC TCP connection with CoordinatorRPC/LCM; adds _RemoteProxy fallback for unimportable module classes; _RemoteProxy._unsub_fns accumulate without cleanup on invalidate/close (flagged in previous review threads). |
| dimos/porcelain/local_module_source.py | Simplified to return coordinator's existing proxy objects; get_module iterates _deployed_modules without the coordinator's lock (flagged in previous review threads). |
| dimos/core/rpc_client.py | Adds optional shared-rpc constructor path (_owns_rpc flag) and RPCClient.remote() factory; stop_rpc_client correctly skips stopping a shared rpc instance. |
| dimos/core/run_registry.py | Removes rpyc_port field and get_most_recent_rpyc_port; adds forward-compatible RunEntry.load() that silently drops unknown keys from old registry files. |
| dimos/core/coordination/blueprints.py | Adds __getstate__/__setstate__ to Blueprint so MappingProxyType fields survive pickle round-trips over LCM; tested with a new pickle round-trip test. |
| dimos/robot/unitree/go2/connection.py | Adds .lower() normalization for unitree_connection_type and adds "true" as an undocumented alias that routes to MujocoConnection. |
| dimos/porcelain/dimos.py | Removes host/port/rpyc_port plumbing from connect(); adds connect_in_process() for test use; peek_stream rewritten to use the new peek_stream @rpc method and PeekNotFound sentinel. |
| dimos/robot/cli/dimos.py | Switches from start_rpyc_service() to start_rpc_service() and drops rpyc_port from RunEntry construction in both daemon and non-daemon paths. |
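The shared-rpc ownership rule described for rpc_client.py can be sketched as follows. `FakeRpc` and `Client` are hypothetical stand-ins, not the dimos classes; the sketch only shows the idea that a client stops the transport it created itself and leaves a transport that was handed to it untouched.

```python
from typing import Optional


class FakeRpc:
    """Hypothetical transport that only records whether it was stopped."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


class Client:
    """Hypothetical client illustrating an ownership flag like _owns_rpc."""

    def __init__(self, rpc: Optional[FakeRpc] = None) -> None:
        # The client owns the transport only when it had to create one.
        self._owns_rpc = rpc is None
        self._rpc = rpc if rpc is not None else FakeRpc()

    def stop_rpc_client(self) -> None:
        # A shared transport may still be in use by other proxies.
        if self._owns_rpc:
            self._rpc.stop()


shared = FakeRpc()
borrower = Client(rpc=shared)
owner = Client()
borrower.stop_rpc_client()
owner.stop_rpc_client()
```

After both calls, `shared` is still running and only the transport created by `owner` has been stopped.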

Sequence Diagram

sequenceDiagram
    participant CLI as dimos CLI
    participant MC as ModuleCoordinator
    participant CRPC as CoordinatorRPC (LCM)
    participant Client as Dimos.connect()
    participant RMS as RemoteModuleSource
    participant Proxy as RPCClient / _RemoteProxy

    CLI->>MC: start_rpc_service()
    MC->>CRPC: CoordinatorRPC.serve(coordinator)
    CRPC-->>CRPC: _ensure_no_existing_service (0.5s probe)
    CRPC-->>MC: CoordinatorRPC instance

    Client->>RMS: "RemoteModuleSource(timeout=5.0)"
    RMS->>CRPC: "CoordinatorRPC.connect(timeout=5.0)"
    CRPC-->>CRPC: call("ping") — liveness check
    CRPC-->>RMS: connected

    Client->>RMS: list_module_names()
    RMS->>CRPC: call("list_modules")
    CRPC->>MC: list_modules()
    MC-->>CRPC: [ModuleDescriptor, ...]
    CRPC-->>RMS: descriptors

    Client->>RMS: get_module("StressTestModule")
    RMS-->>RMS: importlib.import_module(qualified_path)
    alt class importable
        RMS-->>Proxy: "RPCClient(None, cls, rpc=coord.rpc)"
    else ImportError / AttributeError
        RMS-->>Proxy: _RemoteProxy(coord.rpc, name, rpc_names)
    end
    RMS-->>Client: proxy

    Client->>Proxy: proxy.ping()
    Proxy->>CRPC: call_sync("StressTestModule/ping", ...)
    CRPC-->>Proxy: "pong"
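The `alt` branch in the diagram (typed client when the class is importable, generic proxy otherwise) can be sketched roughly as below. `resolve_class`, `RemoteProxy`, and the `Call` transport signature are hypothetical; only the `"Module/method"` topic shape and the ImportError / AttributeError fallback come from the diagram.

```python
import importlib
from typing import Any, Callable, List, Optional

# Hypothetical transport signature: (topic, args, kwargs) -> result.
Call = Callable[[str, tuple, dict], Any]


class RemoteProxy:
    """Generic proxy used when the module class cannot be imported locally."""

    def __init__(self, call: Call, module_name: str, rpc_names: List[str]) -> None:
        self._call = call
        self._module_name = module_name
        self._rpc_names = frozenset(rpc_names)

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only reached for attributes that normal lookup did not find.
        if name.startswith("_") or name not in self._rpc_names:
            raise AttributeError(name)
        topic = f"{self._module_name}/{name}"
        return lambda *args, **kwargs: self._call(topic, args, kwargs)


def resolve_class(qualified_path: str) -> Optional[type]:
    """Return the class at 'pkg.mod.ClassName', or None if it cannot be imported."""
    module_path, _, class_name = qualified_path.rpartition(".")
    try:
        return getattr(importlib.import_module(module_path), class_name)
    except (ImportError, AttributeError, ValueError):
        return None


def fake_call(topic: str, args: tuple, kwargs: dict) -> Any:
    # Stand-in for the real transport: answers only the ping topic.
    return "pong" if topic.endswith("/ping") else None


cls = resolve_class("no_such_pkg_xyz.StressTestModule")  # not importable here
proxy = RemoteProxy(fake_call, "StressTestModule", ["ping"])
```

Restricting `__getattr__` to the advertised RPC names keeps typos from turning into remote calls that only fail after a timeout.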

Reviews (3): Last reviewed commit: "feat: remove rpyc"

Comment thread dimos/porcelain/dimos.py Outdated
Comment thread dimos/porcelain/remote_module_source.py
Comment thread dimos/porcelain/remote_module_source.py
@codecov

codecov Bot commented May 15, 2026

Comment thread dimos/core/coordination/blueprints.py Outdated
Comment thread dimos/core/coordination/test_module_coordinator.py Outdated
Comment thread dimos/core/module.py
@paul-nechifor paul-nechifor force-pushed the paul/feat/remove-rpyc branch 3 times, most recently from afbb2d4 to 0b95215 Compare May 17, 2026 04:41
@paul-nechifor paul-nechifor marked this pull request as ready for review May 17, 2026 04:44
@paul-nechifor paul-nechifor enabled auto-merge (squash) May 17, 2026 04:44
Comment on lines +32 to 36
    maintains for inter-module calls. Method calls flow over the same LCM
    bus the modules use to talk to each other.
    """

    is_remote = False

P1 _deployed_modules iterated without the coordinator's lock

ModuleCoordinator always acquires _modules_lock before touching _deployed_modules (see list_module_names, load_module, _restart_module, etc.), but get_module here iterates the raw dict directly. Concurrent calls to Dimos.run() or a module restart on any other thread/task while get_module is mid-iteration will raise RuntimeError: dictionary changed size during iteration. Use the existing list_module_names() method (which holds the lock) together with a direct lookup, or add a lock-guarded helper on the coordinator.
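A lock-guarded helper of the kind suggested above might look like the following sketch. `Coordinator`, `deploy`, and `find_module` are hypothetical names, and keying `_deployed_modules` by class is an assumption; only `_modules_lock` and `_deployed_modules` are named in the review.

```python
import threading
from typing import Any, Dict, Optional


class Coordinator:
    """Hypothetical stand-in for ModuleCoordinator, reduced to the locking pattern."""

    def __init__(self) -> None:
        self._modules_lock = threading.Lock()
        # Assumption: deployed modules are keyed by their class.
        self._deployed_modules: Dict[type, Any] = {}

    def deploy(self, cls: type, proxy: Any) -> None:
        with self._modules_lock:
            self._deployed_modules[cls] = proxy

    def find_module(self, name: str) -> Optional[Any]:
        # Hold the lock for the whole scan so a concurrent deploy or restart
        # cannot resize the dict mid-iteration.
        with self._modules_lock:
            for cls, proxy in self._deployed_modules.items():
                if cls.__name__ == name:
                    return proxy
        return None


class StressTestModule:
    pass


coord = Coordinator()
coord.deploy(StressTestModule, "proxy-object")
found = coord.find_module("StressTestModule")
```

Callers such as `get_module` would then go through `find_module` instead of touching the dict directly.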

Comment thread dimos/porcelain/dimos.py
Comment on lines +102 to 137
+    def connect(cls, *, run_id: str | None = None, timeout: float = 5.0) -> Dimos:
+        """Connect to an already-running DimOS instance over LCM.

         With no arguments, finds the most recent alive `RunEntry` in the
-        registry and connects to its coordinator RPyC endpoint. Use `run_id=` to
-        select a specific run, or `host=` + `port=` to bypass the registry.
+        registry and connects via LCM to its `Coordinator` @rpc service. Use
+        `run_id=` to select a specific run.

         Returns a `Dimos` instance in read/call mode: `skills`, attribute
-        access, `__repr__` and `__dir__` work, but `run()` and `restart()` raise
-        `NotImplementedError`. `stop()` closes the connection without
-        terminating the remote process.
+        access, `__repr__` and `__dir__` work, but only methods marked with
+        `@rpc` (and `@skill`, which implies `@rpc`) on a module are callable.
+        `stop()` closes the connection without terminating the remote process.
         """
-        if host is not None and port is not None:
-            source: ModuleSource = RemoteModuleSource(host, port)
+        if run_id is not None:
+            entries = [e for e in list_runs(alive_only=True) if e.run_id == run_id]
+            if not entries:
+                raise RuntimeError(f"No running DimOS instance with run_id={run_id!r}")
         else:
-            rpyc_port = get_most_recent_rpyc_port(run_id=run_id)
-            source = RemoteModuleSource("localhost", rpyc_port)
+            if get_most_recent(alive_only=True) is None:
+                raise RuntimeError(
+                    "No running DimOS instance. Start one with `dimos run <blueprint>`."
+                )

+        source = RemoteModuleSource(timeout=timeout)
         instance = cls()
         instance._source = source
         return instance

+    @classmethod
+    def connect_in_process(cls, *, timeout: float = 5.0) -> Dimos:
+        """Connect over LCM without consulting the run registry.
+
+        For tests where the coordinator and the client live in the same
+        process and there is no `RunEntry` on disk.
+        """
+        source = RemoteModuleSource(timeout=timeout)
+        instance = cls()

P1 run_id= validates existence but no longer routes the connection

The old implementation used get_most_recent_rpyc_port(run_id=run_id) to retrieve the specific TCP port for the requested run, so the socket actually landed on that process. The new code checks the registry entry exists, then creates RemoteModuleSource() which connects to whatever Coordinator happens to answer first on the shared LCM bus — regardless of which run_id was requested. In a registry with two alive entries (e.g. two daemon restarts whose old PIDs are still considered live), passing a run_id gives false confidence: the validation passes but the connection may go to the "wrong" coordinator. The docstring still says "Use run_id= to select a specific run", which is no longer accurate. Either remove the run_id parameter or document that it is now only a liveness guard, not a routing mechanism.
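The failure mode can be reproduced with a toy model. Nothing below is dimos code: `Run`, `connect`, and `bus_responders` are hypothetical, and the "first responder wins" rule is a simplification of the shared-bus behavior described in this comment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical registry record."""

    run_id: str
    alive: bool = True


def connect(registry: List[Run], bus_responders: List[str], run_id: str) -> str:
    """Return the run_id of the coordinator that actually answers."""
    # Step 1: the registry check only proves the requested run exists.
    if not any(r.run_id == run_id and r.alive for r in registry):
        raise RuntimeError(f"No running DimOS instance with run_id={run_id!r}")
    # Step 2: the bus is shared, so whoever answers first wins,
    # independent of the run_id that was just validated.
    return bus_responders[0]


registry = [Run("run-a"), Run("run-b")]
answered_by = connect(registry, bus_responders=["run-a", "run-b"], run_id="run-b")
```

In this model the caller asked for `run-b`, the validation passed, and the reply came from `run-a`, which is the gap between liveness guard and routing that the comment points out.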

@paul-nechifor paul-nechifor force-pushed the paul/feat/remove-rpyc branch from c063cac to 93fc89c Compare May 17, 2026 05:13

2 participants