feat: adding bandwidth test #583
a7797e7 to
7c03eaa
Compare
Consolidated Tests Results 2026-05-22 - 07:32:28
test-reporter: Run #2349
🎉 All tests passed!
🍂 No flaky tests in this run.
dvdplm
left a comment
I think the code here is mostly fine. I do wonder if "bandwidth" is the proper name for it. When measuring bandwidth I think you actually want to reason in terms of rate, i.e. how many bytes/sec can I cram through the pipes. To do that properly I think you need to start small and ramp up the rate over time until the connection is saturated (i.e. when the host starts returning errors or the sender isn't able to send data fast enough).
This PR isn't quite doing that (and probably shouldn't either), but perhaps we need to think of a more appropriate name?
| /// Payload size per session in bytes | ||
| #[arg(short, long)] | ||
| payload_size: u32, |
So if I set this to, say, 10_000 and duration to 10, does it send 1000 bytes/sec? Or is it 10_000 bytes per send and it tries to send as many as it can for the duration? Is it per session?
No, I updated the doc to describe how things work.
Also, the comment in the snippet says "per session"
I know I'm being a nuisance but I am still confused.
If I run the benchmark against 3 endpoints, hostA, hostB and hostC, and pick 4 sessions and 10_000 bytes as the payload size, 30s as the duration, what is actually sent to each?
Is it: each host A, B and C are each sent 10_000*4 bytes repeatedly for 30s?
Say the context id you provide corresponds to a context with 4 parties (so there's also a hostD here that we are not pinging via the CLI).
Then roughly you'd have hostA, hostB, hostC each spawning 4 tokio tasks running the "bandwidth_bench" for 30s.
What the bandwidth_bench does is spawn one tokio task for each peer, which sends a 10KB message and repeats as soon as it receives an ack.
So from hostA's PoV you'll have:
- hostA as a sender, spawning 4 bandwidth_bench tasks, each spawning 3 "sending" tasks (one sending to hostB, one to hostC, one to hostD).
- hostA as a receiver, will receive the same kind of messages from hostB and hostC running the above too
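To make the fan-out above concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical and not taken from the PR; it assumes one bandwidth_bench task per session, one sending task per peer, and at most one un-acked payload per sending task, as described above.

```rust
/// Number of concurrent "sending" tasks a single benchmarked host spawns,
/// assuming one bandwidth_bench task per session and one sending task per peer.
fn sending_tasks_per_host(num_parties: usize, num_sessions: usize) -> usize {
    // Every party in the context except the host itself is a peer.
    let peers = num_parties.saturating_sub(1);
    num_sessions * peers
}

/// Upper bound on un-acked bytes one host has outstanding, assuming each
/// sending task waits for the ack before sending the next payload.
fn max_in_flight_bytes(num_parties: usize, num_sessions: usize, payload_size: usize) -> usize {
    sending_tasks_per_host(num_parties, num_sessions) * payload_size
}

fn main() {
    // Context with 4 parties (hostA..hostD), 4 sessions, 10_000-byte payload.
    let tasks = sending_tasks_per_host(4, 4);
    let in_flight = max_in_flight_bytes(4, 4, 10_000);
    println!("sending tasks per benchmarked host: {tasks}");
    println!("max un-acked bytes per benchmarked host: {in_flight}");
}
```

With the numbers from this thread, that is 12 sending tasks on each of hostA, hostB and hostC, while hostD only receives.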
| /// Context id for the bandwidth benchmark | ||
| #[arg(short, long)] | ||
| context_id: String, |
dumb question: why would an operator want to set this? And how should an ignorant operator choose it? Is it just any string like "David messing with things"?
Doc has been updated in the meantime, lmk if that answers this question
🚀 Preview deployment is deployed in "thresholdWithEnclave" mode
You can now port-forward the kms-core to run your tests locally against the preview deployment.
Connect to the Tailscale cluster zws-dev:
tailscale configure kubeconfig tailscale-operator-zws-dev.diplodocus-boa.ts.net
Port-forward the kms-core services to run your tests locally:
kubectl port-forward svc/kms-core-1-core-1 -n kms-ci-titouantanguy-583 50100:50100 & \
kubectl port-forward svc/kms-core-2-core-2 -n kms-ci-titouantanguy-583 50200:50100 & \
kubectl port-forward svc/kms-core-3-core-3 -n kms-ci-titouantanguy-583 50300:50100 & \
kubectl port-forward svc/kms-core-4-core-4 -n kms-ci-titouantanguy-583 50400:50100 &
Copy config.toml from core-client to ./core-client/config/client_local_kind_threshold.toml:
kubectl cp kms-ci-titouantanguy-583/kms-core-client-0:/app/kms-core-client/config.toml ./core-client/config/client_local_kind_threshold.toml && \
sed -i '' -E 's|address = "(http://)?kms-core-([0-9]+)-[^:]+:50100"|address = "http://localhost:50\200"|' ./core-client/config/client_local_kind_threshold.toml
🚀 And launch your tests:
cargo nextest run --test kubernetes_test_threshold --profile ci --no-fail-fast --features="kind_tests"
You can connect to the core-client with:
kubectl exec kms-core-client-0 -n kms-ci-titouantanguy-583 -it -- /bin/bash
Close your port-forwarding with:
pgrep -f "kubectl port-forward" | xargs -n 1 kill
f6fccb8 to
3b22cdc
Compare
🚀 Preview deployment is deployed in "threshold" mode. The port-forwarding, config copy and test instructions are the same as in the earlier preview deployment comment.
b8564ff to
fd72012
Compare
Pull request overview
Adds a bandwidth benchmarking capability to the KMS ecosystem by introducing a new gRPC RPC (BandwidthBenchmark) on the core service endpoint and a corresponding kms-health-check bandwidth-bench CLI subcommand. This builds on the existing threshold-networking health-check channel/auth path to measure sustained payload send/ack throughput and report results in text or JSON.
Changes:
- Add BandwidthBenchmark RPC + protobuf types and wire it into the threshold engine endpoint.
- Extend threshold-networking health-check plumbing to support payload bytes and per-peer connection pooling.
- Add kms-health-check bandwidth-bench command and output formatters (text/JSON).
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/kms-health-check/src/output.rs | Adds text + JSON report formatting for benchmark results. |
| tools/kms-health-check/src/main.rs | Adds bandwidth-bench subcommand and dispatch logic. |
| tools/kms-health-check/src/grpc_client.rs | Adds client call for BandwidthBenchmark with duration-based timeout. |
| tools/kms-health-check/src/checks.rs | Builds BandwidthBenchmarkRequest and invokes the new RPC. |
| tools/kms-health-check/README.md | Documents the new CLI benchmark usage and example output. |
| core/threshold-networking/src/sending_service.rs | Introduces per-peer client pools (multiple gRPC connections per peer). |
| core/threshold-networking/src/health_check.rs | Extends health-check sessions to send payload bytes and run the benchmark. |
| core/threshold-networking/src/grpc.rs | Adds pooled health-check session builder and tweaks health ping logging. |
| core/threshold-networking/protos/gnetworking.proto | Adds payload field to HealthCheckRequest for bandwidth testing. |
| core/threshold-networking/Cargo.toml | Adds rand dependency for random payload generation. |
| core/service/src/engine/threshold/service/session.rs | Adds pooled healthcheck-session creation used by the benchmark. |
| core/service/src/engine/threshold/mod.rs | Exposes the new bandwidth_bench module. |
| core/service/src/engine/threshold/endpoint.rs | Wires the new RPC into the threshold endpoint implementation. |
| core/service/src/engine/threshold/bandwidth_bench.rs | Implements server-side benchmark orchestration + aggregation (new file). |
| core/service/src/engine/centralized/endpoint.rs | Adds a centralized-mode handler for the new RPC (currently unimplemented!). |
| core/grpc/proto/kms.v1.proto | Adds benchmark request/response protos and latency structures. |
| core/grpc/proto/kms-service.v1.proto | Adds BandwidthBenchmark RPC to the secure service definition. |
| core/grpc/proto/kms-service-insecure.v1.proto | Adds BandwidthBenchmark RPC to the insecure service definition. |
| Cargo.lock | Locks rand (workspace) dependency resolution. |
Comments suppressed due to low confidence (1)
core/service/src/engine/threshold/bandwidth_bench.rs:29
- This new RPC adds non-trivial parsing/behavior (kind handling, connection-pool sizing, aggregation). The codebase has extensive Rust test coverage under core/service/src/client/tests, but there are no tests covering BandwidthBenchmark yet. Please add at least one integration-style test exercising the RPC (including connections_per_peer = 0 clamping and Once vs Duration).
| /// Runs a bandwidth benchmark against a list of KMS endpoints. | ||
| /// NOTE: It makes more sense to run it on all the parties at the same time to emulate real bandwidth usage, but it can be run on a subset of parties as well. |
| /// KMS endpoints to test (e.g., --endpoints host1:50100 --endpoints host2:50100) | ||
| #[arg(short, long)] | ||
| endpoints: Vec<String>, | ||
|
|
| async fn bandwidth_benchmark( | ||
| &self, | ||
| _request: Request<kms_grpc::kms::v1::BandwidthBenchmarkRequest>, | ||
| ) -> Result<Response<kms_grpc::kms::v1::BandwidthBenchmarkResponse>, Status> { | ||
| unimplemented!("Bandwidth benchmark is not supported in centralized KMS"); | ||
| } |
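The centralized handler above calls unimplemented!, which panics when the RPC is invoked. One alternative is to return an "unimplemented" error status so the caller gets a regular gRPC error. The sketch below uses stand-in types (Code, Status, and empty request/response structs are all hypothetical) so that it compiles on its own; if the service is built on tonic, Status::unimplemented would be the analogous constructor.

```rust
// Stand-ins for the gRPC types so this sketch is self-contained.
#[derive(Debug, PartialEq)]
enum Code {
    Unimplemented,
}

#[derive(Debug, PartialEq)]
struct Status {
    code: Code,
    message: String,
}

struct BandwidthBenchmarkRequest;

#[derive(Debug, PartialEq)]
struct BandwidthBenchmarkResponse;

/// Returns an error value instead of panicking, so the server task stays
/// alive and the client can report the failure normally.
fn bandwidth_benchmark(
    _request: BandwidthBenchmarkRequest,
) -> Result<BandwidthBenchmarkResponse, Status> {
    Err(Status {
        code: Code::Unimplemented,
        message: "Bandwidth benchmark is not supported in centralized KMS".to_string(),
    })
}

fn main() {
    match bandwidth_benchmark(BandwidthBenchmarkRequest) {
        Ok(_) => println!("unexpected success"),
        Err(status) => println!("{:?}: {}", status.code, status.message),
    }
}
```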
| pub(crate) async fn run_bandwidth_benchmark( | ||
| request: Request<BandwidthBenchmarkRequest>, | ||
| session_maker: ImmutableSessionMaker, | ||
| ) -> Result<Response<BandwidthBenchmarkResponse>, Status> { | ||
| tracing::info!("Received bandwidth benchmark request: {:?}", request); | ||
| let request = request.into_inner(); |
| // Be safe and use random bytes as payload to avoid any compression that | ||
| // could happen before TLS layer | ||
| let payload = (0..payload_size) | ||
| .map(|_| rand::random::<u8>()) | ||
| .collect::<Vec<u8>>(); |
| join_set.spawn(async move { | ||
| let mut total_bytes_sent = 0; | ||
| let start = std::time::Instant::now(); | ||
| let mut answers = Vec::new(); | ||
| match duration { | ||
| BenchKind::Once => { |
| for ((role, id), (bytes_sent, duration, status)) in result { | ||
| let (entry_sent, entry_duration, entry_status) = results | ||
| .entry((role, id)) | ||
| .or_insert_with(|| (0, vec![], vec![])); | ||
| entry_sent.add_assign(bytes_sent); | ||
| entry_duration.push(duration); | ||
| entry_status.extend(status); |
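The aggregation pattern in the snippet above can be tried in isolation. This is a minimal sketch under assumptions: PeerKey, SessionResult and the bool status values are hypothetical stand-ins for the (role, id) key and the richer per-ack status values used in the PR.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical stand-ins for the PR's (role, id) key and per-session result.
type PeerKey = (u8, String);
type SessionResult = (u64, Duration, Vec<bool>);
type Aggregated = (u64, Vec<Duration>, Vec<bool>);

/// Merges per-session results into one entry per peer: bytes are summed,
/// durations and statuses are collected.
fn aggregate(per_session: Vec<HashMap<PeerKey, SessionResult>>) -> HashMap<PeerKey, Aggregated> {
    let mut results: HashMap<PeerKey, Aggregated> = HashMap::new();
    for result in per_session {
        for (key, (bytes_sent, duration, status)) in result {
            // Create an empty accumulator the first time a peer is seen.
            let (entry_sent, entry_duration, entry_status) =
                results.entry(key).or_insert_with(|| (0, vec![], vec![]));
            *entry_sent += bytes_sent;
            entry_duration.push(duration);
            entry_status.extend(status);
        }
    }
    results
}

fn main() {
    let peer = (1u8, "hostB".to_string());
    let mut session_1 = HashMap::new();
    session_1.insert(peer.clone(), (10_000u64, Duration::from_millis(5), vec![true]));
    let mut session_2 = HashMap::new();
    session_2.insert(peer.clone(), (20_000u64, Duration::from_millis(7), vec![true, false]));

    let aggregated = aggregate(vec![session_1, session_2]);
    let (bytes, durations, statuses) = &aggregated[&peer];
    println!("bytes: {bytes}, sessions: {}, acks: {}", durations.len(), statuses.len());
}
```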
| /// Only the first connection per peer is cached in the | ||
| /// [`crate::sending_service::GrpcSendingService`] channel map; the rest | ||
| /// live for the duration of the returned session. |
| // Returns a health check session for the given context. | ||
| //pub(crate) async fn get_healthcheck_session( | ||
| // &self, | ||
| // context_id: &ContextId, | ||
| //) -> anyhow::Result<HealthCheckSession<Role>> { |
| .map(|((role, id), (bytes_sent, durations, status))| { | ||
| // Fill up the latency struct | ||
| let latency = make_latency(status) |
Description of changes
Adds a tool to measure bandwidth between the parties on the P2P gRPC endpoint.
NOTE:
The parsing of the results as well as the implementation of the connection pool are mostly AI generated.
COPY of the doc:
Bandwidth Benchmark
The tool can also be used to perform a bandwidth benchmark between the different parties.
As with the healthcheck, the benchmark uses the same gRPC server as the MPC protocol,
and also performs the party authentication. Both text and JSON output formats are available.
The bandwidth benchmark expects the following parameters:
- CONTEXT_ID: the MPC context we want to benchmark (i.e. corresponds to the set of parties)
- DURATION_SEC: the duration of the experiment (in seconds)
- NUM_SESSIONS: the number of sessions spawned in parallel
- PAYLOAD_SIZE: the size of the payload sent over and over by each session (in bytes)
- ENDPOINT: the address of the nodes that will be sending data
- CONNECTIONS_PER_PEER: the number of dedicated TCP connections to the other peers

The bandwidth benchmark consists of spawning NUM_SESSIONS sessions (each in its own tokio task) that will send a payload of PAYLOAD_SIZE bytes, wait for the ack from the other party and repeat.
This is done over a period of DURATION_SEC, after which the results are collected and displayed.
If DURATION_SEC is set to 0, then we only send a single payload per session; this may be useful to test how long it takes to clear a big number of sessions.
To better emulate what happens during the execution of an MPC protocol, it's best to perform the bandwidth benchmark on all the parties at the same time, such that all the parties send and receive the same amount of data.
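The send/ack loop described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the BenchKind variants mirror the Once/Duration modes mentioned in the review, while from_secs, run_session and the blocking send_and_wait_ack closure are hypothetical stand-ins for the real async gRPC round trip.

```rust
use std::time::{Duration, Instant};

/// The two modes: a single payload, or repeated sends for a fixed duration.
#[derive(Debug, PartialEq)]
enum BenchKind {
    Once,
    Duration(Duration),
}

impl BenchKind {
    /// Assumption: DURATION_SEC = 0 selects the single-payload mode.
    fn from_secs(duration_sec: u64) -> Self {
        if duration_sec == 0 {
            BenchKind::Once
        } else {
            BenchKind::Duration(Duration::from_secs(duration_sec))
        }
    }
}

/// Runs one session: send the payload, wait for the ack, repeat until done.
/// Returns the total number of payload bytes sent.
fn run_session(kind: &BenchKind, payload: &[u8], mut send_and_wait_ack: impl FnMut(&[u8])) -> u64 {
    let start = Instant::now();
    let mut total_bytes_sent = 0u64;
    loop {
        send_and_wait_ack(payload);
        total_bytes_sent += payload.len() as u64;
        match kind {
            BenchKind::Once => break,
            BenchKind::Duration(limit) if start.elapsed() >= *limit => break,
            BenchKind::Duration(_) => {}
        }
    }
    total_bytes_sent
}

fn main() {
    // Zeroed payload for brevity; the PR uses random bytes to avoid compression.
    let payload = vec![0u8; 10_000];
    let sent = run_session(&BenchKind::from_secs(0), &payload, |_| {});
    println!("bytes sent in Once mode: {sent}");
}
```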
NOTE: The gRPC timeout between the tool and the nodes is set to be DURATION_SEC plus the usual request timeout configurable via environment variable as described above.
An example output is: