feat(router): add mtls support to grpc subgraphs#2861

Draft
dkorittki wants to merge 6 commits into main from dominik/eng-9363-tls-support

Conversation

@dkorittki
Contributor

@dkorittki dkorittki commented May 13, 2026

WIP

Summary by CodeRabbit

  • New Features

    • TLS/mTLS support for outbound gRPC subgraph connections, with global settings and per-subgraph overrides.
    • CA file verification and client certificate (mTLS) options for gRPC clients; gRPC transports can now use TLS when configured.
  • Tests

    • New integration tests validating gRPC TLS/mTLS behaviors: global vs per-subgraph overrides, CA verification, client cert success/failure, and combined scenarios.
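
As a rough illustration of the CA-file verification and client-certificate options summarized above, the following sketch builds a client-side `*tls.Config` using only the Go standard library. The `clientTLSOptions` struct and `buildClientTLS` function are hypothetical names for this example and are not the router's actual types; `CertFile` and `KeyFile` in particular are assumed field names.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// clientTLSOptions is a hypothetical settings struct for this sketch.
// Only InsecureSkipCaVerification and CaFile are names that appear in the
// PR summary; CertFile and KeyFile are assumptions.
type clientTLSOptions struct {
	CertFile                   string
	KeyFile                    string
	CaFile                     string
	InsecureSkipCaVerification bool
}

// buildClientTLS turns the settings into a *tls.Config for an outbound client.
func buildClientTLS(opts clientTLSOptions) (*tls.Config, error) {
	cfg := &tls.Config{
		MinVersion:         tls.VersionTLS12,
		InsecureSkipVerify: opts.InsecureSkipCaVerification,
	}

	// CA verification: trust only the certificates found in CaFile.
	if opts.CaFile != "" {
		pemBytes, err := os.ReadFile(opts.CaFile)
		if err != nil {
			return nil, fmt.Errorf("read CA file: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pemBytes) {
			return nil, errors.New("CA file contains no usable PEM certificate")
		}
		cfg.RootCAs = pool
	}

	// mTLS: a client certificate is presented only when both halves are set.
	if (opts.CertFile == "") != (opts.KeyFile == "") {
		return nil, errors.New("cert file and key file must be set together")
	}
	if opts.CertFile != "" {
		cert, err := tls.LoadX509KeyPair(opts.CertFile, opts.KeyFile)
		if err != nil {
			return nil, fmt.Errorf("load client key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	return cfg, nil
}
```

A config built this way could then be wrapped with `credentials.NewTLS` from gRPC-Go, which is the call the walkthrough below names for the remote provider.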

Review Change Stack

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/docs-website.
  • I have read the Contributors Guide.

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.

@dkorittki dkorittki changed the title feat: add (m)tls support to grpc subgraphs feat(router): add (m)tls support to grpc subgraphs May 13, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 72c40915-8081-4b4b-804f-d700077a99ce

📥 Commits

Reviewing files that changed from the base of the PR and between 9350004 and 52f76b5.

📒 Files selected for processing (3)
  • router-tests/security/subgraph_grpc_mtls_test.go
  • router-tests/security/subgraph_mtls_test.go
  • router/pkg/config/config.go

Walkthrough

This PR adds gRPC-specific client TLS configuration and schema, refactors the TLS builder to accept global and per-subgraph inputs, threads default/per-subgraph gRPC TLS through the graph server into the gRPC connector, updates the gRPC provider and test env for TLS, and adds integration tests for TLS/mTLS scenarios.
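
To make the "global and per-subgraph inputs" idea concrete, here is a minimal sketch of how a lookup with a global default and per-subgraph overrides might behave. It assumes that a per-subgraph entry replaces the global settings wholesale rather than merging with them; the PR does not spell out that rule here, so treat it as an assumption. `tlsSettings` and `resolveTLS` are hypothetical names, not the router's API.

```go
package main

// tlsSettings is a hypothetical stand-in for the raw TLS settings of one
// client. The struct is comparable, so the zero value can mean "not set".
type tlsSettings struct {
	CaFile                     string
	CertFile                   string
	KeyFile                    string
	InsecureSkipCaVerification bool
}

// resolveTLS picks the settings for one subgraph. A per-subgraph entry wins
// over the global ("all") settings; the second return value is false when
// neither is configured, meaning the connection stays without TLS.
func resolveTLS(all tlsSettings, subgraphs map[string]tlsSettings, name string) (tlsSettings, bool) {
	if sg, found := subgraphs[name]; found {
		return sg, true
	}
	if all != (tlsSettings{}) {
		return all, true
	}
	return tlsSettings{}, false
}
```

Returning an explicit "not configured" flag keeps the caller's choice between TLS and insecure transport credentials in one place.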

Changes

gRPC Client TLS Configuration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration schema and contracts | router/pkg/config/config.go, router/pkg/config/config.schema.json, router/pkg/config/fixtures/full.yaml, router/pkg/config/testdata/config_defaults.json, router/pkg/config/testdata/config_full.json | New GRPCClientTLSConfiguration and TLS.ClientGRPC with all and subgraphs fields; JSON schema and fixtures updated. |
| TLS config building refactor | router/core/tls.go, router/core/tls_test.go | buildSubgraphTLSConfigs now accepts a raw TLSClientCertConfiguration (all) and a map[string]TLSClientCertConfiguration (subgraphs), builds default and per-subgraph *tls.Config, and tests updated to the new signature. |
| Graph server TLS integration | router/core/graph_server.go | Add defaultClientTLS and perSubgraphTLS to BuildGraphMuxOptions, build gRPC client TLS configs in newGraphServer, refactor connector setup to accept an options struct, and pass per-subgraph TLS into RemoteGRPCProvider. |
| gRPC remote provider TLS support | router/pkg/grpcconnector/grpcremote/grpc_remote.go | Add TLSConfig *tls.Config to RemoteGRPCProviderConfig, store on provider, and choose credentials.NewTLS vs insecure credentials when dialing. |
| Test environment TLS setup | router-tests/testenv/testenv.go | Add SubgraphConfig.GRPCTLSConfig *tls.Config, extend makeSafeGRPCServer to accept a TLS config and append grpc.Creds(credentials.NewTLS(...)), and wire TLS into test server creation functions. |
| Comprehensive gRPC mTLS integration tests | router-tests/security/subgraph_grpc_mtls_test.go | Add TestSubgraphGRPCmTLS with scenarios for InsecureSkipCaVerification, client-certificate mTLS success/failure, CA CaFile verification, full mTLS, and helper grpcSubgraphTLSServerConfig to create server TLS configs. |
| Update existing HTTP mTLS tests | router-tests/security/subgraph_mtls_test.go | Replace ClientTLSConfiguration usages with HTTPClientTLSConfiguration in existing mTLS tests and shared variables. |
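
The test-side rows above mention a helper that creates server TLS configs for the mTLS scenarios. The sketch below shows one way such a helper could look, using only the standard library: it generates a throwaway self-signed certificate in memory and, when a client CA is supplied, requires and verifies client certificates. `newSelfSignedCert` and `serverTLSConfig` are hypothetical names and do not reproduce the PR's `grpcSubgraphTLSServerConfig`, which may load certificates from files instead.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// newSelfSignedCert creates a short-lived self-signed certificate that can
// act as both a server certificate and a CA. For tests only.
func newSelfSignedCert(commonName string) (tls.Certificate, *x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		DNSNames:              []string{"localhost"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}, leaf, nil
}

// serverTLSConfig returns a server-side config. With a non-nil clientCA the
// server demands a client certificate signed by that CA (mTLS); with nil it
// serves plain one-way TLS.
func serverTLSConfig(serverCert tls.Certificate, clientCA *x509.Certificate) *tls.Config {
	cfg := &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		MinVersion:   tls.VersionTLS12,
	}
	if clientCA != nil {
		pool := x509.NewCertPool()
		pool.AddCert(clientCA)
		cfg.ClientCAs = pool
		cfg.ClientAuth = tls.RequireAndVerifyClientCert
	}
	return cfg
}
```

In a gRPC test server, a config like this would be passed through `grpc.Creds(credentials.NewTLS(...))`, as the test environment row describes.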

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • wundergraph/cosmo#2863: The main PR's gRPC subgraph client TLS work builds directly on the TLS-structure refactor from #2863 by extending nested TLS config handling and adding gRPC client-specific wiring.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(router): add mtls support to grpc subgraphs' accurately and clearly summarizes the main objective of this pull request, which is to introduce mTLS (mutual TLS) support for gRPC subgraphs in the router. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


@dkorittki dkorittki changed the title feat(router): add (m)tls support to grpc subgraphs feat(router): add mtls support to grpc subgraphs May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-0ed94c66c1745b02e1f5d325deff50a215ea1a8d

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/pkg/grpcconnector/grpcremote/grpc_remote.go (1)

71-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Protect Start with the provider mutex.

Start reads/writes g.cc without synchronization while other lifecycle methods use mu, which can race under concurrent start/get/stop paths.

Proposed fix
 func (g *RemoteGRPCProvider) Start(ctx context.Context) error {
+	g.mu.Lock()
+	defer g.mu.Unlock()
+
 	if g.cc == nil {
 		var transportCreds grpc.DialOption
 		if g.tlsConfig != nil {
 			transportCreds = grpc.WithTransportCredentials(credentials.NewTLS(g.tlsConfig))
 		} else {
 			transportCreds = grpc.WithTransportCredentials(insecure.NewCredentials())
 		}
 
 		clientConn, err := grpc.NewClient(g.endpoint, transportCreds)
 		if err != nil {
 			return fmt.Errorf("failed to create client connection: %w", err)
 		}
 
 		g.cc = clientConn
 	}
 
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@router/pkg/grpcconnector/grpcremote/grpc_remote.go` around lines 71 - 86,
Start currently reads/writes g.cc without acquiring the provider mutex (mu),
causing races with other lifecycle methods; modify RemoteGRPCProvider.Start to
acquire the same mutex used by other methods (mu) at the start of the function,
check g.cc while holding the lock, initialize g.cc if nil, and release the lock
(use defer Unlock immediately after Lock) so Start is synchronized with
Stop/GetClient and avoids data races on g.cc.
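
The locking pattern described in this comment can be tried out in isolation. The sketch below is a simplified, standard-library-only model of it: `provider`, `fakeConn`, and `startConcurrently` are hypothetical stand-ins (the real type wraps a gRPC client connection), and the `dials` counter exists only to make the effect observable.

```go
package main

import "sync"

// fakeConn stands in for a real client connection in this sketch.
type fakeConn struct{ id int }

// provider models a lazily started connection holder.
type provider struct {
	mu    sync.Mutex
	cc    *fakeConn
	dials int // how many connections were created; for illustration only
}

// Start creates the connection once. Holding mu for the whole check-and-set
// means concurrent callers cannot both observe cc == nil.
func (p *provider) Start() error {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.cc == nil {
		p.dials++
		p.cc = &fakeConn{id: p.dials}
	}
	return nil
}

// Stop drops the connection under the same mutex.
func (p *provider) Stop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.cc = nil
}

// startConcurrently calls Start from n goroutines and reports how many
// connections ended up being created.
func startConcurrently(p *provider, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = p.Start()
		}()
	}
	wg.Wait()

	p.mu.Lock()
	defer p.mu.Unlock()
	return p.dials
}
```

Without the lock in `Start`, running a test like this under `go test -race` would be expected to flag the unsynchronized access to `cc`.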
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@router/core/tls.go`:
- Around line 71-73: The warning message logged when
sgCfg.InsecureSkipCaVerification is true is misleading (it says the subgraph
"inherits" from global config); update the logger.Warn call in tls.go (the
branch checking sgCfg.InsecureSkipCaVerification) to state that the subgraph TLS
config has InsecureSkipCaVerification enabled (or that the subgraph is
configured to skip CA verification), removing the word "inherits" and any
implication of global config so the message accurately reflects
sgCfg.InsecureSkipCaVerification and `logger.Warn` usage for the subgraph named
by `name`.

---

Outside diff comments:
In `@router/pkg/grpcconnector/grpcremote/grpc_remote.go`:
- Around line 71-86: Start currently reads/writes g.cc without acquiring the
provider mutex (mu), causing races with other lifecycle methods; modify
RemoteGRPCProvider.Start to acquire the same mutex used by other methods (mu) at
the start of the function, check g.cc while holding the lock, initialize g.cc if
nil, and release the lock (use defer Unlock immediately after Lock) so Start is
synchronized with Stop/GetClient and avoids data races on g.cc.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4e212fb2-4a36-4f65-bab3-b669318a1727

📥 Commits

Reviewing files that changed from the base of the PR and between 18d3ec0 and 1782dc5.

📒 Files selected for processing (14)
  • router-tests/security/subgraph_grpc_mtls_test.go
  • router-tests/testenv/testenv.go
  • router/core/graph_server.go
  • router/core/router.go
  • router/core/router_config.go
  • router/core/supervisor_instance.go
  • router/core/tls.go
  • router/core/tls_test.go
  • router/pkg/config/config.go
  • router/pkg/config/config.schema.json
  • router/pkg/config/fixtures/full.yaml
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/config/testdata/config_full.json
  • router/pkg/grpcconnector/grpcremote/grpc_remote.go

Comment thread router/core/tls.go
@codecov

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 97.46835% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.30%. Comparing base (be9d015) to head (e12803e).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| router/core/graph_server.go | 94.73% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2861      +/-   ##
==========================================
+ Coverage   64.47%   66.30%   +1.83%     
==========================================
  Files         319      258      -61     
  Lines       45359    27105   -18254     
  Branches     4927        0    -4927     
==========================================
- Hits        29245    17972   -11273     
+ Misses      16089     7720    -8369     
- Partials       25     1413    +1388     

| Files with missing lines | Coverage Δ |
| --- | --- |
| router/core/tls.go | 95.23% <100.00%> (ø) |
| router/pkg/config/config.go | 84.53% <100.00%> (ø) |
| router/pkg/grpcconnector/grpcremote/grpc_remote.go | 78.12% <100.00%> (ø) |
| router/core/graph_server.go | 85.72% <94.73%> (ø) |

... and 573 files with indirect coverage changes

dkorittki added 2 commits May 19, 2026 13:33
They are only needed when building a grpc connector
and are obsolete after that. Hence they don't need to be
a type field.
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@router/core/graph_server.go`:
- Around line 357-358: The graph mux reuse logic currently ignores TLS changes
because defaultClientTLS and perSubgraphTLS are only considered on rebuild;
update the reuse decision so muxes/connectors are not reused when those TLS
settings differ by including defaultClientTLS and perSubgraphTLS in the mux
identity/comparison (or cache key) used by the base/feature-flag reuse branches,
and force a rebuild/recreate of the mux/connector whenever those TLS values
change; locate the reuse checks that decide to keep an existing mux (the
base/feature-flag reuse branches) and add equality checks or incorporate the TLS
structs so stale gRPC credentials are not retained.
- Around line 1411-1420: After successfully calling setupConnector
(setupConnector), ensure partially-initialized resources are cleaned up if
subsequent buildGraphMux fails: either keep the connector/local resources
(caches, metric stores, pubsub providers) in local variables and only assign
them to the server state (s.*) after buildGraphMux completes successfully, or
add a short-lived defer immediately after setupConnector that calls
graphServer.Shutdown (or the connector-specific cleanup routine) and cancels
that defer if buildGraphMux succeeds; update the error-return paths after
buildGraphMux to invoke the cleanup so no providers or connectors remain running
on failure.
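
The second comment above suggests a short-lived deferred cleanup that is cancelled on success. A minimal sketch of that pattern follows, assuming a simplified setup where one resource is started and a later step may fail. `connector`, `server`, `newServer`, and `errMuxBuild` are hypothetical names and do not mirror the router's real types.

```go
package main

import "errors"

// errMuxBuild is a sample failure that callers can return from buildMux.
var errMuxBuild = errors.New("mux build failed")

// connector stands in for a resource that keeps running until shut down.
type connector struct{ running bool }

// Shutdown stops the connector.
func (c *connector) Shutdown() { c.running = false }

// server owns the connector once construction has fully succeeded.
type server struct{ conn *connector }

// newServer starts a connector, then runs buildMux. If buildMux fails, the
// deferred function shuts the connector down so nothing is left running.
func newServer(start func() *connector, buildMux func() error) (*server, error) {
	conn := start()

	// Clean up unless the successful return at the end is reached.
	success := false
	defer func() {
		if !success {
			conn.Shutdown()
		}
	}()

	if err := buildMux(); err != nil {
		return nil, err
	}

	// Only hand the resource to the server once every step has succeeded.
	success = true
	return &server{conn: conn}, nil
}
```

The flag-guarded defer keeps the cleanup next to the acquisition, so error returns added later between the two steps are covered without extra code.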
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 526ede46-37b7-48f3-bd51-4acaaabcd3af

📥 Commits

Reviewing files that changed from the base of the PR and between 21e77d2 and 9350004.

📒 Files selected for processing (2)
  • router/core/graph_server.go
  • router/pkg/grpcconnector/grpcremote/grpc_remote.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/pkg/grpcconnector/grpcremote/grpc_remote.go

Comment thread router/core/graph_server.go
Comment thread router/core/graph_server.go