
Harden registry rename and startup recovery #197

Merged
bnema merged 6 commits into main from feat/registry-rename-startup-recovery-hardening
May 10, 2026

Conversation

@bnema
Owner

@bnema bnema commented May 10, 2026

Summary

  • add shared handling for current and legacy Gordon registry domains across matching, internal pull rewriting, and auto-route ownership checks
  • make local deploys and startup recovery use Gordon's internal deploy semantics while keeping reboot recovery best-effort and independent from auto_route.enabled
  • canonicalize legacy Gordon-managed refs on save and document staged registry host renames and recovery behavior
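
The canonicalize-on-save behavior in the last bullet can be pictured with a minimal sketch. The helper `canonicalizeRef` and the example hostnames are hypothetical and are not Gordon's actual `CanonicalizeGordonImageRef`; the sketch assumes a ref is "Gordon-managed" when its first path segment exactly equals a legacy domain, and it ignores ports for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalizeRef is a hypothetical helper (not Gordon's real API).
// It rewrites the registry host of imageRef to current when the host
// segment exactly matches a legacy domain; other refs are returned unchanged.
func canonicalizeRef(imageRef, current string, legacy []string) string {
	host, rest, ok := strings.Cut(imageRef, "/")
	if !ok {
		return imageRef // no registry host component, e.g. "app:v1"
	}
	for _, l := range legacy {
		// Compare the whole host segment so lookalike hosts never match.
		if strings.EqualFold(host, l) {
			return current + "/" + rest
		}
	}
	return imageRef
}

func main() {
	legacy := []string{"registry.old.example.com"}
	refs := []string{
		"registry.old.example.com/team/app:v1",         // legacy host: rewritten
		"registry.new.example.com/team/app:v1",         // already canonical
		"registry.old.example.com.evil.io/team/app:v1", // lookalike: untouched
		"app:v1", // no host: untouched
	}
	for _, ref := range refs {
		fmt.Println(canonicalizeRef(ref, "registry.new.example.com", legacy))
	}
}
```

Matching on the full host segment rather than a string prefix is what keeps a lookalike such as `registry.old.example.com.evil.io` from being treated as a legacy Gordon host.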

Testing

  • rtk go test ./internal/domain ./internal/usecase/auto ./internal/usecase/config ./internal/usecase/container ./internal/adapters/in/cli ./internal/app
  • rtk golangci-lint run ./...
  • rtk go test ./...
  • coderabbit review --agent --base main -t committed -c AGENTS.md -c /home/brice/CLAUDE.md

Summary by CodeRabbit

  • New Features

    • Staged registry host rename: support primary + legacy registry domains with canonicalized image refs and cross-host repo matching.
  • Documentation

    • Expanded CLI docs: explicit HTTP server startup, detailed startup-recovery/reconciliation behavior, and clearer local vs remote manual deploy behavior.
    • Added config example and upgrade guidance for staged registry host rename.
  • Chores / Behavior changes

    • Startup recovery documented to run independently of container restart policies; runtime no longer adjusts container restart policies.


Copilot AI review requested due to automatic review settings May 10, 2026 13:19
@coderabbitai
Contributor

coderabbitai Bot commented May 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52e49dd6-946d-4694-8f0d-52d7857c4d48

📥 Commits

Reviewing files that changed from the base of the PR and between 53f9355 and 95b3044.

📒 Files selected for processing (6)
  • internal/adapters/out/docker/runtime.go
  • internal/adapters/out/docker/runtime_restart_policy_test.go
  • internal/boundaries/out/mocks/mock_container_runtime.go
  • internal/boundaries/out/runtime.go
  • internal/usecase/container/service.go
  • internal/usecase/container/service_test.go
💤 Files with no reviewable changes (4)
  • internal/boundaries/out/runtime.go
  • internal/adapters/out/docker/runtime.go
  • internal/adapters/out/docker/runtime_restart_policy_test.go
  • internal/boundaries/out/mocks/mock_container_runtime.go

📝 Walkthrough

Adds staged registry host rename support: new server.legacy_registry_domains, domain utilities for image normalization/canonicalization, wiring of resolved current+legacy domains into proxy/container/autoroute, a startup recovery flow using an internal-deploy context, CLI internal-deploy wiring, removal of container restart-policy interface/tests/mocks, and extensive tests and docs.
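
To make the "resolved current+legacy domains" step concrete, here is a minimal sketch of how a primary domain and a legacy list might be resolved before being wired into other components. The `serverConfig` shape, the `resolveDomains` name, and the normalization rules (trim, lowercase, drop duplicates and the current domain) are all assumptions for illustration; the PR's actual `resolveRegistryDomains` is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// serverConfig mirrors only the two settings named in the walkthrough;
// the real struct layout is an assumption.
type serverConfig struct {
	RegistryDomain        string
	LegacyRegistryDomains []string
}

// normalize trims whitespace and lowercases a domain.
func normalize(d string) string {
	return strings.ToLower(strings.TrimSpace(d))
}

// resolveDomains is a hypothetical stand-in for resolveRegistryDomains.
// It returns the normalized current domain plus the legacy domains with
// empty entries, duplicates, and the current domain itself removed.
func resolveDomains(cfg serverConfig) (string, []string) {
	current := normalize(cfg.RegistryDomain)
	seen := map[string]struct{}{current: {}}
	var legacy []string
	for _, d := range cfg.LegacyRegistryDomains {
		d = normalize(d)
		if d == "" {
			continue
		}
		if _, dup := seen[d]; dup {
			continue
		}
		seen[d] = struct{}{}
		legacy = append(legacy, d)
	}
	return current, legacy
}

func main() {
	current, legacy := resolveDomains(serverConfig{
		RegistryDomain:        "registry.new.example.com",
		LegacyRegistryDomains: []string{"registry.old.example.com", "Registry.Old.Example.com", ""},
	})
	fmt.Println(current, legacy)
}
```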

Changes

Support for Staged Registry Host Rename

| Layer | File(s) | Summary |
| --- | --- | --- |
| Documentation | docs/cli/serve.md, docs/config/index.md, docs/config/server.md, docs/upgrading.md, gordon.toml.example | Updated CLI docs for SIGUSR2 and deploy behavior; added staged registry-host rename guidance, upgrade notes, and example config entry. |
| Configuration Schema & Resolution | internal/app/run.go | Added RegistryDomain and LegacyRegistryDomains fields, Viper default, and resolveRegistryDomains; propagated resolved domains into proxy, container, and autoroute wiring. |
| Image Registry Utilities | internal/domain/registry_image.go | New utilities: KnownGordonRegistryDomains, StripKnownGordonRegistry, ExtractGordonRepoName, CanonicalizeGordonImageRef, IsGordonRegistryImageRef. |
| Image Utility Tests | internal/domain/registry_image_test.go | Tests for normalization, stripping, extraction, canonicalization, and detection across current and legacy domains. |
| Config Service Updates | internal/usecase/config/service.go | Added LegacyRegistryDomains; extended NormalizeRegistryImage, matchesImageName, lookup functions, and canonicalization-on-save to use legacy domains; added GetLegacyRegistryDomains. |
| Config Service Tests | internal/usecase/config/service_test.go | Added/expanded tests for route/attachment canonicalization on save, lookup matching across legacy/current domains, AddAttachment dedupe, and normalization scenarios. |
| Container Service Updates | internal/usecase/container/service.go | Added LegacyRegistryDomains; updated stripRegistryPrefix, validateExternalImageRef, and rewriteToLocalRegistry to handle legacy domains during internal deploys. |
| Container Service Tests | internal/usecase/container/service_test.go | Extended tests for rewrite-to-local registry, stripping legacy prefixes, and internal-deploy forced-pull handling with legacy hosts/ports. |
| Auto-route Handler Updates | internal/usecase/container/autoroute.go | Added support for legacy registry domains in handler and compared repo ownership across current+legacy domains when updating routes. |
| Auto-route Handler Tests | internal/usecase/container/autoroute_test.go | Tests for same-repo updates across legacy/current hosts, config reload behavior, and cross-repo rejection. |
| Auto-validation Updates | internal/usecase/auto/validation.go | ExtractRepoName now accepts variadic legacy domains and delegates to domain utilities. |
| Auto-validation Tests | internal/usecase/auto/validation_test.go | Tests for domain equivalence, ports, and digest handling. |
| Startup Recovery & CLI Internal Deploy Context | internal/app/startup_recovery.go, internal/app/run.go, internal/adapters/in/cli/controlplane_local.go | Added syncAndRecoverConfiguredRoutes for best-effort startup recovery; runServers now calls it; CLI local deploy wraps container deploy with domain.WithInternalDeploy. |
| Startup Recovery Tests | internal/app/startup_recovery_test.go, internal/adapters/in/cli/controlplane_local_test.go | Tests for startup recovery orchestration, call sequencing, internal-deploy context propagation, and resilience to sync/autostart failures. |
| Runtime Interface Pruning | internal/adapters/out/docker/runtime.go, internal/adapters/out/docker/runtime_restart_policy_test.go, internal/boundaries/out/mocks/mock_container_runtime.go, internal/boundaries/out/runtime.go | Removed EnsureContainerRestartPolicy method from runtime adapter and interface, removed its tests, and removed the autogenerated mock expecter. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bnema/gordon#191: Directly related — touches restart-policy functionality that this PR removes/changes.
  • bnema/gordon#110: Overlaps in startup/run wiring and container-service constructor/signature changes.
  • bnema/gordon#104: Related to image-normalization and image-based lookup logic affecting config service matching.

Poem

🐰 I hopped through docs and hummed a small song,
Legacy hosts and current now belong.
Routes wake at boot with a gentle deploy,
Images canonicalize — no frantic ploy.
Hop on the cutover, the migration's not long.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Harden registry rename and startup recovery' directly and clearly summarizes the main objectives of the changeset: adding support for staged registry host renames and improving startup recovery behavior. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/usecase/container/service.go (1)

1368-1370: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Simplify early return for already-local refs.

When imageRef already has the localPrefix, reconstructing localPrefix + remainder returns the original string unchanged. Returning imageRef directly is clearer.

♻️ Suggested simplification
-	if remainder, ok := strings.CutPrefix(imageRef, localPrefix); ok {
-		return localPrefix + remainder
+	if strings.HasPrefix(imageRef, localPrefix) {
+		return imageRef
 	}
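
As a quick check of the claim behind this suggestion, the sketch below compares the two forms on the same inputs. The function names and the `localhost:5000/` prefix are hypothetical; only `strings.CutPrefix` (Go 1.20+) and `strings.HasPrefix` come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// keepLocalCut mirrors the original shape: cut the prefix, then re-attach it.
func keepLocalCut(imageRef, localPrefix string) (string, bool) {
	if remainder, ok := strings.CutPrefix(imageRef, localPrefix); ok {
		return localPrefix + remainder, true
	}
	return "", false
}

// keepLocalHasPrefix mirrors the suggested shape: return the input as-is.
func keepLocalHasPrefix(imageRef, localPrefix string) (string, bool) {
	if strings.HasPrefix(imageRef, localPrefix) {
		return imageRef, true
	}
	return "", false
}

func main() {
	const localPrefix = "localhost:5000/" // assumed value, for illustration only
	for _, ref := range []string{"localhost:5000/team/app:v1", "registry.example.com/team/app:v1"} {
		a, okA := keepLocalCut(ref, localPrefix)
		b, okB := keepLocalHasPrefix(ref, localPrefix)
		fmt.Println(ref, a == b && okA == okB)
	}
}
```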
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/usecase/container/service.go` around lines 1368 - 1370, The code
reconstructs a value by concatenating localPrefix + remainder even when imageRef
already starts with localPrefix; instead, detect the early-return case and
return imageRef directly to avoid unnecessary string operations. Update the
logic in the function that handles imageRef/localPrefix (referencing imageRef,
localPrefix, remainder) so that when strings.HasPrefix(imageRef, localPrefix)
is true it returns imageRef rather than rebuilding the same string.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/app/run.go`:
- Around line 1975-1977: autoRouteHandler is created once with registryDomain
and legacyRegistryDomains resolved from startup cfg (resolveRegistryDomains +
container.NewAutoRouteHandler + WithEnvExtractor), so config reloads don’t
update its domain set; change the handler to read domains from svc.configSvc on
each event or rebuild/rebind autoRouteHandler when config reload
completes—either (a) modify NewAutoRouteHandler/AutoRouteHandler to accept
svc.configSvc and call resolveRegistryDomains(svc.configSvc.Current()) per
request/event, or (b) listen for the config reload completion and recreate
autoRouteHandler by calling resolveRegistryDomains(cfg) and
container.NewAutoRouteHandler(...). Ensure references to resolveRegistryDomains,
autoRouteHandler, NewAutoRouteHandler, WithEnvExtractor and svc.configSvc are
updated accordingly.

In `@internal/usecase/config/service.go`:
- Around line 877-897: canonicalAttachmentsForSave currently canonicalizes image
refs but does not deduplicate them, so legacy and current registry variants can
produce duplicate canonical refs; update canonicalAttachmentsForSave to
deduplicate per target after calling domain.CanonicalizeGordonImageRef by
collecting canonical refs into a temporary set (map[string]struct{}) while
preserving original order (append to canonicalImages only when not already
seen), then assign the deduped slice to result[target]; ensure nil handling
stays the same and use the existing function name canonicalAttachmentsForSave
and the call to domain.CanonicalizeGordonImageRef as the anchor points to
change.
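
The order-preserving dedupe described above might look like the following sketch. The `map[string][]string` shape for attachments is inferred from the comment's `result[target]` wording and is an assumption, as are the function names; `canonicalize` is a placeholder for `domain.CanonicalizeGordonImageRef`, whose real signature is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalize is a placeholder for domain.CanonicalizeGordonImageRef.
// It maps one hard-coded legacy host to one current host for the demo.
func canonicalize(ref string) string {
	if rest, ok := strings.CutPrefix(ref, "registry.old.example.com/"); ok {
		return "registry.new.example.com/" + rest
	}
	return ref
}

// canonicalAttachments canonicalizes every image ref per target and drops
// refs that collapse to an already-seen canonical form, keeping first-seen order.
func canonicalAttachments(attachments map[string][]string) map[string][]string {
	if attachments == nil {
		return nil
	}
	result := make(map[string][]string, len(attachments))
	for target, images := range attachments {
		seen := make(map[string]struct{}, len(images))
		canonicalImages := make([]string, 0, len(images))
		for _, img := range images {
			c := canonicalize(img)
			if _, dup := seen[c]; dup {
				continue
			}
			seen[c] = struct{}{}
			canonicalImages = append(canonicalImages, c)
		}
		result[target] = canonicalImages
	}
	return result
}

func main() {
	out := canonicalAttachments(map[string][]string{
		"app.example.com": {
			"registry.old.example.com/team/db:1",
			"registry.new.example.com/team/db:1", // same ref after canonicalization
			"registry.new.example.com/team/cache:2",
		},
	})
	fmt.Println(out["app.example.com"])
}
```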


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c899410-cda3-4f4c-be07-b8c18b56b578

📥 Commits

Reviewing files that changed from the base of the PR and between 99756e8 and 637b278.

📒 Files selected for processing (20)
  • docs/cli/serve.md
  • docs/config/index.md
  • docs/config/server.md
  • docs/upgrading.md
  • gordon.toml.example
  • internal/adapters/in/cli/controlplane_local.go
  • internal/adapters/in/cli/controlplane_local_test.go
  • internal/app/run.go
  • internal/app/startup_recovery.go
  • internal/app/startup_recovery_test.go
  • internal/domain/registry_image.go
  • internal/domain/registry_image_test.go
  • internal/usecase/auto/validation.go
  • internal/usecase/auto/validation_test.go
  • internal/usecase/config/service.go
  • internal/usecase/config/service_test.go
  • internal/usecase/container/autoroute.go
  • internal/usecase/container/autoroute_test.go
  • internal/usecase/container/service.go
  • internal/usecase/container/service_test.go

Comment thread internal/app/run.go
Comment on lines +1975 to 1977
registryDomain, legacyRegistryDomains := resolveRegistryDomains(cfg)
autoRouteHandler := container.NewAutoRouteHandler(ctx, svc.configSvc, svc.containerSvc, svc.blobStorage, registryDomain, legacyRegistryDomains...).
WithEnvExtractor(svc.runtime, svc.envDir)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Refresh auto-route registry domains on hot reload.

Lines 1975-1977 bind registryDomain and legacyRegistryDomains from the startup cfg, so later config reloads never update the autoRouteHandler. After changing server.gordon_domain or server.legacy_registry_domains at runtime, image-push ownership checks will still use the old host set until Gordon is restarted, which breaks the staged-rename flow this PR is adding. Make the handler read the current domains from configSvc per event, or rebuild/rebind it when config reload completes.
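
Option (a), reading the domains per event, can be sketched as follows. The `configStore`, `domainSet`, and `handler` types are hypothetical stand-ins for Gordon's config service and auto-route handler; the only point is that the handler holds a reference to the live config instead of a snapshot taken at construction time.

```go
package main

import (
	"fmt"
	"sync"
)

// domainSet is an assumed shape for the resolved registry domains.
type domainSet struct {
	Current string
	Legacy  []string
}

// configStore is a hypothetical config service that supports hot reload.
type configStore struct {
	mu      sync.RWMutex
	domains domainSet
}

// Domains returns the domain set from the most recent reload.
func (c *configStore) Domains() domainSet {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.domains
}

// Reload replaces the domain set wholesale, as a config reload would.
func (c *configStore) Reload(d domainSet) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.domains = d
}

// handler looks up domains on every event rather than caching them.
type handler struct {
	cfg *configStore
}

// ownsHost reports whether host is the current or a legacy registry domain.
func (h *handler) ownsHost(host string) bool {
	d := h.cfg.Domains()
	if host == d.Current {
		return true
	}
	for _, l := range d.Legacy {
		if host == l {
			return true
		}
	}
	return false
}

func main() {
	store := &configStore{}
	store.Reload(domainSet{Current: "registry.old.example.com"})
	h := &handler{cfg: store}
	fmt.Println(h.ownsHost("registry.new.example.com")) // before the rename

	store.Reload(domainSet{
		Current: "registry.new.example.com",
		Legacy:  []string{"registry.old.example.com"},
	})
	fmt.Println(h.ownsHost("registry.new.example.com")) // after hot reload
}
```

The trade-off against option (b) is a read lock per event versus the extra lifecycle code needed to rebuild and rebind the handler after each reload.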


Comment thread internal/usecase/config/service.go

Copilot AI left a comment


Pull request overview

This PR hardens Gordon’s transition between registry hostnames by introducing shared “current + legacy registry domain” handling across image matching, internal pull rewriting, auto-route ownership checks, and config persistence. It also refactors startup behavior so reboot recovery consistently uses Gordon’s internal deploy semantics and no longer depends on auto_route.enabled.

Changes:

  • Add legacy_registry_domains support and shared domain helpers for identifying/stripping/canonicalizing Gordon-managed image refs.
  • Update container deploy/pull rewriting, auto-route repo ownership checks, and config image matching/saving to treat legacy + current registry hosts equivalently.
  • Refactor startup recovery into a dedicated helper that syncs containers and recovers configured routes after listeners bind; update local deploy to use internal deploy context.
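
Treating legacy and current hosts as equivalent for repo ownership comes down to comparing repository names with the registry host, tag, and digest removed. The sketch below shows one way to do that; `repoName` and `sameRepo` are hypothetical helpers rather than the PR's `ExtractGordonRepoName`, and the port handling assumes plain `host:port` (no IPv6 literals).

```go
package main

import (
	"fmt"
	"strings"
)

// repoName strips a known registry host (with optional port), then any
// digest and tag, leaving only the repository path.
func repoName(imageRef string, domains ...string) string {
	rest := imageRef
	if host, after, ok := strings.Cut(imageRef, "/"); ok {
		hostOnly, _, _ := strings.Cut(host, ":") // drop an optional port
		for _, d := range domains {
			if strings.EqualFold(hostOnly, d) {
				rest = after
				break
			}
		}
	}
	if name, _, ok := strings.Cut(rest, "@"); ok {
		rest = name // drop "@sha256:..." digest
	}
	// A colon after the last slash is a tag separator, not a port.
	if i := strings.LastIndex(rest, ":"); i > strings.LastIndex(rest, "/") {
		rest = rest[:i]
	}
	return rest
}

// sameRepo reports whether two refs name the same repository once known
// registry hosts are treated as interchangeable.
func sameRepo(a, b string, domains ...string) bool {
	return repoName(a, domains...) == repoName(b, domains...)
}

func main() {
	domains := []string{"registry.new.example.com", "registry.old.example.com"}
	fmt.Println(sameRepo(
		"registry.old.example.com:5000/team/app:v1",
		"registry.new.example.com/team/app@sha256:abc",
		domains...,
	))
	fmt.Println(sameRepo(
		"registry.new.example.com/team/app:v1",
		"docker.io/team/app:v1", // external host is not stripped
		domains...,
	))
}
```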

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| internal/usecase/container/service.go | Uses shared domain helpers for registry stripping/recognition and improves internal pull ref rewriting behavior. |
| internal/usecase/container/service_test.go | Adds coverage for legacy registry rewrite behavior and internal deploy pulls. |
| internal/usecase/container/autoroute.go | Treats legacy/current registry hosts as the same repo for auto-route ownership checks. |
| internal/usecase/container/autoroute_test.go | Adds tests for auto-route ownership across legacy/current registry hosts. |
| internal/usecase/config/service.go | Loads legacy domains, matches images across legacy/current hosts, and canonicalizes Gordon-managed refs on save (routes + attachments). |
| internal/usecase/config/service_test.go | Adds tests ensuring canonicalization happens on disk while preserving in-memory refs. |
| internal/usecase/auto/validation.go | Delegates repo extraction to shared domain helper and supports legacy domains. |
| internal/usecase/auto/validation_test.go | Adds coverage for legacy/current host equivalence and ports/digests. |
| internal/domain/registry_image.go | Introduces shared utilities for known domains, stripping, canonicalization, and repo extraction. |
| internal/domain/registry_image_test.go | Adds unit tests for registry-image domain utilities (including hostile lookalikes). |
| internal/app/startup_recovery.go | New startup recovery helper to sync containers and recover configured routes using internal deploy semantics. |
| internal/app/startup_recovery_test.go | Tests startup recovery sequencing and best-effort behavior. |
| internal/app/run.go | Wires legacy registry domain config through services and swaps startup flow to the new recovery helper. |
| internal/adapters/in/cli/controlplane_local.go | Forces local deploys to use internal deploy semantics. |
| internal/adapters/in/cli/controlplane_local_test.go | Adds test ensuring internal deploy context is used. |
| gordon.toml.example | Documents legacy_registry_domains option. |
| docs/upgrading.md | Documents staged registry host rename procedure and expectations. |
| docs/config/server.md | Adds legacy_registry_domains documentation and links. |
| docs/config/index.md | Mentions staged rename option and links to docs. |
| docs/cli/serve.md | Updates startup sequence docs and clarifies startup recovery vs manual deploy behavior. |


Comment thread internal/usecase/container/service.go Outdated
Comment on lines +1337 to +1344
return strings.TrimPrefix(image, prefix)
}
return image
return domain.StripKnownGordonRegistry(image, cfg.RegistryDomain, cfg.LegacyRegistryDomains)
Comment thread internal/app/run.go
Comment on lines +2587 to +2588
// Recover configured routes after servers are listening (registry port is now bound).
syncAndRecoverConfiguredRoutes(ctx, svc.configSvc, svc.containerSvc, log)

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/usecase/config/service_test.go`:
- Around line 2132-2148: The test currently ignores Load errors by assigning `_
= svc.Load(ctx)` which can hide setup failures; replace those calls with an
explicit assertion such as require.NoError(t, svc.Load(ctx)) in the t.Run
subtest "matches current image when route stores legacy registry host" (and the
other similar subtest around lines 2261-2277) so that the
NewService/svc.Load(ctx) setup failure is reported immediately; update both
occurrences to call require.NoError with the same ctx and svc.Load invocation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5465edb-168f-462e-ab8c-09960f8a4dae

📥 Commits

Reviewing files that changed from the base of the PR and between 637b278 and 2b371d2.

📒 Files selected for processing (5)
  • internal/usecase/config/service.go
  • internal/usecase/config/service_test.go
  • internal/usecase/container/autoroute.go
  • internal/usecase/container/autoroute_test.go
  • internal/usecase/container/service.go

Comment thread internal/usecase/config/service_test.go
@bnema bnema merged commit 8d0a092 into main May 10, 2026
7 checks passed
@bnema bnema deleted the feat/registry-rename-startup-recovery-hardening branch May 10, 2026 17:10