Merged
Changes from 2 commits
31 changes: 23 additions & 8 deletions docs/cli/serve.md
@@ -70,7 +70,7 @@ Gordon responds to these signals:
| `SIGTERM` | Graceful shutdown |
| `SIGINT` | Graceful shutdown (Ctrl+C) |
| `SIGUSR1` | Reload configuration |
| `SIGUSR2` | Manual deploy (used by `gordon deploy`) |
| `SIGUSR2` | Manual deploy request (used by local `gordon deploy`) |

### Running with systemd

@@ -106,8 +106,22 @@ sudo loginctl enable-linger $USER
6. Initialize services (registry, proxy, auth)
7. Register event handlers
8. Start config file watcher
9. Sync existing containers
10. Start HTTP servers
9. Start HTTP servers
10. Run best-effort startup recovery (sync existing containers, then recover configured routes)

### Startup Recovery

After Gordon starts, including after a host reboot, it runs a best-effort recovery pass once the listeners are bound. Errors are logged as warnings and do not abort startup.

1. **Sync existing containers** - Gordon reconciles runtime state with configured routes.
2. **Recover configured routes** - Gordon runs `AutoStart` for any configured route that has no running container.
3. **Start the background monitor** - Ongoing crash recovery resumes after the startup pass.

This recovery runs even when `[auto_route].enabled = false`. `auto_route` only controls whether new image pushes create routes automatically; it does not disable restart recovery for routes already in the config.

Recovery stays inside Gordon's own control flow instead of relying on Docker or Podman restart policies. That keeps the behavior consistent across both runtimes.

Startup recovery is intentionally narrower than a manual deploy. It only starts routes that are missing a running container, skips readiness checks during boot, and does not perform drain/replacement logic for routes that are already running.

### Shutdown Sequence

@@ -177,14 +191,15 @@ Use `--remote` and `--token` to override. See [CLI Overview](./index.md).

### Description

**Local mode:** Sends `SIGUSR2` to the Gordon process with the specified domain.
**Local mode:** On the Gordon host, Gordon uses the explicit deploy path for the selected route. When the CLI cannot execute that path directly, it falls back to queueing the request with `SIGUSR2` for the running server. This is different from startup recovery: startup recovery uses `AutoStart`, while a manual deploy performs an explicit redeploy for the selected route.

**Remote mode:** Calls the remote Gordon Admin API to trigger deployment.
**Remote mode:** Calls the remote Gordon Admin API to trigger deployment. The remote Gordon instance still performs the actual deploy internally; the CLI only submits the request.

Both modes trigger:
Both local and remote manual deploys use Gordon's explicit deploy path:

- Fresh image pull (always pulls latest, ignoring cache)
- Container redeployment for the specified route
- Fresh image content is pulled for the route
- The specified route is redeployed
- Configured readiness checks and drain behavior apply when needed

### Examples

2 changes: 2 additions & 0 deletions docs/config/index.md
@@ -22,6 +22,8 @@ gordon_domain = "gordon.mydomain.com"
```

> **Note:** `gordon_domain` is the canonical key. Migrate older `registry_domain` values before restarting.
>
> For a staged registry host rename, set the new `server.gordon_domain` and keep old Gordon registry hosts in `server.legacy_registry_domains` until clients move. See [Server](./server.md#gordon-domain) and [Upgrading](../upgrading.md#staged-registry-host-rename).

## Full Configuration Reference

4 changes: 4 additions & 0 deletions docs/config/server.md
@@ -13,6 +13,7 @@ tls_port = 8443 # HTTPS listener port (0 = disabled)
# tls_key_file = "" # Optional: PEM key for static TLS
# force_https_redirect = false # Redirect all HTTP traffic to HTTPS
gordon_domain = "gordon.mydomain.com" # Gordon domain (required)
# legacy_registry_domains = ["registry.example.com:5000"]
# data_dir = "~/.gordon" # Data storage directory (default)
max_blob_chunk_size = "95MB" # Max registry blob upload chunk
max_blob_size = "1GB" # Max cumulative registry blob/layer upload
Expand All @@ -30,6 +31,7 @@ max_blob_size = "1GB" # Max cumulative registry blob/layer up
| `force_https_redirect` | bool | `false` | Redirect all HTTP requests to the HTTPS port. For direct-access setups without a TLS-terminating proxy |
| `gordon_domain` | string | **required** | Domain for Gordon (registry + admin API) |
| `registry_domain` | string | - | Deprecated migration key. Set `gordon_domain` instead. |
| `legacy_registry_domains` | []string | `[]` | Additional Gordon registry hosts treated as aliases during staged migration. See [Upgrading: Staged Registry Host Rename](../upgrading.md#staged-registry-host-rename). |
| `data_dir` | string | `~/.gordon` | Directory for registry data, logs, and env files |
| `max_proxy_body_size` | string | `"512MB"` | Maximum request body size for proxied requests |
| `max_blob_chunk_size` | string | `"95MB"` | Maximum request body size for a single registry blob upload chunk |
@@ -217,6 +219,8 @@ This domain is used for:

> **Warning:** If you are upgrading an older config, copy `server.registry_domain` to `server.gordon_domain` before restarting.

For a staged rename, set `gordon_domain` to the new public host and add any old Gordon registry hosts that clients still use to `legacy_registry_domains` (including `host:port` forms). Gordon treats those entries as registry aliases during image matching and internal pulls, then writes canonical refs back to `gordon_domain`. Remote CLI and admin API traffic should use `gordon_domain`.

Without this migration, a Host/remote-target mismatch can break routing or remote CLI token exchange.

When requests arrive on the proxy port with this domain as the Host header, Gordon routes them to the backend services (registry and admin API).
23 changes: 23 additions & 0 deletions docs/upgrading.md
@@ -74,6 +74,29 @@ gordon_domain = "gordon.example.com"

If you do not migrate, `gordon status --remote ...` and `gordon routes list --remote ...` can fail with `/auth/token` `404`, and `reg-domain/v2/` or `/admin/status` can return `404`.

### Staged Registry Host Rename

If you cannot rename the Gordon registry host in one step, keep the new host in `server.gordon_domain` and list the old Gordon registry hosts in `server.legacy_registry_domains` during the cutover:

```toml
[server]
gordon_domain = "gordon.example.com"
legacy_registry_domains = [
"registry.example.com",
"registry.example.com:5000",
]
```

Recommended rollout:

1. Set `gordon_domain` to the new host.
2. Add every old Gordon registry host that clients still use to `legacy_registry_domains`.
3. Restart Gordon.
4. Move Docker/Podman logins, pushes, pulls, and image references to the new host.
5. Remove `legacy_registry_domains` after every client has moved.

During the transition, Gordon treats both the current and legacy hosts as its own registry for image matching and internal pulls, then saves canonical refs back to `gordon_domain`. Remote CLI and admin API traffic should use the new `gordon_domain`.
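The alias behavior described above can be pictured with a small Go sketch. This is an illustration only, not Gordon's implementation: `canonicalizeRef` is a hypothetical helper, and it assumes exact string comparison of the registry host, which is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalizeRef is a hypothetical helper (not part of Gordon). If the
// registry host of ref is one of the legacy hosts, the host is rewritten to
// gordonDomain. All other refs are returned unchanged.
func canonicalizeRef(ref, gordonDomain string, legacy []string) string {
	host, rest, ok := strings.Cut(ref, "/")
	if !ok {
		return ref // no registry host component to compare
	}
	for _, alias := range legacy {
		if host == alias {
			return gordonDomain + "/" + rest
		}
	}
	return ref
}

func main() {
	legacy := []string{"registry.example.com", "registry.example.com:5000"}
	refs := []string{
		"registry.example.com:5000/app:latest", // legacy host with port
		"gordon.example.com/app:latest",        // already canonical
		"docker.io/library/nginx:1.27",         // not a Gordon registry host
	}
	for _, ref := range refs {
		fmt.Println(canonicalizeRef(ref, "gordon.example.com", legacy))
	}
}
```

Listing both `registry.example.com` and `registry.example.com:5000` matters in a scheme like this one because the two forms are compared as distinct strings.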

## v2.16.0 to v2.30.0

### Breaking: Password Authentication Removed
1 change: 1 addition & 0 deletions gordon.toml.example
@@ -5,6 +5,7 @@
port = 8088
registry_port = 5000
gordon_domain = "gordon.example.com"
# legacy_registry_domains = ["registry.example.com:5000"]
# data_dir = "~/.gordon"

[auth]
2 changes: 1 addition & 1 deletion internal/adapters/in/cli/controlplane_local.go
@@ -422,7 +422,7 @@ func (l *localControlPlane) Deploy(ctx context.Context, deployDomain string) (*r
if err != nil {
return nil, err
}
container, err := l.containerSvc.Deploy(ctx, *route)
container, err := l.containerSvc.Deploy(domain.WithInternalDeploy(ctx), *route)
if err != nil {
return nil, err
}
27 changes: 27 additions & 0 deletions internal/adapters/in/cli/controlplane_local_test.go
@@ -37,6 +37,33 @@ func TestLocalControlPlane_GetStatus(t *testing.T) {
require.Equal(t, "running", status.ContainerStatus["app.local"])
}

func TestLocalControlPlane_DeployUsesInternalDeployContext(t *testing.T) {
t.Parallel()

configSvc := inmocks.NewMockConfigService(t)
containerSvc := inmocks.NewMockContainerService(t)

ctx := context.Background()
route := &domain.Route{Domain: "app.local", Image: "repo/app:latest"}

require.False(t, domain.IsInternalDeploy(ctx))

configSvc.EXPECT().GetRoute(mock.Anything, "app.local").Return(route, nil)
containerSvc.EXPECT().Deploy(mock.Anything, *route).RunAndReturn(func(deployCtx context.Context, deployedRoute domain.Route) (*domain.Container, error) {
require.True(t, domain.IsInternalDeploy(deployCtx))
require.Equal(t, *route, deployedRoute)
return &domain.Container{ID: "container-1"}, nil
})

cp := &localControlPlane{configSvc: configSvc, containerSvc: containerSvc}
result, err := cp.Deploy(ctx, "app.local")
require.NoError(t, err)
require.NotNil(t, result)
require.Equal(t, "deployed", result.Status)
require.Equal(t, "app.local", result.Domain)
require.Equal(t, "container-1", result.ContainerID)
}

func TestLocalControlPlane_Backups(t *testing.T) {
t.Parallel()

78 changes: 37 additions & 41 deletions internal/app/run.go
@@ -93,22 +93,24 @@ import (
// Config holds the application configuration.
type Config struct {
Server struct {
Port int `mapstructure:"port"`
RegistryPort int `mapstructure:"registry_port"`
GordonDomain string `mapstructure:"gordon_domain"`
TLSPort int `mapstructure:"tls_port"`
TLSCertFile string `mapstructure:"tls_cert_file"`
TLSKeyFile string `mapstructure:"tls_key_file"`
ForceHTTPSRedirect bool `mapstructure:"force_https_redirect"`
DataDir string `mapstructure:"data_dir"`
MaxProxyBodySize string `mapstructure:"max_proxy_body_size"` // e.g., "512MB", "1GB"
MaxBlobChunkSize string `mapstructure:"max_blob_chunk_size"` // e.g., "512MB", "1GB"
MaxBlobSize string `mapstructure:"max_blob_size"` // e.g., "1GB", "2GB"
MaxProxyResponseSize string `mapstructure:"max_proxy_response_size"` // e.g., "1GB", "0" for no limit
MaxConcurrentConns int `mapstructure:"max_concurrent_connections"`
RegistryAllowedIPs []string `mapstructure:"registry_allowed_ips"`
ProxyAllowedIPs []string `mapstructure:"proxy_allowed_ips"`
RegistryListenAddr string `mapstructure:"registry_listen_address"`
Port int `mapstructure:"port"`
RegistryPort int `mapstructure:"registry_port"`
GordonDomain string `mapstructure:"gordon_domain"`
RegistryDomain string `mapstructure:"registry_domain"`
LegacyRegistryDomains []string `mapstructure:"legacy_registry_domains"`
TLSPort int `mapstructure:"tls_port"`
TLSCertFile string `mapstructure:"tls_cert_file"`
TLSKeyFile string `mapstructure:"tls_key_file"`
ForceHTTPSRedirect bool `mapstructure:"force_https_redirect"`
DataDir string `mapstructure:"data_dir"`
MaxProxyBodySize string `mapstructure:"max_proxy_body_size"` // e.g., "512MB", "1GB"
MaxBlobChunkSize string `mapstructure:"max_blob_chunk_size"` // e.g., "512MB", "1GB"
MaxBlobSize string `mapstructure:"max_blob_size"` // e.g., "1GB", "2GB"
MaxProxyResponseSize string `mapstructure:"max_proxy_response_size"` // e.g., "1GB", "0" for no limit
MaxConcurrentConns int `mapstructure:"max_concurrent_connections"`
RegistryAllowedIPs []string `mapstructure:"registry_allowed_ips"`
ProxyAllowedIPs []string `mapstructure:"proxy_allowed_ips"`
RegistryListenAddr string `mapstructure:"registry_listen_address"`
} `mapstructure:"server"`

Logging struct {
@@ -1356,6 +1358,14 @@ func resolveEnvDir(cfg Config) string {
return envDir
}

func resolveRegistryDomains(cfg Config) (string, []string) {
registryDomain := cfg.Server.GordonDomain
if registryDomain == "" {
registryDomain = cfg.Server.RegistryDomain
}
return registryDomain, append([]string{}, cfg.Server.LegacyRegistryDomains...)
}

func createTokenStore(backend domain.SecretsBackend, dataDir string, log zerowrap.Logger) (out.TokenStore, error) {
// Token store is always created since tokens work in both auth modes
store, err := tokenstore.NewStore(backend, dataDir, log)
@@ -1750,9 +1760,11 @@ func buildProxyConfig(cfg Config, log zerowrap.Logger) (*proxyConfigResult, erro
}
// 0 means no limit (as documented in proxy.Config)

registryDomain, _ := resolveRegistryDomains(cfg)

return &proxyConfigResult{
proxyConfig: proxy.Config{
RegistryDomain: cfg.Server.GordonDomain,
RegistryDomain: registryDomain,
RegistryPort: cfg.Server.RegistryPort,
MaxBodySize: maxProxyBodySize,
MaxResponseSize: maxProxyResponseSize,
@@ -1827,10 +1839,12 @@ func createContainerService(ctx context.Context, v *viper.Viper, cfg Config, svc
}

attachmentConfig := svc.configSvc.GetAttachmentConfig()
registryDomain, legacyRegistryDomains := resolveRegistryDomains(cfg)

containerConfig := container.Config{
RegistryAuthEnabled: cfg.Auth.Enabled,
RegistryDomain: cfg.Server.GordonDomain,
RegistryDomain: registryDomain,
LegacyRegistryDomains: legacyRegistryDomains,
RegistryPort: cfg.Server.RegistryPort,
InternalRegistryUsername: svc.internalRegUser,
InternalRegistryPassword: svc.internalRegPass,
@@ -1958,7 +1972,8 @@ func registerEventHandlers(ctx context.Context, svc *services, cfg Config) (func
}

// Auto-route handler for creating routes from image labels
autoRouteHandler := container.NewAutoRouteHandler(ctx, svc.configSvc, svc.containerSvc, svc.blobStorage, cfg.Server.GordonDomain).
registryDomain, legacyRegistryDomains := resolveRegistryDomains(cfg)
autoRouteHandler := container.NewAutoRouteHandler(ctx, svc.configSvc, svc.containerSvc, svc.blobStorage, registryDomain, legacyRegistryDomains...).
WithEnvExtractor(svc.runtime, svc.envDir)
Comment on lines +1975 to 1977

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Refresh auto-route registry domains on hot reload.

Lines 1975-1977 bind registryDomain and legacyRegistryDomains from the startup cfg, so later config reloads never update the autoRouteHandler. After changing server.gordon_domain or server.legacy_registry_domains at runtime, image-push ownership checks will still use the old host set until Gordon is restarted, which breaks the staged-rename flow this PR is adding. Make the handler read the current domains from configSvc per event, or rebuild/rebind it when config reload completes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/app/run.go` around lines 1975 - 1977, autoRouteHandler is created
once with registryDomain and legacyRegistryDomains resolved from startup cfg
(resolveRegistryDomains + container.NewAutoRouteHandler + WithEnvExtractor), so
config reloads don’t update its domain set; change the handler to read domains
from svc.configSvc on each event or rebuild/rebind autoRouteHandler when config
reload completes—either (a) modify NewAutoRouteHandler/AutoRouteHandler to
accept svc.configSvc and call resolveRegistryDomains(svc.configSvc.Current())
per request/event, or (b) listen for the config reload completion and recreate
autoRouteHandler by calling resolveRegistryDomains(cfg) and
container.NewAutoRouteHandler(...). Ensure references to resolveRegistryDomains,
autoRouteHandler, NewAutoRouteHandler, WithEnvExtractor and svc.configSvc are
updated accordingly.
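As a rough illustration of option (a), the handler could hold a reference to a domain source and take a fresh snapshot on every event instead of capturing the values at construction. This is a minimal sketch under stated assumptions: `domainSource`, `registryDomains`, `pushHandler`, and `ownsHost` are hypothetical names, not existing Gordon types, and the real config service may expose the current config differently.

```go
package main

import (
	"fmt"
	"sync"
)

// registryDomains is the host set an image-push ownership check needs.
// Hypothetical type for this sketch.
type registryDomains struct {
	Current string
	Legacy  []string
}

// domainSource is a hypothetical stand-in for the config service. It always
// holds the domains from the most recently loaded config.
type domainSource struct {
	mu      sync.RWMutex
	domains registryDomains
}

// Snapshot returns a copy of the current domains so callers cannot mutate
// the shared slice.
func (s *domainSource) Snapshot() registryDomains {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return registryDomains{
		Current: s.domains.Current,
		Legacy:  append([]string{}, s.domains.Legacy...),
	}
}

// Reload replaces the domains, as a config hot reload would.
func (s *domainSource) Reload(d registryDomains) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.domains = d
}

// pushHandler keeps the source, not a copy of the domains, so each event is
// checked against the current config.
type pushHandler struct {
	source *domainSource
}

// ownsHost reports whether host is the current or a legacy registry host at
// the time of the call.
func (h *pushHandler) ownsHost(host string) bool {
	d := h.source.Snapshot()
	if host == d.Current {
		return true
	}
	for _, legacy := range d.Legacy {
		if host == legacy {
			return true
		}
	}
	return false
}

func main() {
	src := &domainSource{domains: registryDomains{Current: "registry.example.com"}}
	h := &pushHandler{source: src}
	fmt.Println("before reload:", h.ownsHost("gordon.example.com"))

	// Simulate a hot reload that starts a staged rename.
	src.Reload(registryDomains{
		Current: "gordon.example.com",
		Legacy:  []string{"registry.example.com"},
	})
	fmt.Println("after reload:", h.ownsHost("gordon.example.com"), h.ownsHost("registry.example.com"))
}
```

Compared with rebuilding the handler on reload (option b), this keeps a single handler instance alive, at the cost of a lock and a small slice copy per event.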


// Preview handler for creating preview environments from tagged images
@@ -2013,26 +2028,6 @@ func setupConfigHotReload(ctx context.Context, configSvc configWatcher, coordina
return nil
}

// syncAndAutoStart syncs existing containers and auto-starts if configured.
func syncAndAutoStart(ctx context.Context, svc *services, log zerowrap.Logger) {
if err := svc.containerSvc.EnsureManagedContainerRestartPolicies(ctx); err != nil {
log.Warn().Err(err).Msg("failed to migrate managed container restart policies")
}

if err := svc.containerSvc.SyncContainers(ctx); err != nil {
log.Warn().Err(err).Msg("failed to sync existing containers")
}

if svc.configSvc.IsAutoRouteEnabled() {
routes := svc.configSvc.GetRoutes(ctx)
if err := svc.containerSvc.AutoStart(domain.WithInternalDeploy(ctx), routes); err != nil {
log.Warn().Err(err).Msg("failed to auto-start containers")
}
}

// Start background monitor to restart crashed containers.
svc.containerSvc.StartMonitor(ctx)
}
func loopbackOnly(next http.Handler, log zerowrap.Logger) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
host, _, err := net.SplitHostPort(r.RemoteAddr)
@@ -2589,8 +2584,8 @@ func runServers(ctx context.Context, v *viper.Viper, cfg Config, svc *services,
defer schedulerCleanup()
}

// Auto-start after servers are listening (registry port is now bound).
syncAndAutoStart(ctx, svc, log)
// Recover configured routes after servers are listening (registry port is now bound).
syncAndRecoverConfiguredRoutes(ctx, svc.configSvc, svc.containerSvc, log)
Comment on lines +2587 to +2588

waitForShutdown(ctx, errChan, reloadChan, deployChan, reload, svc.eventBus, log)
cleanupHandlers() // Stop debounce timers before draining containers
@@ -3319,6 +3314,7 @@ func isProcessAlive(pid int) bool {
func loadConfig(v *viper.Viper, configPath string) error {
v.SetDefault("server.port", 8088)
v.SetDefault("server.registry_port", 5000)
v.SetDefault("server.legacy_registry_domains", []string{})
v.SetDefault("server.tls_port", 8443)
v.SetDefault("server.tls_cert_file", "")
v.SetDefault("server.tls_key_file", "")
44 changes: 44 additions & 0 deletions internal/app/startup_recovery.go
@@ -0,0 +1,44 @@
package app

import (
"context"

"github.com/bnema/zerowrap"

"github.com/bnema/gordon/internal/domain"
)

// startupConfigService defines the route configuration needed during startup
// recovery. It intentionally excludes auto-route settings because reboot
// recovery always works from configured routes.
type startupConfigService interface {
GetRoutes(ctx context.Context) []domain.Route
}

// startupContainerService defines the container lifecycle operations needed
// during startup recovery.
type startupContainerService interface {
SyncContainers(ctx context.Context) error
AutoStart(ctx context.Context, routes []domain.Route) error
StartMonitor(ctx context.Context)
}

// syncAndRecoverConfiguredRoutes performs best-effort startup recovery for
// configured routes after listeners are ready.
func syncAndRecoverConfiguredRoutes(
ctx context.Context,
configSvc startupConfigService,
containerSvc startupContainerService,
log zerowrap.Logger,
) {
defer containerSvc.StartMonitor(ctx)

if err := containerSvc.SyncContainers(ctx); err != nil {
log.Warn().Err(err).Msg("failed to sync existing containers")
}

routes := configSvc.GetRoutes(ctx)
if err := containerSvc.AutoStart(domain.WithInternalDeploy(ctx), routes); err != nil {
log.Warn().Err(err).Msg("failed to auto-start configured routes")
}
}