feat(BA-6242): give custom runtime variant a default health check #11863
Open
seedspirit wants to merge 2 commits into
Seed the custom variant's default_model_definition with a baseline health_check (path /health, initial_delay 1800s) so deployments that omit it no longer fall back to the 60s schema default and fail health checks while large models load. User model-definition.yaml / request values still override field-by-field. Migration + both fixtures kept in sync.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
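The commit message says user `model-definition.yaml` / request values override the seeded default field by field. Below is a minimal sketch of that layering, assuming a plain recursive dict overlay. The `CUSTOM_DEFAULT_DRAFT` layout and the `overlay` helper are hypothetical illustrations, not the manager's actual merge code.

```python
from typing import Any

# Hypothetical baseline using the values described in this PR; the exact key
# layout of the real default_model_definition draft is an assumption.
CUSTOM_DEFAULT_DRAFT: dict[str, Any] = {
    "service": {
        "health_check": {
            "path": "/health",
            "interval": 10.0,
            "max_retries": 10,
            "initial_delay": 1800.0,
        }
    }
}


def overlay(default: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively apply `override` on top of `default`, field by field.

    Nested dicts are merged key by key; any other value in `override`
    (including None) replaces the default outright. Neither input is mutated.
    """
    merged = dict(default)
    for key, value in override.items():
        base = merged.get(key)
        if isinstance(base, dict) and isinstance(value, dict):
            merged[key] = overlay(base, value)
        else:
            merged[key] = value
    return merged
```

Under this sketch, a user who sets only `initial_delay` keeps the baseline `path`, `interval`, and `max_retries`.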
Pull request overview
Seeds the custom runtime variant’s default_model_definition with a baseline health_check so deployments that omit health_check don’t inherit the strict 60s schema default and get marked unhealthy while large models are still loading.
Changes:
- Add an Alembic data migration to update the `custom` runtime variant's `default_model_definition` from `{"models": null}` to a draft containing `service.health_check` defaults (guarded to avoid overwriting operator customizations).
- Update both runtime-variant fixture JSON files to match the new `custom` default.
- Add a towncrier news fragment describing the behavior change.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/ai/backend/manager/models/alembic/versions/ed42bc179b91_set_custom_runtime_variant_default_definition.py` | Adds an idempotent, guarded `UPDATE` to seed `custom`'s default model-definition draft with a long `initial_delay` health check. |
| `src/ai/backend/install/fixtures/example-runtime-variants.json` | Updates the installer fixture so fresh installs include the new `custom` baseline health check. |
| `fixtures/manager/example-runtime-variants.json` | Keeps the Manager-side fixture in sync with the installer fixture for the `custom` default. |
| `changes/11863.feature.md` | Documents the new default health check behavior in release notes. |
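Because the two fixture files must mirror the migration, a small consistency check could catch drift between them. This sketch assumes each fixture maps a table key (hypothetically `runtime_variants`) to a list of row objects with `name` and `default_model_definition` fields. The real fixture layout may differ, and `custom_default`, `defaults_match`, and `check_fixture_files` are illustrative helpers, not existing project code.

```python
import json
from pathlib import Path
from typing import Any

# Fixture paths named in this PR (relative to the repository root).
FIXTURE_PATHS = (
    Path("fixtures/manager/example-runtime-variants.json"),
    Path("src/ai/backend/install/fixtures/example-runtime-variants.json"),
)
# Hypothetical top-level key; adjust to the real fixture layout.
TABLE_KEY = "runtime_variants"


def custom_default(doc: dict[str, Any]) -> Any:
    """Return the default_model_definition of the `custom` row in one fixture."""
    for row in doc.get(TABLE_KEY, []):
        if row.get("name") == "custom":
            return row.get("default_model_definition")
    raise LookupError("fixture has no 'custom' runtime variant")


def defaults_match(docs: list[dict[str, Any]]) -> bool:
    """True when every fixture carries the same `custom` default."""
    defaults = [custom_default(doc) for doc in docs]
    return all(d == defaults[0] for d in defaults[1:])


def check_fixture_files(paths: tuple[Path, ...] = FIXTURE_PATHS) -> bool:
    """Load each fixture from disk and compare their `custom` defaults."""
    return defaults_match([json.loads(p.read_text()) for p in paths])
```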
`@@ -0,0 +1 @@`
> Apply a default health check (/health, 1800s initial delay) to the custom runtime variant so model deployments no longer fail health checks while large models load
fregataa (Member) reviewed on May 29, 2026 and left a comment:
The `health_check` field is nullable, and when it is null, no health check runs.
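The reviewer's point raises the question of how an omitted `health_check` should differ from an explicit `null` once the variant seeds a default. The following hypothetical helper (`effective_health_check`, not the manager's actual resolution logic) sketches one way to keep the two cases distinct. An omitted key falls back to the variant baseline. An explicit `None` still disables health checking. A partial mapping overrides the baseline field by field.

```python
from typing import Any, Optional

# Sentinel distinguishing "key absent" from "key present with value None".
_MISSING = object()


def effective_health_check(
    variant_default: dict[str, Any], user_service: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Resolve the health check a deployment would run (hypothetical rules)."""
    user_hc = user_service.get("health_check", _MISSING)
    if user_hc is _MISSING:
        # Omitted entirely: fall back to the variant's baseline.
        return dict(variant_default)
    if user_hc is None:
        # Explicit null: keep the "no health check runs" semantics.
        return None
    # Partial mapping: override the baseline field by field.
    return {**variant_default, **user_hc}
```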
Summary
- Seed the `custom` runtime variant's `default_model_definition` with a baseline `health_check` (path `/health`, `interval` 10s, `max_retries` 10, `initial_delay` 1800s) via Alembic migration `ed42bc179b91`.
- `custom` previously held `{"models": null}`, so deployments that omitted `health_check` fell back to the 60s schema default and could be marked unhealthy while large models load (up to ~30 min). 1800s matches the prebuilt variants.
- A user `model-definition.yaml` / request still overrides each field, so this is purely a fallback layer. Operator-customised rows are left untouched (WHERE-guarded `UPDATE`, idempotent).
- Both fixtures (`fixtures/manager/` and `src/ai/backend/install/fixtures/`) are kept in sync with the migration.

Test plan
- `pants fmt/lint/check` on the migration
- `0113c63f3261 → ed42bc179b91` seeds the definition; `downgrade -1` reverts to `{"models": null}`; re-running `upgrade head` is idempotent

Resolves BA-6242