feat(BA-6242): give custom runtime variant a default health check #11863

Open
seedspirit wants to merge 2 commits into main from feat/BA-6242

@seedspirit
Contributor

Summary

  • Seed the custom runtime variant's default_model_definition with a baseline health_check (path /health, interval 10s, max_retries 10, initial_delay 1800s) via alembic migration ed42bc179b91.
  • Previously custom held {"models": null}, so deployments that omitted health_check fell back to the 60s schema default and could be marked unhealthy while large models load (up to ~30 min). 1800s matches the prebuilt variants.
  • Merge stays field-wise: a user's model-definition.yaml / request still overrides each field, so this is purely a fallback layer. Operator-customised rows are left untouched (WHERE-guarded UPDATE, idempotent).
  • Kept both fixtures (fixtures/manager/ and src/ai/backend/install/fixtures/) in sync with the migration.
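
To illustrate the fallback layering described above, here is a minimal sketch of a field-wise merge. The helper name `merge_health_check` and the flat dict shape are assumptions made for illustration; the actual merge code in the manager is not shown in this PR. The four default values are the ones quoted in the summary.

```python
from typing import Any

# Baseline values quoted in the PR summary (durations in seconds).
VARIANT_DEFAULT_HEALTH_CHECK: dict[str, Any] = {
    "path": "/health",
    "interval": 10,
    "max_retries": 10,
    "initial_delay": 1800,
}


def merge_health_check(
    variant_default: dict[str, Any],
    user_supplied: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical helper: overlay user fields on the variant default, field by field."""
    merged = dict(variant_default)  # copy so the default itself is never mutated
    merged.update(user_supplied)    # any field the user sets wins over the fallback
    return merged


# A user who only shortens the initial delay keeps the other fallback values.
effective = merge_health_check(VARIANT_DEFAULT_HEALTH_CHECK, {"initial_delay": 300})
```

With this shape, a deployment that supplies no health_check fields ends up with the variant default unchanged, which is the "purely a fallback layer" behavior the summary describes.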

Test plan

  • pants fmt/lint/check on the migration
  • Real alembic run on a throwaway DB: 0113c63f3261 → ed42bc179b91 seeds the default definition; downgrade -1 reverts to {"models": null}; re-running upgrade head is idempotent
  • Both fixtures parse as valid JSON
  • CI
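
The upgrade, downgrade, and re-upgrade behavior in the test plan can be sketched with a WHERE-guarded UPDATE. This is not the migration from the PR: it uses the standard-library `sqlite3` module instead of alembic, the table name `runtime_variants` and its columns are hypothetical, the guard compares JSON as plain text, and the nesting of the new definition is illustrative only.

```python
import json
import sqlite3

# Placeholder value the custom variant held before the migration.
OLD_DEFINITION = json.dumps({"models": None})
# Field values come from the PR summary; the nesting here is an assumption.
NEW_DEFINITION = json.dumps(
    {
        "health_check": {
            "path": "/health",
            "interval": 10,
            "max_retries": 10,
            "initial_delay": 1800,
        }
    }
)

# Only touch the row when it still holds the expected previous value.
GUARDED_UPDATE = (
    "UPDATE runtime_variants SET default_model_definition = ? "
    "WHERE name = 'custom' AND default_model_definition = ?"
)


def upgrade(conn: sqlite3.Connection) -> int:
    """Seed the default; returns the number of rows changed."""
    return conn.execute(GUARDED_UPDATE, (NEW_DEFINITION, OLD_DEFINITION)).rowcount


def downgrade(conn: sqlite3.Connection) -> int:
    """Revert to the placeholder, again only if the row is unmodified."""
    return conn.execute(GUARDED_UPDATE, (OLD_DEFINITION, NEW_DEFINITION)).rowcount


def current_definition(conn: sqlite3.Connection) -> str:
    row = conn.execute(
        "SELECT default_model_definition FROM runtime_variants WHERE name = 'custom'"
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runtime_variants (name TEXT PRIMARY KEY, default_model_definition TEXT)"
)
conn.execute("INSERT INTO runtime_variants VALUES ('custom', ?)", (OLD_DEFINITION,))

first_run = upgrade(conn)   # the guard matches, so the row is seeded
second_run = upgrade(conn)  # the guard no longer matches, so nothing changes
```

Because the guard matches only the exact previous value, a row that an operator has edited matches neither direction and is left alone.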

Resolves BA-6242

Seed the custom variant's default_model_definition with a baseline
health_check (path /health, initial_delay 1800s) so deployments that
omit it no longer fall back to the 60s schema default and fail health
checks while large models load. User model-definition.yaml / request
values still override field-by-field. Migration + both fixtures kept
in sync.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions (bot) added the labels size:L (100~500 LoC), comp:manager (Related to Manager component), and require:db-migration (Automatically set when alembic migrations are added or updated) on May 29, 2026
@seedspirit seedspirit requested a review from HyeockJinKim May 29, 2026 08:23
@seedspirit seedspirit marked this pull request as ready for review May 29, 2026 08:57
@seedspirit seedspirit requested a review from a team as a code owner May 29, 2026 08:57
Copilot AI review requested due to automatic review settings May 29, 2026 08:57
Contributor

Copilot AI left a comment

Pull request overview

Seeds the custom runtime variant’s default_model_definition with a baseline health_check so deployments that omit health_check don’t inherit the strict 60s schema default and get marked unhealthy while large models are still loading.

Changes:

  • Add an Alembic data migration to update the custom runtime variant’s default_model_definition from {"models": null} to a draft containing service.health_check defaults (guarded to avoid overwriting operator customizations).
  • Update both runtime-variant fixture JSON files to match the new custom default.
  • Add a towncrier news fragment describing the behavior change.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File | Description
src/ai/backend/manager/models/alembic/versions/ed42bc179b91_set_custom_runtime_variant_default_definition.py | Adds an idempotent, guarded UPDATE to seed custom’s default model-definition draft with a long initial_delay health check.
src/ai/backend/install/fixtures/example-runtime-variants.json | Updates the installer fixture so fresh installs include the new custom baseline health check.
fixtures/manager/example-runtime-variants.json | Keeps the Manager-side fixture in sync with the installer fixture for the custom default.
changes/11863.feature.md | Documents the new default health check behavior in release notes.

Comment thread changes/11863.feature.md
@@ -0,0 +1 @@
Apply a default health check (/health, 1800s initial delay) to the custom runtime variant so model deployments no longer fail health checks while large models load
Member

@fregataa fregataa left a comment

The health_check field is nullable, and when it is null, no health check runs.

3 participants