
feat(agents)!: per-agent model selection for cost optimization and /compact loop fix #1541

Merged
WilliamBerryiii merged 15 commits into main from feat/model-selection on May 8, 2026

Conversation

@katriendg
Contributor

Description

This PR introduces per-agent model selection via frontmatter, backed by a validated model catalog that tracks GitHub Copilot's evolving model lineup. Simple tasks (git operations, issue triage, research) now route to fast-tier models at a fraction of the cost, while complex agents inherit the session model for full capability.

Additionally, this PR removes the self-referential /compact handoff from 12 agents, eliminating the root cause of Autopilot infinite loops reported in #1420. The disk-first .copilot-tracking/ architecture and Memory Agent already provide equivalent persistence without the loop risk.

Model Selection Infrastructure

Cost-first principle: use fast models for read-only research and validation; inherit session model for code generation and complex reasoning.

  • Added model catalog (scripts/linting/model-catalog.json) tracking 25 models across 5 tiers (free, fast, standard, premium, ultra) with multiplier values, vendor attribution, and GA/preview/retiring status
  • Added catalog refresh script (scripts/linting/Update-ModelCatalog.ps1) that fetches authoritative YAML from github/docs for model release status and multiplier data; marks removed models as retiring with 60-day grace period rather than deleting
  • Added validation script (scripts/linting/Test-ModelReferences.ps1) that scans all .agent.md and .prompt.md files for model frontmatter and validates references against the catalog; reports invalid models as errors, retiring models as warnings
  • Added JSON schema (scripts/linting/schemas/model-catalog.schema.json) for structural validation of the catalog file
  • Added weekly CI workflow (.github/workflows/model-validation.yml) running every Wednesday plus PR-triggered validation on agent/prompt/catalog changes; includes catalog freshness check and artifact upload
  • Integrated lint:models and lint:models:refresh into package.json; model validation runs as part of the lint:all chain
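The validation rule described in the list above (an unknown model is an error, a retiring model is a warning) can be sketched as follows. This is a minimal Python illustration, not the PowerShell in Test-ModelReferences.ps1. The catalog shape (a `models` list with `name` and `status` fields), the `validate_references` name, and the stripping of a trailing " (copilot)" suffix are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog shape; the real model-catalog.json schema may differ.
CATALOG: Dict = {
    "models": [
        {"name": "Claude Haiku 4.5", "tier": "fast", "status": "ga"},
        {"name": "GPT-5.4 mini", "tier": "fast", "status": "ga"},
        {"name": "Example Old Model", "tier": "standard", "status": "retiring"},
    ]
}


def validate_references(catalog: Dict, references: List[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for model names found in frontmatter."""
    status_by_name = {m["name"]: m["status"] for m in catalog["models"]}
    errors: List[str] = []
    warnings: List[str] = []
    for name in references:
        # Assumption: a " (copilot)" suffix is dropped before the lookup.
        key = name.removesuffix(" (copilot)")
        status = status_by_name.get(key)
        if status is None:
            errors.append(name)      # not in the catalog at all
        elif status == "retiring":
            warnings.append(name)    # still valid, but scheduled for removal
    return errors, warnings


if __name__ == "__main__":
    errs, warns = validate_references(
        CATALOG, ["Claude Haiku 4.5 (copilot)", "Example Old Model", "Unknown Model"]
    )
    print("errors:", errs)
    print("warnings:", warns)
```

A linter built this way would exit non-zero only when the error list is non-empty, so retiring references stay visible without blocking a merge.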

Per-Agent and Per-Prompt Model Assignment

Assigned fast-tier models to 7 subagents performing read-heavy validation tasks: researcher-subagent, plan-validator, implementation-validator, prompt-evaluator, rpi-validator, codebase-profiler, and report-generator. Each declares a prioritized fallback array: Claude Haiku 4.5 → GPT-5.4 mini.
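One way to read a prioritized fallback array is "use the first entry that is available, otherwise inherit the session model". The sketch below illustrates that reading only. How VS Code actually resolves the list is not shown in this PR, and `pick_model` is a hypothetical helper.

```python
from typing import Iterable, Optional, Sequence


def pick_model(preferences: Sequence[str], available: Iterable[str]) -> Optional[str]:
    """Return the first preferred model that is available.

    Returns None when nothing matches, which the caller would treat as
    "inherit the session model" (an assumption for this sketch).
    """
    available_set = set(available)
    for name in preferences:
        if name in available_set:
            return name
    return None


if __name__ == "__main__":
    fallback = ["Claude Haiku 4.5", "GPT-5.4 mini"]
    print(pick_model(fallback, {"GPT-5.4 mini"}))
    print(pick_model(fallback, set()))
```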

Assigned Claude Haiku 4.5 (copilot) to 7 prompts handling mechanical operations: git-commit-message, git-commit, git-setup, github-add-issue, github-discover-issues, github-triage-issues, and checkpoint.

Added "Model Selection for Subagents" guidance to 6 parent agents (task-researcher, task-planner, task-implementor, task-reviewer, prompt-builder, security-reviewer) documenting cost-first dispatch decisions and VS Code tier constraint behavior.

/compact Handoff Removal (Fixes #1420)

Removed the Compact handoff entry from all 12 agents where it appeared. Eleven had it as their first handoff, causing Autopilot to auto-execute it on every turn completion, creating an infinite self-referential loop.
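The loop condition described above (an agent whose first handoff points back at itself) lends itself to a simple static check. The sketch below is illustrative and not part of this PR. It assumes each agent is already parsed into a dict with a `name` and a `handoffs` list whose entries carry an `agent` target; those field names are hypothetical.

```python
from typing import Dict, List


def first_handoff_is_self(agent: Dict) -> bool:
    """True when the first handoff targets the agent that declares it."""
    handoffs = agent.get("handoffs") or []
    if not handoffs:
        return False
    return handoffs[0].get("agent") == agent.get("name")


def find_loop_risks(agents: List[Dict]) -> List[str]:
    """Names of agents whose first handoff is self-referential."""
    return [a["name"] for a in agents if first_handoff_is_self(a)]


if __name__ == "__main__":
    sample = [
        {"name": "task-planner", "handoffs": [
            {"label": "Compact", "agent": "task-planner"},
            {"label": "Implement", "agent": "task-implementor"},
        ]},
        {"name": "task-reviewer", "handoffs": [
            {"label": "Research", "agent": "task-researcher"},
        ]},
    ]
    print(find_loop_risks(sample))
```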

  • Updated rai-identity.instructions.md to remove the "Compact handoff" exit point reference from disclaimer display logic
  • Updated docs/rpi/context-engineering.md to recommend /checkpoint (Memory Agent) for cross-phase persistence and clarify that /compact remains available as a typed command

PR #1492 (feat/context-working) adds Context Discipline to 5 RPI parent agents, enforcing disk-first lean responses. The /compact handoff is now architecturally redundant because:

  1. Disk-first .copilot-tracking/ files — all state already lives on disk
  2. Memory Agent — provides structured session persistence with handoff to a different agent (non-looping)
  3. Context Discipline from PR #1492 (feat(agents): optimize RPI agent context management with discipline rules): caps subagent responses to executive summaries, reducing context bloat at the source

Test Coverage

  • Added 41 Pester tests (Test-ModelReferences.Tests.ps1) covering validation logic, frontmatter parsing, and error handling
  • Added 29 Pester tests (Test-UpdateModelCatalog.Tests.ps1) covering catalog merge, comparison, and refresh logic

Related Issue(s)

Fixes #1420
Closes #1540

Type of Change

Select all that apply:

Code & Documentation:

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality)
  • Breaking change (fix or feature causing existing functionality to change)
  • Documentation update

Infrastructure & Configuration:

  • GitHub Actions workflow
  • Linting configuration (markdown, PowerShell, etc.)
  • Security configuration
  • DevContainer configuration
  • Dependency update

AI Artifacts:

  • Reviewed contribution with prompt-builder agent and addressed all feedback
  • Copilot instructions (.github/instructions/*.instructions.md)
  • Copilot prompt (.github/prompts/*.prompt.md)
  • Copilot agent (.github/agents/*.agent.md)
  • Copilot skill (.github/skills/*/SKILL.md)

Note for AI Artifact Contributors:

  • Agents: Research, indexing/referencing other project (using standard VS Code GitHub Copilot/MCP tools), planning, and general implementation agents likely already exist. Review .github/agents/ before creating new ones.
  • Skills: Must include both bash and PowerShell scripts. See Skills.
  • Model Versions: Only contributions targeting the latest Anthropic and OpenAI models will be accepted. Older model versions (e.g., GPT-3.5, Claude 3) will be rejected.
  • See Agents Not Accepted and Model Version Requirements.

Other:

  • Script/automation (.ps1, .sh, .py)
  • Other (please describe):

Sample Prompts (for AI Artifact Contributions)

User Request:

Invoke any RPI agent (e.g., task researcher) with a research task. The agent dispatches its Researcher Subagent at fast-tier cost automatically. Run npm run lint:models to validate all model references.

Execution Flow:

  1. Parent agent evaluates task type (read-only vs code-generation)
  2. For research/validation tasks, parent specifies model: "Claude Haiku 4.5 (copilot)" on runSubagent call
  3. VS Code resolves model against cost tier constraint (cannot exceed parent model tier)
  4. Subagent executes at fast-tier cost; results written to .copilot-tracking/ disk files
  5. If tier constraint blocks downgrade, platform falls back to session model gracefully
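Steps 3 and 5 of the flow above can be sketched as a small resolution function. The tier order is taken from the catalog description (free, fast, standard, premium, ultra); everything else is an assumption for illustration, including the `resolve_subagent_model` name, the placeholder model names other than Claude Haiku 4.5, and the choice to fall back to the session model when a tier is unknown. The platform's real resolution logic is not shown in this PR.

```python
from typing import Dict

TIER_ORDER = ["free", "fast", "standard", "premium", "ultra"]
TIER_RANK = {tier: rank for rank, tier in enumerate(TIER_ORDER)}


def resolve_subagent_model(requested: str, session_model: str, tier_of: Dict[str, str]) -> str:
    """Pick the model a subagent runs on under the cost tier constraint.

    The requested model is used only if its tier does not exceed the
    session (parent) model's tier; otherwise the session model is used.
    """
    requested_tier = tier_of.get(requested)
    session_tier = tier_of.get(session_model)
    if requested_tier is None or session_tier is None:
        # Assumption: unknown models fall back to the session model.
        return session_model
    if TIER_RANK[requested_tier] <= TIER_RANK[session_tier]:
        return requested
    return session_model


if __name__ == "__main__":
    # "example-premium" and "example-free" are placeholder names.
    tiers = {
        "Claude Haiku 4.5": "fast",
        "example-premium": "premium",
        "example-free": "free",
    }
    print(resolve_subagent_model("Claude Haiku 4.5", "example-premium", tiers))
    print(resolve_subagent_model("Claude Haiku 4.5", "example-free", tiers))
```

In the second call the requested fast-tier model exceeds the free-tier session model, so the sketch returns the session model, mirroring the graceful fallback described in step 5.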

Output Artifacts:

  • logs/model-validation-results.json — structured validation results with per-file status
  • scripts/linting/model-catalog.json — refreshed catalog after lint:models:refresh

Success Indicators:

  • npm run lint:models exits 0 with no invalid model references
  • Subagent invocations show model name in VS Code chat header when explicitly set
  • No Autopilot infinite loops when agents complete their work

Testing

  • npm run lint:models — model reference validation (validates all 14 model-annotated files)
  • Security analysis: no sensitive data exposure, no privilege escalation, workflow uses read-only permissions
  • Diff-based assessment: all changes are configuration-level (frontmatter, handoff entries, guidance sections); no business logic modified
  • Manual testing performed

Note

Add manual testing descriptions when applicable.

Checklist

Required Checks

  • Documentation is updated (if applicable)
  • Files follow existing naming conventions
  • Changes are backwards compatible (if applicable)
  • Tests added for new functionality (if applicable)

AI Artifact Contributions

  • Used /prompt-analyze to review contribution
  • Addressed all feedback from prompt-builder review
  • Verified contribution follows common standards and type-specific requirements

Required Automated Checks

The following validation commands must pass before merging:

  • Markdown linting: npm run lint:md
  • Spell checking: npm run spell-check
  • Frontmatter validation: npm run lint:frontmatter
  • Skill structure validation: npm run validate:skills
  • Link validation: npm run lint:md-links
  • PowerShell analysis: npm run lint:ps
  • Plugin freshness: npm run plugin:generate
  • Docusaurus tests: npm run docs:test

Security Considerations

  • This PR does not contain any sensitive or NDA information
  • Any new dependencies have been reviewed for security issues (N/A — no new runtime dependencies added)
  • Security-related scripts follow the principle of least privilege

Warning

This PR includes experimental GHCP artifacts that may have breaking changes.

  • .github/agents/hve-core/task-challenger.agent.md
  • .github/agents/experimental/experiment-designer.agent.md
  • .github/agents/experimental/pptx.agent.md
  • .github/agents/security/security-planner.agent.md
  • .github/agents/security/sssc-planner.agent.md
  • .github/agents/security/security-reviewer.agent.md
  • .github/agents/security/subagents/codebase-profiler.agent.md
  • .github/agents/security/subagents/report-generator.agent.md
  • .github/agents/rai-planning/rai-planner.agent.md

GHCP Artifact Maturity

File Type Maturity Notes
.github/agents/hve-core/rpi-agent.agent.md Agent ✅ stable All builds
.github/agents/hve-core/task-researcher.agent.md Agent ✅ stable All builds
.github/agents/hve-core/task-planner.agent.md Agent ✅ stable All builds
.github/agents/hve-core/task-implementor.agent.md Agent ✅ stable All builds
.github/agents/hve-core/task-reviewer.agent.md Agent ✅ stable All builds
.github/agents/hve-core/prompt-builder.agent.md Agent ✅ stable All builds
.github/agents/hve-core/task-challenger.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/experimental/experiment-designer.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/experimental/pptx.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/security/security-planner.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/security/sssc-planner.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/security/security-reviewer.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/security/subagents/codebase-profiler.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/security/subagents/report-generator.agent.md Agent ⚠️ experimental Pre-release only
.github/agents/rai-planning/rai-planner.agent.md Agent ⚠️ experimental Pre-release only
.github/prompts/hve-core/checkpoint.prompt.md Prompt ✅ stable All builds
.github/prompts/hve-core/git-commit-message.prompt.md Prompt ✅ stable All builds
.github/prompts/hve-core/git-commit.prompt.md Prompt ✅ stable All builds
.github/prompts/hve-core/git-setup.prompt.md Prompt ✅ stable All builds
.github/prompts/github/github-add-issue.prompt.md Prompt ✅ stable All builds
.github/prompts/github/github-discover-issues.prompt.md Prompt ✅ stable All builds
.github/prompts/github/github-triage-issues.prompt.md Prompt ✅ stable All builds
.github/instructions/rai-planning/rai-identity.instructions.md Instructions ⚠️ experimental Pre-release only

GHCP Maturity Acknowledgment

  • I acknowledge this PR includes non-stable GHCP artifacts
  • Non-stable artifacts are intentional for this change

Additional Notes

  • The /compact removal is a breaking change for users who relied on the handoff button. The /compact typed command remains available; only the agent-surfaced handoff is removed.
  • Model catalog currently tracks 25 models; the automated refresh runs weekly to catch additions, removals, and multiplier changes from GitHub's upstream YAML sources.
  • The VS Code cost tier constraint means subagents can only use models at the same or lower tier than the parent. All guidance sections document this limitation and the graceful fallback behavior.

Follow-up Tasks

  • Monitor weekly CI workflow for first catalog drift detection to confirm automation works end-to-end
  • Consider extending model selection to remaining prompts (pull-request, doc-ops) once cost savings are validated

@katriendg katriendg requested a review from a team as a code owner May 6, 2026 12:37
@github-actions
Contributor

github-actions Bot commented May 6, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

Package | Version | Score
actions/actions/checkout | de0fac2e4500dabe0009e67214ff5f5447ce83dd | 🟢 5.7

Details
Check | Score | Reason
Code-Review | 🟢 10 | all changesets reviewed
Maintained | ⚠️ 0 | 0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Fuzzing | ⚠️ 0 | project is not fuzzed
Packaging | ⚠️ -1 | packaging workflow not detected
License | 🟢 10 | license file detected
Pinned-Dependencies | 🟢 3 | dependency not pinned by hash detected -- score normalized to 3
Signed-Releases | ⚠️ -1 | no releases found
Security-Policy | 🟢 9 | security policy file detected
Branch-Protection | 🟢 5 | branch protection is not maximal on development and all release branches
SAST | 🟢 8 | SAST tool detected but not run on all commits

Package | Version | Score
actions/actions/upload-artifact | 043fb46d1a93c77aae656e7c1c64a875d1fc6a0a | 🟢 5.6

Details
Check | Score | Reason
Code-Review | 🟢 8 | Found 8/9 approved changesets -- score normalized to 8
Maintained | 🟢 6 | 6 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 6
Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected
Binary-Artifacts | 🟢 10 | no binaries found in the repo
Packaging | ⚠️ -1 | packaging workflow not detected
CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected
Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies | ⚠️ 1 | dependency not pinned by hash detected -- score normalized to 1
Fuzzing | ⚠️ 0 | project is not fuzzed
License | 🟢 10 | license file detected
Signed-Releases | ⚠️ -1 | no releases found
Security-Policy | 🟢 9 | security policy file detected
SAST | 🟢 10 | SAST tool is run on all commits
Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches

Scanned Files

  • .github/workflows/model-validation.yml

@codecov-commenter

codecov-commenter commented May 6, 2026

Codecov Report

❌ Patch coverage is 85.82375% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.62%. Comparing base (97c40e8) to head (bd021ee).

Files with missing lines Patch % Lines
scripts/linting/Update-ModelCatalog.ps1 83.68% 23 Missing ⚠️
scripts/linting/Test-ModelReferences.ps1 88.33% 14 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1541      +/-   ##
==========================================
+ Coverage   85.46%   85.62%   +0.16%     
==========================================
  Files          80       77       -3     
  Lines       11541    10779     -762     
==========================================
- Hits         9863     9230     -633     
+ Misses       1678     1549     -129     
Flag Coverage Δ
pester 83.59% <85.82%> (+0.06%) ⬆️


Files with missing lines Coverage Δ
scripts/linting/Test-ModelReferences.ps1 88.33% <88.33%> (ø)
scripts/linting/Update-ModelCatalog.ps1 83.68% <83.68%> (ø)

... and 7 files with indirect coverage changes


Contributor

@github-actions github-actions Bot left a comment


Advisory review: this PR is from a maintainer, so findings are informational only.


Review Summary

This PR is well-structured and addresses two distinct, clearly-scoped concerns — per-agent model selection for cost optimisation and removal of the /compact handoff loop. The implementation follows repository conventions throughout.


Issue Alignment ✅

  • Fixes #1420 (Autopilot /compact infinite loop): the removal of the Compact handoff from 12 agents directly addresses the root cause described in the issue.
  • Closes #1540: per-agent model selection infrastructure is complete and consistent with the stated intent.
  • No scope creep observed; changes are tightly scoped to model frontmatter additions, handoff removals, and the supporting catalog/validation infrastructure.

PR Template Compliance ⚠️

One minor gap: the GHCP Maturity Acknowledgment section at the bottom of the PR description has both checkboxes unchecked. The maturity warning block and the artifact table are fully filled in, so this appears to be an oversight rather than an intentional omission. Please check these two boxes to complete the template.


Coding Standards ✅

  • All new PowerShell scripts (Test-ModelReferences.ps1, Update-ModelCatalog.ps1) follow the repository's PowerShell conventions: copyright header, #Requires -Version 7.0, comment-based help, [CmdletBinding()], $ErrorActionPreference = 'Stop', main execution guard, region blocks, and [OutputType()] on exported functions.
  • Test files (Test-ModelReferences.Tests.ps1, Test-UpdateModelCatalog.Tests.ps1) follow Pester 5 conventions: #Requires -Modules Pester first, copyright header after, BeforeAll/AfterAll lifecycle, -Tag 'Unit' on all Describe blocks, and $TestDrive-equivalent temp directory management.
  • model-validation.yml uses the same actions/checkout and actions/upload-artifact SHA pins as the rest of the repository's workflows, persist-credentials: false, permissions: contents: read at both workflow and job level. All security requirements are met.
  • model frontmatter field format (single string for prompts, priority array for agents) is consistent and matches the documentation added in ai-artifacts-common.md.

Code Quality ✅

Two minor advisory findings are raised as inline comments:

  1. model-catalog.json missing initial $schema reference (line 1) — the Update-ModelCatalog.ps1 writes $schema on refresh but the seeded file committed here omits it; schema tooling won't apply until the first weekly run.
  2. Update-ModelCatalog.ps1 mixed PSCustomObject/hashtable in $finalModels (lines 251–273) — functionally correct, but a normalisation step would make future maintenance safer.

No security vulnerabilities, no breaking changes beyond those declared, no missing error handling at system boundaries.


Documentation ✅

  • copilot-instructions.md, docs/contributing/ai-artifacts-common.md, docs/contributing/custom-agents.md, docs/contributing/prompts.md, and docs/rpi/context-engineering.md are all updated to reflect the new model selection guidance and /compact deprecation from handoffs.
  • The README-level npm script list is updated in both package.json and the instructions file.

Outstanding Action Item

  • Check the two GHCP Maturity Acknowledgment checkboxes in the PR description.

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

  • #1420 issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by PR Review for issue #1541 · ● 2.2M

Comment thread scripts/linting/model-catalog.json
Comment thread scripts/linting/Update-ModelCatalog.ps1
@bindsi
Member

bindsi commented May 6, 2026

Review

Two bundled changes: per-agent/prompt model frontmatter backed by a validated catalog with weekly refresh, and removal of the self-referential /compact handoff from 12 agents to fix Autopilot loops (#1420). All 59 CI checks green.

Strengths

  • Well-scoped and well-justified. PR body cleanly separates the two threads. The architectural argument for /compact removal (disk-first .copilot-tracking/ + Memory Agent + Context Discipline from #1492, together making it redundant) is sound.
  • Catalog-driven validation. JSON schema, refresh script that fetches authoritative YAML from github/docs, retiring grace period instead of hard delete — right pattern.
  • Strong test coverage. 70 Pester tests covering frontmatter parsing, validation logic, tier classification, and catalog comparison. Edge cases (Not applicable multiplier, missing multiplier entry, mixed invalid/retiring) are exercised.
  • Documentation is thorough and coherent across ai-artifacts-common.md, custom-agents.md, prompts.md, and context-engineering.md.
  • Cost-first guidance is consistent across parent agents and follows a clear pattern: read-only/validation → fast tier; code generation/architecture → inherit session model.

Concerns

1. Silent catalog drift in CI (medium). model-validation.yml runs Update-ModelCatalog.ps1 before validation, but the refreshed catalog isn't committed. PRs validate against a fresher catalog than what's in the repo — a reference can pass CI because the model exists upstream while the committed model-catalog.json is stale. The "Report catalog drift" step only flags retiring entries; it doesn't surface added/removed/multiplier-changed models that appear only in the just-fetched copy.

Suggestion: either (a) validate against the committed catalog only and run refresh on a separate scheduled job that opens an auto-PR, or (b) fail the CI step when the in-memory refreshed catalog differs materially from the committed one.
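Option (b) amounts to diffing the committed catalog against the freshly fetched one and failing when anything material changed. A minimal Python sketch of that comparison follows. It assumes each catalog has been reduced to a name-to-multiplier mapping, and the names `diff_catalogs` and `has_drift` as well as the sample multiplier values are hypothetical rather than taken from the PR's scripts.

```python
from typing import Dict, List


def diff_catalogs(committed: Dict[str, float], refreshed: Dict[str, float]) -> Dict[str, List[str]]:
    """Report models added, removed, or re-priced between two catalogs."""
    committed_names = set(committed)
    refreshed_names = set(refreshed)
    return {
        "added": sorted(refreshed_names - committed_names),
        "removed": sorted(committed_names - refreshed_names),
        "multiplier_changed": sorted(
            name
            for name in committed_names & refreshed_names
            if committed[name] != refreshed[name]
        ),
    }


def has_drift(diff: Dict[str, List[str]]) -> bool:
    """True when any category of difference is non-empty."""
    return any(diff.values())


if __name__ == "__main__":
    # Multiplier values below are made up for the example.
    committed = {"model-a": 1.0, "model-b": 0.5, "model-c": 3.0}
    refreshed = {"model-a": 1.0, "model-b": 0.25, "model-d": 1.0}
    result = diff_catalogs(committed, refreshed)
    print(result)
    # A CI step could exit non-zero here to surface the drift.
    raise SystemExit(1 if has_drift(result) else 0)
```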

2. Provider-allowlist not mechanically enforced (medium). The catalog correctly contains non-Anthropic/OpenAI entries (Goldeneye, Raptor mini, Gemini 3 Flash, Grok Code Fast 1) pulled from upstream YAML. lint:models accepts them, but they'd violate the "Anthropic and OpenAI only" policy in ai-artifacts-common.md. Consider an additional provider-allowlist check so policy is enforced, not just documented.
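A provider-allowlist check of the kind suggested here could look like the sketch below. The `provider` and `providerAllowlist` field names match the ones mentioned later in this conversation, but the surrounding catalog shape, the provider string values, and the `check_provider_allowlist` name are assumptions for illustration.

```python
from typing import Dict, List


def check_provider_allowlist(catalog: Dict, references: List[str]) -> List[str]:
    """Return referenced models whose provider is not on the allowlist.

    Models missing from the catalog are ignored here; reporting them is
    the job of the separate reference validation.
    """
    allowed = set(catalog["providerAllowlist"])
    provider_by_name = {m["name"]: m["provider"] for m in catalog["models"]}
    return [
        name
        for name in references
        if name in provider_by_name and provider_by_name[name] not in allowed
    ]


if __name__ == "__main__":
    # Provider identifiers are illustrative, not copied from the real catalog.
    catalog = {
        "providerAllowlist": ["anthropic", "openai"],
        "models": [
            {"name": "Claude Haiku 4.5", "provider": "anthropic"},
            {"name": "GPT-5.4 mini", "provider": "openai"},
            {"name": "Grok Code Fast 1", "provider": "xai"},
        ],
    }
    print(check_provider_allowlist(catalog, ["Claude Haiku 4.5", "Grok Code Fast 1"]))
```

Keeping the allowlist in the catalog file rather than in the script means a policy change is a data edit that the schema can validate.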

3. Upstream pin (low). Update-ModelCatalog.ps1 fetches from github/docs@main. Combined with #1, PR validation results are non-deterministic across runs. Worth at least a script comment.

4. Workflow runner pin (low — verify). model-validation.yml uses runs-on: ubuntu-latest. Repo conventions typically require a pinned runner version (e.g., ubuntu-24.04).

5. Untested branch in Update-ModelCatalog.ps1 (low). The "mark removed as retiring" overlay logic in the main block (~lines 245–265) — 60-day grace transition and update-multiplier-on-existing-entry path — has no tests. Compare-Catalogs itself is well covered.
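To make the branch in question concrete, the overlay can be sketched in Python with the clock injected so it is easy to test. This is not the PowerShell from Update-ModelCatalog.ps1. The `retireAfter` field, the `overlay_removed` name, and the decision to drop an entry once its grace period has passed are assumptions; the PR only states that removed models are marked retiring with a 60-day grace period.

```python
from datetime import date, timedelta
from typing import Dict, List, Set

GRACE_DAYS = 60


def overlay_removed(existing: List[Dict], upstream_names: Set[str], today: date) -> List[Dict]:
    """Mark models missing upstream as retiring instead of deleting them."""
    result: List[Dict] = []
    for model in existing:
        entry = dict(model)  # copy so the input list is not mutated
        if entry["name"] not in upstream_names:
            deadline = entry.get("retireAfter")
            if entry.get("status") != "retiring":
                # First time the model is missing: start the grace period.
                entry["status"] = "retiring"
                entry["retireAfter"] = (today + timedelta(days=GRACE_DAYS)).isoformat()
            elif deadline and date.fromisoformat(deadline) < today:
                # Assumption: entries are dropped once the grace period ends.
                continue
        result.append(entry)
    return result


if __name__ == "__main__":
    catalog = [
        {"name": "model-a", "status": "ga"},
        {"name": "model-b", "status": "ga"},
    ]
    for row in overlay_removed(catalog, {"model-a"}, date(2026, 5, 6)):
        print(row)
```

Passing `today` as a parameter is the design choice that matters for test coverage: both the grace transition and the expiry path can be exercised without touching the system clock.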

6. Get-FrontmatterFromFile regex (low). '(?s)^---\r?\n(.+?)\r?\n---' doesn't anchor the closing --- to a line boundary. Lazy .+? keeps it correct for current files, but worth aligning with the existing frontmatter validator's pattern.
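The difference an anchored closing fence makes can be shown with a small Python example. The first pattern is the one quoted above; the second is an illustrative anchored variant written for this sketch and is not necessarily the pattern the repository adopted. Python's `re` is used here instead of .NET regex, but the constructs involved behave the same way in both.

```python
import re

# Pattern quoted in the review: the closing --- may match mid-line.
UNANCHORED = re.compile(r"(?s)^---\r?\n(.+?)\r?\n---")

# Illustrative variant: the closing --- must be followed by a line end.
ANCHORED = re.compile(r"(?s)\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)")

# Contrived input with a frontmatter line that happens to start with "---".
DOC = (
    "---\n"
    "description: demo\n"
    "---trailing: text\n"
    "model: Claude Haiku 4.5 (copilot)\n"
    "---\n"
    "Body\n"
)


def frontmatter(pattern: "re.Pattern[str]", text: str) -> str:
    """Return the captured frontmatter block, or an empty string."""
    match = pattern.search(text)
    return match.group(1) if match else ""


if __name__ == "__main__":
    print(repr(frontmatter(UNANCHORED, DOC)))  # stops early, before the model line
    print(repr(frontmatter(ANCHORED, DOC)))    # captures the full block
```

With the unanchored pattern the `model` line is silently excluded from the parsed block, which is the kind of miss a model-reference linter would not report.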

7. Breaking change disclosure. /compact handoff removal is correctly called out as breaking in the PR body. Verify release-please/changelog surfaces this (e.g., feat!: or BREAKING CHANGE: footer) so downstream consumers see it.

8. Minor. Test-ModelReferences.ps1 uses (Get-Location).Path for relative-path computation — awkward when invoked outside repo root. Consider basing on $PSScriptRoot.

Verdict

Approve with non-blocking suggestions. Core mechanics and tests are solid. #1 (silent catalog drift) and #2 (provider-allowlist) are the most actionable — both undermine the validation guarantee the PR is selling. Worth addressing before merge or in immediate follow-up.

Member

@bindsi bindsi left a comment


Really like it, that's such an improvement and optimisation of cost impact for users. Thank you so much.

Comment thread .github/workflows/model-validation.yml
@katriendg katriendg changed the title feat(agents): per-agent model selection for cost optimization and /compact loop fix feat(agents)!: per-agent model selection for cost optimization and /compact loop fix May 6, 2026
katriendg added 11 commits May 6, 2026 15:10
- add model frontmatter to 7 research/validation subagents with fast-tier fallbacks
- add model frontmatter to 7 Tier 1 prompts using Claude Haiku 4.5
- create model-catalog.json with 19 supported models and JSON schema
- create Test-ModelReferences.ps1 validation script
- add lint:models npm script integrated into lint:all chain

⚡ - Generated by Copilot
- add model-validation.yml with weekly schedule and PR-triggered runs
- add catalog freshness check warning when catalog exceeds 90 days
- refactor Test-ModelReferences.ps1 to extract Invoke-ModelReferenceValidation function
- add Test-ModelReferences.Tests.ps1 with 41 unit tests covering all code paths

🧪 - Generated by Copilot
- add Update-ModelCatalog.ps1 with YAML source fetching from github/docs
- add Pester tests for Merge-ModelData, Get-RemoteYaml, Compare-Catalogs
- update workflow to run catalog refresh before validation
- update agent model references from retiring Gemini 3 Flash (Preview) to GA name
- refresh model-catalog.json with correct tier assignments

🔄 - Generated by Copilot
…spatch

- add Model Selection for Subagents section to RPI parent agents
- add model guidance to prompt-builder and security-reviewer
- fast model for research/validation, session model for code generation
- cost-first principle adapted for VS Code tier constraints

💰 - Generated by Copilot
Remove self-referential Compact handoffs that caused Autopilot infinite
loops. The disk-first .copilot-tracking architecture and Memory Agent
make /compact handoffs redundant.

- Remove Compact from 5 RPI agents (rpi-agent, task-researcher,
  task-planner, task-implementor, task-reviewer)
- Remove Compact from 7 additional agents (task-challenger,
  prompt-builder, security-planner, sssc-planner, rai-planner,
  experiment-designer, pptx)
- Update context-engineering.md to clarify /compact as typed command
- Remove Compact exit point from rai-identity.instructions.md
- Regenerate plugins

Closes #1420
…prompts documentation

- update description of model property as a preference hint
- explain fallback behavior when specified model is unavailable
- emphasize cost tier constraints for prompt models

🔍 - Generated by Copilot
…-ModelCatalogUpdate functions

- implement tests for validation output handling
- cover scenarios for valid, invalid, and retiring references
- ensure directory creation for output files in tests
- validate catalog update logic with various conditions

🔍 - Generated by Copilot
…alidation

- implement a check for changes in model-catalog.json
- add provider allowlist functionality in model validation
- update model-catalog.json with provider information
- enhance schema to include providerAllowlist

🔍 - Generated by Copilot
@katriendg katriendg force-pushed the feat/model-selection branch from a1449e5 to 9b129ac Compare May 6, 2026 15:12
@github-actions github-actions Bot mentioned this pull request May 6, 2026
@katriendg
Contributor Author

Review

Two bundled changes: per-agent/prompt model frontmatter backed by a validated catalog with weekly refresh, and removal of the self-referential /compact handoff from 12 agents to fix Autopilot loops (#1420). All 59 CI checks green.
... omitting the rest

Thanks @bindsi for your review. I believe we have addressed everything:

PR Review Comments — Resolution Summary

# Status Action
1 Implemented Added "Detect catalog drift" CI step in model-validation.yml
2 Implemented Added provider field, providerAllowlist, and enforcement
3 Implemented Added .NOTES comment documenting upstream non-determinism
4 No action ubuntu-latest is the required pattern per repo conventions
5 No action Tests already exist for the retiring/multiplier-update paths
6 Implemented Aligned the frontmatter regex patterns to `(.*?)\r?\n---(\r?\n
7 No action Commit-message - ensure feat!: on merge (updated)
8 Implemented Replaced (Get-Location).Path with $RepoRoot via git rev-parse

Contributor

@github-actions github-actions Bot left a comment


PR Review Summary

This PR is well-structured, clearly motivated, and addresses two distinct problems (cost optimization via per-agent model selection, and the Autopilot infinite loop from the /compact handoff). The implementation quality is solid overall.


✅ Issue Alignment

The PR links Fixes #1420 and Closes #1540. The /compact handoff removal directly addresses the described infinite-loop root cause, and the model selection infrastructure is a coherent new feature. No scope creep detected; the additional documentation and copilot-instructions.md updates are appropriate companions to the code changes.


⚠️ PR Template Compliance

GHCP Maturity Acknowledgment checkboxes are unchecked. The PR explicitly lists 9 experimental agents in a > [!WARNING] block and includes a dedicated GHCP Maturity table, but the two acknowledgment checkboxes at the bottom of that section remain unchecked:

- [ ] I acknowledge this PR includes non-stable GHCP artifacts
- [ ] Non-stable artifacts are intentional for this change

These require author sign-off before merge.


🔍 Coding Standards

  • PowerShell scripts follow all required conventions: copyright headers, #Requires -Version 7.0, [CmdletBinding()], $ErrorActionPreference = 'Stop', #region blocks, and the invocation guard pattern. ✅
  • Pester test files follow the required header ordering (#Requires -Modules Pester before copyright). ✅
  • Agent files use correct user-invocable: false for subagents. ✅
  • Workflow file uses persist-credentials: false and contents: read permissions. ✅

One concern flagged inline: the model: field in agent frontmatter is used as an array across all 7 subagents, but the prompt-builder.instructions.md spec documents it as a scalar string. This pattern should be validated against VS Code's runtime behaviour and documented before wider adoption — see the inline comment on researcher-subagent.agent.md.


🔒 Code Quality

Workflow action version comments: actions/checkout # v6.0.2 and actions/upload-artifact # v7.0.1 carry unexpectedly high version numbers. The SHAs themselves satisfy the SHA-pinning requirement, but incorrect version comments reduce the audit value. Please verify these tags exist on the respective action repos (inline comments added).

Invoke-WebRequest timeout: Get-RemoteYaml in Update-ModelCatalog.ps1 has no -TimeoutSec, so a slow upstream could block the scheduled job indefinitely. Adding -TimeoutSec 30 is a low-risk hardening step (inline comment added).
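The same hardening idea, expressed in Python for illustration: give the fetch an explicit timeout so a stalled upstream raises an error instead of hanging the job. `fetch_text` and its injectable `opener` parameter are hypothetical; the actual fix in the PowerShell script would be the -TimeoutSec parameter mentioned above. Note that the `timeout` argument of `urllib.request.urlopen` bounds individual blocking socket operations rather than the total transfer time.

```python
import urllib.request
from typing import Callable


def fetch_text(
    url: str,
    timeout_sec: float = 30.0,
    opener: Callable = urllib.request.urlopen,
) -> str:
    """Fetch a URL as UTF-8 text with an explicit timeout.

    The opener is injectable so the function can be tested offline.
    """
    with opener(url, timeout=timeout_sec) as response:
        return response.read().decode("utf-8")


class FakeResponse:
    """Stand-in for an HTTP response; records the timeout it was given."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self.timeout = None

    def __call__(self, url: str, timeout: float) -> "FakeResponse":
        self.timeout = timeout
        return self

    def __enter__(self) -> "FakeResponse":
        return self

    def __exit__(self, *exc_info) -> bool:
        return False

    def read(self) -> bytes:
        return self.payload


if __name__ == "__main__":
    fake = FakeResponse(b"models: []\n")
    print(fetch_text("https://example.invalid/models.yml", opener=fake), end="")
    print("timeout used:", fake.timeout)
```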

cancel-in-progress: true — Safe for PR-triggered runs, but worth noting the design trade-off for the weekly scheduled scan (inline comment added). Not a blocking issue.


📋 Action Items

  1. ✅ Check both GHCP Maturity Acknowledgment boxes before merge.
  2. 🔍 Verify actions/checkout and actions/upload-artifact version comments match the pinned SHAs.
  3. 💡 Confirm VS Code supports the model: array format, and update prompt-builder.instructions.md to document it.
  4. 💡 Add -TimeoutSec 30 to Invoke-WebRequest in Get-RemoteYaml.

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

  • #1420 issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by PR Review for issue #1541 · ● 2.3M

Comment thread .github/workflows/model-validation.yml
Comment thread .github/workflows/model-validation.yml
Comment thread scripts/linting/Update-ModelCatalog.ps1
Comment thread .github/workflows/model-validation.yml
Comment thread .github/agents/hve-core/subagents/researcher-subagent.agent.md
@WilliamBerryiii WilliamBerryiii merged commit e158d88 into main May 8, 2026
57 checks passed

Labels

None yet

Projects

None yet

4 participants