feat(agents)!: per-agent model selection for cost optimization and /compact loop fix #1541
Dependency Review
✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1541 +/- ##
==========================================
+ Coverage 85.46% 85.62% +0.16%
==========================================
Files 80 77 -3
Lines 11541 10779 -762
==========================================
- Hits 9863 9230 -633
+ Misses 1678 1549 -129
Advisory review, this PR is from a maintainer. Findings are informational only.
Review Summary
This PR is well-structured and addresses two distinct, clearly-scoped concerns — per-agent model selection for cost optimisation and removal of the /compact handoff loop. The implementation follows repository conventions throughout.
Issue Alignment ✅
- Fixes #1420 (Autopilot /compact infinite loop): the removal of the Compact handoff from 12 agents directly addresses the root cause described in the issue.
- Closes #1540: per-agent model selection infrastructure is complete and consistent with the stated intent.
- No scope creep observed; changes are tightly scoped to model frontmatter additions, handoff removals, and the supporting catalog/validation infrastructure.
PR Template Compliance ⚠️
One minor gap: the GHCP Maturity Acknowledgment section at the bottom of the PR description has both checkboxes unchecked. The maturity warning block and the artifact table are fully filled in, so this appears to be an oversight rather than an intentional omission. Please check these two boxes to complete the template.
Coding Standards ✅
- All new PowerShell scripts (Test-ModelReferences.ps1, Update-ModelCatalog.ps1) follow the repository's PowerShell conventions: copyright header, #Requires -Version 7.0, comment-based help, [CmdletBinding()], $ErrorActionPreference = 'Stop', main execution guard, region blocks, and [OutputType()] on exported functions.
- Test files (Test-ModelReferences.Tests.ps1, Test-UpdateModelCatalog.Tests.ps1) follow Pester 5 conventions: #Requires -Modules Pester first, copyright header after, BeforeAll/AfterAll lifecycle, -Tag 'Unit' on all Describe blocks, and $TestDrive-equivalent temp directory management.
- model-validation.yml uses the same actions/checkout and actions/upload-artifact SHA pins as the rest of the repository's workflows, persist-credentials: false, permissions: contents: read at both workflow and job level. All security requirements are met.
- model frontmatter field format (single string for prompts, priority array for agents) is consistent and matches the documentation added in ai-artifacts-common.md.
Code Quality ✅
Two minor advisory findings are raised as inline comments:
- model-catalog.json missing initial $schema reference (line 1): Update-ModelCatalog.ps1 writes $schema on refresh but the seeded file committed here omits it; schema tooling won't apply until the first weekly run.
- Update-ModelCatalog.ps1 mixed PSCustomObject/hashtable in $finalModels (lines 251–273): functionally correct, but a normalisation step would make future maintenance safer.
No security vulnerabilities, no breaking changes beyond those declared, no missing error handling at system boundaries.
Documentation ✅
- copilot-instructions.md, docs/contributing/ai-artifacts-common.md, docs/contributing/custom-agents.md, docs/contributing/prompts.md, and docs/rpi/context-engineering.md are all updated to reflect the new model selection guidance and /compact deprecation from handoffs.
- The README-level npm script list is updated in both package.json and the instructions file.
Outstanding Action Item
- Check the two GHCP Maturity Acknowledgment checkboxes in the PR description.
Note
🔒 Integrity filter blocked 1 item
The following item was blocked because it doesn't meet the GitHub integrity level.
- #1420 issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
tools:
  github:
    min-integrity: approved # merged | approved | unapproved | none
Review
Two bundled changes: per-agent/prompt
Strengths
Concerns
1. Silent catalog drift in CI (medium). Suggestion: either (a) validate against the committed catalog only and run refresh on a separate scheduled job that opens an auto-PR, or (b) fail the CI step when the in-memory refreshed catalog differs materially from the committed one.
2. Provider-allowlist not mechanically enforced (medium). The catalog correctly contains non-Anthropic/OpenAI entries (
3. Upstream pin (low).
4. Workflow runner pin (low, verify).
5. Untested branch in
6.
7. Breaking change disclosure.
8. Minor.
Verdict
Approve with non-blocking suggestions. Core mechanics and tests are solid. #1 (silent catalog drift) and #2 (provider-allowlist) are the most actionable: both undermine the validation guarantee the PR is selling. Worth addressing before merge or in immediate follow-up.
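To make suggestion (b) under concern 1 concrete, here is a minimal Python sketch of a drift check between the committed and refreshed catalogs. The PR's own tooling is PowerShell, so this only illustrates the idea. The catalog shape (a top-level `models` list whose entries carry a `name` key) and the function names `catalog_drift` and `has_drift` are assumptions, not taken from the PR.

```python
def catalog_drift(committed: dict, refreshed: dict) -> dict:
    """Return model names added, removed, or changed between two catalogs.

    Assumes (hypothetically) that each catalog looks like
    {"models": [{"name": "...", ...}, ...]}.
    """
    old = {m["name"]: m for m in committed.get("models", [])}
    new = {m["name"]: m for m in refreshed.get("models", [])}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # A model counts as changed when any of its fields differ.
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


def has_drift(drift: dict) -> bool:
    """True when any of the added/removed/changed lists is non-empty."""
    return any(drift.values())
```

A CI step could exit non-zero when `has_drift(...)` is true, so the refreshed catalog is never silently used for validation in place of the committed one.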
bindsi left a comment
Really like it, that's such an improvement and optimisation of cost impact for users. Thank you so much
- add model frontmatter to 7 research/validation subagents with fast-tier fallbacks
- add model frontmatter to 7 Tier 1 prompts using Claude Haiku 4.5
- create model-catalog.json with 19 supported models and JSON schema
- create Test-ModelReferences.ps1 validation script
- add lint:models npm script integrated into lint:all chain

⚡ - Generated by Copilot

- add model-validation.yml with weekly schedule and PR-triggered runs
- add catalog freshness check warning when catalog exceeds 90 days
- refactor Test-ModelReferences.ps1 to extract Invoke-ModelReferenceValidation function
- add Test-ModelReferences.Tests.ps1 with 41 unit tests covering all code paths

🧪 - Generated by Copilot

- add Update-ModelCatalog.ps1 with YAML source fetching from github/docs
- add Pester tests for Merge-ModelData, Get-RemoteYaml, Compare-Catalogs
- update workflow to run catalog refresh before validation
- update agent model references from retiring Gemini 3 Flash (Preview) to GA name
- refresh model-catalog.json with correct tier assignments

🔄 - Generated by Copilot

…spatch
- add Model Selection for Subagents section to RPI parent agents
- add model guidance to prompt-builder and security-reviewer
- fast model for research/validation, session model for code generation
- cost-first principle adapted for VS Code tier constraints

💰 - Generated by Copilot

Remove self-referential Compact handoffs that caused Autopilot infinite loops. The disk-first .copilot-tracking architecture and Memory Agent make /compact handoffs redundant.
- Remove Compact from 5 RPI agents (rpi-agent, task-researcher, task-planner, task-implementor, task-reviewer)
- Remove Compact from 7 additional agents (task-challenger, prompt-builder, security-planner, sssc-planner, rai-planner, experiment-designer, pptx)
- Update context-engineering.md to clarify /compact as typed command
- Remove Compact exit point from rai-identity.instructions.md
- Regenerate plugins

Closes #1420

…prompts documentation
- update description of model property as a preference hint
- explain fallback behavior when specified model is unavailable
- emphasize cost tier constraints for prompt models

🔍 - Generated by Copilot

…-ModelCatalogUpdate functions
- implement tests for validation output handling
- cover scenarios for valid, invalid, and retiring references
- ensure directory creation for output files in tests
- validate catalog update logic with various conditions

🔍 - Generated by Copilot

…alidation
- implement a check for changes in model-catalog.json
- add provider allowlist functionality in model validation
- update model-catalog.json with provider information
- enhance schema to include providerAllowlist

🔍 - Generated by Copilot

…o root path

🔧 - Generated by Copilot
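The commits above mention a freshness warning once the catalog exceeds 90 days. As a rough illustration only, such a check might look like the Python sketch below; the function name `catalog_is_stale` and the idea of comparing a last-updated date against today are assumptions, since the actual PowerShell logic is not shown in this conversation.

```python
from datetime import date, timedelta


def catalog_is_stale(last_updated: date, today: date, max_age_days: int = 90) -> bool:
    """Return True when the catalog is strictly older than max_age_days.

    "Exceeds 90 days" is read here as a strict comparison, so a catalog
    that is exactly 90 days old does not yet trigger the warning.
    """
    return (today - last_updated) > timedelta(days=max_age_days)
```

Passing `today` explicitly, rather than reading the clock inside the function, keeps the check deterministic in unit tests.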
a1449e5 to 9b129ac
Thanks @bindsi for your review. I believe we have addressed everything: PR Review Comments — Resolution Summary
PR Review Summary
This PR is well-structured, clearly motivated, and addresses two distinct problems (cost optimization via per-agent model selection, and the Autopilot infinite loop from the /compact handoff). The implementation quality is solid overall.
✅ Issue Alignment
The PR links Fixes #1420 and Closes #1540. The /compact handoff removal directly addresses the described infinite-loop root cause, and the model selection infrastructure is a coherent new feature. No scope creep detected; the additional documentation and copilot-instructions.md updates are appropriate companions to the code changes.
⚠️ PR Template Compliance
GHCP Maturity Acknowledgment checkboxes are unchecked. The PR explicitly lists 9 experimental agents in a > [!WARNING] block and includes a dedicated GHCP Maturity table, but the two acknowledgment checkboxes at the bottom of that section remain unchecked:
- [ ] I acknowledge this PR includes non-stable GHCP artifacts
- [ ] Non-stable artifacts are intentional for this change
These require author sign-off before merge.
🔍 Coding Standards
- PowerShell scripts follow all required conventions: copyright headers, #Requires -Version 7.0, [CmdletBinding()], $ErrorActionPreference = 'Stop', #region blocks, and the invocation guard pattern. ✅
- Pester test files follow the required header ordering (#Requires -Modules Pester before copyright). ✅
- Agent files use correct user-invocable: false for subagents. ✅
- Workflow file uses persist-credentials: false and contents: read permissions. ✅
One concern flagged inline: the model: field in agent frontmatter is used as an array across all 7 subagents, but the prompt-builder.instructions.md spec documents it as a scalar string. This pattern should be validated against VS Code's runtime behaviour and documented before wider adoption — see the inline comment on researcher-subagent.agent.md.
🔒 Code Quality
Workflow action version comments — actions/checkout # v6.0.2 and actions/upload-artifact # v7.0.1 carry unexpectedly high version numbers. The SHAs themselves satisfy the SHA-pinning requirement, but incorrect version comments reduce the audit value. Please verify these tags exist on the respective action repos (inline comments added).
Invoke-WebRequest timeout — Get-RemoteYaml in Update-ModelCatalog.ps1 has no -TimeoutSec, so a slow upstream could block the scheduled job indefinitely. Adding -TimeoutSec 30 is a low-risk hardening step (inline comment added).
cancel-in-progress: true — Safe for PR-triggered runs, but worth noting the design trade-off for the weekly scheduled scan (inline comment added). Not a blocking issue.
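The timeout finding above concerns PowerShell's Invoke-WebRequest, but the same hardening idea can be sketched in Python for illustration. The helper name `fetch_text`, its `opener` parameter (added only so the call can be exercised without network access), and the 30-second default are assumptions mirroring the reviewer's suggestion, not code from the PR.

```python
import urllib.request


def fetch_text(url: str, timeout_sec: float = 30.0, opener=urllib.request.urlopen) -> str:
    """Fetch a URL and decode the body as UTF-8, with a bounded wait.

    The timeout is passed through to urlopen, where it limits blocking
    socket operations, so a stalled upstream raises an error instead of
    hanging the job.
    """
    with opener(url, timeout=timeout_sec) as response:
        return response.read().decode("utf-8")
```

Without an explicit timeout, a stalled connection can block until the platform's default socket behaviour gives up, which is the scheduled-job risk the review points out.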
📋 Action Items
- ✅ Check both GHCP Maturity Acknowledgment boxes before merge.
- 🔍 Verify actions/checkout and actions/upload-artifact version comments match the pinned SHAs.
- 💡 Confirm VS Code supports the model: array format, and update prompt-builder.instructions.md to document it.
- 💡 Add -TimeoutSec 30 to Invoke-WebRequest in Get-RemoteYaml.
Note
🔒 Integrity filter blocked 1 item
The following item was blocked because it doesn't meet the GitHub integrity level.
- #1420 issue_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
To allow these resources, lower min-integrity in your GitHub frontmatter:
tools:
  github:
    min-integrity: approved # merged | approved | unapproved | none
Description
This PR introduces per-agent model selection via frontmatter, backed by a validated model catalog that tracks GitHub Copilot's evolving model lineup. Simple tasks (git operations, issue triage, research) now route to fast-tier models at a fraction of the cost, while complex agents inherit the session model for full capability.
Additionally, this PR removes the self-referential /compact handoff from 12 agents, eliminating the root cause of Autopilot infinite loops reported in #1420. The disk-first .copilot-tracking/ architecture and Memory Agent already provide equivalent persistence without the loop risk.
Model Selection Infrastructure
- (scripts/linting/model-catalog.json) tracking 25 models across 5 tiers (free, fast, standard, premium, ultra) with multiplier values, vendor attribution, and GA/preview/retiring status
- (scripts/linting/Update-ModelCatalog.ps1) that fetches authoritative YAML from github/docs for model release status and multiplier data; marks removed models as retiring with 60-day grace period rather than deleting
- (scripts/linting/Test-ModelReferences.ps1) that scans all .agent.md and .prompt.md files for model frontmatter and validates references against the catalog; reports invalid models as errors, retiring models as warnings
- (scripts/linting/schemas/model-catalog.schema.json) for structural validation of the catalog file
- (.github/workflows/model-validation.yml) running every Wednesday plus PR-triggered validation on agent/prompt/catalog changes; includes catalog freshness check and artifact upload
- lint:models and lint:models:refresh into package.json; model validation runs as part of the lint:all chain
Per-Agent and Per-Prompt Model Assignment
Assigned fast-tier models to 7 subagents performing read-heavy validation tasks: researcher-subagent, plan-validator, implementation-validator, prompt-evaluator, rpi-validator, codebase-profiler, and report-generator. Each declares a prioritized fallback array: Claude Haiku 4.5 → GPT-5.4 mini.
Assigned Claude Haiku 4.5 (copilot) to 7 prompts handling mechanical operations: git-commit-message, git-commit, git-setup, github-add-issue, github-discover-issues, github-triage-issues, and checkpoint.
Added "Model Selection for Subagents" guidance to 6 parent agents (task-researcher, task-planner, task-implementor, task-reviewer, prompt-builder, security-reviewer) documenting cost-first dispatch decisions and VS Code tier constraint behavior.
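One way to picture the prioritized fallback array is the Python sketch below. It is a conceptual model only: how VS Code actually resolves a `model:` array is not specified in this PR (a reviewer asks for that to be confirmed), and the function name `resolve_model` and the rule of falling back to the session model are assumptions made for illustration.

```python
def resolve_model(preferences, available, session_model):
    """Pick the first preferred model that is available.

    preferences:   ordered list of model names, highest priority first
    available:     set of model names the current user can access
    session_model: model used when no preference can be satisfied
                   (assumed fallback, not confirmed runtime behaviour)
    """
    for name in preferences:
        if name in available:
            return name
    return session_model
```

For example, with preferences `["Claude Haiku 4.5", "GPT-5.4 mini"]` and only the second model available, the sketch selects `GPT-5.4 mini`; with neither available it returns the session model.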
/compact Handoff Removal (Fixes #1420)
Removed the Compact handoff entry from all 12 agents where it appeared. Eleven had it as their first handoff, causing Autopilot to auto-execute it on every turn completion, creating an infinite self-referential loop.
- rai-identity.instructions.md to remove the "Compact handoff" exit point reference from disclaimer display logic
- docs/rpi/context-engineering.md to recommend /checkpoint (Memory Agent) for cross-phase persistence and clarify that /compact remains available as a typed command
PR #1492 (feat/context-working) adds Context Discipline to 5 RPI parent agents, enforcing disk-first lean responses. The /compact handoff is now architecturally redundant because:
- .copilot-tracking/ files: all state already lives on disk
Test Coverage
- (Test-ModelReferences.Tests.ps1) covering validation logic, frontmatter parsing, and error handling
- (Test-UpdateModelCatalog.Tests.ps1) covering catalog merge, comparison, and refresh logic
Related Issue(s)
Fixes #1420
Closes #1540
Type of Change
Select all that apply:
Code & Documentation:
Infrastructure & Configuration:
AI Artifacts:
- prompt-builder agent and addressed all feedback
- (.github/instructions/*.instructions.md)
- (.github/prompts/*.prompt.md)
- (.github/agents/*.agent.md)
- (.github/skills/*/SKILL.md)
Other:
- (.ps1, .sh, .py)
Sample Prompts (for AI Artifact Contributions)
User Request:
Invoke any RPI agent (e.g., task researcher) with a research task. The agent dispatches its Researcher Subagent at fast-tier cost automatically. Run npm run lint:models to validate all model references.
Execution Flow:
- model: "Claude Haiku 4.5 (copilot)" on runSubagent call
- .copilot-tracking/ disk files
Output Artifacts:
- logs/model-validation-results.json: structured validation results with per-file status
- scripts/linting/model-catalog.json: refreshed catalog after lint:models:refresh
Success Indicators:
- npm run lint:models exits 0 with no invalid model references
Testing
- npm run lint:models: model reference validation (validates all 14 model-annotated files)
Note
Add manual testing descriptions when applicable.
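As a rough picture of what the model reference validation checks, the Python sketch below classifies a frontmatter `model` value against a catalog, treating unknown names as errors and retiring names as warnings, as the description states. The real check is the PowerShell script Test-ModelReferences.ps1; the function name `validate_model_field`, the name-to-status mapping, and the status strings other than "retiring" are assumptions for illustration.

```python
def validate_model_field(model_field, catalog_status):
    """Classify model references as errors or warnings.

    model_field:    a single model name (prompts) or a list of names
                    (agents); None means the file declares no model
    catalog_status: hypothetical mapping of model name -> status string
    Returns (errors, warnings) as two lists of messages.
    """
    if isinstance(model_field, str):
        names = [model_field]
    else:
        names = list(model_field or [])

    errors, warnings = [], []
    for name in names:
        status = catalog_status.get(name)
        if status is None:
            errors.append(f"unknown model: {name}")
        elif status == "retiring":
            warnings.append(f"retiring model: {name}")
    return errors, warnings
```

A lint wrapper built on this would exit non-zero only when `errors` is non-empty, which matches the success indicator of exiting 0 with no invalid model references.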
Checklist
Required Checks
AI Artifact Contributions
- /prompt-analyze to review contribution
- prompt-builder review
Required Automated Checks
The following validation commands must pass before merging:
- npm run lint:md
- npm run spell-check
- npm run lint:frontmatter
- npm run validate:skills
- npm run lint:md-links
- npm run lint:ps
- npm run plugin:generate
- npm run docs:test
Security Considerations
Warning
This PR includes experimental GHCP artifacts that may have breaking changes.
- .github/agents/hve-core/task-challenger.agent.md
- .github/agents/experimental/experiment-designer.agent.md
- .github/agents/experimental/pptx.agent.md
- .github/agents/security/security-planner.agent.md
- .github/agents/security/sssc-planner.agent.md
- .github/agents/security/security-reviewer.agent.md
- .github/agents/security/subagents/codebase-profiler.agent.md
- .github/agents/security/subagents/report-generator.agent.md
- .github/agents/rai-planning/rai-planner.agent.md
GHCP Artifact Maturity
- .github/agents/hve-core/rpi-agent.agent.md
- .github/agents/hve-core/task-researcher.agent.md
- .github/agents/hve-core/task-planner.agent.md
- .github/agents/hve-core/task-implementor.agent.md
- .github/agents/hve-core/task-reviewer.agent.md
- .github/agents/hve-core/prompt-builder.agent.md
- .github/agents/hve-core/task-challenger.agent.md
- .github/agents/experimental/experiment-designer.agent.md
- .github/agents/experimental/pptx.agent.md
- .github/agents/security/security-planner.agent.md
- .github/agents/security/sssc-planner.agent.md
- .github/agents/security/security-reviewer.agent.md
- .github/agents/security/subagents/codebase-profiler.agent.md
- .github/agents/security/subagents/report-generator.agent.md
- .github/agents/rai-planning/rai-planner.agent.md
- .github/prompts/hve-core/checkpoint.prompt.md
- .github/prompts/hve-core/git-commit-message.prompt.md
- .github/prompts/hve-core/git-commit.prompt.md
- .github/prompts/hve-core/git-setup.prompt.md
- .github/prompts/github/github-add-issue.prompt.md
- .github/prompts/github/github-discover-issues.prompt.md
- .github/prompts/github/github-triage-issues.prompt.md
- .github/instructions/rai-planning/rai-identity.instructions.md
GHCP Maturity Acknowledgment
Additional Notes
/compact removal is a breaking change for users who relied on the handoff button. The /compact typed command remains available; only the agent-surfaced handoff is removed.
Follow-up Tasks