feat(google-vertex): update model YAMLs [bot] #1675
/test-models
Cursor Bugbot has reviewed your changes using default effort and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 2b1233f.
features:
  - system_messages
  - structured_output
  - prompt_caching
Conflicting structured output flag
Medium Severity
This PR adds structured_output (and prompt_caching) to gemini-2.5-flash-image.yaml, while the related google/gemini-2.5-flash-image.yaml updated in the same PR still documents that structured output is not supported and omits those features.
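A mismatch like this could be caught mechanically before merge. The following is only a minimal sketch: the two dictionaries stand in for the parsed YAML files, the `canonical` feature list is an illustrative assumption (the review only says it omits the new features), and `feature_diff` is a hypothetical helper, not something in the repository.

```python
# Hypothetical parsed contents of the two YAML files. Only the "features"
# field is modelled, and the canonical list is an illustrative assumption.
alias = {"features": ["system_messages", "structured_output", "prompt_caching"]}
canonical = {"features": ["system_messages"]}


def feature_diff(first: dict, second: dict) -> dict:
    """Return the features that appear in one entry but not the other."""
    fa = set(first.get("features", []))
    fb = set(second.get("features", []))
    return {"only_in_first": sorted(fa - fb), "only_in_second": sorted(fb - fa)}


diff = feature_diff(alias, canonical)
print(diff)
```

A non-empty result on either side would flag the pair of files for manual review.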
provisioning: serverless
removeParams:
  - tool_choice
retirementDate: "2026-07-17"
Deprecated flag preview status
Medium Severity
The PR sets isDeprecated: true and a retirementDate on google/gemini-3.1-flash-image-preview, but leaves status: preview. Other deprecated Vertex models in this commit pair deprecation flags with status: deprecated.
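The convention described here (a deprecation flag paired with `status: deprecated`) can be expressed as a small lint rule. This is a sketch under that assumption only; `lifecycle_issues` is a hypothetical helper and the dictionary stands in for the parsed YAML entry.

```python
def lifecycle_issues(model: dict) -> list[str]:
    """Flag entries whose deprecation flag and status disagree.

    Assumes the convention noted in the review: isDeprecated: true
    should be accompanied by status: deprecated.
    """
    issues = []
    if model.get("isDeprecated") and model.get("status") != "deprecated":
        issues.append(
            f"isDeprecated is true but status is {model.get('status')!r}"
        )
    return issues


# Fields taken from the review comment; everything else is omitted.
entry = {"status": "preview", "isDeprecated": True, "retirementDate": "2026-07-17"}
issues = lifecycle_issues(entry)
print(issues)
```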
-context_window: 32768
-max_input_tokens: 32768
+context_window: 65536
+max_input_tokens: 65536
Dropped deprecation on alias
Medium Severity
This PR removes deprecationDate from gemini-2.5-flash-image.yaml while google/gemini-2.5-flash-image.yaml in the same provider still carries deprecationDate: "2026-10-02" for the same underlying model family.
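Drift in lifecycle fields between an alias file and its provider-prefixed counterpart can be surfaced with a field-by-field comparison. The sketch below is illustrative: the key list and the `lifecycle_drift` helper are assumptions, and the dictionaries contain only the values quoted in this review.

```python
# Lifecycle keys mentioned in this PR; the exact set is an assumption.
LIFECYCLE_KEYS = ("status", "isDeprecated", "deprecationDate", "retirementDate")


def lifecycle_drift(first: dict, second: dict) -> dict:
    """Map each differing lifecycle key to its (first, second) values."""
    return {
        key: (first.get(key), second.get(key))
        for key in LIFECYCLE_KEYS
        if first.get(key) != second.get(key)
    }


# Stand-ins for the parsed YAML files, limited to fields quoted in the review.
alias = {"context_window": 65536, "max_input_tokens": 65536}
canonical = {"deprecationDate": "2026-10-02"}

drift = lifecycle_drift(alias, canonical)
print(drift)
```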
Gateway test results
Failures (6)
Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3.1-maas",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3.1-maas",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)
_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
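Both reasoning checks look for the same signals (`reasoning_content`, `reasoning`, and a positive `reasoning_tokens` count). As a rough illustration only, that logic could be shared in one helper. `has_reasoning` is a hypothetical name and not part of the gateway test suite; the `SimpleNamespace` objects are stand-ins for SDK response objects, so no network call is made.

```python
from types import SimpleNamespace


def has_reasoning(message_or_delta, usage=None) -> bool:
    """Return True if any reasoning signal checked by the snippets is present."""
    for field in ("reasoning_content", "reasoning"):
        if getattr(message_or_delta, field, None) is not None:
            return True
    if usage is not None:
        details = getattr(usage, "completion_tokens_details", None)
        # `or 0` guards against reasoning_tokens being present but None.
        if (getattr(details, "reasoning_tokens", 0) or 0) > 0:
            return True
        if getattr(usage, "reasoning", None) is not None:
            return True
    return False


# Stand-in objects that mimic the attributes read above.
plain = SimpleNamespace(content="27", reasoning_content=None)
usage_none = SimpleNamespace(
    completion_tokens_details=SimpleNamespace(reasoning_tokens=None)
)
thinking = SimpleNamespace(content=None, reasoning_content="3^3 = 27 ...")

print(has_reasoning(plain, usage_none))  # False
print(has_reasoning(thinking))           # True
```

The `or 0` guard is the only behavioural difference from the snippets: it avoids a `TypeError` if a provider returns `reasoning_tokens` as `None`.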
Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]
response = client.chat.completions.create(
    model="test-v2-vertex/meta-llama-4-scout-17b-16e-instruct-maas",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)
_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
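The streaming check above only prints argument fragments as they arrive. For debugging a failure, it may help to reassemble the fragments into complete calls. The sketch below assumes the usual Chat Completions streaming shape, where each tool-call delta carries an `index` and partial `function.arguments`; `collect_tool_calls` and `fake_chunk` are hypothetical helpers, and the chunks are simulated rather than fetched from the gateway.

```python
import json
from types import SimpleNamespace as NS


def collect_tool_calls(chunks) -> dict:
    """Merge streamed tool-call deltas into {index: {"name", "arguments"}}."""
    calls = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            call = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function:
                call["name"] += tc.function.name or ""
                call["arguments"] += tc.function.arguments or ""
    return calls


def fake_chunk(index, name=None, arguments=None):
    """Build a stand-in chunk with the attributes the collector reads."""
    fn = NS(name=name, arguments=arguments)
    delta = NS(content=None, tool_calls=[NS(index=index, function=fn)])
    return NS(choices=[NS(delta=delta)])


# Simulated stream: the name arrives first, then the JSON in two fragments.
stream = [
    fake_chunk(0, name="get_weather", arguments=""),
    fake_chunk(0, arguments='{"location": '),
    fake_chunk(0, arguments='"London"}'),
]
calls = collect_tool_calls(stream)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```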
Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]
response = client.chat.completions.create(
    model="test-v2-vertex/meta-llama-4-scout-17b-16e-instruct-maas",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.embeddings.create(
    model="test-v2-vertex/intfloat-multilingual-e5-large-instruct-maas",
    input="What is the capital of France?",
    encoding_format="float",
)
output = [embed.embedding for embed in response.data]
print(output)
Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.embeddings.create(
    model="test-v2-vertex/google-gemini-embedding-2-preview",
    input="What is the capital of France?",
    encoding_format="float",
)
output = [embed.embedding for embed in response.data]
print(output)
Successes (151)
Skipped (12)


Auto-generated by poc-agent for provider google-vertex.
Note
Medium Risk
Wide catalog changes can shift cost estimates, max-token validation, and deprecation routing; the DeepSeek OCR file explicitly flags CUE validation failure and should be verified before merge.
Overview
Bulk refresh of google-vertex provider model YAMLs (auto-generated bot PR): pricing, limits, features, regions, and lifecycle fields aligned with current docs.
Lifecycle: Claude Opus 4.1 entries move to status: deprecated with deprecationDate, isDeprecated, and normalized cost literals. Veo 2.0 / 3.0 video models are marked deprecated or retired (veo-3.0-fast-generate-001). Gemini 3.1 Flash Image Preview adds isDeprecated and retirementDate. Several Gemini models swap deprecationDate for retirementDate on live/preview SKUs (e.g. native audio, 2.5 Flash).
Capabilities & limits: Claude Sonnet 4.6 raises max output to 128k; Gemini 2.5 Flash Image doubles input context to 65536 and adds structured_output / prompt_caching on one variant. DeepSeek OCR uses supportedModes: ocr (was chat) and carries a CUE validation failed header. DeepSeek v3.1 adds cache-read pricing; v3.2 drops structured_output from features.
Pricing & regions: Per-region batch and cache cost fields filled in across embeddings, Anthropic Opus 4.5, Gemini 2.5 Pro (cache creation per hour), Veo 3.1 Fast (720p/1080p/4k per-second rates), and new intfloat/multilingual-e5 cost blocks. Extra us/eu region rows for Opus 4.5 and gemini-embedding-2-preview.
Smaller fixes: Tool-use token count for Sonnet 4.5, gpt-oss-120b reasoning_effort param, modality tweaks (video on Gemma3n, glm-5 drops doc), and doc sources updates.