Google – 2 Apr 26 (https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/)
New ways to balance cost and reliability in the Gemini API (https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/)
Google is introducing two new inference tiers to the Gemini API, Flex and Priority, to balance cost and latency.
Google AI for Developers (https://ai.google.dev/gemini-api/docs/optimization?hl=zh-cn#inference-tiers)
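Inference tiers (synchronous) - Gemini API optimization and inference | Google AI for Developers (https://ai.google.dev/gemini-api/docs/optimization?hl=zh-cn#inference-tiers)
You can switch between latency-optimized and cost-optimized synchronous traffic simply by passing the service_tier parameter in a standard generation call. | Learn about the different inference and optimization options the Gemini API offers.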
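
To make the service_tier switch concrete, here is a minimal sketch in Python that only builds a request body and sends nothing. The snippet above confirms just the parameter name service_tier. The helper name build_generate_request, the accepted values "flex" and "priority", and the placement of the field at the top level of the body are all assumptions for illustration, not the documented API.

```python
import json

# Assumed tier labels: the source names the tiers "Flex" and "Priority" but
# does not state the exact strings the API accepts.
ASSUMED_TIERS = {"flex", "priority"}


def build_generate_request(prompt, service_tier=None):
    """Build a hypothetical request body for a standard generation call.

    Only the parameter name `service_tier` comes from the source; where it
    sits in the body and which values it takes are assumptions.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if service_tier is not None:
        if service_tier not in ASSUMED_TIERS:
            raise ValueError(f"unknown service tier: {service_tier!r}")
        body["service_tier"] = service_tier
    return body


if __name__ == "__main__":
    # Same call shape, different tier: only one field changes.
    print(json.dumps(build_generate_request("Summarize this log.", "flex")))
    print(json.dumps(build_generate_request("Answer the user now.", "priority")))
```

Under these assumptions, leaving service_tier unset keeps the request identical to a standard call, so a tier can be chosen per request without changing anything else in the calling code.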