|
|

Google – 3 Jun 26 (https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/)

Introducing Gemma 4 12B: a unified, encoder-free multimodal model (https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/)
An overview of Gemma 4 12B, a model designed to bring high-performance multimodal intelligence directly to your laptop.
Below are the officially published benchmarks.

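It is likewise a multimodal model, trained with an encoder-free architecture, and supports text, images, and audio.
See the related technical report for details.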

developers.googleblog.com (https://developers.googleblog.com/gemma-4-12b-the-developer-guide/)

Gemma 4 12B: The Developer Guide - Google Developers Blog (https://developers.googleblog.com/gemma-4-12b-the-developer-guide/)
Meet Gemma 4 12B: the first medium-sized, encoder-free multimodal model capable of natively ingesting audio and video. Ideal for local AI development with 16GB VRAM, Hugging Face integrations, and drop-in local API servers.
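It uses sliding window attention with a window size of 1024 tokens and a 256K context length.
According to the Google blog, its performance is close to that of the Gemma 4 26B model.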
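
To make the sliding-window numbers concrete, here is a rough sketch of a causal sliding-window mask in plain Python. It is not Gemma's actual implementation. The function names are made up for illustration, and two things are assumed rather than taken from the post: that the 1024-token window includes the current token, and that "256k" means 256 * 1024 tokens. The post also does not say whether every layer is windowed, so treat this as a description of a single local-attention layer only.

```python
# Sketch only: a causal sliding-window attention mask in plain Python.
# Assumptions (not from the post): the window counts the current token,
# and "256k" means 256 * 1024 tokens.

WINDOW = 1024
CONTEXT = 256 * 1024


def attended_positions(query_pos: int, window: int) -> range:
    """Key positions that the token at `query_pos` may attend to."""
    start = max(0, query_pos - window + 1)
    return range(start, query_pos + 1)


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query q may attend to key k."""
    return [
        [k in attended_positions(q, window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def cached_keys(seq_len: int, window: int) -> int:
    """Keys a sliding-window layer has to keep around for the newest token."""
    return min(seq_len, window)


if __name__ == "__main__":
    # Tiny example (6 tokens, window of 3) so the band shape is visible.
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))

    # The same rule at the sizes quoted in the post.
    last = attended_positions(CONTEXT - 1, WINDOW)
    print(last.start, last.stop - 1, cached_keys(CONTEXT, WINDOW))
```

Under these assumptions, the last token of a 262,144-token context only looks at positions 261,120 through 262,143 in such a layer, so the layer needs to keep 1024 keys no matter how long the prompt gets. That is the usual reason sliding windows make very long contexts cheaper to serve.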
 |
|