In June, we launched Gemma 2, our latest best-in-class open models, available in 27 billion (27B) and 9 billion (9B) parameter sizes. Since its debut, the 27B model has quickly become one of the highest-ranking open models on the LMSYS Chatbot Arena leaderboard, even outperforming popular models more than twice its size in real conversations.
But Gemma is not just about performance. It is built on responsible AI, prioritizing safety and accessibility. To support this commitment, we are pleased to announce three new additions to the Gemma 2 series:
- Gemma 2 2B - The new version of our popular 2 billion (2B) parameter model, with built-in safety improvements and a strong balance of performance and efficiency.
- ShieldGemma - A set of safety content classifier models built on Gemma 2 to filter AI model inputs and outputs and ensure user safety.
- Gemma Scope - A new model explainability tool that provides unparalleled insight into the inner workings of our models.
With these additions, researchers and developers can create safer user experiences, gain unprecedented insight into our models, and confidently deploy powerful AI on devices responsibly, unlocking new possibilities for innovation.
Gemma 2 2B: Experience the next generation of performance on devices now
We are excited to launch Gemma 2 2B, the highly anticipated new member of the Gemma 2 series. This lightweight model achieves outsized results by distilling knowledge from larger models. In fact, Gemma 2 2B has surpassed all GPT-3.5 models on Chatbot Arena, demonstrating its superior conversational AI capabilities.
Chart: LMSYS Chatbot Arena leaderboard scores, captured on July 30, 2024. Gemma 2 2B's score carries a margin of +/- 10 points.
Gemma 2 2B offers:
- Superior performance: Delivers best-in-class performance for its size, outperforming other open models in its category.
- Flexible, cost-effective deployment: Gemma 2 2B runs efficiently on a wide range of hardware, from edge devices and laptops to robust cloud deployments with Vertex AI and Google Kubernetes Engine (GKE). For even faster inference, it is optimized with the NVIDIA TensorRT-LLM library and is available as an NVIDIA NIM. This optimization targets diverse deployments, including data centers, the cloud, local workstations, PCs, and edge devices, using NVIDIA RTX and NVIDIA GeForce RTX GPUs or NVIDIA Jetson modules for edge AI. Gemma 2 2B also integrates seamlessly with Keras, JAX, Hugging Face, NVIDIA NeMo, Ollama, Gemma.cpp, and, soon, MediaPipe, streamlining development.
- Open and accessible: Available for research and commercial applications under the commercially friendly Gemma terms. It is even small enough to run on the free T4 GPU tier of Google Colab, making experimentation and development easier than ever.
Starting today, you can download Gemma 2 2B's model weights from Kaggle, Hugging Face, or Vertex AI Model Garden, and try out its capabilities in Google AI Studio.
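If you load the weights yourself, note that the instruction-tuned Gemma checkpoints expect the documented Gemma chat turn format, where each turn is wrapped in `<start_of_turn>`/`<end_of_turn>` control tokens. The sketch below builds that prompt string by hand for illustration; in practice, a library helper such as Hugging Face's `tokenizer.apply_chat_template` handles this for you.

```python
# Minimal sketch of the Gemma chat prompt format. The control tokens
# <start_of_turn> and <end_of_turn> come from the Gemma documentation;
# the helper function itself is illustrative, not an official API.
def format_gemma_prompt(messages):
    """Render a list of {"role", "content"} dicts into a Gemma chat prompt."""
    parts = []
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    # End with a "model" turn header so generation continues as the model's reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = format_gemma_prompt([{"role": "user", "content": "Hello!"}])
```

Feeding a prompt in this shape to the instruction-tuned model cues it to respond in the "model" turn; the base (non-instruction-tuned) checkpoints do not use this format.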
ShieldGemma: Protect users with state-of-the-art safety classifiers
Deploying open models responsibly to ensure engaging, safe, and inclusive AI outputs requires significant effort from developers and researchers. To assist developers in this process, we have launched ShieldGemma, a suite of state-of-the-art safety classifiers designed to detect and mitigate harmful content in AI model inputs and outputs. ShieldGemma specifically targets four key areas of harm:
- Hate speech
- Harassment
- Sexual content
- Dangerous content
Diagram: generative AI application model architecture
These open classifiers complement the existing suite of safety classifiers in the Responsible AI Toolkit, which includes a method for building classifiers tailored to specific policies using a limited number of data points, as well as existing Google Cloud ready-made classifiers provided via API.
ShieldGemma can help you create safer, better AI applications as follows:
- SOTA performance: Built on Gemma 2, ShieldGemma delivers industry-leading safety classification.
- Flexible sizes: ShieldGemma comes in several model sizes to meet different needs. The 2B model is ideal for online classification, while the 9B and 27B versions deliver higher accuracy for offline applications where latency matters less. All sizes leverage NVIDIA speed optimizations for efficient performance across hardware.
- Open and collaborative: ShieldGemma's openness encourages transparency and collaboration within the AI community, contributing to the future of safety standards across the machine learning industry.
"As artificial intelligence continues to mature, the entire industry needs to invest in the development of high-performance safety evaluators. We are pleased to see Google make this investment and look forward to their continued participation in our AI Safety Working Group." - Rebecca Weiss, Executive Director, MLCommons
Evaluation results based on the best F1 (left) / AU-PRC (right); higher is better. We use α = 0 and T = 1 to compute probabilities. ShieldGemma (SG) Prompt and SG Response are our test datasets; OpenAI Mod and ToxicChat are external benchmarks. Baseline model performance on external datasets is taken from Ghosh et al. (2024) and Inan et al. (2023).
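The temperature T mentioned above enters when turning the classifier's answer into a probability. As a hedged sketch (the exact formulation is in the technical report): if the classifier answers a policy question with "Yes" (violating) or "No", a harm probability can be computed as a two-way softmax over those token logits.

```python
import math

def harm_probability(yes_logit: float, no_logit: float, t: float = 1.0) -> float:
    """Two-way softmax over Yes/No token logits with temperature t.

    This is an illustrative sketch of temperature-scaled probability
    scoring, not ShieldGemma's exact implementation; the logit values
    would come from a real forward pass of the classifier.
    """
    e_yes = math.exp(yes_logit / t)
    e_no = math.exp(no_logit / t)
    return e_yes / (e_yes + e_no)

p = harm_probability(2.0, -1.0)  # logits from a hypothetical forward pass
flagged = p > 0.5                # the decision threshold is application-specific
```

A probability rather than a hard label lets each application tune its own threshold to trade off precision against recall.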
Learn more about ShieldGemma, view the full results in the technical report, and start building safer AI applications with our comprehensive responsible generative AI toolkit.
Gemma Scope: Clarify AI decision-making with open sparse autoencoders
Gemma Scope provides researchers and developers with unprecedented transparency, allowing them to understand the decision-making process of the Gemma 2 models. Gemma Scope acts like a powerful microscope, using sparse autoencoders (SAEs) to zoom in on specific points in the model, making its inner workings easier to interpret.
These SAEs are specialized neural networks that help us unpack the dense, complex information processed by Gemma 2, expanding it into a more analyzable and understandable form. By studying these expanded views, researchers can gain valuable insights into how Gemma 2 identifies patterns, processes information, and ultimately makes predictions. With Gemma Scope, we aim to help the AI research community explore how to build more understandable, reliable, and robust AI systems.
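In miniature, the "expansion" works like this: an SAE encodes a dense activation vector into a much wider, mostly-zero feature vector, then reconstructs the activation from the few active features. The toy below uses randomly initialized weights and a plain ReLU for simplicity; Gemma Scope's SAEs are trained and use a JumpReLU activation, and the dimensions here are illustrative.

```python
import numpy as np

# Toy sparse autoencoder sketch: expand a dense activation into a wider,
# sparse feature vector, then reconstruct. Weights are random (untrained),
# purely to show the encode/decode shape of the computation.
rng = np.random.default_rng(0)
d_model, d_sae = 8, 64  # activation width vs. SAE feature count (illustrative)

W_enc = rng.normal(size=(d_model, d_sae)) * 0.1
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) * 0.1
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU zeroes every feature with a non-positive pre-activation,
    # so the resulting code is sparse.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)       # stand-in for a Gemma 2 activation vector
features = encode(x)               # wide, mostly-zero feature vector
x_hat = decode(features)           # reconstruction of the activation
sparsity = float((features == 0).mean())  # fraction of inactive features
```

Researchers inspect which of the wide feature dimensions fire on which inputs; a trained SAE's reconstruction stays close to the original activation while keeping most features inactive.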
The breakthroughs of Gemma Scope are as follows:
- Open SAEs: Over 400 freely available SAEs covering all layers of Gemma 2 2B and 9B.
- Interactive demos: Explore SAE features and analyze model behavior on Neuronpedia, no code required.
- Easy-to-use repository: Code and examples for interacting with the SAEs and Gemma 2.
Learn more about Gemma Scope in the Google DeepMind blog, technical report, and developer documentation.
Building a future based on responsible artificial intelligence
These releases reflect our ongoing commitment to providing the AI community with the tools and resources needed to build a future where AI benefits everyone. We believe that open access, transparency, and collaboration are essential for developing safe and beneficial AI.