Introducing PaliGemma 2 mix: A vision-language model for multiple tasks

Metadata

Author: Omar Sanseviero
Full Title: Introducing PaliGemma 2 mix: A vision-language model for multiple tasks
URL: https://developers.googleblog.com/en/introducing-paligemma-2-mix/

This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints of different sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering and text-related tasks with high performance.
What’s new in PaliGemma 2 mix?
• Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection and segmentation.
• Developer-friendly sizes: Use the best model for your needs thanks to the different model sizes (3B, 10B, and 28B parameters) and resolutions (224px and 448px).
• Use with your preferred framework: Leverage your preferred tools and frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cp
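As a rough illustration of picking a size and resolution and then loading the model with Hugging Face Transformers, here is a minimal sketch. The highlights above do not give checkpoint names or code, so several things here are assumptions: the `google/paligemma2-{size}-mix-{resolution}` naming scheme, the `AutoProcessor` and `PaliGemmaForConditionalGeneration` classes, and the helper names `checkpoint_id` and `run_task`, which are hypothetical. Check the model card before relying on any of it.

```python
# Sizes and resolutions listed in the post.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448)


def checkpoint_id(size: str, resolution: int) -> str:
    """Build a Hub checkpoint name (naming scheme is an assumption)."""
    size = size.lower()
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}, got {resolution!r}")
    return f"google/paligemma2-{size}-mix-{resolution}"


def run_task(image, prompt: str, size: str = "3b", resolution: int = 224,
             max_new_tokens: int = 64) -> str:
    """Run one prompt against one PIL image and return the generated text.

    Assumes `transformers` and `torch` are installed and that access to the
    checkpoint has been granted on the Hugging Face Hub.
    """
    # Imported lazily so the helper above works without these packages.
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = checkpoint_id(size, resolution)
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

For example, `run_task(Image.open("photo.jpg"), "caption en")` would request a short English caption, assuming the `caption en` task prefix documented for PaliGemma applies to the mix checkpoints. Smaller sizes and the 224px resolution trade accuracy for lower memory use and latency.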