This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints in three sizes (3B, 10B, and 28B parameters) that can be fine-tuned to high performance on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.
What’s new in PaliGemma 2 mix?
• Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection and segmentation.
• Developer-friendly sizes: Choose the model that best fits your needs from three sizes (3B, 10B, and 28B parameters) and two input resolutions (224px and 448px).
• Use with your preferred framework: Leverage your preferred tools and frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
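For detection, the model answers in text, so the output has to be converted into pixel coordinates before you can draw boxes. The sketch below shows one way to do that in plain Python, independent of which framework runs the model. It assumes the PaliGemma-style output format: each object is written as four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, each quantized into 1024 bins) followed by its label, with multiple objects separated by " ; ". The function name `parse_detections` and the sample output string are hypothetical illustrations, not part of any official API.

```python
import re
from typing import List, Tuple

# Assumption: PaliGemma-style detection output. Each object is four <locNNNN>
# tokens (y_min, x_min, y_max, x_max, binned into 1024 steps) followed by a
# label; objects are separated by " ; ".
LOC_PATTERN = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)

# (label, x_min, y_min, x_max, y_max) in pixels
Box = Tuple[str, float, float, float, float]


def parse_detections(text: str, width: int, height: int) -> List[Box]:
    """Hypothetical helper: convert location tokens into pixel-space boxes."""
    boxes: List[Box] = []
    for match in LOC_PATTERN.finditer(text):
        # Normalize each bin index to [0, 1) by dividing by 1024 (assumed).
        y0, x0, y1, x1 = (int(v) / 1024 for v in match.groups()[:4])
        label = match.group(5).strip()
        # Scale normalized coordinates to the image size, reordered to x/y.
        boxes.append((label, x0 * width, y0 * height, x1 * width, y1 * height))
    return boxes


if __name__ == "__main__":
    # Made-up output for a prompt such as "detect cat ; dog" on a 640x480 image.
    sample = (
        "<loc0256><loc0128><loc0768><loc0512> cat ; "
        "<loc0000><loc0512><loc0512><loc1023> dog"
    )
    for box in parse_detections(sample, width=640, height=480):
        print(box)
```

Keeping the parser separate from model inference means the same post-processing can be reused whether the text comes from Transformers, Keras, JAX, or Gemma.cpp. Check the model card for your checkpoint to confirm the exact output format before relying on it.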