Metadata

Highlights

  • This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints in three sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned to achieve high performance on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks.
  • What’s new in PaliGemma 2 mix?
    • Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection, and segmentation.
    • Developer-friendly sizes: Use the best model for your needs thanks to the different model sizes (3B, 10B, and 28B parameters) and resolutions (224px and 448px).
    • Use with your preferred framework: Leverage your preferred tools and frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cp
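Object detection is among the tasks listed for PaliGemma 2 mix, and the model returns detections as text rather than as a tensor of boxes. The following is a minimal sketch that assumes the output format described for PaliGemma models: each box is four `<locNNNN>` tokens giving y_min, x_min, y_max, x_max quantized into 1024 bins, followed by the label, with multiple boxes separated by ` ; `. The function name `parse_detections` is hypothetical and not part of any library; verify the format against the model card for the checkpoint you use.

```python
import re
from typing import List, Tuple

# Assumed format: four <locNNNN> tokens (y_min, x_min, y_max, x_max) then a label,
# e.g. "<loc0256><loc0128><loc0768><loc0896> car ; <loc...>... dog".
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^<;]+)")
_LOC = re.compile(r"<loc(\d{4})>")

# (label, x_min, y_min, x_max, y_max) in pixels
Box = Tuple[str, float, float, float, float]


def parse_detections(text: str, width: int, height: int, bins: int = 1024) -> List[Box]:
    """Hypothetical helper: convert location tokens into pixel-space boxes."""
    boxes: List[Box] = []
    for locs, label in _DETECTION.findall(text):
        # Tokens are ordered y_min, x_min, y_max, x_max (assumption, see lead-in).
        y0, x0, y1, x1 = (int(v) for v in _LOC.findall(locs))
        boxes.append((
            label.strip(),
            x0 / bins * width,   # rescale each bin index to the image size
            y0 / bins * height,
            x1 / bins * width,
            y1 / bins * height,
        ))
    return boxes


if __name__ == "__main__":
    sample = "<loc0256><loc0128><loc0768><loc0896> car"
    print(parse_detections(sample, width=448, height=448))
```

Because coordinates are normalized to bins rather than pixels, the same parser works for either the 224px or 448px checkpoints; only the `width` and `height` of the original image need to be passed in.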