Introducing PaliGemma 2 mix: A vision-language model for multiple tasks

This past December, we launched PaliGemma 2, an upgraded vision-language model in the Gemma family. The release included pretrained checkpoints of different sizes (3B, 10B, and 28B parameters) that can be easily fine-tuned on a wide range of vision-language tasks and domains, such as image segmentation, short video captioning, scientific question answering, and text-related tasks, with high performance.

Now, we’re thrilled to announce the launch of the PaliGemma 2 mix checkpoints. PaliGemma 2 mix models are tuned to a mixture of tasks, so you can explore the model's capabilities directly and use it out of the box for common use cases.

What’s new in PaliGemma 2 mix?

  • Multiple tasks with one model: PaliGemma 2 mix can solve tasks such as short and long captioning, optical character recognition (OCR), image question answering, object detection, and segmentation.
  • Developer-friendly sizes: Use the best model for your needs thanks to the different model sizes (3B, 10B, and 28B parameters) and resolutions (224px and 448px).
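Three sizes and two resolutions give six mix variants. The sketch below enumerates them, assuming the naming pattern of the "PaliGemma-2-3b-mix-224" checkpoint named in the Detection example. The helper name `mix_checkpoint_names` is hypothetical, and the exact identifiers on a given model hub may be spelled differently.

```python
from itertools import product

# Sizes and resolutions as listed in this post.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448)


def mix_checkpoint_names():
    """Return every size/resolution combination as a checkpoint name.

    The name pattern is an assumption modeled on "PaliGemma-2-3b-mix-224";
    check the model hub you use for the exact identifiers.
    """
    return [f"PaliGemma-2-{size}-mix-{res}" for size, res in product(SIZES, RESOLUTIONS)]


if __name__ == "__main__":
    for name in mix_checkpoint_names():
        print(name)
```

Smaller variants at 224px are the cheapest to run, while 448px inputs generally help with tasks that depend on fine detail, such as OCR.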

If you were already using the original PaliGemma mix checkpoints, you can upgrade directly to PaliGemma 2 without making any changes. The model performs different tasks depending on how it’s prompted. You can review the prompt syntax for each task in the official documentation and learn more about how PaliGemma 2 was developed in our technical report.
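Because the task is selected by the prompt, a small helper can keep prompt strings consistent. This is a minimal sketch: the task prefixes below (`caption`, `describe`, `ocr`, `answer`, `detect`, `segment`) and the trailing newline are assumptions based on the PaliGemma prompt syntax and should be checked against the official documentation. The `build_prompt` helper is hypothetical and not part of any library.

```python
# Assumed task prefixes; verify them against the official prompt documentation.
_TEMPLATES = {
    "caption": "caption {lang}",       # short caption
    "describe": "describe {lang}",     # longer, more detailed caption
    "ocr": "ocr",                      # read text in the image
    "answer": "answer {lang} {text}",  # question answering; text is the question
    "detect": "detect {text}",         # object detection; text is the object name
    "segment": "segment {text}",       # segmentation; text is the object name
}


def build_prompt(task, *, lang="en", text=""):
    """Build a task prompt string terminated by a newline."""
    if task not in _TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    if "{text}" in _TEMPLATES[task] and not text:
        raise ValueError(f"task {task!r} needs a non-empty text argument")
    return _TEMPLATES[task].format(lang=lang, text=text) + "\n"


if __name__ == "__main__":
    print(repr(build_prompt("detect", text="android")))
    print(repr(build_prompt("answer", text="What does the sign say?")))
```

Keeping the templates in one table makes it easy to correct a prefix in a single place if the documented syntax differs.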


Detection

  • Task: Detection (PaliGemma-2-3b-mix-224)
  • Input: “detect android\n”

Result: a cow standing on a beach next to a sign that says warning dangerous rip current.
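For detection prompts, the model's text output is expected to encode each box as four location tokens followed by a label, with multiple objects separated by " ; ". The sketch below converts such a string to pixel coordinates. It assumes the `<locNNNN>` token format with values from 0 to 1023 in the order y_min, x_min, y_max, x_max. The sample string in the usage section is made up for illustration and is not real model output.

```python
import re

# Four consecutive location tokens followed by a label (assumed output format).
_DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text, width, height):
    """Convert detection output text into labeled pixel-space boxes.

    Assumes each box is encoded as y_min, x_min, y_max, x_max on a
    1024-bin grid normalized to the image size.
    """
    results = []
    for match in _DETECTION.finditer(text):
        y0, x0, y1, x1 = (int(v) / 1024 for v in match.groups()[:4])
        results.append(
            {
                "label": match.group(5).strip(),
                # Reorder to the common (x_min, y_min, x_max, y_max) layout.
                "box": (x0 * width, y0 * height, x1 * width, y1 * height),
            }
        )
    return results


if __name__ == "__main__":
    # Made-up example string, not real model output.
    sample = "<loc0256><loc0512><loc0768><loc1023> android"
    print(parse_detections(sample, width=1024, height=1024))
```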

Optical Character Recognition (OCR)

Result: A cow standing on a beach next to a warning sign.

Result:

WARNING DANGEROUS

RIP CURRENT


Get Started Today

Ready to discover the potential of PaliGemma 2? Here is how you can explore the mix model capabilities:

  • Try out the mix model with a few clicks: Explore the mix model capabilities directly on the Hugging Face demo.
  • Learn how to run the model: Try out the Keras inference notebook directly in Google Colab or locally.
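Besides the Keras notebook, a mix checkpoint can also be run locally through Hugging Face transformers. The following is a minimal sketch rather than the official example. It assumes the `google/paligemma2-3b-mix-224` model identifier, installed `transformers`, `torch`, and `Pillow` packages, and that your transformers version appends the prompt's trailing newline in the processor. The function name `generate_text` is hypothetical.

```python
def generate_text(image_path, prompt,
                  model_id="google/paligemma2-3b-mix-224",
                  max_new_tokens=64):
    """Run one prompt against one image and return the generated text.

    Heavy imports are deferred so that merely defining this function does
    not require the libraries to be installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    # model_id is an assumption; downloading may require accepting the
    # model license on the hub first.
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens so only the newly generated text is decoded.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

A call such as `generate_text("cow.jpg", "<image>caption en")` would request a short English caption, where `cow.jpg` stands in for any local image file.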

While PaliGemma 2 mix has strong performance across multiple tasks, you’ll get the best results by fine-tuning PaliGemma 2 on your own task or domain. To learn how to do it, dive into our comprehensive documentation, check our official example notebooks for Keras and JAX, or use the Hugging Face transformers example. We’re looking forward to seeing what you build with it!
