Article PaliGemma – Google's Cutting-Edge Open Vision Language Model May 14, 2024 • 233
Post 1761 New open Vision Language Model by @Google: PaliGemma. Comes in 3B, pretrained, mix and fine-tuned models in 224, 448 and 896 resolution. Combination of Gemma 2B LLM and SigLIP image encoder. Supported in transformers. PaliGemma can do: image segmentation and detection; detailed document understanding and reasoning; visual question answering, captioning and any other VLM task. Read our blog: hf.co/blog/paligemma. Try the demo: hf.co/spaces/google/paligemma. Check out the Spaces and the models all in the collection: google/paligemma-release-6643a9ffbf57de2ae0448dda. Collection of fine-tuned PaliGemma models: google/paligemma-ft-models-6643b03efb769dad650d2dda. 13 replies
Salesforce/xgen-mm-phi3-mini-instruct-r-v1 Image-Text-to-Text • Updated Sep 18, 2024 • 1.65k • 185
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 34
[lecture artifacts] aligning open language models Collection artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17, 2024 • 56
Article Fine-tuning a large language model on Kaggle Notebooks (or even on your own computer) for solving real-world tasks By lmassaron • Feb 21, 2024 • 14
Article Design choices for Vision Language Models in 2024 By gigant • Apr 16, 2024 • 25