HuggingFaceM4/Idefics3-8B-Llama3

Image-Text-to-Text


HugoLaurencon committed on Aug 6, 2024

Commit 86e519e (verified) · 1 parent: 657234f

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -60,7 +60,7 @@ Idefics3 demonstrates a net improvement over Idefics2, especially in document un
 | **Idefics3-8B** | 46.6             | 58.4                   | 55.9                | 87.7                 | 74.9              |
-**Idefics2 introduces several changes compared to Idefics2:**
+**Idefics3 introduces several changes compared to Idefics2:**
 - We use 169 visual tokens to encode a image of size 364x364. Each image is divided into several sub images of sizes at most 364x364, which are then encoded separately.
 - For the fine-tuning datasets, we have extended [The Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) and added several datasets, including [Docmatix](HuggingFaceM4/Docmatix). We will push soon these datasets to the same repo of The Cauldron (TODO).
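The image splitting described in the README lines above can be illustrated with a short sketch. It assumes a plain row-major grid split into sub-images of at most 364x364 pixels and charges the quoted 169 visual tokens per sub-image. The helper names `split_into_tiles` and `count_visual_tokens` are hypothetical, and the real Idefics3 processor may resize images before splitting or add further tokens, which this sketch does not model.

```python
# Minimal sketch, assuming a simple grid split; this is NOT the actual Idefics3 processor.

# Values quoted in the README: each sub-image is at most 364x364 pixels
# and is encoded into 169 visual tokens.
TILE_SIZE = 364
TOKENS_PER_TILE = 169


def split_into_tiles(width: int, height: int, tile: int = TILE_SIZE) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes covering the image, each at most tile x tile.

    Hypothetical helper: tiles are laid out row by row; edge tiles are cropped
    to the image border, so they may be smaller than tile x tile.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


def count_visual_tokens(width: int, height: int) -> int:
    """Hypothetical helper: total visual tokens if every sub-image costs TOKENS_PER_TILE."""
    return len(split_into_tiles(width, height)) * TOKENS_PER_TILE


if __name__ == "__main__":
    # A 1000x700 image needs a 3x2 grid of sub-images under this assumption.
    print(len(split_into_tiles(1000, 700)), count_visual_tokens(1000, 700))
```

Under these assumptions a 364x364 image costs exactly 169 tokens, while a 1000x700 image is split into 6 sub-images and costs 6 x 169 = 1014 tokens, so the visual token budget grows with the number of sub-images rather than staying fixed per image.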