Update README.md

README.md
vdr-2b-multi-v1 is a multilingual embedding model designed for visual document retrieval.

- **Matryoshka Representation Learning**: You can reduce the vector size by 3x and still keep 98% of the embedding quality.
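
In practice this means you can keep just the leading dimensions of an embedding and re-normalize. A minimal sketch of that truncation (the 1536 and 512 sizes below are illustrative assumptions, not read from the model config):

```python
import torch
import torch.nn.functional as F

def truncate_embedding(emb: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep only the first `dim` components of an MRL embedding, then re-normalize."""
    return F.normalize(emb[..., :dim], dim=-1)

full = F.normalize(torch.randn(1536), dim=-1)   # stand-in for a real model output
small = truncate_embedding(full, 512)           # 3x smaller vector
```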
# Usage

The model uses bf16 tensors and allocates ~4.4GB of VRAM when loaded. You can easily run inference and generate embeddings with 768 image patches and a batch size of 16, even on a cheap NVIDIA T4 GPU. The table below reports the memory footprint (GB) with HuggingFace Transformers at different batch sizes, with a maximum of 768 image patches.

| Batch Size | GPU Memory (GB) |
|------------|-----------------|
| 4          | 6.9             |
| 8          | 8.8             |
| 16         | 11.5            |
| 32         | 19.7            |
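
If you want to sanity-check these numbers on your own hardware, the sketch below is one hedged way to do it; it assumes the checkpoint loads through the generic `AutoModel` class with `trust_remote_code`, which may differ from the loading code used in the snippets further down:

```python
import torch
from transformers import AutoModel

# Assumption: AutoModel resolves the model's custom class via trust_remote_code.
model = AutoModel.from_pretrained(
    "llamaindex/vdr-2b-multi-v1",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GB allocated")
```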

Generating embeddings with vdr-2b-multi-v1 is easier than ever, thanks to the direct SentenceTransformers and LlamaIndex integrations. Get started with just a few lines of code:

<details open>
<summary>
via LlamaIndex
</summary>

```bash
pip install -U llama-index-embeddings-huggingface
```

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

model = HuggingFaceEmbedding(
    model_name_or_path="llamaindex/vdr-2b-multi-v1",
    device="mps",  # Apple Silicon; use "cuda" on NVIDIA GPUs or "cpu" as a fallback
    trust_remote_code=True,
)

embeddings = model.get_image_embedding("image.png")
```
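
Text queries go through the standard LlamaIndex embedding interface. A minimal follow-up, reusing the `model` object above (the query string is just an example):

```python
query_embedding = model.get_query_embedding("What is the projected revenue for 2025?")
```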

</details>

<details>
<summary>
via HuggingFace Transformers
</summary>

…

</details>

<details>
<summary>
via SentenceTransformers
</summary>

…

</details>
# Training
The model is based on [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLight/dse-qwen2-2b-mrl-v1) and was trained on the new [vdr-multilingual-train](https://huggingface.co/datasets/llamaindex/vdr-multilingual-train) dataset, which consists of 500k high-quality, multilingual query-image pairs. It was trained for 1 epoch using the [DSE approach](https://arxiv.org/abs/2406.11251), with a batch size of 128 and hard-mined negatives.
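
For intuition, the DSE-style objective is an InfoNCE contrastive loss: each query is scored against every document page in the batch (plus the hard-mined negatives) and trained to rank its own page first. A minimal sketch of that loss, not the actual training code (the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def dse_style_loss(query_emb: torch.Tensor,
                   doc_emb: torch.Tensor,
                   temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss: row i of doc_emb is the positive page for query i;
    any extra rows appended to doc_emb act as hard-mined negatives."""
    query_emb = F.normalize(query_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    # Similarity of every query against every candidate page.
    logits = query_emb @ doc_emb.T / temperature
    labels = torch.arange(query_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```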