Gemma Embeddings v1.0
GemmaEmbed is a dense-vector embedding model trained specifically for retrieval. As of December 12, 2024, GemmaEmbed ranks #1 overall on the MTEB leaderboard, with a score of 72.72.
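To illustrate what retrieval with dense vectors typically looks like, here is a minimal sketch that ranks documents by cosine similarity to a query. It does not load GemmaEmbed. The vectors are toy placeholders standing in for model outputs, `cosine_rank` is a hypothetical helper, and cosine similarity is an assumed scoring function that this card does not specify.

```python
import numpy as np


def cosine_rank(query_vec, doc_vecs):
    """Rank documents by cosine similarity to the query (hypothetical helper)."""
    # Normalize to unit length so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Sort document indices from most to least similar.
    order = np.argsort(-scores)
    return order, scores[order]


# Toy 3-dimensional placeholders; real embeddings would come from the model.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

order, scores = cosine_rank(query, docs)
print(order.tolist())  # document indices, best match first
```

In practice the document vectors would be computed once and stored, so only the query needs to be embedded at search time.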
Important Notes
- This is not an official Google product.
- This is a research project.
Results summary
Comparison with BGE-EN-ICL and NV-Embed-v2 on each MTEB task category (number of tasks in parentheses):
Model | Total (56) | Classification (12) | Classification Pair (3) | STS (10) | Clustering (11) | Reranking (4) | Retrieval (15) | Summary (1) |
---|---|---|---|---|---|---|---|---|
bge-en-icl | 0.7167 | 0.8895 | 0.8814 | 0.8425 | 0.5789 | 0.5986 | 0.6216 | 0.3077 |
NV-Embed-v2 | 0.7231 | 0.9037 | 0.8867 | 0.8431 | 0.5846 | 0.6065 | 0.6265 | 0.3070 |
Gemma-Embeddings-v1.0 | 0.7272 | 0.9000 | 0.8809 | 0.8423 | 0.5826 | 0.6214 | 0.6371 | 0.4052 |
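
The Total column appears to be the mean over all 56 tasks, which equals the per-category averages weighted by the number of tasks in each category. The sketch below checks this for the Gemma-Embeddings-v1.0 row. The weighting scheme is an assumption inferred from the table, not something this card states, and `weighted_total` is a hypothetical helper.

```python
# Per-category (task count, average score) for Gemma-Embeddings-v1.0,
# copied from the table above.
CATEGORIES = {
    "Classification": (12, 0.9000),
    "Classification Pair": (3, 0.8809),
    "STS": (10, 0.8423),
    "Clustering": (11, 0.5826),
    "Reranking": (4, 0.6214),
    "Retrieval": (15, 0.6371),
    "Summary": (1, 0.4052),
}


def weighted_total(categories):
    """Return (number of tasks, task-count-weighted mean of category scores)."""
    n_tasks = sum(n for n, _ in categories.values())
    weighted_sum = sum(n * score for n, score in categories.values())
    return n_tasks, weighted_sum / n_tasks


n_tasks, total = weighted_total(CATEGORIES)
print(n_tasks, round(total, 4))
```

Under this assumption the result matches the reported Total of 0.7272 to within rounding. It also shows why the 15 Retrieval tasks and 12 Classification tasks dominate the overall score, while the single Summary task contributes little despite the large gap in that column.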
Model & Data
Our base encoder model is Gemma2 9B.
We use the BGE-EN-ICL training data.
Research Team
- Nicholas Monath
- Michael Boratko
- Seungyeon Kim
- Andrew McCallum
- Rob Fergus
- Manzil Zaheer
Evaluation results
- accuracy on MTEB AmazonCounterfactualClassification (en), test set, self-reported: 94.627
- f1 on MTEB AmazonCounterfactualClassification (en), test set, self-reported: 91.931
- f1_weighted on MTEB AmazonCounterfactualClassification (en), test set, self-reported: 94.770
- ap on MTEB AmazonCounterfactualClassification (en), test set, self-reported: 77.826
- ap_weighted on MTEB AmazonCounterfactualClassification (en), test set, self-reported: 77.826
- main_score on MTEB AmazonCounterfactualClassification (en), test set, self-reported: 94.627
- accuracy on MTEB AmazonPolarityClassification (default), test set, self-reported: 97.038
- f1 on MTEB AmazonPolarityClassification (default), test set, self-reported: 97.038
- f1_weighted on MTEB AmazonPolarityClassification (default), test set, self-reported: 97.038
- ap on MTEB AmazonPolarityClassification (default), test set, self-reported: 95.872