arxiv:2409.10173

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Published on Sep 16, 2024
· Submitted by akhaliq on Sep 17, 2024

Abstract

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.
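
As a concrete illustration of the task adapters and Matryoshka truncation described above, here is a minimal usage sketch. The task names and the truncate_dim argument follow the jinaai/jina-embeddings-v3 model card's custom encode() interface; verify against the current model card before relying on them:

```python
# Minimal sketch: selecting a task-specific LoRA adapter and truncating
# Matryoshka embeddings. Task names and the truncate_dim argument are taken
# from the jinaai/jina-embeddings-v3 model card; verify there before use.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Documented task adapters: "retrieval.query", "retrieval.passage",
# "separation" (clustering), "classification", "text-matching".
queries = ["What is Matryoshka Representation Learning?"]
q_emb = model.encode(queries, task="retrieval.query")

# Thanks to Matryoshka training, the leading dimensions can be kept and the
# rest truncated with little loss in quality.
q_emb_small = model.encode(queries, task="retrieval.query", truncate_dim=256)
```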

Community


Does it support multi-modal input?

Paper author

@HeNa111 , jina-embeddings-v3 supports only text. However, we recently released jina-clip-v2 which is similar to jina-embeddings-v3 and additionally supports images.
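
For reference, a minimal sketch of encoding text and images with jina-clip-v2. The encode_text / encode_image method names follow the model card's custom interface but are not guaranteed here; check https://huggingface.co/jinaai/jina-clip-v2 for the exact signatures:

```python
# Hedged sketch: multimodal embeddings with jina-clip-v2.
# Method names (encode_text / encode_image) are taken from the model card;
# verify against the current model card before relying on them.
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

texts = ["a photo of a cat"]
images = ["https://example.com/cat.jpg"]  # placeholder; URLs or PIL images

text_emb = model.encode_text(texts)
image_emb = model.encode_image(images)
```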

Hi everyone!
I am currently working on a project focused on asymmetric semantic search involving hard negative sentences, and I would like to fine-tune a model for this setting. I am looking for a practical example to better understand the process.

Could you please provide one?
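
One common recipe (a sketch under stated assumptions, not an official answer from the authors) uses sentence-transformers with MultipleNegativesRankingLoss over (query, positive, hard negative) triplets; the triplet below is a hypothetical placeholder:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Load the model (fine-tuning the LoRA-adapter checkpoint may need extra care;
# any smaller sentence-transformers model also works for trying the recipe).
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Hypothetical (query, relevant passage, hard negative) triplets.
train_examples = [
    InputExample(texts=[
        "how do I reset my password?",                       # query (anchor)
        "To reset your password, open Settings > Account.",  # positive
        "Password strength rules are listed in the FAQ.",    # hard negative
    ]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# MultipleNegativesRankingLoss treats the explicit third text as a hard
# negative and all other in-batch positives as additional negatives, which
# suits asymmetric query-document retrieval.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```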


Models citing this paper 5


Datasets citing this paper 0


Spaces citing this paper 17

Collections including this paper 6