maidalun1020/bce-reranker-base_v1

Text Classification

sentence-transformers

PyTorch

Transformers

xlm-roberta

maidalun1020 committed on Jan 31, 2024

Commit eaa31a5 (verified), 1 parent: c74b839

Update README.md

README.md +5 -21

@@ -36,27 +36,11 @@ language:
   <a href="https://github.com/netease-youdao/BCEmbedding">GitHub</a>
 </p>
-### Our Goals
-Provide a bilingual and crosslingual two-stage retrieval model repository for the RAG community, which can be used directly without finetuning, including `EmbeddingModel` and `RerankerModel`:
-- One Model: `EmbeddingModel` handles **bilingual and crosslingual** retrieval tasks in English and Chinese. `RerankerModel` supports **English, Chinese, Japanese and Korean**.
-- One Model: **Covers common business application scenarios with RAG optimization**, e.g. Education, Medical Scenario, Law, Finance, Literature, FAQ, Textbook, Wikipedia, General Conversation.
-- Easy to Integrate: We provide an **API** in `BCEmbedding` for LlamaIndex and LangChain integrations.
-- Other Points:
-  - `RerankerModel` supports **reranking of long passages (more than 512 tokens)**;
-  - `RerankerModel` provides a **meaningful relevance score** that helps to remove low-quality passages.
-  - `EmbeddingModel` **does not need specific instructions**.
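-  Give the RAG community a Chinese-English bilingual and crosslingual two-stage retrieval model repository that can be used out of the box, with as little user finetuning as possible, including `EmbeddingModel` and `RerankerModel`.
-  - One model is enough: `EmbeddingModel` covers **Chinese-English bilingual and crosslingual** retrieval tasks, with particular strength in crosslingual retrieval. `RerankerModel` supports **Chinese, English, Japanese and Korean** and crosslingual combinations of them.
-  - One model is enough: **covers common business application domains** (optimized for many common RAG scenarios), such as education, medical, law, finance, research papers, customer service (FAQ), books, encyclopedias, and general QA. Users can apply it directly in these domains without finetuning.
-  - Easy to integrate: `EmbeddingModel` and `RerankerModel` provide **integration interfaces** for LlamaIndex and LangChain, so users can easily integrate them into existing products.
-  - Other features:
-    - `RerankerModel` supports **reranking of long passages (more than 512 tokens)**;
-    - `RerankerModel` gives a meaningful **relevance score** that helps **filter out low-quality retrieved passages**;
-    - `EmbeddingModel` **does not need a "carefully designed" instruction** and recalls as many useful passages as possible.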
+Key Features:
+- Multilingual and crosslingual capability in English, Chinese, Japanese and Korean;
+- RAG adaptation for more real-world business domains, including Education, Law, Finance, Medical, Literature, FAQ, Textbook, Wikipedia, etc.;
+- <a href="https://github.com/netease-youdao/BCEmbedding">BCEmbedding</a> handles reranking of long passages that exceed the 512-token limit;
+- RerankerModel provides a **meaningful relevance score** that shows how relevant each passage is to the query and can be used to filter out low-quality passages.
 Related link for **EmbeddingModel** : [bce-embedding-base_v1](https://huggingface.co/maidalun1020/bce-embedding-base_v1)
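
The last two features in the updated README (reranking passages longer than 512 tokens, and using the relevance score to filter out weak passages) can be illustrated with a small sketch. This is not the BCEmbedding implementation, which the diff does not describe. It assumes a sliding-window strategy that keeps the best window score per passage, uses whitespace-separated words as a stand-in for model tokens, and plugs in a hypothetical `toy_overlap_scorer` where a real reranker call would go. The `threshold`, `window`, and `stride` values are arbitrary illustration values.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, passage_chunk) and returns a relevance score.
# In practice this would wrap a real reranker model; here it is a pluggable stub.
Scorer = Callable[[str, str], float]


def split_into_windows(tokens: Sequence[str], window: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last window reached the end of the passage
        start += stride
    return windows


def score_long_passage(query: str, passage: str, scorer: Scorer,
                       window: int = 512, stride: int = 256) -> float:
    """Score a passage of any length as the maximum score over its windows."""
    tokens = passage.split()  # assumption: whitespace words stand in for model tokens
    chunks = split_into_windows(tokens, window, stride)
    return max(scorer(query, " ".join(chunk)) for chunk in chunks)


def rerank(query: str, passages: Sequence[str], scorer: Scorer,
           threshold: float = 0.35, window: int = 512,
           stride: int = 256) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs at or above `threshold`, best first."""
    scored = [(p, score_long_passage(query, p, scorer, window, stride)) for p in passages]
    kept = [(p, s) for p, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def toy_overlap_scorer(query: str, chunk: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0


if __name__ == "__main__":
    long_passage = "filler " * 600 + "the reranker assigns a relevance score"
    passages = [long_passage, "unrelated text about cooking"]
    for passage, score in rerank("reranker score", passages, toy_overlap_scorer):
        print(round(score, 2), passage[-40:])
```

Taking the maximum over windows means a long passage is kept when any one window is relevant, which suits filtering. Averaging the window scores would instead penalize long passages that are only partly on topic.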