---
license: apache-2.0
language:
- en
pipeline_tag: sentence-similarity
---

This repository contains the files needed to run BM25 searches with [FastEmbed](https://github.com/qdrant/fastembed).

[BM25 (Best Matching 25)](https://en.wikipedia.org/wiki/Okapi_BM25) is a ranking function that search engines use to estimate how relevant each document is to a given search query.

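To make the ranking function concrete, here is a minimal pure-Python sketch of BM25 scoring over a toy corpus. It is illustrative only and is not FastEmbed's implementation: the whitespace tokenizer, the helper names (`tokenize`, `bm25_score`), the parameters `k1=1.2` and `b=0.75`, and the Lucene-style IDF formula are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace
    # (no stemming, no stopword removal).
    return text.lower().split()


def bm25_score(query: str, doc: str, corpus: list[str],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Score one document against a query with a textbook BM25 formula."""
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    doc_tokens = tokenize(doc)
    term_freqs = Counter(doc_tokens)
    # Length normalization: longer-than-average documents are penalized.
    norm = 1 - b + b * len(doc_tokens) / avg_len

    score = 0.0
    for term in set(tokenize(query)):
        tf = term_freqs.get(term, 0)
        if tf == 0:
            continue
        # Number of documents in the corpus that contain the term.
        df = sum(1 for d in tokenized if term in d)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return score


corpus = [
    "the quick brown fox",
    "the lazy dog sleeps",
    "a quick dog runs fast",
]
scores = [bm25_score("quick dog", doc, corpus) for doc in corpus]
best = corpus[scores.index(max(scores))]
print(best)  # prints: a quick dog runs fast
```

The third document ranks highest because it is the only one that contains both query terms.
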
### Usage

> Note:
> This model is intended for use with Qdrant. Sparse vectors must be configured with [Modifier.IDF](https://qdrant.tech/documentation/concepts/indexing/?q=modifier#idf-modifier).

Here's an example of BM25 with [FastEmbed](https://github.com/qdrant/fastembed).

```py
from fastembed import SparseTextEmbedding

documents = [
    "You should stay, study and sprint.",
    "History can only prepare us to be surprised yet again.",
]

# Load the BM25 sparse embedding model.
model = SparseTextEmbedding(model_name="Qdrant/bm25")
# embed() returns a generator, so materialize it into a list.
embeddings = list(model.embed(documents))

# [
#     SparseEmbedding(
#         values=array([1.67419738, 1.67419738, 1.67419738, 1.67419738]),
#         indices=array([171321964, 1881538586, 150760872, 1932363795])),
#     SparseEmbedding(
#         values=array([1.66973021, 1.66973021, 1.66973021, 1.66973021, 1.66973021]),
#         indices=array([578407224, 1849833631, 1008800696, 2090661150, 1117393019]))
# ]
```

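The values in the output above can be reproduced by hand, which helps explain why the IDF modifier is required. The sketch below assumes the standard BM25 term-frequency component with `k=1.2`, `b=0.75`, and an average document length of 256 tokens. These parameter values and the helper name `bm25_tf_weight` are assumptions for illustration, chosen because they match the numbers shown; they are not taken from this repository. The token counts (4 and 5) are read off the number of indices in each embedding.

```python
def bm25_tf_weight(tf: int, doc_len: int, k: float = 1.2,
                   b: float = 0.75, avg_len: float = 256.0) -> float:
    """Term-frequency part of BM25 for one token (no IDF factor)."""
    return tf * (k + 1) / (tf + k * (1 - b + b * doc_len / avg_len))


# Each retained token occurs once; the first document yields 4 tokens, the second 5.
first = bm25_tf_weight(tf=1, doc_len=4)
second = bm25_tf_weight(tf=1, doc_len=5)

print(round(first, 5), round(second, 5))
```

Because these weights contain no IDF factor, the inverse document frequency has to be supplied by Qdrant through `Modifier.IDF`.
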