---
base_model: BAAI/bge-base-en-v1.5
language:
- fr
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:47560
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Qui a écouté le roi Asa et envoyé son armée contre les villes d'Israël?
sentences:
- Ben-Hadad.
- Il se prosterna devant le roi, le visage contre terre.
- Baescha, fils d'Achija.
- source_sentence: Quelle est l'importance de distribuer tous ses biens aux pauvres
sans charité?
sentences:
- Adina, fils de Schiza, était le chef des Rubénites.
- Distribuer tous ses biens aux pauvres sans charité ne sert à rien.
- L'Éternel.
- source_sentence: Qui sont les enfants du père d'Étham?
sentences:
- Jizreel, Jischma, Jidbasch et leur sœur Hatselelponi.
- Chaque division comptait vingt-quatre mille hommes.
- 'Hosa était un fils de Merari et il avait quatre fils: Schimri, Hilkija, Thebalia,
et Zacharie.'
- source_sentence: Combien de temps Nadab, fils de Jéroboam, a-t-il régné sur Israël?
sentences:
- Ils sont des serviteurs par le moyen desquels les frères ont cru, selon que le
Seigneur l'a donné à chacun.
- 'Sept fils: Jeusch, Benjamin, Éhud, Kenaana, Zéthan, Tarsis et Achischachar, enregistrés
au nombre de dix-sept mille deux cents.'
- Deux ans.
- source_sentence: Quand les Lévites devaient-ils se présenter pour louer et célébrer
l'Éternel?
sentences:
- Chaque matin et chaque soir.
- Cinq mille talents d'or et dix mille talents d'argent ont été donnés.
- Il doit demeurer circoncis.
model-index:
- name: BGE base bible test
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.13359388879019363
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.18795523183513946
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.21389234322259726
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.25102149582519095
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.13359388879019363
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.06265174394504648
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.04277846864451945
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0251021495825191
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.13359388879019363
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.18795523183513946
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.21389234322259726
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.25102149582519095
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.18816833747648484
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.16858798117458645
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.17400088915411802
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.12773139101083675
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.18546811156510926
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.20572037662106946
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.24213892343222598
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.12773139101083675
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.061822703855036416
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.041144075324213894
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0242138923432226
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.12773139101083675
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.18546811156510926
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.20572037662106946
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.24213892343222598
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.18151482198424093
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.1625760305898876
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.16802226648065993
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.12488896784508793
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.17463137324569195
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.19737075857168235
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.23272339669568307
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.12488896784508793
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.058210457748563975
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.03947415171433648
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.023272339669568307
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.12488896784508793
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.17463137324569195
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.19737075857168235
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.23272339669568307
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.17440736005896854
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.156282728049472
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.16141647615447188
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.10943329188132883
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.15686622845976195
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.17853970509859654
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.20838514833895896
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.10943329188132883
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.052288742819920644
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.03570794101971931
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0208385148338959
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.10943329188132883
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.15686622845976195
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.17853970509859654
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.20838514833895896
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.15566336146326976
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.13917031134121227
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.14405644027137798
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.08935867827322792
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.13181737431160065
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.15011547344110854
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.17694084206786284
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.08935867827322792
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.04393912477053354
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.03002309468822171
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.01769408420678629
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.08935867827322792
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.13181737431160065
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.15011547344110854
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.17694084206786284
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.13031373727839585
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.11575599150656894
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.12066444582998255
name: Cosine Map@100
---
# BGE base bible test
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
- json
- **Language:** fr
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Steve77/bge-base-bible-retrieval")
# Run inference
sentences = [
"Quand les Lévites devaient-ils se présenter pour louer et célébrer l'Éternel?",
'Chaque matin et chaque soir.',
"Cinq mille talents d'or et dix mille talents d'argent ont été donnés.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
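Because the model was trained with MatryoshkaLoss over dimensions 768/512/256/128/64, its embeddings can be truncated to a shorter prefix and re-normalized, trading a little retrieval quality for smaller vectors. A minimal NumPy sketch of that post-processing step (the helper name is illustrative, not part of the library API):

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length,
    so cosine similarity still behaves sensibly on the shortened vectors."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Stand-in for real model output: three hypothetical 768-d embeddings
full = np.random.default_rng(0).normal(size=(3, 768))
small = truncate_and_renormalize(full, 256)
print(small.shape)  # (3, 256)
```

Recent Sentence Transformers releases also expose a `truncate_dim` argument on `SentenceTransformer(...)` that performs this truncation at encode time.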
## Evaluation
### Metrics
#### Information Retrieval
* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
| cosine_accuracy@1 | 0.1336 | 0.1277 | 0.1249 | 0.1094 | 0.0894 |
| cosine_accuracy@3 | 0.188 | 0.1855 | 0.1746 | 0.1569 | 0.1318 |
| cosine_accuracy@5 | 0.2139 | 0.2057 | 0.1974 | 0.1785 | 0.1501 |
| cosine_accuracy@10 | 0.251 | 0.2421 | 0.2327 | 0.2084 | 0.1769 |
| cosine_precision@1 | 0.1336 | 0.1277 | 0.1249 | 0.1094 | 0.0894 |
| cosine_precision@3 | 0.0627 | 0.0618 | 0.0582 | 0.0523 | 0.0439 |
| cosine_precision@5 | 0.0428 | 0.0411 | 0.0395 | 0.0357 | 0.03 |
| cosine_precision@10 | 0.0251 | 0.0242 | 0.0233 | 0.0208 | 0.0177 |
| cosine_recall@1 | 0.1336 | 0.1277 | 0.1249 | 0.1094 | 0.0894 |
| cosine_recall@3 | 0.188 | 0.1855 | 0.1746 | 0.1569 | 0.1318 |
| cosine_recall@5 | 0.2139 | 0.2057 | 0.1974 | 0.1785 | 0.1501 |
| cosine_recall@10 | 0.251 | 0.2421 | 0.2327 | 0.2084 | 0.1769 |
| **cosine_ndcg@10** | **0.1882** | **0.1815** | **0.1744** | **0.1557** | **0.1303** |
| cosine_mrr@10 | 0.1686 | 0.1626 | 0.1563 | 0.1392 | 0.1158 |
| cosine_map@100 | 0.174 | 0.168 | 0.1614 | 0.1441 | 0.1207 |
## Training Details
### Training Dataset
#### json
* Dataset: json
* Size: 47,560 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
* Samples:
  | anchor                                     | positive                               |
  |:-------------------------------------------|:---------------------------------------|
  | Quels sont les noms des fils de Schobal?   | Aljan, Manahath, Ébal, Schephi et Onam |
  | Quels sont les noms des fils de Tsibeon?   | Ajja et Ana                            |
  | Qui est le fils d'Ana?                     | Dischon                                |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
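Conceptually, MatryoshkaLoss applies the wrapped MultipleNegativesRankingLoss to each truncated prefix of the embeddings and sums the weighted per-dimension losses. A rough NumPy sketch of that computation (an illustration of the idea under these assumptions, not the library's actual PyTorch implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """MultipleNegativesRankingLoss sketch: in-batch cross-entropy over scaled
    cosine similarities, with the matching positive on the diagonal."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64),
                    weights=(1, 1, 1, 1, 1)) -> float:
    # Sum the base loss over each truncated prefix of the embeddings
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))
```

Training all prefixes jointly is what lets one model serve every embedding size from 64 to 768 dimensions.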
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
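In recent Sentence Transformers releases these settings map onto `SentenceTransformerTrainingArguments`. A hedged sketch of how they might be reproduced (`output_dir` and `save_strategy` are assumptions not stated in this card; `save_strategy` must match `eval_strategy` for `load_best_model_at_end` to be valid):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-bible-retrieval",  # assumed path, not stated in the card
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```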
#### All Hyperparameters