Update README.md
README.md
@@ -14236,7 +14236,7 @@ model-index:
 ---
 
 
-Fast BERT model for computing
+Fast BERT model for computing sentence embeddings in Russian. The model is based on [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) and has the same context length (2048), embedding size (312), and speed.
 
 
 
@@ -14252,6 +14252,7 @@ print(util.dot_score(embeddings, embeddings))
 ```
 
 ## Metrics
+
 Model scores on the [encodechka](https://github.com/avidale/encodechka) benchmark:
 
 | model | CPU | GPU | size | Mean S | Mean S+W | dim |
@@ -14274,5 +14275,38 @@ print(util.dot_score(embeddings, embeddings))
 | intfloat/multilingual-e5-small | 0.822 | 0.714 | 0.457 | 0.758 | 0.957 | 0.761 | 0.779 | 0.691 | 0.234 | 0.275 |
 | cointegrated/rubert-tiny2 | 0.750 | 0.651 | 0.417 | 0.737 | 0.937 | 0.746 | 0.757 | 0.638 | 0.360 | 0.386 |
 
+Model scores on the [ruMTEB](https://habr.com/ru/companies/sberdevices/articles/831150/) benchmark:
+
+|Task | Metric | sbert_large_mt_nlu_ru | sbert_large_nlu_ru | rubert-tiny2 | rubert-tiny-turbo | multilingual-e5-small | multilingual-e5-base | multilingual-e5-large |
+|:----------------------------------|:--------------------|-----------------------:|--------------------:|----------------:|------------------:|----------------------:|---------------------:|----------------------:|
+|CEDRClassification | Accuracy | 0.368 | 0.358 | 0.369 | 0.390 | 0.401 | 0.423 | **0.448** |
+|GeoreviewClassification | Accuracy | 0.397 | 0.400 | 0.396 | 0.414 | 0.447 | 0.461 | **0.497** |
+|GeoreviewClusteringP2P | V-measure | 0.584 | 0.590 | 0.442 | 0.597 | 0.586 | 0.545 | **0.605** |
+|HeadlineClassification | Accuracy | 0.772 | **0.793** | 0.742 | 0.686 | 0.732 | 0.757 | 0.758 |
+|InappropriatenessClassification | Accuracy | **0.646** | 0.625 | 0.586 | 0.591 | 0.592 | 0.588 | 0.616 |
+|KinopoiskClassification | Accuracy | 0.503 | 0.495 | 0.491 | 0.505 | 0.500 | 0.509 | **0.566** |
+|RiaNewsRetrieval | NDCG@10 | 0.214 | 0.111 | 0.140 | 0.513 | 0.700 | 0.702 | **0.807** |
+|RuBQReranking | MAP@10 | 0.561 | 0.468 | 0.461 | 0.622 | 0.715 | 0.720 | **0.756** |
+|RuBQRetrieval | NDCG@10 | 0.298 | 0.124 | 0.109 | 0.517 | 0.685 | 0.696 | **0.741** |
+|RuReviewsClassification | Accuracy | 0.589 | 0.583 | 0.570 | 0.607 | 0.612 | 0.630 | **0.653** |
+|RuSTSBenchmarkSTS | Pearson correlation | 0.712 | 0.588 | 0.694 | 0.787 | 0.781 | 0.796 | **0.831** |
+|RuSciBenchGRNTIClassification | Accuracy | 0.542 | 0.539 | 0.456 | 0.529 | 0.550 | 0.563 | **0.582** |
+|RuSciBenchGRNTIClusteringP2P | V-measure | **0.522** | 0.504 | 0.414 | 0.481 | 0.511 | 0.516 | 0.520 |
+|RuSciBenchOECDClassification | Accuracy | 0.438 | 0.430 | 0.355 | 0.415 | 0.427 | 0.423 | **0.445** |
+|RuSciBenchOECDClusteringP2P | V-measure | **0.473** | 0.464 | 0.381 | 0.411 | 0.443 | 0.448 | 0.450 |
+|SensitiveTopicsClassification | Accuracy | **0.285** | 0.280 | 0.220 | 0.244 | 0.228 | 0.234 | 0.257 |
+|TERRaClassification | Average Precision | 0.520 | 0.502 | 0.519 | 0.563 | 0.551 | 0.550 | **0.584** |
+
+|Task type | Metric | sbert_large_mt_nlu_ru | sbert_large_nlu_ru | rubert-tiny2 | rubert-tiny-turbo | multilingual-e5-small | multilingual-e5-base | multilingual-e5-large |
+|:----------------------------------|:--------------------|-----------------------:|--------------------:|----------------:|------------------:|----------------------:|----------------------:|---------------------:|
+|Classification | Accuracy | 0.554 | 0.552 | 0.514 | 0.535 | 0.551 | 0.561 | **0.588** |
+|Clustering | V-measure | **0.526** | 0.519 | 0.412 | 0.496 | 0.513 | 0.503 | 0.525 |
+|MultiLabelClassification | Accuracy | 0.326 | 0.319 | 0.294 | 0.317 | 0.314 | 0.329 | **0.353** |
+|PairClassification | Average Precision | 0.520 | 0.502 | 0.519 | 0.563 | 0.551 | 0.550 | **0.584** |
+|Reranking | MAP@10 | 0.561 | 0.468 | 0.461 | 0.622 | 0.715 | 0.720 | **0.756** |
+|Retrieval | NDCG@10 | 0.256 | 0.118 | 0.124 | 0.515 | 0.697 | 0.699 | **0.774** |
+|STS | Pearson correlation | 0.712 | 0.588 | 0.694 | 0.787 | 0.781 | 0.796 | **0.831** |
+|Average | Average | 0.494 | 0.438 | 0.431 | 0.548 | 0.588 | 0.594 | **0.630** |
+
 
 
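The README does not say how the per-task-type ruMTEB table is derived from the per-task table. The numbers are consistent with a simple two-level average: scores are first averaged within each task type, and "Average" is then the unweighted mean of the seven task-type scores. The grouping of tasks into types used below is inferred from the task names. It reproduces the rubert-tiny-turbo column to three decimals and agrees with spot checks of other columns, but treat it as an assumption, not the benchmark's official definition. `TASK_TYPES`, `type_scores`, and `overall_average` are hypothetical names chosen for this sketch.

```python
from statistics import mean

# Per-task ruMTEB scores for rubert-tiny-turbo, copied from the per-task table.
TASK_SCORES = {
    "CEDRClassification": 0.390,
    "GeoreviewClassification": 0.414,
    "GeoreviewClusteringP2P": 0.597,
    "HeadlineClassification": 0.686,
    "InappropriatenessClassification": 0.591,
    "KinopoiskClassification": 0.505,
    "RiaNewsRetrieval": 0.513,
    "RuBQReranking": 0.622,
    "RuBQRetrieval": 0.517,
    "RuReviewsClassification": 0.607,
    "RuSTSBenchmarkSTS": 0.787,
    "RuSciBenchGRNTIClassification": 0.529,
    "RuSciBenchGRNTIClusteringP2P": 0.481,
    "RuSciBenchOECDClassification": 0.415,
    "RuSciBenchOECDClusteringP2P": 0.411,
    "SensitiveTopicsClassification": 0.244,
    "TERRaClassification": 0.563,
}

# Hypothetical task-to-type grouping, inferred from task names (not stated in the README).
TASK_TYPES = {
    "Classification": [
        "GeoreviewClassification", "HeadlineClassification",
        "InappropriatenessClassification", "KinopoiskClassification",
        "RuReviewsClassification", "RuSciBenchGRNTIClassification",
        "RuSciBenchOECDClassification",
    ],
    "Clustering": [
        "GeoreviewClusteringP2P", "RuSciBenchGRNTIClusteringP2P",
        "RuSciBenchOECDClusteringP2P",
    ],
    "MultiLabelClassification": ["CEDRClassification", "SensitiveTopicsClassification"],
    "PairClassification": ["TERRaClassification"],
    "Reranking": ["RuBQReranking"],
    "Retrieval": ["RiaNewsRetrieval", "RuBQRetrieval"],
    "STS": ["RuSTSBenchmarkSTS"],
}


def type_scores(task_scores, task_types):
    """Average the per-task scores within each task type."""
    return {
        task_type: mean(task_scores[name] for name in names)
        for task_type, names in task_types.items()
    }


def overall_average(scores_by_type):
    """Unweighted mean over task types: each type counts once, however many tasks it has."""
    return mean(scores_by_type.values())


if __name__ == "__main__":
    by_type = type_scores(TASK_SCORES, TASK_TYPES)
    for task_type, score in by_type.items():
        print(f"{task_type:26s} {score:.3f}")
    print(f"{'Average':26s} {overall_average(by_type):.3f}")
```

Because each task type is weighted equally, single-task types such as STS and PairClassification carry as much weight in "Average" as the seven-task Classification group.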