diff --git "a/README.md" "b/README.md"
--- "a/README.md"
+++ "b/README.md"
@@ -7,7 +7,7 @@ tags:
- sentence-similarity
- feature-extraction
- generated_from_trainer
-- dataset_size:689221
+- dataset_size:131566
- loss:MultipleNegativesRankingLoss
- loss:CoSENTLoss
- loss:GISTEmbedLoss
@@ -22,6 +22,14 @@ datasets:
- allenai/scitail
- sentence-transformers/xsum
- sentence-transformers/sentence-compression
+- allenai/sciq
+- allenai/qasc
+- allenai/openbookqa
+- sentence-transformers/msmarco-msmarco-distilbert-base-v3
+- sentence-transformers/natural-questions
+- sentence-transformers/trivia-qa
+- sentence-transformers/quora-duplicates
+- sentence-transformers/gooaq
metrics:
- pearson_cosine
- spearman_cosine
@@ -34,43 +42,34 @@ metrics:
- pearson_max
- spearman_max
widget:
-- source_sentence: What are the exceptions in the constitution that require special
- considerations to amend?
+- source_sentence: Centrosome-independent mitotic spindle formation in vertebrates.
sentences:
- - The river makes a distinctive turn to the north near Chur.
- - The Victorian Constitution can be amended by the Parliament of Victoria, except
- for certain "entrenched" provisions that require either an absolute majority in
- both houses, a three-fifths majority in both houses, or the approval of the Victorian
- people in a referendum, depending on the provision.
- - A new arrangement of the theme, once again by Gold, was introduced in the 2007
- Christmas special episode, "Voyage of the Damned"; Gold returned as composer for
- the 2010 series.
-- source_sentence: What is the name of a Bodhisattva vow?
+ - Birds pair up with the same bird in mating season.
+ - We use voltage to keep track of electric potential energy.
+ - A mitotic spindle forms from the centrosomes.
+- source_sentence: A dog carrying a stick in its mouth runs through a snow-covered
+ field.
sentences:
- - In Tibetan Buddhism the teachers of Dharma in Tibet are most commonly called a
- Lama.
- - This origin of chloroplasts was first suggested by the Russian biologist Konstantin
- Mereschkowski in 1905 after Andreas Schimper observed in 1883 that chloroplasts
- closely resemble cyanobacteria.
- - The announcement came a day after Setanta Sports confirmed that it would launch
- in March as a subscription service on the digital terrestrial platform, and on
- the same day that NTL's services re-branded as Virgin Media.
-- source_sentence: Two dogs run around inside a fence.
+ - The children played on the floor.
+ - A pair of people play video games together on a couch.
+ - A animal carried a stick through a snow covered field.
+- source_sentence: A guy on a skateboard, jumping off some steps.
sentences:
- - A young woman tennis player have many tennis balls.
- - Two dogs are inside a fence.
- - A little girl in red plays tennis.
-- source_sentence: A little boy wearing a blue stiped shirt has a party hat on his
- head and is playing in a puddle.
+ - A woman is making music.
+ - a guy with a skateboard making a jump
+ - A dog holds an object in the water.
+- source_sentence: A photographer with bushy dark hair takes a photo of a skateboarder
+ at an indoor park.
sentences:
- - The party boy is playing in a puddle.
- - There is a crowd
- - Four people are skiing
-- source_sentence: Two wrestlers jump in a ring while an official watches.
+ - The person with the camera photographs the person skating.
+ - A man starring at a piece of paper.
+ - The man is riding a bike in sand.
+- source_sentence: Why did oil start getting priced in terms of gold?
sentences:
- - The man was walking.
- - Two men are dressed in makeup
- - Two wrestlers were just tagged in on a tag team match.
+ - Because oil was priced in dollars, oil producers' real income decreased.
+ - This allows all set top boxes in a household to share recordings and other media.
+ - Only the series from 2009 onwards are available on Blu-ray, except for the 1970
+ story Spearhead from Space, released in July 2013.
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
@@ -83,40 +82,70 @@ model-index:
type: sts-test
metrics:
- type: pearson_cosine
- value: 0.7827777535990615
+ value: 0.7740200646402275
name: Pearson Cosine
- type: spearman_cosine
- value: 0.7930096932283699
+ value: 0.7726824843726025
name: Spearman Cosine
- type: pearson_manhattan
- value: 0.7959463678643859
+ value: 0.7871287254831608
name: Pearson Manhattan
- type: spearman_manhattan
- value: 0.792182337344966
+ value: 0.7758049644234141
name: Spearman Manhattan
- type: pearson_euclidean
- value: 0.7948115210006163
+ value: 0.7842462717672578
name: Pearson Euclidean
- type: spearman_euclidean
- value: 0.7907409787879929
+ value: 0.7723622369393174
name: Spearman Euclidean
- type: pearson_dot
- value: 0.7150471304135075
+ value: 0.705919446324648
name: Pearson Dot
- type: spearman_dot
- value: 0.6966062484321753
+ value: 0.6867859662226861
name: Spearman Dot
- type: pearson_max
- value: 0.7959463678643859
+ value: 0.7871287254831608
name: Pearson Max
- type: spearman_max
- value: 0.7930096932283699
+ value: 0.7758049644234141
name: Spearman Max
---
# SentenceTransformer based on microsoft/deberta-v3-small
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) and [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum), [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sciq_pairs](https://huggingface.co/datasets/allenai/sciq), [qasc_pairs](https://huggingface.co/datasets/allenai/qasc), [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa), [msmarco_pairs](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3), [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions), [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) and [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
@@ -135,6 +164,14 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
- [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail)
- [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum)
- [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression)
+ - [sciq_pairs](https://huggingface.co/datasets/allenai/sciq)
+ - [qasc_pairs](https://huggingface.co/datasets/allenai/qasc)
+ - [openbookqa_pairs](https://huggingface.co/datasets/allenai/openbookqa)
+ - [msmarco_pairs](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3)
+ - [nq_pairs](https://huggingface.co/datasets/sentence-transformers/natural-questions)
+ - [trivia_pairs](https://huggingface.co/datasets/sentence-transformers/trivia-qa)
+ - [quora_pairs](https://huggingface.co/datasets/sentence-transformers/quora-duplicates)
+ - [gooaq_pairs](https://huggingface.co/datasets/sentence-transformers/gooaq)
- **Language:** en
@@ -171,9 +208,9 @@ from sentence_transformers import SentenceTransformer
model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer")
# Run inference
sentences = [
- 'Two wrestlers jump in a ring while an official watches.',
- 'Two wrestlers were just tagged in on a tag team match.',
- 'Two men are dressed in makeup',
+ 'Why did oil start getting priced in terms of gold?',
+ "Because oil was priced in dollars, oil producers' real income decreased.",
+ 'This allows all set top boxes in a household to share recordings and other media.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
@@ -217,18 +254,35 @@ You can finetune this model on your own dataset.
* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-| Metric | Value |
-|:--------------------|:----------|
-| pearson_cosine | 0.7828 |
-| **spearman_cosine** | **0.793** |
-| pearson_manhattan | 0.7959 |
-| spearman_manhattan | 0.7922 |
-| pearson_euclidean | 0.7948 |
-| spearman_euclidean | 0.7907 |
-| pearson_dot | 0.715 |
-| spearman_dot | 0.6966 |
-| pearson_max | 0.7959 |
-| spearman_max | 0.793 |
+| Metric | Value |
+|:--------------------|:-----------|
+| pearson_cosine | 0.774 |
+| **spearman_cosine** | **0.7727** |
+| pearson_manhattan | 0.7871 |
+| spearman_manhattan | 0.7758 |
+| pearson_euclidean | 0.7842 |
+| spearman_euclidean | 0.7724 |
+| pearson_dot | 0.7059 |
+| spearman_dot | 0.6868 |
+| pearson_max | 0.7871 |
+| spearman_max | 0.7758 |
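The rows in this table can be read as follows, assuming the evaluator uses the usual STS protocol. Each metric is a Pearson or Spearman correlation between the gold similarity labels and one score computed on the embedding pairs. The four scores are cosine similarity, negated Manhattan distance, negated Euclidean distance, and dot product. The `*_max` rows report the best of the four. The stdlib-only sketch illustrates that relationship. It is not the sentence-transformers implementation. The helper names (`ranks`, `similarity_metrics`) and the toy vectors and labels are hypothetical and are not drawn from `sts-test`.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    # Spearman correlation is Pearson correlation computed on ranks.
    return pearson(ranks(x), ranks(y))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Distances are negated so that a higher score always means "more similar".
SIMILARITY_FUNCTIONS = {
    "cosine": cosine,
    "manhattan": lambda u, v: -sum(abs(a - b) for a, b in zip(u, v)),
    "euclidean": lambda u, v: -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))),
    "dot": dot,
}


def similarity_metrics(embeddings1, embeddings2, gold_scores):
    metrics = {}
    for name, fn in SIMILARITY_FUNCTIONS.items():
        scores = [fn(u, v) for u, v in zip(embeddings1, embeddings2)]
        metrics[f"pearson_{name}"] = pearson(scores, gold_scores)
        metrics[f"spearman_{name}"] = spearman(scores, gold_scores)
    # The *_max rows report the best correlation over the four similarity functions.
    metrics["pearson_max"] = max(metrics[f"pearson_{n}"] for n in SIMILARITY_FUNCTIONS)
    metrics["spearman_max"] = max(metrics[f"spearman_{n}"] for n in SIMILARITY_FUNCTIONS)
    return metrics


# Hypothetical toy data: four sentence pairs as 2-d embeddings, gold labels on a 0-5 scale.
embeddings1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
embeddings2 = [[1.0, 0.1], [1.0, 0.5], [0.5, 1.0], [0.0, 1.0]]
gold_scores = [5.0, 4.0, 2.0, 0.0]

metrics = similarity_metrics(embeddings1, embeddings2, gold_scores)
for name, value in metrics.items():
    print(f"{name:<20} {value:.4f}")
```

With these toy pairs, the cosine and negated-distance scores order the pairs exactly as the gold labels do, giving a Spearman correlation of 1.0. The two tied dot products pull `spearman_dot` below 1, so `spearman_max` comes from the other functions. The table above shows the same relationship: `pearson_max` and `spearman_max` match the Manhattan rows.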