armaniii committed
Commit 43ab933 · verified · 1 Parent(s): 5bc9321

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
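
This pooling config enables masked mean pooling (`pooling_mode_mean_tokens: true`): each sentence embedding is the average of its token embeddings, with padding positions excluded. A minimal PyTorch sketch of what that operation computes, using hypothetical tensors (not the library's internal code):

```python
import torch

# Hypothetical shapes: token_embeddings (batch, seq_len, 768), attention_mask (batch, seq_len)
token_embeddings = torch.randn(2, 16, 768)
attention_mask = torch.ones(2, 16)

# Zero out padding positions, then divide by the number of real tokens.
mask = attention_mask.unsqueeze(-1)            # (batch, seq_len, 1)
summed = (token_embeddings * mask).sum(dim=1)  # (batch, 768)
counts = mask.sum(dim=1).clamp(min=1e-9)       # (batch, 1), guard against empty rows
sentence_embeddings = summed / counts
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```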
README.md ADDED
@@ -0,0 +1,545 @@
+ ---
+ base_model: sentence-transformers/all-mpnet-base-v2
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:17093
+ - loss:CosineSimilarityLoss
+ widget:
+ - source_sentence: In the realm of genetics , it is far better to be safe than sorry
+     .
+   sentences:
+   - Marijuana use harms the brain, and legalization will increase mental health problems.
+   - We are god now !
+   - Likewise , the proposal that addictive drugs should be legalized , regulated and
+     opened to " free market dynamics " is immediately belied by the recognition that
+     the drug market for an addict is no longer a free market – it is clear that they
+     will pay any price when needing their drug .
+ - source_sentence: The worldwide anti-nuclear power movement has provided enormous
+     stimulation to the Australian movement , and the decline in nuclear power expansion
+     since the late 1970s - due substantially to worldwide citizen opposition - has
+     been a great setback for Australian uranium mining interests .
+   sentences:
+   - Just as the state has the authority ( and duty ) to act justly in allocating scarce
+     resources , in meeting minimal needs of its ( deserving ) citizens , in defending
+     its citizens from violence and crime , and in not waging unjust wars ; so too
+     does it have the authority , flowing from its mission to promote justice and the
+     good of its people , to punish the criminal .
+   - The long lead times for construction that invalidate nuclear power as a way of
+     mitigating climate change was a point recognized in 2009 by the body whose mission
+     is to promote the use of nuclear power , the International Atomic Energy Agency
+     ( IAEA ) .
+   - Gun control laws would reduce the societal costs associated with gun violence.
+ - source_sentence: Requiring uniforms enhances school security by permitting identification
+     of non-students who try to enter the campus .
+   sentences:
+   - Many students who are against school uniforms argue that they lose their
+     self identity when they lose their right to express themselves through fashion
+     .
+   - If reproductive cloning is perfected , a quadriplegic can also choose to have
+     himself cloned , so someone can take his place .
+   - A higher minimum wage might also decrease turnover and thus keep training costs
+     down , supporters say .
+ - source_sentence: Minimum wage has long been a minimum standard of living .
+   sentences:
+   - A minimum wage job is suppose to be an entry level stepping stone – not a career
+     goal .
+   - It is argued that just as it would be permissible to " unplug " and thereby cause
+     the death of the person who is using one 's kidneys , so it is permissible to
+     abort the fetus ( who similarly , it is said , has no right to use one 's body
+     's life-support functions against one 's will ) .
+   - Abortion reduces welfare costs to taxpayers .
+ - source_sentence: Fanatics of the pro – life argument are sometimes so focused on
+     the fetus that they put no value to the mother ’s life and do not even consider
+     the viability of the fetus .
+   sentences:
+   - Life is life , whether it s outside the womb or not .
+   - Legalization of marijuana is phasing out black markets and taking money away from
+     drug cartels, organized crime, and street gangs.
+   - 'Response 2 : A child is not replaceable .'
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.7294675022492696
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.7234943835496113
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.7104391963353577
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.7118078150763045
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.7212412855224142
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.7234943835496113
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.7294674862347428
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.7234943835496113
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.7294675022492696
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.7234943835496113
+       name: Spearman Max
+     - type: pearson_cosine
+       value: 0.7146126101962849
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.6886131469202397
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.7069653659670995
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.6837201725651982
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.7115078495768724
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.6886131469202397
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.7146126206763159
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.6886131469202397
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.7146126206763159
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.6886131469202397
+       name: Spearman Max
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 9a3225965996d404b775526de6dbfe85d3368642 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
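+
+ The final `Normalize()` module L2-normalizes every embedding, so dot product and cosine similarity coincide; this is why the `pearson_dot` and `pearson_cosine` values in the evaluation tables below are nearly identical. A minimal sketch to check this property (model id taken from the usage section below):
+
+ ```python
+ import numpy as np
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("armaniii/all-mpnet-base-v2-augmentation-indomain-bm25-sts")
+ emb = model.encode(["Minimum wage has long been a minimum standard of living ."])
+
+ # Every embedding has unit length, so <a, b> == cos(a, b).
+ print(np.linalg.norm(emb, axis=1))  # ~[1.0]
+ ```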
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("armaniii/all-mpnet-base-v2-augmentation-indomain-bm25-sts")
+ # Run inference
+ sentences = [
+     'Fanatics of the pro – life argument are sometimes so focused on the fetus that they put no value to the mother ’s life and do not even consider the viability of the fetus .',
+     'Life is life , whether it s outside the womb or not .',
+     'Legalization of marijuana is phasing out black markets and taking money away from drug cartels, organized crime, and street gangs.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
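+
+ Beyond pairwise similarity, the same embeddings can back a small semantic-search index. A minimal sketch using `sentence_transformers.util.semantic_search`; the corpus and query below are illustrative, not drawn from the training data:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("armaniii/all-mpnet-base-v2-augmentation-indomain-bm25-sts")
+
+ corpus = [
+     "Gun control laws would reduce the societal costs associated with gun violence.",
+     "A higher minimum wage might also decrease turnover and thus keep training costs down.",
+     "Abortion reduces welfare costs to taxpayers.",
+ ]
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+ query_embedding = model.encode("Raising the minimum wage lowers employee turnover.", convert_to_tensor=True)
+
+ # Retrieve the two most similar corpus sentences for the query.
+ hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
+ for hit in hits:
+     print(corpus[hit["corpus_id"]], round(hit["score"], 3))
+ ```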
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.7295     |
+ | **spearman_cosine** | **0.7235** |
+ | pearson_manhattan   | 0.7104     |
+ | spearman_manhattan  | 0.7118     |
+ | pearson_euclidean   | 0.7212     |
+ | spearman_euclidean  | 0.7235     |
+ | pearson_dot         | 0.7295     |
+ | spearman_dot        | 0.7235     |
+ | pearson_max         | 0.7295     |
+ | spearman_max        | 0.7235     |
+
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.7146     |
+ | **spearman_cosine** | **0.6886** |
+ | pearson_manhattan   | 0.707      |
+ | spearman_manhattan  | 0.6837     |
+ | pearson_euclidean   | 0.7115     |
+ | spearman_euclidean  | 0.6886     |
+ | pearson_dot         | 0.7146     |
+ | spearman_dot        | 0.6886     |
+ | pearson_max         | 0.7146     |
+ | spearman_max        | 0.6886     |
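+
+ Both tables come from the `EmbeddingSimilarityEvaluator` linked above. A minimal sketch of running such an evaluation yourself; the sentence pairs and gold scores here are illustrative stand-ins for the held-out sts-test split:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+
+ model = SentenceTransformer("armaniii/all-mpnet-base-v2-augmentation-indomain-bm25-sts")
+
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=["Minimum wage has long been a minimum standard of living .",
+                 "We are god now !",
+                 "Abortion reduces welfare costs to taxpayers ."],
+     sentences2=["A minimum wage job is suppose to be an entry level stepping stone .",
+                 "Life is life , whether it s outside the womb or not .",
+                 "A child is not replaceable ."],
+     scores=[0.55, 0.40, 0.30],  # illustrative gold similarities in [0, 1]
+     name="sts-test",
+ )
+ results = evaluator(model)
+ print(results)  # Pearson/Spearman correlations per similarity function
+ ```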
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 17,093 training samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1 | sentence2 | score |
+   |:--------|:----------|:----------|:------|
+   | type    | string    | string    | float |
+   | details | <ul><li>min: 7 tokens</li><li>mean: 33.23 tokens</li><li>max: 97 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 30.75 tokens</li><li>max: 96 tokens</li></ul> | <ul><li>min: 0.09</li><li>mean: 0.55</li><li>max: 0.95</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | score |
+   |:----------|:----------|:------|
+   | <code>It is true that a Colorado study found a post-legalization increase in youths being treated for marijuana exposure .</code> | <code>In Colorado , recent figures correlate with the years since marijuana legalization to show a dramatic decrease in overall highway fatalities – and a two-fold increase in the frequency of marijuana-positive drivers in fatal auto crashes .</code> | <code>0.4642857142857143</code> |
+   | <code>The idea of a school uniform is that students wear the uniform at school , but do not wear the uniform , say , at a disco or other events outside school .</code> | <code>If it means that the schoolrooms will be more orderly , more disciplined , and that our young people will learn to evaluate themselves by what they are on the inside instead of what they 're wearing on the outside , then our public schools should be able to require their students to wear school uniforms . "</code> | <code>0.5714285714285714</code> |
+   | <code>The resulting embryonic stem cells could then theoretically be grown into adult cells to replace the ailing person 's mutated cells .</code> | <code>However , there is a more serious , less cartoonish objection to turning procreation into manufacturing .</code> | <code>0.4464285714285714</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 340 evaluation samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1 | sentence2 | score |
+   |:--------|:----------|:----------|:------|
+   | type    | string    | string    | float |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 33.76 tokens</li><li>max: 105 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 31.86 tokens</li><li>max: 102 tokens</li></ul> | <ul><li>min: 0.09</li><li>mean: 0.5</li><li>max: 0.89</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | score |
+   |:----------|:----------|:------|
+   | <code>[ quoting himself from Furman v. Georgia , 408 U.S. 238 , 257 ( 1972 ) ] As such it is a penalty that ' subjects the individual to a fate forbidden by the principle of civilized treatment guaranteed by the [ Clause ] . '</code> | <code>It provides a deterrent for prisoners already serving a life sentence .</code> | <code>0.3214285714285714</code> |
+   | <code>Of those savings , $ 25.7 billion would accrue to state and local governments , while $ 15.6 billion would accrue to the federal government .</code> | <code>Jaime Smith , deputy communications director for the governor ’s office , said , “ The legalization initiative was not driven by a desire for a revenue , but it has provided a small assist for our state budget . ”</code> | <code>0.5357142857142857</code> |
+   | <code>If the uterus is designed to sustain an unborn child ’s life , do n’t unborn children have a right to receive nutrition and shelter through the one organ designed to provide them with that ordinary care ?</code> | <code>We as parents are supposed to protect our children at all costs whether they are in the womb or not .</code> | <code>0.7678571428571428</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
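+
+ A minimal sketch of the corresponding training run with the Sentence Transformers v3 trainer API, using the non-default hyperparameters listed below; the tiny inline dataset is a stand-in for the real 17,093-pair train and 340-pair eval splits with `sentence1`, `sentence2`, and `score` columns as above:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import CosineSimilarityLoss
+
+ model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
+
+ train_ds = Dataset.from_dict({
+     "sentence1": ["Minimum wage has long been a minimum standard of living ."],
+     "sentence2": ["A minimum wage job is suppose to be an entry level stepping stone ."],
+     "score": [0.55],
+ })
+ eval_ds = train_ds  # stand-in; use the real 340-pair evaluation split
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs",
+     num_train_epochs=3,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     warmup_ratio=0.1,
+     bf16=True,
+     eval_strategy="steps",
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_ds,
+     eval_dataset=eval_ds,
+     loss=CosineSimilarityLoss(model),  # MSE between cosine similarity and gold score
+ )
+ trainer.train()
+ ```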
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss | sts-test_spearman_cosine |
+ |:------:|:----:|:-------------:|:---------------:|:------------------------:|
+ | 0.0935 | 100  | 0.0151        | 0.0098          | 0.7013                   |
+ | 0.1871 | 200  | 0.0069        | 0.0112          | 0.6857                   |
+ | 0.2806 | 300  | 0.0058        | 0.0106          | 0.6860                   |
+ | 0.3742 | 400  | 0.0059        | 0.0102          | 0.6915                   |
+ | 0.4677 | 500  | 0.0057        | 0.0097          | 0.6903                   |
+ | 0.5613 | 600  | 0.0049        | 0.0100          | 0.6797                   |
+ | 0.6548 | 700  | 0.0055        | 0.0101          | 0.6766                   |
+ | 0.7484 | 800  | 0.0049        | 0.0116          | 0.6529                   |
+ | 0.8419 | 900  | 0.0049        | 0.0105          | 0.6572                   |
+ | 0.9355 | 1000 | 0.0051        | 0.0115          | 0.6842                   |
+ | 1.0290 | 1100 | 0.0038        | 0.0094          | 0.7000                   |
+ | 1.1225 | 1200 | 0.0029        | 0.0091          | 0.7027                   |
+ | 1.2161 | 1300 | 0.0026        | 0.0093          | 0.7016                   |
+ | 1.3096 | 1400 | 0.0027        | 0.0088          | 0.7192                   |
+ | 1.4032 | 1500 | 0.0027        | 0.0097          | 0.7065                   |
+ | 1.4967 | 1600 | 0.0028        | 0.0091          | 0.7011                   |
+ | 1.5903 | 1700 | 0.0027        | 0.0095          | 0.7186                   |
+ | 1.6838 | 1800 | 0.0026        | 0.0087          | 0.7277                   |
+ | 1.7774 | 1900 | 0.0024        | 0.0085          | 0.7227                   |
+ | 1.8709 | 2000 | 0.0025        | 0.0086          | 0.7179                   |
+ | 1.9645 | 2100 | 0.0022        | 0.0086          | 0.7195                   |
+ | 2.0580 | 2200 | 0.0017        | 0.0088          | 0.7183                   |
+ | 2.1515 | 2300 | 0.0014        | 0.0088          | 0.7229                   |
+ | 2.2451 | 2400 | 0.0014        | 0.0086          | 0.7200                   |
+ | 2.3386 | 2500 | 0.0013        | 0.0088          | 0.7248                   |
+ | 2.4322 | 2600 | 0.0014        | 0.0085          | 0.7286                   |
+ | 2.5257 | 2700 | 0.0015        | 0.0085          | 0.7283                   |
+ | 2.6193 | 2800 | 0.0014        | 0.0085          | 0.7263                   |
+ | 2.7128 | 2900 | 0.0014        | 0.0085          | 0.7248                   |
+ | 2.8064 | 3000 | 0.0013        | 0.0087          | 0.7191                   |
+ | 2.8999 | 3100 | 0.0011        | 0.0086          | 0.7225                   |
+ | 2.9935 | 3200 | 0.0012        | 0.0085          | 0.7235                   |
+ | 3.0    | 3207 | -             | -               | 0.6886                   |
+
+ ### Framework Versions
+ - Python: 3.9.2
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.43.1
+ - PyTorch: 2.3.1+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 2.14.7
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.43.1",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.43.1",
+     "pytorch": "2.3.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff9b9e6201722cd4c5c86607f3b10ac76089f23af114e9b40bb647e866be05ae
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff