Commit 751d2c0 (verified) by haophancs · 1 parent: 4bc1cbf

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
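The pooling configuration above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the hidden state of the first (`[CLS]`) token instead of being averaged over all tokens; `modules.json` then lists a `Normalize` step. A minimal sketch of that idea in plain Python, using made-up 4-dimensional token vectors in place of the real 384-dimensional ones (the function names are illustrative, not the library's API):

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Hypothetical hidden states for the tokens [CLS], "hello", [SEP].
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the result has unit length, the dot product of two such embeddings equals their cosine similarity.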
README.md ADDED
@@ -0,0 +1,743 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-small-en-v1.5
+ datasets: []
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ widget:
+ - source_sentence: We offer dual motor powertrain vehicles, which use two electric
+     motors to maximize traction and performance in an all-wheel-drive configuration,
+     as well as vehicle powertrain technology featuring three electric motors for further
+     increased performance in certain versions of Model S and Model X, Cybertruck,
+     and the Tesla Semi.
+   sentences:
+   - What is the purpose of The Home Depot Foundation?
+   - What are the features of the company's vehicle powertrain technology?
+   - Where can public access the company's SEC filings?
+ - source_sentence: The litigation requests a declaration that the IRA violates Janssen’s
+     rights under the First Amendment and the Fifth Amendment to the Constitution.
+   sentences:
+   - What changes occurred in the valuation of equity warrants from 2021 to 2023?
+   - What constitutional rights does Janssen claim the Inflation Reduction Act violates?
+   - What was the cash paid for amounts included in the measurement of operating lease
+     liabilities for the years 2021, 2022, and 2023?
+ - source_sentence: After-tax earnings of other energy businesses decreased $332 million
+     (24.5%) in 2023 compared to 2022. The decline reflected lower earnings at Northern
+     Powergrid due to unfavorable results at a natural gas exploration project, including
+     the write-off of capitalized exploration costs and lower gas production volumes
+     and prices, as well as from higher deferred income tax expense related to the
+     enactment of the Energy Profits Levy income tax in the United Kingdom. The earnings
+     decline was also attributable to lower earnings from renewable energy and retail
+     services businesses. The decline in renewable energy and retail services earnings
+     was primarily due to lower income tax benefits, higher operating expenses, lower
+     solar and wind generation at owned projects and the impact of unfavorable changes
+     in valuations of derivatives contracts, partially offset by debt extinguishment
+     gains.
+   sentences:
+   - What were the reasons for the decline in after-tax earnings of other energy businesses
+     in 2023?
+   - What was the main reason for the increase in the company's valuation allowance
+     during fiscal 2023?
+   - What were the net purchase amounts of treasury shares for the years ended December
+     31, 2022, and 2023?
+ - source_sentence: The Phase 3 OAKTREE trial of obeldesivir in non-hospitalized participants
+     without risk factors for developing severe COVID-19 did not meet its primary endpoint
+     of improvement in time to symptom alleviation. Obeldesivir was well-tolerated
+     in this large study population.
+   sentences:
+   - How did the P&C combined ratios trend from 2021 to 2023?
+   - What are some of the digital tools Walmart uses to improve associate productivity,
+     engagement, and performance?
+   - What was the result of the Phase 3 OAKTREE trial of obeldesivir conducted by Gilead?
+ - source_sentence: The issuance of preferred stock could have the effect of restricting
+     dividends on the Company’s common stock, diluting the voting power of its common
+     stock, impairing the liquidation rights of its common stock, or delaying or preventing
+     a change in control.
+   sentences:
+   - What is the impact of issuing preferred stock according to the Company's description?
+   - For how long did Jeffrey P. Bezos serve as President at Amazon?
+   - Where in an Annual Report on Form 10-K is 'Note 13 — Commitments and Contingencies
+     — Litigation and Other Legal Matters' included?
+ pipeline_tag: sentence-similarity
+ model-index:
+ - name: BGE small Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 384
+       type: dim_384
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6642857142857143
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8242857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8614285714285714
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9085714285714286
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6642857142857143
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2747619047619047
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17228571428571426
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09085714285714284
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6642857142857143
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8242857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8614285714285714
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9085714285714286
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7905933695158355
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7523809523809522
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7562726267140966
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6657142857142857
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8242857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8628571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9114285714285715
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6657142857142857
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2747619047619047
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17257142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09114285714285712
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6657142857142857
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8242857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8628571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9114285714285715
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7919632560554437
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7534053287981859
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.756861587821826
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6528571428571428
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8071428571428572
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8485714285714285
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6528571428571428
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26904761904761904
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16971428571428568
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6528571428571428
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8071428571428572
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8485714285714285
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.778048727585675
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7388730158730156
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7424840237912022
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6357142857142857
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7757142857142857
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8128571428571428
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8585714285714285
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6357142857142857
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.25857142857142856
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16257142857142853
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08585714285714285
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6357142857142857
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7757142857142857
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8128571428571428
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8585714285714285
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7490553533476035
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7138038548752832
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7189504452927022
+       name: Cosine Map@100
+ ---
+
+ # BGE small Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <!-- at revision 5c38ec7c405ec4b44b94cc5a9bb96e735b38267a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("haophancs/bge-small-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'The issuance of preferred stock could have the effect of restricting dividends on the Company’s common stock, diluting the voting power of its common stock, impairing the liquidation rights of its common stock, or delaying or preventing a change in control.',
+     "What is the impact of issuing preferred stock according to the Company's description?",
+     'For how long did Jeffrey P. Bezos serve as President at Amazon?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 384)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
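Because the model was trained with `MatryoshkaLoss` over dimensions 384, 256, 128 and 64 (see Training Details), an embedding can be shortened by keeping its leading components and re-normalizing. The following is a sketch of that post-processing step in plain Python on toy 8-dimensional vectors; `truncate_embedding` is a hypothetical helper, not part of the library. Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which is worth checking against the installed version.

```python
import math

def truncate_embedding(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for two 384-dimensional embeddings.
query = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
passage = [0.5, 0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]

full = cosine_similarity(query, passage)
short = cosine_similarity(truncate_embedding(query, 2), truncate_embedding(passage, 2))
print(full, short)
```

Truncation discards information, so similarities computed on shortened vectors can differ from the full-size ones; the Evaluation section reports how retrieval quality changes at each trained dimension.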
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `dim_384`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6643     |
+ | cosine_accuracy@3   | 0.8243     |
+ | cosine_accuracy@5   | 0.8614     |
+ | cosine_accuracy@10  | 0.9086     |
+ | cosine_precision@1  | 0.6643     |
+ | cosine_precision@3  | 0.2748     |
+ | cosine_precision@5  | 0.1723     |
+ | cosine_precision@10 | 0.0909     |
+ | cosine_recall@1     | 0.6643     |
+ | cosine_recall@3     | 0.8243     |
+ | cosine_recall@5     | 0.8614     |
+ | cosine_recall@10    | 0.9086     |
+ | cosine_ndcg@10      | 0.7906     |
+ | cosine_mrr@10       | 0.7524     |
+ | **cosine_map@100**  | **0.7563** |
+
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6657     |
+ | cosine_accuracy@3   | 0.8243     |
+ | cosine_accuracy@5   | 0.8629     |
+ | cosine_accuracy@10  | 0.9114     |
+ | cosine_precision@1  | 0.6657     |
+ | cosine_precision@3  | 0.2748     |
+ | cosine_precision@5  | 0.1726     |
+ | cosine_precision@10 | 0.0911     |
+ | cosine_recall@1     | 0.6657     |
+ | cosine_recall@3     | 0.8243     |
+ | cosine_recall@5     | 0.8629     |
+ | cosine_recall@10    | 0.9114     |
+ | cosine_ndcg@10      | 0.792      |
+ | cosine_mrr@10       | 0.7534     |
+ | **cosine_map@100**  | **0.7569** |
+
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6529     |
+ | cosine_accuracy@3   | 0.8071     |
+ | cosine_accuracy@5   | 0.8486     |
+ | cosine_accuracy@10  | 0.9        |
+ | cosine_precision@1  | 0.6529     |
+ | cosine_precision@3  | 0.269      |
+ | cosine_precision@5  | 0.1697     |
+ | cosine_precision@10 | 0.09       |
+ | cosine_recall@1     | 0.6529     |
+ | cosine_recall@3     | 0.8071     |
+ | cosine_recall@5     | 0.8486     |
+ | cosine_recall@10    | 0.9        |
+ | cosine_ndcg@10      | 0.778      |
+ | cosine_mrr@10       | 0.7389     |
+ | **cosine_map@100**  | **0.7425** |
+
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | cosine_accuracy@1   | 0.6357    |
+ | cosine_accuracy@3   | 0.7757    |
+ | cosine_accuracy@5   | 0.8129    |
+ | cosine_accuracy@10  | 0.8586    |
+ | cosine_precision@1  | 0.6357    |
+ | cosine_precision@3  | 0.2586    |
+ | cosine_precision@5  | 0.1626    |
+ | cosine_precision@10 | 0.0859    |
+ | cosine_recall@1     | 0.6357    |
+ | cosine_recall@3     | 0.7757    |
+ | cosine_recall@5     | 0.8129    |
+ | cosine_recall@10    | 0.8586    |
+ | cosine_ndcg@10      | 0.7491    |
+ | cosine_mrr@10       | 0.7138    |
+ | **cosine_map@100**  | **0.719** |
+
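In the tables above, `cosine_recall@k` equals `cosine_accuracy@k`, and `cosine_precision@k` equals `cosine_accuracy@k / k` (for example, 0.8243 / 3 ≈ 0.2748). That pattern is what one would expect if every evaluation query has exactly one relevant passage, which is an assumption here and not something the card states. The sketch below computes the metrics for a toy ranking under that assumption; the function and the ranks are illustrative only and are not the evaluator's code.

```python
def retrieval_metrics(ranks, k):
    """Metrics at cutoff k when each query has exactly one relevant passage.

    `ranks` holds the 1-based position of the relevant passage per query,
    or None when it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n            # relevant passage appears in the top k
    precision = sum(hits) / (n * k)     # at most one hit among k retrieved slots
    recall = accuracy                   # there is only one passage to find
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr}

# Four hypothetical queries: relevant passage ranked 1st, 2nd, 5th, and missing.
ranks = [1, 2, 5, None]
metrics = retrieval_metrics(ranks, k=3)
print(metrics)
```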
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive | anchor |
+   |:--------|:---------|:-------|
+   | type    | string   | string |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 45.74 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 20.77 tokens</li><li>max: 43 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>The company believes that trademarks have significant value for marketing products, e-commerce, stores, and business, with the possibility of indefinite renewal as long as the trademarks are in use.</code> | <code>What are the benefits of registering trademarks for the company's business?</code> |
+   | <code>The consolidated financial statements and accompanying notes listed in Part IV, Item 15(a)(1) of this Annual Report on Form 10-K are included immediately following Part IV hereof and incorporated by reference herein.</code> | <code>How are the consolidated financial statements and accompanying notes incorporated into the Annual Report on Form 10-K?</code> |
+   | <code>During the year ended December 31, 2023, the Company repurchased and subsequently retired 2,029,894 shares of common stock from the open market at an average cost of $103.45 per share for a total of $210.0 million.</code> | <code>How many shares of common stock did the Company repurchase and subsequently retire during the year ended December 31, 2023?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           384,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
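The configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss` with four dimensions and equal weights. Conceptually, the inner loss is computed once per dimension on truncated embeddings and the weighted results are summed. The sketch below illustrates this in plain Python with toy 4-dimensional vectors and dimensions `[4, 2]`; it is not the library's implementation, and the similarity scale of 20 is an assumption.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives loss: each anchor should score its own positive highest."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Scaled cosine similarity of this anchor against every positive in the batch.
        scores = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        log_sum_exp = math.log(sum(math.exp(s) for s in scores))
        total += log_sum_exp - scores[i]  # cross-entropy with target index i
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    return sum(
        w * ranking_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Two hypothetical (anchor, positive) pairs.
anchors = [[1.0, 0.2, 0.0, 0.1], [0.1, 1.0, 0.3, 0.0]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.0, 0.8, 0.2, 0.1]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training on every prefix length at once is what encourages the leading components of the embedding to carry most of the information.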
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_384_cosine_map@100 | dim_64_cosine_map@100 |
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.8122     | 10     | 1.7741        | -                      | -                      | -                      | -                     |
+ | 0.9746     | 12     | -             | 0.7042                 | 0.7262                 | 0.7327                 | 0.6639                |
+ | 1.6244     | 20     | 0.7817        | -                      | -                      | -                      | -                     |
+ | 1.9492     | 24     | -             | 0.7322                 | 0.7477                 | 0.7498                 | 0.7136                |
+ | 2.4365     | 30     | 0.5816        | -                      | -                      | -                      | -                     |
+ | 2.9239     | 36     | -             | 0.7387                 | 0.7563                 | 0.7549                 | 0.7165                |
+ | 3.2487     | 40     | 0.5121        | -                      | -                      | -                      | -                     |
+ | **3.8985** | **48** | **-**         | **0.7425**             | **0.7569**             | **0.7563**             | **0.719**             |
+
+ * The bold row denotes the saved checkpoint.
+
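The fractional epochs in the log can be reproduced from the hyperparameters listed above. A worked sketch, assuming a single device and no dropped batches: 6,300 samples at a per-device batch size of 32 give 197 batches per epoch, and with 16 gradient-accumulation steps each optimizer step covers 16 batches (up to 512 samples).

```python
import math

train_samples = 6300
per_device_batch_size = 32
gradient_accumulation_steps = 16

# Batches per epoch, counting the final partial batch.
batches_per_epoch = math.ceil(train_samples / per_device_batch_size)
# Optimizer steps needed to cover one epoch (fractional).
steps_per_epoch = batches_per_epoch / gradient_accumulation_steps

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

for step in (10, 12, 24, 36, 48):
    print(step, epoch_at(step))
```

The computed values agree with the Epoch column (0.8122, 0.9746, 1.9492, 2.9239, 3.8985), which suggests that evaluation ran every 12 optimizer steps, roughly once per epoch.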
+ ### Framework Versions
+ - Python: 3.12.2
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.2.0+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.19.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_name_or_path": "BAAI/bge-small-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a50cc7aa72be963ac274dbbb0bdbfdfee352afb0641f8440c5349a8d15ac3efd
+ size 133462128
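The checkpoint size is consistent with the architecture in `config.json`. As a rough check, the sketch below counts the parameters of a standard BERT encoder with those settings (embeddings, 12 layers, and a pooler head, which is assumed to be stored) and multiplies by 4 bytes for `float32`. The small remainder relative to the 133,462,128 bytes recorded above would presumably be the safetensors header.

```python
# Values copied from config.json.
hidden = 384
intermediate = 1536
layers = 12
vocab = 30522
positions = 512
token_types = 2

layer_norm = 2 * hidden  # weight and bias

# Word, position and token-type embeddings plus their LayerNorm.
embeddings = (vocab + positions + token_types) * hidden + layer_norm
# Query, key, value and output projections plus the attention LayerNorm.
attention = 4 * (hidden * hidden + hidden) + layer_norm
# Two dense layers plus the output LayerNorm.
feed_forward = (
    (hidden * intermediate + intermediate)
    + (intermediate * hidden + hidden)
    + layer_norm
)
# Dense pooler head (assumed to be part of the saved BertModel).
pooler = hidden * hidden + hidden

parameters = embeddings + layers * (attention + feed_forward) + pooler
print(parameters, parameters * 4)  # 33360000 133440000
```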
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff