Commit 76f00c3 by pavanmantha · verified · 1 parent: bcffb05

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
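
Reviewer note: this configuration enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first (`[CLS]`) token rather than from an average over tokens, and `modules.json` in this commit lists a `Normalize` module after pooling. The sketch below illustrates that idea on toy 2-dimensional vectors in plain Python. It is not the library's implementation, and `cls_pool` and `l2_normalize` are hypothetical helper names introduced only for this example.

```python
import math


def cls_pool(token_embeddings):
    """Return a copy of the first token vector, which plays the role of [CLS]."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy "sequence" of three token vectors; a real model would use 768 dimensions.
tokens = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
sentence_vec = l2_normalize(cls_pool(tokens))
print(sentence_vec)  # [0.6, 0.8]
```

Because the output has unit length, a dot product between two such vectors equals their cosine similarity.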
README.md ADDED
@@ -0,0 +1,729 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ library_name: sentence-transformers
6
+ tags:
7
+ - sentence-transformers
8
+ - sentence-similarity
9
+ - feature-extraction
10
+ - generated_from_trainer
11
+ - dataset_size:6300
12
+ - loss:MatryoshkaLoss
13
+ - loss:MultipleNegativesRankingLoss
14
+ base_model: BAAI/bge-base-en-v1.5
15
+ datasets: []
16
+ metrics:
17
+ - cosine_accuracy@1
18
+ - cosine_accuracy@3
19
+ - cosine_accuracy@5
20
+ - cosine_accuracy@10
21
+ - cosine_precision@1
22
+ - cosine_precision@3
23
+ - cosine_precision@5
24
+ - cosine_precision@10
25
+ - cosine_recall@1
26
+ - cosine_recall@3
27
+ - cosine_recall@5
28
+ - cosine_recall@10
29
+ - cosine_ndcg@10
30
+ - cosine_mrr@10
31
+ - cosine_map@100
32
+ widget:
33
+ - source_sentence: In 2023, total government-based programs, including Medicare, Medicaid,
34
+ and other government-based programs, contributed 67% to the U.S. dialysis patient
35
+ service revenues.
36
+ sentences:
37
+ - How does Iron Mountain's reported EPS fully diluted from net income in 2023 compare
38
+ to 2022?
39
+ - What was the total percentage of U.S. dialysis patient service revenues coming
40
+ from government-based programs in 2023?
41
+ - What year did the company introduce multiplex theatres?
42
+ - source_sentence: The gross realized losses on sales of AFS debt associated for 2023
43
+ amounted to $514 million, indicating a negative financial outcome from these transactions
44
+ during the year.
45
+ sentences:
46
+ - What were the gross realized losses on sales of AFS debt securities in 2023?
47
+ - How is information about legal proceedings described in the Annual Report on Form
48
+ 10-K?
49
+ - What sections are included alongside the Financial Statements in this report?
50
+ - source_sentence: Other income, net, changed favorably by $215 million in the year
51
+ ended December 31, 2023 as compared to the year ended December 31, 2022. The favorable
52
+ change was primarily due to fluctuations in foreign currency exchange rates on
53
+ our intercompany balances.
54
+ sentences:
55
+ - What was the monetary change in other income (expense), net, from 2022 to 2023?
56
+ - What strategic actions has Walmart International taken over the last three years?
57
+ - What is described under Item 8 in the context of a financial document?
58
+ - source_sentence: Segments The Company manages its business primarily on a geographic
59
+ basis. The Company’s reportable segments consist of the Americas, Europe, Greater
60
+ China, Japan and Rest of Asia Pacific.
61
+ sentences:
62
+ - What is the total debt repayment obligation mentioned in the financial outline?
63
+ - What segments does the Company manage its business on?
64
+ - What is the title of Item 8 which contains page information in a financial document?
65
+ - source_sentence: Item 8 typically refers to Financial Statements and Supplementary
66
+ Data in a document.
67
+ sentences:
68
+ - What is the primary function of Etsy's online marketplaces?
69
+ - What are the maximum leverage ratios specified under the Senior Credit Facilities
70
+ for the periods ending fourth quarter of 2023 and first quarter of 2024?
71
+ - What does Item 8 in a document usually represent?
72
+ pipeline_tag: sentence-similarity
73
+ model-index:
74
+ - name: BGE base Financial Matryoshka
75
+ results:
76
+ - task:
77
+ type: information-retrieval
78
+ name: Information Retrieval
79
+ dataset:
80
+ name: dim 768
81
+ type: dim_768
82
+ metrics:
83
+ - type: cosine_accuracy@1
84
+ value: 0.7057142857142857
85
+ name: Cosine Accuracy@1
86
+ - type: cosine_accuracy@3
87
+ value: 0.8371428571428572
88
+ name: Cosine Accuracy@3
89
+ - type: cosine_accuracy@5
90
+ value: 0.8742857142857143
91
+ name: Cosine Accuracy@5
92
+ - type: cosine_accuracy@10
93
+ value: 0.9128571428571428
94
+ name: Cosine Accuracy@10
95
+ - type: cosine_precision@1
96
+ value: 0.7057142857142857
97
+ name: Cosine Precision@1
98
+ - type: cosine_precision@3
99
+ value: 0.27904761904761904
100
+ name: Cosine Precision@3
101
+ - type: cosine_precision@5
102
+ value: 0.17485714285714282
103
+ name: Cosine Precision@5
104
+ - type: cosine_precision@10
105
+ value: 0.09128571428571428
106
+ name: Cosine Precision@10
107
+ - type: cosine_recall@1
108
+ value: 0.7057142857142857
109
+ name: Cosine Recall@1
110
+ - type: cosine_recall@3
111
+ value: 0.8371428571428572
112
+ name: Cosine Recall@3
113
+ - type: cosine_recall@5
114
+ value: 0.8742857142857143
115
+ name: Cosine Recall@5
116
+ - type: cosine_recall@10
117
+ value: 0.9128571428571428
118
+ name: Cosine Recall@10
119
+ - type: cosine_ndcg@10
120
+ value: 0.8114149232737874
121
+ name: Cosine Ndcg@10
122
+ - type: cosine_mrr@10
123
+ value: 0.7786632653061224
124
+ name: Cosine Mrr@10
125
+ - type: cosine_map@100
126
+ value: 0.7821804400415905
127
+ name: Cosine Map@100
128
+ - task:
129
+ type: information-retrieval
130
+ name: Information Retrieval
131
+ dataset:
132
+ name: dim 512
133
+ type: dim_512
134
+ metrics:
135
+ - type: cosine_accuracy@1
136
+ value: 0.7057142857142857
137
+ name: Cosine Accuracy@1
138
+ - type: cosine_accuracy@3
139
+ value: 0.8328571428571429
140
+ name: Cosine Accuracy@3
141
+ - type: cosine_accuracy@5
142
+ value: 0.8714285714285714
143
+ name: Cosine Accuracy@5
144
+ - type: cosine_accuracy@10
145
+ value: 0.9128571428571428
146
+ name: Cosine Accuracy@10
147
+ - type: cosine_precision@1
148
+ value: 0.7057142857142857
149
+ name: Cosine Precision@1
150
+ - type: cosine_precision@3
151
+ value: 0.2776190476190476
152
+ name: Cosine Precision@3
153
+ - type: cosine_precision@5
154
+ value: 0.17428571428571427
155
+ name: Cosine Precision@5
156
+ - type: cosine_precision@10
157
+ value: 0.09128571428571428
158
+ name: Cosine Precision@10
159
+ - type: cosine_recall@1
160
+ value: 0.7057142857142857
161
+ name: Cosine Recall@1
162
+ - type: cosine_recall@3
163
+ value: 0.8328571428571429
164
+ name: Cosine Recall@3
165
+ - type: cosine_recall@5
166
+ value: 0.8714285714285714
167
+ name: Cosine Recall@5
168
+ - type: cosine_recall@10
169
+ value: 0.9128571428571428
170
+ name: Cosine Recall@10
171
+ - type: cosine_ndcg@10
172
+ value: 0.8108495475926208
173
+ name: Cosine Ndcg@10
174
+ - type: cosine_mrr@10
175
+ value: 0.7780068027210884
176
+ name: Cosine Mrr@10
177
+ - type: cosine_map@100
178
+ value: 0.7816465534941897
179
+ name: Cosine Map@100
180
+ - task:
181
+ type: information-retrieval
182
+ name: Information Retrieval
183
+ dataset:
184
+ name: dim 256
185
+ type: dim_256
186
+ metrics:
187
+ - type: cosine_accuracy@1
188
+ value: 0.7157142857142857
189
+ name: Cosine Accuracy@1
190
+ - type: cosine_accuracy@3
191
+ value: 0.8342857142857143
192
+ name: Cosine Accuracy@3
193
+ - type: cosine_accuracy@5
194
+ value: 0.87
195
+ name: Cosine Accuracy@5
196
+ - type: cosine_accuracy@10
197
+ value: 0.9057142857142857
198
+ name: Cosine Accuracy@10
199
+ - type: cosine_precision@1
200
+ value: 0.7157142857142857
201
+ name: Cosine Precision@1
202
+ - type: cosine_precision@3
203
+ value: 0.27809523809523806
204
+ name: Cosine Precision@3
205
+ - type: cosine_precision@5
206
+ value: 0.174
207
+ name: Cosine Precision@5
208
+ - type: cosine_precision@10
209
+ value: 0.09057142857142855
210
+ name: Cosine Precision@10
211
+ - type: cosine_recall@1
212
+ value: 0.7157142857142857
213
+ name: Cosine Recall@1
214
+ - type: cosine_recall@3
215
+ value: 0.8342857142857143
216
+ name: Cosine Recall@3
217
+ - type: cosine_recall@5
218
+ value: 0.87
219
+ name: Cosine Recall@5
220
+ - type: cosine_recall@10
221
+ value: 0.9057142857142857
222
+ name: Cosine Recall@10
223
+ - type: cosine_ndcg@10
224
+ value: 0.8123157823677117
225
+ name: Cosine Ndcg@10
226
+ - type: cosine_mrr@10
227
+ value: 0.7823004535147391
228
+ name: Cosine Mrr@10
229
+ - type: cosine_map@100
230
+ value: 0.7862892219643212
231
+ name: Cosine Map@100
232
+ - task:
233
+ type: information-retrieval
234
+ name: Information Retrieval
235
+ dataset:
236
+ name: dim 128
237
+ type: dim_128
238
+ metrics:
239
+ - type: cosine_accuracy@1
240
+ value: 0.6928571428571428
241
+ name: Cosine Accuracy@1
242
+ - type: cosine_accuracy@3
243
+ value: 0.8171428571428572
244
+ name: Cosine Accuracy@3
245
+ - type: cosine_accuracy@5
246
+ value: 0.8614285714285714
247
+ name: Cosine Accuracy@5
248
+ - type: cosine_accuracy@10
249
+ value: 0.9028571428571428
250
+ name: Cosine Accuracy@10
251
+ - type: cosine_precision@1
252
+ value: 0.6928571428571428
253
+ name: Cosine Precision@1
254
+ - type: cosine_precision@3
255
+ value: 0.2723809523809524
256
+ name: Cosine Precision@3
257
+ - type: cosine_precision@5
258
+ value: 0.17228571428571426
259
+ name: Cosine Precision@5
260
+ - type: cosine_precision@10
261
+ value: 0.09028571428571427
262
+ name: Cosine Precision@10
263
+ - type: cosine_recall@1
264
+ value: 0.6928571428571428
265
+ name: Cosine Recall@1
266
+ - type: cosine_recall@3
267
+ value: 0.8171428571428572
268
+ name: Cosine Recall@3
269
+ - type: cosine_recall@5
270
+ value: 0.8614285714285714
271
+ name: Cosine Recall@5
272
+ - type: cosine_recall@10
273
+ value: 0.9028571428571428
274
+ name: Cosine Recall@10
275
+ - type: cosine_ndcg@10
276
+ value: 0.7975011441256048
277
+ name: Cosine Ndcg@10
278
+ - type: cosine_mrr@10
279
+ value: 0.7638248299319729
280
+ name: Cosine Mrr@10
281
+ - type: cosine_map@100
282
+ value: 0.7673061455577762
283
+ name: Cosine Map@100
284
+ ---
285
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
301
+
302
+ ### Model Sources
303
+
304
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
305
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
306
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
307
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("pavanmantha/bge-base-en-honsec10k-embed")
+ # Run inference
+ sentences = [
+     'Item 8 typically refers to Financial Statements and Supplementary Data in a document.',
+     'What does Item 8 in a document usually represent?',
+     'What are the maximum leverage ratios specified under the Senior Credit Facilities for the periods ending fourth quarter of 2023 and first quarter of 2024?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
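
Reviewer note: the card reports results at 768, 512, 256 and 128 dimensions, which suggests the embeddings are meant to be usable after truncation. The sketch below shows the usual way to shorten an embedding: keep the leading components and re-normalize so that dot products remain cosine similarities. It uses toy 4-dimensional vectors in plain Python, and `truncate_and_normalize` and `cosine` are hypothetical helper names, not part of the library.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))


# Two unit-length toy vectors that agree on the first half and disagree on the second.
full_a = [0.5, 0.5, 0.5, 0.5]
full_b = [0.5, 0.5, -0.5, -0.5]

short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)
sim = cosine(short_a, short_b)

print(cosine(full_a, full_b))  # similarity using all four components
print(sim)                     # similarity using only the first two components
```

The toy vectors are deliberately extreme: truncation changes their similarity a great deal. Matryoshka training aims to make the leading components carry most of the signal, and the evaluation tables in this card (cosine_map@100 of 0.7822 at 768 dimensions versus 0.7673 at 128) indicate the loss is small for this model. Recent releases of Sentence Transformers also accept a `truncate_dim` argument when constructing `SentenceTransformer`; check the documentation of the installed version before relying on it.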
349
+
350
+ <!--
351
+ ### Direct Usage (Transformers)
352
+
353
+ <details><summary>Click to see the direct usage in Transformers</summary>
354
+
355
+ </details>
356
+ -->
357
+
358
+ <!--
359
+ ### Downstream Usage (Sentence Transformers)
360
+
361
+ You can finetune this model on your own dataset.
362
+
363
+ <details><summary>Click to expand</summary>
364
+
365
+ </details>
366
+ -->
367
+
368
+ <!--
369
+ ### Out-of-Scope Use
370
+
371
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
372
+ -->
373
+
374
+ ## Evaluation
375
+
376
+ ### Metrics
377
+
378
+ #### Information Retrieval
379
+ * Dataset: `dim_768`
380
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
381
+
382
+ | Metric | Value |
383
+ |:--------------------|:-----------|
384
+ | cosine_accuracy@1 | 0.7057 |
385
+ | cosine_accuracy@3 | 0.8371 |
386
+ | cosine_accuracy@5 | 0.8743 |
387
+ | cosine_accuracy@10 | 0.9129 |
388
+ | cosine_precision@1 | 0.7057 |
389
+ | cosine_precision@3 | 0.279 |
390
+ | cosine_precision@5 | 0.1749 |
391
+ | cosine_precision@10 | 0.0913 |
392
+ | cosine_recall@1 | 0.7057 |
393
+ | cosine_recall@3 | 0.8371 |
394
+ | cosine_recall@5 | 0.8743 |
395
+ | cosine_recall@10 | 0.9129 |
396
+ | cosine_ndcg@10 | 0.8114 |
397
+ | cosine_mrr@10 | 0.7787 |
398
+ | **cosine_map@100** | **0.7822** |
399
+
400
+ #### Information Retrieval
401
+ * Dataset: `dim_512`
402
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
403
+
404
+ | Metric | Value |
405
+ |:--------------------|:-----------|
406
+ | cosine_accuracy@1 | 0.7057 |
407
+ | cosine_accuracy@3 | 0.8329 |
408
+ | cosine_accuracy@5 | 0.8714 |
409
+ | cosine_accuracy@10 | 0.9129 |
410
+ | cosine_precision@1 | 0.7057 |
411
+ | cosine_precision@3 | 0.2776 |
412
+ | cosine_precision@5 | 0.1743 |
413
+ | cosine_precision@10 | 0.0913 |
414
+ | cosine_recall@1 | 0.7057 |
415
+ | cosine_recall@3 | 0.8329 |
416
+ | cosine_recall@5 | 0.8714 |
417
+ | cosine_recall@10 | 0.9129 |
418
+ | cosine_ndcg@10 | 0.8108 |
419
+ | cosine_mrr@10 | 0.778 |
420
+ | **cosine_map@100** | **0.7816** |
421
+
422
+ #### Information Retrieval
423
+ * Dataset: `dim_256`
424
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
425
+
426
+ | Metric | Value |
427
+ |:--------------------|:-----------|
428
+ | cosine_accuracy@1 | 0.7157 |
429
+ | cosine_accuracy@3 | 0.8343 |
430
+ | cosine_accuracy@5 | 0.87 |
431
+ | cosine_accuracy@10 | 0.9057 |
432
+ | cosine_precision@1 | 0.7157 |
433
+ | cosine_precision@3 | 0.2781 |
434
+ | cosine_precision@5 | 0.174 |
435
+ | cosine_precision@10 | 0.0906 |
436
+ | cosine_recall@1 | 0.7157 |
437
+ | cosine_recall@3 | 0.8343 |
438
+ | cosine_recall@5 | 0.87 |
439
+ | cosine_recall@10 | 0.9057 |
440
+ | cosine_ndcg@10 | 0.8123 |
441
+ | cosine_mrr@10 | 0.7823 |
442
+ | **cosine_map@100** | **0.7863** |
443
+
444
+ #### Information Retrieval
445
+ * Dataset: `dim_128`
446
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
447
+
448
+ | Metric | Value |
449
+ |:--------------------|:-----------|
450
+ | cosine_accuracy@1 | 0.6929 |
451
+ | cosine_accuracy@3 | 0.8171 |
452
+ | cosine_accuracy@5 | 0.8614 |
453
+ | cosine_accuracy@10 | 0.9029 |
454
+ | cosine_precision@1 | 0.6929 |
455
+ | cosine_precision@3 | 0.2724 |
456
+ | cosine_precision@5 | 0.1723 |
457
+ | cosine_precision@10 | 0.0903 |
458
+ | cosine_recall@1 | 0.6929 |
459
+ | cosine_recall@3 | 0.8171 |
460
+ | cosine_recall@5 | 0.8614 |
461
+ | cosine_recall@10 | 0.9029 |
462
+ | cosine_ndcg@10 | 0.7975 |
463
+ | cosine_mrr@10 | 0.7638 |
464
+ | **cosine_map@100** | **0.7673** |
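
Reviewer note: in all four tables accuracy@k equals recall@k, and precision@k is accuracy@k divided by k (for example 0.8371 / 3 ≈ 0.2790). That pattern is what one would expect when every query has exactly one relevant passage; this is an inference from the numbers, not something the card states. The sketch below computes the metrics under that assumption for toy data. `retrieval_metrics` is a hypothetical helper and is not the `InformationRetrievalEvaluator` implementation.

```python
def retrieval_metrics(results, k):
    """Compute accuracy, precision, recall and MRR at k.

    `results` is a list of (ranked_doc_ids, relevant_doc_id) pairs and assumes
    exactly one relevant document per query.
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked_ids, relevant_id in results:
        top_k = ranked_ids[:k]
        if relevant_id in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(relevant_id) + 1)
    n = len(results)
    accuracy = hits / n
    return {
        "accuracy": accuracy,       # fraction of queries with a hit in the top k
        "precision": accuracy / k,  # one relevant document among k retrieved
        "recall": accuracy,         # the single relevant document is found or not
        "mrr": reciprocal_rank_sum / n,
    }


# Three toy queries: hit at rank 1, hit at rank 3, and a miss.
toy_results = [
    (["d1", "d2", "d3"], "d1"),
    (["d4", "d5", "d6"], "d6"),
    (["d7", "d8", "d9"], "d0"),
]
scores = retrieval_metrics(toy_results, k=3)
print(scores)
```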
465
+
466
+ <!--
467
+ ## Bias, Risks and Limitations
468
+
469
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
470
+ -->
471
+
472
+ <!--
473
+ ### Recommendations
474
+
475
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
476
+ -->
477
+
478
+ ## Training Details
479
+
480
+ ### Training Dataset
481
+
482
+ #### Unnamed Dataset
483
+
484
+
485
+ * Size: 6,300 training samples
486
+ * Columns: <code>positive</code> and <code>anchor</code>
487
+ * Approximate statistics based on the first 1000 samples:
488
+ | | positive | anchor |
489
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
490
+ | type | string | string |
491
+ | details | <ul><li>min: 6 tokens</li><li>mean: 44.43 tokens</li><li>max: 248 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.52 tokens</li><li>max: 45 tokens</li></ul> |
492
+ * Samples:
493
+ | positive | anchor |
494
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------|
495
+ | <code>Net deferred tax liabilities | $ | (304) | | | $ | (279) The deferred tax accounts at the end of 2023 and 2022 include deferred income tax assets of $491 and $445, included in other long-term assets; and deferred income tax liabilities of $795 and $724, included in other long-term liabilities.</code> | <code>What are the net deferred tax liabilities for the company at the end of 2023?</code> |
496
+ | <code>ITEM 3. LEGAL PROCEEDINGS Please see the legal proceedings described in Note 21. Commitments and Contingencies included in Item 8 of Part II of this report.</code> | <code>In what part and item of the report is Note 21 located?</code> |
497
+ | <code>During fiscal year 2023, we repurchased 10.4 million shares for approximately $1,295 million.</code> | <code>What total amount was spent on share repurchases during fiscal year 2023?</code> |
498
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
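
Reviewer note: the parameters above describe a wrapper loss that applies a base ranking loss at several embedding sizes and adds the results with the given weights. The sketch below is a simplified plain-Python illustration of that idea, assuming an in-batch-negatives cross-entropy over scaled cosine similarities as the base loss. The function names and the `scale=20.0` value are assumptions made for this example; this is not the library's code.

```python
import math


def _unit(vector):
    """Return the vector scaled to unit length (zero vectors are left as they are)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as a negative."""
    a = [_unit(v) for v in anchors]
    p = [_unit(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss evaluated on truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * ranking_loss(
            [v[:dim] for v in anchors], [v[:dim] for v in positives]
        )
    return total


# Toy 4-dimensional batch; the card uses dims 768, 512, 256, 128 with weights of 1.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
matched = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
swapped = [matched[1], matched[0]]

good = matryoshka_loss(anchors, matched, dims=[4, 2], weights=[1, 1])
bad = matryoshka_loss(anchors, swapped, dims=[4, 2], weights=[1, 1])
print(good < bad)  # correctly paired batches should score a lower loss
```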
517
+
518
+ ### Training Hyperparameters
519
+ #### Non-Default Hyperparameters
520
+
521
+ - `eval_strategy`: epoch
522
+ - `per_device_train_batch_size`: 32
523
+ - `per_device_eval_batch_size`: 16
524
+ - `gradient_accumulation_steps`: 16
525
+ - `learning_rate`: 2e-05
526
+ - `num_train_epochs`: 4
527
+ - `lr_scheduler_type`: cosine
528
+ - `warmup_ratio`: 0.1
529
+ - `fp16`: True
530
+ - `tf32`: False
531
+ - `load_best_model_at_end`: True
532
+ - `optim`: adamw_torch_fused
533
+ - `batch_sampler`: no_duplicates
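
Reviewer note: `lr_scheduler_type: cosine` with `warmup_ratio: 0.1` is commonly understood as a linear warmup followed by a half-cosine decay to zero. The sketch below evaluates that shape for the listed `learning_rate` of 2e-05. It assumes 48 optimizer steps in total, taken from the last row of the Training Logs table, and that warmup steps are rounded up; `lr_at_step` is a hypothetical helper, not the trainer's scheduler.

```python
import math


def lr_at_step(step, base_lr=2e-5, total_steps=48, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, from 0 to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 2, 5, 24, 48):
    print(step, lr_at_step(step))
```

Under these assumptions the peak rate is reached at step 5 and the rate returns to zero at step 48.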
534
+
535
+ #### All Hyperparameters
536
+ <details><summary>Click to expand</summary>
537
+
538
+ - `overwrite_output_dir`: False
539
+ - `do_predict`: False
540
+ - `eval_strategy`: epoch
541
+ - `prediction_loss_only`: True
542
+ - `per_device_train_batch_size`: 32
543
+ - `per_device_eval_batch_size`: 16
544
+ - `per_gpu_train_batch_size`: None
545
+ - `per_gpu_eval_batch_size`: None
546
+ - `gradient_accumulation_steps`: 16
547
+ - `eval_accumulation_steps`: None
548
+ - `learning_rate`: 2e-05
549
+ - `weight_decay`: 0.0
550
+ - `adam_beta1`: 0.9
551
+ - `adam_beta2`: 0.999
552
+ - `adam_epsilon`: 1e-08
553
+ - `max_grad_norm`: 1.0
554
+ - `num_train_epochs`: 4
555
+ - `max_steps`: -1
556
+ - `lr_scheduler_type`: cosine
557
+ - `lr_scheduler_kwargs`: {}
558
+ - `warmup_ratio`: 0.1
559
+ - `warmup_steps`: 0
560
+ - `log_level`: passive
561
+ - `log_level_replica`: warning
562
+ - `log_on_each_node`: True
563
+ - `logging_nan_inf_filter`: True
564
+ - `save_safetensors`: True
565
+ - `save_on_each_node`: False
566
+ - `save_only_model`: False
567
+ - `restore_callback_states_from_checkpoint`: False
568
+ - `no_cuda`: False
569
+ - `use_cpu`: False
570
+ - `use_mps_device`: False
571
+ - `seed`: 42
572
+ - `data_seed`: None
573
+ - `jit_mode_eval`: False
574
+ - `use_ipex`: False
575
+ - `bf16`: False
576
+ - `fp16`: True
577
+ - `fp16_opt_level`: O1
578
+ - `half_precision_backend`: auto
579
+ - `bf16_full_eval`: False
580
+ - `fp16_full_eval`: False
581
+ - `tf32`: False
582
+ - `local_rank`: 0
583
+ - `ddp_backend`: None
584
+ - `tpu_num_cores`: None
585
+ - `tpu_metrics_debug`: False
586
+ - `debug`: []
587
+ - `dataloader_drop_last`: False
588
+ - `dataloader_num_workers`: 0
589
+ - `dataloader_prefetch_factor`: None
590
+ - `past_index`: -1
591
+ - `disable_tqdm`: False
592
+ - `remove_unused_columns`: True
593
+ - `label_names`: None
594
+ - `load_best_model_at_end`: True
595
+ - `ignore_data_skip`: False
596
+ - `fsdp`: []
597
+ - `fsdp_min_num_params`: 0
598
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
599
+ - `fsdp_transformer_layer_cls_to_wrap`: None
600
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
601
+ - `deepspeed`: None
602
+ - `label_smoothing_factor`: 0.0
603
+ - `optim`: adamw_torch_fused
604
+ - `optim_args`: None
605
+ - `adafactor`: False
606
+ - `group_by_length`: False
607
+ - `length_column_name`: length
608
+ - `ddp_find_unused_parameters`: None
609
+ - `ddp_bucket_cap_mb`: None
610
+ - `ddp_broadcast_buffers`: False
611
+ - `dataloader_pin_memory`: True
612
+ - `dataloader_persistent_workers`: False
613
+ - `skip_memory_metrics`: True
614
+ - `use_legacy_prediction_loop`: False
615
+ - `push_to_hub`: False
616
+ - `resume_from_checkpoint`: None
617
+ - `hub_model_id`: None
618
+ - `hub_strategy`: every_save
619
+ - `hub_private_repo`: False
620
+ - `hub_always_push`: False
621
+ - `gradient_checkpointing`: False
622
+ - `gradient_checkpointing_kwargs`: None
623
+ - `include_inputs_for_metrics`: False
624
+ - `eval_do_concat_batches`: True
625
+ - `fp16_backend`: auto
626
+ - `push_to_hub_model_id`: None
627
+ - `push_to_hub_organization`: None
628
+ - `mp_parameters`:
629
+ - `auto_find_batch_size`: False
630
+ - `full_determinism`: False
631
+ - `torchdynamo`: None
632
+ - `ray_scope`: last
633
+ - `ddp_timeout`: 1800
634
+ - `torch_compile`: False
635
+ - `torch_compile_backend`: None
636
+ - `torch_compile_mode`: None
637
+ - `dispatch_batches`: None
638
+ - `split_batches`: None
639
+ - `include_tokens_per_second`: False
640
+ - `include_num_input_tokens_seen`: False
641
+ - `neftune_noise_alpha`: None
642
+ - `optim_target_modules`: None
643
+ - `batch_eval_metrics`: False
644
+ - `batch_sampler`: no_duplicates
645
+ - `multi_dataset_batch_sampler`: proportional
646
+
647
+ </details>
648
+
649
+ ### Training Logs
650
+ | Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_768_cosine_map@100 |
651
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|
652
+ | 0.8122 | 10 | 1.1537 | - | - | - | - |
653
+ | 0.9746 | 12 | - | 0.7517 | 0.7620 | 0.7633 | 0.7636 |
654
+ | 1.6244 | 20 | 0.4387 | - | - | - | - |
655
+ | 1.9492 | 24 | - | 0.7616 | 0.7802 | 0.7796 | 0.7769 |
656
+ | 2.4365 | 30 | 0.3113 | - | - | - | - |
657
+ | 2.9239 | 36 | - | 0.7668 | 0.7837 | 0.7809 | 0.7821 |
658
+ | 3.2487 | 40 | 0.2554 | - | - | - | - |
659
+ | **3.8985** | **48** | **-** | **0.7673** | **0.7863** | **0.7816** | **0.7822** |
660
+
661
+ * The bold row denotes the saved checkpoint.
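
Reviewer note: the fractional values in the Epoch column can be reproduced from the hyperparameters listed earlier in the card. The sketch below assumes a single device and that no batches are dropped: 6,300 samples at 32 per batch give 197 batches per epoch, and each optimizer step consumes 16 of them through gradient accumulation. `epoch_at_step` is a hypothetical helper written for this check.

```python
import math

train_samples = 6300
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# Batches seen per epoch, counting the final partial batch.
batches_per_epoch = math.ceil(train_samples / per_device_train_batch_size)

# Samples contributing to each optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps


def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer updates."""
    return step * gradient_accumulation_steps / batches_per_epoch


print(batches_per_epoch, effective_batch_size)
for step in (10, 12, 24, 36, 48):
    print(step, round(epoch_at_step(step), 4))
```

The rounded values match the logged epochs (for example step 12 gives 0.9746 and step 48 gives 3.8985), which supports the single-device reading.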
662
+
663
+ ### Framework Versions
664
+ - Python: 3.10.13
665
+ - Sentence Transformers: 3.0.1
666
+ - Transformers: 4.41.2
667
+ - PyTorch: 2.1.2
668
+ - Accelerate: 0.31.0
669
+ - Datasets: 2.19.1
670
+ - Tokenizers: 0.19.1
671
+
672
+ ## Citation
673
+
674
+ ### BibTeX
675
+
676
+ #### Sentence Transformers
677
+ ```bibtex
678
+ @inproceedings{reimers-2019-sentence-bert,
679
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
680
+ author = "Reimers, Nils and Gurevych, Iryna",
681
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
682
+ month = "11",
683
+ year = "2019",
684
+ publisher = "Association for Computational Linguistics",
685
+ url = "https://arxiv.org/abs/1908.10084",
686
+ }
687
+ ```
688
+
689
+ #### MatryoshkaLoss
690
+ ```bibtex
691
+ @misc{kusupati2024matryoshka,
692
+ title={Matryoshka Representation Learning},
693
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
694
+ year={2024},
695
+ eprint={2205.13147},
696
+ archivePrefix={arXiv},
697
+ primaryClass={cs.LG}
698
+ }
699
+ ```
700
+
701
+ #### MultipleNegativesRankingLoss
702
+ ```bibtex
703
+ @misc{henderson2017efficient,
704
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
705
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
706
+ year={2017},
707
+ eprint={1705.00652},
708
+ archivePrefix={arXiv},
709
+ primaryClass={cs.CL}
710
+ }
711
+ ```
712
+
713
+ <!--
714
+ ## Glossary
715
+
716
+ *Clearly define terms in order to be accessible across audiences.*
717
+ -->
718
+
719
+ <!--
720
+ ## Model Card Authors
721
+
722
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
723
+ -->
724
+
725
+ <!--
726
+ ## Model Card Contact
727
+
728
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
729
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.1.2"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f999acf756b810dcf976ff67f7c8ba799eccfe9d518e011d32f3e92324d5318
+ size 437951328
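
Reviewer note: the pointer size can be sanity-checked against `config.json` in this commit. The sketch below counts parameters for a standard BERT encoder with a pooler, using the values from that config, and compares four bytes per parameter (`torch_dtype` is `float32`) with the reported file size. The layer layout is the usual BERT one and is an assumption here, and `bert_parameter_count` is a hypothetical helper.

```python
def bert_parameter_count(vocab_size, hidden, max_positions, type_vocab,
                         layers, intermediate):
    """Parameter count for a standard BERT encoder, pooler included."""
    # Word, position and token-type embeddings plus one LayerNorm (weight and bias).
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Query, key, value and output projections (with biases) plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two dense layers (with biases) plus one LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + 2 * hidden
    )
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


params = bert_parameter_count(
    vocab_size=30522, hidden=768, max_positions=512,
    type_vocab=2, layers=12, intermediate=3072,
)
weight_bytes = params * 4   # float32 weights
file_bytes = 437951328      # size from the LFS pointer above
print(params, file_bytes - weight_bytes)
```

The count comes to 109,482,240 parameters, leaving 22,368 bytes unaccounted for, which is plausibly the safetensors header and metadata.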
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
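
Reviewer note: `modules.json` lists the pipeline stages in order, each with a class path and the sub-directory holding its configuration. The sketch below parses the same content with the standard library and prints the stage order. It only reads the JSON and does not load any model; the variable names are illustrative.

```python
import json

# Same content as the modules.json added in this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
]
"""

# Sort by idx so the order does not depend on how the file happens to be written.
modules = sorted(json.loads(MODULES_JSON), key=lambda m: m["idx"])

# Keep only the class name at the end of each dotted path.
pipeline = [m["type"].rsplit(".", 1)[-1] for m in modules]
print(pipeline)  # ['Transformer', 'Pooling', 'Normalize']

# An empty path refers to the repository root, where config.json lives.
locations = {m["name"]: (m["path"] or ".") for m in modules}
print(locations)
```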
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
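
Reviewer note: the special-token IDs and `model_max_length` above determine how an input sequence is framed before it reaches the encoder. The sketch below shows that framing in plain Python: wrap the word-piece IDs in `[CLS]` and `[SEP]`, truncate to 512 positions, and pad with `[PAD]`. The word-piece IDs used in the example are arbitrary placeholders, and `build_inputs` is a hypothetical helper, not the `BertTokenizer` implementation.

```python
# IDs taken from added_tokens_decoder in the tokenizer config above.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102
MODEL_MAX_LENGTH = 512


def build_inputs(token_ids, pad_to=None, max_length=MODEL_MAX_LENGTH):
    """Add [CLS]/[SEP], truncate to max_length, and optionally pad.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding.
    """
    body = list(token_ids[: max_length - 2])  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding
    return input_ids, attention_mask


# Placeholder word-piece IDs standing in for a short sentence.
ids, mask = build_inputs([2000, 3000, 4000], pad_to=8)
print(ids)   # [101, 2000, 3000, 4000, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Inputs longer than 510 word pieces are cut off in this sketch, so text past that point would not influence the embedding.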
vocab.txt ADDED
The diff for this file is too large to render. See raw diff