Deep Learning Project 2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the json dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32 tensors)
  • Training Dataset: json
  • Language: en
  • License: apache-2.0

Model Sources

  • Hugging Face: https://huggingface.co/bbmb/deep-learning-for-embedding-model-ssilwal-qpham6_army_doc

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
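
The three modules correspond to (0) a BERT encoder that produces per-token embeddings, (1) mean pooling over tokens weighted by the attention mask, and (2) L2 normalization. As a minimal sketch (assuming torch and transformers are installed), the same pipeline can be reproduced by hand on the base checkpoint:

# Sketch of the three pipeline stages above, reproduced by hand with
# 🤗 Transformers on the base checkpoint (the fine-tuned weights load the same way).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
bert = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

encoded = tokenizer(["An example sentence."], padding=True, truncation=True,
                    max_length=256, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state   # (0): Transformer

mask = encoded["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(1) / mask.sum(1)    # (1): Pooling (mean)
embedding = F.normalize(pooled, p=2, dim=1)                # (2): Normalize
print(embedding.shape)  # torch.Size([1, 384])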

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bbmb/deep-learning-for-embedding-model-ssilwal-qpham6_army_doc")
# Run inference
sentences = [
    'Offense \n11 January 2024 ATP 3-21.8 4-61\nlight the target, making it easier to acquire effectively. Leaders and Soldiers \nuse the infrared devices to identify enemy or friendly personnel and then \nengage targets using their aiming lights. \n4-172. Illuminating rounds fired to burn on the ground can mark objectives. This helps\nthe platoon orient on the objective but may adversely affect night vision devices.\n4-173. Leaders plan but do not always use illumination during limited visibility\nattacks. Battalion commanders normally control conventional illumination but ma y\na\nuthorize the company commander to do so. If the commander decides to use\nconventional illumination , the commander should not call for it until the assault is\ninitiated or the attack is detected. It should be placed on several locations over a wide\narea to confuse the enemy as to the exact place of the attack. It should be placed beyond\nthe objective to help assaulting Soldiers see and fire at withdrawing or counterattacking\nenemy Soldiers. Infrared illumination is a good capability to light the objective without\nlighting it for enemy forces without night vision devices.  This advantage is degraded\nwhen used against a peer threat with the same night vision capabilities.\n4-174. The platoon leader , squad leaders , and vehicle commanders must know unit\ntactical SOP and develop sound COAs to synchronize the employment of infrared\nillumination devices , target designators , and aiming lights during their assault on the\nobjective. These include using luminous tape or chemical lights to mark personnel and\nusing weapons control restrictions.\n4-175. The platoon leader may use the following techniques to increase control during\nthe assault:\n\uf06c Use no flares, grenades, or obscuration on the objective.\n\uf06c Use mortar or artillery rounds to orient attacking units.\n\uf06c Use a base squad or fire team to pace and guide others.\n\uf06c Reduce intervals between Soldiers and squads.\n4-176. Like a daylight attack , indirect and direct fires are planned for a limited\nvisibility attack but are not executed unless the platoon is detected or is ready to assault.\nSome weapons may fire before the attack and maintain a pattern to deceive the enemy\nor to help cover noise ma de by the platoon ’s movement. This is not done if it will\ndisclose the attack.\n4-177. Obscuration further reduces the enemy’s visibility, particularly if the enemy has\nnight vision devices. The FO fires obscuration rounds close to or on enemy positions ,\nso it does not restrict friendly movement or hinder the reduction of obstacles. Employing \nobscuration on the objective during the assault may make it hard for assaulting Soldiers\nto find enemy fighting positions. If enough thermal sights are available , obscuration on\nthe objective may provide a decisive advantage for a well-trained platoon.\nNote. I f the enemy is equipped with night vision devices , leaders must evaluate \nthe risk of using each technique and ensure the mission is not compromised by \nthe enemy’s ability to detect infrared light sources.',
    'What are the advantages of using infrared illumination in assaults?',
    'How can leaders effectively provide command and control during defensive operations?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
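
For retrieval-style use (matching questions to doctrine passages, mirroring the anchor/positive training pairs), the library's semantic_search utility ranks corpus embeddings against a query. A brief sketch reusing model and sentences from above; the query string is a made-up example:

from sentence_transformers import util

# Rank the three corpus entries against a new query and print the two
# nearest matches with their cosine scores.
query_emb = model.encode("How should illumination be used during a night attack?",
                         convert_to_tensor=True)
corpus_emb = model.encode(sentences, convert_to_tensor=True)
for hit in util.semantic_search(query_emb, corpus_emb, top_k=2)[0]:
    print(hit["corpus_id"], round(hit["score"], 4))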

Evaluation

Metrics

Information Retrieval

Metric               dim_384  dim_256  dim_128  dim_64
cosine_accuracy@1    0.0037   0.0037   0.0037   0.0019
cosine_accuracy@3    0.0131   0.0112   0.0093   0.0075
cosine_accuracy@5    0.0485   0.0373   0.0466   0.0429
cosine_accuracy@10   0.4496   0.4459   0.4366   0.4216
cosine_precision@1   0.0037   0.0037   0.0037   0.0019
cosine_precision@3   0.0044   0.0037   0.0031   0.0025
cosine_precision@5   0.0097   0.0075   0.0093   0.0086
cosine_precision@10  0.0450   0.0446   0.0437   0.0422
cosine_recall@1      0.0037   0.0037   0.0037   0.0019
cosine_recall@3      0.0131   0.0112   0.0093   0.0075
cosine_recall@5      0.0485   0.0373   0.0466   0.0429
cosine_recall@10     0.4496   0.4459   0.4366   0.4216
cosine_ndcg@10       0.1501   0.1489   0.1465   0.1390
cosine_mrr@10        0.0659   0.0653   0.0646   0.0595
cosine_map@100       0.0862   0.0859   0.0847   0.0790
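
Metrics in this form are produced by the library's InformationRetrievalEvaluator, with the dim_256/128/64 columns evaluating embeddings truncated to the leading Matryoshka dimensions (e.g., via the evaluator's truncate_dim argument). A sketch of how such numbers are computed, with toy data standing in for the real held-out split:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Toy stand-ins for the evaluation split: query ids -> text, doc ids -> text,
# and the set of relevant doc ids per query.
queries = {"q1": "What are the advantages of using infrared illumination in assaults?"}
corpus = {"d1": "Infrared illumination is a good capability to light the objective ...",
          "d2": "Glossary-4 ATP 3-21.8 11 January 2024 ..."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_384")
print(evaluator(model))  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100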

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 4,820 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:

                positive              anchor
    type        string                string
    details     min: 100 tokens       min: 9 tokens
                mean: 248.18 tokens   mean: 15.06 tokens
                max: 256 tokens       max: 27 tokens
  • Samples (positive passages are truncated excerpts from ATP 3-21.8):

    positive:
      Appendix A
      A-22 ATP 3-21.8 11 January 2024
      A-68. Observed fire. Usually is used when the platoon is in protected defensive positions with engagement ranges more than 2,500 meters for stabilized systems (when attached) and 1,500 meters for unstabilized systems. It can be employed between elements of the platoon, such as the squad lasing and observing while the weapons squad engages. The platoon leader directs one squad to engage. The remaining squads observe fires and prepare to engage on order in case the engaging element consistently misses its targets, experiences a malfunction, or runs low on ammunition. Observed fire allows for mutual observation and assistance while protecting the location of the observing elements.
      A-69. Sequential fire. Entails the subordinate elements of a unit engaging the same point or area target one after another in an arranged sequence. Sequential fire also can help to prevent the waste of ammunition, as when a platoon waits to see the effects of the ...
    anchor: What is the purpose of having one squad engage while others observe in an observed fire scenario?

    positive:
      Glossary
      Glossary-4 ATP 3-21.8 11 January 2024
      PLD probable line of deployment; PPEP personal protective equipment posture; RFL restrictive fire line; RM risk management; ROE rules of engagement; RS reduced sensitivity; RTO radiotelephone operator; S-2 battalion or brigade intelligence staff officer; SALUTE size, activity, location, unit, time, and equipment; SDM squad-designated marksman; SITEMP situation template; SLM shoulder-launched munition; SOP standard operating procedure; STP Soldier training publication; TAA tactical assembly area; TC training circular; TCCC tactical combat casualty care; TLP troop leading procedures; TM technical manual; TRP target reference point; U.S. United States; WARNORD warning order; WCS weapons control status; WP white phosphorous
      SECTION II – TERMS
      actions on contact: A process to help leaders understand what is happening and to take action. (FM 3-90)
      air-ground operations: The simultaneous or synchronized employment of ground forces with avi...
    anchor: How is the term SDM used in the military?

    positive:
      Chapter 1
      1-2 ATP 3-21.8 11 January 2024
      MISSION, CAPABILITIES, AND LIMITATIONS
      1-2. The mission of the Infantry rifle platoon is to close with the enemy using fire and movement to destroy or capture enemy forces, or to repel enemy attacks by fire, close combat, and counterattack to control land areas, including populations and resources. The Infantry rifle platoon leader exercises command and control and directs the operation of the platoon and attached units while conducting combined arms warfare throughout the depth of the platoon’s area of operations (AO). Platoon missions, although not inclusive, may include reducing fortified areas, infiltrating and seizing objectives in the enemy’s rear, eliminating enemy force remnants in restricted terrain, securing key facilities and activities, and conducting operations in support of stability operations tasks in the wake of maneuvering forces. Reconnaissance and surveillance operations and security operations remain a core compe...
    anchor: What offensive and defensive actions can an Infantry rifle platoon perform?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
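
Because MatryoshkaLoss trains the leading 384/256/128/64 dimensions to each work as standalone embeddings, the model can be loaded with a smaller output size when index memory matters; cosine similarity still applies, since it normalizes each vector. A minimal sketch:

from sentence_transformers import SentenceTransformer

# Load with embeddings truncated to the leading 64 Matryoshka dimensions.
model_64 = SentenceTransformer(
    "bbmb/deep-learning-for-embedding-model-ssilwal-qpham6_army_doc",
    truncate_dim=64,
)
emb = model_64.encode(["An example sentence."])
print(emb.shape)  # (1, 64)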
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 8
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
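
A training-script sketch consistent with the settings above, assuming the json dataset has been loaded into anchor/positive pairs (the output path, toy dataset, and save_strategy are assumptions; the real run used the 4,820-sample set described below):

from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = MatryoshkaLoss(model, MultipleNegativesRankingLoss(model),
                      matryoshka_dims=[384, 256, 128, 64])

# Toy stand-in for the real 4,820-pair dataset.
train_dataset = Dataset.from_dict({
    "anchor": ["What are the advantages of using infrared illumination in assaults?"],
    "positive": ["Infrared illumination is a good capability to light the objective ..."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # hypothetical path
    num_train_epochs=20,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    bf16=True,                                  # bf16/tf32 assume an Ampere-class GPU
    tf32=True,
    optim="adamw_torch_fused",
    eval_strategy="epoch",
    save_strategy="epoch",                      # assumed, so the best model can be restored
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset,
                                     eval_dataset=train_dataset,  # stand-in; real run used a held-out split
                                     loss=loss)
trainer.train()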

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step  Training Loss  dim_384_cosine_ndcg@10  dim_256_cosine_ndcg@10  dim_128_cosine_ndcg@10  dim_64_cosine_ndcg@10
0.9474   9     -              0.1225                  0.1221                  0.1145                  0.0915
1.0526   10    7.2521         -                       -                       -                       -
2.0      19    -              0.1296                  0.1261                  0.1157                  0.1089
2.1053   20    5.4977         -                       -                       -                       -
2.9474   28    -              0.1294                  0.1377                  0.1262                  0.1090
3.1579   30    4.3477         -                       -                       -                       -
4.0      38    -              0.1330                  0.1378                  0.1260                  0.1126
4.2105   40    3.3767         -                       -                       -                       -
4.9474   47    -              0.1415                  0.1388                  0.1294                  0.1221
5.2632   50    2.6443         -                       -                       -                       -
6.0      57    -              0.1515                  0.1395                  0.1348                  0.1218
6.3158   60    2.0824         -                       -                       -                       -
6.9474   66    -              0.1480                  0.1411                  0.1335                  0.1242
7.3684   70    1.6734         -                       -                       -                       -
8.0      76    -              0.1491                  0.1481                  0.1428                  0.1313
8.4211   80    1.3894         -                       -                       -                       -
8.9474   85    -              0.1449                  0.1497                  0.1419                  0.1341
9.4737   90    1.1443         -                       -                       -                       -
10.0     95    -              0.1466                  0.1494                  0.1399                  0.1396
10.5263  100   1.0121         -                       -                       -                       -
10.9474  104   -              0.1458                  0.1477                  0.1415                  0.1371
11.5789  110   0.8833         -                       -                       -                       -
12.0     114   -              0.1479                  0.1474                  0.1445                  0.1374
12.6316  120   0.8201         -                       -                       -                       -
12.9474  123   -              0.1519                  0.1486                  0.1458                  0.1360
13.6842  130   0.736          -                       -                       -                       -
14.0     133   -              0.1505                  0.1471                  0.1484                  0.1376
14.7368  140   0.6924         -                       -                       -                       -
14.9474  142   -              0.1496                  0.1486                  0.1451                  0.1396
15.7895  150   0.672          -                       -                       -                       -
16.0     152   -              0.1492                  0.1489                  0.1464                  0.1404
16.8421  160   0.6455         -                       -                       -                       -
16.9474  161   -              0.1496                  0.1493                  0.1468                  0.1389
17.8947  170   0.6538         -                       -                       -                       -
18.0     171   -              0.1501                  0.1470                  0.1461                  0.1393
18.9474  180   0.628          0.1501                  0.1489                  0.1465                  0.1390

  • The saved checkpoint is the final row (epoch 18.9474, step 180); its ndcg@10 values match the Evaluation section above.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}