## Model Description
Mixtral fine-tuned for Hindi and Hinglish, as part of ongoing experiments by bb deep learning systems.
- Developed by: bb deep learning systems
- Language(s) (NLP): English, Hindi, Romanised Hindi
- License: cc-by-nc-4.0
- Finetuned from model: mistralai/Mixtral-8x7B-v0.1
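A minimal loading sketch follows, assuming the repository id shown on this card and the standard transformers API; the exact prompt format is not documented here and is an assumption.

```python
# Minimal sketch, assuming the standard transformers API and the repo id from this card.
# The prompt format shown is an assumption and may differ from what the model expects.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rohansolo/BB-Mixtral-HindiHinglish-8x7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: bf16 inference, matching the training compute dtype
    device_map="auto",
)

prompt = "भारत की राजधानी क्या है?"  # "What is the capital of India?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```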
### Model Sources
- Paper: More information coming soon
## Training Details
### Training Data
A mix of Ultrachat200k and rohansolo/BB_HindiHinglishV2 was used, for a total of 573,014,566 tokens across Hindi, Romanised Hindi, and English.
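As an illustration only, the two sources could be loaded with the datasets library roughly as in the sketch below; the hub ids, splits, and any column alignment needed before mixing are assumptions, not documented on this card.

```python
# Illustrative sketch only: hub ids and splits are assumptions based on the names above.
from datasets import load_dataset

ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")  # assumed hub id/split
hindi_hinglish = load_dataset("rohansolo/BB_HindiHinglishV2", split="train")  # assumed split

# The two datasets would then be normalised to a common chat schema and shuffled together for SFT.
print(len(ultrachat), len(hindi_hinglish))
```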
### Training Procedure
The final training loss was 0.8977639613123988.
The model was trained using the following hyperparameters:
- warmup_steps: 100
- weight_decay: 0.05
- num_epochs: 1
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 0.0002
- lora_r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - w1
  - w2
  - w3
- lora_target_linear:
- lora_fan_in_fan_out:
- lora_modules_to_save:
  - embed_tokens
  - lm_head
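For reference, these hyperparameters map onto a peft LoraConfig and transformers TrainingArguments roughly as in the sketch below. Only the values listed above come from this card; everything else (output directory, batch sizes, task type) is a placeholder assumption.

```python
# Sketch mapping the listed hyperparameters onto peft / transformers objects.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",  # assumption: causal language modelling
)

training_args = TrainingArguments(
    output_dir="bb-mixtral-hindihinglish",  # placeholder
    num_train_epochs=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    weight_decay=0.05,
    optim="paged_adamw_8bit",
    bf16=True,  # assumption: matches the bnb_4bit_compute_dtype listed below
)
```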
The following bitsandbytes quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
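This corresponds to a transformers BitsAndBytesConfig along the lines of the following sketch; the values mirror the list above.

```python
# Sketch of the equivalent BitsAndBytesConfig; values mirror the list above.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```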
## Environmental Impact
Experiments were conducted on private infrastructure with a carbon efficiency of 0.432 kgCO$_2$eq/kWh. A cumulative 94 hours of computation was performed on A100 SXM4 80 GB hardware (TDP of 400 W).
Total emissions are estimated at 16.24 kgCO$_2$eq, of which 0 percent was directly offset.
- **Hardware Type:** 8 x A100 SXM4 80 GB
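The estimate follows the usual energy-times-carbon-intensity calculation; a quick check of the arithmetic:

```python
# Quick check of the emissions estimate: hours * TDP (kW) * grid carbon intensity.
hours = 94
tdp_kw = 400 / 1000           # 400 W, expressed in kW
carbon_intensity = 0.432      # kgCO2eq per kWh

emissions = hours * tdp_kw * carbon_intensity
print(f"{emissions:.2f} kgCO2eq")  # ≈ 16.24 kgCO2eq
```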