|
--- |
|
license: other |
|
library_name: peft |
|
tags: |
|
- axolotl |
|
- generated_from_trainer |
|
base_model: meta-llama/Meta-Llama-3-8B |
|
model-index: |
|
- name: open-aditi-chat-hi-1.25-llama3 |
|
results: [] |
|
--- |
|
|
|
Training dataset preview: https://huggingface.co/datasets/manishiitg/aditi-syn-v2
|
|
|
The synthetic dataset (https://huggingface.co/datasets/manishiitg/aditi-syn-v2) and the full data creation pipeline (https://github.com/manishiitg/aditi_dataset) have been open-sourced to support transparency and further research in this domain. The dataset combines Hinglish (a blend of Hindi and English) data with a diverse set of Hindi-language tasks covering tool use, retrieval-augmented generation (RAG), mathematics, and reasoning.
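For reference, a minimal sketch of inspecting the synthetic dataset with the 🤗 `datasets` library (the `train` split name is an assumption; check the dataset card for the exact splits):

```python
# Sketch: inspect the open-sourced synthetic dataset.
# The split name "train" is an assumption; see the dataset card for details.
from datasets import load_dataset

ds = load_dataset("manishiitg/aditi-syn-v2", split="train")
print(ds)     # dataset size and column names
print(ds[0])  # one example record
```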
|
|
|
## LMJudge Eval
|
|
|
https://github.com/manishiitg/IndicLMJudge |
|
|
|
|
|
#### LLM Judge Language: hi |
|
| Model | Language | Score | No. of Questions |
|
| --- | --- | --- | --- | |
|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | hi | 8.7148 | 554 | |
|
| Qwen/Qwen1.5-72B-Chat-AWQ | hi | 8.3695 | 554 | |
|
| manishiitg/open-aditi-v6-llama3 | hi | 8.2659 | 551 | |
|
| Qwen/Qwen1.5-14B-Chat | hi | 8.2404 | 554 | |
|
| google/gemma-7b-it | hi | 7.9152 | 554 | |
|
| manishiitg/open-aditi-v6-gemma | hi | 7.8634 | 549 | |
|
| Qwen/Qwen1.5-7B-Chat | hi | 7.8587 | 554 | |
|
| manishiitg/open-aditi-hi-v3 | hi | 7.7644 | 554 | |
|
| manishiitg/open-aditi-hi-v4 | hi | 7.6150 | 554 | |
|
| manishiitg/open-aditi-hi-v2 | hi | 7.2518 | 554 | |
|
| teknium/OpenHermes-2.5-Mistral-7B | hi | 7.2489 | 554 | |
|
| ai4bharat/Airavata | hi | 6.9468 | 554 | |
|
| 01-ai/Yi-34B-Chat | hi | 6.5801 | 554 | |
|
| manishiitg/open-aditi-hi-v1 | hi | 4.7022 | 554 | |
|
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | hi | 4.2834 | 598 | |
|
| Qwen/Qwen1.5-4B-Chat | hi | 4.1101 | 554 | |
|
|
|
|
|
#### LLM Judge Language: en |
|
| Model | Language | Score | No. of Questions |
|
| --- | --- | --- | --- | |
|
| Qwen/Qwen1.5-14B-Chat | en | 9.1947 | 356 | |
|
| Qwen/Qwen1.5-72B-Chat-AWQ | en | 9.1618 | 356 | |
|
| Qwen/Qwen1.5-7B-Chat | en | 9.1570 | 356 | |
|
| 01-ai/Yi-34B-Chat | en | 9.1368 | 356 | |
|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | en | 9.1306 | 356 | |
|
| manishiitg/open-aditi-v6-gemma | en | 9.1003 | 356 | |
|
| teknium/OpenHermes-2.5-Mistral-7B | en | 9.0230 | 356 | |
|
| manishiitg/open-aditi-v6-llama3 | en | 9.0197 | 356 | |
|
| manishiitg/open-aditi-hi-v3 | en | 8.9615 | 356 | |
|
| manishiitg/open-aditi-hi-v4 | en | 8.9188 | 356 | |
|
| google/gemma-7b-it | en | 8.8191 | 356 | |
|
| Qwen/Qwen1.5-4B-Chat | en | 8.7500 | 356 | |
|
| google/gemma-2b-it | en | 8.4671 | 356 | |
|
| manishiitg/open-aditi-hi-v2 | en | 8.4584 | 356 | |
|
| ai4bharat/Airavata | en | 7.3834 | 356 | |
|
| manishiitg/open-aditi-hi-v1 | en | 6.6559 | 356 | |
|
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | en | 5.9567 | 312 | |
|
|
|
## DHARMA Tiny Eval
|
|
|
#### Language: hi
|
|
|
| Model | ARC-Easy | bigbench | truthful_qa | BoolQ | winogrande | agieval | ARC-Challenge | MMLU | openbookqa | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |
|
| open-aditi-hi-v2 | 0.6245 | 0.4959 | 0.3866 | 0.7192 | 0.5353 | 0.2945 | 0.4828 | 0.3457 | 0.5279 | |
|
| open-aditi-hi-v3 | 0.6803 | 0.4553 | 0.2788 | 0.7385 | 0.5390 | 0.2178 | 0.4914 | 0.3346 | 0.5688 | |
|
| open-aditi-hi-v4 | 0.6989 | 0.4526 | 0.2714 | 0.7231 | 0.5167 | 0.2331 | 0.5302 | 0.3123 | 0.5316 | |
|
| open-aditi-v6-gemma | 0.7212 | 0.4146 | 0.3234 | 0.6923 | 0.4870 | 0.2638 | 0.4957 | 0.3680 | 0.4349 | |
|
| open-aditi-v6-llama3 | 0.5688 | 0.4119 | 0.2268 | 0.6500 | 0.4498 | 0.2331 | 0.4310 | 0.3420 | 0.3792 | |
|
| open-aditi-hi-v1 | 0.4572 | 0.3767 | 0.2230 | 0.6346 | 0.4647 | 0.1840 | 0.3405 | 0.3271 | 0.3532 | |
|
| OpenHermes-2.5-Mistral-7B | 0.3309 | 0.4201 | 0.3197 | 0.6077 | 0.4981 | 0.2331 | 0.3276 | 0.3086 | 0.3086 | |
|
| OpenHathi-7B-Hi-v0.1-Base | 0.2862 | 0.3333 | 0.5130 | 0.6077 | 0.4907 | 0.2301 | 0.3017 | 0.2677 | 0.1933 | |
|
| Airavata | 0.2751 | 0.1274 | 0.2268 | 0.0615 | 0.3866 | 0.1104 | 0.2845 | 0.1450 | 0.3383 | |
|
| gemma-7b-it | 0.1227 | 0.0786 | 0.0743 | 0.1808 | 0.1561 | 0.0491 | 0.1078 | 0.0818 | 0.0855 | |
|
|
|
#### Language: en
|
|
|
| Model | ARC-Easy | bigbench | truthful_qa | BoolQ | winogrande | agieval | ARC-Challenge | MMLU | openbookqa | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |
|
| OpenHermes-2.5-Mistral-7B | 0.8922 | 0.5745 | 0.3197 | 0.8346 | 0.6989 | 0.4908 | 0.7802 | 0.5911 | 0.7621 | |
|
| open-aditi-hi-v2 | 0.8625 | 0.5149 | 0.3532 | 0.8192 | 0.6877 | 0.4571 | 0.7500 | 0.5613 | 0.7732 | |
|
| open-aditi-hi-v4 | 0.8959 | 0.5041 | 0.2862 | 0.8423 | 0.6914 | 0.4571 | 0.7716 | 0.5651 | 0.7138 | |
|
| open-aditi-hi-v3 | 0.8773 | 0.4986 | 0.3048 | 0.8385 | 0.6766 | 0.4663 | 0.7371 | 0.5613 | 0.7249 | |
|
| Qwen1.5-7B-Chat | 0.8922 | 0.5122 | 0.2007 | 0.8000 | 0.6654 | 0.4294 | 0.7759 | 0.5799 | 0.7621 | |
|
| open-aditi-v6-gemma | 0.8699 | 0.4959 | 0.2602 | 0.7385 | 0.5465 | 0.4540 | 0.7371 | 0.5167 | 0.6654 | |
|
| open-aditi-v6-llama3 | 0.8810 | 0.4634 | 0.1822 | 0.7577 | 0.5353 | 0.4110 | 0.7457 | 0.5688 | 0.6506 | |
|
| open-aditi-hi-v1 | 0.8104 | 0.3902 | 0.2491 | 0.6962 | 0.5539 | 0.3681 | 0.6379 | 0.5056 | 0.5911 | |
|
| Airavata | 0.7026 | 0.4282 | 0.3123 | 0.7192 | 0.5651 | 0.3313 | 0.5172 | 0.3792 | 0.5093 | |
|
| OpenHathi-7B-Hi-v0.1-Base | 0.4684 | 0.3062 | 0.4758 | 0.6346 | 0.5167 | 0.2577 | 0.3017 | 0.2788 | 0.2714 | |
|
|
|
|
|
All tasks above (BoolQ, ARC-Easy, openbookqa, winogrande, ARC-Challenge, truthful_qa, bigbench, MMLU, agieval) report the `score` metric.
|
|
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.0` |
|
```yaml |
|
base_model: meta-llama/Meta-Llama-3-8B |
|
model_type: AutoModelForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
|
|
load_in_8bit: false |
|
load_in_4bit: true |
|
strict: false |
|
|
|
datasets: |
|
- path: manishiitg/aditi-syn-train-small-v3 |
|
type: completion |
|
|
|
|
|
# v1.25 contains only synthetic data, with samples rejected by the LLM judge removed
|
hub_model_id: manishiitg/open-aditi-chat-hi-1.25-llama3 |
|
hf_use_auth_token: true |
|
|
|
wandb_project: open-aditi-chat-hi-1.25-llama3 |
|
|
|
dataset_prepared_path: manishiitg |
|
push_dataset_to_hub: manishiitg |
|
val_set_size: .1 |
|
output_dir: /sky-notebook/manishiitg/open-aditi-chat-hi-1.25-llama3 |
|
|
|
adapter: qlora |
|
lora_model_dir: |
|
save_safetensors: true |
|
|
|
sequence_len: 2048 |
|
sample_packing: true |
|
pad_to_sequence_len: true |
|
eval_sample_packing: false |
|
|
|
lora_r: 32 |
|
lora_alpha: 16 |
|
lora_dropout: 0.05 |
|
lora_target_linear: true |
|
|
|
wandb_entity: |
|
wandb_watch: |
|
wandb_run_id: |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 8 |
|
micro_batch_size: 6 |
|
num_epochs: 1 |
|
optimizer: paged_adamw_32bit |
|
lr_scheduler: cosine |
|
learning_rate: 0.0002 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: true |
|
fp16: false |
|
tf32: false |
|
|
|
|
|
gradient_checkpointing: true |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
auto_resume_from_checkpoints: true ## manage checkpoint resumption from here
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
warmup_steps: 10 |
|
evals_per_epoch: 2 |
|
eval_table_size: |
|
eval_table_max_new_tokens: 128 |
|
save_steps: 20 ## increase based on your dataset |
|
save_strategy: steps |
|
debug: |
|
deepspeed: |
|
weight_decay: 0.0 |
|
fsdp: |
|
fsdp_config: |
|
special_tokens: |
|
pad_token: <|end_of_text|> |
|
``` |
|
|
|
</details><br> |
|
|
|
# open-aditi-chat-hi-1.25-llama3 |
|
|
|
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [manishiitg/aditi-syn-train-small-v3](https://huggingface.co/datasets/manishiitg/aditi-syn-train-small-v3) dataset (see the axolotl config above).
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.9727 |
|
|
|
## Model description |
|
|
|
open-aditi-chat-hi-1.25-llama3 is a QLoRA (4-bit) adapter trained on top of Meta-Llama-3-8B using synthetic Hindi and Hinglish instruction data. The LMJudge and DHARMA tiny eval tables above summarize its performance relative to other open Hindi-capable models.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for Hindi and Hinglish chat, including tool use, retrieval-augmented generation (RAG), mathematics, and reasoning tasks. As a derivative of Meta-Llama-3-8B, it is subject to the base model's license, and its outputs have not been exhaustively evaluated for factuality or safety.
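A minimal inference sketch with 🤗 `transformers` and `peft` follows. The prompt below is illustrative only; the exact prompt format is not documented here, so match it to the training data.

```python
# Minimal inference sketch (not an official example): loads the QLoRA adapter
# from the Hub on top of the base weights via peft's Auto class.
# The Hindi prompt is illustrative; adapt the prompt format to the training data.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "manishiitg/open-aditi-chat-hi-1.25-llama3"
base_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(base_id)  # base tokenizer; the adapter repo may also ship one
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "भारत की राजधानी क्या है?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```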
|
|
|
## Training and evaluation data |
|
|
|
Training used the [manishiitg/aditi-syn-train-small-v3](https://huggingface.co/datasets/manishiitg/aditi-syn-train-small-v3) dataset in completion format, with 10% of the data held out for validation (`val_set_size: 0.1` in the config above). A preview of the broader synthetic dataset is available at [manishiitg/aditi-syn-v2](https://huggingface.co/datasets/manishiitg/aditi-syn-v2).
|
|
|
## Training procedure |
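The adapter was trained with axolotl. For readers who prefer plain `transformers`/`peft`, the snippet below is a rough, illustrative equivalent of the quantization and LoRA settings from the config above; it is a sketch only, not the actual training code (axolotl wires these options up internally).

```python
# Rough standalone equivalent of the QLoRA settings in the axolotl config
# (sketch for illustration; not the code that produced this model).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # load_in_4bit: true
    torch_dtype=torch.bfloat16,                                 # bf16: true
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                         # lora_r: 32
    lora_alpha=16,                # lora_alpha: 16
    lora_dropout=0.05,            # lora_dropout: 0.05
    target_modules="all-linear",  # lora_target_linear: true
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```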
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 6 |
|
- eval_batch_size: 6 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- gradient_accumulation_steps: 8 |
|
- total_train_batch_size: 384 (see the derivation after this list)
|
- total_eval_batch_size: 48 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 10 |
|
- num_epochs: 1 |
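The effective batch sizes listed above follow directly from the per-device micro batch size, the gradient accumulation steps, and the number of GPUs:

```python
# Derivation of the effective batch sizes from the values above.
micro_batch_size = 6               # per-device train/eval batch size
gradient_accumulation_steps = 8
num_devices = 8
print(micro_batch_size * gradient_accumulation_steps * num_devices)  # 384 (total train batch size)
print(micro_batch_size * num_devices)                                # 48  (total eval batch size)
```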
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:-----:|:----:|:---------------:| |
|
| 1.5388 | 0.01 | 1 | 2.5709 | |
|
| 0.8839 | 0.5 | 88 | 1.9648 | |
|
| 0.88 | 1.0 | 176 | 1.9727 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.9.0 |
|
- Transformers 4.40.0.dev0 |
|
- Pytorch 2.1.2+cu121 |
|
- Datasets 2.18.0 |
|
- Tokenizers 0.15.0 |