|
--- |
|
license: other |
|
library_name: peft |
|
tags: |
|
- axolotl |
|
- generated_from_trainer |
|
base_model: meta-llama/Meta-Llama-3-8B |
|
model-index: |
|
- name: open-aditi-chat-hi-1.25-llama3 |
|
results: [] |
|
--- |
|
|
|
Training dataset preview: https://huggingface.co/datasets/manishiitg/aditi-syn-v2
|
|
|
The synthetic dataset (https://huggingface.co/datasets/manishiitg/aditi-syn-v2) and the full data creation pipeline (https://github.com/manishiitg/aditi_dataset) have been open-sourced to support transparency and further research in this domain. The dataset combines Hinglish (a blend of Hindi and English) data with a diverse set of Hindi-language tasks covering tool use, retrieval-augmented generation (RAG), mathematics, and reasoning.
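For reference, a minimal sketch of inspecting the synthetic dataset with the 🤗 `datasets` library (the `train` split name is an assumption; check the dataset card for the exact splits):

```python
# Sketch: inspect the open-sourced synthetic dataset.
# The split name "train" is an assumption; see the dataset card for details.
from datasets import load_dataset

ds = load_dataset("manishiitg/aditi-syn-v2", split="train")
print(ds)     # dataset size and column names
print(ds[0])  # one example record
```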
|
|
|
## LMJudge Eval
|
|
|
https://github.com/manishiitg/IndicLMJudge |
|
|
|
|
|
#### LLM Judge Language: hi |
|
| Model | Language | Score | No. of Questions |
|
| --- | --- | --- | --- | |
|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | hi | 8.7148 | 554 | |
|
| Qwen/Qwen1.5-72B-Chat-AWQ | hi | 8.3695 | 554 | |
|
| manishiitg/open-aditi-v6-llama3 | hi | 8.2659 | 551 | |
|
| Qwen/Qwen1.5-14B-Chat | hi | 8.2404 | 554 | |
|
| google/gemma-7b-it | hi | 7.9152 | 554 | |
|
| manishiitg/open-aditi-v6-gemma | hi | 7.8634 | 549 | |
|
| Qwen/Qwen1.5-7B-Chat | hi | 7.8587 | 554 | |
|
| manishiitg/open-aditi-hi-v3 | hi | 7.7644 | 554 | |
|
| manishiitg/open-aditi-hi-v4 | hi | 7.6150 | 554 | |
|
| manishiitg/open-aditi-hi-v2 | hi | 7.2518 | 554 | |
|
| teknium/OpenHermes-2.5-Mistral-7B | hi | 7.2489 | 554 | |
|
| ai4bharat/Airavata | hi | 6.9468 | 554 | |
|
| 01-ai/Yi-34B-Chat | hi | 6.5801 | 554 | |
|
| manishiitg/open-aditi-hi-v1 | hi | 4.7022 | 554 | |
|
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | hi | 4.2834 | 598 | |
|
| Qwen/Qwen1.5-4B-Chat | hi | 4.1101 | 554 | |
|
|
|
|
|
#### LLM Judge Language: en |
|
| Model | Language | Score | No. of Questions |
|
| --- | --- | --- | --- | |
|
| Qwen/Qwen1.5-14B-Chat | en | 9.1947 | 356 | |
|
| Qwen/Qwen1.5-72B-Chat-AWQ | en | 9.1618 | 356 | |
|
| Qwen/Qwen1.5-7B-Chat | en | 9.1570 | 356 | |
|
| 01-ai/Yi-34B-Chat | en | 9.1368 | 356 | |
|
| mistralai/Mixtral-8x7B-Instruct-v0.1 | en | 9.1306 | 356 | |
|
| manishiitg/open-aditi-v6-gemma | en | 9.1003 | 356 | |
|
| teknium/OpenHermes-2.5-Mistral-7B | en | 9.0230 | 356 | |
|
| manishiitg/open-aditi-v6-llama3 | en | 9.0197 | 356 | |
|
| manishiitg/open-aditi-hi-v3 | en | 8.9615 | 356 | |
|
| manishiitg/open-aditi-hi-v4 | en | 8.9188 | 356 | |
|
| google/gemma-7b-it | en | 8.8191 | 356 | |
|
| Qwen/Qwen1.5-4B-Chat | en | 8.7500 | 356 | |
|
| google/gemma-2b-it | en | 8.4671 | 356 | |
|
| manishiitg/open-aditi-hi-v2 | en | 8.4584 | 356 | |
|
| ai4bharat/Airavata | en | 7.3834 | 356 | |
|
| manishiitg/open-aditi-hi-v1 | en | 6.6559 | 356 | |
|
| sarvamai/OpenHathi-7B-Hi-v0.1-Base | en | 5.9567 | 312 | |
|
|
|
## DHARMA Tiny Eval
|
|
|
#### Language: hi
|
|
|
| Model | ARC-Easy | bigbench | truthful_qa | BoolQ | winogrande | agieval | ARC-Challenge | MMLU | openbookqa | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |
|
| open-aditi-hi-v2 | 0.6245 | 0.4959 | 0.3866 | 0.7192 | 0.5353 | 0.2945 | 0.4828 | 0.3457 | 0.5279 | |
|
| open-aditi-hi-v3 | 0.6803 | 0.4553 | 0.2788 | 0.7385 | 0.5390 | 0.2178 | 0.4914 | 0.3346 | 0.5688 | |
|
| open-aditi-hi-v4 | 0.6989 | 0.4526 | 0.2714 | 0.7231 | 0.5167 | 0.2331 | 0.5302 | 0.3123 | 0.5316 | |
|
| open-aditi-v6-gemma | 0.7212 | 0.4146 | 0.3234 | 0.6923 | 0.4870 | 0.2638 | 0.4957 | 0.3680 | 0.4349 | |
|
| open-aditi-v6-llama3 | 0.5688 | 0.4119 | 0.2268 | 0.6500 | 0.4498 | 0.2331 | 0.4310 | 0.3420 | 0.3792 | |
|
| open-aditi-hi-v1 | 0.4572 | 0.3767 | 0.2230 | 0.6346 | 0.4647 | 0.1840 | 0.3405 | 0.3271 | 0.3532 | |
|
| OpenHermes-2.5-Mistral-7B | 0.3309 | 0.4201 | 0.3197 | 0.6077 | 0.4981 | 0.2331 | 0.3276 | 0.3086 | 0.3086 | |
|
| OpenHathi-7B-Hi-v0.1-Base | 0.2862 | 0.3333 | 0.5130 | 0.6077 | 0.4907 | 0.2301 | 0.3017 | 0.2677 | 0.1933 | |
|
| Airavata | 0.2751 | 0.1274 | 0.2268 | 0.0615 | 0.3866 | 0.1104 | 0.2845 | 0.1450 | 0.3383 | |
|
| gemma-7b-it | 0.1227 | 0.0786 | 0.0743 | 0.1808 | 0.1561 | 0.0491 | 0.1078 | 0.0818 | 0.0855 | |
|
|
|
#### Language: en
|
|
|
| Model | ARC-Easy | bigbench | truthful_qa | BoolQ | winogrande | agieval | ARC-Challenge | MMLU | openbookqa | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |
|
| OpenHermes-2.5-Mistral-7B | 0.8922 | 0.5745 | 0.3197 | 0.8346 | 0.6989 | 0.4908 | 0.7802 | 0.5911 | 0.7621 | |
|
| open-aditi-hi-v2 | 0.8625 | 0.5149 | 0.3532 | 0.8192 | 0.6877 | 0.4571 | 0.7500 | 0.5613 | 0.7732 | |
|
| open-aditi-hi-v4 | 0.8959 | 0.5041 | 0.2862 | 0.8423 | 0.6914 | 0.4571 | 0.7716 | 0.5651 | 0.7138 | |
|
| open-aditi-hi-v3 | 0.8773 | 0.4986 | 0.3048 | 0.8385 | 0.6766 | 0.4663 | 0.7371 | 0.5613 | 0.7249 | |
|
| Qwen1.5-7B-Chat | 0.8922 | 0.5122 | 0.2007 | 0.8000 | 0.6654 | 0.4294 | 0.7759 | 0.5799 | 0.7621 | |
|
| open-aditi-v6-gemma | 0.8699 | 0.4959 | 0.2602 | 0.7385 | 0.5465 | 0.4540 | 0.7371 | 0.5167 | 0.6654 | |
|
| open-aditi-v6-llama3 | 0.8810 | 0.4634 | 0.1822 | 0.7577 | 0.5353 | 0.4110 | 0.7457 | 0.5688 | 0.6506 | |
|
| open-aditi-hi-v1 | 0.8104 | 0.3902 | 0.2491 | 0.6962 | 0.5539 | 0.3681 | 0.6379 | 0.5056 | 0.5911 | |
|
| Airavata | 0.7026 | 0.4282 | 0.3123 | 0.7192 | 0.5651 | 0.3313 | 0.5172 | 0.3792 | 0.5093 | |
|
| OpenHathi-7B-Hi-v0.1-Base | 0.4684 | 0.3062 | 0.4758 | 0.6346 | 0.5167 | 0.2577 | 0.3017 | 0.2788 | 0.2714 | |
|
|
|
|
|
All tasks above (BoolQ, ARC-Easy, openbookqa, winogrande, ARC-Challenge, truthful_qa, bigbench, MMLU, agieval) report the `score` metric.
|
|
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.0` |
|
```yaml |
|
base_model: meta-llama/Meta-Llama-3-8B |
|
model_type: AutoModelForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
|
|
load_in_8bit: false |
|
load_in_4bit: true |
|
strict: false |
|
|
|
datasets: |
|
- path: manishiitg/aditi-syn-train-small-v3 |
|
type: completion |
|
|
|
|
|
# v1.25 contains only synthetic data, with samples rejected by the LLM judge removed
|
hub_model_id: manishiitg/open-aditi-chat-hi-1.25-llama3 |
|
hf_use_auth_token: true |
|
|
|
wandb_project: open-aditi-chat-hi-1.25-llama3 |
|
|
|
dataset_prepared_path: manishiitg |
|
push_dataset_to_hub: manishiitg |
|
val_set_size: .1 |
|
output_dir: /sky-notebook/manishiitg/open-aditi-chat-hi-1.25-llama3 |
|
|
|
adapter: qlora |
|
lora_model_dir: |
|
save_safetensors: true |
|
|
|
sequence_len: 2048 |
|
sample_packing: true |
|
pad_to_sequence_len: true |
|
eval_sample_packing: false |
|
|
|
lora_r: 32 |
|
lora_alpha: 16 |
|
lora_dropout: 0.05 |
|
lora_target_linear: true |
|
|
|
wandb_entity: |
|
wandb_watch: |
|
wandb_run_id: |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 8 |
|
micro_batch_size: 6 |
|
num_epochs: 1 |
|
optimizer: paged_adamw_32bit |
|
lr_scheduler: cosine |
|
learning_rate: 0.0002 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: true |
|
fp16: false |
|
tf32: false |
|
|
|
|
|
gradient_checkpointing: true |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
auto_resume_from_checkpoints: true ## manage checkpoint resumption from here
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
warmup_steps: 10 |
|
evals_per_epoch: 2 |
|
eval_table_size: |
|
eval_table_max_new_tokens: 128 |
|
save_steps: 20 ## increase based on your dataset |
|
save_strategy: steps |
|
debug: |
|
deepspeed: |
|
weight_decay: 0.0 |
|
fsdp: |
|
fsdp_config: |
|
special_tokens: |
|
pad_token: <|end_of_text|> |
|
``` |
|
|
|
</details><br> |
|
|
|
# open-aditi-chat-hi-1.25-llama3 |
|
|
|
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [manishiitg/aditi-syn-train-small-v3](https://huggingface.co/datasets/manishiitg/aditi-syn-train-small-v3) dataset (see the axolotl config above).
|
It achieves the following results on the evaluation set: |
|
- Loss: 1.9727 |
|
|
|
## Model description |
|
|
|
open-aditi-chat-hi-1.25-llama3 is a QLoRA (4-bit) adapter trained on top of Meta-Llama-3-8B using synthetic Hindi and Hinglish instruction data. The LMJudge and DHARMA tiny eval tables above summarize its performance relative to other open Hindi-capable models.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for Hindi and Hinglish chat, including tool use, retrieval-augmented generation (RAG), mathematics, and reasoning tasks. As a derivative of Meta-Llama-3-8B, it is subject to the base model's license, and its outputs have not been exhaustively evaluated for factuality or safety.
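A minimal inference sketch with 🤗 `transformers` and `peft` follows. The prompt below is illustrative only; the exact prompt format is not documented here, so match it to the training data.

```python
# Minimal inference sketch (not an official example): loads the QLoRA adapter
# from the Hub on top of the base weights via peft's Auto class.
# The Hindi prompt is illustrative; adapt the prompt format to the training data.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "manishiitg/open-aditi-chat-hi-1.25-llama3"
base_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(base_id)  # base tokenizer; the adapter repo may also ship one
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "भारत की राजधानी क्या है?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```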
|
|
|
## Training and evaluation data |
|
|
|
Training used the [manishiitg/aditi-syn-train-small-v3](https://huggingface.co/datasets/manishiitg/aditi-syn-train-small-v3) dataset in completion format, with 10% of the data held out for validation (`val_set_size: 0.1` in the config above). A preview of the broader synthetic dataset is available at [manishiitg/aditi-syn-v2](https://huggingface.co/datasets/manishiitg/aditi-syn-v2).
|
|
|
## Training procedure |
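The adapter was trained with axolotl. For readers who prefer plain `transformers`/`peft`, the snippet below is a rough, illustrative equivalent of the quantization and LoRA settings from the config above; it is a sketch only, not the actual training code (axolotl wires these options up internally).

```python
# Rough standalone equivalent of the QLoRA settings in the axolotl config
# (sketch for illustration; not the code that produced this model).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # load_in_4bit: true
    torch_dtype=torch.bfloat16,                                 # bf16: true
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                         # lora_r: 32
    lora_alpha=16,                # lora_alpha: 16
    lora_dropout=0.05,            # lora_dropout: 0.05
    target_modules="all-linear",  # lora_target_linear: true
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```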
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 6 |
|
- eval_batch_size: 6 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- gradient_accumulation_steps: 8 |
|
- total_train_batch_size: 384 (see the derivation after this list)
|
- total_eval_batch_size: 48 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 10 |
|
- num_epochs: 1 |
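The effective batch sizes listed above follow directly from the per-device micro batch size, the gradient accumulation steps, and the number of GPUs:

```python
# Derivation of the effective batch sizes from the values above.
micro_batch_size = 6               # per-device train/eval batch size
gradient_accumulation_steps = 8
num_devices = 8
print(micro_batch_size * gradient_accumulation_steps * num_devices)  # 384 (total train batch size)
print(micro_batch_size * num_devices)                                # 48  (total eval batch size)
```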
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:-----:|:----:|:---------------:| |
|
| 1.5388 | 0.01 | 1 | 2.5709 | |
|
| 0.8839 | 0.5 | 88 | 1.9648 | |
|
| 0.88 | 1.0 | 176 | 1.9727 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.9.0 |
|
- Transformers 4.40.0.dev0 |
|
- Pytorch 2.1.2+cu121 |
|
- Datasets 2.18.0 |
|
- Tokenizers 0.15.0 |