---
base_model: Lambent/proto-nova-eidolon-v2alpha0.3-14B
tags:
- generated_from_trainer
- not-for-all-audiences
model-index:
- name: dpoq
  results: []
---

This version has been tuned from the fascinating arcee-ai/SuperNova-Medius as its root model. Censorship remains notable on this one; the Not For All Audiences tag is included only because of the datasets used. EQ-Bench is about 1 point lower than its ancestor's, though a syntax issue was fixed; this may indicate a small amount of expected intelligence loss.

Methodology: a bit of custom fine-tuning, with the plurality drawn from the 'filtered' subset of argilla/ifeval-like-data, experimentally trained with 'input/output' roles rather than 'user/assistant' (other instruction sampling stayed ChatML-style, and some continued pretraining was added with a bias toward older public-domain styles); ties-merged at full saturation with the original over base Qwen, then this DPO applied on top.

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
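For quick reference, here is a minimal inference sketch. It assumes this repository hosts merged full weights and a ChatML-style chat template inherited from the Qwen2.5 base (matching the instruction tuning described above); the repo id shown is hypothetical, so substitute the actual one.

```python
# Minimal inference sketch (assumption: merged weights + ChatML chat template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Lambent/dpoq"  # hypothetical repo id; substitute the actual one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the idea behind DPO in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```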
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: Lambent/proto-nova-eidolon-v2alpha0.3-14B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

save_safetensors: true

load_in_8bit: false
load_in_4bit: true
strict: false

rl: dpo

# total_num_tokens:
datasets:
  - path: Lambent/ai-deconditioning-synthesized-dpo
    split: train
    type: chatml.prompt_pairs
  - path: jondurbin/gutenberg-dpo-v0.1
    split: train
    type: chatml.prompt_pairs
  - path: nbeerbower/gutenberg2-dpo
    split: train
    type: chatml.prompt_pairs
  - path: unalignment/toxic-dpo-v0.2
    split: train
    type: chatml.prompt_pairs
  - path: vicgalle/configurable-system-prompt-multitask
    split: train
    type: chatml.prompt_pairs

dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.01
seed: 1

sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

adapter: qlora
lora_model_dir:

lora_r: 256
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true

wandb_project: eidolon-qwen2.5-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
#cosine_min_lr_ratio: 0.1
#cosine_constant_lr_ratio: 0.95

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
```

</details>
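For readers who don't use Axolotl, the configuration above is essentially a 4-bit QLoRA (DoRA-enabled) DPO run. Below is a loose sketch of an approximately equivalent setup using TRL and PEFT; this is not what Axolotl executes internally, dataset preprocessing into prompt/chosen/rejected pairs is omitted, and argument names shift between TRL versions, so treat it as an approximation only.

```python
# Approximate TRL/PEFT equivalent of the `rl: dpo` QLoRA config above (sketch only).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base = "Lambent/proto-nova-eidolon-v2alpha0.3-14B"
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # load_in_4bit: true
    torch_dtype=torch.bfloat16,                                 # bf16: auto
    device_map="auto",
    trust_remote_code=True,
)

peft_config = LoraConfig(
    r=256, lora_alpha=256, lora_dropout=0.05,  # lora_r / lora_alpha / lora_dropout
    use_dora=True,                             # peft_use_dora: true
    target_modules="all-linear",               # lora_target_linear: true
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="./dpoq",
    per_device_train_batch_size=2,   # micro_batch_size
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=16,
    optim="paged_adamw_8bit",
    weight_decay=0.001,
    max_length=2048,                 # sequence_len
    gradient_checkpointing=True,
    logging_steps=1,
)

# Placeholder: a single dataset already in prompt/chosen/rejected columns.
train_dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,   # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```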

# dpoq

This model is a fine-tuned version of [Lambent/proto-nova-eidolon-v2alpha0.3-14B](https://huggingface.co/Lambent/proto-nova-eidolon-v2alpha0.3-14B) on the DPO datasets listed in the axolotl config above.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 16
- training_steps: 124

### Training results

### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
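The framework versions above include PEFT, so the training run itself produces a QLoRA/DoRA adapter rather than full weights. If the published artifact is the adapter (an assumption; the id below is hypothetical), it can be attached to the base model roughly as follows:

```python
# Sketch: attaching the dpoq adapter to the base model with PEFT.
# Assumption: this repo contains adapter weights; "Lambent/dpoq" is a
# hypothetical id, so substitute whatever the adapter is actually published as.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Lambent/proto-nova-eidolon-v2alpha0.3-14B"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

model = PeftModel.from_pretrained(model, "Lambent/dpoq")  # hypothetical adapter id
model = model.merge_and_unload()  # optional: fold the adapter into the base weights
```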