gemma-7b-openhermes


gemma-7b-openhermes is a variant of the Gemma 7B language model (base: google/gemma-7b-it), further fine-tuned with DPO on the OpenHermes-2.5 preference dataset using QLoRA.


Usage

Chat Template

The model expects conversations to be formatted with its chat template. The easiest way to apply it is through the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "abideen/gemma-7b-openhermes"
dtype = torch.bfloat16  # half-precision weights to reduce GPU memory use

# Load the tokenizer and the model onto the GPU
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

# Format a single-turn conversation with the tokenizer's chat template
chat = [{"role": "user", "content": "What is a Language Model?"}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
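
To make concrete what a chat template does, here is a minimal sketch that renders a conversation by hand. It assumes a ChatML-style layout, because the Axolotl configuration in this card sets `chat_template: chatml` for training; whether the published tokenizer ships exactly this template is not stated here. The function `render_chatml` is a hypothetical helper for illustration only, and `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in a ChatML-style layout (illustrative assumption only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


chat = [{"role": "user", "content": "What is a Language Model?"}]
rendered = render_chatml(chat)
print(rendered)
```

If the tokenizer's output differs from this layout, trust the tokenizer: the sketch only shows the general idea of wrapping each turn in role markers and leaving an open assistant turn.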

After the prompt is ready, generation can be performed like this:

inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=250)
print(tokenizer.decode(outputs[0]))
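
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text above includes the prompt. The sketch below shows the slicing idea on plain lists; `strip_prompt` is a hypothetical helper and the token ids are placeholders, not real tokenizer output.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens generated after the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [2, 106, 1645]               # placeholder ids standing in for the encoded prompt
output_ids = prompt_ids + [651, 2091, 1]  # generate() output = prompt + new tokens
new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```

With the variables from the snippet above, the same idea would look like `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`, which should print the reply without the prompt or special tokens.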

Inputs and outputs

  • Input: Text string, such as a question, a prompt, or a document to be summarized.
  • Output: Generated English-language text in response to the input, such as an answer to a question, or a summary of a document.

πŸ† Evaluation results

Nous Benchmark

AGIEval

| Task | Version | Metric | Value | StdErr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.80 | ± 2.72 |
| agieval_aqua_rat | 0 | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 20.89 | ± 1.59 |
| agieval_logiqa_en | 0 | acc_norm | 23.35 | ± 1.66 |
| agieval_lsat_ar | 0 | acc | 21.74 | ± 2.73 |
| agieval_lsat_ar | 0 | acc_norm | 20.43 | ± 2.66 |
| agieval_lsat_lr | 0 | acc | 15.49 | ± 1.60 |
| agieval_lsat_lr | 0 | acc_norm | 20.59 | ± 1.79 |
| agieval_lsat_rc | 0 | acc | 17.10 | ± 2.30 |
| agieval_lsat_rc | 0 | acc_norm | 17.84 | ± 2.34 |
| agieval_sat_en | 0 | acc | 29.61 | ± 3.19 |
| agieval_sat_en | 0 | acc_norm | 29.61 | ± 3.19 |
| agieval_sat_en_without_passage | 0 | acc | 26.21 | ± 3.07 |
| agieval_sat_en_without_passage | 0 | acc_norm | 24.76 | ± 3.01 |
| agieval_sat_math | 0 | acc | 22.73 | ± 2.83 |
| agieval_sat_math | 0 | acc_norm | 22.73 | ± 2.83 |

Average: 22.29

GPT4All

| Task | Version | Metric | Value | StdErr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 20.14 | ± 1.17 |
| arc_challenge | 0 | acc_norm | 22.87 | ± 1.23 |
| arc_easy | 0 | acc | 32.37 | ± 0.96 |
| arc_easy | 0 | acc_norm | 31.61 | ± 0.95 |
| boolq | 1 | acc | 45.78 | ± 0.87 |
| hellaswag | 0 | acc | 32.03 | ± 0.47 |
| hellaswag | 0 | acc_norm | 35.18 | ± 0.48 |
| openbookqa | 0 | acc | 17.8 | ± 1.71 |
| openbookqa | 0 | acc_norm | 29.8 | ± 2.05 |
| piqa | 0 | acc | 54.46 | ± 1.16 |
| piqa | 0 | acc_norm | 54.57 | ± 1.16 |
| winogrande | 0 | acc | 48.30 | ± 1.40 |

Average: 32.00

TruthfulQA

| Task | Version | Metric | Value | StdErr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 30.11 | ± 1.61 |
| truthfulqa_mc | 1 | mc2 | 47.69 | ± 1.61 |

Average: 38.90

Open LLM Benchmark

| Task | Version | Metric | Value | StdErr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 48.12 | ± 1.46 |
| | | acc_norm | 51.27 | ± 1.46 |
| hellaswag | 0 | acc | 55.4 | ± 0.49 |
| | | acc_norm | 71.92 | ± 0.42 |
| gsm8k | 0 | acc | 29.87 | ± 1.2 |
| winogrande | 0 | acc | 68.19 | ± 1.3 |
| mmlu | 0 | acc | 53.62 | ± 0.6 |

Average: 73.5%

TruthfulQA

| Task | Version | Metric | Value | StdErr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 30.23 | ± 1.60 |
| | | mc2 | 47.17 | ± 1.63 |

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
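
To illustrate how these values fit together, the sketch below computes the effective batch size and the learning rate at a given step. It assumes a single device and the common linear-warmup-then-cosine-decay schedule (half a cosine cycle, decaying to zero); the exact scheduler implementation used by the trainer is not stated in this card, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step, base_lr=5e-7, warmup_steps=100, total_steps=1000):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# train_batch_size * gradient_accumulation_steps, assuming one device
effective_batch_size = 1 * 8

for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step))
```

Under these assumptions the rate peaks at 5e-07 at step 100 and falls to half that value midway through the decay phase (step 550).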

πŸ“ Axolotl Configuration

base_model: google/gemma-7b-it
model_type: GemmaForCausalLM
tokenizer_type: GemmaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

rl: dpo
chat_template: chatml
datasets:
  - path: mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
    split: train
    type: chatml.intel
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./out

adapter: qlora
lora_model_dir:

sequence_len: 1800
sample_packing: false
pad_to_sequence_len: false

lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:

wandb_project: gemma
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 5e-7

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 100
evals_per_epoch: 1
eval_table_size:
eval_table_max_new_tokens: 128
save_steps: 1000
max_steps: 1000
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
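
The `rl: dpo` setting above selects Direct Preference Optimization. As a rough sketch of the objective for a single preference pair, the function below computes the standard DPO loss from summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The value `beta=0.1` is an assumption (it does not appear in the configuration), and `dpo_loss` is a hypothetical helper, not Axolotl's implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy favors each response than the reference does
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: loss equals log(2)
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy shifted toward the chosen response: loss drops below log(2)
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(baseline, improved)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the signal the preference pairs in the dataset provide.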

Framework versions

  • Transformers 4.39.0.dev0
  • PyTorch 2.1.2+cu118
  • Datasets 2.17.0
  • Tokenizers 0.15.0
  • Axolotl 0.4.0

Built with Axolotl
