---
license: mit
library_name: transformers
datasets:
  - teknium/OpenHermes-2.5
model-index:
  - name: phi-2-OpenHermes-2.5
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 59.81
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 74.85
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 55.51
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 43.86
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 75.06
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 41.17
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5
          name: Open LLM Leaderboard
---

# phi-2-OpenHermes-2.5

microsoft/phi-2, fine-tuned on teknium/OpenHermes-2.5

## Training

- QLoRA, rank 32, learning rate 2e-5, 1 epoch
- effective batch size: 200
- maximum sequence length: 1024 tokens
- training code in `code/`
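The hyperparameters above can be sketched as a QLoRA setup with `peft` and `transformers`. The rank (32), learning rate (2e-5), epoch count (1), sequence length (1024), and effective batch size (200) come from this card; the target-module choice, alpha, dropout, and the per-device/accumulation split are assumptions for illustration, not the exact training config (see `code/` for that):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantized base model -- the "Q" in QLoRA
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=32,                           # rank from the card
    lora_alpha=32,                  # assumption
    lora_dropout=0.05,              # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,             # from the card
    num_train_epochs=1,             # from the card
    per_device_train_batch_size=8,  # assumption: 8 x 25 = effective batch 200
    gradient_accumulation_steps=25,
    bf16=True,
)
```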

## Evals

| Model                          | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|--------------------------------|---------|---------|------------|----------|---------|
| g-ronimo/phi-2-OpenHermes-2.5  | 30.27   | 71.18   | 43.87      | 35.9     | 45.3    |
| minghaowu/phi-2-OpenHermes-2.5 | 27.95   | 67.55   | 48.07      | 36.17    | 44.94   |
| phi-2                          | 27.96   | 70.84   | 44.46      | 35.17    | 44.61   |

## Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

modelpath = "g-ronimo/phi-2-OpenHermes-2.5"

model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # attn_implementation="flash_attention_2",  # optional, requires flash-attn
)
tokenizer = AutoTokenizer.from_pretrained(modelpath)

messages = [
    {"role": "system", "content": "answer like a pirate"},
    {"role": "user", "content": "what does it mean to be successful?"},
]

# Render the messages with the model's chat template and move them to the GPU
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

output_tokens = model.generate(input_tokens, max_new_tokens=500)
output = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(output)
```

Ahoy there, matey! To me, being successful means having the wind in your sails and reaching the treasure you've been dreaming of. It's about setting sail on a journey with clear goals, working hard, facing challenges head-on, and never losing sight of what truly matters. So, set your compass right, hoist your Jolly Roger high, and let's embark on this adventure together! ⚓️💰⛵️
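OpenHermes-2.5 models are commonly trained on the ChatML format, so the `apply_chat_template` call above presumably renders something like the sketch below. This is an assumption for illustration; the authoritative template ships with the tokenizer, and the `chatml` helper here is hypothetical:

```python
def chatml(messages, add_generation_prompt=True):
    """Approximate ChatML rendering of a message list (illustrative only)."""
    s = ""
    for m in messages:
        s += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        s += "<|im_start|>assistant\n"  # cue the model to answer next
    return s

messages = [
    {"role": "system", "content": "answer like a pirate"},
    {"role": "user", "content": "what does it mean to be successful?"},
]
print(chatml(messages))
```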

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=g-ronimo/phi-2-OpenHermes-2.5).

| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 58.38 |
| AI2 Reasoning Challenge (25-Shot) | 59.81 |
| HellaSwag (10-Shot)               | 74.85 |
| MMLU (5-Shot)                     | 55.51 |
| TruthfulQA (0-shot)               | 43.86 |
| Winogrande (5-shot)               | 75.06 |
| GSM8k (5-shot)                    | 41.17 |
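The Avg. row is the arithmetic mean of the six benchmark scores; a quick sanity check, with the values taken from the table above:

```python
# Leaderboard scores from the table above
scores = {
    "ARC (25-shot)": 59.81,
    "HellaSwag (10-shot)": 74.85,
    "MMLU (5-shot)": 55.51,
    "TruthfulQA (0-shot)": 43.86,
    "Winogrande (5-shot)": 75.06,
    "GSM8k (5-shot)": 41.17,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 58.38, matching the Avg. row
```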