language:
  - en
license: llama2
tags:
  - text generation
  - instruct
datasets:
  - PygmalionAI/PIPPA
  - Open-Orca/OpenOrca
  - Norquinal/claude_multiround_chat_30k
  - jondurbin/airoboros-gpt4-1.4.1
  - databricks/databricks-dolly-15k
pipeline_tag: text-generation
inference: false
model-index:
  - name: mythalion-13b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 61.26
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PygmalionAI/mythalion-13b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 83.81
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PygmalionAI/mythalion-13b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 56.53
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PygmalionAI/mythalion-13b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 46.56
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PygmalionAI/mythalion-13b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 77.43
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PygmalionAI/mythalion-13b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 13.27
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=PygmalionAI/mythalion-13b
          name: Open LLM Leaderboard

Mythalion 13B

A merge of Pygmalion-2 13B and MythoMax 13B

Model Details

The long-awaited release of our new models based on Llama-2 is finally here. This model was created in collaboration with Gryphe and is a merge of our Pygmalion-2 13B and Gryphe's MythoMax L2 13B.

Finer details of the merge are available in our blog post. According to our testers, this model seems to outperform MythoMax in RP/Chat. For the best results, please follow the recommended generation settings for SillyTavern here!

This model is freely available for both commercial and non-commercial use, as per the Llama-2 license.

Prompting

This model can be prompted using either the Alpaca or the Pygmalion/Metharme format.

Alpaca formatting:

### Instruction:
<prompt>

### Response:
<leave a newline blank for model to respond>
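
As a minimal sketch of how the Alpaca template above could be filled in programmatically, the helper below builds the prompt string from an instruction. The function name `build_alpaca_prompt` and the sample instruction are hypothetical and not part of the model's tooling; the only thing taken from this card is the template itself.

```python
def build_alpaca_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca template.

    The prompt ends with a newline after '### Response:' so the model
    writes its reply on the following line.
    """
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


if __name__ == "__main__":
    # Hypothetical instruction, for illustration only.
    print(build_alpaca_prompt("Write a short greeting from a tavern keeper."))
```

The resulting string can be passed as-is to whichever inference backend you use.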

Pygmalion/Metharme formatting:

<|system|>Enter RP mode. Pretend to be {{char}} whose persona follows:
{{persona}}

You shall reply to the user while staying in character, and generate long responses.
<|user|>Hello!<|model|>{model's response goes here}

The model has been trained on prompts using three different roles, which are denoted by the following tokens: <|system|>, <|user|> and <|model|>.

The <|system|> prompt can be used to inject out-of-channel information behind the scenes, while the <|user|> prompt should be used to indicate user input. The <|model|> token should then be used to indicate that the model should generate a response. These tokens can appear multiple times and be chained to form a conversation history.
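
The chaining of role tokens described above can be sketched as a small helper that turns a conversation history into a single prompt. This is an illustrative sketch under assumptions: the name `build_metharme_prompt`, the `(role, text)` tuple layout, and the sample persona are hypothetical. Only the three role tokens come from this card. The helper adds no separators between turns, so any newlines you want must be part of the turn text.

```python
from typing import Iterable, Tuple

# Role tokens as listed in this model card.
ROLE_TOKENS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "model": "<|model|>",
}


def build_metharme_prompt(turns: Iterable[Tuple[str, str]]) -> str:
    """Chain (role, text) turns and end with <|model|> to request a reply."""
    parts = []
    for role, text in turns:
        if role not in ROLE_TOKENS:
            raise ValueError(f"unknown role: {role!r}")
        parts.append(ROLE_TOKENS[role] + text)
    # A trailing <|model|> token signals that the model should respond next.
    parts.append(ROLE_TOKENS["model"])
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical conversation history, for illustration only.
    history = [
        ("system", "Enter RP mode. Pretend to be Aria whose persona follows:\n"
                   "A cheerful innkeeper.\n"),
        ("user", "Hello!"),
        ("model", "Welcome, traveler!"),
        ("user", "Do you have a room for the night?"),
    ]
    print(build_metharme_prompt(history))
```

Earlier model replies are included as `model` turns, so the same function covers both the first message and later turns of a conversation.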

Limitations and biases

The intended use-case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope.

As such, it was not fine-tuned to be safe and harmless: the base model and this fine-tune have been trained on data known to contain profanity and texts that are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading.

Acknowledgements

We would like to thank SpicyChat for sponsoring the training for the Pygmalion-2 13B model.

Built with Axolotl

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 56.48 |
| AI2 Reasoning Challenge (25-Shot) | 61.26 |
| HellaSwag (10-Shot) | 83.81 |
| MMLU (5-Shot) | 56.53 |
| TruthfulQA (0-shot) | 46.56 |
| Winogrande (5-shot) | 77.43 |
| GSM8k (5-shot) | 13.27 |
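
The Avg. value appears to be the unweighted mean of the six benchmark scores. The short check below reproduces it from the figures reported above; the variable names are illustrative, and the assumption that the leaderboard uses a plain arithmetic mean is inferred from the numbers rather than stated in this card.

```python
# Benchmark scores as reported in this model card.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.26,
    "HellaSwag (10-Shot)": 83.81,
    "MMLU (5-Shot)": 56.53,
    "TruthfulQA (0-shot)": 46.56,
    "Winogrande (5-shot)": 77.43,
    "GSM8k (5-shot)": 13.27,
}

# Assumption: the average is an unweighted mean, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)

if __name__ == "__main__":
    print(f"Avg. {average:.2f}")
```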