metadata
language:
- en
license: cc-by-nc-4.0
model-index:
- name: MN-12B-Lyra-v3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 44.86
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 25.87
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 7.18
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 3.69
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.04
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 24.99
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Sao10K/MN-12B-Lyra-v3
      name: Open LLM Leaderboard
Ungated. Thanks for the patience!
Mistral-NeMo-12B-Lyra-v3, built on top of Lyra-v2a2, which itself was built upon Lyra-v2a1.
Model Versioning
Lyra-v1 [Merge of Custom Roleplay & Instruct Trains, on Different Formats]
|
| [Additional SFT on 10% of Previous Data, Mixed]
v
Lyra-v2a1
|
| [Low Rank SFT Step + Tokenizer Diddling]
v
Lyra-v2a2
|
| [RL Step Performed on Multiturn Sets, Magpie-style Responses by Lyra-v2a2 for Rejected Data]
v
Lyra-v3
This uses a custom ChatML-style prompt format: ChatML-like role names wrapped in [INST] ... [/INST] tags!
-> What can go wrong?
[INST]system
This is the system prompt.[/INST]
[INST]user
Instructions placed here.[/INST]
[INST]assistant
The model's response will be here.[/INST]
Why this? I used the wrong configs by accident. The format was meant for an 8B pruned NeMo train, but it ended up on this model instead. Oops.
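A minimal sketch of how a prompt in the layout above could be assembled by hand. The helper names `format_turn` and `build_prompt` are hypothetical, and two details are assumptions rather than anything stated here: turns are joined with a single newline (as the example is laid out), and generation is triggered by leaving an `[INST]assistant` turn open. If the tokenizer ships its own chat template, that should take precedence over this.

```python
def format_turn(role: str, content: str) -> str:
    """Wrap one message as [INST]{role}\\n{content}[/INST]."""
    return f"[INST]{role}\n{content}[/INST]"


def build_prompt(messages: list[dict[str, str]]) -> str:
    """Join finished turns, then open an assistant turn for the model to fill in.

    Assumption: turns are separated by one newline and the final
    assistant header is left unclosed so the model writes the reply.
    """
    turns = [format_turn(m["role"], m["content"]) for m in messages]
    turns.append("[INST]assistant\n")  # left open on purpose
    return "\n".join(turns)


if __name__ == "__main__":
    print(build_prompt([
        {"role": "system", "content": "This is the system prompt."},
        {"role": "user", "content": "Instructions placed here."},
    ]))
```

Roles are plain strings, so `system`, `user`, and `assistant` all go through the same wrapper.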
Recommended Samplers:
Temperature: 0.7 - 1.2
min_p: 0.1 - 0.2 # Crucial for NeMo
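For anyone unsure what min_p actually does, here is a rough pure-Python sketch of the idea: tokens whose probability is below `min_p` times the top token's probability get dropped, then the rest are renormalized. The function names are hypothetical, and applying temperature before the min_p cut is an assumption; real backends differ in sampler order, so treat this as an illustration, not as how any particular backend implements it.

```python
import math
import random


def min_p_filter(logits: list[float], min_p: float, temperature: float = 1.0) -> list[float]:
    """Return renormalized probabilities after temperature scaling and a min_p cut."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)  # cutoff scales with the top token's probability
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)  # never zero: the top token always survives
    return [p / kept_total for p in kept]


def sample_token(logits: list[float], min_p: float = 0.1, temperature: float = 1.0) -> int:
    """Draw one token index from the filtered distribution."""
    probs = min_p_filter(logits, min_p, temperature)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a confident top token the cutoff is high and the long tail is pruned hard; with a flat distribution the cutoff drops and more candidates stay in play.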
Recommended Stopping Strings:
<|im_end|>
</s>
Blame the messed-up training configs, oops?
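If your frontend has no stopping-string setting, the same effect can be approximated by trimming the output after the fact. A small sketch, assuming the generated text is already available as a plain string; `truncate_at_stop` is a hypothetical helper, not part of any library.

```python
STOP_STRINGS = ["<|im_end|>", "</s>"]


def truncate_at_stop(text: str, stops: list[str] = STOP_STRINGS) -> str:
    """Cut the text at the earliest occurrence of any stopping string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

Trimming afterwards still spends tokens generating past the stop, so passing the strings to the backend is preferable when it supports that.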
Training Metrics:
- Trained on 4xH100 SXM for 6 Hours.
- Trained for 2 Epochs.
- Effective Global Batch Size: 128.
- Dataset Used: A custom, cleaned mix of Stheno-v3.4's Dataset, focused mainly on multiturn.
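The per-GPU batch size and gradient accumulation steps are not listed, so they cannot be recovered from the numbers above. As a sketch of the arithmetic only, the snippet below enumerates the (micro-batch, accumulation) pairs that would give an effective global batch of 128 on 4 GPUs under plain data parallelism, which is itself an assumption, and works out the total GPU-hours. `batch_splits` is a hypothetical helper.

```python
NUM_GPUS = 4        # 4xH100 SXM
GLOBAL_BATCH = 128  # effective global batch size
HOURS = 6           # wall-clock training time


def batch_splits(global_batch: int, num_gpus: int) -> list[tuple[int, int]]:
    """List (per-GPU micro-batch, grad-accumulation steps) pairs such that
    micro_batch * accumulation * num_gpus == global_batch."""
    per_gpu, remainder = divmod(global_batch, num_gpus)
    if remainder:
        raise ValueError("global batch must divide evenly across GPUs")
    return [(m, per_gpu // m) for m in range(1, per_gpu + 1) if per_gpu % m == 0]


gpu_hours = NUM_GPUS * HOURS

if __name__ == "__main__":
    print(batch_splits(GLOBAL_BATCH, NUM_GPUS))
    print(gpu_hours)
```

Any of the listed pairs is consistent with the stated batch size; which one was actually used is unknown.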
Extras
Image Source: AI-Generated with FLUX.1 Dev.
have a nice day.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 19.27 |
| IFEval (0-Shot) | 44.86 |
| BBH (3-Shot) | 25.87 |
| MATH Lvl 5 (4-Shot) | 7.18 |
| GPQA (0-shot) | 3.69 |
| MuSR (0-shot) | 9.04 |
| MMLU-PRO (5-shot) | 24.99 |
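The Avg. row appears to be the unweighted mean of the six benchmark scores. A quick check, assuming equal weighting (the `scores` dict is just the table values retyped):

```python
scores = {
    "IFEval (0-Shot)": 44.86,
    "BBH (3-Shot)": 25.87,
    "MATH Lvl 5 (4-Shot)": 7.18,
    "GPQA (0-shot)": 3.69,
    "MuSR (0-shot)": 9.04,
    "MMLU-PRO (5-shot)": 24.99,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{average:.2f}")  # prints 19.27
```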