metadata
license: apache-2.0
library_name: transformers
model-index:
- name: Gemma-2-Ataraxy-Gemmasutra-9B-slerp
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 76.49
name: strict accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 42.25
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 1.74
name: exact match
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 10.74
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 12.39
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 35.63
name: accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: ENEM Challenge (No Images)
type: eduagarcia/enem_challenge
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 75.65
name: accuracy
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BLUEX (No Images)
type: eduagarcia-temp/BLUEX_without_images
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 64.26
name: accuracy
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: OAB Exams
type: eduagarcia/oab_exams
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 53.76
name: accuracy
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 RTE
type: assin2
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 93.21
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 STS
type: eduagarcia/portuguese_benchmark
split: test
args:
num_few_shot: 15
metrics:
- type: pearson
value: 80.91
name: pearson
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: FaQuAD NLI
type: ruanchaves/faquad-nli
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 77.39
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HateBR Binary
type: ruanchaves/hatebr
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 87.61
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: PT Hate Speech Binary
type: hate_speech_portuguese
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 66.84
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: tweetSentBR
type: eduagarcia/tweetsentbr_fewshot
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 66.14
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp
name: Open Portuguese LLM Leaderboard
Gemma-2-Ataraxy-Gemmasutra-9B-slerp
๐ป Usage
!pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
"text-generation",
model=model,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 29.87 |
IFEval (0-Shot) | 76.49 |
BBH (3-Shot) | 42.25 |
MATH Lvl 5 (4-Shot) | 1.74 |
GPQA (0-shot) | 10.74 |
MuSR (0-shot) | 12.39 |
MMLU-PRO (5-shot) | 35.63 |
Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found here and on the ๐ Open Portuguese LLM Leaderboard
Metric | Value |
---|---|
Average | 73.97 |
ENEM Challenge (No Images) | 75.65 |
BLUEX (No Images) | 64.26 |
OAB Exams | 53.76 |
Assin2 RTE | 93.21 |
Assin2 STS | 80.91 |
FaQuAD NLI | 77.39 |
HateBR Binary | 87.61 |
PT Hate Speech Binary | 66.84 |
tweetSentBR | 66.14 |