license: llama3.1
model-index:
- name: Llama-Spark
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 79.11
name: strict accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 29.77
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 1.06
name: exact match
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 6.6
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 2.62
name: acc_norm
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 30.23
name: accuracy
source:
url: >-
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
name: Open LLM Leaderboard
Llama-Spark is a powerful conversational AI model developed by Arcee.ai. It's built on the foundation of Llama-3.1-8B and merges the power of our Tome Dataset with Llama-3.1-8B-Instruct, resulting in a remarkable conversationalist that punches well above its 8B parameter weight class.
GGUFs available here
Model Description
Llama-Spark is our commitment to consistently delivering the best-performing conversational AI in the 6-9B parameter range. As new base models become available, we'll continue to update and improve Spark to maintain its leadership position.
This model is a successor to our original Arcee-Spark, incorporating advancements and learnings from our ongoing research and development.
Intended Uses
Llama-Spark is intended for use in conversational AI applications, such as chatbots, virtual assistants, and dialogue systems. It excels at engaging in natural and informative conversations.
Training Information
Llama-Spark is built upon the Llama-3.1-8B base model, fine-tuned using of the Tome Dataset and merged with Llama-3.1-8B-Instruct.
Evaluation Results
Please note that these scores are consistantly higher than the OpenLLM leaderboard, and should be compared to their relative performance increase not weighed against the leaderboard.
Acknowledgements
We extend our deepest gratitude to PrimeIntellect for being our compute sponsor for this project.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 24.90 |
IFEval (0-Shot) | 79.11 |
BBH (3-Shot) | 29.77 |
MATH Lvl 5 (4-Shot) | 1.06 |
GPQA (0-shot) | 6.60 |
MuSR (0-shot) | 2.62 |
MMLU-PRO (5-shot) | 30.23 |