File size: 4,233 Bytes
b2aed30
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
---
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - zh
  - ja
  - ru 
  - ko
---

# Mistral-Large-218B-Instruct

![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F6604e5b21eb292d6df393365%2FP-BGJ5Ba2d1NkpdGXNThe.png%3C%2Fspan%3E)

Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.

Self-merged from the original Mistral Large 2, see mergekit config below.

## Key features
- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON outputting.
- Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- Large Context: Features a large 128k context window for handling extensive input.

## Metrics

Note: The following metrics are based on the original model and may differ for this 218B parameter version. Updated benchmarks will be provided when available.

**Base Pretrained Benchmarks**

| Benchmark | Score |
| --- | --- |
| MMLU | 84.0% |

**Base Pretrained Multilingual Benchmarks (MMLU)**
| Benchmark | Score |
| --- | --- |
| French | 82.8% |
| German | 81.6% |
| Spanish | 82.7% |
| Italian | 82.7% |
| Dutch | 80.7% |
| Portuguese | 81.6% |
| Russian | 79.0% |
| Korean | 60.1% |
| Japanese | 78.8% |
| Chinese | 74.8% |

**Instruction Benchmarks**

| Benchmark | Score |
| --- | --- |
| MT Bench | 8.63 |
| Wild Bench | 56.3 |
| Arena Hard| 73.2 |

**Code & Reasoning Benchmarks**
| Benchmark | Score |
| --- | --- |
| Human Eval | 92% |
| Human Eval Plus| 87% |
| MBPP Base| 80% |
| MBPP Plus| 69% |

**Math Benchmarks**

| Benchmark | Score |
| --- | --- |
| GSM8K | 93% |
| Math Instruct (0-shot, no CoT) | 70% |
| Math Instruct (0-shot, CoT)| 71.5% |

## Usage

This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.

## Hardware Requirements

Given the size of this model (218B parameters), it requires substantial computational resources for inference:
- Recommended: 8xH100 (640GB)
- Alternatively: Distributed inference setup across multiple machines.

## Limitations

- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
- Due to its size, inference may be computationally expensive and require significant hardware resources.
- As with all large language models, it may exhibit biases present in its training data.
- The model's outputs should be critically evaluated, especially for sensitive applications.

## Notes

This was just a fun testing model, merged with the `merge.py` script in the base of the repo. Find GGUFs at [leafspark/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF/)

Compatible `mergekit` config:
```yaml
slices:
- sources:
  - layer_range: [0, 20]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [10, 30]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [20, 40]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [30, 50]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [40, 60]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [50, 70]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [60, 80]
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [70, 87]
    model: mistralai/Mistral-Large-Instruct-2407
merge_method: passthrough
dtype: bfloat16
```