leafspark committed
Commit f723e45 · verified · 1 Parent(s): c6d7757

docs: reorganize readme

Files changed (1):
  1. README.md +28 -78
README.md CHANGED
@@ -3,103 +3,53 @@ license: other
  license_name: mrl
  license_link: https://mistral.ai/licenses/MRL-0.1.md
  language:
- - en
- - fr
- - de
- - es
- - it
- - pt
- - zh
- - ja
- - ru
- - ko
+ - en
+ - fr
+ - de
+ - es
+ - it
+ - pt
+ - zh
+ - ja
+ - ru
+ - ko
+ pipeline_tag: text-generation
  ---

  # Mistral-Large-218B-Instruct

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)

- Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
-
- Self-merged from the original Mistral Large 2, see mergekit config below.
+ Mistral-Large-218B-Instruct is a dense Large Language Model (LLM) with 218 billion parameters. Self-merged from the original Mistral Large 2.

  ## Key features
- - Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
- - Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
- - Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
- - Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON outputting.
- - Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
- - Mistral Research License: Allows usage and modification for research and non-commercial purposes.
- - Large Context: Features a large 128k context window for handling extensive input.
-
- ## Metrics
-
- Note: The following metrics are based on the original model and may differ for this 218B parameter version. Updated benchmarks will be provided when available.
-
- **Base Pretrained Benchmarks**
-
- | Benchmark | Score |
- | --- | --- |
- | MMLU | 84.0% |
-
- **Base Pretrained Multilingual Benchmarks (MMLU)**
- | Benchmark | Score |
- | --- | --- |
- | French | 82.8% |
- | German | 81.6% |
- | Spanish | 82.7% |
- | Italian | 82.7% |
- | Dutch | 80.7% |
- | Portuguese | 81.6% |
- | Russian | 79.0% |
- | Korean | 60.1% |
- | Japanese | 78.8% |
- | Chinese | 74.8% |
-
- **Instruction Benchmarks**
-
- | Benchmark | Score |
- | --- | --- |
- | MT Bench | 8.63 |
- | Wild Bench | 56.3 |
- | Arena Hard | 73.2 |
-
- **Code & Reasoning Benchmarks**
- | Benchmark | Score |
- | --- | --- |
- | Human Eval | 92% |
- | Human Eval Plus | 87% |
- | MBPP Base | 80% |
- | MBPP Plus | 69% |
-
- **Math Benchmarks**
-
- | Benchmark | Score |
- | --- | --- |
- | GSM8K | 93% |
- | Math Instruct (0-shot, no CoT) | 70% |
- | Math Instruct (0-shot, CoT) | 71.5% |
-
- ## Usage
-
- This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
+ - 218 billion parameters
+ - Multi-lingual support for dozens of languages
+ - Trained on 80+ coding languages
+ - 128k context window
+ - Mistral Research License: Allows usage and modification for research and non-commercial purposes

  ## Hardware Requirements

  Given the size of this model (218B parameters), it requires substantial computational resources for inference:
  - Recommended: 8xH100 (640GB)
- - Alternatively: Distributed inference setup across multiple machines.
+ - Alternatively: Distributed inference setup across multiple machines
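The removed "Usage" paragraph and the hardware notes above stay fairly abstract, so here is an illustrative sketch (an editor's addition, not part of this commit or of the README) of loading the model with Hugging Face `transformers` and sharding it across all visible GPUs; the repo id, dtype, prompt, and generation settings are assumptions:

```python
# Editor's sketch, not from the commit: load the merged model with
# Hugging Face transformers and shard it across all visible GPUs.
# The repo id, dtype, and prompt are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leafspark/Mistral-Large-218B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision
    device_map="auto",           # spread layers over every available GPU
)

messages = [{"role": "user", "content": "Summarize the Mistral Research License in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For a dedicated multi-GPU serving stack, a tensor-parallel engine such as vLLM (e.g. `tensor_parallel_size=8` on an 8xH100 node) is another common way to realize the distributed setup mentioned above.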
 
  ## Limitations

- - This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
- - Due to its size, inference may be computationally expensive and require significant hardware resources.
- - As with all large language models, it may exhibit biases present in its training data.
- - The model's outputs should be critically evaluated, especially for sensitive applications.
+ - No built-in moderation mechanisms
+ - Computationally expensive inference
+ - May exhibit biases present in training data
+ - Outputs should be critically evaluated for sensitive applications

  ## Notes

- This was just a fun testing model, merged with the `merge.py` script in the base of the repo. Find GGUFs at [mradermacher/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/mradermacher/Mistral-Large-218B-Instruct-GGUF)
+ This was just a fun testing model, merged with the `merge.py` script in the base of the repo.
+
+ ## Quants
+
+ GGUF: [mradermacher/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/mradermacher/Mistral-Large-218B-Instruct-GGUF)
+ imatrix GGUF: [mradermacher/Mistral-Large-218B-Instruct-i1-GGUF](https://huggingface.co/mradermacher/Mistral-Large-218B-Instruct-i1-GGUF)
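Similarly hedged, a sketch of running one of the GGUF quants linked above with `llama-cpp-python` (an editor's addition; the quant file name, context size, and offload setting are assumptions, not instructions from this repo):

```python
# Editor's sketch, not from the commit: run a downloaded GGUF quant
# with llama-cpp-python. File name and parameters are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral-Large-218B-Instruct.Q4_K_M.gguf",  # assumed local file
    n_ctx=8192,        # context window to allocate
    n_gpu_layers=-1,   # offload every layer if built with GPU support
)

result = llm("Q: Name three languages this model supports.\nA:", max_tokens=32)
print(result["choices"][0]["text"])
```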
 
  Compatible `mergekit` config:
  ```yaml