juliehunter
committed on
Update README.md
README.md
CHANGED
@@ -20,6 +20,7 @@ pipeline_tag: text-generation
* [Training Details](#training-details)
* [Training Data](#training-data)
* [Preprocessing](#preprocessing)
* [Instruction template](#instruction-template)
* [Training Procedure](#training-procedure)
<!-- * [Evaluation](#evaluation) -->
* [Testing the model](#testing-the-model)

@@ -56,7 +57,32 @@ Lucie-7B-Instruct is trained on the following datasets:

* Filtering by language: Magpie-Gemma and Wildchat were filtered to keep only English and French samples, respectively.
* Filtering by keyword: Examples from the four synthetic datasets were filtered out if their assistant responses contained a keyword from the list [filter_strings](https://github.com/OpenLLM-France/Lucie-Training/blob/98792a1a9015dcf613ff951b1ce6145ca8ecb174/tokenization/data.py#L2012). This filter is designed to remove examples in which the assistant is presented as a model other than Lucie (e.g., ChatGPT, Gemma, Llama).
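
The two filters can be sketched as follows. This is a minimal illustration, not the Lucie-Training code: it assumes each example is a dict with a `language` field and a list of `messages` with `role`/`content` keys, uses a hypothetical `FILTER_STRINGS` list holding only the three model names cited as examples (the real list is `filter_strings` in the linked repository), and assumes case-insensitive substring matching. For brevity, both filters are applied in one pass, whereas the README applies them to different datasets.

```python
from typing import Iterable, List, Optional

# Hypothetical stand-in for the real `filter_strings` list in Lucie-Training:
# only the three model names given as examples in this README.
FILTER_STRINGS = ["ChatGPT", "Gemma", "Llama"]


def keep_language(example: dict, language: str) -> bool:
    """Language filter; assumes each example carries a `language` field."""
    return example.get("language") == language


def mentions_other_model(example: dict, keywords: Iterable[str] = FILTER_STRINGS) -> bool:
    """Keyword filter: True if any assistant turn contains a keyword.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    lowered = [k.lower() for k in keywords]
    for message in example["messages"]:
        if message["role"] != "assistant":
            continue  # only assistant responses are checked
        text = message["content"].lower()
        if any(k in text for k in lowered):
            return True
    return False


def filter_dataset(examples: List[dict], language: Optional[str] = None) -> List[dict]:
    """Optionally keep one language, then drop examples that mention another model."""
    kept = []
    for ex in examples:
        if language is not None and not keep_language(ex, language):
            continue
        if mentions_other_model(ex):
            continue
        kept.append(ex)
    return kept


# Hypothetical toy data in the assumed {"language", "messages"} layout.
data = [
    {"language": "English", "messages": [
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am Lucie."}]},
    {"language": "English", "messages": [
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am ChatGPT."}]},
    {"language": "French", "messages": [
        {"role": "user", "content": "Bonjour"},
        {"role": "assistant", "content": "Bonjour !"}]},
]

print(len(filter_dataset(data, language="English")))  # 1
```

Only assistant turns are inspected, so a user message that names another model does not by itself cause an example to be dropped.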

### Instruction template

Lucie-7B-Instruct was trained on the Llama 3.1 chat template, with one difference: `<|begin_of_text|>` is replaced with `<s>`. The resulting template is:

```
<s><|start_header_id|>system<|end_header_id|>

{SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>

{INPUT}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{OUTPUT}<|eot_id|>
```
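
The template above can be filled with plain string concatenation. The sketch below is illustrative and not part of the Lucie tooling: the function names `header` and `format_prompt` are hypothetical, each role header is followed by a blank line (`\n\n`) as in the template, and leaving `output` empty to obtain a generation prompt that ends after the assistant header is an assumption about usage that this README does not state.

```python
# Special tokens as they appear in the template above.
BOS = "<s>"  # used in place of Llama 3.1's <|begin_of_text|>
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Role header followed by the blank line that precedes each turn's text."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def format_prompt(system: str, user_input: str, output: str = "") -> str:
    """Fill the single-turn template with {SYSTEM}, {INPUT} and {OUTPUT}.

    With output="" the string stops right after the assistant header,
    which is the assumed form of a generation prompt.
    """
    text = BOS + header("system") + system + EOT
    text += header("user") + user_input + EOT
    text += header("assistant")
    if output:
        text += output + EOT
    return text


prompt = format_prompt(
    "You are a helpful assistant.",
    "Give me three tips for staying in shape.",
)
print(prompt)
```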

An example:

```
<s><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Give me three tips for staying in shape.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

1. Eat a balanced diet and be sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.<|eot_id|>
```
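
Going the other way, the assistant's reply can be recovered from a formatted conversation such as the example above by taking the text after the last assistant header and cutting it at the first `<|eot_id|>`. This is a minimal sketch with a hypothetical `extract_reply` helper; it treats the `\n` sequences in the example as newline characters, which is an assumption about how the example was written.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def extract_reply(text: str) -> str:
    """Return the text of the last assistant turn in a formatted conversation.

    Takes everything after the final assistant header and cuts it at the
    first <|eot_id|>; if no <|eot_id|> follows (e.g. generation stopped
    early), the remainder is returned unchanged.
    """
    _, sep, tail = text.rpartition(ASSISTANT_HEADER)
    if not sep:
        raise ValueError("no assistant header found")
    reply, _, _ = tail.partition(EOT)
    return reply


example = (
    "<s><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "Give me three tips for staying in shape.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "1. Eat a balanced diet and be sure to include plenty of fruits and vegetables. \n"
    "2. Exercise regularly to keep your body active and strong. \n"
    "3. Get enough sleep and maintain a consistent sleep schedule.<|eot_id|>"
)

print(extract_reply(example))
```

Using `rpartition` picks the last assistant header, so the sketch would also work if earlier turns were present in the string.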

### Training procedure
