juliehunter committed on
Commit e89804f · verified · 1 Parent(s): 77095fa

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -20,6 +20,7 @@ pipeline_tag: text-generation
  * [Training Details](#training-details)
  * [Training Data](#training-data)
  * [Preprocessing](#preprocessing)
  * [Training Procedure](#training-procedure)
  <!-- * [Evaluation](#evaluation) -->
  * [Testing the model](#testing-the-model)
@@ -56,7 +57,32 @@ Lucie-7B-Instruct is trained on the following datasets:
  * Filtering by language: Magpie-Gemma and Wildchat were filtered to keep only English and French samples, respectively.
  * Filtering by keyword: Examples containing assistant responses were filtered out from the four synthetic datasets if the responses contained a keyword from the list [filter_strings](https://github.com/OpenLLM-France/Lucie-Training/blob/98792a1a9015dcf613ff951b1ce6145ca8ecb174/tokenization/data.py#L2012). This filter is designed to remove examples in which the assistant is presented as a model other than Lucie (e.g., ChatGPT, Gemma, Llama, ...).

  ### Training procedure
 
  * [Training Details](#training-details)
  * [Training Data](#training-data)
  * [Preprocessing](#preprocessing)
+ * [Instruction template](#instruction-template)
  * [Training Procedure](#training-procedure)
  <!-- * [Evaluation](#evaluation) -->
  * [Testing the model](#testing-the-model)
 
  * Filtering by language: Magpie-Gemma and Wildchat were filtered to keep only English and French samples, respectively.
  * Filtering by keyword: Examples containing assistant responses were filtered out from the four synthetic datasets if the responses contained a keyword from the list [filter_strings](https://github.com/OpenLLM-France/Lucie-Training/blob/98792a1a9015dcf613ff951b1ce6145ca8ecb174/tokenization/data.py#L2012). This filter is designed to remove examples in which the assistant is presented as a model other than Lucie (e.g., ChatGPT, Gemma, Llama, ...).

+ ### Instruction template
+ Lucie-7B-Instruct was trained with the Llama 3.1 chat template, with the sole difference that `<|begin_of_text|>` is replaced with `<s>`. The resulting template is:

+ ```
+ <s><|start_header_id|>system<|end_header_id|>
+
+ {SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+ {INPUT}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+ {OUTPUT}<|eot_id|>
+ ```
+
+ An example:
+
+ ```
+ <s><|start_header_id|>system<|end_header_id|>
+
+ You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+ Give me three tips for staying in shape.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+ 1. Eat a balanced diet and be sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.<|eot_id|>
+ ```
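
The template above can be reproduced with plain string formatting. The following is a minimal sketch, not part of the commit: the helper name `build_prompt` and its parameters are hypothetical, and it assumes a single system/user turn with a blank line after each header, as in the example. In practice a tokenizer that ships a chat template would usually handle this formatting, and a tokenizer that already prepends `<s>` could produce a duplicate if this string is passed through it unchanged.

```python
from typing import Optional


def build_prompt(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Render one system/user turn in the Lucie-7B-Instruct template.

    Hypothetical helper for illustration. If `assistant` is None, the
    string ends after the assistant header, ready for generation.
    """
    prompt = (
        "<s><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if assistant is not None:
        # Training-style sample: append the response and close the turn.
        prompt += f"{assistant}<|eot_id|>"
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        "You are a helpful assistant.",
        "Give me three tips for staying in shape.",
    ))
```

Leaving `assistant` unset yields an inference-time prompt; passing a response yields a full training-style sample that ends in `<|eot_id|>`.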

  ### Training procedure