OpenLLM-France/Lucie-7B-Instruct

juliehunter committed about 11 hours ago

Commit e89804f · verified · 1 parent: 77095fa

Update README.md

Files changed (1)

README.md +26 -0

@@ -20,6 +20,7 @@ pipeline_tag: text-generation
 * [Training Details](#training-details)
   * [Training Data](#training-data)
   * [Preprocessing](#preprocessing)
+  * [Instruction template](#instruction-template)
   * [Training Procedure](#training-procedure)
 <!-- * [Evaluation](#evaluation) -->
 * [Testing the model](#testing-the-model)
@@ -56,7 +57,32 @@ Lucie-7B-Instruct is trained on the following datasets:
 * Filtering by language: Magpie-Gemma and Wildchat were filtered to keep only English and French samples, respectively.
 * Filtering by keyword: Examples containing assistant responses were filtered out from the four synthetic datasets if the responses contained a keyword from the list [filter_strings](https://github.com/OpenLLM-France/Lucie-Training/blob/98792a1a9015dcf613ff951b1ce6145ca8ecb174/tokenization/data.py#L2012). This filter is designed to remove examples in which the assistant is presented as model other than Lucie (e.g., ChatGPT, Gemma, Llama, ...).
+### Instruction template
+Lucie-7B-Instruct was trained with the Llama 3.1 chat template, the only difference being that `<|begin_of_text|>` is replaced with `<s>`. The resulting template:
+```
+<s><|start_header_id|>system<|end_header_id|>
+{SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>
+{INPUT}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+{OUTPUT}<|eot_id|>
+```
+An example:
+```
+<s><|start_header_id|>system<|end_header_id|>
+You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
+Give me three tips for staying in shape.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+1. Eat a balanced diet and be sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.<|eot_id|>
+```
 ### Training procedure
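
The instruction template added in this commit can be sketched in plain Python. The helper below is a minimal illustration and not code from the Lucie repository: the function name `format_lucie_prompt` and its `add_generation_prompt` parameter are hypothetical, and the single newline after each header follows the template exactly as it is displayed in the README diff above. If the model's tokenizer ships its own chat template, that template is the authority on exact whitespace.

```python
# Hypothetical helper: renders messages with the template shown in the README diff.
# Special-token strings are copied from the diff; everything else is an assumption.

BOS = "<s>"  # replaces Llama 3.1's <|begin_of_text|>, per the README
HEADER_START = "<|start_header_id|>"
HEADER_END = "<|end_header_id|>"
EOT = "<|eot_id|>"


def format_lucie_prompt(messages, add_generation_prompt=False):
    """Render a list of {"role": ..., "content": ...} dicts as one prompt string."""
    parts = [BOS]
    for message in messages:
        # Each turn: role header, newline, content, end-of-turn marker.
        parts.append(
            f"{HEADER_START}{message['role']}{HEADER_END}\n"
            f"{message['content']}{EOT}"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append(f"{HEADER_START}assistant{HEADER_END}\n")
    return "".join(parts)


if __name__ == "__main__":
    conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me three tips for staying in shape."},
    ]
    print(format_lucie_prompt(conversation, add_generation_prompt=True))
```

Passing `add_generation_prompt=True` ends the string with an open assistant header, which is the usual way to prompt for a reply at inference time; leaving it off reproduces complete turns as they appear in the training example.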