Jeronymous committed 82c9573 (parent: 1d10e82): Add links to dataset and code

README.md CHANGED
@@ -142,7 +142,7 @@ prompt = """\
 
 ### Training Data
 
-The training dataset
+The training dataset is available at [OpenLLM-France/Claire-Dialogue-French-0.1](https://huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1).
 
 Claire-7B-Apache-0.1 was tuned from Falcon-7b on the following data distribution:
 
@@ -151,7 +151,7 @@ Claire-7B-Apache-0.1 was tuned from Falcon-7b on the following data distribution
 | Parliamentary Proceedings | 135M | 54% | Assemblée Nationale |
 | Theatre | 2.7M | 23% | Théâtre Gratuit |
 | Meetings | 1.0M | 16.6% | SUMM-RE, LinTO |
-| Debates | 326k | 5.4% |
+| Debates | 326k | 5.4% | FREDSum |
 | Presentations, Conversations | 58k | 1% | LinTO |
 
 Training data was augmented with the following techniques:
@@ -165,7 +165,7 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
-The training code
+The training code is available at [https://github.com/OpenLLM-France/Lit-Claire](https://github.com/OpenLLM-France/Lit-Claire).
 
 Claire-7B-Apache-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Falcon-7b](https://huggingface.co/tiiuae/falcon-7b) for more details.
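The first hunk adds a link to a standard Hugging Face dataset repository, so the corpus should be loadable with the usual `datasets` API. A minimal sketch, assuming default splits (the repo id is taken from the commit; the split name is an assumption):

```python
from datasets import load_dataset

# Repo id taken from the link added in this commit.
dataset = load_dataset("OpenLLM-France/Claire-Dialogue-French-0.1")

# Inspect the splits and one example; the "train" split name
# is an assumption, not something stated in the diff.
print(dataset)
print(next(iter(dataset["train"])))
```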
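The last hunk's surrounding context notes that Claire-7B-Apache-0.1 is trained with a causal language modeling objective (predict the next token). As an illustration only, not the Lit-Claire training code, this is how that loss is typically computed with `transformers`; the model repo id is inferred from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the model card; adjust if the model lives elsewhere.
model_id = "OpenLLM-France/Claire-7B-Apache-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: Falcon-based checkpoints are usually bf16
    device_map="auto",
)

text = "Bonjour, comment allez-vous ?"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Passing labels equal to input_ids makes the model shift the targets
# internally and return the cross-entropy loss of predicting each next token.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
```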