Jeronymous committed 82c9573 (parent: 1d10e82): Add links to dataset and code

README.md CHANGED
@@ -142,7 +142,7 @@ prompt = """\
 
 ### Training Data
 
-The training dataset
+The training dataset is available at [OpenLLM-France/Claire-Dialogue-French-0.1](https://huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1).
 
 Claire-7B-Apache-0.1 was tuned from Falcon-7b on the following data distribution:
 
@@ -151,7 +151,7 @@ Claire-7B-Apache-0.1 was tuned from Falcon-7b on the following data distribution
 | Parliamentary Proceedings | 135M | 54% | Assemblée Nationale |
 | Theatre | 2.7M | 23% | Théâtre Gratuit |
 | Meetings | 1.0M | 16.6% | SUMM-RE, LinTO |
-| Debates | 326k | 5.4% |
+| Debates | 326k | 5.4% | FREDSum |
 | Presentations, Conversations | 58k | 1% | LinTO |
 
 Training data was augmented with the following techniques:
@@ -165,7 +165,7 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
-The training code
+The training code is available at [https://github.com/OpenLLM-France/Lit-Claire](https://github.com/OpenLLM-France/Lit-Claire).
 
 Claire-7B-Apache-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Falcon-7b](https://huggingface.co/tiiuae/falcon-7b) for more details.
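The first hunk adds a link to a standard Hugging Face dataset repository, so the corpus should be loadable with the usual `datasets` API. A minimal sketch, assuming default splits (the repo id is taken from the commit; the split name is an assumption):

```python
from datasets import load_dataset

# Repo id taken from the link added in this commit.
dataset = load_dataset("OpenLLM-France/Claire-Dialogue-French-0.1")

# Inspect the splits and one example; the "train" split name
# is an assumption, not something stated in the diff.
print(dataset)
print(next(iter(dataset["train"])))
```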
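The last hunk's surrounding context notes that Claire-7B-Apache-0.1 is trained with a causal language modeling objective (predict the next token). As an illustration only, not the Lit-Claire training code, this is how that loss is typically computed with `transformers`; the model repo id is inferred from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the model card; adjust if the model lives elsewhere.
model_id = "OpenLLM-France/Claire-7B-Apache-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: Falcon-based checkpoints are usually bf16
    device_map="auto",
)

text = "Bonjour, comment allez-vous ?"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Passing labels equal to input_ids makes the model shift the targets
# internally and return the cross-entropy loss of predicting each next token.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
```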