Update README.md
README.md
This is a monolingual German language model trained using the [CLP-Transfer](https://arxiv.org/abs/2301.09626) method.

You can try out the model at [European Language Grid](https://live.european-language-grid.eu/catalogue/tool-service/20825/try%20out/).

### How to use

You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/bloom-6b4-clp-german')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=3)

[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
 {'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
 {'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"}]
```
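
Beyond the pipeline, you can also load the model and tokenizer directly for more control over generation. Below is a minimal sketch using the standard `transformers` auto classes; the German prompt and the sampling settings are illustrative, not from the original card:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('malteos/bloom-6b4-clp-german')
>>> model = AutoModelForCausalLM.from_pretrained('malteos/bloom-6b4-clp-german')
>>> inputs = tokenizer("Heute ist ein schöner Tag und", return_tensors='pt')
>>> # Sample a short continuation (results vary between runs).
>>> outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```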

## Training dataset

- ca. 50B German tokens

## Evaluation

Validation perplexity (PPL) compared to from-scratch training (lower is better):

<img alt="Tokens vs PPL" src="https://github.com/malteos/clp-transfer/raw/main/german-6b-ppl.png">

Additional evaluations can be found in [our paper](https://arxiv.org/abs/2301.09626).
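
As a reference point, here is a minimal sketch of how validation perplexity can be computed with the `transformers` API. It is not the evaluation harness used for the paper, and the sample text is a placeholder for held-out validation data:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'malteos/bloom-6b4-clp-german'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Placeholder text; use the actual validation set for comparable numbers.
text = "Heute ist ein schöner Tag."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy over
    # predicted tokens; exponentiating it yields the perplexity.
    loss = model(**inputs, labels=inputs['input_ids']).loss

print(f'Validation PPL: {torch.exp(loss).item():.2f}')
```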

## How to cite

If you are using our code or models, please cite [our paper](https://arxiv.org/abs/2301.09626):

```bibtex
@misc{Ostendorff2023clp,
  doi = {10.48550/ARXIV.2301.09626},
  author = {Ostendorff, Malte and Rehm, Georg},
  title = {Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning},
  publisher = {arXiv},
  year = {2023}
}
```

## License

[BigScience BLOOM RAIL 1.0](https://bigscience.huggingface.co/blog/the-bigscience-rail-license)