metadata
license: bigscience-bloom-rail-1.0
datasets:
- oscar
language:
- de
library_name: transformers
pipeline_tag: text-generation
BLOOM-CLP German (6.4B parameters)
This is a monolingual German language model trained using the CLP-Transfer method based on BLOOM-7b1.
You can try out the model at European Language Grid.
Training dataset
- ca. 50B German tokens
- Web-crawled content from the German subset OSCAR v22.01 (excluding content tagged as header, footer, noisy, or adult)
- Web-crawled content from the GC4 Corpus (including only the head and middle parts)
- German court decisions from Open Legal Data
Code
Hardware
- 32xA100-40GB GPUs
Evaluation
TBA (see paper)