metadata
license: bigscience-bloom-rail-1.0
datasets:
  - oscar
language:
  - de
library_name: transformers
pipeline_tag: text-generation

BLOOM-CLP German (6.4B parameters)

This is a monolingual German language model trained with the CLP-Transfer method, using BLOOM-7b1 as the source model.
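To give a rough idea of what CLP-Transfer does, here is a simplified sketch of its embedding initialization, based on our reading of the method. It is not the authors' code. The sketch assumes the following. Tokens shared by the source vocabulary (BLOOM) and the target German vocabulary keep their source embeddings. Every other target token gets a weighted average of the shared tokens' source embeddings, with weights taken from cosine similarity in a small "helper" model that already uses the German tokenizer. Clipping negative similarities to zero is an assumption of this sketch, and so are the function names `clp_init` and `cosine`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def clp_init(source_emb, helper_emb, target_vocab):
    """Hypothetical CLP-style init: return {token: vector} for the target vocabulary.

    source_emb:   {token: vector} from the large source model (e.g. BLOOM)
    helper_emb:   {token: vector} from a small helper model using the target tokenizer
    target_vocab: list of target-language tokens (all assumed present in helper_emb)
    """
    overlap = [t for t in target_vocab if t in source_emb and t in helper_emb]
    dim = len(next(iter(source_emb.values())))
    result = {}
    for tok in target_vocab:
        if tok in source_emb:
            # Shared token: copy the source model's embedding unchanged.
            result[tok] = list(source_emb[tok])
            continue
        # New token: similarity to each shared token in the helper's embedding space,
        # clipped at zero (an assumption here) and normalized to sum to 1.
        sims = [max(cosine(helper_emb[tok], helper_emb[o]), 0.0) for o in overlap]
        total = sum(sims)
        if total == 0.0:
            result[tok] = [0.0] * dim
            continue
        weights = [s / total for s in sims]
        # Weighted average of the *source* embeddings of the shared tokens.
        result[tok] = [
            sum(w * source_emb[o][i] for w, o in zip(weights, overlap))
            for i in range(dim)
        ]
    return result
```

In the toy case below, "das" is equally similar to "der" and "die" in helper space, so it lands halfway between their source embeddings. As we understand the method, the remaining transformer weights are taken over from the source model.

```python
source = {"der": [1.0, 0.0], "die": [0.0, 1.0]}
helper = {"der": [1.0, 0.0], "die": [0.0, 1.0], "das": [1.0, 1.0]}
emb = clp_init(source, helper, ["der", "die", "das"])
print(emb["das"])  # approximately [0.5, 0.5]
```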

You can try out the model on the European Language Grid.

Training dataset

  • ca. 50B German tokens
  • Web-crawled content from the German subset of OSCAR v22.01 (excluding content tagged as header, footer, noisy, or adult)
  • Web-crawled content from the GC4 Corpus (including only the head and middle parts)
  • German court decisions from Open Legal Data
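The OSCAR exclusion rule above can be sketched as a simple filter over JSON-lines records. This is a sketch under stated assumptions, not the pipeline actually used for this model. It assumes that each record has a `content` field and a `metadata.annotation` field holding a list of quality tags (or null when no tag applies), which is how we understand the OSCAR 22.01 layout. The names `keep_document` and `filter_jsonl` are hypothetical.

```python
import json

# Quality tags excluded from the OSCAR v22.01 German subset, per the list above.
EXCLUDED_TAGS = {"header", "footer", "noisy", "adult"}


def keep_document(doc):
    """Return True if an OSCAR-style record carries none of the excluded tags.

    Assumption: doc["metadata"]["annotation"] is a list of tag strings,
    or None when the document has no quality warnings.
    """
    annotation = (doc.get("metadata") or {}).get("annotation") or []
    return not EXCLUDED_TAGS.intersection(annotation)


def filter_jsonl(lines):
    """Yield the text of every kept record from an iterable of JSON lines."""
    for line in lines:
        doc = json.loads(line)
        if keep_document(doc):
            yield doc["content"]
```

Tags that are not in the exclusion list (for example a hypothetical "short_sentences" tag) do not cause a document to be dropped in this sketch.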

Code

Hardware

  • 32xA100-40GB GPUs
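For a sense of scale, here is a back-of-the-envelope compute estimate derived from the figures on this card. It is not a figure reported by the authors. It assumes several things: the common C ≈ 6 · N · D rule of thumb for dense decoder training, a single pass over the ~50B tokens, the A100's dense BF16 peak of 312 TFLOP/s, and a hypothetical 40% model FLOPs utilization. The helper name `training_days` is illustrative.

```python
def training_days(params, tokens, gpus, peak_flops=312e12, mfu=0.4):
    """Rough estimate: total training FLOPs and wall-clock days.

    Assumes C = 6 * N * D and a fixed (hypothetical) model FLOPs utilization.
    """
    total_flops = 6 * params * tokens
    seconds = total_flops / (gpus * peak_flops * mfu)
    return total_flops, seconds / 86400


# 6.4B parameters, ~50B tokens (one epoch assumed), 32 A100 GPUs.
flops, days = training_days(6.4e9, 50e9, 32)
print(f"{flops:.2e} FLOPs, ~{days:.1f} days")  # 1.92e+21 FLOPs, ~5.6 days
```

The estimate scales linearly with the utilization assumption, so halving `mfu` doubles the wall-clock figure.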

Evaluation

TBA (see paper)