|
---
language:
- en
- de
license: apache-2.0
library_name: transformers
datasets:
- FreedomIntelligence/sharegpt-deutsch
- mayflowergmbh/oasst_de
- mayflowergmbh/dolly_15k_de
- mayflowergmbh/openschnabeltier_de
- mayflowergmbh/ultrachat_de
- WizardLM/WizardLM_evol_instruct_V2_196k
- mayflowergmbh/evol_instruct_de
- mayflowergmbh/alpaca-gpt4_de
- mayflowergmbh/dolphin_de
- mayflowergmbh/airoboros_de
pipeline_tag: text-generation
model-index:
- name: ende-chat-0.0.5
  results: []
---
|
|
|
|
|
# Model Card for EnDe-chat-0.0.5

A preliminary LoRA finetune of Mistral-7B on high-quality German and English text.

This version uses an **extended tokenizer**, so that the model can handle longer inputs.
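
A rough way to see the effect of the extension is to compare token counts against the base Mistral tokenizer. This is a minimal sketch; the path to this model is a placeholder, not a confirmed repository id:

```python
from transformers import AutoTokenizer

# The second path is a placeholder for this model's repository.
base = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
ende = AutoTokenizer.from_pretrained("path/to/ende-chat-0.0.5")

text = "Die Würde des Menschen ist unantastbar."
print(len(base), len(ende))                                  # vocabulary sizes
print(len(base(text).input_ids), len(ende(text).input_ids))  # token counts for the same sentence
```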
|
|
|
This is an experiment to improve the German capabilities of Mistral through
continued finetuning. The finetuning also includes English data in order to
retain the English capabilities, so that the model can be used for translation
and for answering German questions about English documents, and vice versa.

Unfortunately, the compute available for this experiment (2xV100) was not at
all sufficient for the amount of training data we would have liked to include.

After continued pretraining, this model has received instruction finetuning.
|
|
|
# Table of Contents

- [Model Details](#model-details)
  - [Model Description](#model-description)
- [Uses](#uses)
  - [Out-of-Scope Use](#out-of-scope-use)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
  - [Recommendations](#recommendations)
- [Training Details](#training-details)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
- [Evaluation](#evaluation)
- [Examples](#examples)
|
|
|
|
|
# Model Details

## Model Description

A LoRA finetune of Mistral-7B on high-quality German and English text.

- **Developed by:** Erich Schubert
- **Model type:** Language model
- **Language(s) (NLP):** deu, eng
- **License:** apache-2.0
- **Parent Model:** mistralai/Mistral-7B-v0.1
- **Resources for more information:** n/a
|
|
|
|
|
# Uses

The model is finetuned for chat in German and English.
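
A minimal inference sketch with Transformers is shown below. The repository id is a placeholder, and the plain Human/Assistant prompt is only an assumption about the chat format (the model was tuned with LLaMA-Factory's `default` template):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/ende-chat-0.0.5"  # placeholder -- replace with the actual repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Simple Human/Assistant prompt; the exact format should follow the template used in training.
prompt = "Human: Erkläre kurz den Unterschied zwischen Wetter und Klima.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```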
|
|
|
## Out-of-Scope Use

The model has received only basic instruction finetuning and no alignment training; it is intended as a chat foundation model for further finetuning, not for direct deployment.
|
|
|
|
|
# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
|
|
|
|
|
## Recommendations

Further finetuning is necessary!
|
|
|
|
|
# Training Details

## Training Data

Pretrained on proprietary text collected from the internet, with a focus on
quality German and English text.

Typical benchmarking data should not be present in this data set.

This is less certain for the finetuning data sets, but the amount of data and
compute used for instruction tuning was much smaller.
|
|
|
|
|
## Training Procedure

Initial LoRA finetuning was performed with LLaMA-Factory using a mixture of
**English and German** data, with a focus on data quality.

Unfortunately, I could have used 100x as much GPU power as I had available for
this experiment, and had to heavily subsample the data. As is, this is largely
a proof of concept to see if we can improve model quality with better data.

This version then received basic chat/instruction training with the following
LLaMA-Factory settings:
|
```
--stage sft \
--model_name_or_path ende-0.0.5c3 \
--finetuning_type lora \
--template default \
--dataset_dir data \
--dataset sharegpt-deutsch,oasst_de,dolly_15k_de,openschnabeltier_de,ultrachat_de,evol_instruct,evol_instruct_de,alpaca-gpt4_de,dolphin_de,airoboros_de \
--cutoff_len 1024 \
--learning_rate 5e-05 \
--num_train_epochs 1.0 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--neftune_noise_alpha 0 \
--lora_target all \
--lora_rank 8 \
--lora_dropout 0 \
--fp16 True \
```
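
Since this is a LoRA finetune (rank 8 over all linear modules), the resulting adapter could be applied to the base model with PEFT, assuming the checkpoint is distributed as an adapter rather than as merged weights; the adapter path below is a placeholder:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_id = "path/to/ende-chat-0.0.5-adapter"  # placeholder -- assumed adapter location
base_id = "mistralai/Mistral-7B-v0.1"

# The extended tokenizer ships with the finetune, not with the base model.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
# Because the tokenizer was extended, the embedding matrix must be resized to the
# new vocabulary before the adapter weights can be loaded.
base.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base, adapter_id)
model = model.merge_and_unload()  # optionally fold the low-rank updates into the base weights
```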
|
|
|
Unfortunately, **most of this fine-tuning data is just automatically
translated from English**. I do not think this leads to particularly
high-quality data.
|
|
|
# Evaluation

The model has not been fully evaluated, as it has not been completely trained.
|
|
|
Also, I believe that our **benchmarks tend to be misleading**.
In particular, the Hugging Face leaderboard is flooded with overfitted models
of little to no value. Real-world performance may be task specific and needs
to be evaluated carefully on a case-by-case basis. I hope some will find
this model to be useful!
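
As one example of such a case-by-case check, measuring perplexity on held-out text from the intended domain can be more informative than a leaderboard score. A minimal sketch (the model path is a placeholder):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/ende-chat-0.0.5"  # placeholder -- replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Any held-out text matching the intended use case, German or English.
text = "Maschinelles Lernen ist ein Teilgebiet der künstlichen Intelligenz."
enc = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Passing labels=input_ids yields the mean cross-entropy over the sequence.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"Perplexity: {math.exp(loss.item()):.2f}")
```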
|
|
|
**You are welcome to contribute evaluation scores!**
|
|
|
|