---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
---
# Model Card for Mistral7B-v0.1-coco-caption-de
This model is a fine-tuned version of the Mistral-7B-v0.1 completion model and is meant to produce German COCO-style captions.
The [coco-karpathy-opus-de dataset](https://huggingface.co/datasets/Jotschi/coco-karpathy-opus-de) was used to tune the model for German image caption generation.
## Model Details
### Prompt format
The completion model is trained with the prompt prefix `Bildbeschreibung: `.
Examples:
```text
>>> Bildbeschreibung:
2 Hunde sitzen auf einer Bank neben einer Pflanze
>>> Bildbeschreibung: Wasser
fall und Felsen vor dem Gebäude mit Blick auf den Fluss.
>>> Bildbeschreibung: Ein grünes Auto mit roten
Reflektoren parkte auf dem Parkplatz.
```
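A minimal generation sketch using the prompt prefix is shown below. It assumes the merged model is published under the repo id `Jotschi/Mistral7B-v0.1-coco-caption-de` (inferred from this card's title, not confirmed by it) and that a GPU with bfloat16 support is available.

```python
# Hedged sketch: load the (assumed) merged checkpoint and complete a caption
# from the "Bildbeschreibung: " prefix that the model was trained on.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jotschi/Mistral7B-v0.1-coco-caption-de"  # assumption, see lead-in
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Bildbeschreibung: "  # completion prefix used during fine-tuning
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```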
### Model Description
- **Developed by:** [Jotschi](https://huggingface.co/Jotschi)
- **License:** [Apache License](https://www.apache.org/licenses/LICENSE-2.0)
- **Finetuned from model:** [Mistral7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
## Uses
The model is meant to be used in conjunction with a [BLIP2](https://huggingface.co/docs/transformers/model_doc/blip-2) Q-Former to enable image captioning, visual question answering (VQA), and chat-like conversations.
## Training Details
The preliminary [training script](https://github.com/Jotschi/lavis-experiments/tree/master/mistral-deepspeed) uses PEFT and DeepSpeed to execute the training.
### Training Data
* [coco-karpathy-opus-de dataset](https://huggingface.co/datasets/Jotschi/coco-karpathy-opus-de)
### Training Procedure
The model was trained using PEFT 4-bit QLoRA with the following parameters (a configuration sketch follows the list):
* rank: 256
* alpha: 16
* steps: 8500
* bf16: True
* lr_scheduler_type: cosine
* warmup_ratio: 0.03
* gradient accumulation steps: 2
* batch size: 4
* input sequence length: 512
* learning rate: 2.0e-5
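A possible PEFT/bitsandbytes setup matching the listed hyperparameters is sketched below. Dataset loading and the DeepSpeed configuration are omitted, and `target_modules` and `lora_dropout` are assumptions that are not stated on this card.

```python
# Sketch of a 4-bit QLoRA configuration with the hyperparameters listed above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=256,                     # rank
    lora_alpha=16,             # alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    lora_dropout=0.05,         # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

training_args = TrainingArguments(
    output_dir="mistral7b-coco-caption-de",
    max_steps=8500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2.0e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
)
```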
#### Postprocessing
The LoRA adapter was merged into the base model and the merged model was saved using the `PeftModel` API.
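A minimal sketch of that merge-and-save step, assuming a local adapter checkpoint path (placeholder, not taken from this card):

```python
# Merge the LoRA adapter into the base weights and save the standalone model.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path
model = model.merge_and_unload()  # fold the LoRA weights into the base model
model.save_pretrained("mistral7b-coco-caption-de-merged")
```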
### Framework versions
- PEFT 0.8.2