---
license: apache-2.0
datasets:
- wmt/wmt14
language:
- de
- en
pipeline_tag: text2text-generation
---
This is a custom Hugging Face port of the [PyTorch implementation of the original Transformer](https://github.com/ubaada/scratch-transformer) introduced in the 2017 paper "[Attention Is All You Need](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)". This is the 65M-parameter base model, trained for English-to-German translation.
## Usage:
```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is required because the model uses a custom modeling file
model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

text = 'This is my cat'
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
# Output: ' Das ist meine Katze.'
```
(Remember to pass `trust_remote_code=True`, since the model relies on a custom modeling file.)
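Several sentences can also be translated in one call by padding the batch. This is a hedged sketch, assuming the custom `generate` accepts padded batched inputs and the standard `num_beams`/`max_new_tokens` generation arguments (the beam settings below are illustrative, not values used in training):
```python
# Hypothetical batched usage -- assumes generate() handles padded batches
texts = ["This is my cat", "The weather is nice today"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=100)
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=100)  # illustrative settings
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```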
## Training:
| Parameter | Value |
|---|---|
| Dataset | WMT14 de-en |
| Translation Pairs | 4.5M (83M tokens total) |
| Epochs | 25 |
| Batch Size | 16 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 128 (16 × 8) |
| Training Script | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py) |
| Optimiser | Adam (learning rate = 0.0001) |
| Loss Type | Cross Entropy |
| Final Test Loss | 1.9 |
| GPU | RTX 4070 (12 GB) |
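
As a hedged illustration of how these settings fit together (not the actual [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py); `model`, `train_loader`, and `pad_token_id` are placeholders), a gradient-accumulation loop with the hyperparameters above looks roughly like this:
```python
import torch
import torch.nn as nn

# Illustrative values from the table above; model / train_loader / pad_token_id are placeholders.
ACCUM_STEPS = 8  # gradient accumulation -> effective batch of 16 * 8 = 128
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss(ignore_index=pad_token_id)

for epoch in range(25):
    optimizer.zero_grad()
    for step, (src, tgt) in enumerate(train_loader):        # batches of 16 sentence pairs
        logits = model(src, tgt[:, :-1])                    # teacher forcing (assumed interface)
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        (loss / ACCUM_STEPS).backward()                      # scale so gradients average over 128 examples
        if (step + 1) % ACCUM_STEPS == 0:
            optimizer.step()
            optimizer.zero_grad()
```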