---
license: apache-2.0
datasets:
- wmt/wmt14
language:
- de
- en
pipeline_tag: text2text-generation
---

<p align="center">
  <img src="/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F62a7d1e152aa8695f9209345%2FP-TlY6ia0gLJeJxBA_04g.gif%26quot%3B%3C%2Fspan%3E />
</p>
<hr>

This is a custom Hugging Face port of the [PyTorch implementation of the original Transformer](https://github.com/ubaada/scratch-transformer) introduced in the 2017 paper "[Attention Is All You Need](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)". It is the 65M-parameter base model variant, trained for English-to-German translation.

## Usage:
```python
model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")
text = 'This is my cat'
output = model.generate(**tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100))
tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
# Output: ' Das ist meine Katze.'
```
(Remember to pass `trust_remote_code=True`, since the model uses a custom modeling file.)
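
To translate several sentences at once, a sketch like the following may work. This is an assumption rather than a documented feature: it relies on the custom `generate` implementation handling padded batches correctly. If it does not, fall back to looping over sentences with the single-input call above.

```python
texts = ["This is my cat", "Where is the train station?"]
# padding=True is an assumption: it requires the model to respect the attention mask for padded inputs
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=100)
outputs = model.generate(**batch)
print([tokenizer.decode(o, skip_special_tokens=True, clean_up_tokenization_spaces=True) for o in outputs])
```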
## Training:
| Parameter                   | Value                                                                         |
|-----------------------------|-------------------------------------------------------------------------------|
| Dataset                     | WMT14 (de-en)                                                                 |
| Translation Pairs           | 4.5M (135M tokens total)                                                      |
| Epochs                      | 24                                                                            |
| Batch Size                  | 16                                                                            |
| Gradient Accumulation Steps | 8                                                                             |
| Effective Batch Size        | 128 (16 × 8)                                                                  |
| Training Script             | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py)  |
| Optimiser                   | Adam (learning rate = 0.0001)                                                 |
| Loss Type                   | Cross entropy                                                                 |
| Final Test Loss             | 1.87                                                                          |
| GPU                         | RTX 4070 (12 GB)                                                              |
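
The sketch below shows how these settings fit together in a gradient-accumulation loop. It is not the actual training script (linked above): `model`, `tokenizer`, and `train_loader` are placeholder names, and the forward-pass signature is an assumption about the custom model.

```python
import torch
import torch.nn.functional as F

# Hypothetical objects: `model`, `tokenizer`, and `train_loader` stand in for
# whatever train.py builds; the forward signature below is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
accum_steps = 8  # effective batch = 16 (per step) * 8 (accumulation) = 128

model.train()
optimizer.zero_grad()
for step, (src, tgt) in enumerate(train_loader):   # batches of 16 sentence pairs
    logits = model(src, tgt[:, :-1])                # teacher forcing on the shifted target
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt[:, 1:].reshape(-1),
        ignore_index=tokenizer.pad_token_id,
    )
    (loss / accum_steps).backward()                 # scale so accumulated gradients average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Scaling the loss by the number of accumulation steps keeps the accumulated gradient equivalent to the average over the effective batch of 128 pairs.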

<p align="center" style="width:500px;max-width:100%;">
  <img src="/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F62a7d1e152aa8695f9209345%2F0p4eEHiYFaeaibjk_Rf1y.png%26quot%3B%3C%2Fspan%3E />
</p>


## Results

<p align="center" style="width:500px;max-width:100%;">
  <img src="/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F62a7d1e152aa8695f9209345%2FGip1Ox-M1_z3qdafGGh3-.png%26quot%3B%3C%2Fspan%3E />
</p>