pipeline_tag: text2text-generation
---

This is a custom Hugging Face port of the [PyTorch implementation of the original Transformer](https://github.com/ubaada/scratch-transformer), the model introduced in the 2017 paper "[Attention Is All You Need](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)". This is the 65M-parameter base variant, trained for English-to-German translation.

## Usage
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

text = 'This is my cat'
output = model.generate(**tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100))
tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
# Output: ' Das ist meine Katze.'
```

Remember to pass `trust_remote_code=True`, since this model uses a custom modeling file.

## Training

| Parameter            | Value                                                                        |
|----------------------|------------------------------------------------------------------------------|
| Dataset              | WMT14-de-en                                                                  |
| Translation Pairs    | 4.5M (83M tokens total)                                                      |
| Epochs               | 25                                                                           |
| Batch Size           | 16                                                                           |
| Accumulation Batch   | 8                                                                            |
| Effective Batch Size | 128 (16 × 8)                                                                 |
| Training Script      | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py) |
| Optimiser            | Adam (learning rate = 0.0001)                                                |
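
The effective batch size of 128 comes from gradient accumulation: gradients from 8 micro-batches of 16 samples are accumulated before a single optimiser step. A minimal framework-free sketch of that bookkeeping (the names `MICRO_BATCH`, `ACCUM_STEPS`, and `train_loop` are illustrative, not from the actual training script):

```python
MICRO_BATCH = 16  # samples per forward/backward pass
ACCUM_STEPS = 8   # micro-batches accumulated per optimiser step

def effective_batch_size(micro_batch, accum_steps):
    # One optimiser step sees micro_batch * accum_steps samples.
    return micro_batch * accum_steps

def train_loop(num_micro_batches):
    """Count optimiser steps taken over a run of micro-batches."""
    steps = 0
    for i in range(1, num_micro_batches + 1):
        # In a real loop: loss / ACCUM_STEPS, then loss.backward()
        if i % ACCUM_STEPS == 0:
            steps += 1  # optimiser.step(); optimiser.zero_grad()
    return steps

print(effective_batch_size(MICRO_BATCH, ACCUM_STEPS))  # 128
print(train_loop(64))  # 8 optimiser steps for 64 micro-batches
```

Scaling each micro-batch loss by `1 / ACCUM_STEPS` before backpropagation keeps the accumulated gradient equal to the mean over the full 128-sample batch.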