pipeline_tag: text2text-generation
---

This is a custom Hugging Face port of the [PyTorch implementation of the original Transformer](https://github.com/ubaada/scratch-transformer), the model introduced in the 2017 paper "[Attention Is All You Need](https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf)". This is the 65M-parameter base variant, trained for English-to-German translation.

## Usage
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

text = 'This is my cat'
output = model.generate(**tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100))
tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
# Output: ' Das ist meine Katze.'
```

Remember to pass `trust_remote_code=True`, since this model uses a custom modeling file.

## Training

| Parameter            | Value                                                                        |
|----------------------|------------------------------------------------------------------------------|
| Dataset              | WMT14-de-en                                                                  |
| Translation Pairs    | 4.5M (83M tokens total)                                                      |
| Epochs               | 25                                                                           |
| Batch Size           | 16                                                                           |
| Accumulation Batch   | 8                                                                            |
| Effective Batch Size | 128 (16 × 8)                                                                 |
| Training Script      | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py) |
| Optimiser            | Adam (learning rate = 0.0001)                                                |
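
The effective batch size of 128 comes from gradient accumulation: gradients from 8 micro-batches of 16 samples are accumulated before a single optimiser step. A minimal framework-free sketch of that bookkeeping (the names `MICRO_BATCH`, `ACCUM_STEPS`, and `train_loop` are illustrative, not from the actual training script):

```python
MICRO_BATCH = 16  # samples per forward/backward pass
ACCUM_STEPS = 8   # micro-batches accumulated per optimiser step

def effective_batch_size(micro_batch, accum_steps):
    # One optimiser step sees micro_batch * accum_steps samples.
    return micro_batch * accum_steps

def train_loop(num_micro_batches):
    """Count optimiser steps taken over a run of micro-batches."""
    steps = 0
    for i in range(1, num_micro_batches + 1):
        # In a real loop: loss / ACCUM_STEPS, then loss.backward()
        if i % ACCUM_STEPS == 0:
            steps += 1  # optimiser.step(); optimiser.zero_grad()
    return steps

print(effective_batch_size(MICRO_BATCH, ACCUM_STEPS))  # 128
print(train_loop(64))  # 8 optimiser steps for 64 micro-batches
```

Scaling each micro-batch loss by `1 / ACCUM_STEPS` before backpropagation keeps the accumulated gradient equal to the mean over the full 128-sample batch.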