license: apache-2.0
datasets:
  - wmt/wmt14
language:
  - de
  - en
pipeline_tag: text2text-generation

This is a custom Hugging Face port of a PyTorch implementation of the original Transformer, introduced in the 2017 paper "Attention Is All You Need". It is the 65M-parameter base model, trained for English-to-German translation.

Usage:

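from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is needed because the repo ships a custom modeling file
model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")
text = 'This is my cat'
# Tokenize the English source sentence, truncating to at most 100 tokens
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100)
output = model.generate(**inputs)
tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
# Output: ' Das ist meine Katze.'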

(Remember to pass trust_remote_code=True when loading the model; it is required because the repository uses a custom modeling file.)

Training:

| Parameter | Value |
|---|---|
| Dataset | WMT14-de-en |
| Translation Pairs | 4.5M (83M tokens total) |
| Epochs | 25 |
| Batch Size | 16 |
| Accumulation Batch | 8 |
| Effective Batch Size | 128 (16 * 8) |
| Training Script | train.py |
| Optimiser | Adam (learning rate = 0.0001) |
| Loss Type | Cross Entropy |
| Final Test Loss | 1.9 |
| GPU | RTX 4070 (12GB) |
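
The effective batch size of 128 comes from accumulating gradients over 8 micro-batches of 16 before each optimiser step. The sketch below illustrates why this works, using a toy one-parameter model with a mean squared error loss. The model, data, and the `mse_grad` helper are hypothetical and chosen only so the example runs with the standard library; the actual model is trained with cross entropy, and whether train.py scales each micro-batch loss in this exact way is an assumption.

```python
import random

MICRO_BATCH = 16  # "Batch Size" from the table
ACCUM_STEPS = 8   # "Accumulation Batch" from the table
effective_batch = MICRO_BATCH * ACCUM_STEPS


def mse_grad(w, batch):
    """Gradient w.r.t. w of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Toy data generated from y = 3x (hypothetical, for illustration only)
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(effective_batch)]
data = [(x, 3.0 * x) for x in xs]
w = 0.5

# Accumulate gradients over 8 micro-batches, scaling each by 1 / ACCUM_STEPS
accumulated = 0.0
for step in range(ACCUM_STEPS):
    micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
    accumulated += mse_grad(w, micro) / ACCUM_STEPS

# Gradient computed in one pass over all 128 examples
full = mse_grad(w, data)

print(effective_batch)
print(abs(accumulated - full) < 1e-9)
```

With equal-sized micro-batches and a mean-reduced loss, the accumulated gradient matches the full-batch gradient up to floating-point error, which is what lets a 12GB GPU train with a larger effective batch than fits in memory at once.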