qwerrwe / src /axolotl /prompt_strategies /alpaca_instruct.py
winglian's picture
pygmalion dataset prompts format, cached tokenized datasets should be hashed on the tokenizer too
2809f3f
raw
history blame
299 Bytes
from axolotl.prompt_tokenizers import AlpacaPromptTokenizingStrategy
from axolotl.prompters import AlpacaPrompter, PromptStyle
def load(tokenizer, cfg):
return AlpacaPromptTokenizingStrategy(
AlpacaPrompter(PromptStyle.instruct), tokenizer, cfg.train_on_inputs, cfg.sequence_len
)