FIM-Tokens not marked special
#4
opened by ruediste
Hi,

I debugged the tokenizer stack for a few hours before discovering that the FIM tokens (`<|fim_prefix|>`, `<|fim_middle|>`, `<|fim_suffix|>`, etc.) are not marked as special. Is there a reason for this? Below is an excerpt from tokenizer.json:
```json
{
  "id": 151660,
  "content": "<|fim_middle|>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": false,
  "special": false
},
```
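As a stopgap until this is fixed upstream, one can patch the flags locally. Below is a minimal sketch (the helper name `mark_fim_special` is hypothetical) that flips `special` to `true` for the FIM entries, assuming they live in the standard `added_tokens` array of tokenizer.json:

```python
import json

# Assumption: these are the FIM token strings present in the tokenizer.
FIM_TOKENS = {"<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>"}

def mark_fim_special(tokenizer_json: str) -> str:
    """Flip the 'special' flag on FIM entries in a tokenizer.json blob."""
    data = json.loads(tokenizer_json)
    for entry in data.get("added_tokens", []):
        if entry["content"] in FIM_TOKENS:
            entry["special"] = True
    return json.dumps(data, indent=2)

# Tiny example mirroring the excerpt above:
excerpt = json.dumps({"added_tokens": [{
    "id": 151660, "content": "<|fim_middle|>",
    "single_word": False, "lstrip": False, "rstrip": False,
    "normalized": False, "special": False}]})
patched = json.loads(mark_fim_special(excerpt))
print(patched["added_tokens"][0]["special"])  # prints: True
```

Running this over the real tokenizer.json (and reloading the tokenizer afterwards) should make the FIM tokens behave like the other special tokens.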