<s> token #65
opened by Muennighoff
The <s> token (bos token) is never used during pre-training, right? (@stas maybe?) Afaik we only use </s> (the eos token) sparingly after documents. I want to try using <s> as a sep token for fine-tuning. cc @TimeRobber
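A minimal sketch of what "using <s> as a sep token for fine-tuning" could look like when packing tokenized documents into one sequence. The token ids below (<s>=1, </s>=2) are placeholders for illustration, not the real vocabulary ids:

```python
# Assumed placeholder ids for the special tokens; the real ids
# come from the model's tokenizer.
BOS_ID = 1  # <s>, used here as a separator
EOS_ID = 2  # </s>

def pack_with_sep(docs, sep_id=BOS_ID):
    """Concatenate tokenized documents, inserting sep_id between them."""
    packed = []
    for i, doc in enumerate(docs):
        if i > 0:
            packed.append(sep_id)
        packed.extend(doc)
    return packed

docs = [[10, 11, 12], [20, 21]]
print(pack_with_sep(docs))  # [10, 11, 12, 1, 20, 21]
```

Since <s> is (roughly) unseen during pre-training, its embedding would have to be learned during fine-tuning, which is part of what the experiment would test.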
"Never" is a strong word: if the pretraining dataset holds some <s> occurrences, they are going to be tokenized as <bos>, but I'd say there shouldn't be many tokens in the pretraining dataset that match that.