Model won't stop generating [llama.cpp / koboldcpp]

#3
by DreamGenX - opened

Tracking this issue, which affects GGUF quants in most backends / UIs. The root cause is that most backends / UIs don't render special tokens.

This shows up as follows: adding <|im_end|> as a stop string does not work (as if the backend rendered special tokens as empty strings), even when skip_special_tokens=false.
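Why an empty rendering breaks stop strings: stop strings are typically matched against the decoded text, so if <|im_end|> decodes to "", there is nothing left to match. Below is a toy sketch with a made-up vocabulary and made-up token ids; it is not any backend's actual detokenizer, just an illustration of the failure mode.

```python
# Hypothetical toy vocabulary; ids are illustrative only.
VOCAB = {1: "Hello", 2: " there", 3: "<|im_end|>", 4: "<|im_start|>", 5: "user"}
SPECIAL_IDS = {3, 4}


def detokenize(ids, render_special):
    """Join token pieces; if render_special is False, special tokens become ''."""
    pieces = []
    for i in ids:
        if i in SPECIAL_IDS and not render_special:
            pieces.append("")  # mimics the buggy behaviour described in this thread
        else:
            pieces.append(VOCAB[i])
    return "".join(pieces)


def hits_stop_string(ids, stop, render_special):
    """Stop-string check done on decoded text, as most backends do."""
    return stop in detokenize(ids, render_special)


generated = [1, 2, 3, 4, 5]
print(hits_stop_string(generated, "<|im_end|>", render_special=True))   # True
print(hits_stop_string(generated, "<|im_end|>", render_special=False))  # False
```

With special tokens rendered as empty, the decoded text is just "Hello thereuser", so the model runs straight into the next turn.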

If you use Aphrodite, vLLM or latest koboldcpp, then things should work. Just make sure to:

  1. Set "skip special tokens" to false.
  2. Add <|im_end|> as a stopping string
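For an OpenAI-compatible server, the two settings above correspond to request fields. A minimal sketch, assuming vLLM's /v1/completions endpoint, which accepts the standard `stop` list plus vLLM's extra `skip_special_tokens` sampling parameter. The model name is taken from the repo linked later in this thread and the prompt is a placeholder; other backends and UIs may name these settings differently, so check your backend's docs.

```python
import json


def build_completion_request(prompt, model="dreamgen/opus-v1.2-llama-3-8b", max_tokens=512):
    """Build a completion request body with the two settings from the list above."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>"],        # step 2: stop string
        "skip_special_tokens": False,  # step 1: keep special tokens in decoded text
    }


req = build_completion_request("<placeholder prompt>")
print(json.dumps(req, indent=2))  # body you would POST to the server's /v1/completions
```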

Many tools still have issues; here are possible workarounds:

  • Use custom stop strings: add "text names=" and "user". In SillyTavern, this setting is called Custom Stopping Strings.
  • Run ./gguf-py/scripts/gguf-set-metadata.py /path/to/llama-3.gguf tokenizer.ggml.eos_token_id 128009 on the GGUF file to set <|im_end|> as the stop token id
  • Replace tokenizer and generation config files using this repo and redo the quantization: https://huggingface.co/dreamgen/opus-llama-3-tokens-alt
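The custom-stop-string workaround relies on the turn header that follows <|im_end|> still being visible in the decoded text even when the special tokens themselves render as empty. A minimal sketch of that truncation logic, assuming plain substring matching; this is an illustration, not SillyTavern's or ooba's actual code.

```python
STOP_STRINGS = ["text names=", "user"]


def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any stop string.

    Returns (truncated_text, found) where found tells whether a stop string matched.
    """
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut], cut != len(text)


# Decoded output where <|im_end|> and <|im_start|> rendered as empty strings.
print(truncate_at_stop("She smiled.\ntext names= Bob\nHi", STOP_STRINGS))
```

One trade-off: "user" can also appear in ordinary prose, which would cut a reply short; "text names=" is far more specific.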


I also have this issue when using your unquantized model: it never generates a stop token. I'm using Transformers in Textgen WebUI to load the model in bf16, so it's not just a KoboldCPP or GGUF problem.

Could this be related to the original Llama3 Instruct model also doing this?
When using the original Llama3 Instruct model, I change the two .json files to make it correctly produce the stop tokens as described in this reddit comment
For the finetune, this doesn't work though; it still doesn't produce a stop token.

Edit:
Oh, never mind, it does work when I change them to <|im_end|>:
So "eos_token": "<|im_end|>", in the file tokenizer_config.json
and "eos_token": { "content": "<|im_end|>", in the file special_tokens_map.json
That fixes it for me
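The same two-file edit can be scripted. A minimal sketch, assuming the usual Hugging Face layout where `eos_token` is stored either as a plain string or as an object with a "content" key; the function names are hypothetical, and the demo uses placeholder original values rather than the real contents of this repo's files. Back up the files first.

```python
import json
import tempfile
from pathlib import Path


def _set_eos(obj, eos):
    """Set "eos_token", whether it is stored as a string or as {"content": ...}."""
    if isinstance(obj.get("eos_token"), dict):
        obj["eos_token"]["content"] = eos
    else:
        obj["eos_token"] = eos


def patch_eos(model_dir, eos="<|im_end|>"):
    """Rewrite eos_token in tokenizer_config.json and special_tokens_map.json."""
    for name in ("tokenizer_config.json", "special_tokens_map.json"):
        path = Path(model_dir) / name
        data = json.loads(path.read_text(encoding="utf-8"))
        _set_eos(data, eos)
        path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    # Demo on throwaway files trimmed down to the eos_token entry (placeholder values).
    with tempfile.TemporaryDirectory() as d:
        Path(d, "tokenizer_config.json").write_text(json.dumps({"eos_token": "<|end_of_text|>"}))
        Path(d, "special_tokens_map.json").write_text(
            json.dumps({"eos_token": {"content": "<|end_of_text|>", "lstrip": False}}))
        patch_eos(d)
        print(Path(d, "special_tokens_map.json").read_text(encoding="utf-8"))
```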

@JayhC Yeah, you should be setting <|im_end|> as a stop sequence / stop string / stop token. This works correctly with e.g. vLLM, but not with koboldcpp, because koboldcpp does not render them correctly.
You should also be able to work around it by setting:

https://huggingface.co/dreamgen/opus-v1.2-llama-3-8b/blob/bc12aabe05ea277bf207a9b35ef819623961352f/tokenizer_config.json#L2055 to "<|im_end|>"
https://huggingface.co/dreamgen/opus-v1.2-llama-3-8b/blob/bc12aabe05ea277bf207a9b35ef819623961352f/special_tokens_map.json#L9 and here as well

I do not want this to be the default though, because I often generate multiple turns at once in multi-character scenarios (and the turns are separated by "<|im_end|>").

The bug here is that for koboldcpp, even if I set the stop string to be <|im_end|>, it's ignored.

I have the same behaviour in ooba too, so it's not just a kobold issue.

They're discussing this issue in the llama3 support pr. It seems like there's something weird with how the model's tokenizer is handled by llamacpp: https://github.com/ggerganov/llama.cpp/pull/6745

DreamGen org

Yeah, somehow the special tokens are rendered as empty, which breaks stop strings: https://github.com/ggerganov/llama.cpp/issues/6770
It works in Aphrodite and vLLM, which use HF tokenizers for encoding/decoding token ids. It might also work in ooba with some of the hybrid HF setups (like llamacpp_HF or transformers), but I haven't tried.

Maybe?

https://huggingface.co/Lewdiculous/opus-v1.2-llama-3-8b-GGUF-IQ-Imatrix

Uploaded again with Opus' provided configs.

DreamGen org

Thank you @Lewdiculous <3

How did you fix it?

Temporary workarounds that I think might work:

  • Change the <|im_end|> and <|im_start|> tokens manually before conversion and mark them as not special (this might affect tokenization, I'm not sure): https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2066914808
  • Change the EOS token id to 128009. This can be done manually before conversion by modifying the tokenizer config and special tokens config, or on the GGUF file with a gguf-py script: ./gguf-py/scripts/gguf-set-metadata.py /path/to/llama-3.gguf tokenizer.ggml.eos_token_id 128009
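For the first workaround, "marking as not special" in a Hugging Face tokenizer amounts to flipping a `special` flag on the added-token entries. A minimal sketch under assumptions: it handles the `added_tokens` list in tokenizer.json and the `added_tokens_decoder` map in tokenizer_config.json, but we have not checked exactly which file or fields the linked comment edits. The ids in the demo fragment are made up. As noted above, this might change tokenization.

```python
import json


def unmark_special(data, contents=("<|im_start|>", "<|im_end|>")):
    """Set "special": false for the given token strings; return the ids changed.

    Covers tokenizer.json ("added_tokens" list) and tokenizer_config.json
    ("added_tokens_decoder" dict keyed by the token id as a string).
    """
    changed = []
    for tok in data.get("added_tokens", []):
        if tok.get("content") in contents:
            tok["special"] = False
            changed.append(tok["id"])
    for tok_id, tok in data.get("added_tokens_decoder", {}).items():
        if tok.get("content") in contents:
            tok["special"] = False
            changed.append(int(tok_id))
    return changed


# Hypothetical, trimmed tokenizer.json fragment with made-up ids.
tokenizer_json = {"added_tokens": [
    {"id": 10, "content": "<|begin_of_text|>", "special": True},
    {"id": 11, "content": "<|im_start|>", "special": True},
    {"id": 12, "content": "<|im_end|>", "special": True},
]}
print(unmark_special(tokenizer_json))  # [11, 12]
print(json.dumps(tokenizer_json["added_tokens"][2]))
```

In practice you would load the file with json.load, call the function, and write it back before converting to GGUF.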

@DreamGenX

Yeah!

It still needs to be evaluated, hence the experimental warnings.

llama-3-config ChaoticNeutrals/Llama3-Corrections

I fixed it apparently by adding "text names=", "user" as Custom Stop Strings in Oobabooga as described here.

DreamGen org

Based on this comment and similar to @Nitral-AI's repo, I have created this patch and will test it out now: https://huggingface.co/dreamgen/opus-llama-3-tokens-alt

Also, koboldcpp might have a fix! https://github.com/LostRuins/koboldcpp/releases/tag/v1.63

Added support for special tokens in stop_sequences. Thus, if you set <|eot_id|> as a stop sequence and it can be tokenized into a single token, it will just work and function like the EOS token, allowing multiple EOS-like tokens.
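The release note describes a simple rule: a stop sequence that tokenizes to exactly one token is treated like an extra EOS token id, so generation halts on the token itself rather than on decoded text. A toy sketch of that rule with a made-up tokenizer and made-up ids; this is an illustration of the idea, not koboldcpp's actual code.

```python
# Hypothetical toy vocabulary; ids are illustrative, not Llama 3's real ones.
SPECIAL = {"<|end_of_text|>": 2, "<|eot_id|>": 3}


def toy_tokenize(text):
    """Special strings map to one id; any other text is split into characters."""
    if text in SPECIAL:
        return [SPECIAL[text]]
    return [1000 + ord(c) for c in text]


def split_stop_sequences(eos_id, stop_sequences, tokenize):
    """Single-token stop sequences join the stop-id set; the rest need text matching."""
    stop_ids = {eos_id}
    text_stops = []
    for seq in stop_sequences:
        ids = tokenize(seq)
        if len(ids) == 1:
            stop_ids.add(ids[0])    # behaves like an extra EOS token
        else:
            text_stops.append(seq)  # still needs ordinary string matching
    return stop_ids, text_stops


def generate(scripted_tokens, stop_ids):
    """Stand-in for sampling: emit scripted tokens until a stop id appears."""
    out = []
    for tok in scripted_tokens:
        if tok in stop_ids:
            break
        out.append(tok)
    return out


stop_ids, text_stops = split_stop_sequences(
    SPECIAL["<|end_of_text|>"], ["<|eot_id|>", "\nUser:"], toy_tokenize)
scripted = [1072, 1105, 3, 1072]  # "H", "i", <|eot_id|>, "H"
print(generate(scripted, stop_ids))  # [1072, 1105]
```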

DreamGenX pinned discussion

I use LM Studio. The problem remains.

DreamGen org

Try setting custom stop strings as suggested at the top.
