Model will need to be requantized, rope issues for long context

#2
by treehugg3 - opened

There are issues with the GGUF generation for this model type that mean long-context prompts will turn into garbage output.

See the discussion here:
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/discussions/2

To fix this, you need this PR, which is not yet merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/11008. Users don't need to update their llama.cpp build; only the GGUF needs to be modified or regenerated with the right settings. The PR also fixes mistakes in the GGUF's vocabulary settings.
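As a side note, once a GGUF has been regenerated you can sanity-check that the file is well-formed without any dependencies: the GGUF container starts with a fixed little-endian header (magic `GGUF`, version, tensor count, metadata KV count). This is a hypothetical stdlib-only helper for illustration, not part of llama.cpp:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed 24-byte GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

# Demo on a synthetic header; for a real file you would pass
# open(path, "rb").read(24) instead.
fake = struct.pack("<4sIQQ", b"GGUF", 3, 0, 0)
print(read_gguf_header(fake))  # {'version': 3, 'n_tensors': 0, 'n_kv': 0}
```

Inspecting (or editing) the actual rope-scaling and vocabulary metadata keys is easier with the `gguf` Python package that ships in llama.cpp's `gguf-py` directory.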

Ah beautiful, appreciate the heads up :)

FYI: That pull request has been merged.

Should be fixed now @KeyboardMasher @treehugg3 :)
