Model will need to be requantized, rope issues for long context

#2
by treehugg3 - opened

There are issues with the GGUF generation for this model type that mean long-context prompts will turn into garbage output.

See the discussion here:
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/discussions/2

To fix this, you need this PR, which is not yet merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/11008. Users don't need to update their llama.cpp build; only the GGUF needs to be modified or regenerated with the right settings. The PR also fixes mistakes in the GGUF's vocabulary settings.
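As a side note, once a GGUF has been regenerated you can sanity-check that the file is well-formed without any dependencies: the GGUF container starts with a fixed little-endian header (magic `GGUF`, version, tensor count, metadata KV count). This is a hypothetical stdlib-only helper for illustration, not part of llama.cpp:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed 24-byte GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}

# Demo on a synthetic header; for a real file you would pass
# open(path, "rb").read(24) instead.
fake = struct.pack("<4sIQQ", b"GGUF", 3, 0, 0)
print(read_gguf_header(fake))  # {'version': 3, 'n_tensors': 0, 'n_kv': 0}
```

Inspecting (or editing) the actual rope-scaling and vocabulary metadata keys is easier with the `gguf` Python package that ships in llama.cpp's `gguf-py` directory.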

Ah beautiful, appreciate the heads up :)

FYI: That pull request has been merged.

Should be fixed now @KeyboardMasher @treehugg3 :)
