For the bandwidth limited ones <3

GGUFs for HanNayeoniee/LHK_DPO_v1

For a general representation of how quantization level influences output quality, check any model card from TheBloke, or see this table. Note those benchmarks were done on Llama models, and are probably not recent. Also I don't know how the MOE architecture influences those results but you got the idea!

So about the model, I just played with it 40min so far (Q5_K_M, ChatML template, TGWUI, ratherly short context size) but from what I saw, this model was really impressive ๐Ÿ‘ I should rather say quite astonishing!

[Edit: every quants are now tested and validated]

The coherence seems remarkably well maintained. To illustrate, see this sequence of interactions with the model.

HanNayeoniee/LHK_DPO_v1 was trained via Direct Preference Optimization(DPO) from TomGrc/FusionNet_7Bx2_MoE_14B.

Thanks for the community and sincere congrats to HanNayeoniee and TomGrc!

Downloads last month
65
GGUF
Model size
12.9B params
Architecture
llama

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.