GGUF q4_k_m quantization of the model described below.
QLoRA on SpicyBoros 2.2 Llama 7B v2 using a synthetic Q&A dataset. Training covered a little under one epoch: my GTX 1080 ran out of memory shortly before training finished, so I am using the checkpoint saved at step 450 of 465. I've been running into a lot of issues, so I am happy to get even that far. In most of my QLoRA attempts the loss went to 0 and DeepSpeed forcibly closed training after roughly 0.3 epoch.
My intention with this QLoRA is mostly to try to train something usable and cool locally on a normal desktop, without going to RunPod. I tried training a q4_0 quant with cpu-lora in llama.cpp (https://rentry.org/cpu-lora), but it's been a miss: it's about 20x slower on an 11400F than on a poor man's GTX 1080.
The model can be used to ask questions about basic economic concepts. Responses will have a viewpoint similar to the one expressed by Thomas Sowell in his book Basic Economics.
Prompt format:
Reader: {prompt} '\nThomas:\n' {response}
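As a rough illustration of the format above, here is a minimal Python sketch for building a single-turn prompt. It assumes the quoted '\nThomas:\n' part is a literal separator with real newlines and no surrounding spaces; that reading is a guess, so adjust the whitespace if the model responds poorly. The function name build_prompt is hypothetical.

```python
def build_prompt(question: str) -> str:
    """Format a single-turn prompt; the model is expected to continue after 'Thomas:'.

    Assumption: the separator is a newline, 'Thomas:', and another newline,
    with no extra spaces around it.
    """
    return f"Reader: {question}\nThomas:\n"


if __name__ == "__main__":
    # repr() makes the newlines visible when checking the result by eye.
    print(repr(build_prompt("Why do prices rise when supply falls?")))
```

Pass the returned string as the raw prompt to whatever runtime you use to load the GGUF file.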
I trained with a sequence length of 1024, but I conversed with the model for up to 4000 tokens and it was still coherent and in character. Even though the training data I used is single-turn only, the model has no issue with multi-turn conversations. Much of that is thanks to the fine-tuning done earlier by the amazing Jon Durbin.
Known issues:
Tokenization didn't happen as I expected, so you can see a lot of /n, ' and ' characters in places where you shouldn't really see them. For example, most responses, if you use the right prompt format, will have a ' character at the end of the response.
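One possible workaround for the trailing ' character is to trim it after generation. The sketch below is only an assumption about what a cleanup step could look like, not something the model needs in order to run; clean_response is a hypothetical helper name, and it removes at most one trailing single quote.

```python
def clean_response(text: str) -> str:
    """Strip surrounding whitespace and one stray trailing single quote.

    Assumption: the artifact is a single ' at the very end of the response.
    """
    text = text.strip()
    if text.endswith("'"):
        # Drop the quote, then any whitespace that preceded it.
        text = text[:-1].rstrip()
    return text
```

Note that a response which legitimately ends with a single quote would lose it too, so this is a trade-off rather than a fix.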