Feedback after several months of use.

#6
by Nexesenex - opened

Hey Maziyar,

This model is honestly one of the best-balanced models I've ever tried, with a quite unique flavor (I also enjoyed Rhys 78B calme 2.4), and I'm usually not very pleased with Llama 3.x 70B finetunes. I've been using this model as one of my main models for several months now, and I wondered if you had plans to update the Calme finetune of Llama 3.1 70B, or to make a Llama 3.3 70B version.

Hi @Nexesenex

First off, I’m really glad to hear this model has been helpful to you—thank you for sharing your experience! I’d love to run similar experiments with the new Llama 3.3 70B and update the previous ones as well. While I’m confident I can improve reasoning and math, I’m curious—are there any other areas you think I should focus on that are often missing from most 70B+ models? I’m eager to create more unique models, but my own use cases are somewhat limited. Your feedback and insights would be incredibly helpful in broadening my perspective.

Hello Maziyar.

First, before using it, I benchmarked your model and saw quasi-stock behavior on the usual ARC and perplexity (PPL) benchmarks, which meant your finetune wasn't overfitted, an important criterion for me.
Then I chatted with it, and I was delighted to find a model as smart as Llama 3.1 70B but more creative and noticeably less evasive in its answers, reflecting more of the base model's capabilities alongside those you enhanced through your finetune. I found it slightly inferior to Rhys 78B calme 2.4 in the creative field (RP scenarios, engaged SFW conversation), but that's a subjective view.
It's hard for me to tell you what to improve, but I think the angle of "intelligent roleplay", where smut is not involved but engaging conversations without a positivity bias are, is a good way to work on both the RP and the general conversational abilities of a serious model.
For example, I run quite a lot of web searches to get quick news reviews put into perspective by the AI, and a model's overall bias noticeably influences the way the gathered information is synthesized and extrapolated. Negative Llama has made a step in that direction.
https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B
Also, you might want to submit your best models to this leaderboard, which is interesting and corresponds to the use cases I mentioned, among others: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
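For what it's worth, the PPL check I mention above boils down to comparing per-token perplexity between the finetune and its base model on a held-out general corpus. A minimal sketch of the arithmetic (the log-probability values here are illustrative, not from any real run):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean per-token log-likelihood).

    token_logprobs: natural-log probabilities the model assigned
    to each token of a held-out text.
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Illustrative numbers: if the finetune's PPL on a general corpus
# stays close to the base model's, the finetune likely hasn't
# overfitted its training data.
base_ppl = perplexity([math.log(0.25)] * 8)   # -> 4.0
tuned_ppl = perplexity([math.log(0.20)] * 8)  # -> 5.0
```

A large gap in either direction on general text (rather than on the finetuning domain) is the "overfit" signal being checked for.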