This one is definitely better
Generally all testing was done at ~0.7-0.8 temp, 0.05 min-p. Going over temp 1 definitely makes it very dumb. I used the Q4M quant.
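For anyone unfamiliar with these sampler settings, here is a rough sketch of what temperature plus min-p does to the candidate token pool. It is not any backend's actual implementation: the function names are made up for illustration, the logits are toy values, and applying temperature before min-p is an assumption (sampler order differs between backends).

```python
import math
import random


def kept_tokens(logits, temperature, min_p):
    """Return indices of tokens that survive min-p filtering.

    Assumption: temperature is applied first, then min-p. Real backends
    may order their samplers differently.
    """
    # Temperature scaling: values above 1 flatten the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p keeps tokens with probability >= min_p * (top token probability).
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample(logits, temperature=0.75, min_p=0.05, rng=random):
    """Draw one token index from the filtered, temperature-scaled distribution."""
    kept = kept_tokens(logits, temperature, min_p)
    # Unnormalised weights are enough; rng.choices normalises them itself.
    peak = max(logits[i] / temperature for i in kept)
    weights = [math.exp(logits[i] / temperature - peak) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [5.0, 4.0, 1.0, 0.0]  # hypothetical values, not from a real model
print(kept_tokens(toy_logits, 0.75, 0.05))  # low temp: the weak tail is cut
print(kept_tokens(toy_logits, 2.0, 0.05))   # high temp: the tail survives
```

With these toy numbers, the higher temperature flattens the distribution enough that every token clears the min-p threshold, which is one plausible reason output quality drops off past temp 1.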
Meth:
Positive:
No repetition issues, fairly creative, good even at assistant tasks (my domain-knowledge assistants do still have a persona def). Retains most of the Wizard 8x22b smarts, perhaps a bit dumber. It does exceptionally well continuing long-context RP (~16k tokens) - doesn't forget card details, lore book injections, history, etc. Creative moist, which Wizard lacked.
Negative:
Slop does still shine through; it's no secret the 8x22b instruct is overbaked.
Starting from a fresh RP with no examples, it tends to do:
I blush and look down at my hands <snip>
I look up <snip>
I bite my lower lip <snip>
I quickly look away <snip>
I look back <snip>
I wouldn't use it without sample messages or unless I'm continuing an existing RP.
Mistral:
Not sure why but it's a complete disaster. I get constant repetition with this format in RP and slop + nonsense purple prose overload (which contradicts the card definition). e.g.
I look up at him, my eyes shining with unshed tears. I can feel my heart racing in my chest, the beat echoing in my ears. I know I'm being foolish, know that I'm risking everything by asking this of him. But I can't help it. I need him, need to feel his presence in my life, even if it's just through the barrier of my skin.
I take a deep breath, steeling myself for his response. I know it's a lot to ask, know that I'm putting him in an impossible position. But I can't help it. I need him, need to feel his arms around me, even if it's just for a moment.
I wait, my heart pounding in my chest. I know I'm being selfish, know that I'm risking everything by asking this of him. But I can't help it. I need him, need to feel his presence in my life, even if it's just through the barrier of my skin.
I look up at him, my eyes wide and pleading. I know I'm being foolish, know that I'm risking everything by asking this of him. But I can't help it. I need him, need to feel his arms around me, even if it's just for a moment.
(This is a re-roll of the 'fresh RP' / no example messages from above, but in Meth it actually had meaningful content with dialogue and variety. Mistral also seems to fail at continuing long context RP).
I also tried Vicuna (the recommended wizard format) and, while not broken, it did seem much dumber and I gave up on it pretty quick so not much to report on that front.
Anecdotally, it seems slightly less creative than the b version but that's more of a vibe. It's definitely usable with Meth + 0.7-0.8 temp. One thing I really like is that it's still capable of executing proper (and useful) CoT blocks and adhering to their results - something which tunes can struggle with (e.g. Hanami).
I typically use Wizard in a langchain pipeline doing world-sim updates for some RPs because it's cheap, fast, and decently smart. For agent response writing and editing in this pipeline I use either Claude models or Behemoth / Monstral (which I actually prefer, but the speed kills it as a viable agent model in those slots).
tl;dr I think it's major progress and I'd consider it a usable model (with Meth). I will probably test a bit more though I wouldn't replace Monstral with this yet.
So if performance was no issue, you'd prefer Monstral over the other 123Bs?
The intention of this tune was to create a faster alternative to Behemoth; one that can run better on high capacity, low speed setups like a Mac.
Unfortunately, I won't be exploring 8x22B any further. I'm going to release this officially as-is and won't look back.
Thanks for testing, really appreciate it!
Monstral is my favorite 123b currently, definitely. I'm not sure it's a big enough difference between it and Behemoth that I'd be able to tell them apart in a blind test. Maybe it's placebo, but it does seem to have more variety in its re-rolls. I'm 100% confident I could tell apart Behemoth and Magnum v4 though. Red Squadron has much, much faster inference and I was hoping to use it as an alternative but it's currently too much of a downgrade in terms of prose.
My pleasure, love your work.
Okay, Behemoth is my 'daily driver'. I was curious because I have not tried Monstral. I thought I knew all the Largestral tunes but I guess I was wrong.
Yeah, I’d say it’s definitely worth trying out. It’s marsupial’s merge of Behemoth and Magnum v4. I really don’t like Magnum at all, but the resulting merge feels like a more varied Behemoth without the slop of Magnum.
Interesting! I find Magnum to be like Krispy Kreme doughnuts. The first one, I think it's amazing. By the third, I want to throw up.