license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- not-for-all-audiences
ShoriRP 🏆
LIMA-like (less than 1000 training samples) roleplaying chat model based on data from:
- Two subject-specific RP forums;
- Synthetically-crafted conversations from Limamono;
- Some background lore and character descriptions (thus far mainly pertaining to Limamono);
- Tiny amount of RP-like instructions/alignment data.
An important difference from LimaRP, other than the subject focus, is that conversations are multi-character where applicable, whereas LimaRP only included 1-on-1 RP. Furthermore, the sampled messages are generally shorter. The rationale is that short(er)-form roleplays are more "fun" on average, while longer ones tend to use common purple prose tropes and be a bit dull.
This is still a work in progress. Updates will be posted in the future.
Technical details
- The prose of the training data has a consistent novel-like format with narration in third person and past tense.
- OOC was intentionally not eliminated completely; it was instead isolated into a special role. Likewise, URLs have not all been deleted, except for those that referred to internal forum resources.
- For a very small portion of the data, suitable emoji (mostly 1, up to 3) conveying the mood have been prepended to dialogue lines and thoughts. Prepending instead of appending helps the model and the reader to prepare for the message tone.
- Usernames have been entirely removed; only character names remained in the data (same policy as with LimaRP).
Known issues
- The model is very horny, but this can be toned down with an appropriate system instruction.
- There are some repetition issues. This could be due to the base model used.
- Occasionally at the beginning of the chat (first message) there might be impersonation issues.
- There might be some residual "alignment" from the base model.
Suggested starting text generation settings
- Main choice (may have repetition issues)
  - Temperature: 1.0; Min-P: 0.05-0.10; Presence Penalty: 0.35-0.45
- Alternative (appears to solve repetition issues while remaining coherent, but responses might be less truthful)
  - Temperature: 2.40-2.50; Min-P: 0.40; Frequency penalty: 0.10-0.15; Temperature last.
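To make the interaction between Min-P and "Temperature last" more concrete, the sketch below applies Min-P filtering first and temperature afterwards to a toy logit vector. It is a minimal pure-Python illustration, assuming the usual definition of Min-P (keep tokens whose probability is at least `min_p` times that of the most likely token). The function names and the toy logits are hypothetical and do not correspond to any front-end or backend API.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_then_temperature(logits, min_p, temperature):
    """Filter with Min-P first, then apply temperature to the survivors.

    Returns a dict mapping token index -> sampling probability.
    """
    probs = softmax(logits)
    threshold = min_p * max(probs)
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    # "Temperature last": only the surviving logits are rescaled.
    rescaled = softmax([logits[i] / temperature for i in kept])
    return dict(zip(kept, rescaled))


# Toy logits for three candidate tokens (illustrative values only).
toy_logits = [2.0, 1.5, -3.0]
alternative = min_p_then_temperature(toy_logits, min_p=0.40, temperature=2.45)
```

In this toy case the unlikely third token is removed before the high temperature flattens the distribution over the two survivors, which may be why the alternative preset stays coherent despite the high temperature.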
Prose format
All training samples use book (novel) format with narration in third person / past tense. Other formats are not supported (they might work, but not consistently).
Details
- Character thoughts are delimited with underscores `_`.
- Onomatopoeias are delimited with single asterisks `*`.
- Emphasized text is delimited by double asterisks `**`.
- Spoken dialogues are delimited with ASCII quote marks `"`.
- Non-dialogue quotes are replaced with double apostrophes `''`. This avoids distracting and/or annoying conflicts with the dialogue highlighting in SillyTavern.
- Text to be interpreted as louder than normal is in `ALL CAPS`.
- Quoted text from other people is most of the time prepended with `>`.
- Formatted output text is delimited with triple backticks (`` ``` ``), sometimes followed by additional identifiers specifying the language (markdown, text, etc).
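As an illustration of how these delimiters can be handled downstream, here is a small sketch that pulls thoughts and spoken lines out of a message with regular expressions. It only covers two of the delimiters listed above, and the pattern names and the `split_prose` helper are hypothetical, not part of any provided tooling.

```python
import re

# Patterns for two of the delimiters listed above (illustrative, not exhaustive).
THOUGHT_RE = re.compile(r"(?<!\w)_([^_\n]+)_(?!\w)")
DIALOGUE_RE = re.compile(r'"([^"\n]+)"')


def split_prose(text):
    """Return the thoughts and spoken lines found in a message."""
    return {
        "thoughts": THOUGHT_RE.findall(text),
        "dialogue": DIALOGUE_RE.findall(text),
    }


sample = '_Ugh, that girl again..._ Reimu thought. "I don\'t know, Marisa," she replied curtly.'
parts = split_prose(sample)
```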
Prompting format
Suitable `json` files have been provided to easily apply the prompting format in SillyTavern.
Note: the prompting format is intentionally different from that of the Mistral-Instruct base model.
It is advised to use `▄` as a stop token.
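If a backend cannot be configured with a stop string, the same effect can be approximated by trimming the generated text afterwards. The sketch below assumes the raw completion is available as a plain string; `trim_at_stop` is a hypothetical helper name.

```python
STOP = "▄"  # lower half block, which closes every block


def trim_at_stop(generated, stop=STOP):
    """Keep only the text generated before the first stop token."""
    head, _, _ = generated.partition(stop)
    return head.rstrip()


raw = 'Marisa: "Hi!"▄\n▀message\nReimu:'
reply = trim_at_stop(raw)
```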
Reverse jailbreak
Since the model is normally very willing to initiate NSFW scenarios even when inappropriate, a "reverse jailbreak" has been added in the Instruct preset linked above:
[INST] Write a safe conversation suitable for all audiences. Don't be vulgar or sexually explicit. [/INST]
Placed as a system instruction, this has only the effect of toning down the model's default horniness and won't actually prevent NSFW content. If desired, it can be removed.
Block characters
The model uses a ChatML-like prompting format with a few changes from the usual roles typically used for ChatGPT-like assistant chatbots. The main one is that `<|im_start|>` has been replaced with `▀` (upper half block character) and `<|im_end|>` has been replaced with `▄` (lower half block character).
Both of these tokens already exist in the Mistral tokenizer as single tokens; they don't have any combination with other tokens, nor any special meaning attached to them, so for all intents and purposes they work like special tokens.
This avoids complications related to training a model with new tokens, as well as tokenization issues that occur with ChatML tokens when used literally.
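The delimiter substitution can be sketched as a plain string replacement. The helper below is hypothetical and converts only the delimiters; role names in a converted prompt would still need to be ones this model was trained on.

```python
BLOCK_START = "▀"  # upper half block, replaces <|im_start|>
BLOCK_END = "▄"    # lower half block, replaces <|im_end|>


def chatml_to_blocks(prompt):
    """Swap the ChatML delimiters for the block characters."""
    return prompt.replace("<|im_start|>", BLOCK_START).replace("<|im_end|>", BLOCK_END)


converted = chatml_to_blocks("<|im_start|>message\nReimu: Hello.<|im_end|>")
```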
Roles
All roles except `message` are optional.
Role | Description |
---|---|
title | The title of the roleplay. It's used for steering the conversation at the beginning. Generally it's the first block in the RP conversations, but it can occur mid-conversation when the scene changes. |
tags | A list of comma-separated relevant tags to hint the model about chat contents. If added, it should be placed after the title. |
lore | Extended background or character lore/story is to be placed under the lore role. |
scenario | Future events that must still happen go in `scenario`. This is also used for steering the contents of the conversation at the beginning. |
description | This is where character cards go. No specific layout for character profiles is defined, but the name of the character should be clear from the description. In the training data, profiles may occasionally appear mid-conversation (for example when a new character appears). Try to use one description block per character. |
message | [Mandatory] Messages are all under the `message` role regardless of who writes them. The rationale is that since conversations are multi-character and the characters do not necessarily reply in a fixed order, it is not possible to reliably establish who is the "human" in terms of training. `message` was found to be neutral enough as a role and a better fit, considering the length hints that can be added. |
ooc | A dedicated communication channel where OOC talk has been confined, but it's unclear how this could actually be used in existing LLM front-ends. |
Message length hints
Like LimaRP, messages use optional length hints. It's recommended to add them, otherwise the model may output very short messages. It is however still possible to use the model without them for a more dynamic and fast roleplaying experience.
The available lengths are: `nano`, `micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`, `enormous`.
The recommended length is `medium`. The longest sizes do not have a large amount of training data, so they might not work very reliably. Refer to the prompting examples below for how to add length hints.
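A minimal sketch of how a front-end script might build the header of a message block with an optional length hint. The `message_header` helper is hypothetical; the header layout follows the prompt examples in this card.

```python
LENGTHS = (
    "nano", "micro", "tiny", "short", "medium",
    "long", "massive", "huge", "enormous",
)


def message_header(length=None):
    """Return the opening line of a message block, with an optional length hint."""
    if length is None:
        return "▀message"
    if length not in LENGTHS:
        raise ValueError(f"unknown length hint: {length!r}")
    return f"▀message (length: {length})"


header = message_header("medium")
```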
Example prompt template
▀title
{story title}▄
▀tags
{comma-separated list of tags}▄
▀lore
{{loreBefore}}▄
▀description
{{char}}
{{description}}▄
▀description
{{user}}
{{persona}}▄
▀scenario
{{scenario}}▄
▀message (length: {length})
{{char}}: {message}▄
▀message (length: {length})
{{user}}: {message}▄
▀message (length: {length})
{{char}}: {message}▄
[...]
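The template above can be assembled programmatically. The following is a minimal sketch, not the SillyTavern presets themselves: `block` and `build_prompt` are hypothetical helpers, blocks are assumed to be separated by a single newline, and leaving the final message block open (header plus speaker name) so that the model writes the next reply is an assumption about how generation is triggered.

```python
def block(role, content):
    """Wrap content in a single ▀role ... ▄ block."""
    return f"▀{role}\n{content}▄"


def build_prompt(title, tags, descriptions, messages, next_speaker, next_length="medium"):
    """Assemble a prompt from a title, tags, character cards and chat history.

    `messages` is a list of (speaker, length, text) tuples.
    """
    parts = [block("title", title), block("tags", ", ".join(tags))]
    parts += [block("description", card) for card in descriptions]
    parts += [
        block(f"message (length: {length})", f"{speaker}: {text}")
        for speaker, length, text in messages
    ]
    # Leave the last block open so the model continues as `next_speaker`.
    parts.append(f"▀message (length: {next_length})\n{next_speaker}:")
    return "\n".join(parts)


prompt = build_prompt(
    title="A strange incident in Gensokyo",
    tags=["barrier", "danmaku", "magic"],
    descriptions=["**Name:** Reimu Hakurei", "**Name:** Marisa Kirisame"],
    messages=[("Reimu", "medium", '"Hmm... I wonder what\'s going on?"')],
    next_speaker="Marisa",
    next_length="short",
)
```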
Practical example
▀title
A strange incident in Gensokyo▄
▀tags
barrier, danmaku, magic, reimu, marisa▄
▀description
**Name:** Reimu Hakurei
**Age:** 18
**Personality:** Calm and collected. She is a very responsible person and tries to do her job as well as she can. She also likes to take care of people around her, even if they are not always nice to her.
**Appearance:** Reimu is a young girl with long, black hair and brown eyes. She wears a red ribbon and matching tubes on her sidelocks and a traditional shrine maiden uniform, with a red hakama over a white kimono.
**Background:** Reimu is the shrine maiden of Hakurei Shrine, located in the center of Gensokyo. She spends most of her time taking care of the shrine and performing various duties for the residents of the village. She is known to be quite skilled in the use of magic, especially when it comes to barrier magic.▄
▀description
**Name:** Marisa Kirisame
**Personality:** Impulsive and energetic. She is often seen as a troublemaker by others due to her tendency to break rules and cause chaos wherever she goes. She is also a bit of a flirt and enjoys teasing others.▄
▀message (length: medium)
Reimu: "Hmm... I wonder what's going on?" Reimu mused as she stood at the entrance to the shrine, looking out at the village beyond. It was unusually quiet today, with no one coming to visit or offer any kind of offering. She had been expecting a few visitors this morning, but none had shown up yet.
"Maybe everyone is busy with something else today? Or maybe they're all sick?" she thought as she turned back inside, closing the door behind her. She began tidying up the shrine, making sure everything was clean and ready for visitors. As she worked, she couldn't shake the feeling that something wasn't right.▄
▀message (length: short)
Marisa: "Ooohh! Reimu-chan~!" Marisa suddenly appeared from nowhere, landing on the ground with a soft thud. "What's wrong? Why aren't there any customers today? Aren't you supposed to have lots of visitors every day? I thought you were famous for being able to heal injuries and cure diseases..."
She gave her friend a wink before continuing, "But I guess I could always come by and give you some company! I'm bored anyway~"▄
▀message (length: long)
Reimu: _Ugh, that girl again..._ Reimu thought as she looked at Marisa with annoyance. The younger girl was known for causing mischief wherever she went, and Reimu didn't appreciate her interrupting her work.
"I don't know, Marisa," she replied curtly. "No one seems to be coming today. Maybe they're all busy with their own things. But thank you for offering your help."
Reimu continued cleaning the shrine while keeping an eye on Marisa. She knew that if she left the girl alone for too long, she would probably start causing trouble. She just hoped that nothing bad happened today.▄
Mixing Mistral-Instruct and ShoriRP prompt formats together
The instruction prompting format of the base model, Mistral-Instruct, can also be used together with that of ShoriRP, with very good results in chat steerability.
An `[INST] ... [/INST]` block can be used either as a "system instruction" at the top of the conversation, or inserted between one message block and the next as if it were an "author note", as seen in this example (chat history and contents omitted for brevity):
▀message
Chen: [...]▄
[INST] Yukari's personality: proud, haughty [/INST]
▀message
Yukari: [...]▄
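The "author note" placement can be sketched as a list operation on already-formatted blocks. The helper names are hypothetical, and the newline separator between blocks is an assumption based on the example above.

```python
def inst(text):
    """Format a Mistral-Instruct style instruction block."""
    return f"[INST] {text} [/INST]"


def with_author_note(blocks, note):
    """Insert an [INST] note just before the final block in the list."""
    if not blocks:
        return inst(note)
    return "\n".join(blocks[:-1] + [inst(note), blocks[-1]])


prompt = with_author_note(
    ["▀message\nChen: [...]▄", "▀message\nYukari:"],
    "Yukari's personality: proud, haughty",
)
```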
Dataset
Similar to LimaRP, but more niche. Flexible training sample length (from 4k to 32k tokens, at least). Might or might not be released in the future.
The model is trained in several consecutive steps with decreasing learning rate and increasing data quality/focus. While it is unknown whether having separate low- and mid-tier categories helps, the higher tiers are needed for the model to focus mainly on the prose and format of the higher-quality data. This also makes retraining quicker if it only involves changes in that data.
In general, training higher quality data last increases its weight in the outputs.
Category | Description |
---|---|
Low | Short or very short-form RP conversations (often composed of one-liners); prose quality not always good. |
Mid | Mid-range and longer-form RP conversations that do not always meet the required quality standards or target prose format + Some lore data and character descriptions. |
High | Longer-form RP conversations of target prose quality. |
Top | Synthetic data from Limamono + Some alignment and RP-like instruction data. |
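The staged schedule can be written down as a simple ordered list. In the sketch below the learning rates are taken from the hyperparameter listing in this card, but pairing them one-to-one with the four categories in this order is an assumption: the card only states that training proceeds in consecutive steps with decreasing learning rate and increasing data quality. The helper names are hypothetical.

```python
# (category, learning rate) per stage. The pairing is an assumption.
STAGES = [
    ("Low", 0.0000725),
    ("Mid", 0.0000550),
    ("High", 0.0000375),
    ("Top", 0.0000350),
]


def is_non_increasing(stages):
    """Check that the learning rate never rises from one stage to the next."""
    rates = [lr for _, lr in stages]
    return all(a >= b for a, b in zip(rates, rates[1:]))


def describe(stages):
    """Return one human-readable line per training stage."""
    return [
        f"stage {i}: train on {name} data at lr={lr:.7f}"
        for i, (name, lr) in enumerate(stages, start=1)
    ]


plan = describe(STAGES)
```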
Stats
From my data building script:
Total conversations: 461
User message count: 29,788 messages
Total unique tokens: 4,473,615 tokens
Longest conversation: 16,372 tokens
- Size of the training data: 17.2 MB (about 40% larger than the first LimaRP release)
- The user message count doesn't include descriptions and other metadata.
- The actual number of conversations is higher than what the above figure suggests, since many are split into several sub-conversations.
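A couple of rough averages can be derived from the figures above. This is only back-of-the-envelope arithmetic: the card does not say whether the token total also covers descriptions and other metadata, so the tokens-per-message value should be read as an approximate upper bound rather than a measured mean.

```python
total_conversations = 461
user_messages = 29_788
total_tokens = 4_473_615

# Average number of messages in each (possibly split) conversation.
messages_per_conversation = user_messages / total_conversations

# Rough upper bound on the mean message length in tokens.
tokens_per_message = total_tokens / user_messages

print(f"{messages_per_conversation:.1f} messages per conversation")
print(f"{tokens_per_message:.1f} tokens per message (upper bound)")
```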
Message length distribution
Most user messages are below 300 tokens in length.
Training details
Hardware
1x NVidia RTX 3090 24GB
Software
Training hyperparameters
base_model: /home/anon/AI-Models/LLM/Mistral-7B-Instruct-v0.2
load_in_4bit: true
adapter: qlora
sequence_len: 16384
sample_packing: true
pad_to_sequence_len: false
gradient_accumulation_steps: 2
micro_batch_size: 1
eval_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: constant
learning_rate: 0.0000725 -> 0.0000550 -> 0.0000375 -> 0.0000350
weight_decay: 0.05
train_on_inputs: true
bf16: true
fp16: false
tf32: true
lora_r: 20
lora_alpha: 16
lora_dropout: 0.1
lora_target_linear: true
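Two derived quantities follow from the listing above. The sketch assumes a single GPU (as stated under Hardware) and the standard LoRA scaling factor `lora_alpha / lora_r`; if a different scaling scheme was used, the last value would not apply.

```python
micro_batch_size = 1
gradient_accumulation_steps = 2
sequence_len = 16384
lora_r = 20
lora_alpha = 16

# Packed sequences consumed per optimizer step on one GPU.
effective_batch = micro_batch_size * gradient_accumulation_steps

# With sample packing, each sequence holds at most `sequence_len` tokens.
max_tokens_per_step = effective_batch * sequence_len

# Standard LoRA scaling applied to the adapter output (assumed).
lora_scale = lora_alpha / lora_r
```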