---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- not-for-all-audiences
---
# ShoriRP 🏆
A LIMA-like (fewer than 1,000 training samples) roleplaying chat model based on data from:
- Two subject-specific RP forums;
- Synthetically-crafted conversations from [Limamono](https://huggingface.co/lemonilia/Limamono-Mistral-7B-v0.50);
- Some background lore and character descriptions (thus far mainly pertaining to Limamono);
- A tiny amount of RP-like instructions/alignment data.
An important difference from LimaRP, other than the subject focus, is that conversations are multi-character
where applicable, whereas LimaRP only included 1-on-1 RP. Furthermore, the sampled messages are generally
shorter. The rationale is that short(er)-form roleplays are more "fun" on average, while
the longer ones tend to rely on common purple-prose tropes and be a bit dull.
**This is still a work in progress. Updates will be posted in the future.**
---
# Technical details
- The prose of the training data has a consistent novel-like format with narration in third person and past tense.
- OOC was intentionally _not_ eliminated completely; instead, it was isolated into a special role. Likewise, URLs were kept unless they referred to internal forum resources.
- In a very small portion of the data, suitable emoji (mostly 1, up to 3) conveying the mood have been _prepended_ to dialogue lines and thoughts. _Prepending_ instead of _appending_ helps both the model and the reader prepare for the tone of the message.
- Usernames have been entirely removed; only character names remain in the data (same policy as with LimaRP).
# Known issues
- The model is very horny, but this can be toned down with an appropriate system instruction.
- There are some repetition issues. This could be due to the base model used.
- Occasionally at the beginning of the chat (first message) there might be impersonation issues.
- There might be some residual "alignment" from the base model.
# Suggested starting text generation settings
- **Main choice** (may have repetition issues)
  - **Temperature**: 1.0; **Min-P**: 0.05-0.10; **Presence Penalty**: 0.35-0.45
- **Alternative** (appears to solve repetition issues while staying coherent, but responses might be less truthful)
  - **Temperature**: 2.40-2.50; **Min-P**: 0.40; **Frequency Penalty**: 0.10-0.15; Temperature last.
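
The sketch below illustrates why the order of Min-P and temperature matters for the settings above. It assumes that "Temperature last" means the common front-end option of applying temperature _after_ truncation samplers such as Min-P, and that Min-P keeps every token whose probability is at least `min_p` times the top probability. The function names are made up for illustration; real back-ends implement this on tensors.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_candidates(logits, temperature, min_p, temperature_last):
    """Return (kept token indices, their sampling probabilities)."""
    # Temperature first: rescale before filtering.
    # Temperature last: filter the raw distribution instead.
    pre = logits if temperature_last else [x / temperature for x in logits]
    probs = softmax(pre)
    # Min-P keeps tokens with probability >= min_p * (top probability).
    threshold = min_p * max(probs)
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    # Temperature last: only the surviving tokens are flattened.
    post = [pre[i] / temperature if temperature_last else pre[i] for i in kept]
    return kept, softmax(post)


def sample(logits, temperature=1.0, min_p=0.05, temperature_last=False, rng=random):
    """Draw one token index from the filtered distribution."""
    kept, probs = min_p_candidates(logits, temperature, min_p, temperature_last)
    return rng.choices(kept, weights=probs, k=1)[0]
```

With logits `[5.0, 4.0, 0.0]`, temperature 2.5 and Min-P 0.40, the temperature-last variant keeps only the top token, while the temperature-first variant also keeps the second one. This is why a very high temperature can remain coherent when it is applied last.
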
# Prose format
All training samples use book (novel) format with narration in third person / past tense. Other formats are not supported (they might work, but not consistently).
## Details
- Character thoughts are delimited with underscores `_`.
- Onomatopoeias are delimited with single asterisks `*`.
- Emphasized text is delimited by double asterisks `**`.
- Spoken dialogues are delimited with ASCII quote marks `"`.
- Non-dialogue quotes are replaced with double apostrophes `''`. This avoids distracting and/or annoying conflicts with the dialogue highlighting in SillyTavern.
- Text to be interpreted as louder than normal is in `ALL CAPS`.
- Quoted text from other people is usually prefixed with `>`.
- Formatted output text is delimited with triple backticks ` ``` `, sometimes followed by additional identifiers specifying the language (markdown, text, etc.).
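
For front-ends or scripts that want to post-process model output, the delimiters listed above can be picked apart with regular expressions. This is only a rough sketch: the patterns and the `split_markup` helper are illustrative assumptions, they work on a single line at a time, and they do not handle nested or unbalanced markers.

```python
import re

# Thoughts: _like this_, not counting underscores inside words.
THOUGHT = re.compile(r"(?<![A-Za-z0-9])_([^_\n]+)_(?![A-Za-z0-9])")
# Spoken dialogue: "like this" (ASCII quote marks only).
DIALOGUE = re.compile(r'"([^"\n]+)"')
# Emphasis: **like this**.
EMPHASIS = re.compile(r"\*\*([^*\n]+)\*\*")
# Onomatopoeia: *like this*, excluding double asterisks.
ONOMATOPOEIA = re.compile(r"(?<!\*)\*([^*\n]+)\*(?!\*)")


def split_markup(text):
    """Collect the delimited spans of each kind found in the text."""
    return {
        "thoughts": THOUGHT.findall(text),
        "dialogue": DIALOGUE.findall(text),
        "emphasis": EMPHASIS.findall(text),
        "onomatopoeia": ONOMATOPOEIA.findall(text),
    }
```
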
# Prompting format
Suitable `json` files have been provided to easily apply the prompting format in SillyTavern.
- [Context](https://huggingface.co/lemonilia/ShoriRP-v0.68/resolve/main/BlockML-Context.json?download=true)
- [Instruct](https://huggingface.co/lemonilia/ShoriRP-v0.68/resolve/main/BlockML-Instruct.json?download=true)
Note: the prompting format is **intentionally different** from that of the Mistral-Instruct base model.
It is advised to use `▄` as a stop token.
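
If the back-end in use does not support custom stop strings, the same effect can be approximated by cutting the generated text at the first `▄`. A minimal sketch, where `trim_at_stop` is a hypothetical helper and not part of any front-end:

```python
STOP_TOKEN = "▄"  # lower half block, closes every block in this format


def trim_at_stop(generated, stop=STOP_TOKEN):
    """Keep only the text generated before the first stop token."""
    head, _, _ = generated.partition(stop)
    return head.rstrip()
```
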
## Reverse jailbreak
Since the model is normally very willing to initiate NSFW scenarios even when inappropriate, a "reverse jailbreak"
has been added in the Instruct preset linked above:
```
[INST] Write a safe conversation suitable for all audiences. Don't be vulgar or sexually explicit. [/INST]
```
Placed as a system instruction, this has only the effect of _toning down_ the model's default horniness and won't actually prevent
NSFW content. If desired, it can be removed.
## Block characters
The model uses a _ChatML-like_ prompting format with a few changes from the usual roles typically used for ChatGPT-like assistant chatbots. The main one is that `<|im_start|>` has been replaced with `▀` (upper half block character) and `<|im_end|>` has been replaced with `▄` (lower half block character).
Both of these tokens already exist in the Mistral tokenizer as single tokens; they don't have any combination with other tokens, nor any special meaning attached to them, so for all intents and purposes they work like special tokens.
This avoids complications related to training a model with new tokens, as well as the tokenization issues that occur with ChatML tokens when used literally.
## Roles
All roles except `message` are optional.
Role | Description
-----------|------------
title | The title of the roleplay. It's used for steering the conversation at the beginning. Generally it's the first block in the RP conversations, but it can occur mid-conversation when the scene changes.
tags | A list of comma-separated relevant tags to hint the model about chat contents. If added, it should be placed after the title.
lore | Extended background or character lore/story is to be placed under the `lore` role.
scenario | Future events that must still happen go in `scenario`. This is also used for steering the contents of the conversation at the beginning.
description| This is where character cards go. No specific layout for character profiles is defined, but the name of the character should be clear from the description. In the training data, profiles may occasionally appear mid-conversation (for example when a new character appears). Try to use one `description` block per character.
message | [**Mandatory**] All messages go under the `message` role regardless of who writes them. The rationale is that since conversations are multi-character and the characters do not necessarily reply in a fixed order, it is not possible to reliably establish who the "human" is for training purposes. `message` was found to be neutral enough as a role and a better fit, considering the length hints that can be added.
ooc | A dedicated communication channel where OOC talk has been confined, although it's unclear how this could actually be used in existing LLM front-ends.
### Message length hints
Like LimaRP, messages use optional **length hints**. It's recommended to add them, otherwise the model may output very short messages. _It is however still possible to use the model without them for a more dynamic and fast roleplaying experience._
The available lengths are: `nano`, `micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`, `enormous`. The recommended length is _medium_. The longest sizes do not have a large amount of training data, so they might not work very reliably. Refer to the prompting examples below for how to add length hints.
## Example prompt template
```text
▀title
{story title}▄
▀tags
{comma-separated list of tags}▄
▀lore
{{loreBefore}}▄
▀description
{{char}}
{{description}}▄
▀description
{{user}}
{{persona}}▄
▀scenario
{{scenario}}▄
▀message (length: {length})
{{char}}: {message}▄
▀message (length: {length})
{{user}}: {message}▄
▀message (length: {length})
{{char}}: {message}▄
[...]
```
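
For use outside SillyTavern, the template above can be assembled programmatically. The following is a minimal sketch and not the code used to build the training data: the helper names are made up, and ending the prompt with an open `message` block containing only the speaker name is an assumption about how to prime the next reply.

```python
BLOCK_START = "▀"  # replaces <|im_start|>
BLOCK_END = "▄"    # replaces <|im_end|>
LENGTHS = ("nano", "micro", "tiny", "short", "medium",
           "long", "massive", "huge", "enormous")


def header(role, length=None):
    """Build the block header, adding a length hint for messages."""
    if length is None:
        return role
    if role != "message":
        raise ValueError("length hints only apply to the message role")
    if length not in LENGTHS:
        raise ValueError(f"unknown length hint: {length}")
    return f"{role} (length: {length})"


def block(role, content, length=None):
    """Format one closed block: start marker, header, content, end marker."""
    return f"{BLOCK_START}{header(role, length)}\n{content}{BLOCK_END}"


def build_prompt(blocks, next_speaker=None, next_length="medium"):
    """Join (role, content[, length]) tuples into a single prompt string.

    If next_speaker is given, an open message block is appended so that
    the model continues as that character (assumed priming convention).
    """
    parts = [block(*item) for item in blocks]
    if next_speaker is not None:
        parts.append(f"{BLOCK_START}{header('message', next_length)}\n{next_speaker}:")
    return "\n".join(parts)
```

For example, `build_prompt([("title", "A strange incident in Gensokyo"), ("message", "Reimu: ...", "short")], next_speaker="Marisa")` produces a title block, one closed message and an open block for Marisa's reply.
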
## Practical example
```
▀title
A strange incident in Gensokyo▄
▀tags
barrier, danmaku, magic, reimu, marisa▄
▀description
**Name:** Reimu Hakurei
**Age:** 18
**Personality:** Calm and collected. She is a very responsible person and tries to do her job as well as she can. She also likes to take care of people around her, even if they are not always nice to her.
**Appearance:** Reimu is a young girl with long, black hair and brown eyes. She wears a red ribbon and matching tubes on her sidelocks and a traditional shrine maiden uniform, with a red hakama over a white kimono.
**Background:** Reimu is the shrine maiden of Hakurei Shrine, located in the center of Gensokyo. She spends most of her time taking care of the shrine and performing various duties for the residents of the village. She is known to be quite skilled in the use of magic, especially when it comes to barrier magic.▄
▀description
**Name:** Marisa Kirisame
**Personality:** Impulsive and energetic. She is often seen as a troublemaker by others due to her tendency to break rules and cause chaos wherever she goes. She is also a bit of a flirt and enjoys teasing others.▄
▀message (length: medium)
Reimu: "Hmm... I wonder what's going on?" Reimu mused as she stood at the entrance to the shrine, looking out at the village beyond. It was unusually quiet today, with no one coming to visit or offer any kind of offering. She had been expecting a few visitors this morning, but none had shown up yet.
"Maybe everyone is busy with something else today? Or maybe they're all sick?" she thought as she turned back inside, closing the door behind her. She began tidying up the shrine, making sure everything was clean and ready for visitors. As she worked, she couldn't shake the feeling that something wasn't right.▄
▀message (length: short)
Marisa: "Ooohh! Reimu-chan~!" Marisa suddenly appeared from nowhere, landing on the ground with a soft thud. "What's wrong? Why aren't there any customers today? Aren't you supposed to have lots of visitors every day? I thought you were famous for being able to heal injuries and cure diseases..."
She gave her friend a wink before continuing, "But I guess I could always come by and give you some company! I'm bored anyway~"▄
▀message (length: long)
Reimu: _Ugh, that girl again..._ Reimu thought as she looked at Marisa with annoyance. The younger girl was known for causing mischief wherever she went, and Reimu didn't appreciate her interrupting her work.
"I don't know, Marisa," she replied curtly. "No one seems to be coming today. Maybe they're all busy with their own things. But thank you for offering your help."
Reimu continued cleaning the shrine while keeping an eye on Marisa. She knew that if she left the girl alone for too long, she would probably start causing trouble. She just hoped that nothing bad happened today.▄
```
## Mixing Mistral-Instruct and ShoriRP prompt formats together
The instruction prompting format of the base model, Mistral-Instruct, can also be used together with the
ShoriRP format, with very good results in chat steerability.
An `[INST] ... [/INST]` block can either be used as a "system instruction" at the top of the conversation, or
inserted between one message block and the next as if it were an "author note", as seen in this example (chat history
and contents omitted for brevity):
```
▀message
Chen: [...]▄
[INST] Yukari's personality: proud, haughty [/INST]
▀message
Yukari: [...]▄
```
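
The "author note" placement can be automated by inserting the `[INST]` block a fixed number of blocks from the end of the chat history. A minimal sketch, assuming the blocks are already formatted strings; `with_author_note` and its `depth` parameter are hypothetical names:

```python
def with_author_note(blocks, note, depth=1):
    """Insert an [INST] note before the last `depth` blocks and join them."""
    blocks = list(blocks)
    if not 0 <= depth <= len(blocks):
        raise ValueError("depth must be between 0 and the number of blocks")
    cut = len(blocks) - depth
    inst = f"[INST] {note} [/INST]"
    return "\n".join(blocks[:cut] + [inst] + blocks[cut:])
```

With `depth=1` the note lands right before the last block, matching the example above; with `depth` equal to the number of blocks it ends up at the top, like a system instruction.
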
# Dataset
Similar to LimaRP, but more niche. Flexible training sample length (from 4k to 32k tokens, at least). Might or might not be released in the future.
The model is trained in several consecutive steps with decreasing learning rate and increasing data
quality/focus. While it is unknown whether having separate low- and
mid-tier categories helps, the higher tiers are needed for the model to focus mainly on the prose and
format of the higher-quality data. This also makes retraining quicker if it only involves changes in that data.
In general, training higher quality data last increases its weight in the outputs.
| Category | Description
|:--:|---
|Low | Short or very short-form RP conversations (often composed of one-liners); prose quality not always good.
|Mid | Mid-range and longer-form RP conversations that do not always meet the required quality standards or target prose format + Some lore data and character descriptions.
|High| Longer-form RP conversations of target prose quality.
|Top | Synthetic data from Limamono + Some alignment and RP-like instruction data.
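
The staged schedule can be written down as a simple list of (tier, learning rate) pairs. Note that pairing the four tiers one-to-one with the four learning rates listed under the training hyperparameters is an assumption on my part; the card only states that the learning rate decreases across steps. The helper names are illustrative.

```python
# Learning rates copied from the hyperparameter listing.
# ASSUMPTION: each rate belongs to the tier in the same position.
STAGES = [
    ("Low", 0.0000725),
    ("Mid", 0.0000550),
    ("High", 0.0000375),
    ("Top", 0.0000350),
]


def is_strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def stage_plan(stages):
    """Describe each training stage, checking that the rate keeps dropping."""
    rates = [lr for _, lr in stages]
    if not is_strictly_decreasing(rates):
        raise ValueError("learning rates should decrease from stage to stage")
    return [
        f"stage {i}: {tier} tier at lr={lr:.7f}"
        for i, (tier, lr) in enumerate(stages, start=1)
    ]
```
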
## Stats
From my data building script:
```text
Total conversations: 461
User message count: 29,788 messages
Total unique tokens: 4,473,615 tokens
Longest conversation: 16,372 tokens
```
- Size of the training data: 17.2 MB (about 40% larger than the first LimaRP release)
- The user message count doesn't include descriptions and other metadata.
- The actual number of conversations is higher than what the above figure suggests, since many are split into several sub-conversations.
### Message length distribution
Most user messages are below 300 tokens in length.
![Message length distribution](https://files.catbox.moe/yxdgop.png)
# Training details
## Hardware
1x NVIDIA RTX 3090 24GB
## Software
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
## Training hyperparameters
```yaml
base_model: /home/anon/AI-Models/LLM/Mistral-7B-Instruct-v0.2
load_in_4bit: true
adapter: qlora
sequence_len: 16384
sample_packing: true
pad_to_sequence_len: false
gradient_accumulation_steps: 2
micro_batch_size: 1
eval_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: constant
learning_rate: 0.0000725 -> 0.0000550 -> 0.0000375 -> 0.0000350
weight_decay: 0.05
train_on_inputs: true
bf16: true
fp16: false
tf32: true
lora_r: 20
lora_alpha: 16
lora_dropout: 0.1
lora_target_linear: true
```
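
A few derived quantities follow from the configuration above. This is a back-of-the-envelope sketch that assumes a single GPU (as listed under Hardware), the conventional LoRA scaling factor `lora_alpha / lora_r`, and that sample packing fills each sequence up to `sequence_len` at most. The `derived` helper is hypothetical.

```python
CONFIG = {
    "sequence_len": 16384,
    "gradient_accumulation_steps": 2,
    "micro_batch_size": 1,
    "lora_r": 20,
    "lora_alpha": 16,
}


def derived(config, num_gpus=1):
    """Compute effective batch size, token budget per step and LoRA scale."""
    sequences_per_step = (
        config["micro_batch_size"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )
    return {
        # Packed sequences consumed per optimizer step.
        "sequences_per_step": sequences_per_step,
        # Upper bound on tokens per optimizer step with full packing.
        "max_tokens_per_step": sequences_per_step * config["sequence_len"],
        # Conventional LoRA scaling applied to the adapter update.
        "lora_scale": config["lora_alpha"] / config["lora_r"],
    }
```
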