nintwentydo committed
Commit 012c25f · verified · 1 Parent(s): a797a2b

Update README.md

Files changed (1)
  1. README.md +55 -13
README.md CHANGED
@@ -22,22 +22,64 @@ pipeline_tag: image-text-to-text
  # Pixtral-Large-Instruct-2411 🧡
 
  Transformers implementation of [Pixtral-Large-Instruct-2411](https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411).
-
-
- ## Tokenizer And Prompt Template
- Using conversion of v7m1 tokenizer with 32k vocab size.
-
- Chat template in chat_template.json uses the v7 instruct template:
-
  ```
- <s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]
  ```
 
- ## Notes
- *- tool use hasn't been implemented in the template yet. I'll add this in later.*
- *- I've added extra stop tokens between consecutive user messages. This helps in contexts with multiple speakers etc., but your mileage may vary.*
- *- If you have a better implementation of the tokenizer let me know and I'm happy to swap it out.*
- *- As always pls respect the model license.*
 
 
  ## Quantizations
 
  # Pixtral-Large-Instruct-2411 🧡
 
  Transformers implementation of [Pixtral-Large-Instruct-2411](https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411).
+
+ ***21 Dec 2024:** This model has been a LOT of fun to experiment and learn with. Model card updated below with changes made to this repo
+ over the last week.*
+
+ ## Architecture Differences to Pixtral 12B
+ Pixtral 12B has bias keys for the multi_modal_projector layers, whereas Pixtral Large does not. Rather than including them with low/zero values,
+ this conversion omits those bias keys, aligning with the keys present in the original Pixtral Large upload from Mistral. The
+ model's config.json file includes `"multimodal_projector_bias": false` to flag this. *n.b. If anyone in the community confirms that initializing
+ these keys with zero values is the better way to go, I'm happy to reupload with them included.*
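A minimal sketch of what the missing bias keys mean when inspecting a checkpoint. It assumes Transformers-style Llava naming for the projector (`multi_modal_projector.linear_1` and `multi_modal_projector.linear_2`); those key names and all helper names here are assumptions for illustration, not taken from this repo. It reads the `multimodal_projector_bias` flag from a config dict, treating a missing flag as `True` (an assumption matching the Pixtral 12B layout described above), and reports projector keys that don't match the flag.

```python
from typing import Iterable, Set, Tuple

# Assumed Transformers-style Llava naming for the two projector layers (hypothetical here).
PROJECTOR_LAYERS = ("multi_modal_projector.linear_1", "multi_modal_projector.linear_2")


def projector_uses_bias(config: dict) -> bool:
    """Read the flag from a parsed config.json; a missing flag is treated as True (assumption)."""
    return bool(config.get("multimodal_projector_bias", True))


def expected_projector_keys(use_bias: bool) -> Set[str]:
    """Projector keys a state dict should contain for the given bias setting."""
    suffixes = ("weight", "bias") if use_bias else ("weight",)
    return {f"{layer}.{suffix}" for layer in PROJECTOR_LAYERS for suffix in suffixes}


def projector_key_mismatches(state_keys: Iterable[str], config: dict) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) projector keys relative to the config flag."""
    present = {k for k in state_keys if k.startswith("multi_modal_projector.")}
    expected = expected_projector_keys(projector_uses_bias(config))
    return expected - present, present - expected
```

With `{"multimodal_projector_bias": false}` and only the two `.weight` keys present, both returned sets are empty; dropping the flag from the config would instead report the two `.bias` keys as missing, which is the case a loader has to handle rather than filling them with random values.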
+
+ ## Tokenizer
+ This model uses a conversion of the Mistral v7m1 tokenizer. Pixtral 12B and Large use different tokenizers with different vocab sizes,
+ so make sure you use the right tokenizer.
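Because the two models use different tokenizers, one cheap sanity check is to compare the tokenizer's size (for example `len(tokenizer)` on a loaded Transformers tokenizer) with the language model's `vocab_size` before generating anything. A sketch, assuming the value is nested under `text_config` as in Llava-style configs; the helper names are hypothetical. Exact equality is a deliberately strict heuristic: some checkpoints pad their embedding table, so a mismatch is a prompt to double-check rather than proof of the wrong tokenizer.

```python
from typing import Optional


def text_vocab_size(config: dict) -> int:
    """Language-model vocab size; falls back to the top level if there is no text_config."""
    text_config = config.get("text_config", config)
    return int(text_config["vocab_size"])


def vocab_mismatch(tokenizer_size: int, config: dict) -> Optional[str]:
    """Return a warning message if the sizes differ, or None if they match exactly."""
    model_size = text_vocab_size(config)
    if tokenizer_size == model_size:
        return None
    if tokenizer_size > model_size:
        # Ids past the end of the embedding table would fail or read garbage.
        return (f"tokenizer has {tokenizer_size} tokens but the embedding table has only "
                f"{model_size} rows: token ids would be out of range")
    return (f"tokenizer has {tokenizer_size} tokens vs. vocab_size {model_size}: "
            "check this is the right tokenizer (or that the embeddings are just padded)")
```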
+
+ ## Prompting / Chat Template
+ The included chat_template.json supports all of Mistral's defined features, plus some of my own additions.
+
+ I believe this implementation gives a lot of flexibility in how the model can be used, and it has worked well in my testing.
+
+ Example *(line breaks added for readability)*
  ```
+ <s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT]
+ [INST] [IMG]<user message>
+ [AVAILABLE_TOOLS] [<tool definitions>][/AVAILABLE_TOOLS][/INST]
+ [IMG]<assistant response>
+ [TOOL_CALLS] [<tool calls>][/TOOL_CALLS]
+ [TOOL_RESULTS] <tool results including images>[/TOOL_RESULTS]
+ </s>[INST] <user message>[/INST]
  ```
 
+ **System Prompts**:
+ Messages with role "system" will be parsed as `[SYSTEM_PROMPT] <content>[/SYSTEM_PROMPT]` anywhere they appear in chat history.
+
+ This appears to work pretty well for passing extra instructions at various depths, and keeps instructions separate from the conversation.
+
+ **Allowing Non-Alternating Roles**:
+ Multiple user messages in a row can be provided, and each will be separated with `[INST][/INST]`. This could work well in group conversation
+ settings, or in environments where several user messages can arrive before the model is invoked. Having a `[/INST]` break up each one
+ appeared to help stop the model from thinking it needs to respond to every previous message, so it focuses on the last message while still
+ retaining knowledge of the messages that come before it.
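To make the two rules above concrete, here is a plain-Python approximation of how a text-only conversation could be flattened: `system` messages are wrapped in `[SYSTEM_PROMPT] ...[/SYSTEM_PROMPT]` wherever they appear, and every user message gets its own `[INST] ...[/INST]`, so consecutive user messages stay separated. This is a sketch modeled on the example layout above, not the actual Jinja logic in chat_template.json: images and tool tokens are left out, whitespace is approximated, and closing each assistant turn with `</s>` is an assumption taken from that example. `render_prompt` is a hypothetical name.

```python
from typing import Dict, List

Message = Dict[str, str]


def render_prompt(messages: List[Message], bos: str = "<s>", eos: str = "</s>") -> str:
    """Flatten text-only chat messages into a v7-style prompt string (approximation)."""
    parts = [bos]
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            # System messages are wrapped wherever they appear, not only at the start.
            parts.append(f"[SYSTEM_PROMPT] {content}[/SYSTEM_PROMPT]")
        elif role == "user":
            # Each user message gets its own [INST]...[/INST], even when users are consecutive.
            parts.append(f"[INST] {content}[/INST]")
        elif role == "assistant":
            # Assistant turns are closed with the end-of-sequence token (as in the example layout).
            parts.append(f" {content}{eos}")
        else:
            raise ValueError(f"unsupported role in this sketch: {role!r}")
    return "".join(parts)
```

For a history of two user messages in a row, an assistant reply, and then a second system message, the output keeps the two user turns as separate `[INST]` blocks and places the later `[SYSTEM_PROMPT]` exactly where it occurred, rather than hoisting it to the top.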
+
+ **Image Inputs Everywhere**:
+ Images can now be sent in user, assistant, and tool result messages, and this seems to actually work. In one test I included an image in an
+ assistant reply 10-15 messages back in the conversation, asked the assistant to recall the image it had previously sent, and it was able to
+ describe it accurately.
+
+ This flexibility could enable some interesting applications. For example, if you were to define a tool for image generation:
+ - the tool is invoked and calls an image generation API/model
+ - the image is returned inside the tool result message
+ - the model responds with a message that has the context of the generated image
+ - you can have further conversation about the generated image, or make revisions with the model actually knowing what was created
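The image-generation flow above could be represented as a message list like the one below. This is a hypothetical sketch: the `generate_image` tool, its arguments, the call id, and the message schema (OpenAI-style `tool_calls` on the assistant message, a `tool` role message whose content is a list holding an image part) are assumptions about what the template accepts, so check them against chat_template.json before relying on them.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def image_generation_turns(user_request: str, image_url: str, reply: str,
                           call_id: str = "abc123def") -> List[Message]:
    """Build the message sequence for one round of a hypothetical image-generation tool."""
    return [
        # 1. The user asks for an image.
        {"role": "user", "content": user_request},
        # 2. The model calls the (hypothetical) generate_image tool.
        {"role": "assistant", "tool_calls": [{
            "id": call_id,
            "type": "function",
            "function": {"name": "generate_image", "arguments": {"prompt": user_request}},
        }]},
        # 3. The tool result carries the generated image back into the conversation.
        {"role": "tool", "tool_call_id": call_id,
         "content": [{"type": "image", "url": image_url}]},
        # 4. The model replies with the image in context, so later turns can refer to it.
        {"role": "assistant", "content": reply},
    ]
```

Appending a follow-up user message (for example, a revision request) to this list keeps the generated image in the history, which is what lets the model discuss or revise what it actually created.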
+
+ ## Usage
+ When loading in Transformers, you'll probably want to add some handling to ensure the missing mmproj bias is respected; otherwise vision
+ input may not be handled properly.
+
+ Most of my testing has been with TabbyAPI and ExLlamaV2 (dev branch), with working vision input.
+ <img src="https://huggingface.co/nintwentydo/Pixtral-Large-Instruct-2411/resolve/main/image-input-example.jpg">
 
 
  ## Quantizations