Update README.md
README.md CHANGED
tags:
- lazymergekit
- hydra-project/ChatHercules-2.5-Mistral-7B
- Nitral-Archive/Prima-Pastacles-7b
language:
- en
base_model:
- hydra-project/ChatHercules-2.5-Mistral-7B
- Nitral-Archive/Prima-Pastacles-7b
library_name: transformers
---

# Mistral-2.5-Prima-Hercules-Fusion-7B

**Mistral-2.5-Prima-Hercules-Fusion-7B** is a 7B-parameter language model created by merging **hydra-project/ChatHercules-2.5-Mistral-7B** with **Nitral-Archive/Prima-Pastacles-7b** using **spherical linear interpolation (SLERP)**. The merge pairs the conversational depth of Hercules with the contextual adaptability of Prima, and is aimed at dynamic assistant applications and multi-turn conversations.

## Merged Models

This merge incorporates the following models:

- [**hydra-project/ChatHercules-2.5-Mistral-7B**](https://huggingface.co/hydra-project/ChatHercules-2.5-Mistral-7B): the primary model, contributing strong conversational ability and robust language comprehension.
- [**Nitral-Archive/Prima-Pastacles-7b**](https://huggingface.co/Nitral-Archive/Prima-Pastacles-7b): contributes contextual adaptability and task-switching, with intuitive context management for diverse applications.

## Merge Configuration

The configuration below defines how the models are merged with **spherical linear interpolation (SLERP)**. SLERP interpolates each pair of corresponding weight tensors along the shortest arc on the unit hypersphere, and the per-filter `t` schedules control how much each source model contributes at each group of layers; a short sketch of the interpolation itself follows the key parameters below.

```yaml
# Mistral-2.5-Prima-Hercules-Fusion-7B Merge Configuration
slices:
  - sources:
      - model: hydra-project/ChatHercules-2.5-Mistral-7B
# … (the remaining sources and the start of the parameters block are not shown in this diff)
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

### Key Parameters

- **Self-Attention Filtering** (`self_attn`): The interpolation schedule applied to the self-attention layers, controlling how the attention weights of the two source models are blended across the network depth.
- **MLP Filtering** (`mlp`): The interpolation schedule applied to the MLP (feed-forward) layers.
- **Global Weight** (`t.value`): The default interpolation factor (0.5) used for any tensor not matched by an explicit filter, giving an even blend of the two models.
- **Data Type** (`dtype`): `bfloat16`, which keeps memory and compute costs down while preserving adequate numerical precision.
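
For intuition, here is a minimal sketch of a single SLERP step over one pair of weight tensors. It is an illustration only, not mergekit's exact implementation; the `slerp` helper and its epsilon handling are assumptions made for this example.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two tensors, treated as vectors on the unit hypersphere.
    cos_omega = torch.dot(a_flat / (a_flat.norm() + eps), b_flat / (b_flat.norm() + eps))
    omega = torch.acos(torch.clamp(cos_omega, -1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return (1.0 - t) * a + t * b
    sin_omega = torch.sin(omega)
    mixed = (torch.sin((1.0 - t) * omega) / sin_omega) * a_flat \
          + (torch.sin(t * omega) / sin_omega) * b_flat
    return mixed.reshape(a.shape).to(a.dtype)

# t = 0.0 keeps the first model's tensor, t = 1.0 keeps the second's, and t = 0.5 blends them evenly;
# the per-filter schedules in the YAML above pick a different t for different layer groups.
```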

## Performance Highlights

- **Multi-Turn Conversation Handling**: Improved context retention gives more coherent, contextually aware multi-turn interactions.
- **Dynamic Assistant Applications**: Well suited to role-play and scenario-based interactions, producing nuanced and adaptable responses.
- **Balanced Integration**: Combines the conversational depth of Hercules with the contextual adaptability of Prima across a range of tasks.

## Use Cases & Applications

**Mistral-2.5-Prima-Hercules-Fusion-7B** is intended for settings that require both conversational ability and specialized task execution. Typical applications include:

- **Advanced Conversational Agents**: Chatbots and virtual assistants that need nuanced understanding and responsive behaviour.
- **Educational Tools**: Tutoring systems, explanations, and interactive learning experiences.
- **Content Generation**: Contextually relevant drafts for blogs, articles, and marketing material.
- **Technical Support**: Assistance in specialized domains such as IT, healthcare, and finance.
- **Role-Playing Scenarios**: Interactive storytelling and simulation-based training with contextually aware responses.

## Usage

To use **Mistral-2.5-Prima-Hercules-Fusion-7B**, follow the steps below.

### Installation

First, install the necessary libraries:

```bash
pip install -qU transformers accelerate
```

### Inference

Below is an example of how to load the model and use it for text generation:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Mistral-2.5-Prima-Hercules-Fusion-7B"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model in bfloat16 and spread it across the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline with the already-loaded model and tokenizer
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```
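
Because the merge targets multi-turn conversation, you may prefer to format a dialogue with the tokenizer's chat template before generating. The snippet below reuses `tokenizer` and `text_generator` from the example above and assumes the merged model ships a Mistral-style chat template; if none is defined, format the prompt manually instead.

```python
# Build a short multi-turn conversation for the model to continue.
messages = [
    {"role": "user", "content": "What are the main causes of inflation?"},
    {"role": "assistant", "content": "Broadly: demand-pull pressure, cost-push pressure, and rapid growth in the money supply."},
    {"role": "user", "content": "How do central banks usually respond?"},
]

# Render the conversation with the chat template and append the marker
# that tells the model to write the next assistant turn.
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

outputs = text_generator(chat_prompt, max_new_tokens=200, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```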

### Notes

- **Fine-Tuning**: This merged model requires fine-tuning for optimal performance in specific applications.
- **Resource Requirements**: Ensure that your environment has sufficient computational resources; in `bfloat16` a 7B model needs roughly 15 GB of GPU memory for inference, and GPU hardware is recommended for reasonable speed. A lower-memory loading option is sketched below.
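
If GPU memory is limited, a common option is to load the model with 4-bit quantization through `bitsandbytes`. This is a minimal sketch, assuming `bitsandbytes` is installed (`pip install bitsandbytes`) and that a small quality trade-off from quantization is acceptable.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "ZeroXClem/Mistral-2.5-Prima-Hercules-Fusion-7B"

# 4-bit NF4 quantization: weights are stored in 4 bits and de-quantized to bfloat16 for compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```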

## License

This model is open-sourced under the **Apache-2.0 License**.

## Tags

- `merge`
- `mergekit`
- `slerp`
- `Mistral`
- `hydra-project/ChatHercules-2.5-Mistral-7B`
- `Nitral-Archive/Prima-Pastacles-7b`

---