---
license: apache-2.0
datasets:
- avemio/GRAG-CPT-HESSIAN-AI
language:
- en
- de
base_model:
- ThomasComics/Phi-3-mini-128k-instruct-LLaMAfied
pipeline_tag: question-answering
tags:
- German
- RAG
- Retrieval
- Question-Answering
- Summarization
- Reasoning
---
<img src="https://www.grag.ai/wp-content/uploads/2024/12/GRAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="GRAG Logo" width="400" style="display:block; margin-left:auto; margin-right:auto;"/>
# GRAG-PHI-3-mini-4B-CPT-HESSIAN-AI
<!-- Provide a quick summary of what the model is/does. -->
**GRAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025.
Our GRAG-PHI-CPT model is trained on the **[GRAG-CPT](https://huggingface.co/datasets/avemio/GRAG-CPT-HESSIAN-AI) dataset**.
## Model Details
The core models released in this batch are the following:
| Model | Training Tokens |
|------|--------|
| [GRAG-PHI-CPT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI) | 507.47 million |
| [GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | 2.03 billion |
| [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | 2.0577 billion |
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Avemio AI Team
- **Supported by:** Hessian AI
- **Model type:** Transformer-style autoregressive language model.
- **Language(s) (NLP):** German, English
- **License:** The code and model are released under Apache 2.0.
- **Contact:** [[email protected]](mailto:[email protected])
### Model Sources
<!-- Provide the basic links for the model. -->
- **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/GRAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
- **Repositories:**
    - Training: [Colab-Notebook](https://colab.research.google.com/drive/1U6aP3vIkABaCm7doGV1waHgTLvXNGbBp?usp=sharing)
    - Evaluation code:
        - [GRAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/GRAG-LLM-HARD-BENCHMARK.git)
        - [GRAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/GRAG-LLM-EASY-BENCHMARK.git)
- **Technical blog post:**
<!-- - **Press release:** TODO -->
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Inference
To run inference, install the `transformers` library together with PyTorch, then load the model as usual with Hugging Face:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "avemio/GRAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI"
# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Tokenize a short prompt and generate up to 20 new tokens.
inputs = tokenizer("Hello mein Name ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
# Decode the generated token IDs back into text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Fine-tuning
We provide a Google Colab notebook that walks through fine-tuning the model, including detailed instructions, the required dependencies, and configurable settings: [Colab-Notebook](https://colab.research.google.com/drive/1U6aP3vIkABaCm7doGV1waHgTLvXNGbBp?usp=sharing).
## Model Details
### Data
For training data details, please see the [GRAG-CPT-Dataset](https://huggingface.co/datasets/avemio/GRAG-CPT-HESSIAN-AI) documentation.
#### Description
##### CPT: Continued Pre-Training
Our CPT (Continued Pre-Training) approach is designed to enhance a language model's ability to perform specific tasks through structured, instruction-based learning. Drawing inspiration from "Instruction Pre-Training: Language Models are Supervised Multitask Learners," our methodology primes base models with semi-structured examples to improve their performance across three key tasks. The training dataset comprises approximately 420,000 German samples and 200,000 English samples. The deliberate emphasis on German content is intended to expand the model's German vocabulary and capabilities.
##### Context-Based Question Answering
This task trains models to generate accurate responses by considering both the question and its accompanying context. For example, when analyzing the benefits of cancer counseling centers, the model learns to extract and synthesize relevant information from the provided context to formulate a comprehensive answer. The training examples follow a clear structure: Question > Context > Context-based Answer.
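
The Question > Context > Context-based Answer structure described above could be assembled into a prompt roughly as follows. This is a minimal sketch: the function name `build_qa_prompt`, the section labels, and the sample strings are illustrative assumptions, not the exact template used in the GRAG-CPT dataset.

```python
def build_qa_prompt(question: str, context: str) -> str:
    """Assemble a prompt in Question > Context > Answer order.

    The labels are hypothetical; check the dataset for the exact wording.
    """
    return (
        f"Question: {question.strip()}\n\n"
        f"Context: {context.strip()}\n\n"
        "Context-based Answer:"
    )


# Invented sample content, used only to show the layout.
prompt = build_qa_prompt(
    question="Welche Vorteile bietet eine Krebsberatungsstelle?",
    context="Krebsberatungsstellen bieten kostenlose psychosoziale Beratung an.",
)
print(prompt)
```

Ending the prompt with the answer label leaves the continuation to the model, which matches how a base (non-chat) model is usually queried.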
##### Structured Reasoning
The reasoning task develops the model's ability to break down complex problems and arrive at solutions through systematic thinking. Training examples present problems under clear subheadings (Task, Approach, Solution) to encourage structured analysis. In a music festival scheduling problem, for instance, this format helps the model learn to consider multiple constraints and develop a logical solution step by step.
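
To make the Task / Approach / Solution layout concrete, the sketch below splits a sample into its three parts. It assumes that each subheading sits on its own line, optionally followed by a colon; the helper `split_sections` and the sample text are hypothetical and not taken from the dataset.

```python
import re

SECTION_NAMES = ("Task", "Approach", "Solution")  # subheadings named in the text above


def split_sections(text: str) -> dict[str, str]:
    """Return a mapping from subheading to the text that follows it."""
    pattern = re.compile(
        r"^(" + "|".join(SECTION_NAMES) + r"):?[ \t]*$", re.MULTILINE
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next subheading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


# Invented sample, loosely modeled on a festival scheduling problem.
sample = """Task:
Schedule three bands on two stages without overlaps.

Approach:
List each band's constraints, then assign slots one by one.

Solution:
Band A and Band C share stage 1; Band B plays on stage 2.
"""
sections = split_sections(sample)
print(sections["Approach"])
```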
##### Intelligent Summarization
The summarization task teaches models to distill complex information into clear, organized summaries while preserving key details. Training examples demonstrate how to transform detailed explanations into well-structured bullet points or concise summaries.
### Architecture
| Parameter | GRAG-PHI-CPT |
|-----------------------|-----------------------------------------------------------------------------------------------|
| **d_model** | 4096 |
| **num heads** | 32 |
| **num layers** | 32 |
| **MLP ratio** | 3.5 |
| **LayerNorm type** | RMSNorm |
| **pos embeddings** | RoPE |
| **attention variant**| Multi-head attention with 32 key-value heads |
| **biases** | none |
| **block type** | Sequential |
| **activation** | SiLU |
| **sequence length** | 131072 |
| **weight typing**     | bfloat16                                                                                      |
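
A few derived quantities follow directly from the table values. The sketch below assumes the common conventions that the MLP ratio is the feed-forward hidden size divided by `d_model`, and that the key-value cache is stored in bfloat16 (2 bytes per value). It uses only the numbers listed above, so the released checkpoint's configuration should be treated as the authoritative source.

```python
# Values copied from the architecture table above.
d_model = 4096
num_heads = 32
num_kv_heads = 32
num_layers = 32
mlp_ratio = 3.5
max_seq_len = 131072
bytes_per_value = 2  # bfloat16 (assumed for the KV cache as well)

head_dim = d_model // num_heads        # width of each attention head
mlp_hidden = int(d_model * mlp_ratio)  # feed-forward hidden size

# KV cache: one key and one value vector per layer and per key-value head.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
kv_gib_full_context = kv_bytes_per_token * max_seq_len / 2**30

print(f"head_dim = {head_dim}")
print(f"mlp_hidden = {mlp_hidden}")
print(f"KV cache per token = {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at full context = {kv_gib_full_context:.0f} GiB")
```

Under these assumptions, a single sequence at the full 131072-token context would need a key-value cache far larger than the memory of one typical GPU, so long-context inference is usually run with much shorter inputs or with cache-saving techniques.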
### Hyperparameters
| Parameter | GRAG-PHI-CPT |
|---------------------------|--------------------|
| **warmup steps** | 50 |
| **peak LR** | 5.0E-07 |
| **weight decay** | 0.1 |
| **LR schedule** | linear |
| **gradient reduce dtype** | FP32 |
| **optimizer state dtype** | FP32 |
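
As an illustration of the warmup and schedule settings above, the sketch below computes the learning rate per step. It assumes that "linear" means a linear ramp from zero to the peak over the warmup steps, followed by a linear decay to zero at the end of training. The total step count is not stated in this card, so `total_steps=1000` is a hypothetical placeholder.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,
    warmup_steps: int = 50,
    peak_lr: float = 5.0e-7,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; the real value is not given in this card
for step in (0, 25, 50, 525, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```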
## Environmental Impact
GRAG-PHI-CPT ran on 40 NVIDIA A100 GPUs for 2 days. The table below gives the approximate power consumption.
Actual power consumption may vary depending on the specific workload and operating conditions. For accurate measurements, we recommend using dedicated power monitoring tools.
| Model | GPU Type | Power Consumption From GPUs |
|----------------|---------------------|-----------------------------|
| GRAG-PHI-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00576 MWh |
## Bias, Risks, and Limitations
Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.
In addition, statements generated by GRAG-PHI-CPT, as with any LLM, are often inaccurate and should be verified.
## Model Card Contact
For errors in this model card, please contact [[email protected]](mailto:[email protected]).