---
license: other
datasets:
- nvidia/CantTalkAboutThis-Topic-Control-Dataset
language:
- en
metrics:
- f1
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-classification
library_name: peft
---
# Model Overview
## Description:
**Llama-3.1-NemoGuard-8B-Topic-Control** can be used for topical and dialogue moderation of user prompts in human-assistant interactions, and is designed for task-oriented dialogue agents and custom policy-based moderation.
Given a system instruction (also called a topical instruction, i.e. one specifying which topics are allowed and disallowed) and a conversation history ending with the last user prompt, the model returns a binary response that flags whether the user message respects the system instruction (i.e. the message is on-topic or a distractor/off-topic).
The base large language model (LLM) is the multilingual [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model from Meta. Llama-3.1-TopicGuard is LoRA-tuned on a topic-following dataset generated synthetically with [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
This model is ready for commercial use. <br>

### License/Terms of Use:

Governing NVIDIA Download Terms & Third-Party Component Attribution Terms (Hugging Face LORA weights) GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).

Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.

## Reference(s):
Related paper:
```bibtex
@article{sreedhar2024canttalkaboutthis,
  title={CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues},
  author={Sreedhar, Makesh Narsimhan and Rebedea, Traian and Ghosh, Shaona and Zeng, Jiaqi and Parisien, Christopher},
  journal={arXiv preprint arXiv:2404.03820},
  year={2024}
}
```
<br>

## Model Architecture:

**Architecture Type:** Transformer <br>
**Network Architecture:** The base model architecture is based on the Llama-3.1-8B-Instruct model from Meta ([Model Card](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/)).
We perform Parameter-Efficient Fine-Tuning (PEFT) over the base model using the following network architecture parameters (see the configuration sketch after this list):
- Rank: 8
- Alpha: 32
- Targeted low-rank adaptation modules: 'k_proj', 'q_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj'. <br>
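
The values above map directly onto a `peft` `LoraConfig`. The following is a minimal sketch of an equivalent adapter configuration; the dropout value, the task type, and the use of `get_peft_model` are illustrative assumptions and are not taken from this card.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model named in this card; loading options (dtype, device placement) are omitted for brevity.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# LoRA hyperparameters as listed above: rank 8, alpha 32, and the targeted projection modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "q_proj", "v_proj", "o_proj", "up_proj", "down_proj", "gate_proj"],
    lora_dropout=0.0,          # assumed; not specified in this card
    task_type="CAUSAL_LM",     # assumed; the adapter is trained for next-token prediction of the label
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # prints the fraction of weights updated by the adapter
```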

**Training Method:**

The training method for **Llama-3.1-TopicGuard** involves the following concepts:
- A system instruction that acts as a topical instruction, defining the rules of the user-assistant interaction, i.e. the topics allowed or disallowed by the current task-oriented scenario, the conversation style and tone, and the conversation flows.
- Any user message in the conversation that respects the topical instruction is considered on-topic, while a user message that contradicts at least one of the rules is a distractor or off-topic.
- A synthetically generated dataset, called CantTalkAboutThis-Mixtral-1.0, of approximately 1,000 multi-turn conversations is used to instruction-tune the base model. Each conversation has a specific topical instruction from various broad domains (e.g. customer support, travel, legal) and contains an entire on-topic conversation, together with several distractor user messages that replace some of the on-topic ones at specific key points in the conversation.
- The model is instruction-tuned to detect whether a user message is on-topic or a distractor given the topical instruction for the current conversation, with the LLM behaving as a classifier.

## Input:
**Input Type(s):** Text <br>
**Input Format(s):** String <br>
**Input Parameters:** 1D (One-Dimensional) List: System prompt with topical instructions, followed by a conversation structured as a list of user and assistant messages. <br>
**Other Properties Related to Input:** The conversation should end with a user message that is considered for topical moderation given the topical instruction and the context of the entire conversation (previous user and assistant turns). The input format for the system prompt and the conversation follows the [OpenAI Chat specification](https://platform.openai.com/docs/guides/text-generation) widely adopted in the industry, including by the [NVIDIA AI API](https://build.nvidia.com/).

Sample input:
```json
[
  {
    "role": "system",
    "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
  },
  {
    "role": "user",
    "content": "Hi there!"
  },
  {
    "role": "assistant",
    "content": "Hello! How can I help today?"
  },
  {
    "role": "user",
    "content": "Do you know which is the most popular beach in Barcelona?"
  }
]
```
<br>

## Output:

**Output Type(s):** Text <br>
**Output Format:** String <br>
**Output Parameters:** 1D (One-Dimensional) <br>
**Other Properties Related to Output:** The response is a binary string label determining whether the last user turn in the input conversation respects the topical instruction. The label options are either *"on-topic"* or *"off-topic"*. <br>

### Example Model Input/Output:
**Input**
```json
[
  {
    "role": "system",
    "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
  },
  {
    "role": "user",
    "content": "Hi there!"
  },
  {
    "role": "assistant",
    "content": "Hello! How can I help today?"
  },
  {
    "role": "user",
    "content": "Do you know which is the most popular beach in Barcelona?"
  }
]
```
**Output (Model Response)**
```string
off-topic
```
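
The example above can be reproduced locally with the `transformers` and `peft` libraries listed under Software Integration. The following is a minimal sketch, assuming a placeholder path for the LoRA adapter weights, bfloat16 loading with `device_map="auto"`, and greedy decoding; none of these specifics are prescribed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_PATH = "path/to/topic-control-lora-adapter"  # placeholder; point this at the downloaded LoRA weights

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_PATH)  # attach the topic-control LoRA adapter
model.eval()

# Conversation in the OpenAI Chat format; the last turn is the user message to be moderated.
messages = [
    {"role": "system", "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Hello! How can I help today?"},
    {"role": "user", "content": "Do you know which is the most popular beach in Barcelona?"},
]

# Render the conversation with the base model's chat template and let the model produce the label.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=10, do_sample=False, pad_token_id=tokenizer.eos_token_id)

label = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True).strip()
print(label)  # expected: "off-topic" for this conversation
```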
## Software Integration:
**Runtime Engine(s):** PyTorch <br>
**Libraries:** Meta's [llama-recipes](https://github.com/meta-llama/llama-recipes), Hugging Face [transformers](https://github.com/huggingface/transformers) library, Hugging Face [peft](https://github.com/huggingface/peft) library <br>
**Supported Hardware Platform(s):** NVIDIA Ampere (A100 80GB, A100 40GB) <br>
**Preferred/Supported Operating System(s):** Linux (Ubuntu) <br>

## Model Version(s):
Llama-3.1-TopicGuard <br>

# Training, Testing, and Evaluation Datasets:

## Training Dataset:

**Link:** [CantTalkAboutThis](https://github.com/makeshn/topic_following) dataset<br>
**Data Collection Method by dataset**: Synthetic <br>
**Labeling Method by dataset**: Synthetic <br>
**Properties:** The [CantTalkAboutThis topic-following dataset](https://huggingface.co/datasets/nvidia/CantTalkAboutThis-Topic-Control-Dataset) contains 1,080 on-topic multi-turn conversations based on 540 different topical instructions from various domains. For each on-topic conversation, we also generate off-topic/distractor turns at specific points in the conversation (about 4 distractors per conversation). <br>

## Testing Dataset:
The performance of the model is tested on a smaller, human-annotated subset of the synthetically created test set of the [CantTalkAboutThis topic-following dataset](https://huggingface.co/datasets/nvidia/CantTalkAboutThis-Topic-Control-Dataset). The test set contains conversations in a different domain (*banking*) that does not appear in the training or evaluation sets. While the on-topic conversations are similar to those in the training dataset, the distractors are annotated by expert human annotators.

**Link:** [CantTalkAboutThis topic-following dataset](https://huggingface.co/datasets/nvidia/CantTalkAboutThis-Topic-Control-Dataset)<br>
**Data Collection Method by dataset**: Hybrid: Synthetic, Human <br>
**Labeling Method by dataset**: Hybrid: Synthetic, Human <br>
**Properties:** We select 20 random dialogues from the synthetic test domain and ask two dialogue-system experts to manually create five distractors per conversation. This yields a small human-annotated test set that is both more challenging and more reflective of realistic scenarios: it contains 100 human-annotated distractors, with the remainder being on-topic turns, so about 11% of turns are distractors/off-topic. <br>

## Evaluation Dataset:
The evaluation set is similar to the training dataset (synthetically generated on-topic conversations and distractors) but covers the *travel* domain, which is not part of the training set. <br>
**Link:** [CantTalkAboutThis](https://huggingface.co/datasets/nvidia/CantTalkAboutThis-Topic-Control-Dataset) evaluation set <br>
**Data Collection Method by dataset**: Synthetic <br>
**Labeling Method by dataset**: Synthetic <br>
**Properties:** We generate 20 multi-turn conversations on 10 different scenarios in the travel domain, each conversation having about 20 turns. <br>

## Inference:
**Engine:** TRT-LLM/vLLM/Hugging Face <br>
**Test Hardware:** A100 80GB <br>
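
As one concrete serving option among the engines listed above, the LoRA adapter can be attached at request time using vLLM's LoRA support. This is a minimal sketch, assuming a placeholder adapter path and the base model's chat template; it is not an official deployment recipe.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

BASE_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_PATH = "path/to/topic-control-lora-adapter"  # placeholder; point this at the downloaded LoRA weights

# enable_lora lets adapters be attached per request on top of the base model.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
llm = LLM(model=BASE_MODEL, enable_lora=True)

messages = [
    {"role": "system", "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"},
    {"role": "user", "content": "Do you know which is the most popular beach in Barcelona?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

params = SamplingParams(temperature=0.0, max_tokens=10)
outputs = llm.generate([prompt], params, lora_request=LoRARequest("topic_control", 1, ADAPTER_PATH))
print(outputs[0].outputs[0].text.strip())  # "on-topic" or "off-topic"
```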

## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

## Explainability:

Field | Response
:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
Intended Application & Domain: | Dialogue Agents and Guardrails
Model Type: | Transformer
Intended Users: | This model is intended for developers building task-oriented dialogue assistants who want to specify the dialogue policy in natural language, e.g. allowed topics, disallowed topics, conversation flows, conversation style/tone. The model is also useful as a topical guardrail in NeMo Guardrails.
Output: | Text - Binary label determining if the last user turn in the input conversation respects the topical instruction. The label options are either "on-topic" or "off-topic".
Describe how the model works: | The model receives as input the dialogue policy and the current conversation ending with the last user turn in the prompt of an LLM (Llama-3.1-8B-Instruct). A binary decision is returned, specifying whether the input is on-topic or not.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations: | The model was trained on 9 domains, including finance, health, education, taxes, real estate, computer troubleshooting, travel, banking, and legal. We have tested out-of-domain performance for the model and our results suggest strong generalization to other domains as well. However, we recommend thorough testing when using the model with prompts outside of these 9 domain areas, as the model may deliver lower performance.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | F1, Accuracy
Potential Known Risks: | Potential risks include the dialogue agent engaging with user content that is not on-topic.
Licensing: | Governing NVIDIA Download Terms & Third-Party Component Attribution Terms (Hugging Face LORA weights) GOVERNING TERMS: Use of this model is governed by the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.

## Bias:

Field | Response
:---------------------------------------------------------------------------------------------------|:---------------
Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | Not Applicable
Measures taken to mitigate against unwanted bias: | None

## Safety & Security:

Field | Response
:---------------------------------------------------|:----------------------------------
Model Application(s): | Dialogue agents for topic / dialogue moderation
Describe the life critical impact (if present). | Not Applicable
Use Case Restrictions: | Should not be used for any use case other than text-based topic and dialogue moderation in task-oriented dialogue agents.
Model and dataset restrictions: | Abide by the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf). Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.

## Privacy:

Field | Response
:----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
Generatable or reverse engineerable personal data? | None
Personal data used to create this model? | None
Was consent obtained for any personal data used? | Not Applicable
How often is dataset reviewed? | Before Every Release
Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable
If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable
If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable
If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable
Is there provenance for all datasets used in training? | Yes
Does data labeling (annotation, metadata) comply with privacy laws? | Yes
Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes