Model Overview
Description:
Llama-3.1-NemoGuard-8B-Topic-Control can be used for topical and dialogue moderation of user prompts in human-assistant interactions. It is designed for task-oriented dialogue agents and custom policy-based moderation.
Try out the model here: Llama-3.1-NemoGuard-8B-Topic-Control
Given a system instruction (also called a topical instruction, i.e., one specifying which topics are allowed and disallowed) and a conversation history ending with the last user prompt, the model returns a binary response that flags whether the user message respects the system instruction (i.e., whether the message is on-topic or a distractor/off-topic).
The base large language model (LLM) is the multilingual Llama-3.1-8B-Instruct model from Meta. Llama-3.1-TopicGuard is LoRA-tuned on a topic-following dataset generated synthetically with Mixtral-8x7B-Instruct-v0.1.
This model is ready for commercial use.
License/Terms of Use:
Governing NVIDIA Download Terms & Third-Party Component Attribution Terms (Hugging Face LoRA weights). GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement.
Additional Information: Llama 3.1 Community License Agreement. Built with Llama.
Reference(s):
Related paper:
@article{sreedhar2024canttalkaboutthis,
  title={CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues},
  author={Sreedhar, Makesh Narsimhan and Rebedea, Traian and Ghosh, Shaona and Zeng, Jiaqi and Parisien, Christopher},
  journal={arXiv preprint arXiv:2404.03820},
  year={2024}
}
Using the Model
Llama 3.1 NemoGuard 8B TopicControl performs input moderation, such as ensuring that the user prompt is consistent with rules specified as part of the system prompt.
The prompt template consists of two key sections: a system instruction and a conversation history that includes a sequence of user prompts and LLM responses. Typically, the prompt concludes with the current user query.
System Instruction
The system instruction part of the prompt serves as a comprehensive guideline to steer the conversation. This part includes the following:
Core Rules: A set of principles that govern the interaction, ensuring relevance and adherence to any specified boundaries.
Persona Assignment: Enables the model to adopt a specific role, such as a banking assistant. In such cases:
- Queries outside the persona's domain are treated as off-topic by default.
- You can specify subcategories or topics to block within the persona's scope. For example, a banking assistant could be instructed to block topics like cryptocurrency or insurance claims.
These instructions are used by the topic control model to determine whether a user query aligns with the defined rules.
The system prompt must end with the TOPIC_SAFETY_OUTPUT_RESTRICTION string. This ensures that the prediction from the model is either "on-topic" or "off-topic". The TOPIC_SAFETY_OUTPUT_RESTRICTION string is defined as follows:
If any of the above conditions are violated, please respond with "off-topic". Otherwise, respond with "on-topic". You must respond with "on-topic" or "off-topic".
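For illustration, here is a hypothetical system instruction for a banking assistant that combines a persona, blocked subtopics, and the required closing string (the persona and rules are examples only, not a prescribed template):

You are a customer support assistant for a retail bank. Only answer questions related to the user's accounts, cards, and payments. Do not engage in discussions about cryptocurrency or insurance claims. If any of the above conditions are violated, please respond with "off-topic". Otherwise, respond with "on-topic". You must respond with "on-topic" or "off-topic".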
Conversation History
The conversation history maintains a sequential record of user prompts and LLM responses and can include single-turn or multi-turn interactions. Typically, the history concludes with the most recent user prompt that must be moderated by the topic control model.
Refer to the following sample user-to-LLM conversations in the industry-standard payload format for LLM systems:
[
  {
    "role": "system",
    "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
  },
  {
    "role": "user",
    "content": "Hi there!"
  },
  {
    "role": "assistant",
    "content": "Hello! How can I help today?"
  },
  {
    "role": "user",
    "content": "Do you know which is the most popular beach in Barcelona?"
  }
]
The topic control model responds to the final user prompt with a response like off-topic.
Integrating with NeMo Guardrails
To integrate the topic control model with NeMo Guardrails, you need access to the NVIDIA NIM container for llama-3.1-nemoguard-8b-topic-control. More information about the NIM container can be found here.
NeMo Guardrails uses the LangChain ChatNVIDIA connector to connect to a locally running NIM microservice like llama-3.1-nemoguard-8b-topic-control.
The topic control microservice exposes the standard OpenAI interface on the v1/completions and v1/chat/completions endpoints.
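As a minimal sketch of calling the endpoint directly, assuming the microservice is listening on localhost:8000 (matching the config.yml example below) and the openai Python package is installed:

from openai import OpenAI

# Local NIM deployments typically do not require an API key; the placeholder
# value is an assumption for this sketch.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

TOPIC_SAFETY_OUTPUT_RESTRICTION = (
    'If any of the above conditions are violated, please respond with "off-topic". '
    'Otherwise, respond with "on-topic". You must respond with "on-topic" or "off-topic".'
)

messages = [
    {"role": "system", "content": (
        "In the next conversation always use a polite tone and do not engage in any "
        "talk about travelling and touristic destinations. "
        + TOPIC_SAFETY_OUTPUT_RESTRICTION)},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Hello! How can I help today?"},
    {"role": "user", "content": "Do you know which is the most popular beach in Barcelona?"},
]

completion = client.chat.completions.create(
    model="llama-3.1-nemoguard-8b-topic-control",
    messages=messages,
)
print(completion.choices[0].message.content)  # expected: "off-topic"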
When you use NeMo Guardrails instead of calling the endpoint directly, it handles the complexity of building the prompt template and parsing the topic control model responses, and it provides a programmable method to build a chatbot with content safety rails.
To integrate NeMo Guardrails with the topic control microservice, create a config.yml file that is similar to the following example:
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

  - type: "topic_control"
    engine: nim
    parameters:
      base_url: "http://localhost:8000/v1"
      model_name: "llama-3.1-nemoguard-8b-topic-control"

rails:
  input:
    flows:
      - topic safety check input $model=topic_control
- Field engine specifies nim.
- Field parameters.base_url specifies the IP address and port of the host running the NIM microservice.
- Field parameters.model_name in the Guardrails configuration must match the model name served by the llama-3.1-nemoguard-8b-topic-control NIM microservice.
- The rails definition specifies topic_control as the model.
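With this configuration in place, a short usage sketch (assuming the nemoguardrails Python package is installed and the config.yml above is saved in a ./config directory; the path is illustrative):

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# The input rail sends the user message to the topic control model before
# the main model generates a response.
response = rails.generate(messages=[
    {"role": "user", "content": "Do you know which is the most popular beach in Barcelona?"}
])
print(response["content"])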
Refer to NVIDIA NeMo Guardrails documentation for more information about the configuration file.
Model Architecture:
Architecture Type: Transformer
Network Architecture: The base model architecture is based on the Llama-3.1-8B-Instruct model from Meta (Model Card).
We perform Parameter Efficient FineTuning (PEFT) over the base model using the following network architecture parameters:
- Rank: 8
- Alpha: 32
- Targeted low rank adaptation modules: 'k_proj', 'q_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj'.
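For reference, these hyperparameters map onto a Hugging Face peft LoraConfig as in the following sketch (the task type and the use of the peft library for this mapping are assumptions for illustration; the rank, alpha, and target modules are the values listed above):

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,            # Rank: 8
    lora_alpha=32,  # Alpha: 32
    target_modules=["k_proj", "q_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",
)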
Training Method:
The training method for Llama-3.1-TopicGuard involves the following concepts:
- A system instruction that acts as a topical instruction, with rules that define the context of the user-assistant interaction, i.e., the topics allowed or disallowed by the current task-oriented scenario, the conversation style and tone, and the conversation flows.
- Any user message in the conversation that respects the topical instruction is considered on-topic, while a user message that contradicts at least one of the rules is a distractor or off-topic.
- A synthetically generated dataset, called CantTalkAboutThis-Mixtral-1.0, of approximately 1,000 multi-turn conversations is used to instruction-tune the base model. Each conversation follows a specific topical instruction from various broad domains (e.g., customer support, travel, legal) and is entirely on-topic, accompanied by several distractor user messages that replace some of the on-topic ones at specific key points in the conversation.
- The model is instruction-tuned to detect whether a user message is either on-topic or a distractor given the topical instruction for the current conversation, with the LLM behaving as a classifier.
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: 1D (One-Dimensional) List: System prompt with topical instructions, followed by a conversation structured as a list of user and assistant messages.
Other Properties Related to Input: The conversation should end with a user message that is considered for topical moderation given the topical instruction and the context of the entire conversation (previous user and assistant turns). The input format for the system prompt and the conversation respects the [OpenAI Chat specification](https://platform.openai.com/docs/guides/text-generation), which is widely adopted in the industry, including by the [NVIDIA AI API](https://build.nvidia.com/).
Sample input:
// User-LLM conversations in the industry-standard payload format for LLM systems:
[
  {
    "role": "system",
    "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
  },
  {
    "role": "user",
    "content": "Hi there!"
  },
  {
    "role": "assistant",
    "content": "Hello! How can I help today?"
  },
  {
    "role": "user",
    "content": "Do you know which is the most popular beach in Barcelona?"
  }
]
Output:
Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional)
Other Properties Related to Output: The response is a binary string label indicating whether the last user turn in the input conversation respects the topical instruction. The label options are either "on-topic" or "off-topic".
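Because the output is a plain string, downstream code typically normalizes it before use; a hypothetical helper (the function is ours, the two labels are the documented outputs):

def is_on_topic(model_response: str) -> bool:
    # Normalize whitespace and case before comparing against the two labels.
    label = model_response.strip().lower()
    if label not in ("on-topic", "off-topic"):
        raise ValueError(f"unexpected topic control label: {model_response!r}")
    return label == "on-topic"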
Example Model Input/Output:
Input
// User-LLM conversations in the industry-standard payload format for LLM systems:
[
  {
    "role": "system",
    "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
  },
  {
    "role": "user",
    "content": "Hi there!"
  },
  {
    "role": "assistant",
    "content": "Hello! How can I help today?"
  },
  {
    "role": "user",
    "content": "Do you know which is the most popular beach in Barcelona?"
  }
]
Output (Model Response)
off-topic
Software Integration:
Runtime Engine(s): PyTorch
Libraries: Meta's llama-recipes, HuggingFace transformers library, HuggingFace peft library
Supported Hardware Platform(s): NVIDIA Ampere (A100 80GB, A100 40GB)
Preferred/Supported Operating System(s): Linux (Ubuntu)
Model Version(s):
Llama-3.1-TopicGuard
Training, Testing, and Evaluation Datasets:
Training Dataset:
Link: CantTalkABoutThis dataset
Data Collection Method by dataset: Synthetic
Labeling Method by dataset: Synthetic
Properties: The CantTalkABoutThis topic-following dataset contains 1,080 on-topic multi-turn conversations based on 540 different topical instructions from various domains. For each on-topic conversation, we also generate off-topic/distractor turns at specific points in the conversation (about four distractors per conversation).
Testing Dataset:
The performance of the model is tested on a smaller, human-annotated subset of the synthetically created test set of the CantTalkABoutThis topic-following dataset. The test set contains conversations from a different domain (banking) that does not appear in the training or evaluation sets. While the on-topic conversations are sampled similarly to the training dataset, the distractors are annotated by human experts.
Link: CantTalkABoutThis topic-following dataset
Data Collection Method by dataset: Hybrid: Synthetic, Human
Labeling Method by dataset: Hybrid: Synthetic, Human
Properties: We select 20 random dialogues from the synthetic test domain and ask two experts in dialogue systems to create five distractors per conversation. Thus, we also provide a small human-annotated test set that is both more challenging and more reflective of realistic scenarios. The test set contains 100 human-annotated distractors, with the remainder being on-topic turns; about 11% of turns are distractors/off-topic.
Evaluation Dataset:
The evaluation set is similar to the training dataset: synthetically generated on-topic conversations and distractors, but in the travel domain (which is not part of the training set).
Link: CantTalkABoutThis evaluation set
Data Collection Method by dataset: Synthetic
Labeling Method by dataset: Synthetic
Properties: We generate 20 multi-turn conversations on 10 different scenarios in the travel domain, each conversation having about 20 turns.
Inference:
Engine: TRT-LLM/vLLM/Hugging Face
Test Hardware: A100 80GB
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.
Explainability:
Field | Response |
---|---|
Intended Application & Domain: | Dialogue Agents and Guardrails |
Model Type: | Transformer |
Intended Users: | This model is intended for developers building task-oriented dialogue assistants who want to specify the dialogue policy in natural language, e.g., allowed topics, disallowed topics, conversation flows, and conversation style/tone. The model is also useful as a topical guardrail in NeMo Guardrails. |
Output: | Text - Binary label determining if the last user turn in the input conversation respects the topical instruction. The label options are either "on-topic" or "off-topic". |
Describe how the model works: | The model receives as input the dialogue policy and the current conversation, ending with the last user turn, in the prompt of an LLM (Llama-3.1-8B-Instruct). A binary decision is returned, specifying whether the input is on-topic or not. |
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
Technical Limitations: | The model was trained on 9 domains, including finance, health, education, taxes, real estate, computer troubleshooting, travel, banking, and legal. We have tested out-of-domain performance, and our results suggest strong generalization to other domains as well. However, we recommend thorough testing when using the model with prompts outside of these 9 domains, as the model may deliver lower performance. |
Verified to have met prescribed NVIDIA quality standards: | Yes |
Performance Metrics: | F1, Accuracy |
Potential Known Risks: | Potential risks include the dialogue agent engaging with user content that is not on-topic. |
Licensing: | Governing NVIDIA Download Terms & Third-Party Component Attribution Terms (Hugging Face LORA weights) GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama. |
Bias:
Field | Response |
---|---|
Participation considerations from adversely impacted groups protected classes in model design and testing: | Not Applicable |
Measures taken to mitigate against unwanted bias: | None |
Safety & Security:
Field | Response |
---|---|
Model Application(s): | Dialogue agents for topic / dialogue moderation |
Describe the life critical impact (if present). | Not Applicable |
Use Case Restrictions: | Should not be used for any use case other than text-based topic and dialogue moderation in task-oriented dialogue agents. |
Model and dataset restrictions: | Abide by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama. |
Privacy:
Field | Response |
---|---|
Generatable or reverse engineerable personal data? | None |
Personal data used to create this model? | None |
Was consent obtained for any personal data used? | Not Applicable |
How often is dataset reviewed? | Before Every Release |
Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable |
If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
Is there provenance for all datasets used in training? | Yes |
Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |