alish committed
Commit 6504675 · 1 Parent(s): c7c0881

added media

Files changed (4)
  1. README.md +25 -14
  2. assets/cover.jpg +0 -0
  3. assets/logo.svg +17 -0
  4. assets/results.png +0 -0
README.md CHANGED
@@ -12,7 +12,14 @@ base_model:
 
 # NAVI verifiers (Nace Automated Verification Intelligence)
 
- NAVI (Nace Automated Verification Intelligence) is a solution for policy alignment verification designed to review various types of text against documents and policies, and identify violating content. It is specifically optimized for enterprise applications requiring compliance verification for automated text generation. To push policy verification in the open-source community, we release NAVI-small-preview, an open-weights version of the model we have deployed on the platform. NAVI-small-preview is centered around verifying specifically assitant outputs against some policy documents. The full solution is available through [NAVI platform and API](https://naviml.com/).
 
 - **Developed by:** Nace.AI
 - **Model type:** Policy Alignment Verifier
@@ -108,7 +115,7 @@ NAVI utilizes latest advances in Knowledge Augmentation and Memory in order to i
 
 #### Training Hyperparameters
 
- - **Training regime:** We have trained the Lora adapter using all linear modules for all Transformer layers with rank 16, alpha 32, learning rate 5e-5, effective batch size 32. Trained on 8 A100s for 6 epochs under 3 hours using Pytorch Distributed Data Parallel.
 
 ## Evaluation
 
@@ -116,7 +123,7 @@ NAVI utilizes latest advances in Knowledge Augmentation and Memory in order to i
 
 #### Testing Data
 
- We have manually collected Policy Alignment Verification (PAV) dataset consisting of across different use cases for evaluation. We open source the public subset of the dataset. Here we diclose the performance on the public subset, containing 125 examples across six industry-specific scenarios: AT&T, Airbnb, Cadence Bank, Delta Airlines, Verisk, and Walgreens.
 
 #### Factors
 
@@ -128,17 +135,21 @@ F1 score was used to measure performance, prioritizing detection of noncomplianc
 
 ### Results
 
- NAVI-small-preview achieved an F1 score of 86.8% on public subset of PAV dataset, outperforming all tested alternatives except full-scale NAVI. We evaluate against general-purpose solutions like Claude and Open AI models, as well as some guardrails focusing on groundedness to demonstrate a clear distinction of policy verification from the more common groundedness verification.
-
- | Model                  | F1 Score | Avg Latency (ms) |
- |------------------------|----------|------------------|
- | NAVI-small-preview     | 86.8     | -                |
- | NAVI                   | 90.4     | 387.62           |
- | AWS Bedrock Guardrail  | 76.5     | 342.79           |
- | Azure Groundedness     | 71.2     | 232.71           |
- | NeMo (GPT-4o)          | 71.2     | 2669.68          |
- | GPT-4o (few-shot)      | 75.0     | 904.46           |
- | Sonnet 3.5 (few-shot)  | 75.5     | 2926.69          |
 
 ## Model Card Contact
 
 
 
 # NAVI verifiers (Nace Automated Verification Intelligence)
 
+ ![NAVI Cover](assets/cover.jpg)
+ ![NAVI Logo](assets/logo.svg)
+
+ NAVI (Nace Automated Verification Intelligence) is a policy alignment verification solution designed to analyze text for compliance with documents and policies, identifying any violations. Optimized for enterprise applications, it supports automated compliance checks for text generation. To encourage open-source adoption, we offer NAVI-small-preview, an open-weights version of the deployed model, focused on verifying assistant outputs against policy documents. The full solution is accessible via the [NAVI platform and API](https://naviml.com/).
+
+ The chart below illustrates NAVI's strong performance, with the full model achieving an F1 score of 90.4%, outperforming all evaluated alternatives. NAVI-small-preview also demonstrates strong results, providing an open-source option with significant improvements over baseline models while maintaining reliable policy alignment verification.
+
+ ![Results](assets/results.png)
 
 - **Developed by:** Nace.AI
 - **Model type:** Policy Alignment Verifier
 
 
 #### Training Hyperparameters
 
+ - **Training regime:** We performed a thorough hyperparameter search during fine-tuning. The resulting model is a LoRA adapter applied to all linear modules of every Transformer layer, with rank 16, alpha 32, a learning rate of 5e-5, and an effective batch size of 32. It was trained on 8 A100s for 6 epochs using PyTorch Distributed Data Parallel, as sketched below.
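The training script itself is not part of this commit; the block below is only a minimal sketch of how the LoRA regime described above could be expressed with Hugging Face PEFT and the Trainer API. The base-model id, output directory, and the 8 × 2 × 2 batch split are assumptions for illustration, not values taken from the card.

```python
# Hypothetical sketch of the LoRA regime described above (rank 16, alpha 32,
# lr 5e-5, effective batch size 32); not the released NAVI training code.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("BASE_MODEL_ID")  # placeholder id

lora_cfg = LoraConfig(
    r=16,                          # rank 16
    lora_alpha=32,                 # alpha 32
    target_modules="all-linear",   # adapt all linear modules in every layer
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# Effective batch size 32, e.g. 8 GPUs x 2 samples per device x 2 accumulation
# steps when launched with torchrun (the Trainer then runs PyTorch DDP).
args = TrainingArguments(
    output_dir="navi-small-lora",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    num_train_epochs=6,
)
```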
 
 ## Evaluation
 
 
 
 #### Testing Data
 
+ We curated the Policy Alignment Verification (PAV) dataset to evaluate diverse policy verification use cases, releasing a public subset of 125 examples spanning six industry-specific scenarios: AT&T, Airbnb, Cadence Bank, Delta Airlines, Verisk, and Walgreens. This open-sourced subset ensures transparency and facilitates benchmarking of model performance. We evaluate our models and alternative solutions on this test set.
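The commit does not specify how the public PAV subset is published, so the snippet below is only an illustrative sketch of loading a 125-example test split with the `datasets` library; the dataset id and schema are hypothetical placeholders rather than the actual release.

```python
# Illustrative only: the dataset id and columns below are placeholders,
# not the real identifiers of the public PAV subset.
from datasets import load_dataset

pav = load_dataset("PAV_PUBLIC_DATASET_ID", split="test")
print(len(pav))  # expected: 125 examples

for example in pav.select(range(3)):
    # Assumed schema: a policy document, an assistant response, and a
    # compliant / non-compliant label.
    print(example)
```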
 
 #### Factors
 
 
 
 ### Results
 
+ The table below shows the performance of models evaluated on the public subset of the PAV dataset. NAVI-small-preview achieved an F1 score of 86.8%, outperforming all tested alternatives except full-scale NAVI. We evaluate against general-purpose solutions such as Claude and OpenAI models, as well as guardrails focused on groundedness, to demonstrate a clear distinction between policy verification and the more common groundedness verification.
+
+ | Model                 | F1 Score (%) | Precision (%) | Recall (%) | Accuracy (%) |
+ |-----------------------|--------------|---------------|------------|--------------|
+ | Llama-3.1-Storm-8B    | 66.7         | 86.4          | 54.3       | 69.6         |
+ | NAVI-small-preview    | 86.8         | 80.5          | 94.3       | 84.0         |
+ | NAVI                  | **90.4**     | **93.8**      | **87.1**   | **89.6**     |
+ | Sonnet 3.5            | 83.2         | 85.1          | 81.4       | 81.6         |
+ | GPT-4o                | 80.5         | 73.8          | 88.6       | 76.0         |
+ | AWS Bedrock Guardrail | 74.8         | 87.1          | 65.6       | 67.2         |
+ | Azure Groundedness    | 75.0         | 62.3          | 94.3       | 64.8         |
+ | NeMo (GPT-4o)         | 69.0         | 67.2          | 70.9       | 72.0         |
+
+ \*NAVI-small-preview is not deployed anywhere; its latency was measured by running the PAV evaluation with vLLM on a single 80 GB A100 GPU.
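As context for the table, these are standard binary-classification metrics over the 125 public PAV examples, with non-compliant outputs treated as the positive class (in line with the card's note that detection of noncompliance is prioritized). A minimal sketch with placeholder verdict arrays, not the actual evaluation harness:

```python
# Illustrative metric computation for the table above; y_true / y_pred are
# placeholder verdicts (1 = non-compliant, 0 = compliant), not real results.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # gold labels (placeholder)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model verdicts (placeholder)

print(f"F1:        {100 * f1_score(y_true, y_pred):.1f}%")
print(f"Precision: {100 * precision_score(y_true, y_pred):.1f}%")
print(f"Recall:    {100 * recall_score(y_true, y_pred):.1f}%")
print(f"Accuracy:  {100 * accuracy_score(y_true, y_pred):.1f}%")
```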
 
 ## Model Card Contact
 
assets/cover.jpg ADDED
assets/logo.svg ADDED
assets/results.png ADDED