rogkesavan
commited on
Update README.md
Browse files
README.md
CHANGED
@@ -19,7 +19,7 @@ At Nidum, we believe in pushing the boundaries of innovation by providing advanc
|
|
19 |
---
|
20 |
|
21 |
[![GitHub Icon](https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Font_Awesome_5_brands_github.svg/232px-Font_Awesome_5_brands_github.svg.png)](https://github.com/NidumAI-Inc)
|
22 |
-
**Explore Nidum's Open-Source Projects on GitHub**: [https://github.com/NidumAI-Inc](https://github.com/
|
23 |
|
24 |
---
|
25 |
### Key Features
|
@@ -81,42 +81,26 @@ The following fine-tuning datasets are leveraged to enhance specific model capab
|
|
81 |
|
82 |
---
|
83 |
|
84 |
-
### Benchmarks
|
85 |
|
86 |
After fine-tuning with **uncensored data**, **Nidum-Llama-3.2-3B** demonstrates **superior performance compared to the original LLaMA model**, particularly in accuracy and handling diverse, unrestricted scenarios.
|
87 |
|
88 |
-
####
|
89 |
-
We present **GPQA**, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
|
90 |
-
|
91 |
-
| **Category** | **Metric** | **LLaMA 3B** | **Nidum 3B** |
|
92 |
-
|---------------------------------------|------------------------------|--------------|--------------|
|
93 |
-
| **gpqa_diamond_cot_n_shot** | Exact Match (Flexible) | 0 | 0.2 |
|
94 |
-
| | Accuracy | 0.1 | 0.2 |
|
95 |
-
| **gpqa_diamond_generative_n_shot** | Exact Match (Flexible) | 0.3 | 0.5 |
|
96 |
-
| **gpqa_diamond_zeroshot** | Accuracy | 0.2 | 0.3 |
|
97 |
-
| **gpqa_extended_cot_n_shot** | Exact Match (Flexible) | 0.2 | 0 |
|
98 |
-
| **gpqa_extended_cot_zeroshot** | Exact Match (Flexible) | 0.2 | 0.3 |
|
99 |
-
| **gpqa_extended_generative_n_shot** | Exact Match (Flexible) | 0.1 | 0.2 |
|
100 |
-
| **gpqa_extended_n_shot** | Accuracy | 0.2 | 0.2 |
|
101 |
-
| **gpqa_extended_zeroshot** | Accuracy | 0.1 | 0.1 |
|
102 |
-
| **gpqa_main_cot_n_shot** | Exact Match (Flexible) | 0 | 0.1 |
|
103 |
-
| **gpqa_main_cot_zeroshot** | Exact Match (Flexible) | 0.2 | 0.2 |
|
104 |
-
| **gpqa_main_generative_n_shot** | Exact Match (Flexible) | 0.2 | 0.2 |
|
105 |
-
| **gpqa_main_n_shot** | Accuracy | 0.4 | 0.3 |
|
106 |
-
| **gpqa_main_zeroshot** | Accuracy | 0.3 | 0.4 |
|
107 |
|
108 |
-
|
109 |
-
|
110 |
-
|
|
|
|
|
|
|
|
|
|
|
111 |
|
112 |
-
|
113 |
|
114 |
-
|
115 |
-
|
116 |
-
|
117 |
-
| **hellaswag/acc_norm** | 0.3 | 0.4 |
|
118 |
-
| **hellaswag/acc_norm_stderr** | 0.15275 | 0.1633 |
|
119 |
-
| **hellaswag/acc_stderr** | 0.15275 | 0.1633 |
|
120 |
|
121 |
---
|
122 |
|
|
|
19 |
---
|
20 |
|
21 |
[![GitHub Icon](https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Font_Awesome_5_brands_github.svg/232px-Font_Awesome_5_brands_github.svg.png)](https://github.com/NidumAI-Inc)
|
22 |
+
**Explore Nidum's Open-Source Projects on GitHub**: [https://github.com/NidumAI-Inc](https://github.com/NidumAI-Inc)
|
23 |
|
24 |
---
|
25 |
### Key Features
|
|
|
81 |
|
82 |
---
|
83 |
|
84 |
+
### Benchmarks
|
85 |
|
86 |
After fine-tuning with **uncensored data**, **Nidum-Llama-3.2-3B** demonstrates **superior performance compared to the original LLaMA model**, particularly in accuracy and handling diverse, unrestricted scenarios.
|
87 |
|
88 |
+
#### Benchmark Summary Table
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
89 |
|
90 |
+
| **Benchmark** | **Metric** | **LLaMA 3B** | **Nidum 3B** | **Observation** |
|
91 |
+
|-------------------|-----------------------------------|--------------|--------------|-----------------------------------------------------------------------------------------------------|
|
92 |
+
| **GPQA** | Exact Match (Flexible) | 0.3 | 0.5 | Nidum 3B demonstrates significant improvement, particularly in **generative tasks**. |
|
93 |
+
| | Accuracy | 0.4 | 0.5 | Consistent improvement, especially in **zero-shot** scenarios. |
|
94 |
+
| **HellaSwag** | Accuracy | 0.3 | 0.4 | Better performance in **common sense reasoning** tasks. |
|
95 |
+
| | Normalized Accuracy | 0.3 | 0.4 | Enhanced ability to understand and predict context in sentence completion. |
|
96 |
+
| | Normalized Accuracy (Stderr) | 0.15275 | 0.1633 | Slightly improved consistency in normalized accuracy. |
|
97 |
+
| | Accuracy (Stderr) | 0.15275 | 0.1633 | Shows robustness in reasoning accuracy compared to LLaMA 3B. |
|
98 |
|
99 |
+
---
|
100 |
|
101 |
+
### Insights:
|
102 |
+
1. **GPQA Results**: Fine-tuning on uncensored data has boosted **Nidum 3B's Exact Match and Accuracy**, particularly excelling in **generative** and **zero-shot** tasks involving domain-specific knowledge.
|
103 |
+
2. **HellaSwag Results**: **Nidum 3B** consistently outperforms **LLaMA 3B** in **common sense reasoning benchmarks**, indicating enhanced contextual and semantic understanding.
|
|
|
|
|
|
|
104 |
|
105 |
---
|
106 |
|