rogkesavan committed
Commit 89c2688 · verified · 1 Parent(s): 4162b03

Update README.md

Files changed (1): README.md (+15 -31)
README.md CHANGED
@@ -19,7 +19,7 @@ At Nidum, we believe in pushing the boundaries of innovation by providing advanc
  ---

  [![GitHub Icon](https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Font_Awesome_5_brands_github.svg/232px-Font_Awesome_5_brands_github.svg.png)](https://github.com/NidumAI-Inc)
- **Explore Nidum's Open-Source Projects on GitHub**: [https://github.com/NidumAI-Inc](https://github.com/orgs/NidumAI-Inc/repositories)
+ **Explore Nidum's Open-Source Projects on GitHub**: [https://github.com/NidumAI-Inc](https://github.com/NidumAI-Inc)

  ---
  ### Key Features
@@ -81,42 +81,26 @@ The following fine-tuning datasets are leveraged to enhance specific model capab

  ---

- ### Benchmarks
+ ### Benchmarks

  After fine-tuning with **uncensored data**, **Nidum-Llama-3.2-3B** demonstrates **superior performance compared to the original LLaMA model**, particularly in accuracy and handling diverse, unrestricted scenarios.

- #### GPQA: Evaluating Domain Expertise
- We present **GPQA**, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
-
- | **Category**                        | **Metric**             | **LLaMA 3B** | **Nidum 3B** |
- |-------------------------------------|------------------------|--------------|--------------|
- | **gpqa_diamond_cot_n_shot**         | Exact Match (Flexible) | 0            | 0.2          |
- |                                     | Accuracy               | 0.1          | 0.2          |
- | **gpqa_diamond_generative_n_shot**  | Exact Match (Flexible) | 0.3          | 0.5          |
- | **gpqa_diamond_zeroshot**           | Accuracy               | 0.2          | 0.3          |
- | **gpqa_extended_cot_n_shot**        | Exact Match (Flexible) | 0.2          | 0            |
- | **gpqa_extended_cot_zeroshot**      | Exact Match (Flexible) | 0.2          | 0.3          |
- | **gpqa_extended_generative_n_shot** | Exact Match (Flexible) | 0.1          | 0.2          |
- | **gpqa_extended_n_shot**            | Accuracy               | 0.2          | 0.2          |
- | **gpqa_extended_zeroshot**          | Accuracy               | 0.1          | 0.1          |
- | **gpqa_main_cot_n_shot**            | Exact Match (Flexible) | 0            | 0.1          |
- | **gpqa_main_cot_zeroshot**          | Exact Match (Flexible) | 0.2          | 0.2          |
- | **gpqa_main_generative_n_shot**     | Exact Match (Flexible) | 0.2          | 0.2          |
- | **gpqa_main_n_shot**                | Accuracy               | 0.4          | 0.3          |
- | **gpqa_main_zeroshot**              | Accuracy               | 0.3          | 0.4          |
+ #### Benchmark Summary Table

- ---
-
- #### HellaSwag: Common Sense Reasoning Benchmark
+ | **Benchmark** | **Metric**                   | **LLaMA 3B** | **Nidum 3B** | **Observation**                                                                       |
+ |---------------|------------------------------|--------------|--------------|---------------------------------------------------------------------------------------|
+ | **GPQA**      | Exact Match (Flexible)       | 0.3          | 0.5          | Nidum 3B demonstrates significant improvement, particularly in **generative tasks**.   |
+ |               | Accuracy                     | 0.4          | 0.5          | Consistent improvement, especially in **zero-shot** scenarios.                         |
+ | **HellaSwag** | Accuracy                     | 0.3          | 0.4          | Better performance in **common sense reasoning** tasks.                                |
+ |               | Normalized Accuracy          | 0.3          | 0.4          | Enhanced ability to understand and predict context in sentence completion.             |
+ |               | Normalized Accuracy (Stderr) | 0.15275      | 0.1633       | Slightly improved consistency in normalized accuracy.                                  |
+ |               | Accuracy (Stderr)            | 0.15275      | 0.1633       | Shows robustness in reasoning accuracy compared to LLaMA 3B.                           |

- HellaSwag evaluates a language model's ability to reason using common sense through sentence completion tasks.
+ ---

- | **Metric**                     | **Llama 3B** | **Nidum 3B** |
- |--------------------------------|--------------|--------------|
- | **hellaswag/acc**              | 0.3          | 0.4          |
- | **hellaswag/acc_norm**         | 0.3          | 0.4          |
- | **hellaswag/acc_norm_stderr**  | 0.15275      | 0.1633       |
- | **hellaswag/acc_stderr**       | 0.15275      | 0.1633       |
+ ### Insights:
+ 1. **GPQA Results**: Fine-tuning on uncensored data has boosted **Nidum 3B's Exact Match and Accuracy**, particularly excelling in **generative** and **zero-shot** tasks involving domain-specific knowledge.
+ 2. **HellaSwag Results**: **Nidum 3B** consistently outperforms **LLaMA 3B** in **common sense reasoning benchmarks**, indicating enhanced contextual and semantic understanding.

  ---
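The task and metric identifiers in the tables above (`gpqa_main_zeroshot`, `hellaswag/acc_norm`, Exact Match (Flexible)) follow the naming scheme of EleutherAI's lm-evaluation-harness, so the scores were presumably produced with that tool. Below is a minimal sketch of how comparable numbers could be regenerated; the commit itself does not record the evaluation command, and the `pretrained=` repo id is a hypothetical placeholder.

```python
# Minimal sketch, not the project's verified evaluation command.
# Assumes EleutherAI's lm-evaluation-harness (pip install lm-eval);
# the pretrained= repo id below is a hypothetical placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=NidumAI-Inc/Nidum-Llama-3.2-3B,dtype=bfloat16",
    tasks=["gpqa_main_zeroshot", "hellaswag"],  # task names from the tables above
    batch_size=8,
)

# results["results"] maps each task to its metric dictionary,
# e.g. {"hellaswag": {"acc,none": 0.4, "acc_norm,none": 0.4, ...}}
for task, metrics in results["results"].items():
    print(task, metrics)
```

`results["results"]` holds one metrics dict per task, matching the metric names (`acc`, `acc_norm`, `acc_stderr`, `exact_match`) reported in the benchmark tables.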