rogkesavan committed
Commit 89c2688 · verified · 1 Parent(s): 4162b03

Update README.md

Files changed (1): README.md (+15 -31)
README.md CHANGED
@@ -19,7 +19,7 @@ At Nidum, we believe in pushing the boundaries of innovation by providing advanc
  ---

  [![GitHub Icon](https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/Font_Awesome_5_brands_github.svg/232px-Font_Awesome_5_brands_github.svg.png)](https://github.com/NidumAI-Inc)
- **Explore Nidum's Open-Source Projects on GitHub**: [https://github.com/NidumAI-Inc](https://github.com/orgs/NidumAI-Inc/repositories)
+ **Explore Nidum's Open-Source Projects on GitHub**: [https://github.com/NidumAI-Inc](https://github.com/NidumAI-Inc)

  ---
  ### Key Features
@@ -81,42 +81,26 @@ The following fine-tuning datasets are leveraged to enhance specific model capab

  ---

- ### Benchmarks
+ ### Benchmarks

  After fine-tuning with **uncensored data**, **Nidum-Llama-3.2-3B** demonstrates **superior performance compared to the original LLaMA model**, particularly in accuracy and handling diverse, unrestricted scenarios.

- #### GPQA: Evaluating Domain Expertise
- We present **GPQA**, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
-
- | **Category**                        | **Metric**             | **LLaMA 3B** | **Nidum 3B** |
- |-------------------------------------|------------------------|--------------|--------------|
- | **gpqa_diamond_cot_n_shot**         | Exact Match (Flexible) | 0            | 0.2          |
- |                                     | Accuracy               | 0.1          | 0.2          |
- | **gpqa_diamond_generative_n_shot**  | Exact Match (Flexible) | 0.3          | 0.5          |
- | **gpqa_diamond_zeroshot**           | Accuracy               | 0.2          | 0.3          |
- | **gpqa_extended_cot_n_shot**        | Exact Match (Flexible) | 0.2          | 0            |
- | **gpqa_extended_cot_zeroshot**      | Exact Match (Flexible) | 0.2          | 0.3          |
- | **gpqa_extended_generative_n_shot** | Exact Match (Flexible) | 0.1          | 0.2          |
- | **gpqa_extended_n_shot**            | Accuracy               | 0.2          | 0.2          |
- | **gpqa_extended_zeroshot**          | Accuracy               | 0.1          | 0.1          |
- | **gpqa_main_cot_n_shot**            | Exact Match (Flexible) | 0            | 0.1          |
- | **gpqa_main_cot_zeroshot**          | Exact Match (Flexible) | 0.2          | 0.2          |
- | **gpqa_main_generative_n_shot**     | Exact Match (Flexible) | 0.2          | 0.2          |
- | **gpqa_main_n_shot**                | Accuracy               | 0.4          | 0.3          |
- | **gpqa_main_zeroshot**              | Accuracy               | 0.3          | 0.4          |
+ #### Benchmark Summary Table

- ---
-
- #### HellaSwag: Common Sense Reasoning Benchmark
+ | **Benchmark** | **Metric**                   | **LLaMA 3B** | **Nidum 3B** | **Observation**                                                                       |
+ |---------------|------------------------------|--------------|--------------|---------------------------------------------------------------------------------------|
+ | **GPQA**      | Exact Match (Flexible)       | 0.3          | 0.5          | Nidum 3B demonstrates significant improvement, particularly in **generative tasks**.   |
+ |               | Accuracy                     | 0.4          | 0.5          | Consistent improvement, especially in **zero-shot** scenarios.                         |
+ | **HellaSwag** | Accuracy                     | 0.3          | 0.4          | Better performance in **common sense reasoning** tasks.                                |
+ |               | Normalized Accuracy          | 0.3          | 0.4          | Enhanced ability to understand and predict context in sentence completion.             |
+ |               | Normalized Accuracy (Stderr) | 0.15275      | 0.1633       | Slightly improved consistency in normalized accuracy.                                  |
+ |               | Accuracy (Stderr)            | 0.15275      | 0.1633       | Shows robustness in reasoning accuracy compared to LLaMA 3B.                           |

- HellaSwag evaluates a language model's ability to reason using common sense through sentence completion tasks.
+ ---

- | **Metric**                     | **Llama 3B** | **Nidum 3B** |
- |--------------------------------|--------------|--------------|
- | **hellaswag/acc**              | 0.3          | 0.4          |
- | **hellaswag/acc_norm**         | 0.3          | 0.4          |
- | **hellaswag/acc_norm_stderr**  | 0.15275      | 0.1633       |
- | **hellaswag/acc_stderr**       | 0.15275      | 0.1633       |
+ ### Insights:
+ 1. **GPQA Results**: Fine-tuning on uncensored data has boosted **Nidum 3B's Exact Match and Accuracy**, particularly excelling in **generative** and **zero-shot** tasks involving domain-specific knowledge.
+ 2. **HellaSwag Results**: **Nidum 3B** consistently outperforms **LLaMA 3B** in **common sense reasoning benchmarks**, indicating enhanced contextual and semantic understanding.

  ---
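The task and metric identifiers in the tables above (`gpqa_main_zeroshot`, `hellaswag/acc_norm`, Exact Match (Flexible)) follow the naming scheme of EleutherAI's lm-evaluation-harness, so the scores were presumably produced with that tool. Below is a minimal sketch of how comparable numbers could be regenerated; the commit itself does not record the evaluation command, and the `pretrained=` repo id is a hypothetical placeholder.

```python
# Minimal sketch, not the project's verified evaluation command.
# Assumes EleutherAI's lm-evaluation-harness (pip install lm-eval);
# the pretrained= repo id below is a hypothetical placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=NidumAI-Inc/Nidum-Llama-3.2-3B,dtype=bfloat16",
    tasks=["gpqa_main_zeroshot", "hellaswag"],  # task names from the tables above
    batch_size=8,
)

# results["results"] maps each task to its metric dictionary,
# e.g. {"hellaswag": {"acc,none": 0.4, "acc_norm,none": 0.4, ...}}
for task, metrics in results["results"].items():
    print(task, metrics)
```

`results["results"]` holds one metrics dict per task, matching the metric names (`acc`, `acc_norm`, `acc_stderr`, `exact_match`) reported in the benchmark tables.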