kgreenewald committed on
Commit 9afaa56 · verified · 1 Parent(s): 30a033c

Update README.md

  1. README.md +3 -2
README.md CHANGED
@@ -161,11 +161,12 @@ The following datasets were used for calibration and/or finetuning.
  ## Evaluation

  The model was evaluated on the [MMLU](https://huggingface.co/datasets/cais/mmlu) datasets (not used in training). Shown are the [Expected Calibration Error (ECE)](https://towardsdatascience.com/expected-calibration-error-ece-a-step-by-step-visual-explanation-with-python-code-c3e9aa12937d) for each task, for the base model (Granite-3.0-8b-instruct) and Granite-Uncertainty-3.0-8b.
- The average ECE across tasks is 0.06 (out of 1). Note that this is smaller than the gap between the quantized certainty outputs (10% quantization steps).
+ The average ECE across tasks for our method is 0.064 (out of 1) and is consistently low across tasks (maximum task ECE 0.10), compared to the base model average ECE of 0.20 and maximum task ECE of 0.60. Note that our ECE of 0.064 is smaller than the gap between the quantized certainty outputs (10% quantization steps). Additionally, the zero-shot performance on the MMLU tasks does not degrade, averaging at 89%.
  <!-- This section describes the evaluation protocols and provides the results. -->


- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/x0IRS16p59O19r4hwOzyU.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/2MwP7DRZlNBtWSKWFvXOI.png)


  ## Model Card Authors
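
For readers unfamiliar with the metric quoted in the changed Evaluation paragraph, here is a minimal sketch of how an ECE value can be computed. It is not the evaluation code behind this commit. It assumes ten equal-width confidence bins (chosen to mirror the 10% quantization steps mentioned in the README), and the function name `expected_calibration_error` and its arguments are hypothetical.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    Hypothetical helper: `confidences` are certainty scores in [0, 1],
    `correct` holds 1 for a correct answer and 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Equal-width bin edges: 0.0, 0.1, ..., 1.0 when n_bins == 10.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-inclusive binning on the interior edges, so 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)


if __name__ == "__main__":
    # Toy data only, not results from the model card.
    conf = [0.9, 0.9, 0.9, 0.9, 0.9, 0.5, 0.5, 0.5, 0.5, 0.5]
    hits = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
    print(expected_calibration_error(conf, hits))
```

A per-task ECE would apply this to each MMLU task separately and then average the results, which is one plausible reading of "average ECE across tasks" in the README.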