anamikac2708 commited on
Commit
f50a657
·
verified ·
1 Parent(s): bb41449

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -99,7 +99,7 @@ QLORA paper link - https://arxiv.org/abs/2305.14314
99
  ## Evaluation
100
 
101
  <!-- This section describes the evaluation protocols and provides the results. -->
102
- We evaluated the model on test set (sample 1k) https://huggingface.co/datasets/FinLang/investopedia-instruction-tuning-dataset. Evaluation was done using Proprietary LLMs as judge on four criteria Correctness, Faithfullness, Clarity, Completeness on scale of 1-5 (1 being worst & 5 being best) inspired by the paper Replacing Judges with Juries https://arxiv.org/abs/2404.18796. Model got an average score of 4.67.
103
  Average inference speed of the model is 10.96 secs. Human Evaluation is in progress to see the percentage of alignment between human and LLM.
104
 
105
 
 
99
  ## Evaluation
100
 
101
  <!-- This section describes the evaluation protocols and provides the results. -->
102
+ We evaluated the model on test set (sample 1k) https://huggingface.co/datasets/FinLang/investopedia-instruction-tuning-dataset. Evaluation was done using Proprietary LLMs as jury on four criteria Correctness, Faithfullness, Clarity, Completeness on scale of 1-5 (1 being worst & 5 being best) inspired by the paper Replacing Judges with Juries https://arxiv.org/abs/2404.18796. Model got an average score of 4.67.
103
  Average inference speed of the model is 10.96 secs. Human Evaluation is in progress to see the percentage of alignment between human and LLM.
104
 
105