|
--- |
|
license: apache-2.0 |
|
tags: |
|
- JJhooww/Mistral-7B-v0.2-Base_ptbr |
|
- J-LAB/BRisa |
|
model-index: |
|
- name: BRisa-7B-Instruct-v0.2 |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: ENEM Challenge (No Images) |
|
type: eduagarcia/enem_challenge |
|
split: train |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc |
|
value: 65.08 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BLUEX (No Images) |
|
type: eduagarcia-temp/BLUEX_without_images |
|
split: train |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc |
|
value: 53.69 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: OAB Exams |
|
type: eduagarcia/oab_exams |
|
split: train |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc |
|
value: 43.37 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: Assin2 RTE |
|
type: assin2 |
|
split: test |
|
args: |
|
num_few_shot: 15 |
|
metrics: |
|
- type: f1_macro |
|
value: 91.5 |
|
name: f1-macro |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: Assin2 STS |
|
type: eduagarcia/portuguese_benchmark |
|
split: test |
|
args: |
|
num_few_shot: 15 |
|
metrics: |
|
- type: pearson |
|
value: 73.61 |
|
name: pearson |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: FaQuAD NLI |
|
type: ruanchaves/faquad-nli |
|
split: test |
|
args: |
|
num_few_shot: 15 |
|
metrics: |
|
- type: f1_macro |
|
value: 68.31 |
|
name: f1-macro |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: HateBR Binary |
|
type: ruanchaves/hatebr |
|
split: test |
|
args: |
|
num_few_shot: 25 |
|
metrics: |
|
- type: f1_macro |
|
value: 74.28 |
|
name: f1-macro |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: PT Hate Speech Binary |
|
type: hate_speech_portuguese |
|
split: test |
|
args: |
|
num_few_shot: 25 |
|
metrics: |
|
- type: f1_macro |
|
value: 65.12 |
|
name: f1-macro |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: tweetSentBR |
|
type: eduagarcia/tweetsentbr_fewshot |
|
split: test |
|
args: |
|
num_few_shot: 25 |
|
metrics: |
|
- type: f1_macro |
|
value: 60.77 |
|
name: f1-macro |
|
source: |
|
url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=J-LAB/BRisa-7B-Instruct-v0.2 |
|
name: Open Portuguese LLM Leaderboard |
|
--- |
|
|
|
# BRisa 7B Instruct |
|
|
|
This is an instruction-tuned model trained for strong performance in Portuguese. Its starting point is the Mistral-7B-v0.2 model ([source](https://huggingface.co/mistral-community/Mistral-7B-v0.2)); we used the JJhooww/Mistral-7B-v0.2-Base_ptbr version, which was further pre-trained on 1 billion Portuguese tokens ([source](https://huggingface.co/JJhooww/Mistral-7B-v0.2-Base_ptbr)).
|
|
|
The base model performs well in Portuguese but struggles to follow instructions. We therefore took mistralai/Mistral-7B-Instruct-v0.2, fine-tuned it for responses in Portuguese, and then merged it with the base model [JJhooww/Mistral-7B-v0.2-Base_ptbr](https://huggingface.co/JJhooww/Mistral-7B-v0.2-Base_ptbr).
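The merge step described above can be sketched with a tool such as [mergekit](https://github.com/arcee-ai/mergekit). The actual merge method and weights used for BRisa were not published, so the config below is purely illustrative; SLERP with a 0.5 interpolation factor is shown as one common choice, not as the method actually used:

```yaml
# Illustrative only: the real merge method/weights for BRisa were not published.
slices:
  - sources:
      - model: JJhooww/Mistral-7B-v0.2-Base_ptbr
        layer_range: [0, 32]
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
merge_method: slerp
base_model: JJhooww/Mistral-7B-v0.2-Base_ptbr
parameters:
  t: 0.5        # interpolation factor between the two parents (assumed)
dtype: bfloat16
```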
|
|
|
- **Developed by:** [J-LAB](https://huggingface.co/J-LAB/)
|
- **Language(s) (NLP):** Portuguese |
|
- **License:** Apache-2.0
|
- **Finetuned from model:** [JJhooww/Mistral-7B-v0.2-Base_ptbr](https://huggingface.co/JJhooww/Mistral-7B-v0.2-Base_ptbr)
|
|
|
### Model Sources |
|
|
|
- **Demo:** [Demo of the DPO version](https://huggingface.co/spaces/J-LAB/BRisa-7B)
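A minimal usage sketch follows (a hypothetical snippet, not an official one): it assumes the `transformers` library is installed, and the prompt builder mirrors the Mistral-7B-Instruct chat format that this model inherits from its mistralai/Mistral-7B-Instruct-v0.2 parent.

```python
def build_prompt(messages):
    """Render a list of {"role": ..., "content": ...} turns into a
    Mistral-style prompt. The tokenizer prepends the <s> BOS token
    itself, so it is not included here."""
    prompt = ""
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f" {msg['content']}</s>"
    return prompt

def generate(prompt, model_id="J-LAB/BRisa-7B-Instruct-v0.2", max_new_tokens=256):
    """Load the model and generate a completion (downloads ~14 GB of
    weights on first use, so a GPU and ample disk space are assumed)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# Example:
# prompt = build_prompt([{"role": "user", "content": "Quem foi Machado de Assis?"}])
# print(generate(prompt))
```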
|
|
|
|
|
# Open Portuguese LLM Leaderboard Evaluation Results |
|
|
|
Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/J-LAB/BRisa-7B-Instruct-v0.2) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
|
|
|
| Metric                     | Value   |
|----------------------------|---------|
| Average                    |**66.19**|
| ENEM Challenge (No Images) | 65.08   |
| BLUEX (No Images)          | 53.69   |
| OAB Exams                  | 43.37   |
| Assin2 RTE                 | 91.50   |
| Assin2 STS                 | 73.61   |
| FaQuAD NLI                 | 68.31   |
| HateBR Binary              | 74.28   |
| PT Hate Speech Binary      | 65.12   |
| tweetSentBR                | 60.77   |
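For reference, the leaderboard average reported above is reproduced by the unweighted mean of the nine benchmark scores:

```python
# Reproduce the leaderboard "Average" as the unweighted mean of the nine scores.
scores = {
    "ENEM Challenge (No Images)": 65.08,
    "BLUEX (No Images)": 53.69,
    "OAB Exams": 43.37,
    "Assin2 RTE": 91.50,
    "Assin2 STS": 73.61,
    "FaQuAD NLI": 68.31,
    "HateBR Binary": 74.28,
    "PT Hate Speech Binary": 65.12,
    "tweetSentBR": 60.77,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 66.19
```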
|
|
|
|