alinia/salamandra-7b-aligned-EADOP

@@ -9,19 +9,22 @@ license: apache-2.0
 pipeline_tag: text-generation
 tags:
 - legal
 ---
 # Salamandra 7B aligned EADOP Model Card
-Salamandra 7B aligned EADOP is a full finetuning version of
 [BSC Language Technologies Unit](https://huggingface.co/BSC-LT)'s
 [Salamandra Instruct 7B](https://huggingface.co/BSC-LT/salamandra-7b-instruct)
-model by the  at the Barcelona Supercomputing Center focused on improving
 the handling of out-of-domain questions in a RAG instruction-following setting.
-The model has been finetuned on a dataset dataset consisting of 2,000+ human annotated in-
-and out-of-domain user messages and assitant responses in the context of a chatbot that can
 provide helpful information about the current Catalan legislation.
-The dataset [Link Pending] was collected in collaboration with the
 [Entitat Autònoma del Diari Oficial i de Publicacions (EADOP)](https://dogc.gencat.cat/ca/sobre-el-dogc/eadop/)
 and it consists of user messages and assistant responses in Catalan and Spanish.
@@ -76,20 +79,22 @@ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 Using this template, each turn is preceded by a `<|im_start|>` delimiter and the role of the entity
 (either `user`, for content supplied by the user, or `assistant` for LLM responses), and finished with the `<|im_end|>` token.
 ---
 ## Finetuning Data
-Please refer to [Link Pending]
 ### Author
 This model has been finetuned by [Alinia AI](https://alinia.ai/).
 ### Contact
-For further information, please send an email to [[email protected]](mailto:[email protected]).
 ### Acknowledgements

 pipeline_tag: text-generation
 tags:
 - legal
+datasets:
+- alinia/EADOP-RAG-out-of-domain
 ---
 # Salamandra 7B aligned EADOP Model Card
+Salamandra 7B aligned EADOP is a full-finetuning version of
 [BSC Language Technologies Unit](https://huggingface.co/BSC-LT)'s
 [Salamandra Instruct 7B](https://huggingface.co/BSC-LT/salamandra-7b-instruct)
+model at the Barcelona Supercomputing Center, focused on improving
 the handling of out-of-domain questions in a RAG instruction-following setting.
+The model has been finetuned on a dataset consisting of 2,000+ human-annotated in-
+and out-of-domain user messages and assistant responses in the context of a chatbot that can
 provide helpful information about the current Catalan legislation.
+The dataset [alinia/EADOP-RAG-out-of-domain](https://huggingface.co/datasets/alinia/EADOP-RAG-out-of-domain)
+was collected in collaboration with the
 [Entitat Autònoma del Diari Oficial i de Publicacions (EADOP)](https://dogc.gencat.cat/ca/sobre-el-dogc/eadop/)
 and it consists of user messages and assistant responses in Catalan and Spanish.
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 Using this template, each turn is preceded by a `<|im_start|>` delimiter and the role of the entity
 (either `user`, for content supplied by the user, or `assistant` for LLM responses), and finished with the `<|im_end|>` token.
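The turn layout described above can be sketched in a few lines of Python. This is only an illustration, not the model's official template: the helper name `format_chatml` is hypothetical, and the exact newline placement (one after the role, one after `<|im_end|>`) is an assumption based on the common ChatML convention. In practice the chat template shipped with the tokenizer should be preferred.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as ChatML-style text.

    Hypothetical helper for illustration; newline placement is assumed.
    """
    parts = []
    for message in messages:
        # Each turn: <|im_start|> + role, then the content, closed by <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    conversation = [{"role": "user", "content": "Quina és la llei vigent?"}]
    print(format_chatml(conversation))
```

Building the string by hand like this is mainly useful for inspecting what the model actually receives; any mismatch with the tokenizer's own template is likely to degrade generation quality.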
 ---
 ## Finetuning Data
+Please refer to the [alinia/EADOP-RAG-out-of-domain](https://huggingface.co/datasets/alinia/EADOP-RAG-out-of-domain) dataset card.
 ### Author
 This model has been finetuned by [Alinia AI](https://alinia.ai/).
 ### Contact
+For further information, please email [[email protected]](mailto:[email protected]).
 ### Acknowledgements