paurue committed on
Commit
e06c547
·
verified ·
1 Parent(s): 3ba14a7

Update README.md

  1. README.md +12 -7
README.md CHANGED
@@ -9,19 +9,22 @@ license: apache-2.0
9
  pipeline_tag: text-generation
10
  tags:
11
  - legal
 
 
12
  ---
13
 
14
  # Salamandra 7B aligned EADOP Model Card
15
- Salamandra 7B aligned EADOP is a full finetuning version of
16
  [BSC Language Technologies Unit](https://huggingface.co/BSC-LT)'s
17
  [Salamandra Instruct 7B](https://huggingface.co/BSC-LT/salamandra-7b-instruct)
18
- model by the at the Barcelona Supercomputing Center focused on improving
19
  the handling of out-of-domain Questions in a RAG instruction-following setting.
20
 
21
- The model has been finetuned on a dataset dataset consisting of 2,000+ human annotated in-
22
- and out-of-domain user messages and assitant responses in the context of a chatbot that can
23
  provide helpful information about the current Catalan legislation.
24
- The dataset [Link Pending] was collected in collaboration with the
 
25
  [Entitat Autònoma del Diari Oficial i de Publicacions (EADOP)](https://dogc.gencat.cat/ca/sobre-el-dogc/eadop/)
26
  and it consists of user messages and assistant responses in Catalan and Spanish.
27
 
@@ -76,20 +79,22 @@ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)
76
 
77
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
78
  ```
 
79
  Using this template, each turn is preceded by a `<|im_start|>` delimiter and the role of the entity
80
  (either `user`, for content supplied by the user, or `assistant` for LLM responses), and finished with the `<|im_end|>` token.
81
 
82
  ---
83
 
84
  ## Finetuning Data
85
- Please refer to [Link Pending]
86
 
87
 
88
  ### Author
89
  This model has been finetuned by [Alinia AI](https://alinia.ai/).
90
 
 
91
  ### Contact
92
- For further information, please send an email to [[email protected]](mailto:[email protected]).
93
 
94
 
95
  ### Acknowledgements
 
9
  pipeline_tag: text-generation
10
  tags:
11
  - legal
12
+ datasets:
13
+ - alinia/EADOP-RAG-out-of-domain
14
  ---
15
 
16
  # Salamandra 7B aligned EADOP Model Card
17
+ Salamandra 7B aligned EADOP is a full-finetuning version of
18
  [BSC Language Technologies Unit](https://huggingface.co/BSC-LT)'s
19
  [Salamandra Instruct 7B](https://huggingface.co/BSC-LT/salamandra-7b-instruct)
20
+ model at the Barcelona Supercomputing Center, focused on improving
21
  the handling of out-of-domain Questions in a RAG instruction-following setting.
22
 
23
+ The model has been finetuned on a dataset consisting of 2,000+ human annotated in-
24
+ and out-of-domain user messages and assistant responses in the context of a chatbot that can
25
  provide helpful information about the current Catalan legislation.
26
+ The dataset [alinia/EADOP-RAG-out-of-domain](https://huggingface.co/datasets/alinia/EADOP-RAG-out-of-domain)
27
+ was collected in collaboration with the
28
  [Entitat Autònoma del Diari Oficial i de Publicacions (EADOP)](https://dogc.gencat.cat/ca/sobre-el-dogc/eadop/)
29
  and it consists of user messages and assistant responses in Catalan and Spanish.
30
 
 
79
 
80
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
81
  ```
82
+
83
  Using this template, each turn is preceded by a `<|im_start|>` delimiter and the role of the entity
84
  (either `user`, for content supplied by the user, or `assistant` for LLM responses), and finished with the `<|im_end|>` token.
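
  As a rough illustration of the template described above, the sketch below builds the prompt string by hand. The helper name `format_chatml`, the newline placement after the role and after `<|im_end|>`, and the trailing open assistant header are assumptions based on the common ChatML layout, not details this README specifies; the tokenizer's own chat template remains the reference.

```python
# Hypothetical helper, for illustration only: the README states that each turn
# starts with <|im_start|> plus the role and ends with <|im_end|>. The exact
# newline placement below is an assumption taken from the usual ChatML layout.
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts as one prompt string."""
    parts = []
    for message in messages:
        # Open the turn with the delimiter and role, then close it with <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = format_chatml([{"role": "user", "content": "Hola"}])
    print(prompt)
```

  In practice `tokenizer.apply_chat_template` should produce the prompt; a hand-written formatter like this is mainly useful for inspecting or debugging what the model receives.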
85
 
86
  ---
87
 
88
  ## Finetuning Data
89
+ Please refer to [alinia/EADOP-RAG-out-of-domain](https://huggingface.co/datasets/alinia/EADOP-RAG-out-of-domain) for the Dataset Card.
90
 
91
 
92
  ### Author
93
  This model has been finetuned by [Alinia AI](https://alinia.ai/).
94
 
95
+
96
  ### Contact
97
+ For further information, please email [[email protected]](mailto:[email protected]).
98
 
99
 
100
  ### Acknowledgements