---
language: es
tags:
- Spanish
- BART
- Legal
thumbnail: https://huggingface.co/mrm8488/bart-legal-base-es/resolve/main/bart_legal_logo-min.png
datasets:
- Spanish-legal-corpora
---
## BART Legal Spanish ⚖️

**BART Legal Spanish** (base) is a BART-like model trained on a [collection of Spanish legal-domain corpora](https://zenodo.org/record/5495529#.YZItp3vMLJw).

BART is a transformer *encoder-decoder* (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function and (2) learning a model to reconstruct the original text.

BART is particularly effective when fine-tuned for text generation tasks (e.g., summarization, translation), but it also works well for comprehension tasks (e.g., text classification, question answering).

## Training details

- Dataset: `Spanish-legal-corpora`, split 90% for training / 10% for validation.
- Training script: see [here](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_bart_dlm_flax.py)

## [Evaluation metrics](https://huggingface.co/mrm8488/bart-legal-base-es/tensorboard?params=scalars#frame) 🧾

| Metric   | Value |
|----------|-------|
| Accuracy | 0.86  |
| Loss     | 0.68  |

## Benchmarks 🔨

WIP 🚧

## How to use with `transformers`

```py
from transformers import BartForConditionalGeneration, BartTokenizer

model_id = "mrm8488/bart-legal-base-es"

model = BartForConditionalGeneration.from_pretrained(model_id, forced_bos_token_id=0)
tokenizer = BartTokenizer.from_pretrained(model_id)


def fill_mask_span(text):
    # Encode the masked text, let the model regenerate it, and print the decoded result.
    batch = tokenizer(text, return_tensors="pt")
    generated_ids = model.generate(batch["input_ids"])
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))


# "<mask>" marks the span the model should fill in.
text = "Los españoles son <mask> ante la ley."
fill_mask_span(text)
# Output: ['Los españoles son iguales ante la ley.1.ª y 2.ª ante la']

text = "Los proyectos de reforma Constitucional deberán <mask> por una mayoría de tres quintos de cada una de las Cámaras."
fill_mask_span(text)
# Output: ['Los proyectos de reforma Constitucional deberán ser aprobados por una mayoría de tres quintos de cada']
```

## Acknowledgments

- [Narrativa](https://www.narrativa.com/)
- [QBlocks](https://www.qblocks.cloud/)
- [jarvislabs](https://jarvislabs.ai/)

## Citation

If you want to cite this model, you can use this:

```bibtex
@misc {manuel_romero_2023,
	author       = { {Manuel Romero} },
	title        = { bart-legal-base-es (Revision c33ed22) },
	year         = 2023,
	url          = { https://huggingface.co/mrm8488/bart-legal-base-es },
	doi          = { 10.57967/hf/0472 },
	publisher    = { Hugging Face }
}
```

> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)

> Made with ♥ in Spain
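
## Appendix: sketch of the denoising objective

The introduction of this card describes BART pre-training as corrupting text and then learning to reconstruct it. The toy function below illustrates one such corruption, text infilling, where a span of tokens is replaced by a single `<mask>` token. It is a simplified sketch, not the code of the training script linked above: the names `infill_noise`, `mask_ratio` and `max_span` are hypothetical, whitespace splitting stands in for the real tokenizer, and span lengths are drawn uniformly rather than from the Poisson distribution used in the BART paper.

```python
import random

MASK = "<mask>"


def infill_noise(tokens, mask_ratio=0.3, max_span=5, seed=0):
    """Replace random token spans with one mask token each (toy text infilling)."""
    rng = random.Random(seed)
    budget = int(round(len(tokens) * mask_ratio))  # how many tokens may be hidden
    noisy = []
    i = 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            # Hide a span of at least one token behind a single mask token.
            span = min(budget, rng.randint(1, max_span), len(tokens) - i)
            noisy.append(MASK)
            i += span
            budget -= span
        else:
            noisy.append(tokens[i])
            i += 1
    return noisy


if __name__ == "__main__":
    original = "Los españoles son iguales ante la ley".split()
    corrupted = infill_noise(original, seed=1)
    # The model would see `corrupted` as input and `original` as the target.
    print(" ".join(corrupted))
```

Because one mask can hide several tokens, the model has to infer both the missing content and its length, which matches the way `fill_mask_span` is used in the usage section.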