---
language:
  - de
  - fr
  - it
  - multilingual
license: cc-by-nc-sa-4.0
tags:
  - legal
  - fairlex
pipeline_tag: fill-mask
widget:
  - text: >-
      Aus seinem damaligen strafbaren Verhalten resultierte eine Forderung der
      Nachlassverwaltung eines <mask>, worüber eine aussergerichtliche
      Vereinbarung über Fr. 500'000.
  - text: ' Elle avait pour but social les <mask> dans le domaine des changes, en particulier l''exploitation d''une plateforme internet.'
  - text: >-
      Il Pretore ha accolto la petizione con sentenza 16 luglio 2015, accordando
      all'attore l'importo <mask>, con interessi di mora a partire dalla
      notifica del precetto esecutivo, e ha rigettato in tale misura
      l'opposizione interposta a quest'ultimo.
---

FairLex: A multilingual benchmark for evaluating fairness in legal text processing

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained legal language models and of the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (Council of Europe, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, nationality/region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques. We show that performance disparities across groups are pronounced in many cases, and that none of these techniques guarantees fairness or consistently mitigates group disparities. Furthermore, we provide a quantitative and qualitative analysis of our results, highlighting open challenges in the development of robustness methods in legal NLP.
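To make "group disparity" concrete, the sketch below scores a classifier separately on each group of a protected attribute (for example, gender) and reports the gap between the best-served and worst-served groups. Several parts of it are illustrative assumptions and are not the exact measures used in the paper, which may rely on other metrics:

- the metric, which is per-group accuracy plus a max-minus-min gap;
- the helper name `group_disparity`;
- the toy labels.

```python
from collections import defaultdict


def group_disparity(y_true, y_pred, groups):
    """Per-group accuracy and the best-minus-worst gap (hypothetical helper).

    y_true, y_pred: gold and predicted labels, one per example.
    groups: the protected-attribute value of each example (e.g. "a"/"b").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(gold == pred)
    # Accuracy computed separately within each group
    per_group = {g: correct[g] / total[g] for g in total}
    # Disparity: how far the worst-served group lags behind the best-served one
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


# Toy data: the model is right 3/4 times for group "a" but only 1/4 for "b"
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
per_group, gap = group_disparity(y_true, y_pred, groups)
print(per_group, gap)  # {'a': 0.75, 'b': 0.25} 0.5
```

Overall accuracy here is 4/8 = 0.5, and that single number hides how much worse group "b" is served. A large gap of this kind is what a fairness benchmark aims to surface.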


Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, and Anders Søgaard. 2022. FairLex: A multilingual benchmark for evaluating fairness in legal text processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland.


Pre-training details

For the purpose of this work, we release four domain-specific BERT models with continued pre-training on the corpora of the examined datasets (ECtHR, SCOTUS, FSCS, SPC). We train mini-sized BERT models with 6 Transformer blocks, 384 hidden units, and 12 attention heads. We warm-start all models from the public MiniLMv2 checkpoints (Wang et al., 2021). For the English datasets (ECtHR, SCOTUS), we use the checkpoint distilled from RoBERTa (Liu et al., 2019). For the rest (trilingual FSCS and Chinese SPC), we use the one distilled from XLM-R (Conneau et al., 2020).
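The hyperparameters above fix the shape of each encoder layer. The rough sketch below derives the per-head dimension and an approximate parameter count for the Transformer blocks. Embeddings are excluded because they depend on the vocabulary. The sketch makes these assumptions:

- a standard BERT-style layer layout;
- a feed-forward (intermediate) size of 4 × 384 = 1536, which the card does not state;
- the hypothetical names `MiniConfig` and `encoder_params`.

```python
from dataclasses import dataclass


@dataclass
class MiniConfig:
    num_layers: int = 6            # Transformer blocks (from the card)
    hidden_size: int = 384         # hidden units (from the card)
    num_heads: int = 12            # attention heads (from the card)
    intermediate_size: int = 1536  # ASSUMPTION: 4 x hidden_size, not stated in the card

    @property
    def head_dim(self) -> int:
        # Each head attends in a hidden_size / num_heads dimensional subspace
        if self.hidden_size % self.num_heads:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


def encoder_params(cfg: MiniConfig) -> int:
    """Approximate parameter count of the Transformer blocks only (no embeddings)."""
    h, i = cfg.hidden_size, cfg.intermediate_size
    attention = 4 * (h * h + h)               # Q, K, V and output projections (weights + biases)
    feed_forward = (h * i + i) + (i * h + h)  # up-projection and down-projection
    layer_norms = 2 * (2 * h)                 # two LayerNorms per block, each with gain and bias
    return cfg.num_layers * (attention + feed_forward + layer_norms)


cfg = MiniConfig()
print(cfg.head_dim)         # 32
print(encoder_params(cfg))  # 10646784
```

Under these assumptions, roughly 10.6M parameters sit in the Transformer blocks. If the XLM-R-based models keep XLM-R's vocabulary of about 250k tokens, the embedding matrix alone (vocabulary size × 384) would add about 96M more.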

Models list

| Model name | Training corpora | Language |
|---|---|---|
| coastalcph/fairlex-ecthr-minilm | ECtHR | en |
| coastalcph/fairlex-scotus-minilm | SCOTUS | en |
| coastalcph/fairlex-fscs-minilm | FSCS | de, fr, it |
| coastalcph/fairlex-cail-minilm | CAIL | zh |

Load Pretrained Model

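```python
from transformers import AutoTokenizer, AutoModel

# Tokenizer and encoder with continued pre-training on the FSCS corpus (de, fr, it)
tokenizer = AutoTokenizer.from_pretrained("coastalcph/fairlex-fscs-minilm")
model = AutoModel.from_pretrained("coastalcph/fairlex-fscs-minilm")
```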
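`AutoModel` returns one vector per token in `last_hidden_state`, with shape batch × tokens × 384. You may need a single vector per document, for example as features for a downstream classifier. One common choice, assumed here rather than prescribed by this card, is to average the token vectors while skipping padding positions. The tokenizer's `attention_mask` marks those positions with 0. The pure-Python sketch below shows the computation on toy numbers. `mean_pool` is a hypothetical helper.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    hidden_states: [batch][tokens][hidden] nested lists of floats
    attention_mask: [batch][tokens] of 1 (real token) or 0 (padding)
    """
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        sums = [0.0] * len(tokens[0])
        count = 0
        for vector, keep in zip(tokens, mask):
            if keep:  # only real tokens contribute to the average
                sums = [s + v for s, v in zip(sums, vector)]
                count += 1
        # max(count, 1) guards against an all-padding row
        pooled.append([s / max(count, 1) for s in sums])
    return pooled


# Toy batch: 2 sequences, 3 positions, hidden size 2; the second ends in padding
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 0.0], [100.0, 100.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
print(mean_pool(hidden, mask))  # [[3.0, 4.0], [3.0, 1.0]]
```

The same computation also works with PyTorch tensors:

1. Tokenize the batch with `inputs = tokenizer(texts, padding=True, return_tensors="pt")`.
2. Run `outputs = model(**inputs)`.
3. Set `h = outputs.last_hidden_state` and `m = inputs["attention_mask"].unsqueeze(-1)`.
4. Compute `(h * m).sum(1) / m.sum(1).clamp(min=1)`.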

Evaluation on downstream tasks

For results on downstream tasks, see the experiments in the article:

Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, and Anders Søgaard. 2022. FairLex: A multilingual benchmark for evaluating fairness in legal text processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland.

Author - Publication

@inproceedings{chalkidis-2022-fairlex,
author={Chalkidis, Ilias and Pasini, Tommaso and Zhang, Sheng and
        Tomada, Letizia and Schwemer, Sebastian Felix and Søgaard, Anders},
title={FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing},
booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
year={2022},
address={Dublin, Ireland}
}

Ilias Chalkidis on behalf of CoAStaL NLP Group

GitHub: @ilias.chalkidis | Twitter: @KiddoThe2B