kiddothe2b committed on
Commit
e66ed6a
·
1 Parent(s): 5134b8b

initial commit

  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
1
+ ---
2
+ language: zh
3
+ pipeline_tag: fill-mask
4
+ license: cc-by-nc-sa-4.0
5
+ tags:
6
+ - legal
7
+ - fairlex
8
+ widget:
9
+ - text: "上述事实,被告人在庭审过程中亦无异议,且有<mask>的陈述,现场辨认笔录及照片,被告人的前科刑事判决书,释放证明材料,抓获经过,被告人的供述及身份证明等证据证实,足以认定。"
10
+ ---
11
+
12
+ # FairLex: A multilingual benchmark for evaluating fairness in legal text processing
13
+
14
+ We present a benchmark suite of four datasets for evaluating the fairness of pre-trained legal language models and the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (Council of Europe, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, nationality/region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques and show that group performance disparities are pronounced in many cases, while none of these techniques guarantees fairness or consistently mitigates group disparities. Furthermore, we provide a quantitative and qualitative analysis of our results, highlighting open challenges in the development of robustness methods in legal NLP.
15
+
16
+ ---
17
+
18
+ Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, and Anders Søgaard. 2022. FairLex: A multilingual benchmark for evaluating fairness in legal text processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland.
19
+
20
+ ---
21
+
22
+ ## Pre-training details
23
+
24
+ For the purpose of this work, we release four domain-specific BERT models with continued pre-training on the corpora of the examined datasets (ECtHR, SCOTUS, FSCS, SPC).
25
+ We train mini-sized BERT models with 6 Transformer blocks, 384 hidden units, and 12 attention heads.
26
+ We warm-start all models from the public MiniLMv2 (Wang et al., 2021) checkpoints: the version distilled from RoBERTa (Liu et al., 2019) for the English datasets (ECtHR, SCOTUS),
27
+ and the one distilled from XLM-R (Conneau et al., 2021) for the rest (trilingual FSCS and Chinese SPC).
28
+
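To make the architecture figures above concrete, here is a minimal sketch that derives the per-head dimension and a rough encoder parameter count from them. The class name `MiniEncoderConfig` is hypothetical, and the feed-forward size of 1536 (4 x the hidden size) is an assumption that this README does not state; embeddings and any task head are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniEncoderConfig:
    """Hypothetical container for the sizes quoted in this section."""

    num_layers: int = 6            # Transformer blocks (from the README)
    hidden_size: int = 384         # hidden units (from the README)
    num_heads: int = 12            # attention heads (from the README)
    intermediate_size: int = 1536  # ASSUMPTION: 4 x hidden_size, not stated in the README

    @property
    def head_dim(self) -> int:
        """Dimension of each attention head."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def encoder_params(self) -> int:
        """Weights and biases in the stacked encoder layers (no embeddings)."""
        h, i = self.hidden_size, self.intermediate_size
        attention = 4 * (h * h + h)               # Q, K, V and output projections
        feed_forward = (h * i + i) + (i * h + h)  # two dense layers
        layer_norms = 2 * (2 * h)                 # scale and shift, two norms per block
        return self.num_layers * (attention + feed_forward + layer_norms)


config = MiniEncoderConfig()
print(config.head_dim, config.encoder_params())
```

Under these assumptions each head has 32 dimensions and the encoder stack holds roughly 10.6M parameters; the embedding matrix, which depends on the vocabulary of the RoBERTa or XLM-R teacher, comes on top of that.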
29
+ ## Models list
30
+
31
+ | Model name                         | Training corpora | Language           |
32
+ |------------------------------------|------------------|--------------------|
33
+ | `coastalcph/fairlex-ecthr-minilm`  | ECtHR            | `en`               |
34
+ | `coastalcph/fairlex-scotus-minilm` | SCOTUS           | `en`               |
35
+ | `coastalcph/fairlex-fscs-minilm`   | FSCS             | [`de`, `fr`, `it`] |
36
+ | `coastalcph/fairlex-cail-minilm`   | CAIL             | `zh`               |
37
+
38
+
39
+ ## Load Pretrained Model
40
+
41
+ ```python
42
+ from transformers import AutoTokenizer, AutoModel
43
+
44
+ tokenizer = AutoTokenizer.from_pretrained("coastalcph/fairlex-cail-minilm")
45
+ model = AutoModel.from_pretrained("coastalcph/fairlex-cail-minilm")
46
+ ```
47
+
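Since the model card declares `pipeline_tag: fill-mask`, the checkpoint can presumably also be queried for masked-token predictions. The following is a minimal sketch, assuming the released weights include a masked-language-modeling head; the helper names `build_masked_text` and `predict_mask` and the `{mask}` placeholder are hypothetical and not part of the release.

```python
from typing import Dict, List

PLACEHOLDER = "{mask}"  # hypothetical marker for the position to predict


def build_masked_text(template: str, mask_token: str) -> str:
    """Swap the single placeholder for the tokenizer's own mask token."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one '{mask}' placeholder")
    return template.replace(PLACEHOLDER, mask_token)


def predict_mask(template: str, model_name: str, top_k: int = 5) -> List[Dict]:
    """Return the top_k fill-mask candidates for the placeholder position."""
    # Imported lazily so build_masked_text works without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # AutoModelForMaskedLM keeps the LM head; plain AutoModel would drop it.
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
    text = build_masked_text(template, tokenizer.mask_token)
    return fill(text, top_k=top_k)
```

Calling `predict_mask` with a sentence containing `{mask}` and an identifier from the models list downloads the checkpoint and returns a list of dictionaries, each with a candidate token and its score. Reading the mask token from the tokenizer avoids hard-coding `<mask>` or `[MASK]`.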
48
+
49
+
50
+ ## Evaluation on downstream tasks
51
+
52
+ For downstream results, see the experiments reported in the paper:
53
+
54
+ _Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, and Anders Søgaard. 2022. FairLex: A multilingual benchmark for evaluating fairness in legal text processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland._
55
+
56
+
57
+ ## Author - Publication
58
+
59
+ ```
60
+ @inproceedings{chalkidis-2022-fairlex,
61
+ author={Chalkidis, Ilias and Pasini, Tommaso and Zhang, Sheng and
62
+ Tomada, Letizia and Schwemer, Sebastian Felix and Søgaard, Anders},
63
+ title={FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing},
64
+ booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
65
+ year={2022},
66
+ address={Dublin, Ireland}
67
+ }
68
+ ```
69
+
70
+ Ilias Chalkidis on behalf of [CoAStaL NLP Group](https://coastalcph.github.io)
71
+
72
+ | GitHub: [@ilias.chalkidis](https://github.com/iliaschalkidis) | Twitter: [@KiddoThe2B](https://twitter.com/KiddoThe2B) |