中文 | English

UniMC

Source code for the EMNLP 2022 paper "Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective".

Update

  • [2022-10-18] Released the preprint on arXiv.
  • [2022-10-14] Released the code on GitHub.

Requirements

Install the fengshen framework:

git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
cd Fengshenbang-LM
pip install --editable .
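
As an optional sanity check, not part of the official instructions, you can verify from Python that the editable install is importable and that the pipeline class used in the Quick Start below resolves:

# Optional check: confirm the editable fengshen install is picked up.
import fengshen
from fengshen.pipelines.multiplechoice import UniMCPipelines

# Shows which installed copy of the package is being imported.
print(fengshen.__file__)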

Quick Start

You can refer to our example.py script: simply feed the processed train, dev, and test data into the model.

import argparse
from fengshen.pipelines.multiplechoice import UniMCPipelines

# Register the pipeline's command-line arguments and parse them.
total_parser = argparse.ArgumentParser("TASK NAME")
total_parser = UniMCPipelines.piplines_args(total_parser)
args = total_parser.parse_args()

# Pretrained UniMC checkpoint released on HuggingFace.
pretrained_model_path = 'IDEA-CCNL/Erlangshen-UniMC-RoBERTa-110M-Chinese'
args.learning_rate = 2e-5
args.max_length = 512
args.max_epochs = 3
args.batchsize = 8
args.default_root_dir = './'
model = UniMCPipelines(args, model_path=pretrained_model_path)

train_data = []
dev_data = []
# Each sample is a dict in the UniMC data format (see the Dataset section).
test_data = [{
    "texta": "就是废物,充电不进害得老子把主板烧了,客服不耐烦",
    "textb": "",
    "question": "",
    "choice": ["这是一条差评", "这是一条好评"],
    "answer": "这是一条差评",
    "label": 0,
    "id": 31
}]

if args.train:
    model.train(train_data, dev_data)
result = model.predict(test_data)
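
The structure of each element in result is defined by the UniMCPipelines implementation, so the following is only a minimal inspection sketch, assuming predict returns one entry per input sample in input order:

# Minimal sketch (assumption: predict() returns one entry per input sample,
# in the same order; the exact fields of each entry depend on the pipeline).
for sample, pred in zip(test_data, result):
    print("text:      ", sample["texta"])
    print("choices:   ", sample["choice"])
    print("prediction:", pred)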

Pretrained Model

For the English model, we pretrained on 14 multiple-choice datasets. For the Chinese model, we collected 48 datasets for pretraining. The pretrained models have been released to the HuggingFace community.

Experiments

English

To evaluate UniMC's performance in English, we pretrain the model on 14 multiple-choice datasets (see the original paper for details) so that it learns to answer multiple-choice questions.

Zero-shot

| Model   | T0 11B | GLaM 60B | FLAN 137B | PaLM 540B | UniMC 235M |
|---------|--------|----------|-----------|-----------|------------|
| ANLI R1 | 43.6   | 40.9     | 47.7      | 48.4      | 52.0       |
| ANLI R2 | 38.7   | 38.2     | 43.9      | 44.2      | 44.4       |
| ANLI R3 | 41.3   | 40.9     | 47.0      | 45.7      | 47.8       |
| CB      | 70.1   | 33.9     | 64.1      | 51.8      | 75.7       |

Chinese

To evaluate UniMC in Chinese, we pretrain the model on 13 supervised datasets, listed below:

| Task type       | Task     | # of options | Data size |
|-----------------|----------|--------------|-----------|
| Multiple-choice | c3       | 4            | 11.8k     |
| Multiple-choice | ClozeT   | 2            | 0.7k      |
| Multiple-choice | CMRC2019 | n            | 11.4k     |
| Multiple-choice | GCRC     | 4            | 7.8k      |
| Classification  | DuEE-Fin | 12           | 4.3k      |
| Classification  | DuEE1.0  | 65           | 10.3k     |
| Classification  | Fudan    | 20           | 19.6k     |
| Classification  | THUNEWS  | 10           | 180k      |
| NLI             | CMNLI    | 3            | 39k       |
| NLI             | SNLI     | 3            | 545.8k    |
| Paraphrase      | AFQMC    | 2            | 34.3k     |
| Paraphrase      | PAWS-X   | 2            | 49k       |
| Paraphrase      | STS-B    | 2            | 80k       |

We test UniMC on benchmarks commonly used in the Chinese NLP community, namely the 9 FewCLUE tasks, and report performance on the test_public split.

Few-shot

| Model                   | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid  | csl   | wsc   | Avg   |
|-------------------------|---------|--------|-------|---------|-------|-------|-------|-------|-------|-------|
| Finetuning              | 65.4    | 35.5   | 49    | 32.8    | 33    | 60.7  | 14.9  | 50    | 55.6  | 44.1  |
| PET                     | 86.7    | 51.7   | 54.5  | 46      | 44    | 56    | 61.2  | 59.4  | 57.5  | 57.44 |
| LM-BFF                  | 85.6    | 54.4   | 53    | 47.1    | 41.6  | 57.6  | 61.2  | 51.7  | 54.7  | 56.32 |
| P-tuning                | 88.3    | 56     | 54.2  | 57.6    | 41.9  | 60.9  | 59.3  | 62.9  | 58.1  | 59.91 |
| EFL                     | 84.9    | 45     | 52.1  | 42.7    | 66.2  | 71.8  | 30.9  | 56.6  | 53    | 55.91 |
| UniMC-RoBERTa-110M      | 88.64   | 54.08  | 54.32 | 48.6    | 66.55 | 73.76 | 67.71 | 52.54 | 59.92 | 62.86 |
| UniMC-RoBERTa-330M      | 89.53   | 57.3   | 54.25 | 50      | 70.59 | 77.49 | 78.09 | 55.73 | 65.16 | 66.46 |
| UniMC-MegatronBERT-1.3B | 89.278  | 60.9   | 57.46 | 52.89   | 76.33 | 80.37 | 90.33 | 61.73 | 79.15 | 72.05 |

Zero-shot

| Model                   | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid  | csl   | wsc   | Avg   |
|-------------------------|---------|--------|-------|---------|-------|-------|-------|-------|-------|-------|
| GPT-zero                | 57.5    | 26.2   | 37    | 19      | 34.4  | 50    | 65.6  | 50.1  | 50.3  | 43.4  |
| PET-zero                | 85.2    | 12.6   | 26.1  | 26.6    | 40.3  | 50.6  | 57.6  | 52.2  | 54.7  | 45.1  |
| NSP-BERT                | 86.9    | 47.6   | 51    | 41.6    | 37.4  | 63.4  | 52    | 64.4  | 59.4  | 55.96 |
| ZeroPrompt              | -       | -      | -     | 16.14   | 46.16 | -     | -     | -     | 47.98 | -     |
| Yuan1.0-13B             | 88.13   | 38.99  | 57.47 | 38.82   | 48.13 | 59.38 | 86.14 | 50    | 38.99 | 56.22 |
| ERNIE3.0-240B           | 88.75   | 50.97  | 57.83 | 40.42   | 53.57 | 64.38 | 87.13 | 56.25 | 53.46 | 61.41 |
| UniMC-RoBERTa-110M      | 86.16   | 31.26  | 46.61 | 26.54   | 66.91 | 73.34 | 66.68 | 50.09 | 53.66 | 55.7  |
| UniMC-RoBERTa-330M      | 87.5    | 30.4   | 47.6  | 31.5    | 69.9  | 75.9  | 78.17 | 49.5  | 60.55 | 59.01 |
| UniMC-MegatronBERT-1.3B | 88.79   | 42.06  | 55.21 | 33.93   | 75.57 | 79.5  | 89.4  | 50.25 | 66.67 | 64.53 |

Dataset

We have defined the data format UniMC expects; you only need to convert your data into the following format:

Text classification

{
    "texta": "街头偶遇2018款长安CS35,颜值美炸!或售6万起,还买宝骏510?",   
    "textb": "", 
    "question": "下面新闻属于哪一个类别?",   
    "choice": [
        "房产", 
        "汽车", 
        "教育",
        "军事"
        ], 
    "answer": "汽车", 
    "label": 1, 
    "id": 7759
}

Sentiment analysis

{
    "texta": "就是废物,充电不进害得老子把主板烧了,客服不耐烦",
    "textb": "",
    "question": "",
    "choice": ["这是一条差评", "这是一条好评"],
    "answer": "这是一条差评",
    "label": 0,
    "id": 31
}

Semantic matching

{
    "texta": "不要借了我是试试看能否操作的",
    "textb": "",
    "question": "",
    "choice": ["不能理解为:借款审核期间能否取消借款", "可以理解为:借款审核期间能否取消借款"],
    "answer": "不能理解为:借款审核期间能否取消借款",
    "label": 0,
    "id": 0
}

Natural language inference

{
    "texta": "身上裹一件工厂发的棉大衣,手插在袖筒里",
    "textb": "",
    "question": "",
    "choice": ["不能推断出:身上至少一件衣服", "很难推断出:身上至少一件衣服", "可以推断出:身上至少一件衣服"],
    "answer": "可以推断出:身上至少一件衣服",
    "label": 2,
    "id": 0
}
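
All four examples follow the same pattern: the candidate answers go into choice, answer holds the chosen string, and label is the index of answer within choice. As an illustration only (to_unimc_records is a hypothetical helper, not part of the fengshen library), a plain labeled classification set could be mapped into this format like so:

# Hypothetical helper: map (text, class_name) pairs into UniMC-style records.
def to_unimc_records(samples, choices, question=""):
    records = []
    for idx, (text, class_name) in enumerate(samples):
        records.append({
            "texta": text,
            "textb": "",
            "question": question,
            "choice": choices,
            "answer": class_name,
            # label is the index of the answer within the choice list.
            "label": choices.index(class_name),
            "id": idx,
        })
    return records

# Reusing the news-classification example from above.
news_choices = ["房产", "汽车", "教育", "军事"]
samples = [("街头偶遇2018款长安CS35,颜值美炸!或售6万起,还买宝骏510?", "汽车")]
train_data = to_unimc_records(samples, news_choices, question="下面新闻属于哪一个类别?")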

Citation

If you find this repository helpful, please cite our work as follows:

@article{unimc,
  author    = {Ping Yang and
               Junjie Wang and
               Ruyi Gan and
               Xinyu Zhu and
               Lin Zhang and
               Ziwei Wu and
               Xinyu Gao and
               Jiaxing Zhang and
               Tetsuya Sakai},
  title     = {Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective},
  journal   = {CoRR},
  volume    = {abs/2210.08590},
  year      = {2022}
}

License

Apache License 2.0