pochunhsu committed on
Commit
b247ae6
·
1 Parent(s): 5d377dd

Upload 6 files

Files changed (3)
  1. README.md +10 -11
  2. config.json +2 -2
  3. pytorch_model.bin +2 -2
README.md CHANGED
@@ -2,18 +2,17 @@
  license: bigscience-bloom-rail-1.0
  language:
  - zh
- - en
  pipeline_tag: text-generation
  widget:
- - text: "四月的某一天,天氣晴朗寒冷,"
- - text: "問:台灣最高的建築物是?答:"
+ - text: 四月的某一天,天氣晴朗寒冷,
+ - text: 問:台灣最高的建築物是?答:
  ---
 
  <h1 style='text-align: center '>BLOOM-zh</h1>
  <h2 style='text-align: center '><em>Traditional Chinese-enhanced BLOOM language model</em> </h2>
  <h3 style='text-align: center '>Model Card</h3>
 
- Version 1.0 / 20.Feb.2023
+ Version 2.0 / 10.April.2023
 
  This model is a joint collaboration between CKIP lab at Academia Sinica ([link](https://ckip.iis.sinica.edu.tw/)), MediaTek Research ([連結](https://www.mtkresearch.com/), [连结](https://www.mtkresearch.com/zh-hans/), [link](https://www.mtkresearch.com/en/)), and National Academy for Educational Research ([link](https://www.naer.edu.tw/)).
 
@@ -33,10 +32,10 @@ BLOOM-zh is trained extendedly on large amount of Traditional Chinese text data.
 
  * **Developed by:** MediaTek Research
  * **Model Type:** Transformer-based Language Model
- * **Version:** 1.0.0
+ * **Version:** 2.0.0
  * **Languages:** Multiple; see [training data](#training-data)
  * **License:** MEDIATEK RESEARCH License ([link](https://huggingface.co/ckip-joint/bloom-1b1-zh/blob/main/LICENSE_MR.md)) and RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
- * **Release Date Estimate:** Wednesday, 22.February.2023
+ * **Release Date Estimate:** Monday, 10.April.2023
  * **Send Questions to:** [email protected]
  * **Paper:** [https://arxiv.org/abs/2303.04715](https://arxiv.org/abs/2303.04715)
  * **Cite as:** MediaTek Research: Traditional Chinese-enhanced BLOOM language model. International, February 2023.
@@ -65,7 +64,7 @@ For the uses of the model, please refer to [BLOOM](https://huggingface.co/bigsci
  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
 
- We trained the 1B1 parameter model on a total of 6 Billion tokens of mostly high quality Traditional Chinese text. Details are provided in the [paper](https://arxiv.org/abs/2303.04715).
+ We trained the 1B1 parameter model on a total of 11.5 Billion tokens of mostly high quality Traditional Chinese text. Details are provided in the [paper](https://arxiv.org/abs/2303.04715).
 
  ## Risks and Limitations
  *This section identifies foreseeable harms and misunderstandings.*
@@ -75,9 +74,9 @@ For risks and limitations, please refer to [BLOOM](https://huggingface.co/bigsci
  ### Factors
  *This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
 
- - The model is trained on Traditional Chinese and English. However, the pretrained weights capture more than 40 different languages.
+ - The model is trained on Traditional Chinese. However, the pretrained weights capture more than 40 different languages.
 
- - The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions
+ - The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions.
 
 
  ## Recommendations
@@ -90,5 +89,5 @@ For recommendations, please refer to [BLOOM](https://huggingface.co/bigscience/b
  ## Model Card Authors
  *Ordered roughly chronologically and by amount of time spent.*
 
- Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-Chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, Wei-Yun Ma
- <!-- # Bloom_eval -->
+ Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-Chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Chi-Ming Chung, Yi-Chang Chen, Da-Shan Shiu, Wei-Yun Ma
+ <!-- # Bloom_eval -->
config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
- "BloomForCausalLM"
+ "BloomModel"
  ],
  "attention_dropout": 0.0,
  "attention_softmax_in_fp32": true,
@@ -27,4 +27,4 @@
  "unk_token_id": 0,
  "use_cache": true,
  "vocab_size": 250880
- }
+ }
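The only value that changes in config.json is the `architectures` entry. The sketch below shows one way to confirm that by comparing the top-level keys of two config revisions. The two JSON strings are abbreviated stand-ins holding only fields visible in this diff, and `changed_keys` is a hypothetical helper name, not part of any library.

```python
import json

# Abbreviated stand-ins for the two config.json revisions (assumption:
# only fields that appear in the diff are reproduced here).
OLD_CONFIG = '{"architectures": ["BloomForCausalLM"], "use_cache": true, "vocab_size": 250880}'
NEW_CONFIG = '{"architectures": ["BloomModel"], "use_cache": true, "vocab_size": 250880}'


def changed_keys(old_text, new_text):
    """Return {key: (old_value, new_value)} for top-level keys whose values differ."""
    old, new = json.loads(old_text), json.loads(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


diff = changed_keys(OLD_CONFIG, NEW_CONFIG)
for key, (before, after) in diff.items():
    print(f"{key}: {before} -> {after}")
```

Whether this change affects how the checkpoint is loaded depends on the loading code; the diff does not say.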
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:318c0b0e0f726b156e766691890a2ac5fc410f895b380defb53a8ba259c4af59
- size 2130720033
+ oid sha256:24b882f6f6f1ac9d166797bedc845217245d29885c4e758dd5a3fb9b22e931ef
+ size 4261358455
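The pointer file records only the object hash and the byte size, so the new weights show up here as a size change. The sketch below parses both pointers and compares the sizes, assuming the three-line `key value` layout shown in the diff. `parse_lfs_pointer` is a hypothetical helper name.

```python
# The two pointer revisions, copied from the diff above.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:318c0b0e0f726b156e766691890a2ac5fc410f895b380defb53a8ba259c4af59
size 2130720033
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:24b882f6f6f1ac9d166797bedc845217245d29885c4e758dd5a3fb9b22e931ef
size 4261358455
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict; size becomes an int."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(" ", 1)
        fields[key] = value.strip()
    fields["size"] = int(fields["size"])
    return fields


old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
ratio = new["size"] / old["size"]
print(f"old: {old['size'] / 2**30:.2f} GiB, new: {new['size'] / 2**30:.2f} GiB, ratio: {ratio:.3f}")
```

The ratio is very close to 2. That would be consistent with the weights being stored at twice the byte width, for example float32 instead of float16, but the diff itself does not state why the file grew.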