brildev7 committed
Commit 2a177ee · verified · 1 Parent(s): 2317775

Update README.md

Files changed (1)
  1. README.md +80 -192
README.md CHANGED
@@ -1,201 +1,89 @@
1
  ---
2
- library_name: transformers
3
- tags: []
4
  ---
5
 
6
  # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
  ## Model Details
13
-
14
  ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
 
76
  ## Training Details
77
-
78
  ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
200
-
201
-
 
1
  ---
2
+ library_name: peft
3
+ base_model: google/gemma-2b
4
+ language:
5
+ - ko
6
+ tags:
7
+ - summarization
8
+ - gemma
9
  ---
10
 
11
  # Model Card for Model ID
12
  ## Model Details
 
13
  ### Model Description
14
+ This model summarizes Korean text concisely.
15
+ - **Developed by:** [Kang Seok Ju]
16
+ - **Contact:** [[email protected]]
17
 
18
  ## Training Details
 
19
  ### Training Data
20
+ https://huggingface.co/datasets/raki-1203/ai_hub_summarization
21
+
22
+ # Inference Examples
+ ```
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import PeftModel
+
+ # Base model and the adapter fine-tuned for Korean summarization
+ model_id = "google/gemma-2b"
+ peft_model_id = "brildev7/gemma_2b_summarization_ko_sft_qlora"
+
+ # Load the base model weights in 4-bit NF4 and compute in float32
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float32,
+     bnb_4bit_quant_type="nf4"
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+     quantization_config=quantization_config,
+     torch_dtype=torch.float32,
+     low_cpu_mem_usage=True,
+     attn_implementation="sdpa",
+     device_map="auto")
+ # Attach the adapter weights to the quantized base model
+ model = PeftModel.from_pretrained(model, peft_model_id)
+
+ tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
+ # Reuse the EOS token as the padding token
+ tokenizer.pad_token_id = tokenizer.eos_token_id
+
+ # example
+ # The prompt reads: "Summarize the following text.:{}\nSummary:"
+ prompt_template = "다음 글을 요약하세요.:{}\n요약:"
50
+ passage = "๊ธฐํš์žฌ์ •๋ถ€๋Š” 20์ผ ์ด ๊ฐ™์€ ๋‚ด์šฉ์˜ '์ฃผ๋ฅ˜ ๋ฉดํ—ˆ ๋“ฑ์— ๊ด€ํ•œ ๋ฒ•๋ฅ  ์‹œํ–‰๋ น' ๊ฐœ์ •์•ˆ์„ ์ž…๋ฒ• ์˜ˆ๊ณ ํ–ˆ๋‹ค. ๊ฐœ์ •์•ˆ์—๋Š” ์ฃผ๋ฅ˜ ํŒ๋งค์—… ๋ฉดํ—ˆ ์ทจ์†Œ์˜ ์˜ˆ์™ธ์— ํ•ด๋‹นํ•˜๋Š” ์ฃผ๋ฅ˜์˜ ๋‹จ์ˆœ๊ฐ€๊ณตยท์กฐ์ž‘์˜ ๋ฒ”์œ„๋ฅผ ์ˆ ์ž” ๋“ฑ ๋นˆ ์šฉ๊ธฐ์— ์ฃผ๋ฅ˜๋ฅผ ๋‚˜๋ˆ  ๋‹ด์•„ ํŒ๋งคํ•˜๋Š” ๊ฒฝ์šฐ ๋“ฑ์ด ํฌํ•จ๋๋‹ค. ์‹๋‹นยท์ฃผ์  ๋“ฑ์—์„œ ์ฃผ๋ฅ˜๋ฅผ ํŒ๋งคํ•  ๋•Œ ์ˆ ์„ ์ž”์— ๋‚˜๋ˆ  ํŒ๋งคํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์˜๋ฏธ๋‹ค. ์ข…ํ•ฉ์ฃผ๋ฅ˜๋„๋งค์—…์ž๊ฐ€ ์ฃผ๋ฅ˜์ œ์กฐ์ž ๋“ฑ์ด ์ œ์กฐยทํŒ๋งคํ•˜๋Š” ๋น„์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ ๋˜๋Š” ๋ฌด์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ๋ฅผ ์ฃผ๋ฅ˜์™€ ํ•จ๊ป˜ ์Œ์‹์  ๋“ฑ์— ๊ณต๊ธ‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ์ฃผ๋ฅ˜ํŒ๋งค ์ „์—…์˜๋ฌด ๋ฉดํ—ˆ์š”๊ฑด๋„ ์™„ํ™”ํ–ˆ๋‹ค. ํ˜„์žฌ ์•Œ์ฝ”์˜ฌ ๋„์ˆ˜๊ฐ€ 0%์ธ ์Œ๋ฃŒ๋Š” '๋ฌด์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ'๋กœ, 0% ์ด์ƒ 1% ๋ฏธ๋งŒ์ธ ๊ฒƒ์€ '๋น„์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ'๋กœ ๊ตฌ๋ถ„๋œ๋‹ค. ํ˜„ํ–‰ ๊ทœ์ •์ƒ ๋ฌด์•Œ์ฝ”์˜ฌยท๋น„์•Œ์ฝ”์˜ฌ ์ฃผ๋ฅ˜๋Š” ์ฃผ๋ฅ˜ ์—…์ž๊ฐ€ ์œ ํ†ตํ•  ์ˆ˜ ์—†๋Š”๋ฐ ์ด ๊ทœ์ •์„ ์™„ํ™”ํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ๊ธฐ์žฌ๋ถ€๋Š” ๋‹ค์Œ ๋‹ฌ 29์ผ๊นŒ์ง€ ์˜๊ฒฌ ์ˆ˜๋ ด์„ ๊ฑฐ์ณ ์ด๋ฅด๋ฉด ๋‹ค์Œ ๋‹ฌ ๋ง๋ถ€ํ„ฐ ์‹œํ–‰ํ•  ์˜ˆ์ •์ด๋‹ค๏ผŽ"
+ prompt = prompt_template.format(passage)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ # Sample with a low temperature and nucleus sampling (top_p)
+ outputs = model.generate(**inputs,
+     max_new_tokens=1024,
+     temperature=0.2,
+     top_p=0.95,
+     do_sample=True,
+     use_cache=False)
+ # outputs[0] holds the prompt tokens followed by the generated tokens
+ print(tokenizer.decode(outputs[0]))
60
+ - ๊ธฐํš์žฌ์ •๋ถ€๋Š” 20์ผ ์ด๊ฐ™์€ ๋‚ด์šฉ์˜ '์ฃผ๋ฅ˜ ๋ฉดํ—ˆ ๋“ฑ์— ๊ด€ํ•œ ๋ฒ•๋ฅ  ์‹œํ–‰๋ น' ๊ฐœ์ •์•ˆ์„ ์ž…๋ฒ• ์˜ˆ๊ณ ํ–ˆ๋Š”๋ฐ, ์ด ๊ฐœ์ •์•ˆ์—๋Š” ์ฃผ๋ฅ˜ ํŒ๋งค์—… ๋ฉดํ—ˆ ์ทจ์†Œ์˜ ์˜ˆ์™ธ์— ํ•ด๋‹นํ•˜๋Š” ์ฃผ๋ฅ˜์˜ ๋‹จ์ˆœ๊ฐ€๊ณตยท์กฐ์ž‘์˜ ๋ฒ”์œ„๋ฅผ ์ˆ ์ž” ๋“ฑ ๋นˆ ์šฉ๊ธฐ์— ์ฃผ๋ฅ˜๋ฅผ ๋‚˜๋ˆ  ๋‹ด์•„ ํŒ๋งคํ•˜๋Š” ๊ฒฝ์šฐ ๋“ฑ์ด ํฌํ•จ๋๋‹ค.
61
+
+ # example
+ prompt_template = "다음 글을 요약하세요.:{}\n요약:"
64
+ passage = "์ง€๋‚œ 1์›” ์ผ๋ณธ ์˜ค์‚ฌ์นด ์šฐ๋ฉ”๋‹ค์˜ ๋ทฐํ‹ฐ์ƒต โ€˜์•ณ์ฝ”์Šค๋ฉ”โ€™์—์„œ ์ง„ํ–‰๋œ CJ์˜ฌ๋ฆฌ๋ธŒ์˜์˜ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ(PB) โ€˜๋ฐ”์ด์˜คํž ๋ณดโ€™์˜ ํŒ์—… ์Šคํ† ์–ด ํ˜„์žฅ. ์˜ค์‚ฌ์นด ์ตœ๋Œ€ ๊ทœ๋ชจ๋ฅผ ์ž๋ž‘ํ•˜๋Š” ์•ณ์ฝ”์Šค๋ฉ” ๋งค์žฅ ํ•œ ๊ฐ€์šด๋ฐ ๊พธ๋ฉฐ์ง„ ํŒ์—… ์Šคํ† ์–ด์—๋Š” ํ•œ๊ตญ์—์„œ ์ธ๊ธฐ ๋†’์€ ํ™”์žฅํ’ˆ์„ ์‹ค์ œ๋กœ ๊ฒฝํ—˜ํ•ด๋ณด๋ ค๋Š” ๊ณ ๊ฐ๋“ค๋กœ ๋ฐœ ๋””๋”œ ํ‹ˆ ์—†์ด ๋ถ์ ๊ฑฐ๋ ธ๋‹ค. ํƒ€์ด์™„ ๊ตญ์ ์ž์ด์ง€๋งŒ ์˜ค์‚ฌ์นด์—์„œ ๊ฑฐ์ฃผํ•˜๊ณ  ์žˆ๋‹ค๋Š” 32์‚ด ์ฟ ์ด์ž‰์”จ๋Š” ์ด๋‚  ํŒ์—… ์Šคํ† ์–ด๋ฅผ ์ฐพ์•„ ๋ฐ”์ด์˜คํž ๋ณด์˜ โ€˜ํƒ„ํƒ„ํฌ๋ฆผโ€™์„ ๊ตฌ๋งคํ–ˆ๋‹ค. ์‚ฌํšŒ๊ด€๊ณ„๋ง์„œ๋น„์Šค(SNS)์™€ ์œ ํŠœ๋ธŒ๋ฅผ ํ†ตํ•ด ํ•œ๊ตญ ํ™”์žฅํ’ˆ์ด ์ข‹๋‹ค๋Š” ํ‰์„ ๋“ค์–ด๋ณธ ํ„ฐ๋ผ ์ด๋ฒˆ ๊ธฐํšŒ์— ๊ตฌ๋งคํ•ด ์‚ฌ์šฉํ•ด๋ณด๊ธฐ๋กœ ๊ฒฐ์‹ฌํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค. ์ฟ ์ด์ž‰์”จ๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ์„ ์“ฐ๋ฉด ํ•œ๊ตญ ์—ฌ์„ฑ์ฒ˜๋Ÿผ ์˜ˆ๋ป์ง€์ง€ ์•Š์„๊นŒ ๊ธฐ๋Œ€๊ฐ€ ๋œ๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์ด๋‚  ์•ณ์ฝ”์Šค๋ฉ”๋Š” ๋ฐ”์ด์˜คํž ๋ณด ํŒ์—… ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋ˆˆ์— ์ž˜ ๋„๋Š” ๋ฉ”์ธ ์ง„์—ด๋Œ€ ์ƒ๋‹น์ˆ˜๊ฐ€ ํ•œ๊ตญ ๋ธŒ๋žœ๋“œ ์ฐจ์ง€์˜€๋‹ค. ๋Œ€๋ถ€๋ถ„ ํ•œ๊ตญ์—์„œ๋„ ์ธ๊ธฐ๊ฐ€ ๋†’์€ ๋ธŒ๋žœ๋“œ๋“ค๋กœ, ์ž…๊ตฌ์—์„œ ๋ฐ”๋กœ ๋ณด์ด๋Š” ์ง„์—ด๋Œ€์—๋Š” โ€˜์›จ์ดํฌ๋ฉ”์ดํฌโ€™์™€ โ€˜ํ”ผ์น˜์”จโ€™, โ€˜์–ด๋ฎค์ฆˆโ€™๊ฐ€, ํ•ด์™ธ ๋ช…ํ’ˆ ๋ธŒ๋žœ๋“œ ์กด ์ •์ค‘์•™์—๋Š” โ€˜ํ—ค๋ผโ€™๊ฐ€ ์ž๋ฆฌํ•˜๊ณ  ์žˆ์—ˆ๋‹ค. ์ผ๋ณธ ๋‚ด K๋ทฐํ‹ฐ์˜ ์ธ๊ธฐ๊ฐ€ ์˜ˆ์‚ฌ๋กญ์ง€ ์•Š๋‹ค. โ€˜์ œ 3์ฐจ ํ•œ๋ฅ˜๋ถโ€™์ด๋ผ๊ณ ๊นŒ์ง€ ์ผ์ปฌ์–ด์ง€๋Š” ํ•œ๋ฅ˜์—ดํ’์„ ํƒ€๊ณ  ์ผ๋ณธ ๋‚ด K๋ทฐํ‹ฐ์˜ ์ž…์ง€๊ฐ€ ๋‚˜๋‚ ์ด ์น˜์†Ÿ๊ณ  ์žˆ๋‹ค. ๊ณผ๊ฑฐ์—๋Š” ์ผ๋ณธ ๋‚ด์—์„œ ํ•œ๊ตญ ๋ฌธํ™”๋ฅผ ์ข‹์•„ํ•˜๋Š” ์ผ๋ถ€ ์†Œ๋น„์ž๋“ค ์‚ฌ์ด์—์„œ๋งŒ ์œ ํ–‰ํ•˜๋Š” ์ˆ˜์ค€์ด์—ˆ๋‹ค๋ฉด, ์ง€๊ธˆ์€ ์ผ๋ณธ ๋ทฐํ‹ฐ ์‹œ์žฅ์— ํ•˜๋‚˜์˜ ์นดํ…Œ๊ณ ๋ฆฌ๋กœ K๋ทฐํ‹ฐ๊ฐ€ ์ž๋ฆฌ๋ฅผ ์žก์•˜๋‹ค๋Š” ํ‰๊ฐ€๋‹ค. 21์ผ ๋ฒ ์ธ์•ค๋“œ์ปดํผ๋‹ˆ์™€ ์œ ๋กœ๋ชจ๋‹ˆํ„ฐ์— ๋”ฐ๋ฅด๋ฉด K๋ทฐํ‹ฐ์˜ ์ผ๋ณธ ์ง€์—ญ๋ณ„ ์นจํˆฌ์œจ(ํŠน์ • ๊ธฐ๊ฐ„ ๋™์•ˆ ํŠน์ • ์ƒํ’ˆ ์†Œ๋น„ ๊ทœ๋ชจ ๋น„์ค‘)์€ 2017๋…„ 1%์—์„œ 2022๋…„ 4.9%๋กœ 5๋…„ ๋งŒ์— 5๋ฐฐ๊ฐ€ ์ฆ๊ฐ€ํ–ˆ๋‹ค. 
์ตœ๊ทผ 3๋…„๊ฐ„ ์—ฐํ‰๊ท  ์„ฑ์žฅ๋ฅ ์€ 20%๊ฐ€ ๋„˜๋Š”๋‹ค. ์ง€๋‚œํ•ด์—๋Š” ์ผ๋ณธ ์ˆ˜์ž… ํ™”์žฅํ’ˆ ๊ตญ๊ฐ€๋ณ„ ๋น„์ค‘์—์„œ ํ•œ๊ตญ์ด ์ฒ˜์Œ์œผ๋กœ ํ”„๋ž‘์Šค๋ฅผ ์ œ์น˜๊ณ  1์œ„์— ์˜ค๋ฅด๊ธฐ๋„ ํ–ˆ๋‹ค. ์„œํšจ์ฃผ ๋ฒ ์ธ์•ค๋“œ์ปดํผ๋‹ˆ ํŒŒํŠธ๋„ˆ๋Š” ์ง€๊ธˆ๋ณด๋‹ค 3~4๋ฐฐ ์ด์ƒ ์„ฑ์žฅํ•  ์—ฌ๋ ฅ์ด ์ถฉ๋ถ„ํ•˜๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์ผ๋ณธ ์—ฌ์„ฑ๋“ค์ด K๋ทฐํ‹ฐ์— ๋งค๋ฃŒ๋œ ์ด์œ ๋Š” ๋ฌด์—‡์ผ๊นŒ. ๊ฐ€์žฅ ํฐ ์ด์œ ๋กœ๋Š” โ€˜๋†’์€ ๊ฐ€์„ฑ๋น„(๊ฐ€๊ฒฉ ๋Œ€๋น„ ์„ฑ๋Šฅ)โ€™๊ฐ€ ๊ผฝํžŒ๋‹ค. ์—…๊ณ„์— ๋”ฐ๋ฅด๋ฉด ์‹ค์ œ ์ผ๋ณธ์—์„œ ๋งŽ์ด ํŒ๋งค๋˜๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ ๋ธŒ๋žœ๋“œ์˜ ๊ธฐ์ดˆ์ œํ’ˆ๋“ค์€ ์ผ๋ณธ ๋ธŒ๋žœ๋“œ์— ๋น„ํ•ด ์ œํ’ˆ ๊ฐ€๊ฒฉ์ด 10~20% ๊ฐ€๋Ÿ‰ ์ €๋ ดํ•œ ํŽธ์ด๋‹ค. ์ด๋Š” ํ•œ๊ตญ์ฝœ๋งˆ์™€ ์ฝ”์Šค๋งฅ์Šค ๊ฐ™์€ ๊ตญ๋‚ด ํ™”์žฅํ’ˆ OEM(์ฃผ๋ฌธ์ž ์ƒํ‘œ ๋ถ€์ฐฉ ์ƒ์‚ฐ)ยทODM(์ฃผ๋ฌธ์ž ๊ฐœ๋ฐœ์ƒ์‚ฐ) ์ œ์กฐ์‚ฌ๋“ค์˜ ์„ฑ์žฅ ๋•์ด ํฌ๋‹ค. ์ด๋“ค์˜ ๊ธฐ์ˆ ๋ ฅ์€ ์„ธ๊ณ„ ์ตœ๊ณ  ์ˆ˜์ค€์œผ๋กœ, ์„ธ๊ณ„ ์ตœ๋Œ€ ํ™”์žฅํ’ˆ ๊ธฐ์—…์ธ ๋กœ๋ ˆ์•Œ๋„ ๊ณ ๊ฐ์‚ฌ์ผ ์ •๋„๋‹ค. ์ด๋“ค์€ ๋‹จ์ˆœ ์ œํ’ˆ ์ œ์กฐ๋ฅผ ๋„˜์–ด ์‹ ์ œํ’ˆ์„ ๊ฐœ๋ฐœํ•ด ๋ธŒ๋žœ๋“œ์— ๋จผ์ € ์ œ์•ˆํ•˜๊ณ  ๋˜ ํ•„์š”์‹œ ๋งˆ์ผ€ํŒ…๊นŒ์ง€ ์ง€์›ํ•ด ๋ธŒ๋žœ๋“œ๋ฅผ ํ‚ค์šฐ๋Š” ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ๋‹ค. ํ•œ๊ตญ ๋ทฐํ‹ฐ ๋ธŒ๋žœ๋“œ ๋Œ€๋ถ€๋ถ„์ด ์ด๋“ค์„ ํ†ตํ•ด ์ œํ’ˆ์„ ๋งŒ๋“ค๊ณ  ์žˆ์–ด ์ค‘์†Œ ๊ทœ๋ชจ K๋ทฐํ‹ฐ ๋ธŒ๋žœ๋“œ๋„ ํ’ˆ์งˆ์ด ๋ณด์žฅ๋œ๋‹ค๋Š” ์–˜๊ธฐ๋‹ค. ๋˜ K๋ทฐํ‹ฐ ์ œํ’ˆ์˜ ๊ฐ•์ ์œผ๋กœ๋Š” โ–ณ๋…ํŠนํ•˜๊ณ  ํŠธ๋ Œ๋””ํ•œ ์ปจ์…‰ โ–ณ๋ฐœ๋น ๋ฅธ ์‹ ์ œํ’ˆ ์ถœ์‹œ โ–ณ์˜ˆ์œ ํŒจํ‚ค์ง€ ๋“ฑ์ด ๊ฑฐ๋ก ๋œ๋‹ค. ์ด๋ฅผ ๋ฐฉ์ฆํ•˜๋“ฏ ์ตœ๊ทผ ์ผ๋ณธ์—์„  ์œ„์˜ ๊ฐ•์ ๋“ค์„ ๊ฐ–์ถ˜ ํ•œ๊ตญ์˜ ์‹ ์ง„ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ๋“ค์ด ์ธ๊ธฐ๋‹ค. ์‹ค์ œ๋กœ ์ผ๋ณธ ๋‚ด ํŠธ์œ„ํ„ฐ์™€ ์œ ํŠœ๋ธŒ ๋“ฑ SNS์—์„œ๋Š” ์ˆ˜์‹ญ~์ˆ˜๋ฐฑ๋งŒ ํŒ”๋กœ์›Œ๋ฅผ ๋ณด์œ ํ•œ ํ˜„์ง€ ์ธํ”Œ๋ฃจ์–ธ์„œ๋“ค๋„ ์ผ๋ช… โ€˜๋‚ด๋ˆ๋‚ด์‚ฐโ€™(๋‚ด ๋ˆ ์ฃผ๊ณ  ๋‚ด๊ฐ€ ์‚ฐ ๋ฌผ๊ฑด) ์˜์ƒ์—์„œ ์ž๋ฐœ์ ์œผ๋กœ K๋ทฐํ‹ฐ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ ์ œํ’ˆ์„ ์†Œ๊ฐœํ•˜๊ณ  ์žˆ๋‹ค. 
์ง€๋‚œ 1์›” ์ผ๋ณธ ์˜ค์‚ฌ์นด์— ์†Œ์žฌํ•œ ๋ทฐํ‹ฐ ๋žญํ‚น์ƒต โ€˜์•ณ์ฝ”์Šค๋ฉ” ์šฐ๋ฉ”๋‹ค์ โ€™์—์„œ ์ผ๋ณธ ์—ฌ์„ฑ๋“ค์ด ํ•œ๊ตญ ์ฝ”์Šค๋ฉ”ํ‹ฑ ๋ธŒ๋žœ๋“œ โ€˜๋ผ์นด(Laka)โ€™์˜ ์ œํ’ˆ์„ ์‚ดํŽด๋ณด๊ณ  ์žˆ๋Š” ๋ชจ์Šต. [๊น€ํšจํ˜œ ๊ธฐ์ž] ๋Œ€ํ‘œ์ ์ธ ์˜ˆ๊ฐ€ โ€˜๋ผ์นดโ€™๋‹ค. ํ•œ๊ตญ๋ณด๋‹ค ์ผ๋ณธ์—์„œ ๋” ์œ ๋ช…ํ•œ ๋ผ์นด๋Š” 100๋งŒ ๊ตฌ๋…์ž๋ฅผ ๋ณด์œ ํ•˜๊ณ  ์žˆ๋Š” ๋ฉ”์ดํฌ์—… ์•„ํ‹ฐ์ŠคํŠธ์ด์ž ์œ ํŠœ๋ฒ„ โ€˜ํžˆ๋กœโ€™(์˜ค๋‹ค๊ธฐ๋ฆฌ ํžˆ๋กœ)๊ฐ€ ์˜์ƒ์—์„œ ์ œํ’ˆ์„ ์ถ”์ฒœํ•ด ํ™๋ณด ํšจ๊ณผ๋ฅผ ํ†กํ†กํžˆ ๋ดค๋‹ค. ์ด๋ฏผ๋ฏธ ๋ผ์นด ๋Œ€ํ‘œ๋Š” ์ผ๋ณธ์—์„œ ํŠน์ • ์ œํ’ˆ์ด ๊ฐ‘์ž๊ธฐ ํ•˜๋ฃจ์— ์ˆ˜์ฒœ๊ฐœ๊ฐ€ ํŒ”๋ ค ๋ฌด์Šจ ์ผ์ธ๊ฐ€ ๋ดค๋Š”๋ฐ, ํ˜„์ง€ ์œ ๋ช… ์œ ํŠœ๋ฒ„๊ฐ€ ์ถ”์ฒœํ•œ ์˜์ƒ์ด ์˜ฌ๋ผ์™”๋”๋ผ๋ฉฐ ํ˜‘์ฐฌ์ด๋‚˜ ๊ด‘๊ณ ๊ฐ€ ์•„๋‹ˆ์–ด์„œ ๋” ๋†€๋ž๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์ด์— ์ง€๋‚œ 2020๋…„ ์ฒ˜์Œ ์ผ๋ณธ์— ์ง„์ถœํ•œ ๋ผ์นด๋Š” ์˜ฌํ•ด 1์›” ๋ง ์ผ๋ณธ ์ „์—ญ ์•ฝ 350์—ฌ๊ฐœ ๋งค์žฅ์— ์ž…์ ํ•˜๋Š” ์„ฑ๊ณผ๋ฅผ ์˜ฌ๋ ธ๋‹ค. 2021๋…„ 47์–ต์›์— ๋ถˆ๊ณผํ–ˆ๋˜ ๋ผ์นด์˜ ๋งค์ถœ๋„ ์ง€๋‚œํ•ด 4๋ฐฐ๊ฐ€ ๋„˜๊ฒŒ ์ƒ์Šนํ•ด 200์–ต์›์— ์œก๋ฐ•ํ•œ๋‹ค. ์ผ๋ณธ ์‹œ์žฅ์—์„œ ๋‘๊ฐ์„ ๋ณด์ด๋Š” ๊ตญ๋‚ด ํ™”์žฅํ’ˆ ๋ธŒ๋žœ๋“œ๋“ค์ด ๋Š˜๋ฉด์„œ ์ƒˆ๋กญ๊ฒŒ ์ง„์ถœ์„ ํƒ€์ง„ํ•˜๊ฑฐ๋‚˜ ์ค€๋น„ํ•˜๊ณ  ์žˆ๋Š” ์—…์ฒด๋“ค๋„ ๋Š˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋™์•ˆ ํ•œ๊ตญ ํ™”์žฅํ’ˆ์˜ ๊ฐ€์žฅ ํฐ ์‹œ์žฅ์ด์—ˆ๋˜ ์ค‘๊ตญ์ด ๊ฒฝ๊ธฐ ์นจ์ฒด ๋ฐ ์ •์น˜์  ์ด์Šˆ ๋“ฑ์œผ๋กœ ์ชผ๊ทธ๋ผ๋“ค๊ณ  ์žˆ๋Š” ์ƒํ™ฉ์—์„œ ์ผ๋ณธ์ด ์ด๋ฅผ ๋Œ€์ฒดํ•  ์ƒˆ๋กœ์šด ์‹œ์žฅ์œผ๋กœ ๋ถ€์ƒํ•œ ๊ฒƒ์ด๋‹ค. ์ผ๋ณธ ํ™”์žฅํ’ˆ ํŒ๋งค ์ฑ„๋„๋“ค๋„ K๋ทฐํ‹ฐ ์œ ์น˜์— ์ ๊ทน์ ์ด๋‹ค. ์•ณ์ฝ”์Šค๋ฉ”์˜ ๊ฒฝ์šฐ ๊ฑฐ์˜ ๋งค๋‹ฌ K๋ทฐํ‹ฐ ํŒ์—…์ด ์—ด๋ฆฌ๊ณ  ์žˆ๋Š” ์ˆ˜์ค€์œผ๋กœ, ์˜ค๋Š” 5์›”์—๋Š” ๋„์ฟ„์ ์—์„œ K๋ทฐํ‹ฐ ํŽ˜์Šคํ‹ฐ๋ฒŒ๋„ ์—ด ๊ณ„ํš์ด๋‹ค. ๋กœํ”„ํŠธ์™€ ํ”„๋ผ์ž ๋“ฑ๋„ K๋ทฐํ‹ฐ ์œ ์น˜ ๊ฒฝ์Ÿ์ด ๋œจ๊ฒ๋‹ค. CJ์˜ฌ๋ฆฌ๋ธŒ์˜ ๊ด€๊ณ„์ž๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ์— ๋Œ€ํ•œ ๋ฐ˜์‘์ด ์ข‹๊ณ  ํŠนํžˆ ์˜ฌ๋ฆฌ๋ธŒ์˜์—์„œ ์ธ๊ธฐ ์žˆ๋Š” ๋ธŒ๋žœ๋“œ์— ๋Œ€ํ•œ ์ˆ˜์š”๊ฐ€ ๋†’๋‹ค ๋ณด๋‹ˆ ํ”Œ๋žซํผ์—์„œ ๋จผ์ € ํŒ์—… ์š”์ฒญ์ด ์™”๋‹ค๋ฉฐ ์•ž์œผ๋กœ๋„ ์ผ๋ณธ ์‹œ์žฅ ์œ ํ†ต์— ๋”์šฑ ์ ๊ทน์ ์œผ๋กœ ๋‚˜์„œ๋ ค ํ•œ๋‹ค๊ณ  ์ „ํ–ˆ๋‹ค."
+ prompt = prompt_template.format(passage)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs,
+     max_new_tokens=1024,
+     temperature=0.2,
+     top_p=0.95,
+     do_sample=True,
+     use_cache=False)
+ print(tokenizer.decode(outputs[0]))
74
+ - CJ์˜ฌ๋ฆฌ๋ธŒ์˜์˜ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ ๋ฐ”์ด์˜คํž ๋ณด์˜ ํŒ์—… ์Šคํ† ์–ด ํ˜„์žฅ์—์„œ ํ•œ๊ตญ ํ™”์žฅํ’ˆ์„ ์‹ค์ œ๋กœ ๊ฒฝํ—˜ํ•ด๋ณด๋ ค๋Š” ๊ณ ๊ฐ๋“ค์ด ๋ถ์ ๊ฑฐ๋ ธ์œผ๋ฉฐ, CJ์˜ฌ๋ฆฌ๋ธŒ์˜ ๊ด€๊ณ„์ž๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ์— ๋Œ€ํ•œ ๋ฐ˜์‘์ด ์ข‹๊ณ  ํŠนํžˆ ์˜ฌ๋ฆฌ๋ธŒ์˜์—์„œ ์ธ๊ธฐ ์žˆ๋Š” ๋ธŒ๋žœ๋“œ์— ๋Œ€ํ•œ ์ˆ˜์š”๊ฐ€ ๋†’๋‹ค ๋ณด๋‹ˆ ํ”Œ๋žซํผ์—์„œ ๋จผ์ € ํŒ์—… ์š”์ฒญ์ด ์™”๋‹ค๋ฉฐ ์•ž์œผ๋กœ๋„ ์ผ๋ณธ ์‹œ์žฅ ์œ ํ†ต์— ๋”์šฑ ์ ๊ทน์ ์œผ๋กœ ๋‚˜์„œ๋ ค ํ•œ๋‹ค๊ณ  ์ „ํ–ˆ๋‹ค.
75
+
+ # example
+ prompt_template = "다음 글을 요약하세요.:{}\n요약:"
78
+ passage = "์ „ ์„ธ๊ณ„ ์œ ๋ช…์ธ 4์ฒœ๋ช… ๊ฐ€๋Ÿ‰์ด ๋”ฅํŽ˜์ดํฌ(์ธ๊ณต์ง€๋Šฅ์œผ๋กœ ๋งŒ๋“  ์˜์ƒยท์ด๋ฏธ์ง€ ํ•ฉ์„ฑ ์กฐ์ž‘๋ฌผ) ์Œ๋ž€๋ฌผ๋กœ ํ”ผํ•ด๋ฅผ ๋ดค๋‹ค๋Š” ๋ถ„์„๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์™”๋‹ค. 21์ผ(ํ˜„์ง€์‹œ๊ฐ„) ์˜๊ตญ ์ผ๊ฐ„ ๊ฐ€๋””์–ธ์˜ ๋ณด๋„์— ๋”ฐ๋ฅด๋ฉด ์˜๊ตญ ๋ฐฉ์†ก์‚ฌ ์ฑ„๋„4 ๋‰ด์Šค๋Š” ๋ฐฉ๋ฌธ์ž๊ฐ€ ๋งŽ์€ ๋”ฅํŽ˜์ดํฌ ์›น์‚ฌ์ดํŠธ 5๊ณณ์„ ๋ถ„์„ํ•œ ๊ฒฐ๊ณผ, ์˜๊ตญ์ธ 250๋ช…์„ ํฌํ•จํ•ด ์œ ๋ช…์ธ 4์ฒœ๋ช… ๊ฐ€๋Ÿ‰์˜ ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ์„ ์ฐพ์•„๋ƒˆ๋‹ค๊ณ  ๋ฐํ˜”๋‹ค. ์ฑ„๋„4 ๋‰ด์Šค๋Š” ๋ถ„์„ ๋Œ€์ƒ ๋”ฅํŽ˜์ดํฌ ์›น์‚ฌ์ดํŠธ๊ฐ€ 3๊ฐœ์›”๊ฐ„ 1์–ต๋ทฐ๋ฅผ ๊ธฐ๋กํ–ˆ๋‹ค๊ณ  ๋ฐํžˆ๋ฉด์„œ ํ”ผํ•ด์ž ์ค‘์—๋Š” ์œ ๋ช…ํ•œ ์—ฌ๋ฐฐ์šฐ์™€ TV ์Šคํƒ€, ์Œ์•…๊ฐ€, ์œ ํŠœ๋ฒ„ ๋“ฑ์ด ํฌํ•จ๋ผ ์žˆ๋‹ค๊ณ  ์„ค๋ช…ํ–ˆ๋‹ค. ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ ํ”ผํ•ด์ž๋กœ ํ™•์ธ๋œ ์ฑ„๋„4 ๋‰ด์Šค์˜ ์ง„ํ–‰์ž ์บ์‹œ ๋‰ด๋จผ์€ โ€œ์ด๊ฒƒ(๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ)์„ ๋งŒ๋“  ๋ˆ„๊ตฐ๊ฐ€๊ฐ€ ์ž์‹ ์˜ ๊ฐ€์ƒ ๋ฒ„์ „, ๊ฐ€์งœ ๋ฒ„์ „์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด ์ •๋ง ์•…์˜์ ์œผ๋กœ ๋Š๊ปด์ง„๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์˜๊ตญ์€ ์ง€๋‚œ 1์›”31์ผ ์ด๋ž˜ ์˜จ๋ผ์ธ ์•ˆ์ „๋ฒ•(Online Safety Act)์— ๋”ฐ๋ผ ๋™์˜ ์—†๋Š” ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ ๊ณต์œ ๋ฅผ ๋ถˆ๋ฒ•์œผ๋กœ ๊ทœ์ •ํ–ˆ์œผ๋‚˜ ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ ์ œ์ž‘์€ ๋ถˆ๋ฒ•ํ™”ํ•˜์ง€๋Š” ์•Š์•˜๋‹ค. ์˜๊ตญ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋ฏธ๊ตญ ๋“ฑ ์ „์„ธ๊ณ„์—์„œ ์œ ๋ช…์ธ์˜ ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ์ด ์‚ฌํšŒ์  ๋ฌธ์ œ๋กœ ๋ถ€๊ฐ๋˜๊ณ  ์žˆ๋‹ค. ์ง€๋‚œ 1์›” ๋ฏธ๊ตญ์˜ ์„ธ๊ณ„์  ํŒ์Šคํƒ€ ํ…Œ์ผ๋Ÿฌ ์Šค์œ„ํ”„ํŠธ์˜ ์‚ฌ์ง„์„ ํ•ฉ์„ฑํ•œ ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€ ์ด๋ฏธ์ง€๊ฐ€ ์†Œ์…œ๋ฏธ๋””์–ด ์—‘์Šค(X, ์˜› ํŠธ์œ„ํ„ฐ) ๋“ฑ์—์„œ ํ™•์‚ฐ๋˜๋ฉด์„œ ๊ทœ์ œ์— ๋Œ€ํ•œ ๋ชฉ์†Œ๋ฆฌ๊ฐ€ ๋†’์•„์ง€๊ณ  ์žˆ๋‹ค."
+ prompt = prompt_template.format(passage)
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs,
+     max_new_tokens=1024,
+     temperature=0.2,
+     top_p=0.95,
+     do_sample=True,
+     use_cache=False)
+ print(tokenizer.decode(outputs[0]))
89
+ - 21์ผ ์˜๊ตญ ์ผ๊ฐ„ ๊ฐ€๋””์–ธ์€ ์˜๊ตญ์ธ 250๋ช…์„ ํฌํ•จํ•ด ์œ ๋ช…์ธ 4์ฒœ๋ช… ๊ฐ€๋Ÿ‰์˜ ๋”ฅํŽ˜์ดํฌ ์Œ๋ž€๋ฌผ์„ ์ฐพ์•„๋ƒˆ๋‹ค๊ณ  ๋ฐํžˆ๋ฉด์„œ ํ”ผํ•ด์ž ์ค‘์—๋Š” ์œ ๋ช…ํ•œ ์—ฌ๋ฐฐ์šฐ์™€ TV ์Šคํƒ€, ์Œ์•…๊ฐ€, ์œ ํŠœ๋ฒ„ ๋“ฑ์ด ํฌํ•จ๋ผ ์žˆ๋‹ค๊ณ  ์„ค๋ช…ํ–ˆ๋‹ค.
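
The three examples in this diff repeat the same prompt, generate, and decode steps. The sketch below shows one way to fold them into a helper. It is not part of the committed README: the names `build_prompt` and `summarize` are hypothetical, and it assumes that `model` and `tokenizer` are the objects created by the loading code in the diff. Unlike the README examples, it decodes only the newly generated tokens and leaves `use_cache` at its default.

```python
# Prompt template as used in the README examples; {} is replaced by the passage.
# In English: "Summarize the following text.:{}\nSummary:"
PROMPT_TEMPLATE = "다음 글을 요약하세요.:{}\n요약:"


def build_prompt(passage: str) -> str:
    """Insert the passage into the summarization prompt (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(passage)


def summarize(model, tokenizer, passage: str, max_new_tokens: int = 1024) -> str:
    """Generate a summary and return only the newly generated text.

    `model` and `tokenizer` are assumed to be the PEFT model and tokenizer
    loaded as in the README; this function does not load anything itself.
    """
    prompt = build_prompt(passage)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=0.2,
        top_p=0.95,
        do_sample=True,
    )
    # For a decoder-only model, generate() returns the prompt tokens followed
    # by the new tokens, so drop the prompt part before decoding.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = outputs[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With the model loaded, a call such as `print(summarize(model, tokenizer, passage))` would print the summary without the echoed prompt.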