sonsus committed on
Commit
c2ba4d5
·
1 Parent(s): 4674242
This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. README.md +122 -0
  2. README_kr.md +123 -0
  3. __pycache__/analysis_utils.cpython-311.pyc +0 -0
  4. __pycache__/app.cpython-311.pyc +0 -0
  5. __pycache__/query_comp.cpython-311.pyc +0 -0
  6. __pycache__/view_utils.cpython-311.pyc +0 -0
  7. analysis_utils.py +338 -0
  8. app.py +452 -0
  9. eval_models_list.txt +3 -0
  10. eval_prompt_list.txt +3 -0
  11. guide_mds/input_jsonls_en.md +37 -0
  12. guide_mds/input_jsonls_kr.md +34 -0
  13. modules/__pycache__/nav.cpython-311.pyc +0 -0
  14. modules/nav.py +57 -0
  15. pages/__pycache__/see_results.cpython-311.pyc +0 -0
  16. pages/brief_intro.py +40 -0
  17. pages/quick_start_guide.py +14 -0
  18. pages/see_results.py +365 -0
  19. precommit.sh +5 -0
  20. query_comp.py +137 -0
  21. requirements.txt +14 -0
  22. run.sh +4 -0
  23. streamlit_app_local/.streamlit/config.toml +4 -0
  24. streamlit_app_local/README.md +24 -0
  25. streamlit_app_local/__pycache__/analysis_utils.cpython-311.pyc +0 -0
  26. streamlit_app_local/__pycache__/analysis_utils.cpython-38.pyc +0 -0
  27. streamlit_app_local/__pycache__/app.cpython-311.pyc +0 -0
  28. streamlit_app_local/__pycache__/app.cpython-38.pyc +0 -0
  29. streamlit_app_local/__pycache__/query_comp.cpython-311.pyc +0 -0
  30. streamlit_app_local/__pycache__/view_utils.cpython-311.pyc +0 -0
  31. streamlit_app_local/__pycache__/view_utils.cpython-38.pyc +0 -0
  32. streamlit_app_local/analysis_utils.py +338 -0
  33. streamlit_app_local/app.py +318 -0
  34. streamlit_app_local/eval_models_list.txt +3 -0
  35. streamlit_app_local/eval_prompt_list.txt +5 -0
  36. streamlit_app_local/guide_mds +1 -0
  37. streamlit_app_local/modules/__pycache__/nav.cpython-311.pyc +0 -0
  38. streamlit_app_local/modules/__pycache__/nav.cpython-38.pyc +0 -0
  39. streamlit_app_local/modules/nav.py +46 -0
  40. streamlit_app_local/pages/brief_intro.py +40 -0
  41. streamlit_app_local/pages/see_results.py +391 -0
  42. streamlit_app_local/query_comp.py +135 -0
  43. streamlit_app_local/requirements.txt +5 -0
  44. streamlit_app_local/run.sh +3 -0
  45. streamlit_app_local/user_submit/12-02_14:29:30/1.jsonl +6 -0
  46. streamlit_app_local/user_submit/12-02_14:29:30/2.jsonl +6 -0
  47. streamlit_app_local/user_submit/llm/240413_dora.jsonl +6 -0
  48. streamlit_app_local/user_submit/llm/240413_dora_epoch_5.jsonl +6 -0
  49. streamlit_app_local/user_submit/llm/240413_full.jsonl +6 -0
  50. streamlit_app_local/user_submit/llm/240413_full_epoch_5.jsonl +6 -0
README.md CHANGED
@@ -12,3 +12,125 @@ short_description: VARCO Arena is a reference-free LLM benchmarking approach
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # Varco Arena
+ Varco Arena runs a tournament among the models being compared for each test-set instruction, ranking the models accurately at an affordable cost. This is more accurate and more cost-effective than computing win rates against reference outputs.
+
+ For more information, the following links may help you understand how it works.
+ * [Paper](https://huggingface.co/papers/2411.01281)
+ * [Blog Post (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+
+
+ ## Quickstart
+ ### Running the web demo locally (Streamlit, recommended!)
+ ```bash
+ git clone [THIS_REPO]
+ # install the requirements below; we recommend miniforge for managing the environment
+ cd streamlit_app_local
+ bash run.sh
+ ```
+ For more details, see `[THIS_REPO]/streamlit_app_local/README.md`
+
+ ### CLI use
+ * The CLI code is located at
+     * `varco_arena/`
+ * Debug configurations for vscode are at
+     * `varco_arena/.vscode`
+ ```bash
+ ## gpt-4o-mini as a judge
+ python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -m tournament -e "gpt-4o-mini"
+ ## vllm-openai served LLM as a judge
+ python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -e SOME_MODEL_NAME_SERVED -m tournament -u "http://url_to/your/vllm_openai_server:someport"
+
+ # debugging commands
+ ## openai api judge dbg
+ python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+ ## other testing lines
+ python main.py -i "rsc/inputs_for_dbg/[SOME_DIRECTORY]/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+ ## dummy judge dbg (checking errors without api requests)
+ python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e debug
+ ```
+
+ ## Requirements
+ Tested with `python = 3.11.9`. Dependencies are listed in `requirements.txt`:
+ ```
+ openai>=1.17.0
+ munch
+ pandas
+ numpy
+ tqdm>=4.48.0
+ plotly
+ scikit-learn
+ kaleido
+ tiktoken>=0.7.0
+ pyyaml
+ transformers
+ streamlit>=1.40.2
+ openpyxl
+ git+https://github.com/shobrook/openlimit.git#egg=openlimit # do not install this from PyPI
+
+ # Linux
+ uvloop
+ # Windows
+ winloop
+ ```
+
+ #### Arguments
+ - -i, --input : path to the directory containing the input jsonlines files (LLM outputs)
+ - -o, --output_dir : directory where the results are written
+ - -e, --evaluation : judge model (e.g. "gpt-4o-2024-05-13", "gpt-4o-mini", \[vllm-served-model-name\])
+ - -k, --openai_api_key : OpenAI API key
+ - -u, --openai_url : URL of an OpenAI-compatible LLM server (requests are sent with the OpenAI SDK)
+
+ #### Advanced
+ - -j, --n_jobs : maximum number of concurrent jobs, used as the `asyncio.Semaphore` value
+ - -p, --evalprompt : [see the directory](./varco_arena/prompts/*.yaml)
+ - -lr, --limit_requests : vLLM OpenAI server request limit (default: 7,680)
+ - -lt, --limit_tokens : vLLM OpenAI server token limit (default: 15,728,640)
+
+ #### Input Data Format
+ [input jsonl guides](./streamlit_app_local/guide_mds/input_jsonls_en.md)
+
+
+ ## Contributing & Customizing
+ #### Do this after git clone and installation
+ ```bash
+ pip install pre-commit
+ pre-commit install
+ ```
+ #### Before committing
+ ```bash
+ bash precommit.sh # the black formatter will reformat the code
+ ```
+
+ ## FAQ
+ * I want to apply my custom judge prompt to run Varco Arena
+     * [`./varco_arena/prompts/`](./varco_arena/prompts/__init__.py) defines the prompts as `yaml` files, along with the class objects for them. Edit them as needed.
+ * I want a different judge prompt for different rows of the test set (e.g. rows up to the 100th use `prompt1`, rows from the 101st use `prompt2`)
+     * `load_prompt` at the link above takes `promptname` and `task` as parameters to load the prompt. The function is called in [`./varco_arena/manager.py:async_run`](./varco_arena/manager.py).
+ * I want more fields in my LLM output jsonl files for tailored use, i.e. fields beyond `instruction`, `source`, `generated`.
+     * This gets tricky, but here is a brief guide.
+     * You might have to edit `varco_arena/eval_utils.py`:`async_eval_w_prompt` (this part calls `PROMPT_OBJ.complete_prompt()`)
+     * All related code will need to be revised accordingly.
+
+ ## Special Thanks to (contributors)
+ - Minho Lee (@Dialogue Model Team, NCSOFT) [github](https://github.com/minolee/)
+     - query wrapper
+     - rag prompt
+ - Jumin Oh (@Generation Model Team, NCSOFT)
+     - overall prototyping of the system in haste
+
+
+ ## Citation
+ If you found our work helpful, consider citing our paper!
+ ```
+ @misc{son2024varcoarenatournamentapproach,
+ title={Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models},
+ author={Seonil Son and Ju-Min Oh and Heegon Jin and Cheolhun Jang and Jeongbeom Jeong and Kuntae Kim},
+ year={2024},
+ eprint={2411.01281},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2411.01281},
+ }
+ ```
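The README's cost claim can be illustrated with simple match counting. The sketch below is not taken from the repository. It assumes a single-elimination bracket, where every match eliminates one model, so `n_models - 1` matches are played per test prompt (the same product that `number_breakdown_from_df` in `analysis_utils.py` reports). The "league" count assumes a full round-robin, which is an assumption about what that mode means. Both function names are hypothetical.

```python
def tournament_matches(n_models: int, n_prompts: int) -> int:
    """Judge calls for single elimination: each match removes one model."""
    if n_models < 2:
        raise ValueError("need at least two models to compare")
    return (n_models - 1) * n_prompts


def league_matches(n_models: int, n_prompts: int) -> int:
    """Judge calls for an assumed round-robin: every pair meets once per prompt."""
    if n_models < 2:
        raise ValueError("need at least two models to compare")
    return n_models * (n_models - 1) // 2 * n_prompts


if __name__ == "__main__":
    # 8 models on a 100-prompt test set
    print(tournament_matches(8, 100))  # 7 matches per prompt
    print(league_matches(8, 100))      # 28 matches per prompt
```

The gap grows with the number of models: the bracket scales linearly in `n_models`, the round-robin quadratically.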
README_kr.md ADDED
@@ -0,0 +1,123 @@
+ # Varco Arena
+ Varco Arena runs a tournament among the models being compared for each test-set instruction and ranks the models accurately. This is more accurate, and somewhat cheaper, than computing win rates against reference outputs.
+
+ For more details, see the links below.
+ * [Paper](https://huggingface.co/papers/2411.01281)
+ * [NCSOFT Tech Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+
+
+ ## Quickstart
+ ### Getting started locally with the Streamlit app (recommended!)
+ ```bash
+ git clone [THIS_REPO]
+ # install the requirements below; we recommend miniforge for managing the environment
+ cd streamlit_app_local
+ bash run.sh
+ ```
+ For more details, see `[THIS_REPO]/streamlit_app_local/README.md`!
+
+ ### CLI usage
+ * The CLI and the web app share the same code, which is located in the directory below.
+     * `varco_arena/`
+ * Test commands for each preset prompt, for debugging in vscode, are listed in the following file.
+     * `varco_arena/.vscode/launch.json`
+ ```bash
+ ## gpt-4o-mini as a judge
+ python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -m tournament -e "gpt-4o-mini"
+ ## vllm-openai served LLM as a judge
+ python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -e SOME_MODEL_NAME_SERVED -m tournament -u "http://url_to/your/vllm_openai_server:someport"
+
+ # debugging commands
+ ## openai api judge dbg
+ python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+ ## other testing lines
+ python main.py -i "rsc/inputs_for_dbg/[SOME_DIRECTORY]/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+ ## dummy judge dbg (checking errors without api requests)
+ python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e debug
+ ```
+
+ ## Requirements
+ Tested with `python = 3.11.9`. Dependencies are listed in `requirements.txt`:
+ ```
+ openai>=1.17.0
+ munch
+ pandas
+ numpy
+ tqdm>=4.48.0
+ plotly
+ scikit-learn
+ kaleido
+ tiktoken>=0.7.0
+ pyyaml
+ transformers
+ streamlit>=1.40.2
+ openpyxl
+ git+https://github.com/shobrook/openlimit.git#egg=openlimit # do not install this from PyPI
+
+ # On Linux
+ uvloop
+ # On Windows
+ winloop
+ ```
+
+ #### Arguments
+ - -i, --input : input file, directory, or a regular expression over file names
+ - -o, --output_dir : directory where the output files are saved
+ - -e, --evaluation : judge model (e.g. "gpt-4o-2024-05-13", "gpt-4o-mini", the name of a model served with vllm, etc.)
+ - -m, --matching_method : match method (default "tournament"; "league" is not recommended)
+ - -k, --openai_api_key : OpenAI API key
+ - -u, --openai_url : URL (IP address + port) to use with a local vLLM OpenAI server
+
+ #### Advanced
+ - -j, --n_jobs : value passed to `asyncio.Semaphore`. If the Arena does not make progress, try lowering it below the default of 32
+ - -p, --evalprompt : [see this directory](./varco_arena/prompts/*.yaml)
+ - -lr, --limit_requests : vLLM OpenAI server request limit (default: 7,680)
+ - -lt, --limit_tokens : vLLM OpenAI server token limit (default: 15,728,640)
+
+ #### Input Data Format
+ [input jsonl guide](./streamlit_app_local/guide_mds/input_jsonls_kr.md)
+
+
+
+ ## Contributing & Customizing
+ #### What to do after git clone and installing the dependencies
+ ```bash
+ pip install pre-commit
+ pre-commit install
+ ```
+ #### What to do before committing
+ ```bash
+ bash precommit.sh # this reformats all the code
+ ```
+
+
+ Contact: Seonil Son
+ * I want to use a prompt I wrote myself
+     * [`./varco_arena/prompts/`](./varco_arena_core/prompts/__init__.py) loads the prompt classes and the prompts defined as `yaml` files. Write yours by following the presets.
+ * I want to use a different judge prompt per test set (e.g. a different prompt depending on the task)
+     * Prompts are loaded by `promptname` + `task` through `load_prompt` at the link above, called from [`./varco_arena_core/manager.py:async_run`](./varco_arena_core/manager.py).
+ * I want to add fields beyond `instruction`, `source`, `generated` to my input files.
+     * This gets a bit complicated, but please modify the following parts.
+     * You may need to adjust `async_eval_w_prompt` in `varco_arena/eval_utils.py` (this is where `PROMPT_OBJ.complete_prompt()` is called)
+     * Other related parts will have to be traced and fixed one by one.
+
+ ## Special Thanks to (contributors)
+ - Minho Lee (@Dialogue Model Team, NCSOFT) [github](https://github.com/minolee/)
+     - query wrapper
+     - rag prompt
+ - Jumin Oh (@Generation Model Team, NCSOFT)
+     - overall prototyping of the system in haste
+
+ ## Citation
+ If our work helped you, could we get a little help from you too? 😉
+ ```
+ @misc{son2024varcoarenatournamentapproach,
+ title={Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models},
+ author={Seonil Son and Ju-Min Oh and Heegon Jin and Cheolhun Jang and Jeongbeom Jeong and Kuntae Kim},
+ year={2024},
+ eprint={2411.01281},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2411.01281},
+ }
+ ```
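The `-j, --n_jobs` option is described as the value handed to an `asyncio.Semaphore`. A minimal sketch of how such a semaphore caps the number of judge requests in flight is shown below. `fake_judge` and `run_matches` are hypothetical stand-ins, not functions from the repository, and the sleep stands in for a real API call.

```python
import asyncio


async def fake_judge(match_id: int, sem: asyncio.Semaphore, state: dict) -> int:
    """Hypothetical judge call; at most `n_jobs` of these run concurrently."""
    async with sem:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # placeholder for the actual request latency
        state["running"] -= 1
        return match_id


async def run_matches(n_matches: int, n_jobs: int):
    sem = asyncio.Semaphore(n_jobs)
    state = {"running": 0, "peak": 0}
    # gather() returns results in the order the coroutines were passed in
    results = await asyncio.gather(
        *(fake_judge(i, sem, state) for i in range(n_matches))
    )
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_matches(n_matches=20, n_jobs=4))
    print(len(results), "matches judged; peak concurrency:", peak)
```

Lowering `n_jobs` trades throughput for fewer simultaneous requests, which is presumably why the README suggests reducing it when the Arena stalls.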
__pycache__/analysis_utils.cpython-311.pyc ADDED
Binary file (17.7 kB).
__pycache__/app.cpython-311.pyc ADDED
Binary file (22.3 kB).
__pycache__/query_comp.cpython-311.pyc ADDED
Binary file (7.99 kB).
__pycache__/view_utils.cpython-311.pyc ADDED
Binary file (18.3 kB).
analysis_utils.py ADDED
@@ -0,0 +1,338 @@
+ from copy import deepcopy
+ from functools import partial
+ from typing import *
+
+ import pandas as pd
+ from fire import Fire
+
+ """
+ This code assumes dealing with only one instruction
+ """
+
+
+ # from varco_arena.tournament
+ def log2_power_of_two(n):
+     # First, let's make sure n is indeed a power of 2
+     if n & (n - 1) != 0 or n == 0:
+         raise ValueError("n must be a positive power of 2")
+
+     exponent = 0
+     while n > 1:
+         n >>= 1  # Right shift is like dividing by 2, but faster
+         exponent += 1
+     return exponent
+
+
+ def get_1st(df: pd.DataFrame, alpha2names: dict) -> Optional[str]:
+     finals = df[df["round"] == "final"]
+     if len(finals) == 1:
+         first = finals.iloc[0].winner_resolved
+     else:
+         first = (
+             None  # error case (no finals match or multiple finals (buggy result file))
+         )
+
+     return first
+
+
+ def get_unique_participants(df: pd.DataFrame) -> list:
+     participants = pd.concat([df.model_a, df.model_b]).unique().tolist()
+     participants = [p for p in participants if p]  # remove None
+     participants = sorted(participants)  # make it sorted
+     return participants
+
+
+ def _impute_byes(df):
+     max_depth = df.depth.max()
+
+     # init
+     imputed_parts = dict()
+     for depth in range(max_depth + 1):
+         imputed_parts[depth] = df[df.depth == depth].copy()
+
+     # reverse
+     for depth in range(max_depth, 0, -1):  # always we have 1 proper match for depth=0
+         null_v_null = {
+             "model_a": "",
+             "model_b": "",
+             "winner": "model_a",
+             "match_order_in_round": "-",
+             "depth": depth,
+         }
+
+         # fill some_model vs null byes
+         players = get_unique_participants(imputed_parts[depth])
+         proceeded = get_unique_participants(imputed_parts[depth - 1])
+         imputed = []
+         for p in proceeded:
+             if p not in players:
+                 p_v_null = deepcopy(null_v_null)
+                 p_v_null["model_a"] = p
+                 imputed.append(p_v_null)
+         imputed_parts[depth] = pd.concat(
+             [
+                 imputed_parts[depth],
+                 pd.DataFrame(imputed),
+             ],
+             axis="index",
+         )
+
+         # fill null vs null
+         n_null_v_null = 2 ** (depth) - len(imputed_parts[depth])
+         if n_null_v_null > 0:
+             imputed = pd.DataFrame([null_v_null] * n_null_v_null)
+             imputed_parts[depth] = pd.concat(
+                 [
+                     imputed_parts[depth],
+                     imputed,
+                 ],
+                 axis="index",
+             )
+
+     df_imputed = pd.concat(imputed_parts.values(), axis="index")
+     df_imputed = df_imputed.sort_values(by="depth").reset_index(drop=True)
+
+     return df_imputed
+
+
+ def index_test_scenario(df) -> pd.DataFrame:
+     df["inst_src"] = "inst: " + df.instruction + "\n\nsrc: " + df.source
+     df["idx_inst_src"] = df.apply(
+         lambda row: f"{row.tournament_idx}:\n{row.inst_src}", axis=1
+     )
+
+     # later used for tournament bracket backtracking
+     if "depth" not in df.columns:
+         mappings = {
+             "final": 0,
+             "semi-final": 1,
+             "quarter-final": 2,
+         }
+
+         def _convert_round_to_depth(rnd: str, mappings=None) -> int:
+             if rnd is None:
+                 depth = None
+             elif rnd in mappings.keys():
+                 depth = mappings[rnd]
+             elif rnd.startswith("round-"):  # assume perfect power of two
+                 num = int(rnd.replace("round-", "").strip())
+                 depth = log2_power_of_two(num) - 1
+             else:  # previously fell through with `depth` unbound
+                 raise ValueError(f"unrecognized round label: {rnd!r}")
+             return depth
+
+         conv = partial(_convert_round_to_depth, mappings=mappings)
+         df["depth"] = df["round"].apply(conv)
+
+     return df
+
+
+ def init_tournament_dataframe(
+     df, alpha2names: dict = None
+ ) -> Tuple[pd.DataFrame, dict]:
+     df = df.sort_values(by="depth").reset_index(drop=True)
+     # make winner interpretable (A -> model_a, B -> model_b)
+     df.winner = df.winner.apply(lambda txt: f"model_{txt.lower()}")
+
+     # define alpha2names if not given (covers up to 168 participants)
+     if alpha2names is None:
+         # adjacent literals are concatenated, so no stray whitespace enters the string
+         alphabets = (
+             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+             "abcdefghijklmnopqrstuvwxyz"
+             "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩ"
+             "㉠㉡㉢㉣㉤㉥㉦㉧㉨㉩㉪㉫㉬㉭"
+             "㉮㉯㉰㉱㉲㉳㉴㉵㉶㉷㉸㉹㉺㉻"
+             "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
+             "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
+             "αβγδεζηθικλμνξοπρστυφχψω"
+         )
+         model_full_names = get_unique_participants(df)
+         alpha2names = dict(zip(alphabets, model_full_names))
+         if len(alpha2names) < len(model_full_names):
+             raise ValueError(
+                 f"Tournament viewer cannot visualize more than {len(alphabets)=} participants. ({len(model_full_names)=} is given)\n\nOnly the tournament visualizer is affected; other features still work."
+             )
+     names2alpha = dict(zip(alpha2names.values(), alpha2names.keys()))
+     df = _impute_byes(df)
+
+     # preserve readables for later
+     df = _make_readables(df, names2alpha)
+
+     if len(df[df["round"] == "final"]) != 1:
+         raise ValueError("There must be exactly one final match.")
+
+     return df, alpha2names
+
+
+ def _make_readables(df, names2alpha):
+     df["human_readable_model_a"] = df.model_a.copy()
+     df["human_readable_model_b"] = df.model_b.copy()
+
+     df.model_a = df.model_a.apply(
+         lambda modelname: names2alpha[modelname] if modelname else "x"
+     )
+     df.model_b = df.model_b.apply(
+         lambda modelname: names2alpha[modelname] if modelname else "x"
+     )
+
+     df["human_readable_idx"] = df.apply(
+         lambda row: f"{row.name}: {row.human_readable_model_a} ({row.model_a}) vs. {row.human_readable_model_b} ({row.model_b if row.model_b else 'x'})",
+         axis=1,
+     )
+
+     df["winner_resolved"] = df.apply(lambda row: row[row.winner], axis=1)
+     df["winner_nodes"] = df.apply(
+         lambda row: f"{row.winner_resolved}:{row.name}".ljust(4, " "), axis=1
+     )  # later for figure representation of winner as a "node"
+     return df
+
+
+ # draw
+ def draw(df: pd.DataFrame, alpha2names: dict = None) -> str:
+     def _draw_round(
+         df: pd.DataFrame,
+         depth: int = None,
+         winners_in_order: list = None,
+     ) -> Tuple:
+         df_now = df[df.depth == depth]
+         max_depth = df.depth.max()
+
+         width = 2 ** ((max_depth - depth) + 2)
+
+         connect_left = "─" * (width)
+         connect_left = connect_left[4:]
+         connect_right = " " * (width)
+         connect_right = "┐" + connect_right[1:]
+
+         if winners_in_order is None:
+             assert (
+                 depth == 0
+             ), f"{winners_in_order=} is only allowed when drawing the top (=final match)"
+             winners_in_order = df_now.winner_nodes
+
+         round_drawing_parts = []
+         descending_round_winners = []
+         for node in winners_in_order:
+             round_drawing_parts.append("".join([node, connect_left, connect_right]))
+             # next round winners in sync with winner order
+             row_now = df_now.query(f"winner_nodes=='{node}'")
+             descending_round_winners.append(row_now.model_a.item())
+             descending_round_winners.append(row_now.model_b.item())
+
+         # find descending_round_winners within winner_nodes format (num:alpha)
+         if depth == max_depth:
+             pass  # keep the descending_round_winners intact
+         else:
+             df_descend = df[df.depth == depth + 1]
+             for i, winner_alpha in enumerate(descending_round_winners):
+                 node_intr = df_descend.query(
+                     f"winner_resolved=='{winner_alpha}'"
+                 ).winner_nodes.item()
+                 descending_round_winners[i] = node_intr
+
+         round_drawing = "".join(round_drawing_parts)
+         descending_unit = " " * width
+         descending_unit = "│" + descending_unit[1:]
+         descending_lines_parts = [descending_unit] * len(df_now) * 2
+         descending_lines = "".join(descending_lines_parts)
+
+         return round_drawing, descending_lines, descending_round_winners
+
+     drawings = []
+     winners_in_order = None
+     max_depth = df.depth.max()
+     for depth in range(max_depth + 1):
+         winner_drw, lines_desc, winners_in_order = _draw_round(
+             df,
+             depth=depth,
+             winners_in_order=winners_in_order,
+         )
+         drawings.append((winner_drw, lines_desc))
+     # prepare bracket top
+     champion_alphabet = drawings[0][0].split()[0].split(":")[0]
+     champion_readable = alpha2names[champion_alphabet]
+     bracket_top = [f"🥇winner: {champion_readable}", "│"]
+     # prepare mid
+     bracket_mid = "\n".join(["\n".join(tup) for tup in drawings])
+
+     # prepare bot
+     initial_participants = winners_in_order
+     bracket_bot = (" " * 3).join(initial_participants)
+
+     full_figure = "\n".join(bracket_top + [bracket_mid, bracket_bot])
+
+     return full_figure
+
+
+ def number_breakdown_from_df(result_df: pd.DataFrame) -> Tuple[str, int, int]:
+     n_models = len(get_unique_participants(result_df))
+     size_testset = int(len(result_df) / (n_models - 1))
+     interpretation = f"total {len(result_df)} matches = (n_models-1) * size_testset = ({n_models}-1) * {size_testset}"
+     return interpretation, n_models, size_testset
+
+
+ def make_legend_str(df, alpha2names) -> str:
+     first = get_1st(df, alpha2names)
+     alpha2names = {k: v.replace("🥇 ", "") for k, v in alpha2names.items()}
+     alpha_ordered = sorted(list(alpha2names.keys()))
+     # names_ordered = sorted(list(alpha2names.values()))
+     # name2alpha = {v: k for k, v in alpha2names.items()}
+
+     for k, v in alpha2names.items():
+         if v == alpha2names[first]:
+             alpha2names[k] = f"🥇 {v}"
+     res_str = "\n\nlegend:"
+     # for name in names_ordered:
+     #     alpha = name2alpha[name]
+     for alpha in alpha_ordered:
+         name_w_medal = alpha2names[alpha]
+         res_str += f"\n{alpha}\t{name_w_medal}"
+     return res_str
+
+
+ def main(
+     jslname: str = "result.json",
+ ):
+     """
+     Test code
+     """
+
+     df = pd.read_json(jslname, orient="records")
+     df = df.drop(columns=["tstamp", "logs"])
+     df = index_test_scenario(df)
+
+     # The visualization(df) step in between is omitted here. If everything else was
+     # followed and a problem still occurs, it is a viz issue (unlikely, though).
+     selections = df.idx_inst_src.unique()
+     alpha2names = None  # built on the first tournament, then reused
+     for i, sel in enumerate(selections):
+         try:
+             df_now = df[df.idx_inst_src == sel]
+             df_now_processed, _alpha2names = init_tournament_dataframe(
+                 df_now, alpha2names=alpha2names
+             )
+             if alpha2names is None:
+                 alpha2names = _alpha2names
+             bracket_drawing = draw(df_now_processed, alpha2names=alpha2names)
+             legend = make_legend_str(df_now_processed, alpha2names)
+             print(bracket_drawing + legend)
+             with open(f"{i}.txt", "w") as f:
+                 print(bracket_drawing + legend, file=f)
+                 print(f"\n\n{sel}", file=f)
+
+             for match_idx_human in df_now_processed.human_readable_idx:
+                 match_idx = int(match_idx_human.split(": ")[0])
+                 row = df_now_processed.loc[match_idx]
+                 winner = row.winner
+         except Exception as e:
+             with open(f"{i}_err.txt", "w") as f:
+                 print(e, file=f)
+                 print("", file=f)
+                 print(sel, file=f)
+             df_now_processed[
+                 [
+                     "depth",
+                     "round",
+                     "winner_nodes",
+                     "winner_resolved",
+                     "winner",
+                     "model_a",
+                     "model_b",
+                 ]
+             ].to_json(f"{i}_err.jsonl", lines=True, orient="records")
+
+
+ if __name__ == "__main__":
+     Fire(main)
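The depth bookkeeping in `analysis_utils.py` can be restated as a small standalone sketch. It assumes, as `index_test_scenario` appears to, that a `round-<N>` label carries a power-of-two participant count, and that `_impute_byes` pads every depth `d` to `2 ** d` matches. `round_to_depth` and `matches_at_depth` are hypothetical names for illustration, not functions from the repository.

```python
NAMED_ROUNDS = {"final": 0, "semi-final": 1, "quarter-final": 2}


def round_to_depth(rnd: str) -> int:
    """Map a round label to its bracket depth (0 = final)."""
    if rnd in NAMED_ROUNDS:
        return NAMED_ROUNDS[rnd]
    if rnd.startswith("round-"):
        n = int(rnd[len("round-"):])
        if n <= 0 or n & (n - 1) != 0:
            raise ValueError("round size must be a positive power of two")
        # bit_length() - 1 equals log2(n) for powers of two; depth is log2(n) - 1
        return n.bit_length() - 2
    raise ValueError(f"unrecognized round label: {rnd!r}")


def matches_at_depth(depth: int) -> int:
    """Number of bracket slots at a depth once byes are imputed."""
    return 2 ** depth


if __name__ == "__main__":
    for label in ["final", "semi-final", "quarter-final", "round-16", "round-32"]:
        d = round_to_depth(label)
        print(label, "-> depth", d, "with", matches_at_depth(d), "matches")
```

Under these assumptions, a full bracket of depth `D` holds `2 ** (D + 1) - 1` slots in total, of which only the real matches come from the result file; the rest are the imputed byes.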
app.py ADDED
@@ -0,0 +1,452 @@
+ # import shutil
+ import os
+ import select
+ import subprocess
+ import sys
+ import time
+ from datetime import datetime, timedelta, timezone
+ from pathlib import Path
+ from typing import *
+
+ import streamlit as st
+
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+ from varco_arena.varco_arena_core.prompts import load_prompt
+ from view_utils import (
+     default_page_setting,
+     escape_markdown,
+     set_nav_bar,
+     show_linebreak_in_md,
+ )
+
+ VA_ROOT = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))
+ USR_SUB = VA_ROOT.parts[-1]
+
+ import shutil
+
+ import pandas as pd
+
+ import analysis_utils as au
+ from view_utils import visualization
+
+
+ class DataCache:
+     def __init__(self):
+         self.cache = {}
+
+     def store(self, key: str, data: dict):
+         self.cache[key] = data
+
+     def get(self, key: str) -> Optional[dict]:
+         return self.cache.get(key)
+
+
+ # Initialize the cache in session state if it doesn't exist
+ if "data_cache" not in st.session_state:
+     st.session_state.data_cache = DataCache()
+
+
+ def purge_user_sub_data(data_path_to_purge: Union[Path, str] = None):
+     if data_path_to_purge is None:
+         print("nothing to purge")
+         return
+     else:
+         shutil.rmtree(data_path_to_purge)
+         print(f"purged {str(data_path_to_purge)}")
+         return
+
+
+ @st.cache_data
+ def load_and_cache_data(result_file_path: Optional[str] = None) -> Tuple[Dict, Dict]:
+     """
+     Load data from file, cache it in memory, then remove the file.
+     Returns cached data on subsequent calls.
+
+     Args:
+         result_file_path: Path to the result JSON file
+
+     Returns:
+         Tuple of (all_result_dict, df_dict)
+     """
+     # Check if we already have cached data for this path
+     if result_file_path:
+         cache_key = str(Path(result_file_path))
+         cached_data = st.session_state.data_cache.get(cache_key)
+         if cached_data:
+             return cached_data["all_result_dict"], cached_data["df_dict"]
+
+     # Initialize empty dicts
+     all_result_dict = {}
+     df_dict = {}
+
+     if result_file_path is not None:
+         try:
+             result_file_path = Path(result_file_path)
+
+             # Read and process data
+             df = pd.read_json(result_file_path)
+             for col in ["tstamp", "logs"]:
+                 if col in df.columns:
+                     df.drop(columns=[col], inplace=True)
+             df = au.index_test_scenario(df)
+
+             fig_dict_per_task = {}
+             df_dict_per_task = {}
+
+             # Process overall data
+             fig_dict_per_task["Overall"] = visualization(df, is_overall=True)
+             df_dict_per_task["Overall"] = df
+
+             # Process per-task data
+             for task in df["task"].unique():
+                 df_task = df[df["task"] == task]
+                 fig_dict_per_task[task] = visualization(df_task, is_overall=False)
+                 df_dict_per_task[task] = df_task
+
+             # Create key from path components
+             prm_name = result_file_path.parts[-2]
+             exp_name = result_file_path.parts[-3]
+             key = f"{exp_name}/{prm_name}"
+
+             all_result_dict[key] = fig_dict_per_task
+             df_dict[key] = df_dict_per_task
+
+             # Store in cache before removing file
+             cache_data = {"all_result_dict": all_result_dict, "df_dict": df_dict}
+             st.session_state.data_cache.store(str(result_file_path), cache_data)
+
+             # Remove user experiment directory
+             purge_user_sub_data(data_path_to_purge=VA_ROOT)
+
+         except Exception as e:
+             st.error(f"Error processing data: {str(e)}")
+             return {}, {}
+
+     return all_result_dict, df_dict
+
+
129
+ def upload_files(uploaded_files) -> Path:
130
+ # prep directory for user submission
131
+ user_sub_root = VA_ROOT
132
+ if user_sub_root.exists():
133
+ if not user_sub_root.is_dir():
134
+ raise ValueError(
135
+ f"{user_sub_root} file exists and is not a directory. Consider renaming it."
136
+ )
137
+ else:
138
+ user_sub_root.mkdir(parents=True)
139
+
140
+ KST = timezone(timedelta(hours=9))
141
+ tstamp = datetime.now(KST)
142
+ tstr = tstamp.strftime("%m-%d_%H:%M:%S")
143
+ files_dir_str = "./" + str(user_sub_root / tstr)
144
+ files_dir = Path(files_dir_str)
145
+ files_dir.mkdir(parents=True, exist_ok=True)
146
+ uploaded_files = list(uploaded_files)
147
+
148
+ if not uploaded_files:
149
+ st.warning("❌ No files to upload. Please drag/drop or browse files to upload.")
150
+ # purge_user_sub_data(data_path_to_purge=VA_ROOT)
151
+ elif len(uploaded_files) < 2:
152
+ st.error("❌ You need at least 2 jsonlines files to properly run VA.")
153
+ purge_user_sub_data(data_path_to_purge=VA_ROOT)
154
+ else: # properly uploaded
155
+ for file in uploaded_files:
156
+ # Create a path for the file in the server directory
157
+ file_path = files_dir / file.name
158
+
159
+ # Save the file to the server directory
160
+ with open(file_path, "wb") as f:
161
+ f.write(file.getbuffer())
162
+
163
+ jslfiles = list(files_dir.glob("*.jsonl"))
164
+ st.success(f"✅ Successfully uploaded {len(jslfiles)} jsonl files.")
165
+ return files_dir.resolve()
166
+
167
+
168
+ def run_varco_arena(
169
+ price_estimation: bool = False,
170
+ # upload_dir: Union[str, Path] = None,
171
+ promptname: str = None,
172
+ exp_name: str = None,
173
+ api_key: Optional[str] = None,
174
+ evaluation_model: str = "gpt-4o-mini",
175
+ update_interval: float = 1.0,
176
+ ):
177
+ # NOTE: the API key is passed to the CLI through the -k flag below, not through an environment variable
178
+ ptn = f"{str(st.session_state.upfiles_dir)}"
179
+ outdir = Path(ptn)
180
+ if exp_name:
181
+ outdir = outdir / exp_name
182
+
183
+ command = f"python varco_arena/main.py -i {ptn} -o {outdir} -k {api_key} -p {promptname} -e {evaluation_model} -j 64"
184
+ if price_estimation:
185
+ command = f"{command} -c"
186
+ else:
187
+ command = f"yes | {command}"  # prefix once; str.replace would also rewrite "python" inside paths
188
+ # Never write the raw API key to the server log
+ print(command.replace(api_key, "***") if api_key else command)
189
+
190
+ api_key = None # clear immediately
191
+
192
+ process = subprocess.Popen(
193
+ command,
194
+ stdout=subprocess.PIPE,
195
+ stderr=subprocess.STDOUT,
196
+ stdin=subprocess.PIPE,
197
+ text=True,
198
+ bufsize=1,
199
+ shell=True,
200
+ )
201
+
202
+ # Set stdout and stdin to non-blocking mode
203
+ os.set_blocking(process.stdout.fileno(), False)
204
+
205
+ last_update_time = time.time()
206
+ terminal_output = st.empty()
207
+ full_output = f"{command}\n"
+ to_show = full_output  # defined up front so the final flush below never hits an unbound name
208
+ while True:
209
+ # Check if we have output to read
210
+ if select.select([process.stdout], [], [], 0)[0]:
211
+ output = process.stdout.readline()
212
+ if output:
213
+ full_output += output
214
+ if price_estimation:
215
+ to_show = full_output
216
+ terminal_output.code(to_show, language="bash")
217
+ else:
218
+ current_time = time.time()
219
+ if current_time - last_update_time > update_interval:
220
+ lines = full_output.split("\n")
221
+ if len(lines) < 5:
222
+ to_show = full_output
223
+ else:
224
+ to_show = "\n".join(["...\n..\n.\n"] + lines[-5:])
225
+ terminal_output.code(to_show, language="bash")
226
+ last_update_time = current_time
227
+ print(output)
228
+ time.sleep(0.1)
229
+ # Check if the process has finished
230
+ if process.poll() is not None:
231
+ # Read any remaining output
232
+ remaining_output = process.stdout.read()
233
+ if remaining_output:
234
+ lines = remaining_output.split("\n")
235
+ if len(lines) > 10:
236
+ to_show += "\n".join(["\n...\n..\n.\n"] + lines[-10:])
237
+ else:
238
+ to_show += remaining_output
239
+ terminal_output.code(to_show, language="bash")
240
+ print(remaining_output)
241
+ break
242
+
243
+ return_code = process.poll()
244
+ return outdir, return_code
245
+
246
+
247
+ def main():
248
+ # init lang
249
+ st.session_state["korean"] = st.session_state.get("korean", False)
250
+
251
+ sidebar_placeholder = default_page_setting()
252
+ set_nav_bar(
253
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_init"
254
+ )
255
+
256
+ st.title("⚔️ VARCO ARENA ⚔️")
257
+ if st.session_state.korean:
258
+ st.write(
259
+ "**VARCO Arena๋Š” ๊ฐ ๋ชจ๋ธ์˜ ์ƒ์„ฑ๋œ ๊ฒฐ๊ณผ๋ฅผ ๋น„๊ต ํ‰๊ฐ€ํ•˜์—ฌ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ ์ˆœ์œ„๋ฅผ ์ œ๊ณตํ•˜๋Š” ์‹œ์Šคํ…œ์ž…๋‹ˆ๋‹ค. ์ปค์Šคํ…€ ํ…Œ์ŠคํŠธ์…‹์— ์œ ์šฉํ•˜๊ฒŒ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ชจ๋ฒ”๋‹ต์•ˆ์„ ํ•„์š”๋กœ ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.**"
260
+ )
261
+ else:
262
+ st.write(
263
+ "**VARCO Arena is an LLM benchmarking system that compares model responses across customized test scenarios without requiring reference answers.**"
264
+ )
265
+
266
+ st.divider()
267
+ # Set up the file uploader
268
+ if st.session_state.korean:
269
+ st.markdown("### 1. ๋ชจ๋ธ ์ถœ๋ ฅํŒŒ์ผ ์—…๋กœ๋“œ")
270
+ else:
271
+ st.markdown("### 1. Upload LLM responses")
272
+ uploaded_files = st.file_uploader(
273
+ "Drag and Drop jsonlines files (.jsonl)", accept_multiple_files=True
274
+ )
275
+ if st.session_state.korean:
276
+ st.info("์—…๋กœ๋“œ ํ•˜์‹  ํŒŒ์ผ์€ ์ž๋™์œผ๋กœ ์‚ญ์ œ๋˜๋ฉฐ ์ˆ˜์ง‘๋˜๊ฑฐ๋‚˜ ์‚ฌ์šฉ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.")
277
+ else:
278
+ st.info(
279
+ "Your uploads are removed automatically and are neither collected nor reused for any purpose."
280
+ )
281
+ # upload state
282
+ if "upfiles_dir" not in st.session_state:
283
+ st.session_state.upfiles_dir = None
284
+ if st.button("์—…๋กœ๋“œํ•˜๊ธฐ" if st.session_state.korean else "Upload Files"):
285
+ st.session_state.upfiles_dir = upload_files(uploaded_files)
286
+ if st.button("์—…๋กœ๋“œํ•œ ํŒŒ์ผ ์ง€์šฐ๊ธฐ" if st.session_state.korean else "Purge my uploads"):
287
+ st.session_state.upfiles_dir = None
288
+ if VA_ROOT.is_dir():
289
+ shutil.rmtree(VA_ROOT)
290
+ st.success(
291
+ "โœ… ์—…๋กœ๋“œํ•œ ํŒŒ์ผ์„ ์„œ๋ฒ„์—์„œ ์ง€์› ์Šต๋‹ˆ๋‹ค"
292
+ if st.session_state.korean
293
+ else "✅ Removed your uploads from the server successfully"
294
+ )
295
+ else:
296
+ st.error(
297
+ "โŒ ์ง€์šธ ํŒŒ์ผ์ด ์—†์Šต๋‹ˆ๋‹ค"
298
+ if st.session_state.korean
299
+ else "❌ You have nothing uploaded"
300
+ )
301
+
302
+ if st.session_state.korean:
303
+ with st.expander("โ“โ” ๋ฌด์—‡์„ ์—…๋กœ๋“œ ํ•˜๋‚˜์š”โ“โ”"):
304
+ st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
305
+ else:
306
+ with st.expander("❓❔ What should I upload ❓❔"):
307
+ st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
308
+
309
+ # Form for cost estimation
310
+ with st.form("cost_estimation_form"):
311
+ if st.session_state.korean:
312
+ st.write("### 2. ๊ฐ€๊ฒฉ ์‚ฐ์ •")
313
+ else:
314
+ st.write("### 2. Cost Estimation")
315
+ eval_model = st.selectbox(
316
+ "Select Judge",
317
+ open("eval_models_list.txt", encoding="UTF8").read().splitlines(),
318
+ )
319
+ promptname = st.selectbox(
320
+ "Select Evaluation Prompt",
321
+ open("eval_prompt_list.txt", encoding="UTF8").read().splitlines(),
322
+ )
323
+ if st.session_state.korean:
324
+ st.markdown("*`llmbar`์™ธ ๋‹ค๋ฅธ ํ”„๋กฌํ”„ํŠธ๋Š” ์ถฉ๋ถ„ํžˆ ๊ฒ€์ฆ๋œ ํ”„๋กฌํ”„ํŠธ๋Š” ์•„๋‹™๋‹ˆ๋‹ค. (๋™์ž‘์€ ํ•จ)")
325
+ else:
326
+ st.markdown(
327
+ "*Eval prompts other than `llmbar` are working examples, not optimized ones.*"
328
+ )
329
+ if promptname == USR_SUB:
330
+ raise ValueError(
331
+ f"{USR_SUB=} is a reserved name for the system. Choose another name for the prompt, or change {VA_ROOT=} (USR_SUB == VA_ROOT.parts[-1])."
332
+ )
333
+
334
+ estimate_button = st.form_submit_button("Calculate Cost!")
335
+ with st.expander(
336
+ "LLM Judge์— ํ™œ์šฉ๋˜๋Š” ํ”„๋กฌํ”„ํŠธ (`Calculate Cost!` ํด๋ฆญ์‹œ ๊ฐฑ์‹ )"
337
+ if st.session_state.korean
338
+ else "**Evaluation Prompt for LLM Judge (will refresh after `Calculate Cost!` clicked)**"
339
+ ):
340
+ prompt = load_prompt(promptname, task="-")
341
+ kwargs = dict(
342
+ inst="{inst}",
343
+ src="{src}",
344
+ out_a="{out_a}",
345
+ out_b="{out_b}",
346
+ task="-",
347
+ )
348
+ if promptname == "translation_pair":
349
+ kwargs["source_lang"] = "{source_lang}"
350
+ kwargs["target_lang"] = "{target_lang}"
351
+ prompt_cmpl = prompt.complete_prompt(**kwargs)
352
+
353
+ st.markdown(f"### Evaluation Prompt: {promptname}")
354
+ for msg in prompt_cmpl:
355
+ st.markdown(f"**{msg['role']}**")
356
+ st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
357
+
358
+ if estimate_button:
359
+ if st.session_state.get("upfiles_dir") is None:
360
+ st.error(
361
+ "❌ Requirements: You have to upload jsonlines files first to proceed"
362
+ )
363
+ else:
364
+ st.markdown("##### Estimated Cost")
365
+ dummy_api_key = "dummy"
366
+ dummy_exp_name = "dummy"
367
+ result_file_path, return_code = run_varco_arena(
368
+ # upload_dir=st.session_state.upfiles_dir,
369
+ promptname=promptname,
370
+ api_key=dummy_api_key,
371
+ exp_name=dummy_exp_name,
372
+ price_estimation=True,
373
+ evaluation_model=eval_model,
374
+ )
375
+ if return_code:
376
+ st.error(
377
+ "❌ RuntimeError: An error occurred during cost estimation. **Restart from file upload!**"
378
+ )
379
+ purge_user_sub_data(data_path_to_purge=VA_ROOT)
380
+
381
+ else:
382
+ st.success("✅ Cost estimation completed successfully")
383
+ st.session_state.cost_estimated = True
384
+
385
+ # Form for actual run
386
+ with st.form("run_arena_form"):
387
+ if st.session_state.korean:
388
+ st.write("### 3. Varco Arena ๊ตฌ๋™ํ•˜๊ธฐ")
389
+ else:
390
+ st.write("### 3. Run Varco Arena")
391
+ api_key = st.text_input("Enter your OpenAI API Key", type="password")
392
+
393
+ # demo exp name fixated
394
+ KST = timezone(timedelta(hours=9))
395
+ tstamp = datetime.now(KST)
396
+ tstr = tstamp.strftime("%m-%d_%H:%M:%S")
397
+ exp_name = f"{tstr}_KST_submit"
398
+
399
+ if st.session_state.korean:
400
+ st.write("**์ฃผ์˜**:`Ctrl+C` ๋ฒ„ํŠผ์€ ๊ตฌํ˜„๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๊ตฌ๋™ ์ „ ์ˆ™๊ณ ํ•ด์ฃผ์„ธ์š”.")
401
+ else:
402
+ st.write("**Caution: a run cannot be interrupted once started (no `Ctrl+C` equivalent is implemented).**")
403
+ run_button = st.form_submit_button(
404
+ "🔥 Run Arena!",
405
+ disabled=(not st.session_state.get("cost_estimated", False))
406
+ or "result_file_path"
407
+ in st.session_state.keys(), # run already performed once
408
+ )
409
+
410
+ if run_button:
411
+ set_nav_bar(
412
+ True,
413
+ sidebar_placeholder=sidebar_placeholder,
414
+ toggle_hashstr="app_during_run",
415
+ )
416
+ if st.session_state.get("upfiles_dir") is None:
417
+ st.error(
418
+ "❌ Requirements: You have to upload jsonlines files first to proceed"
419
+ )
420
+ elif not api_key:
421
+ st.error("❌ Requirements: OpenAI key required to run VA.")
422
+ else:
423
+ result_file_path, return_code = run_varco_arena(
424
+ # upload_dir=st.session_state.upfiles_dir,
425
+ promptname=promptname,
426
+ api_key=api_key,
427
+ exp_name=exp_name,
428
+ price_estimation=False,
429
+ evaluation_model=eval_model,
430
+ )
431
+ if return_code:
432
+ st.error(
433
+ "❌ RuntimeError: An error occurred during Varco Arena run. Check the file and **restart from file upload!**"
434
+ )
435
+ purge_user_sub_data(data_path_to_purge=VA_ROOT)
436
+
437
+ else:
438
+ st.success("✅ Varco Arena run completed successfully")
439
+ st.session_state.result_file_path = list(
440
+ result_file_path.glob("**/result.json")
441
+ )[-1]
442
+ set_nav_bar(
443
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_run_done"
444
+ )
445
+
446
+ if st.session_state.get("result_file_path", None) is not None:
447
+ print(f"{st.session_state.get('result_file_path', None)=}")
448
+ load_and_cache_data(result_file_path=str(st.session_state.result_file_path))
449
+
450
+
451
+ if __name__ == "__main__":
452
+ main()
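`run_varco_arena` above assembles the CLI call as one f-string and runs it with `shell=True`, so any shell metacharacter in a path, prompt name, or key is interpreted by the shell, and the key appears verbatim wherever the command is printed. A minimal sketch of an alternative, assuming only the flags already visible in the f-string (`-i`, `-o`, `-k`, `-p`, `-e`, `-j`, `-c`); the helper names `build_arena_command` and `mask_secret` are hypothetical and not part of the repository:

```python
from pathlib import Path
from typing import List, Optional


def build_arena_command(
    input_dir: Path,
    output_dir: Path,
    api_key: str,
    promptname: str,
    evaluation_model: str = "gpt-4o-mini",
    price_estimation: bool = False,
    n_jobs: int = 64,
) -> List[str]:
    """Build the CLI call as an argument list, so no shell parsing is involved."""
    command = [
        "python", "varco_arena/main.py",
        "-i", str(input_dir),
        "-o", str(output_dir),
        "-k", api_key,
        "-p", promptname,
        "-e", evaluation_model,
        "-j", str(n_jobs),
    ]
    if price_estimation:
        command.append("-c")  # same flag the original appends for cost estimation
    return command


def mask_secret(command: List[str], secret: Optional[str]) -> str:
    """Render the command for logs or the UI with the secret replaced by ***."""
    return " ".join("***" if secret and arg == secret else arg for arg in command)
```

The list can be handed to `subprocess.Popen(command, ...)` without `shell=True`. The confirmation that `yes |` currently answers would then have to be written to the process's `stdin` instead; whether a single `y` is sufficient depends on `varco_arena/main.py`, which is not shown here.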
eval_models_list.txt ADDED
@@ -0,0 +1,3 @@
1
+ gpt-4o-mini
2
+ gpt-4o-2024-05-13
3
+ gpt-4o-2024-08-06
eval_prompt_list.txt ADDED
@@ -0,0 +1,3 @@
1
+ llmbar
2
+ translation_pair
3
+ rag_pair_kr
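`app.py` fills its select boxes by splitting `eval_models_list.txt` and `eval_prompt_list.txt` on `"\n"`, so a trailing newline or a blank line in either file would show up as an empty option. A small sketch of a more tolerant parser; the function name `parse_option_list` is hypothetical:

```python
from typing import List


def parse_option_list(text: str) -> List[str]:
    """Return one option per non-empty line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `parse_option_list(open("eval_prompt_list.txt", encoding="UTF8").read())` could be passed directly as the options of `st.selectbox`.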
guide_mds/input_jsonls_en.md ADDED
@@ -0,0 +1,37 @@
1
+ #### \[EN\] Upload guide (`jsonl`)
2
+ **Basic Requirements**
3
+ * Upload one `jsonl` file per model (e.g., five files to compare five LLMs)
4
+ * ⚠️ Important: All `jsonl` files must have the same number of rows
5
+ * ⚠️ Important: Each file must use a single `model_id`, and no two files may share the same `model_id`
6
+
7
+ **Required Fields**
8
+ * Per Model Fields
9
+ * `model_id`: Unique identifier for the model (recommendation: keep it short)
10
+ * `generated`: The LLM's response to the test instruction
11
+
12
+ * Required only for translation (the `translation_pair` prompt needs these fields; see `streamlit_app_local/user_submit/mt/llama5.jsonl`)
13
+ * `source_lang`: input language (e.g. Korean, KR, kor, ...)
14
+ * `target_lang`: output language (e.g. English, EN, ...)
15
+
16
+ * Common Fields (Must be identical across all files)
17
+ * `instruction`: The input prompt or test instruction given to the model
18
+ * `task`: Category label used to group results (useful when using different evaluation prompts per task)
19
+
20
+ **Example Format**
21
+ ```python
22
+ # model1.jsonl
23
+ {"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
24
+ {"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}
25
+
26
+ # model2.jsonl
27
+ {"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
28
+ {"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
29
+ ...
30
+ ..
31
+ .
32
+ ```
33
+ **Use Case Example**
34
+ If you want to compare different prompting strategies for the same model:
35
+ * Use the same `instruction` across files (using unified test scenarios).
36
+ * `generated` responses of each prompting strategy will vary across the files.
37
+ * Use descriptive `model_id` values like "prompt1", "prompt2", etc.
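The requirements in this guide can be checked before uploading. The sketch below is an illustration, not the app's own validation: `validate_uploads` is a hypothetical helper, it reads "unique within and across all files" as one `model_id` per file that no other file reuses, and it assumes that rows appear in the same order in every file.

```python
import json
from typing import Dict, List


def validate_uploads(files: Dict[str, str]) -> List[str]:
    """Check jsonl contents (file name -> file text) against the upload guide.

    Returns a list of human-readable problems; an empty list means the files pass.
    """
    problems: List[str] = []
    parsed = {
        name: [json.loads(line) for line in text.splitlines() if line.strip()]
        for name, text in files.items()
    }
    if len(parsed) < 2:
        problems.append("at least 2 jsonl files are needed")

    # All files must have the same number of rows.
    if len({len(rows) for rows in parsed.values()}) > 1:
        problems.append("files have different numbers of rows")

    # Each file carries exactly one model_id, and no two files share it.
    seen: Dict[str, str] = {}
    for name, rows in parsed.items():
        ids = {row.get("model_id") for row in rows}
        if len(ids) != 1:
            found = sorted(map(str, ids))
            problems.append(f"{name}: expected exactly one model_id, found {found}")
            continue
        model_id = str(next(iter(ids)))
        if model_id in seen:
            problems.append(f"{name}: model_id {model_id!r} is already used by {seen[model_id]}")
        seen[model_id] = name

    # The common fields must match row by row across files.
    common = [
        [(row.get("instruction"), row.get("task")) for row in rows]
        for rows in parsed.values()
    ]
    if any(fields != common[0] for fields in common[1:]):
        problems.append("instruction/task differ between files")
    return problems
```

Calling it with the two example files above returns an empty list; uploading the same file twice reports the reused `model_id`.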
guide_mds/input_jsonls_kr.md ADDED
@@ -0,0 +1,34 @@
1
+ #### \[KR\] ์ง‘์–ด๋„ฃ์„ jsonl ํŒŒ์ผ ๊ฐ€์ด๋“œ
2
+ ๋น„๊ตํ•  ๋ชจ๋ธ์ด ๋‹ค์„ฏ ๊ฐœ๋ผ๋ฉด ๋‹ค์„ฏ ๊ฐœ์˜ .jsonl ํŒŒ์ผ์„ ์—…๋กœ๋“œํ•˜์„ธ์š”.
3
+ * ๐Ÿ’ฅ๋ชจ๋“  jsonl ์€ ๊ฐ™์€ ์ˆ˜์˜ ํ–‰์„ ๊ฐ€์ ธ์•ผํ•ฉ๋‹ˆ๋‹ค.
4
+ * ๐Ÿ’ฅ`model_id` ํ•„๋“œ๋Š” ํŒŒ์ผ๋งˆ๋‹ค ๋‹ฌ๋ผ์•ผํ•˜๋ฉฐ ํŒŒ์ผ ๋‚ด์—์„œ๋Š” ์œ ์ผํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค.
5
+
6
+ **jsonl ํ•„์ˆ˜ ํ•„๋“œ**
7
+ * ๊ฐœ๋ณ„
8
+ * `model_id`: ํ‰๊ฐ€๋ฐ›๋Š” ๋ชจ๋ธ์˜ ์ด๋ฆ„์ž…๋‹ˆ๋‹ค. (์งง๊ฒŒ ์“ฐ๋Š” ๊ฒƒ ์ถ”์ฒœ)
9
+ * `generated`: ๋ชจ๋ธ์ด testset instruction ์— ์ƒ์„ฑํ•œ ์‘๋‹ต์„ ๋„ฃ์œผ์„ธ์š”.
10
+
11
+ * ๋ฒˆ์—ญํ‰๊ฐ€ ํ”„๋กฌํ”„ํŠธ ์‚ฌ์šฉ์‹œ (`translation_pair`. `streamlit_app_local/user_submit/mt/llama5.jsonl` ์—์„œ ์˜ˆ์‹œ ๋ณผ ์ˆ˜ ์žˆ์Œ)
12
+ * `source_lang`: input language (e.g. Korean, KR, kor, ...)
13
+ * `target_lang`: output language (e.g. English, EN, ...)
14
+
15
+ * ๊ณตํ†ต ๋ถ€๋ถ„ (**๋ชจ๋“  ํŒŒ์ผ์— ๋Œ€ํ•ด ๊ฐ™์•„์•ผ ํ•จ**)
16
+ * `instruction`: ๋ชจ๋ธ์— ์ง‘์–ด๋„ฃ๋Š” `testset instruction` ํ˜น์€ `input`์— ํ•ด๋‹นํ•˜๋Š” ๋ฌด์–ธ๊ฐ€์ž…๋‹ˆ๋‹ค.
17
+ * `task`: ์ „์ฒด ๊ฒฐ๊ณผ๋ฅผ subset์œผ๋กœ ๊ทธ๋ฃน์ง€์–ด์„œ ๋ณด์—ฌ์ค„ ๋•Œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. `evaluation prompt`๋ฅผ ํ–‰๋ณ„๋กœ ๋‹ค๋ฅด๊ฒŒ ์‚ฌ์šฉํ•˜๊ณ  ์‹ถ์„ ๋•Œ ํ™œ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
18
+
19
+
20
+ ๊ฐ jsonl ํŒŒ์ผ์€ ์•„๋ž˜์ฒ˜๋Ÿผ ์ƒ๊ฒผ์Šต๋‹ˆ๋‹ค.
21
+ ```python
22
+ # model1.jsonl
23
+ {"model_id": "๋ชจ๋ธ1", "task": "๊ธธ ๋ฌป๊ธฐ", "instruction": "์–ด๋””๋กœ ๊ฐ€์•ผํ•˜์˜ค", "generated": "์ €๊ธฐ๋กœ์š”"}
24
+ {"model_id": "๋ชจ๋ธ1", "task": "์‚ฐ์ˆ˜", "instruction": "1+1", "generated": "2"} # ๊ธธ ๋ฌป๊ธฐ์™€ ์‚ฐ์ˆ˜์˜ ๊ฒฝ์šฐ ๋‹ค๋ฅธ ํ‰๊ฐ€ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์‹ถ์„ ์ˆ˜ ์žˆ๊ฒ ์ฃ ?
25
+
26
+ # model2.jsonl -* model1.jsonl๊ณผ `instruction`์€ ๊ฐ™๊ณ  `generated`, `model_id` ๋Š” ๋‹ค๋ฆ…๋‹ˆ๋‹ค!
27
+ {"model_id": "๋ชจ๋ธ2", "task": "๊ธธ ๋ฌป๊ธฐ", "instruction": "์–ด๋””๋กœ ๊ฐ€์•ผํ•˜์˜ค", "generated": "ํ•˜์ด"}
28
+ {"model_id": "๋ชจ๋ธ2", "task": "์‚ฐ์ˆ˜", "instruction": "1+1", "generated": "3"}
29
+
30
+ ...
31
+ ..
32
+ ```
33
+
34
+ ์˜ˆ๋ฅผ ๋“ค์–ด, ํ•œ๊ฐ€์ง€ ๋ชจ๋ธ์— ๋Œ€ํ•ด ๋‹ค๋ฅธ ํ”„๋กฌํ”„ํŒ…์„ ์‹œ๋„ํ•˜์—ฌ ๋‹ค๋ฅธ ์ƒ์„ฑ๋ฌธ์„ ์–ป์—ˆ๊ณ  ์ด๋ฅผ ๋น„๊ตํ•˜๊ณ  ์‹ถ์€ ๊ฒฝ์šฐ๋ฅผ ์ƒ๊ฐํ•ด๋ด…์‹œ๋‹ค. ์ด ๋•Œ ํ‰๊ฐ€๋ฐ›์„ testset์€ ๊ฐ™์œผ๋ฏ€๋กœ `instruction`์€ ๋ชจ๋‘ ๊ฐ™๊ณ  ํ”„๋กฌํ”„ํŒ…์— ๋”ฐ๋ผ `generated`๋Š” ๋‹ฌ๋ผ์ง€๊ฒ ์ฃ ? `model_id` ๋Š” `"prompt1"`, `"prompt2"` ๋“ฑ ์ทจํ–ฅ์— ๋งž๊ฒŒ ์ ์–ด์ฃผ์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค.
modules/__pycache__/nav.cpython-311.pyc ADDED
Binary file (3.8 kB). View file
 
modules/nav.py ADDED
@@ -0,0 +1,57 @@
1
+ import streamlit as st
2
+
3
+
4
+ def Navbar(sidebar_placeholder, toggle_hashstr: str = ""):
5
+ with sidebar_placeholder.container():
6
+ st.toggle(
7
+ "๐Ÿ‡ฐ๐Ÿ‡ท ํ•œ๊ตญ์–ด๋กœ ๋ณด๊ธฐ",
8
+ value=st.session_state.get("korean", False),
9
+ key=f"korean_toggle_{toggle_hashstr}", # Add this explicit key
10
+ on_change=lambda: setattr(
11
+ st.session_state,
12
+ "korean",
13
+ st.session_state.get(f"korean_toggle_{toggle_hashstr}", False),
14
+ ),
15
+ )
16
+
17
+ st.page_link(
18
+ "app.py",
19
+ label="Varco Arena ๊ตฌ๋™" if st.session_state.korean else "Run VARCO Arena",
20
+ icon="🔥",
21
+ )
22
+ st.page_link(
23
+ "pages/see_results.py",
24
+ label="๊ฒฐ๊ณผ ๋ณด๊ธฐ" if st.session_state.korean else "See Results",
25
+ icon="๐Ÿ“",
26
+ disabled=st.session_state.get("is_running", False),
27
+ )
28
+ st.page_link(
29
+ "pages/brief_intro.py",
30
+ label="์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋‚˜์š”?" if st.session_state.korean else "How it Works",
31
+ icon="❔",
32
+ disabled=st.session_state.get("is_running", False),
33
+ )
34
+ st.page_link(
35
+ "pages/quick_start_guide.py",
36
+ label="๋ณธ๊ฒฉ ์‚ฌ์šฉํ•˜๊ธฐ ๊ฐ€์ด๋“œ" if st.session_state.korean else "Quick Start Guide",
37
+ icon="💯",
38
+ disabled=st.session_state.get("is_running", False),
39
+ )
40
+ related_links_en = """
41
+ **About**
42
+ * [Paper](https://huggingface.co/papers/2411.01281)
43
+ * [Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
44
+ * [Inquiry](https://linkedin.com/in/deftson/)
45
+ """.strip()
46
+ related_links_kr = """
47
+ **About**
48
+ * [๋…ผ๋ฌธ](https://huggingface.co/papers/2411.01281)
49
+ * [๋ธ”๋กœ๊ทธ](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
50
+ * [๋ฌธ์˜](https://linkedin.com/in/deftson/)
51
+ """.strip()
52
+ st.info(related_links_kr if st.session_state.korean else related_links_en)
53
+
54
+ st.divider()
55
+ demo_warning_kr = "โ•โ—**๋ณธ ๋ฐ๋ชจ์—์„œ๋Š” ์ƒˆ๋กœ๊ณ ์นจ ํ›„์— ํ‰๊ฐ€ ๊ฒฐ๊ณผ๊ฐ€ ๋ณด์กด๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค**โ•โ— ์•ฑ์„ ์˜จ์ „ํžˆ ํ™œ์šฉํ•˜์‹œ๋ ค๋ฉด ๊ฐœ์ธ ๊ธฐ๊ธฐ์—์„œ ํ˜ธ์ŠคํŒ…ํ•˜์„ธ์š” (**๐Ÿ’ฏ๋ณธ๊ฒฉ ์‚ฌ์šฉํ•˜๊ธฐ ๊ฐ€์ด๋“œ** ์ฐธ์กฐ)"
56
+ demo_warning_en = "❕❗For this demo, **evaluation results will not be preserved after refreshing**❕❗ To fully utilize the app, please host it on your personal device (refer to **💯Quick Start Guide**)"
57
+ st.markdown(demo_warning_kr if st.session_state.korean else demo_warning_en)
pages/__pycache__/see_results.cpython-311.pyc ADDED
Binary file (26.3 kB). View file
 
pages/brief_intro.py ADDED
@@ -0,0 +1,40 @@
1
+ import streamlit as st
2
+
3
+ from view_utils import default_page_setting, set_nav_bar
4
+
5
+ sidebar_placeholder = default_page_setting()
6
+ set_nav_bar(
7
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="brief_intro_init"
8
+ )
9
+
10
+ st.image("va_en.png")
11
+ if st.session_state.korean:
12
+ st.info("์—ฌ๋Ÿฌ๋ถ„์ด ์—…๋กœ๋“œํ•˜๋Š” ํŒŒ์ผ์€ ์œ„ ๊ทธ๋ฆผ์˜ LLM ์ถœ๋ ฅ๋“ค์ž…๋‹ˆ๋‹ค.")
13
+ with st.expander("โ“โ” ๋ฌด์—‡์„ ์—…๋กœ๋“œ ํ•˜๋‚˜์š”โ“โ”"):
14
+ st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
15
+ else:
16
+ st.info(
17
+ "What you upload are the responses that the LLMs generated for the test prompts."
18
+ )
19
+ with st.expander("โ“โ” What should I upload โ“โ”"):
20
+ st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
21
+
22
+
23
+ st.image("va_concept_new.png")
24
+ st.markdown(
25
+ """
26
+ | |Current Practice|Varco Arena|
27
+ |-|-|-|
28
+ |Total no. matches|$$n_{\\text{model}}*\\|X\\|$$|$$(n_{\\text{model}}-1)*\\|X\\|$$|
29
+ |No. matches per LLM|$$\\|X\\|$$|$$\\left[\\|X\\|,\\|X\\|\\text{log}n_{\\text{model}}\\right]$$|
30
+ |Comparison type|reference-mediated (anchored) |direct|
31
+ """
32
+ )
33
+ if st.session_state.korean:
34
+ st.info(
35
+ "Varco Arena๋Š” ์‹ ๋ขฐ์„ฑ ์žˆ๋Š” ์ˆœ์œ„๋ฅผ ๋” ์ ์€ ํšŸ์ˆ˜์˜ ๋น„๊ต ๋‚ด์— ์–ป์–ด๋‚ด๋ฉฐ, ์ด๋Ÿฌํ•œ ํŠน์ง•์€ LLM ์ง์ ‘ ๋น„๊ต์˜ ์ด์ ์œผ๋กœ๋ถ€ํ„ฐ ๊ธฐ์ธํ•ฉ๋‹ˆ๋‹ค."
36
+ )
37
+ else:
38
+ st.info(
39
+ "Varco Arena compares LLM responses directly, which yields more reliable rankings with fewer total matches."
40
+ )
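The table in `pages/brief_intro.py` can be turned into concrete numbers. The sketch below assumes that the logarithm in the per-model upper bound is base 2 and rounded up, as it would be for the number of rounds in a single-elimination bracket; the source only writes `log`. The function name `match_counts` is hypothetical.

```python
import math
from typing import Dict


def match_counts(n_models: int, testset_size: int) -> Dict[str, int]:
    """Match counts from the comparison table, for n_models >= 2 and |X| = testset_size."""
    if n_models < 2:
        raise ValueError("at least two models are needed for a comparison")
    rounds = math.ceil(math.log2(n_models))  # assumption: base-2 log, rounded up
    return {
        "anchored_total": n_models * testset_size,  # current practice: n_model * |X|
        "tournament_total": (n_models - 1) * testset_size,  # Varco Arena: (n_model - 1) * |X|
        "per_model_min": testset_size,  # a model that loses its first match on every prompt
        "per_model_max": testset_size * rounds,  # a model that reaches the final on every prompt
    }
```

With 8 models and 100 test prompts this gives 800 anchored matches against 700 tournament matches, and each model plays between 100 and 300 matches.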
pages/quick_start_guide.py ADDED
@@ -0,0 +1,14 @@
1
+ import streamlit as st
2
+
3
+ from view_utils import default_page_setting, set_nav_bar
4
+
5
+ sidebar_placeholder = default_page_setting()
6
+ set_nav_bar(
7
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="quickstart_init"
8
+ )
9
+
10
+
11
+ if st.session_state.korean:
12
+ st.markdown(open("varco_arena/README_kr.md", encoding="UTF8").read())
13
+ else:
14
+ st.markdown(open("varco_arena/README_en.md", encoding="UTF8").read())
pages/see_results.py ADDED
@@ -0,0 +1,365 @@
1
+ import pandas as pd
2
+ import streamlit as st
3
+
4
+ import analysis_utils as au
5
+ from analysis_utils import number_breakdown_from_df
6
+ from app import load_and_cache_data
7
+
8
+ # from app import VA_ROOT
9
+ from query_comp import QueryWrapper, get_base_url
10
+ from varco_arena.varco_arena_core.prompts import load_prompt
11
+ from view_utils import (
12
+ default_page_setting,
13
+ escape_markdown,
14
+ set_nav_bar,
15
+ show_linebreak_in_md,
16
+ )
17
+
18
+ DEFAULT_LAYOUT_DICT = {
19
+ "title": {"font": {"size": 20, "family": "Gothic A1"}},
20
+ "font": {"size": 16, "family": "Gothic A1"},
21
+ "xaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
22
+ "yaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
23
+ "legend": {"font": {"size": 12, "family": "Gothic A1"}},
24
+ }
25
+
26
+
27
+ def navigate(t, source, key, val):
28
+ # print(key, val)
29
+ if source is None or source not in t:  # t.index() would raise ValueError on a stale selection
30
+ return
31
+ target_index = t.index(source) + val
32
+ if 0 <= target_index < len(t):
33
+ st.session_state[key] = t[target_index]
34
+ st.rerun()
35
+
36
+
37
+ def main():
38
+ sidebar_placeholder = default_page_setting(layout="wide")
39
+ set_nav_bar(
40
+ False,
41
+ sidebar_placeholder=sidebar_placeholder,
42
+ toggle_hashstr="see_results_init",
43
+ )
44
+
45
+ # load the data
46
+ # print(f"{st.session_state.get('result_file_path', None)=}")
47
+ most_recent_run = st.session_state.get("result_file_path", None)
48
+ most_recent_run = str(most_recent_run) if most_recent_run is not None else None
49
+ (
50
+ st.session_state["all_result_dict"],
51
+ st.session_state["df_dict"],
52
+ ) = load_and_cache_data(result_file_path=most_recent_run)
53
+
54
+ # side bar
55
+ st.sidebar.title("Select Result:")
56
+ result_select = QueryWrapper("expname")(
57
+ st.sidebar.selectbox,
58
+ list(st.session_state["all_result_dict"].keys()),
59
+ )
60
+
61
+ if result_select is None:
62
+ st.stop()
63
+ eval_prompt_name = result_select.split("/")[-1].strip()
64
+ if st.sidebar.button("Clear Cache"):
65
+ st.cache_data.clear()
66
+ st.cache_resource.clear()
67
+ st.rerun()
68
+
69
+ if result_select:
70
+ if "alpha2names" in st.session_state:
71
+ del st.session_state["alpha2names"]
72
+
73
+ fig_dict_per_task = st.session_state["all_result_dict"][result_select]
74
+ task_list = list(fig_dict_per_task.keys())
75
+ elo_rating_by_task = fig_dict_per_task["Overall"]["elo_rating_by_task"]
76
+ # tabs = st.tabs(task_list)
77
+
78
+ df_dict_per_task = st.session_state["df_dict"][result_select]
79
+
80
+ default_layout_dict = DEFAULT_LAYOUT_DICT
81
+ task = QueryWrapper("task", "Select Task")(st.selectbox, task_list)
82
+
83
+ if task is None:
84
+ st.stop()
85
+ figure_dict = fig_dict_per_task[task]
86
+ judgename = figure_dict["judgename"]
87
+ df = df_dict_per_task[task]
88
+ interpretation, n_models, size_testset = number_breakdown_from_df(df)
89
+ if st.session_state.korean:
90
+ st.markdown(f"## ๊ฒฐ๊ณผ ({task})")
91
+ st.markdown(f"##### Judge ๋ชจ๋ธ: {judgename} / ํ‰๊ฐ€ํ”„๋กฌ: {eval_prompt_name}")
92
+ st.markdown(f"##### ํ…Œ์ŠคํŠธ์…‹ ์‚ฌ์ด์ฆˆ: {int(size_testset)} ํ–‰")
93
+ else:
94
+ st.markdown(f"## Results ({task})")
95
+ st.markdown(f"##### Judge Model: {judgename} / prompt: {eval_prompt_name}")
96
+ st.markdown(f"##### Size of Testset: {int(size_testset)} rows")
97
+ col1, col2 = st.columns(2)
98
+ with col1:
99
+ with st.container(border=True):
100
+ st.markdown(f"#### Ratings ({task})")
101
+ st.table(figure_dict["elo_rating"])
102
+ st.write(show_linebreak_in_md(escape_markdown(interpretation)))
103
+
104
+ with col2:
105
+ with st.container(border=True):
106
+ st.plotly_chart(
107
+ elo_rating_by_task.update_layout(**default_layout_dict),
108
+ use_container_width=True,
109
+ key=f"{task}_elo_rating_by_task",
110
+ )
111
+
112
+ st.divider()
113
+
114
+ if st.session_state.korean:
115
+ st.markdown("### ํ† ๋„ˆ๋จผํŠธ (ํ…Œ์ŠคํŠธ ์‹œ๋‚˜๋ฆฌ์˜ค) ๋ณ„๋กœ ๋ณด๊ธฐ")
116
+ else:
117
+ st.markdown("### Tournament Results by Test Scenario")
118
+
119
+ # with st.expander("๋ณผ ํ† ๋„ˆ๋จผํŠธ ๊ณ ๋ฅด๊ธฐ"):
120
+ d = list(df.idx_inst_src.unique())
121
+ default_idx = st.session_state.get("selected_tournament", None)
122
+ cols = st.columns((1, 18, 1))
123
+ with cols[0]:
124
+ if st.button("◀", key="prev_tournament"):
125
+ navigate(d, default_idx, "selected_tournament", -1)
126
+ with cols[1]:
127
+ tournament_prm_select = QueryWrapper("tournament", "Select Tournament")(
128
+ st.selectbox,
129
+ d,
130
+ default_idx,
131
+ key=f"{task}_tournament_select",
132
+ on_change=lambda: st.session_state.update(
133
+ selected_tournament=st.session_state.get(f"{task}_tournament_select"),
134
+ selected_match=None,
135
+ ),
136
+ label_visibility="collapsed",
137
+ )
138
+ with cols[2]:
139
+ if st.button("▶", key="next_tournament"):
140
+ navigate(d, default_idx, "selected_tournament", 1)
141
+
142
+ # tournament_prm_select = st.selectbox(
143
+ # "Select Tournament",
144
+ # df.idx_inst_src.unique(),
145
+ # index=d.index(st.session_state.get("selected_tournament")),
146
+ # key=f"{task}_tournament_{result_select}",
147
+ # )
148
+
149
+ # print(tournament_prm_select, type(tournament_prm_select))
150
+ st.session_state["selected_tournament"] = tournament_prm_select
151
+ # tournament_prm_select = st.selectbox(
152
+ # "Select Tournament",
153
+ # df.idx_inst_src.unique(),
154
+ # key=f"{task}_tournament_{result_select}",
155
+ # )
156
+ df_now_processed = None
157
+ if tournament_prm_select:
158
+ df_now = df[df.idx_inst_src == tournament_prm_select]
159
+ df_now_processed, _alpha2names = au.init_tournament_dataframe(
160
+ df_now,
161
+ alpha2names=st.session_state["alpha2names"]
162
+ if "alpha2names" in st.session_state.keys()
163
+ else None,
164
+ )
165
+ if "alpha2names" not in st.session_state:
166
+ st.session_state["alpha2names"] = _alpha2names
167
+
168
+ try:
169
+ bracket_drawing = au.draw(
170
+ df_now_processed,
171
+ alpha2names=st.session_state["alpha2names"],
172
+ )
173
+ legend = au.make_legend_str(
174
+ df_now_processed, st.session_state["alpha2names"]
175
+ )
176
+ st.code(bracket_drawing + legend)
177
+
178
+ m = list(df_now_processed.human_readable_idx)
179
+ default_idx = st.session_state.get("selected_match", None)
180
+ cols = st.columns((1, 18, 1))
181
+ with cols[0]:
182
+ if st.button("◀", key="prev_match"):
183
+ navigate(m, default_idx, "selected_match", -1)
184
+ with cols[1]:
185
+ match_idx_human = QueryWrapper("match", "Select Match")(
186
+ st.selectbox,
187
+ m,
188
+ default_idx,
189
+ key=f"{task}_match_select",
190
+ label_visibility="collapsed",
191
+ )
192
+ with cols[2]:
193
+ if st.button("▶", key="next_match"):
194
+ navigate(m, default_idx, "selected_match", 1)
195
+
196
+ # match_idx_human = st.selectbox(
197
+ # "Select Match",
198
+ # df_now_processed.human_readable_idx,
199
+ # key=f"{task}_match_{result_select}",
200
+ # )
201
+ # print(match_idx_human)
202
+ st.session_state["selected_match"] = match_idx_human
203
+ # match_idx_human = st.selectbox(
204
+ # "Select Match",
205
+ # df_now_processed.human_readable_idx,
206
+ # key=f"{task}_match_{result_select}",
207
+ # )
208
+ if match_idx_human:
209
+ match_idx = int(match_idx_human.split(": ")[0])
210
+ row = df_now_processed.loc[match_idx]
211
+
212
+ st.markdown("#### Current Test Scenario:")
213
+
214
+ with st.expander(
215
+ f"### Evaluation Prompt (evalprompt: {eval_prompt_name}--{task})"
216
+ ):
217
+ prompt = load_prompt(eval_prompt_name, task=task)
218
+ kwargs = dict(
219
+ inst="{inst}",
220
+ src="{src}",
221
+ out_a="{out_a}",
222
+ out_b="{out_b}",
223
+ task=task,
224
+ )
225
+ if eval_prompt_name == "translation_pair":
226
+ kwargs["source_lang"] = "{source_lang}"
227
+ kwargs["target_lang"] = "{target_lang}"
228
+ prompt_cmpl = prompt.complete_prompt(**kwargs)
229
+ for msg in prompt_cmpl:
230
+ st.markdown(f"**{msg['role']}**")
231
+ st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
232
+
233
+ st.info(show_linebreak_in_md(tournament_prm_select))
234
+
235
+ winner = row.winner
236
+ col1, col2 = st.columns(2)
237
+
238
+ winnerbox = st.success
239
+ loserbox = st.error
240
+ with col1:
241
+ iswinner = winner == "model_a"
242
+ writemsg = winnerbox if iswinner else loserbox
243
+ st.markdown(f"#### ({row.model_a}) {row.human_readable_model_a}")
244
+ writemsg(
245
+ show_linebreak_in_md(row.generated_a),
246
+ icon="✅" if iswinner else "❌",
247
+ )
248
+ with col2:
249
+ iswinner = winner == "model_b"
250
+ writemsg = winnerbox if iswinner else loserbox
251
+ st.markdown(f"#### ({row.model_b}) {row.human_readable_model_b}")
252
+ writemsg(
253
+ show_linebreak_in_md(row.generated_b),
254
+ icon="✅" if iswinner else "❌",
255
+ )
256
+ except Exception as e:
257
+ import traceback
258
+
259
+ traceback.print_exc()
260
+ st.markdown(
261
+ "**Bug: ์•„๋ž˜ ํ‘œ๋ฅผ ๋ณต์‚ฌํ•ด์„œ ์ด์Šˆ๋กœ ๋‚จ๊ฒจ์ฃผ์‹œ๋ฉด ๊ฐœ์„ ์— ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค. ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค๐Ÿ™**"
262
+ if st.session_state.korean
263
+ else "Bug: please open an issue and attach the table shown below; it helps us fix the problem. Thanks in advance. 🙏"
264
+ )
265
+ st.error(e)
266
+ st.info(tournament_prm_select)
267
+ st.table(
268
+ df_now_processed[
269
+ [
270
+ "depth",
271
+ "round",
272
+ "winner_nodes",
273
+ "winner_resolved",
274
+ "winner",
275
+ "model_a",
276
+ "model_b",
277
+ ]
278
+ ]
279
+ )
280
+ st.write("Sharable link")
281
+ st.code(f"{get_base_url()}/see_results?{QueryWrapper.get_sharable_link()}")
282
+ st.divider()
283
+
284
+ if st.session_state.korean:
285
+ st.markdown("### ๋งค์น˜ ํ†ต๊ณ„")
286
+ else:
287
+ st.markdown("### Match Stats.")
288
+ col1, col2 = st.columns(2)
290
+ with col1:
291
+ with st.container(border=True):
292
+ st.plotly_chart(
293
+ figure_dict[
294
+ "fraction_of_model_a_wins_for_all_a_vs_b_matches"
295
+ ].update_layout(autosize=True, **default_layout_dict),
296
+ use_container_width=True,
297
+ key=f"{task}_fraction_of_model_a_wins_for_all_a_vs_b_matches",
298
+ )
299
+ with col2:
300
+ with st.container(border=True):
301
+ st.plotly_chart(
302
+ figure_dict["match_count_of_each_combination_of_models"].update_layout(
303
+ autosize=True, **default_layout_dict
304
+ ),
305
+ use_container_width=True,
306
+ key=f"{task}_match_count_of_each_combination_of_models",
307
+ )
308
+
309
+ with col1:
310
+ with st.container(border=True):
311
+ st.plotly_chart(
312
+ figure_dict["match_count_for_each_model"].update_layout(
313
+ **default_layout_dict
314
+ ),
315
+ use_container_width=True,
316
+ key=f"{task}_match_count_for_each_model",
317
+ )
318
+ with col2:
319
+ pass
320
+
321
+ if st.session_state.korean:
322
+ st.markdown("### ์ฐธ๊ณ ์šฉ LLM Judge ํŽธํ–ฅ ์ •๋ณด")
323
+ else:
324
+ st.markdown("### FYI: How biased is your LLM Judge?")
325
+
326
+ with st.expander("ํŽผ์ณ์„œ ๋ณด๊ธฐ" if st.session_state.korean else "Expand to show"):
327
+ st.info(
328
+ """
329
+ Varco Arena์—์„œ๋Š” position bias์˜ ์˜ํ–ฅ์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋“  ๋ชจ๋ธ์ด A๋‚˜ B์œ„์น˜์— ๋ฒˆ๊ฐˆ์•„ ์œ„์น˜ํ•˜๋„๋ก ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ LLM Judge ํ˜น์€ Prompt์˜ ์„ฑ๋Šฅ์ด ๋ถ€์กฑํ•˜๋‹ค๊ณ  ๋Š๊ปด์ง„๋‹ค๋ฉด, ์•„๋ž˜ ์•Œ๋ ค์ง„ LLM Judge bias๊ฐ€ ์ฐธ๊ณ ๊ฐ€ ๋ ๊ฒ๋‹ˆ๋‹ค.
330
+ * position bias (์™ผ์ชฝ)
331
+ * length bias (์˜ค๋ฅธ์ชฝ)
332
+
333
+ ๊ฒฐ๊ณผ์˜ ์™œ๊ณก์ด LLM Judge์˜ ๋ถ€์กฑํ•จ ๋–„๋ฌธ์ด์—ˆ๋‹ค๋Š” ์ ์„ ๊ทœ๋ช…ํ•˜๋ ค๋ฉด ์‚ฌ์šฉํ•˜์‹  LLM Judge์™€ Prompt์˜ binary classification ์ •ํ™•๋„๋ฅผ ์ธก์ •ํ•ด๋ณด์‹œ๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค (Varco Arena๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ด๋ฅผ ์ˆ˜ํ–‰ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!).""".strip()
334
+ if st.session_state.korean
335
+ else """
336
+ In Varco Arena, to minimize the effect of position bias, all models are alternately positioned in either position A or B. However, if you feel the LLM Judge or Prompt performance is insufficient, the following known LLM Judge biases may be helpful to reference:
337
+ * position bias (left)
338
+ * length bias (right)
339
+
340
+ To determine if result distortion was due to LLM Judge limitations, please measure the binary classification accuracy of your LLM Judge and Prompt (You could use Varco Arena for this purpose!).
341
+ """.strip()
342
+ )
343
+ st.markdown(f"#### {judgename} + prompt = {eval_prompt_name}")
344
+ col1, col2 = st.columns(2)
345
+ with col1:
346
+ with st.container(border=True):
347
+ st.plotly_chart(
348
+ figure_dict["counts_of_match_winners"].update_layout(
349
+ **default_layout_dict
350
+ ),
351
+ use_container_width=True,
352
+ key=f"{task}_counts_of_match_winners",
353
+ )
354
+ with col2:
355
+ with st.container(border=True):
356
+ st.plotly_chart(
357
+ figure_dict["length_bias"].update_layout(**default_layout_dict),
358
+ use_container_width=True,
359
+ key=f"{task}_length_bias",
360
+ )
361
+ st.table(figure_dict["length_bias_df"].groupby("category").describe().T)
362
+
363
+
364
+ if __name__ == "__main__":
365
+ main()
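The position-bias note in the expander above can be checked numerically. The following is a minimal sketch, not part of the commit: it assumes winners are labeled `"model_a"` / `"model_b"` as in the `winner` column used above, and `position_win_rates` is a hypothetical helper name. When positions are alternated evenly, a judge without position bias would be expected to land near 0.5 for each side.

```python
from collections import Counter


def position_win_rates(winners):
    """Share of matches won from each position ("model_a" / "model_b")."""
    counts = Counter(winners)
    total = counts["model_a"] + counts["model_b"]
    if total == 0:
        raise ValueError("no decided matches to summarize")
    return {pos: counts[pos] / total for pos in ("model_a", "model_b")}


# Toy data (made up for illustration): position A wins 3 of 4 matches.
rates = position_win_rates(["model_a", "model_b", "model_a", "model_a"])
print(rates)
```

A large gap between the two rates on a real result file would be a hint, not proof, that the judge or prompt favors one position.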
precommit.sh ADDED
@@ -0,0 +1,5 @@
+ # reformat all to prevent commit problems
+ git add .
+ pre-commit run --all-files
+ # restore the staged files so that one can choose which files to actually commit
+ git restore --staged .
query_comp.py ADDED
@@ -0,0 +1,137 @@
+ # Defines a container that automatically reads and writes query parameters
+
+ from __future__ import annotations
+
+ import hashlib
+ import re
+ import urllib.parse
+ from typing import Callable, TypeVar
+
+ import streamlit as st
+
+ __all__ = ["QueryWrapper", "get_base_url"]
+
+ T = TypeVar("T")
+
+
+ def SHA1(msg: str) -> str:
+     # only the first 8 hex digits are kept, so sharable links stay short
+     return hashlib.sha1(msg.encode()).hexdigest()[:8]
+
+
+ def get_base_url():
+     # NOTE: relies on Streamlit internals (_session_mgr), which may change between versions
+     session = st.runtime.get_instance()._session_mgr.list_active_sessions()[0]
+     return urllib.parse.urlunparse(
+         [session.client.request.protocol, session.client.request.host, "", "", "", ""]
+     )
+
+
+ class QueryWrapper:
+     queries: dict[str, _QueryWrapper] = {}  # registry of every wrapper created
+
+     def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+         self.__wrapper = QueryWrapper.queries[query] = _QueryWrapper(
+             query, label, use_hash
+         )
+
+     def __call__(self, *args, **kwargs):
+         return self.__wrapper(*args, **kwargs)
+
+     @classmethod
+     def get_sharable_link(cls):
+         # for k, v in cls.queries.items():
+         #     print(f"{k}: {v}")
+         return re.sub(
+             "&+", "&", "&".join([str(v) for k, v in cls.queries.items()])
+         ).strip("&")
+
+
+ class _QueryWrapper:
+     ILLEGAL_CHARS = "&/=?"
+
+     def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+         self.query = query
+         self.label = label or query
+         self.use_hash = use_hash
+         self.hash_table = {}
+         self.key = None
+
+     def __call__(
+         self,
+         base_container: Callable,
+         legal_list: list[T],
+         default: T | list[T] | None = None,
+         *,
+         key: str | None = None,
+         **kwargs,
+     ) -> T | list[T] | None:
+         val_from_query = st.query_params.get_all(self.query.lower())
+         # print(val_from_query)
+         legal = len(val_from_query) > 0
+         self.key = key or self.label
+
+         # map short hash -> original value, used to decode the query string
+         self.hash_table = {SHA1(str(v)): v for v in legal_list}
+
+         # filter out illegal values
+         if legal and legal_list:
+             val_from_query = [v for v in val_from_query if v in self.hash_table]
+         # print(self.label, val_from_query, legal)
+         if legal:
+             selected = [self.hash_table[v] for v in val_from_query]
+         elif default:
+             selected = default
+         elif self.label in st.session_state:
+             selected = st.session_state[self.label]
+             if legal_list:
+                 if isinstance(selected, list):
+                     selected = [v for v in selected if v in legal_list]
+                 elif selected not in legal_list:
+                     selected = []
+         else:
+             selected = []
+         if selected is None:
+             pass
+         elif len(selected) == 1 and base_container in [st.selectbox, st.radio]:
+             selected = selected[0]
+         # print(self.label, selected)
+         if base_container == st.checkbox:
+             # NOTE: st.checkbox takes neither an options list nor `index`; this branch looks unused
+             selected = base_container(
+                 self.label,
+                 legal_list,
+                 index=legal_list.index(selected) if selected in legal_list else None,
+                 key=self.key,
+                 **kwargs,
+             )
+         elif base_container == st.multiselect:
+             selected = base_container(
+                 self.label, legal_list, default=selected, key=self.key, **kwargs
+             )
+         elif base_container == st.radio:
+             selected = base_container(
+                 self.label,
+                 legal_list,
+                 index=legal_list.index(selected) if selected in legal_list else None,
+                 key=self.key,
+                 **kwargs,
+             )
+         elif base_container == st.selectbox:
+             selected = base_container(
+                 self.label,
+                 legal_list,
+                 index=legal_list.index(selected) if selected in legal_list else None,
+                 key=self.key,
+                 **kwargs,
+             )
+         else:
+             selected = base_container(self.label, legal_list, key=self.key, **kwargs)
+         return st.session_state[self.key]
+
+     def __str__(self):
+         # serialize the current selection as `query=<hash>` pairs
+         selected = st.session_state.get(self.key, None)
+         if isinstance(selected, str):
+             return f"{self.query.lower()}={SHA1(selected)}"
+         elif isinstance(selected, list):
+             return "&".join([f"{self.query.lower()}={SHA1(str(v))}" for v in selected])
+         else:
+             return ""
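The hash-based query scheme in `query_comp.py` can be tried without Streamlit. The sketch below is only an illustration under assumptions: `short_hash` mirrors `SHA1` above, while `encode_selection` and `decode_selection` are hypothetical names standing in for `__str__` and the lookup through `hash_table`. It shows why values containing `&`, `/`, `=` or `?` are safe in the link: only 8 hex digits ever reach the URL, and unknown hashes are dropped on decode.

```python
from __future__ import annotations

import hashlib
from urllib.parse import parse_qs, urlencode


def short_hash(value: str) -> str:
    """First 8 hex digits of the SHA-1 digest, as in SHA1() above."""
    return hashlib.sha1(value.encode()).hexdigest()[:8]


def encode_selection(query: str, selected: list[str]) -> str:
    """Serialize selected values as repeated `query=<hash>` pairs."""
    return urlencode([(query.lower(), short_hash(v)) for v in selected])


def decode_selection(query: str, query_string: str, legal_list: list[str]) -> list[str]:
    """Map hashes from a query string back to values; drop unknown hashes."""
    table = {short_hash(v): v for v in legal_list}
    hashes = parse_qs(query_string).get(query.lower(), [])
    return [table[h] for h in hashes if h in table]


# Made-up option list; one entry contains characters that are unsafe in URLs.
models = ["gpt-4o-mini", "my model/v2?", "baseline"]
link = encode_selection("Model", ["my model/v2?", "baseline"])
restored = decode_selection("Model", link, models)
print(link)
print(restored)
```

One trade-off of truncating to 8 hex digits is that two different values could collide, which is unlikely for short option lists but not impossible.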
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ openai>=1.17.0
+ munch
+ pandas
+ numpy
+ tqdm>=4.48.0
+ plotly
+ scikit-learn
+ kaleido
+ tiktoken>=0.7.0
+ pyyaml
+ transformers
+ streamlit>=1.40.2
+ openpyxl
+ git+https://github.com/shobrook/openlimit.git#egg=openlimit
run.sh ADDED
@@ -0,0 +1,4 @@
+ #!/bin/bash
+ # Run this from here (cwd = ./streamlit_app_components/)
+ # VARCO_ARENA_RESULT_PATH=./user_submit streamlit run app.py
+ streamlit run app.py
streamlit_app_local/.streamlit/config.toml ADDED
@@ -0,0 +1,4 @@
+ [client]
+ showSidebarNavigation = false
+ [server]
+ headless = true
streamlit_app_local/README.md ADDED
@@ -0,0 +1,24 @@
+ # Varco Arena web app
+ ```bash
+ cd ./streamlit_app_local/
+ bash run.sh
+ ```
+
+ ## view: Running
+ ![run_page](streamlit_app_local/page_run.png)
+
+ ## view: Results
+ ![result_page1](./page_result_1.png)
+ ![result_page2](./page_result_2.png)
+ ![result_page3](./page_result_3.png)
+ ![result_page4](./page_result_4.png)
+
+ ## how to use
+ 1. upload LLM outputs in jsonlines format
+ 2. estimate the cost
+ 3. run
+ 4. go to the see results tab
+
+ ## example run results at...
+ - `streamlit_app_local/user_submit/`
+ - visible on web if you `bash run.sh`
streamlit_app_local/__pycache__/analysis_utils.cpython-311.pyc ADDED
Binary file (17.7 kB).
streamlit_app_local/__pycache__/analysis_utils.cpython-38.pyc ADDED
Binary file (9.12 kB).
streamlit_app_local/__pycache__/app.cpython-311.pyc ADDED
Binary file (15.9 kB).
streamlit_app_local/__pycache__/app.cpython-38.pyc ADDED
Binary file (6.32 kB).
streamlit_app_local/__pycache__/query_comp.cpython-311.pyc ADDED
Binary file (8 kB).
streamlit_app_local/__pycache__/view_utils.cpython-311.pyc ADDED
Binary file (18.3 kB).
streamlit_app_local/__pycache__/view_utils.cpython-38.pyc ADDED
Binary file (9.91 kB).
streamlit_app_local/analysis_utils.py ADDED
@@ -0,0 +1,338 @@
+ from copy import deepcopy
+ from functools import partial
+ from typing import Optional, Tuple
+
+ import pandas as pd
+ from fire import Fire
+
+ """
+ This code assumes dealing with only one instruction
+ """
+
+
+ # from varco_arena.tournament
+ def log2_power_of_two(n):
+     # First, let's make sure n is indeed a power of 2
+     if n & (n - 1) != 0 or n == 0:
+         raise ValueError("n must be a positive power of 2")
+
+     exponent = 0
+     while n > 1:
+         n >>= 1  # Right shift is like dividing by 2, but faster
+         exponent += 1
+     return exponent
+
+
+ def get_1st(df: pd.DataFrame, alpha2names: dict) -> Optional[str]:
+     finals = df[df["round"] == "final"]
+     if len(finals) == 1:
+         first = finals.iloc[0].winner_resolved
+     else:
+         first = (
+             None  # error case (no finals match or multiple finals (buggy result file))
+         )
+
+     return first
+
+
+ def get_unique_participants(df: pd.DataFrame) -> list:
+     participants = pd.concat([df.model_a, df.model_b]).unique().tolist()
+     participants = [p for p in participants if p]  # remove None
+     participants = sorted(participants)  # make it sorted
+     return participants
+
+
+ def _impute_byes(df):
+     max_depth = df.depth.max()
+
+     # init
+     imputed_parts = dict()
+     for depth in range(max_depth + 1):
+         imputed_parts[depth] = df[df.depth == depth].copy()
+
+     # reverse
+     for depth in range(max_depth, 0, -1):  # always we have 1 proper match for depth=0
+         null_v_null = {
+             "model_a": "",
+             "model_b": "",
+             "winner": "model_a",
+             "match_order_in_round": "-",
+             "depth": depth,
+         }
+
+         # fill some_model vs null byes
+         players = get_unique_participants(imputed_parts[depth])
+         proceeded = get_unique_participants(imputed_parts[depth - 1])
+         imputed = []
+         for p in proceeded:
+             if p not in players:
+                 p_v_null = deepcopy(null_v_null)
+                 p_v_null["model_a"] = p
+                 imputed.append(p_v_null)
+         imputed_parts[depth] = pd.concat(
+             [
+                 imputed_parts[depth],
+                 pd.DataFrame(imputed),
+             ],
+             axis="index",
+         )
+
+         # fill null vs null
+         n_null_v_null = 2 ** (depth) - len(imputed_parts[depth])
+         if n_null_v_null > 0:
+             imputed = pd.DataFrame([null_v_null] * n_null_v_null)
+             imputed_parts[depth] = pd.concat(
+                 [
+                     imputed_parts[depth],
+                     imputed,
+                 ],
+                 axis="index",
+             )
+
+     df_imputed = pd.concat(imputed_parts.values(), axis="index")
+     df_imputed = df_imputed.sort_values(by="depth").reset_index(drop=True)
+
+     return df_imputed
+
+
+ def index_test_scenario(df) -> pd.DataFrame:
+     df["inst_src"] = "inst: " + df.instruction + "\n\nsrc: " + df.source
+     df["idx_inst_src"] = df.apply(
+         lambda row: f"{row.tournament_idx}:\n{row.inst_src}", axis=1
+     )
+
+     # later used for tournament bracket backtracking
+     if "depth" not in df.columns:
+         mappings = {
+             "final": 0,
+             "semi-final": 1,
+             "quarter-final": 2,
+         }
+
+         def _convert_round_to_depth(rnd: str, mappings=None) -> Optional[int]:
+             if rnd is None:
+                 depth = None
+             elif rnd in mappings.keys():
+                 depth = mappings[rnd]
+             elif rnd.startswith("round-"):  # assume perfect power of two
+                 num = int(rnd.replace("round-", "").strip())
+                 depth = log2_power_of_two(num) - 1
+             else:
+                 # previously fell through and failed with an unbound `depth`
+                 raise ValueError(f"unrecognized round label: {rnd!r}")
+             return depth
+
+         conv = partial(_convert_round_to_depth, mappings=mappings)
+         df["depth"] = df["round"].apply(conv)
+
+     return df
+
+
+ def init_tournament_dataframe(
+     df, alpha2names: dict = None
+ ) -> Tuple[pd.DataFrame, dict]:
+     df = df.sort_values(by="depth").reset_index(drop=True)
+     # make winner interpretable (A -> model_a, B -> model_b)
+     df.winner = df.winner.apply(lambda txt: f"model_{txt.lower()}")
+
+     # define alpha2names if not given (covers up to 168 participants)
+     if alpha2names is None:
+         alphabets = (
+             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+             "abcdefghijklmnopqrstuvwxyz"
+             "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩ"
+             "㉠㉡㉢㉣㉤㉥㉦㉧㉨㉩㉪㉫㉬㉭"
+             "㉮㉯㉰㉱㉲㉳㉴㉵㉶㉷㉸㉹㉺㉻"
+             "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
+             "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
+             "αβγδεζηθικλμνξοπρστυφχψω"
+         )
+         model_full_names = get_unique_participants(df)
+         alpha2names = dict(zip(alphabets, model_full_names))
+         if len(alpha2names) < len(model_full_names):
+             raise ValueError(
+                 f"Tournament viewer cannot visualize more than {len(alphabets)=} participants. ({len(model_full_names)=} is given)\n\nOnly the tournament visualizer is affected; other features still work."
+             )
+     names2alpha = dict(zip(alpha2names.values(), alpha2names.keys()))
+     df = _impute_byes(df)
+
+     # preserve readables for later
+     df = _make_readables(df, names2alpha)
+
+     if len(df[df["round"] == "final"]) != 1:
+         raise ValueError("There must be exactly one final match.")
+
+     return df, alpha2names
+
+
+ def _make_readables(df, names2alpha):
+     df["human_readable_model_a"] = df.model_a.copy()
+     df["human_readable_model_b"] = df.model_b.copy()
+
+     df.model_a = df.model_a.apply(
+         lambda modelname: names2alpha[modelname] if modelname else "x"
+     )
+     df.model_b = df.model_b.apply(
+         lambda modelname: names2alpha[modelname] if modelname else "x"
+     )
+
+     df["human_readable_idx"] = df.apply(
+         lambda row: f"{row.name}: {row.human_readable_model_a} ({row.model_a}) vs. {row.human_readable_model_b} ({row.model_b if row.model_b else 'x'})",
+         axis=1,
+     )
+
+     df["winner_resolved"] = df.apply(lambda row: row[row.winner], axis=1)
+     df["winner_nodes"] = df.apply(
+         lambda row: f"{row.winner_resolved}:{row.name}".ljust(4, " "), axis=1
+     )  # later for figure representation of winner as a "node"
+     return df
+
+
+ # draw
+ def draw(df: pd.DataFrame, alpha2names: dict = None) -> str:
+     def _draw_round(
+         df: pd.DataFrame,
+         depth: int = None,
+         winners_in_order: list = None,
+     ) -> Tuple:
+         df_now = df[df.depth == depth]
+         max_depth = df.depth.max()
+
+         width = 2 ** ((max_depth - depth) + 2)
+
+         connect_left = "─" * (width)
+         connect_left = connect_left[4:]
+         connect_right = " " * (width)
+         connect_right = "┐" + connect_right[1:]
+
+         if winners_in_order is None:
+             assert (
+                 depth == 0
+             ), f"{winners_in_order=} is only allowed when drawing the top (=final match)"
+             winners_in_order = df_now.winner_nodes
+
+         round_drawing_parts = []
+         descending_round_winners = []
+         for node in winners_in_order:
+             round_drawing_parts.append("".join([node, connect_left, connect_right]))
+             # next round winners in sync with winner order
+             row_now = df_now.query(f"winner_nodes=='{node}'")
+             descending_round_winners.append(row_now.model_a.item())
+             descending_round_winners.append(row_now.model_b.item())
+
+         # find descending_round_winners within winner_nodes format (alpha:num)
+         if depth == max_depth:
+             pass  # keep the descending_round_winners intact
+         else:
+             df_descend = df[df.depth == depth + 1]
+             for i, winner_alpha in enumerate(descending_round_winners):
+                 node_intr = df_descend.query(
+                     f"winner_resolved=='{winner_alpha}'"
+                 ).winner_nodes.item()
+                 descending_round_winners[i] = node_intr
+
+         round_drawing = "".join(round_drawing_parts)
+         descending_unit = " " * width
+         descending_unit = "│" + descending_unit[1:]
+         descending_lines_parts = [descending_unit] * len(df_now) * 2
+         descending_lines = "".join(descending_lines_parts)
+
+         return round_drawing, descending_lines, descending_round_winners
+
+     drawings = []
+     winners_in_order = None
+     max_depth = df.depth.max()
+     for depth in range(max_depth + 1):
+         winner_drw, lines_desc, winners_in_order = _draw_round(
+             df,
+             depth=depth,
+             winners_in_order=winners_in_order,
+         )
+         drawings.append((winner_drw, lines_desc))
+     # prepare bracket top
+     champion_alphabet = drawings[0][0].split()[0].split(":")[0]
+     champion_readable = alpha2names[champion_alphabet]
+     bracket_top = [f"🥇winner: {champion_readable}", "│"]
+     # prepare mid
+     bracket_mid = "\n".join(["\n".join(tup) for tup in drawings])
+
+     # prepare bot
+     initial_participants = winners_in_order
+     bracket_bot = (" " * 3).join(initial_participants)
+
+     full_figure = "\n".join(bracket_top + [bracket_mid, bracket_bot])
+
+     return full_figure
+
+
+ def number_breakdown_from_df(result_df: pd.DataFrame) -> Tuple[str, int, int]:
+     n_models = len(get_unique_participants(result_df))
+     size_testset = int(len(result_df) / (n_models - 1))
+     interpretation = f"total {len(result_df)} matches = (n_models-1) * size_testset = ({n_models}-1) * {size_testset}"
+     return interpretation, n_models, size_testset
+
+
+ def make_legend_str(df, alpha2names) -> str:
+     first = get_1st(df, alpha2names)
+     alpha2names = {k: v.replace("🥇 ", "") for k, v in alpha2names.items()}
+     alpha_ordered = sorted(list(alpha2names.keys()))
+     # names_ordered = sorted(list(alpha2names.values()))
+     # name2alpha = {v: k for k, v in alpha2names.items()}
+
+     for k, v in alpha2names.items():
+         if v == alpha2names[first]:
+             alpha2names[k] = f"🥇 {v}"
+     res_str = "\n\nlegend:"
+     # for name in names_ordered:
+     #     alpha = name2alpha[name]
+     for alpha in alpha_ordered:
+         name_w_medal = alpha2names[alpha]
+         res_str += f"\n{alpha}\t{name_w_medal}"
+     return res_str
+
+
+ def main(
+     jslname: str = "result.json",
+ ):
+     """
+     Test code
+     """
+
+     df = pd.read_json(jslname, orient="records")
+     df = df.drop(columns=["tstamp", "logs"])
+     df = index_test_scenario(df)
+
+     # The visualization(df) step in the middle is omitted here. If everything else was
+     # followed and a problem still appears, it is a viz problem (that seems unlikely, though).
+     selections = df.idx_inst_src.unique()
+     alpha2names = None  # built on the first tournament, then reused
+     for i, sel in enumerate(selections):
+         df_now_processed = None
+         try:
+             df_now = df[df.idx_inst_src == sel]
+             df_now_processed, _alpha2names = init_tournament_dataframe(
+                 df_now, alpha2names=alpha2names
+             )
+             if alpha2names is None:
+                 alpha2names = _alpha2names
+             bracket_drawing = draw(df_now_processed, alpha2names=alpha2names)
+             legend = make_legend_str(df_now_processed, alpha2names)
+             print(bracket_drawing + legend)
+             with open(f"{i}.txt", "w") as f:
+                 print(bracket_drawing + legend, file=f)
+                 print(f"\n\n{sel}", file=f)
+
+             for match_idx_human in df_now_processed.human_readable_idx:
+                 match_idx = int(match_idx_human.split(": ")[0])
+                 row = df_now_processed.loc[match_idx]
+                 winner = row.winner
+         except Exception as e:
+             with open(f"{i}_err.txt", "w") as f:
+                 print(e, file=f)
+                 print("", file=f)
+                 print(sel, file=f)
+             # the dataframe is missing if init_tournament_dataframe itself failed
+             if df_now_processed is not None:
+                 df_now_processed[
+                     [
+                         "depth",
+                         "round",
+                         "winner_nodes",
+                         "winner_resolved",
+                         "winner",
+                         "model_a",
+                         "model_b",
+                     ]
+                 ].to_json(f"{i}_err.jsonl", lines=True, orient="records")
+
+
+ if __name__ == "__main__":
+     Fire(main)
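The round-label to depth conversion in `index_test_scenario` can be restated compactly. This is a sketch under the same assumptions as the original code: labels are `final`, `semi-final`, `quarter-final`, or `round-N` with `N` a power of two. The name `round_to_depth` is hypothetical, and `int.bit_length()` is swapped in for the shift loop of `log2_power_of_two`.

```python
NAMED_ROUNDS = {"final": 0, "semi-final": 1, "quarter-final": 2}


def round_to_depth(rnd: str) -> int:
    """Depth of a round in the bracket: the final is 0, each earlier round is +1."""
    if rnd in NAMED_ROUNDS:
        return NAMED_ROUNDS[rnd]
    if rnd.startswith("round-"):
        n = int(rnd[len("round-"):].strip())
        if n <= 0 or n & (n - 1):
            raise ValueError("round size must be a positive power of 2")
        log2_n = n.bit_length() - 1  # exact log2 for powers of two
        return log2_n - 1
    raise ValueError(f"unrecognized round label: {rnd!r}")


depths = [round_to_depth(r) for r in ("final", "quarter-final", "round-16", "round-32")]
print(depths)
```

With this convention a round at depth `d` holds up to `2 ** d` matches, which is the count `_impute_byes` pads each depth to.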
streamlit_app_local/app.py ADDED
@@ -0,0 +1,318 @@
+ # import shutil
+ import os
+ import select
+ import subprocess
+ import sys
+ import time
+ from datetime import datetime, timedelta, timezone
+ from pathlib import Path
+ from typing import Optional
+
+ import streamlit as st
+
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+ from varco_arena_core.prompts import load_prompt
+
+ from view_utils import (
+     default_page_setting,
+     escape_markdown,
+     set_nav_bar,
+     show_linebreak_in_md,
+ )
+
+ # import sys
+ # print(sys.executable)
+
+
+ VA_ROOT = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))
+ USR_SUB = VA_ROOT.parts[-1]
+
+
+ def upload_files(uploaded_files) -> Optional[Path]:
+     # prep directory for user submission
+     user_sub_root = VA_ROOT
+     if user_sub_root.exists():
+         if not user_sub_root.is_dir():
+             raise ValueError(
+                 f"{user_sub_root} file exists and is not a directory. Consider renaming it."
+             )
+     else:
+         user_sub_root.mkdir(parents=True)
+
+     KST = timezone(timedelta(hours=9))
+     tstamp = datetime.now(KST)
+     tstr = tstamp.strftime("%m-%d_%H:%M:%S")
+     files_dir_str = "./" + str(user_sub_root / tstr)
+     files_dir = Path(files_dir_str)
+     files_dir.mkdir(parents=True, exist_ok=True)
+     uploaded_files = list(uploaded_files)
+
+     if not uploaded_files:
+         st.warning("❌ No files to upload. Please drag/drop or browse files to upload.")
+     elif len(uploaded_files) < 2:
+         st.error("❌ You need at least 2 jsonlines files to properly run VA.")
+     else:  # properly uploaded
+         for file in uploaded_files:
+             # Create a path for the file in the server directory
+             file_path = files_dir / file.name
+
+             # Save the file to the server directory
+             with open(file_path, "wb") as f:
+                 f.write(file.getbuffer())
+
+         jslfiles = list(files_dir.glob("*.jsonl"))
+         st.success(f"✅ Successfully uploaded {len(jslfiles)} jsonl files.")
+         return files_dir.resolve()
+     # upload was rejected: callers check for None before running
+     return None
+
+
+ def run_varco_arena(
+     price_estimation: bool = False,
+     # upload_dir: Union[str, Path] = None,
+     promptname: str = None,
+     exp_name: str = None,
+     api_key: Optional[str] = None,
+     evaluation_model: str = "gpt-4o-mini",
+     update_interval: float = 1.0,
+ ):
+     # The API key is handed to the CLI through the -k flag
+     ptn = f"{str(st.session_state.upfiles_dir)}"
+     outdir = Path(ptn)
+     if exp_name:
+         outdir = outdir / exp_name
+
+     command = f"python ../varco_arena/main.py -i {ptn} -o {outdir} -k {api_key} -p {promptname} -e {evaluation_model} -j 64"
+     if price_estimation:
+         command = f"{command} -c"
+     else:
+         # pipe `yes` into the process to answer its confirmation prompts
+         command = f"yes | {command}"
+     # never echo the API key to the console or the UI
+     shown_command = command.replace(f"-k {api_key}", "-k ***")
+     print(shown_command)
+
+     api_key = None  # clear immediately
+
+     process = subprocess.Popen(
+         command,
+         stdout=subprocess.PIPE,
+         stderr=subprocess.STDOUT,
+         stdin=subprocess.PIPE,
+         text=True,
+         bufsize=1,
+         shell=True,
+     )
+
+     # Set stdout to non-blocking mode
+     os.set_blocking(process.stdout.fileno(), False)
+
+     last_update_time = time.time()
+     terminal_output = st.empty()
+     full_output = f"{shown_command}\n"
+     to_show = full_output  # defined up front so the final `+=` below cannot fail
+     while True:
+         # Check if we have output to read
+         if select.select([process.stdout], [], [], 0)[0]:
+             output = process.stdout.readline()
+             if output:
+                 full_output += output
+                 if price_estimation:
+                     to_show = full_output
+                     terminal_output.code(to_show, language="bash")
+                 else:
+                     current_time = time.time()
+                     if current_time - last_update_time > update_interval:
+                         lines = full_output.split("\n")
+                         if len(lines) < 5:
+                             to_show = full_output
+                         else:
+                             to_show = "\n".join(["...\n..\n.\n"] + lines[-5:])
+                         terminal_output.code(to_show, language="bash")
+                         last_update_time = current_time
+                 print(output)
+         time.sleep(0.1)
+         # Check if the process has finished
+         if process.poll() is not None:
+             # Read any remaining output
+             remaining_output = process.stdout.read()
+             if remaining_output:
+                 lines = remaining_output.split("\n")
+                 if len(lines) > 10:
+                     to_show += "\n".join(["\n...\n..\n.\n"] + lines[-10:])
+                 else:
+                     to_show += remaining_output
+                 terminal_output.code(to_show, language="bash")
+                 print(remaining_output)
+             break
+
+     return_code = process.poll()
+     return outdir, return_code
+
+
+ def main():
149
+ # init lang
150
+ st.session_state["korean"] = st.session_state.get("korean", False)
151
+
152
+ sidebar_placeholder = default_page_setting()
153
+ set_nav_bar(
154
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_init"
155
+ )
156
+
157
+ st.title("โš”๏ธ VARCO ARENA โš”๏ธ")
158
+ if st.session_state.korean:
159
+ st.write(
160
+ "**VARCO Arena๋Š” ๊ฐ ๋ชจ๋ธ์˜ ์ƒ์„ฑ๋œ ๊ฒฐ๊ณผ๋ฅผ ๋น„๊ต ํ‰๊ฐ€ํ•˜์—ฌ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ ์ˆœ์œ„๋ฅผ ์ œ๊ณตํ•˜๋Š” ์‹œ์Šคํ…œ์ž…๋‹ˆ๋‹ค. ์ปค์Šคํ…€ ํ…Œ์ŠคํŠธ์…‹์— ์œ ์šฉํ•˜๊ฒŒ ํ™œ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ชจ๋ฒ”๋‹ต์•ˆ์„ ํ•„์š”๋กœ ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.**"
161
+ )
162
+ else:
163
+ st.write(
164
+ "**VARCO Arena is an LLM benchmarking system that compares model responses across customized test scenarios without requiring reference answers.**"
165
+ )
166
+
167
+ st.divider()
168
+ # Set up the file uploader
169
+ if st.session_state.korean:
170
+ st.markdown("๋ชจ๋ธ ์ถœ๋ ฅํŒŒ์ผ ์—…๋กœ๋“œ")
171
+ else:
172
+ st.markdown("### 1. Upload LLM responses")
173
+ uploaded_files = st.file_uploader(
174
+ "Drag and Drop jsonlines files (.jsonl)", accept_multiple_files=True
175
+ )
176
+
177
+ # upload state
178
+ if "upfiles_dir" not in st.session_state:
179
+ st.session_state.upfiles_dir = None
180
+ if st.button("Upload Files"):
181
+ st.session_state.upfiles_dir = upload_files(uploaded_files)
182
+ # st.success(st.session_state.upfiles_dir)
183
+
184
+ # st.markdown("**๐Ÿ’ฅ์ฃผ์˜: ์ค‘๋ณต๋œ ํ…Œ์ŠคํŠธ ์‹œ๋‚˜๋ฆฌ์˜ค๋Š” ์˜ค๋ฅ˜๋กœ ์ฒ˜๋ฆฌ๋ฉ๋‹ˆ๋‹ค๐Ÿ’ฅ**")
185
+ if st.session_state.korean:
186
+ with st.expander("โ“โ” ๋ฌด์—‡์„ ์—…๋กœ๋“œ ํ•˜๋‚˜์š”โ“โ”"):
187
+ st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
188
+ else:
189
+ with st.expander("โ“โ” What should I upload โ“โ”"):
190
+ st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
191
+
192
+ # Form for cost estimation
193
+ with st.form("cost_estimation_form"):
194
+ if st.session_state.korean:
195
+ st.write("### 2. ๊ฐ€๊ฒฉ ์‚ฐ์ •")
196
+ else:
197
+ st.write("### 2. Cost Estimation")
198
+ eval_model = st.selectbox(
199
+ "Select Judge",
200
+ open("eval_models_list.txt", encoding="UTF8").read().split("\n"),
201
+ )
202
+ promptname = st.selectbox(
203
+ "Select Evalutaion Prompt",
204
+ open("eval_prompt_list.txt", encoding="UTF8").read().split("\n"),
205
+ )
206
+ if promptname == USR_SUB:
207
+ raise ValueError(
208
+ f"{USR_SUB=} is preserved name for the system. Consider another naming for the prompt or consider changing {VA_ROOT=} (USR_SUB == VA_ROOT.parts[-1])."
209
+ )
210
+ estimate_button = st.form_submit_button("Calculate Cost!")
211
+ with st.expander(
212
+ "LLM Judge์— ํ™œ์šฉ๋˜๋Š” ํ”„๋กฌํ”„ํŠธ (`Calculate Cost!` ํด๋ฆญ์‹œ ๊ฐฑ์‹ )"
213
+ if st.session_state.korean
214
+ else "**Evaluation Prompt for LLM Judge (will refresh after `Calculate Cost!` clicked)**"
215
+ ):
216
+ prompt = load_prompt(promptname, task="-")
217
+ kwargs = dict(
218
+ inst="{inst}",
219
+ src="{src}",
220
+ out_a="{out_a}",
221
+ out_b="{out_b}",
222
+ task="-",
223
+ )
224
+ if promptname == "translation_pair":
225
+ kwargs["source_lang"] = "{source_lang}"
226
+ kwargs["target_lang"] = "{target_lang}"
227
+ prompt_cmpl = prompt.complete_prompt(**kwargs)
228
+
229
+ st.markdown(f"### Evaluation Prompt: {promptname}")
230
+ for msg in prompt_cmpl:
231
+ st.markdown(f"**{msg['role']}**")
232
+ st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
233
+
234
+ if estimate_button:
235
+ if st.session_state.get("upfiles_dir") is None:
236
+ st.error(
237
+ "โŒ Requirements: You have to upload jsonlines files first to proceed"
238
+ )
239
+ else:
240
+ st.markdown("##### Estimated Cost")
241
+ dummy_api_key = "dummy"
242
+ dummy_exp_name = "dummy"
243
+ result_file_path, return_code = run_varco_arena(
244
+ # upload_dir=st.session_state.upfiles_dir,
245
+ promptname=promptname,
246
+ api_key=dummy_api_key,
247
+ exp_name=dummy_exp_name,
248
+ price_estimation=True,
249
+ evaluation_model=eval_model,
250
+ )
251
+ if return_code:
252
+ st.error("โŒ RuntimeError: An error occurred during cost estimation")
253
+ else:
254
+ st.success("โœ… Cost estimation completed successfully")
255
+ st.session_state.cost_estimated = True
256
+
257
+ # Form for actual run
258
+ with st.form("run_arena_form"):
259
+ if st.session_state.korean:
260
+ st.write("### 3. Varco Arena ๊ตฌ๋™ํ•˜๊ธฐ")
261
+ else:
262
+ st.write("### 3. Run Varco Arena")
263
+ api_key = st.text_input("Enter your OpenAI API Key", type="password")
264
+ exp_name = st.text_input("(Optional) Enter Exp. name")
265
+ exp_name = exp_name.replace(
266
+ "..", "_"
267
+ ) # May cause rmtree problem later. Block it.
268
+ exp_name = exp_name.replace(
269
+ USR_SUB, f"-{USR_SUB}-"
270
+ ) # May cause rmtree problem later. Block it.
271
+ exp_name = exp_name.replace("/", "-")
272
+ exp_name = exp_name.replace(" ", "_")
273
+ exp_name = exp_name.replace("~", "_")
274
+
275
+ if st.session_state.korean:
276
+ st.write("**주의**:`Ctrl+C` 버튼은 구현되지 않았습니다. 구동 전 숙고해주세요.")
277
+ else:
278
+ st.write("**Caution**: a `Ctrl+C` (cancel) button hasn't been implemented. Double-check your settings before running.")
279
+ run_button = st.form_submit_button(
280
+ "🔥 Run Arena!",
281
+ disabled=(not st.session_state.get("cost_estimated", False))
282
+ or "result_file_path"
283
+ in st.session_state.keys(), # run already performed once
284
+ )
285
+
286
+ if run_button:
287
+ set_nav_bar(
288
+ True,
289
+ sidebar_placeholder=sidebar_placeholder,
290
+ toggle_hashstr="app_during_run",
291
+ )
292
+ if st.session_state.get("upfiles_dir") is None:
293
+ st.error(
294
+ "❌ Requirements: You have to upload jsonlines files first to proceed"
295
+ )
296
+ elif not api_key:
297
+ st.error("❌ Requirements: An OpenAI API key is required to run Varco Arena.")
298
+ else:
299
+ result_file_path, return_code = run_varco_arena(
300
+ # upload_dir=st.session_state.upfiles_dir,
301
+ promptname=promptname,
302
+ api_key=api_key,
303
+ exp_name=exp_name,
304
+ price_estimation=False,
305
+ evaluation_model=eval_model,
306
+ )
307
+ if return_code:
308
+ st.error("❌ RuntimeError: An error occurred during Varco Arena run")
309
+ else:
310
+ st.success("✅ Varco Arena run completed successfully")
311
+ st.session_state.result_file_path = result_file_path
312
+ set_nav_bar(
313
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_run_done"
314
+ )
315
+
316
+
317
+ if __name__ == "__main__":
318
+ main()
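The experiment-name handling in `app.py` above chains several `str.replace` calls before the name is used in a path. A minimal sketch of the same replacements as a standalone helper follows. `sanitize_exp_name` is a hypothetical name, and the value of `USR_SUB` is not shown in this diff, so the placeholder below is an assumption.

```python
# Hypothetical stand-in: the real USR_SUB value is defined elsewhere in app.py.
USR_SUB = "USR_SUB_PLACEHOLDER"


def sanitize_exp_name(exp_name: str, usr_sub: str = USR_SUB) -> str:
    """Apply the same replacements as the form handler, in the same order."""
    exp_name = exp_name.replace("..", "_")  # block parent-directory traversal
    exp_name = exp_name.replace(usr_sub, f"-{usr_sub}-")  # fence off the reserved token
    for old, new in (("/", "-"), (" ", "_"), ("~", "_")):
        exp_name = exp_name.replace(old, new)
    return exp_name
```

Keeping the replacements in one function would make it possible to unit-test the rule that protects the later `rmtree` call.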
streamlit_app_local/eval_models_list.txt ADDED
@@ -0,0 +1,3 @@
+ gpt-4o-mini
+ gpt-4o-2024-05-13
+ gpt-4o-2024-08-06
streamlit_app_local/eval_prompt_list.txt ADDED
@@ -0,0 +1,5 @@
+ llmbar
+ llmbar_brief
+ translation_pair
+ rag_pair_kr
+ contextual (WIP)
streamlit_app_local/guide_mds ADDED
@@ -0,0 +1 @@
+ ../guide_mds/
streamlit_app_local/modules/__pycache__/nav.cpython-311.pyc ADDED
Binary file (2.85 kB)
streamlit_app_local/modules/__pycache__/nav.cpython-38.pyc ADDED
Binary file (889 Bytes)
streamlit_app_local/modules/nav.py ADDED
@@ -0,0 +1,46 @@
+ import streamlit as st
+
+
+ def Navbar(sidebar_placeholder, toggle_hashstr: str = ""):
+     with sidebar_placeholder.container():
+         st.toggle(
+             "🇰🇷 한국어로 보기",
+             value=st.session_state.get("korean", False),
+             key=f"korean_toggle_{toggle_hashstr}",  # Add this explicit key
+             on_change=lambda: setattr(
+                 st.session_state,
+                 "korean",
+                 st.session_state.get(f"korean_toggle_{toggle_hashstr}", False),
+             ),
+         )
+
+         st.page_link(
+             "app.py",
+             label="Varco Arena 구동" if st.session_state.korean else "Run VARCO Arena",
+             icon="🔥",
+         )
+         st.page_link(
+             "pages/see_results.py",
+             label="결과 보기" if st.session_state.korean else "See Results",
+             icon="📝",
+             disabled=st.session_state.get("is_running", False),
+         )
+         st.page_link(
+             "pages/brief_intro.py",
+             label="어떻게 작동하나요?" if st.session_state.korean else "How it Works",
+             icon="❔",
+             disabled=st.session_state.get("is_running", False),
+         )
+         related_links_en = """
+ **About**
+ * [Paper](https://huggingface.co/papers/2411.01281)
+ * [Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+ * [Inquiry](https://linkedin.com/in/deftson/)
+ """.strip()
+         related_links_kr = """
+ **About**
+ * [논문](https://huggingface.co/papers/2411.01281)
+ * [블로그](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+ * [문의](https://linkedin.com/in/deftson/)
+ """.strip()
+         st.info(related_links_kr if st.session_state.korean else related_links_en)
streamlit_app_local/pages/brief_intro.py ADDED
@@ -0,0 +1,40 @@
1
+ import streamlit as st
2
+
3
+ from view_utils import default_page_setting, set_nav_bar
4
+
5
+ sidebar_placeholder = default_page_setting()
6
+ set_nav_bar(
7
+ False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="brief_intro_init"
8
+ )
9
+
10
+ st.image("va_en.png")
11
+ if st.session_state.korean:
12
+ st.info("์—ฌ๋Ÿฌ๋ถ„์ด ์—…๋กœ๋“œํ•˜๋Š” ํŒŒ์ผ์€ ์œ„ ๊ทธ๋ฆผ์˜ LLM ์ถœ๋ ฅ๋“ค์ž…๋‹ˆ๋‹ค.")
13
+ with st.expander("โ“โ” ๋ฌด์—‡์„ ์—…๋กœ๋“œ ํ•˜๋‚˜์š”โ“โ”"):
14
+ st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
15
+ else:
16
+ st.info(
17
+ "What you upload is the generated responses of LLMs regarding the test prompts."
18
+ )
19
+ with st.expander("โ“โ” What should I upload โ“โ”"):
20
+ st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
21
+
22
+
23
+ st.image("va_concept_new.png")
24
+ st.markdown(
25
+ """
26
+ | |Current Practice|Varco Arena|
27
+ |-|-|-|
28
+ |Total no. matches|$$n_{\\text{model}}*\\|X\\|$$|$$(n_{\\text{model}}-1)*\\|X\\|$$|
29
+ |No. matches per LLM|$$\\|X\\|$$|$$\\left[\\|X\\|,\\|X\\|\\text{log}n_{\\text{model}}\\right]$$|
30
+ |Comparison type|reference-mediated (anchored) |direct|
31
+ """
32
+ )
33
+ if st.session_state.korean:
34
+ st.info(
35
+ "Varco Arena๋Š” ์‹ ๋ขฐ์„ฑ ์žˆ๋Š” ์ˆœ์œ„๋ฅผ ๋” ์ ์€ ํšŸ์ˆ˜์˜ ๋น„๊ต ๋‚ด์— ์–ป์–ด๋‚ด๋ฉฐ, ์ด๋Ÿฌํ•œ ํŠน์ง•์€ LLM ์ง์ ‘ ๋น„๊ต์˜ ์ด์ ์œผ๋กœ๋ถ€ํ„ฐ ๊ธฐ์ธํ•ฉ๋‹ˆ๋‹ค."
36
+ )
37
+ else:
38
+ st.info(
39
+ "Varco Arena takes advantage of direct comparison between LLM responses to guarantee better reliability in fewer number of total matches."
40
+ )
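The match-count table in `brief_intro.py` can be made concrete with a few lines of arithmetic. The sketch below assumes one single-elimination bracket per test prompt, which the bracket drawing in `see_results.py` suggests but this diff does not state outright. The function names, the base-2 logarithm, and the ceiling are assumptions as well, since the table only writes `log`.

```python
import math


def anchored_total_matches(n_models: int, n_prompts: int) -> int:
    """Reference-mediated practice: each model meets the anchor once per prompt."""
    return n_models * n_prompts


def tournament_total_matches(n_models: int, n_prompts: int) -> int:
    """A single-elimination bracket needs n_models - 1 matches per prompt."""
    return (n_models - 1) * n_prompts


def tournament_matches_per_model(n_models: int, n_prompts: int) -> tuple[int, int]:
    """(min, max) matches for one model: out in round one vs. winning every bracket."""
    if n_models < 2:
        raise ValueError("a tournament needs at least two models")
    rounds = math.ceil(math.log2(n_models))  # assumed: base 2, rounded up
    return n_prompts, n_prompts * rounds
```

For example, with 8 models and 100 prompts the anchored setup needs 800 matches and the tournament needs 700, while a single model plays between 100 and 300 matches.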
streamlit_app_local/pages/see_results.py ADDED
@@ -0,0 +1,391 @@
1
+ import pandas as pd
2
+ import streamlit as st
3
+ from varco_arena_core.prompts import load_prompt
4
+
5
+ import analysis_utils as au
6
+ from analysis_utils import number_breakdown_from_df
7
+ from app import VA_ROOT
8
+ from query_comp import QueryWrapper, get_base_url
9
+ from view_utils import (
10
+ default_page_setting,
11
+ escape_markdown,
12
+ set_nav_bar,
13
+ show_linebreak_in_md,
14
+ visualization,
15
+ )
16
+
17
+ DEFAULT_LAYOUT_DICT = {
18
+ "title": {"font": {"size": 20, "family": "Gothic A1"}},
19
+ "font": {"size": 16, "family": "Gothic A1"},
20
+ "xaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
21
+ "yaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
22
+ "legend": {"font": {"size": 12, "family": "Gothic A1"}},
23
+ }
24
+
25
+
26
+ @st.cache_data
27
+ def load_data():
28
+ all_result_dict = {}
29
+ df_dict = {}
30
+ # result_dir_list = os.listdir(VA_ROOT)
31
+ # result_dir_list = sorted(result_dir_list, key=lambda x: os.stat(os.path.join(VA_ROOT, x)).st_mtime, reverse=True)
32
+ for f in VA_ROOT.glob("**/result.json"):
33
+ print(f)
34
+
35
+ for resjsonf in VA_ROOT.glob("**/result.json"):
36
+ df = pd.read_json(resjsonf)
37
+ for col in ["tstamp", "logs"]:
38
+ if col in df.columns:
39
+ df.drop(columns=[col], inplace=True)
40
+ df = au.index_test_scenario(df)
41
+
42
+ fig_dict_per_task = {}
43
+ df_dict_per_task = {}
44
+ fig_dict_per_task["Overall"] = visualization(df, is_overall=True)
45
+ df_dict_per_task["Overall"] = df
46
+ for task in df["task"].unique().tolist():
47
+ df_task = df[df["task"] == task]
48
+ fig_dict_per_task[task] = visualization(df_task, is_overall=False)
49
+ df_dict_per_task[task] = df_task
50
+
51
+ prm_name = resjsonf.parts[-2]
52
+ exp_name = resjsonf.parts[-3]
53
+ key = f"{exp_name}/{prm_name}"
54
+ all_result_dict[key] = fig_dict_per_task
55
+ df_dict[key] = df_dict_per_task
56
+ return all_result_dict, df_dict
57
+
58
+
59
+ def navigate(t, source, key, val):
60
+ # print(key, val)
61
+ if source is None:
62
+ return
63
+ target_index = t.index(source) + val
64
+ if 0 <= target_index < len(t):
65
+ st.session_state[key] = t[target_index]
66
+ st.rerun()
67
+
68
+
69
+ def main():
70
+ sidebar_placeholder = default_page_setting(layout="wide")
71
+ set_nav_bar(
72
+ False,
73
+ sidebar_placeholder=sidebar_placeholder,
74
+ toggle_hashstr="see_results_init",
75
+ )
76
+ # load the data
77
+ st.session_state["all_result_dict"], st.session_state["df_dict"] = load_data()
78
+
79
+ # side bar
80
+ st.sidebar.title("Select Result:")
81
+ result_select = QueryWrapper("expname", "")(
82
+ st.sidebar.selectbox,
83
+ list(st.session_state["all_result_dict"].keys()),
84
+ )
85
+
86
+ if result_select is None:
87
+ st.stop()
88
+ eval_prompt_name = result_select.split("/")[-1].strip()
89
+ if st.sidebar.button("Clear Cache"):
90
+ st.cache_data.clear()
91
+ st.cache_resource.clear()
92
+ st.rerun()
93
+
94
+ if result_select:
95
+ if "alpha2names" in st.session_state:
96
+ del st.session_state["alpha2names"]
97
+
98
+ fig_dict_per_task = st.session_state["all_result_dict"][result_select]
99
+ task_list = list(fig_dict_per_task.keys())
100
+ elo_rating_by_task = fig_dict_per_task["Overall"]["elo_rating_by_task"]
101
+ # tabs = st.tabs(task_list)
102
+
103
+ df_dict_per_task = st.session_state["df_dict"][result_select]
104
+
105
+ default_layout_dict = DEFAULT_LAYOUT_DICT
106
+ task = QueryWrapper("task", "Select Task")(st.selectbox, task_list)
107
+
108
+ if task is None:
109
+ st.stop()
110
+ figure_dict = fig_dict_per_task[task]
111
+ judgename = figure_dict["judgename"]
112
+ df = df_dict_per_task[task]
113
+ interpretation, n_models, size_testset = number_breakdown_from_df(df)
114
+ if st.session_state.korean:
115
+ st.markdown(f"## 결과 ({task})")
116
+ st.markdown(f"##### Judge 모델: {judgename} / 평가프롬: {eval_prompt_name}")
117
+ st.markdown(f"##### 테스트셋 사이즈: {int(size_testset)} 행")
118
+ else:
119
+ st.markdown(f"## Results ({task})")
120
+ st.markdown(f"##### Judge Model: {judgename} / prompt: {eval_prompt_name}")
121
+ st.markdown(f"##### Size of Testset: {int(size_testset)} rows")
122
+ col1, col2 = st.columns(2)
123
+ with col1:
124
+ with st.container(border=True):
125
+ st.markdown(f"#### Ratings ({task})")
126
+ st.table(figure_dict["elo_rating"])
127
+ st.write(show_linebreak_in_md(escape_markdown(interpretation)))
128
+ st.markdown(f"**files at:** \n`{str(VA_ROOT.resolve()/result_select)}`")
129
+
130
+ with col2:
131
+ with st.container(border=True):
132
+ st.plotly_chart(
133
+ elo_rating_by_task.update_layout(**default_layout_dict),
134
+ use_container_width=True,
135
+ key=f"{task}_elo_rating_by_task",
136
+ )
137
+
138
+ st.divider()
139
+
140
+ if st.session_state.korean:
141
+ st.markdown("### 토너먼트 (테스트 시나리오) 별로 보기")
142
+ else:
143
+ st.markdown("### Tournament Results by Test Scenario")
144
+
145
+ # with st.expander("๋ณผ ํ† ๋„ˆ๋จผํŠธ ๊ณ ๋ฅด๊ธฐ"):
146
+ d = list(df.idx_inst_src.unique())
147
+ default_idx = st.session_state.get("selected_tournament", None)
148
+ cols = st.columns((1, 18, 1))
149
+ with cols[0]:
150
+ if st.button("◀", key="prev_tournament"):
151
+ navigate(d, default_idx, "selected_tournament", -1)
152
+ with cols[1]:
153
+ tournament_prm_select = QueryWrapper("tournament", "Select Tournament")(
154
+ st.selectbox,
155
+ d,
156
+ default_idx,
157
+ key=f"{task}_tournament_select",
158
+ on_change=lambda: st.session_state.update(
159
+ selected_tournament=st.session_state.get(f"{task}_tournament_select"),
160
+ selected_match=None,
161
+ ),
162
+ label_visibility="collapsed",
163
+ )
164
+ with cols[2]:
165
+ if st.button("▶", key="next_tournament"):
166
+ navigate(d, default_idx, "selected_tournament", 1)
167
+
168
+ # tournament_prm_select = st.selectbox(
169
+ # "Select Tournament",
170
+ # df.idx_inst_src.unique(),
171
+ # index=d.index(st.session_state.get("selected_tournament")),
172
+ # key=f"{task}_tournament_{result_select}",
173
+ # )
174
+
175
+ # print(tournament_prm_select, type(tournament_prm_select))
176
+ st.session_state["selected_tournament"] = tournament_prm_select
177
+ # tournament_prm_select = st.selectbox(
178
+ # "Select Tournament",
179
+ # df.idx_inst_src.unique(),
180
+ # key=f"{task}_tournament_{result_select}",
181
+ # )
182
+ df_now_processed = None
183
+ if tournament_prm_select:
184
+ df_now = df[df.idx_inst_src == tournament_prm_select]
185
+ df_now_processed, _alpha2names = au.init_tournament_dataframe(
186
+ df_now,
187
+ alpha2names=st.session_state["alpha2names"]
188
+ if "alpha2names" in st.session_state.keys()
189
+ else None,
190
+ )
191
+ if "alpha2names" not in st.session_state:
192
+ st.session_state["alpha2names"] = _alpha2names
193
+
194
+ try:
195
+ bracket_drawing = au.draw(
196
+ df_now_processed,
197
+ alpha2names=st.session_state["alpha2names"],
198
+ )
199
+ legend = au.make_legend_str(
200
+ df_now_processed, st.session_state["alpha2names"]
201
+ )
202
+ st.code(bracket_drawing + legend)
203
+
204
+ m = list(df_now_processed.human_readable_idx)
205
+ default_idx = st.session_state.get("selected_match", None)
206
+ cols = st.columns((1, 18, 1))
207
+ with cols[0]:
208
+ if st.button("◀", key="prev_match"):
209
+ navigate(m, default_idx, "selected_match", -1)
210
+ with cols[1]:
211
+ match_idx_human = QueryWrapper("match", "Select Match")(
212
+ st.selectbox,
213
+ m,
214
+ default_idx,
215
+ key=f"{task}_match_select",
216
+ label_visibility="collapsed",
217
+ )
218
+ with cols[2]:
219
+ if st.button("▶", key="next_match"):
220
+ navigate(m, default_idx, "selected_match", 1)
221
+
222
+ # match_idx_human = st.selectbox(
223
+ # "Select Match",
224
+ # df_now_processed.human_readable_idx,
225
+ # key=f"{task}_match_{result_select}",
226
+ # )
227
+ # print(match_idx_human)
228
+ st.session_state["selected_match"] = match_idx_human
229
+ # match_idx_human = st.selectbox(
230
+ # "Select Match",
231
+ # df_now_processed.human_readable_idx,
232
+ # key=f"{task}_match_{result_select}",
233
+ # )
234
+ if match_idx_human:
235
+ match_idx = int(match_idx_human.split(": ")[0])
236
+ row = df_now_processed.loc[match_idx]
237
+
238
+ st.markdown("#### Current Test Scenario:")
239
+
240
+ with st.expander(
241
+ f"### Evaluation Prompt (evalprompt: {eval_prompt_name}--{task})"
242
+ ):
243
+ prompt = load_prompt(eval_prompt_name, task=task)
244
+ kwargs = dict(
245
+ inst="{inst}",
246
+ src="{src}",
247
+ out_a="{out_a}",
248
+ out_b="{out_b}",
249
+ task=task,
250
+ )
251
+ if eval_prompt_name == "translation_pair":
252
+ kwargs["source_lang"] = "{source_lang}"
253
+ kwargs["target_lang"] = "{target_lang}"
254
+ prompt_cmpl = prompt.complete_prompt(**kwargs)
255
+ for msg in prompt_cmpl:
256
+ st.markdown(f"**{msg['role']}**")
257
+ st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
258
+
259
+ st.info(show_linebreak_in_md(tournament_prm_select))
260
+
261
+ winner = row.winner
262
+ col1, col2 = st.columns(2)
263
+
264
+ winnerbox = st.success
265
+ loserbox = st.error
266
+ with col1:
267
+ iswinner = winner == "model_a"
268
+ writemsg = winnerbox if iswinner else loserbox
269
+ st.markdown(f"#### ({row.model_a}) {row.human_readable_model_a}")
270
+ writemsg(
271
+ show_linebreak_in_md(row.generated_a),
272
+ icon="✅" if iswinner else "❌",
273
+ )
274
+ with col2:
275
+ iswinner = winner == "model_b"
276
+ writemsg = winnerbox if iswinner else loserbox
277
+ st.markdown(f"#### ({row.model_b}) {row.human_readable_model_b}")
278
+ writemsg(
279
+ show_linebreak_in_md(row.generated_b),
280
+ icon="✅" if iswinner else "❌",
281
+ )
282
+ except Exception as e:
283
+ import traceback
284
+
285
+ traceback.print_exc()
286
+ st.markdown(
287
+ "**Bug: 아래 표를 복사해서 이슈로 남겨주시면 개선에 도움이 됩니다. 감사합니다🙏**"
288
+ if st.session_state.korean
289
+ else "Bug: Please open an issue and attach the table shown below so it can be fixed. Thanks in advance. 🙏"
290
+ )
291
+ st.error(e)
292
+ st.info(tournament_prm_select)
293
+ st.table(
294
+ df_now_processed[
295
+ [
296
+ "depth",
297
+ "round",
298
+ "winner_nodes",
299
+ "winner_resolved",
300
+ "winner",
301
+ "model_a",
302
+ "model_b",
303
+ ]
304
+ ]
305
+ )
306
+ st.write("Sharable link")
307
+ st.code(f"{get_base_url()}/see_results?{QueryWrapper.get_sharable_link()}")
308
+ st.divider()
309
+
310
+ if st.session_state.korean:
311
+ st.markdown("### 매치 통계")
312
+ else:
313
+ st.markdown("### Match Stats.")
314
+ col1, col2 = st.columns(2)
315
+ col1, col2 = st.columns(2)
316
+ with col1:
317
+ with st.container(border=True):
318
+ st.plotly_chart(
319
+ figure_dict[
320
+ "fraction_of_model_a_wins_for_all_a_vs_b_matches"
321
+ ].update_layout(autosize=True, **default_layout_dict),
322
+ use_container_width=True,
323
+ key=f"{task}_fraction_of_model_a_wins_for_all_a_vs_b_matches",
324
+ )
325
+ with col2:
326
+ with st.container(border=True):
327
+ st.plotly_chart(
328
+ figure_dict["match_count_of_each_combination_of_models"].update_layout(
329
+ autosize=True, **default_layout_dict
330
+ ),
331
+ use_container_width=True,
332
+ key=f"{task}_match_count_of_each_combination_of_models",
333
+ )
334
+
335
+ with col1:
336
+ with st.container(border=True):
337
+ st.plotly_chart(
338
+ figure_dict["match_count_for_each_model"].update_layout(
339
+ **default_layout_dict
340
+ ),
341
+ use_container_width=True,
342
+ key=f"{task}_match_count_for_each_model",
343
+ )
344
+ with col2:
345
+ pass
346
+
347
+ if st.session_state.korean:
348
+ st.markdown("### 참고용 LLM Judge 편향 정보")
349
+ else:
350
+ st.markdown("### FYI: How biased is your LLM Judge?")
351
+
352
+ with st.expander("펼쳐서 보기" if st.session_state.korean else "Expand to show"):
353
+ st.info(
354
+ """
355
+ Varco Arena์—์„œ๋Š” position bias์˜ ์˜ํ–ฅ์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋“  ๋ชจ๋ธ์ด A๋‚˜ B์œ„์น˜์— ๋ฒˆ๊ฐˆ์•„ ์œ„์น˜ํ•˜๋„๋ก ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ LLM Judge ํ˜น์€ Prompt์˜ ์„ฑ๋Šฅ์ด ๋ถ€์กฑํ•˜๋‹ค๊ณ  ๋Š๊ปด์ง„๋‹ค๋ฉด, ์•„๋ž˜ ์•Œ๋ ค์ง„ LLM Judge bias๊ฐ€ ์ฐธ๊ณ ๊ฐ€ ๋ ๊ฒ๋‹ˆ๋‹ค.
356
+ * position bias (왼쪽)
357
+ * length bias (오른쪽)
358
+
359
+ ๊ฒฐ๊ณผ์˜ ์™œ๊ณก์ด LLM Judge์˜ ๋ถ€์กฑํ•จ ๋–„๋ฌธ์ด์—ˆ๋‹ค๋Š” ์ ์„ ๊ทœ๋ช…ํ•˜๋ ค๋ฉด ์‚ฌ์šฉํ•˜์‹  LLM Judge์™€ Prompt์˜ binary classification ์ •ํ™•๋„๋ฅผ ์ธก์ •ํ•ด๋ณด์‹œ๊ธธ ๋ฐ”๋ž๋‹ˆ๋‹ค (Varco Arena๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์ด๋ฅผ ์ˆ˜ํ–‰ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!).""".strip()
360
+ if st.session_state.korean
361
+ else """
362
+ In Varco Arena, to minimize the effect of position bias, all models are alternately positioned in either position A or B. However, if you feel the LLM Judge or Prompt performance is insufficient, the following known LLM Judge biases may be helpful to reference:
363
+ * position bias (left)
364
+ * length bias (right)
365
+
366
+ To determine if result distortion was due to LLM Judge limitations, please measure the binary classification accuracy of your LLM Judge and Prompt (You could use Varco Arena for this purpose!).
367
+ """.strip()
368
+ )
369
+ st.markdown(f"#### {judgename} + prompt = {eval_prompt_name}")
370
+ col1, col2 = st.columns(2)
371
+ with col1:
372
+ with st.container(border=True):
373
+ st.plotly_chart(
374
+ figure_dict["counts_of_match_winners"].update_layout(
375
+ **default_layout_dict
376
+ ),
377
+ use_container_width=True,
378
+ key=f"{task}_counts_of_match_winners",
379
+ )
380
+ with col2:
381
+ with st.container(border=True):
382
+ st.plotly_chart(
383
+ figure_dict["length_bias"].update_layout(**default_layout_dict),
384
+ use_container_width=True,
385
+ key=f"{task}_length_bias",
386
+ )
387
+ st.table(figure_dict["length_bias_df"].groupby("category").describe().T)
388
+
389
+
390
+ if __name__ == "__main__":
391
+ main()
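`load_data()` in `see_results.py` labels each result set by the two directory names above `result.json`, and the match selector recovers a row index from the text before `": "`. The sketch below isolates both string manipulations so they can be checked without Streamlit. The helper names, the example path, and the `"3: A vs B"` label format are hypothetical.

```python
from pathlib import PurePosixPath


def result_key(result_json: PurePosixPath) -> str:
    """Build the "<exp_name>/<prompt_name>" key for one result.json path."""
    prm_name = result_json.parts[-2]  # directory that holds result.json
    exp_name = result_json.parts[-3]  # its parent directory
    return f"{exp_name}/{prm_name}"


def eval_prompt_name(key: str) -> str:
    """Recover the evaluation prompt name from a key, as the sidebar code does."""
    return key.split("/")[-1].strip()


def match_index(human_readable_idx: str) -> int:
    """Parse the integer row index that precedes ': ' in a match label."""
    return int(human_readable_idx.split(": ")[0])
```

Because the key is rebuilt from path components, an experiment name containing `/` would shift the components, which is presumably why `app.py` replaces `/` in `exp_name`.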
streamlit_app_local/query_comp.py ADDED
@@ -0,0 +1,135 @@
+ # Defines a container that automatically reads and writes URL query parameters
+
+ from __future__ import annotations
+
+ import re
+ from typing import Callable, TypeVar
+
+ import streamlit as st
+
+ __all__ = ["QueryWrapper", "get_base_url"]
+
+ T = TypeVar("T")
+
+
+ import hashlib
+ import urllib.parse
+
+
+ def SHA1(msg: str) -> str:
+     return hashlib.sha1(msg.encode()).hexdigest()[:8]
+
+
+ def get_base_url():
+     session = st.runtime.get_instance()._session_mgr.list_active_sessions()[0]
+     return urllib.parse.urlunparse(
+         [session.client.request.protocol, session.client.request.host, "", "", "", ""]
+     )
+
+
+ class QueryWrapper:
+     queries: dict[str, _QueryWrapper] = {}  # registry of wrappers, for bookkeeping
+
+     def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+         self.__wrapper = QueryWrapper.queries[query] = _QueryWrapper(
+             query, label, use_hash
+         )
+
+     def __call__(self, *args, **kwargs):
+         return self.__wrapper(*args, **kwargs)
+
+     @classmethod
+     def get_sharable_link(cls):
+         # for k, v in cls.queries.items():
+         #     print(f"{k}: {v}")
+         return re.sub(
+             "&+", "&", "&".join([str(v) for k, v in cls.queries.items()])
+         ).strip("&")
+
+
+ class _QueryWrapper:
+     ILLEGAL_CHARS = "&/=?"
+
+     def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+         self.query = query
+         self.label = label or query
+         self.use_hash = use_hash
+         self.hash_table = {}
+         self.key = None
+
+     def __call__(
+         self,
+         base_container: Callable,
+         legal_list: list[T],
+         default: T | list[T] | None = None,
+         *,
+         key: str | None = None,
+         **kwargs,
+     ) -> T | list[T] | None:
+         val_from_query = st.query_params.get_all(self.query.lower())
+         # print(val_from_query)
+         legal = len(val_from_query) > 0
+         self.key = key or self.label
+
+         self.hash_table = {SHA1(str(v)): v for v in legal_list}
+
+         # filter out illegal values
+         if legal and legal_list:
+             val_from_query = [v for v in val_from_query if v in self.hash_table]
+         # print(self.label, val_from_query, legal)
+         if legal:
+             selected = [self.hash_table[v] for v in val_from_query]
+         elif default:
+             selected = default
+         elif self.label in st.session_state:
+             selected = st.session_state[self.label]
+             if legal_list:
+                 if isinstance(selected, list):
+                     selected = [v for v in selected if v in legal_list]
+                 elif selected not in legal_list:
+                     selected = []
+         else:
+             selected = []
+         if len(selected) == 1 and base_container in [st.selectbox, st.radio]:
+             selected = selected[0]
+         # print(self.label, selected)
+         if base_container == st.checkbox:
+             selected = base_container(
+                 self.label,
+                 legal_list,
+                 index=legal_list.index(selected) if selected in legal_list else None,
+                 key=self.key,
+                 **kwargs,
+             )
+         elif base_container == st.multiselect:
+             selected = base_container(
+                 self.label, legal_list, default=selected, key=self.key, **kwargs
+             )
+         elif base_container == st.radio:
+             selected = base_container(
+                 self.label,
+                 legal_list,
+                 index=legal_list.index(selected) if selected in legal_list else None,
+                 key=self.key,
+                 **kwargs,
+             )
+         elif base_container == st.selectbox:
+             selected = base_container(
+                 self.label,
+                 legal_list,
+                 index=legal_list.index(selected) if selected in legal_list else None,
+                 key=self.key,
+                 **kwargs,
+             )
+         else:
+             selected = base_container(self.label, legal_list, key=self.key, **kwargs)
+         return st.session_state[self.key]
+
+     def __str__(self):
+         selected = st.session_state.get(self.key, None)
+         if isinstance(selected, str):
+             return f"{self.query.lower()}={SHA1(selected)}"
+         elif isinstance(selected, list):
+             return "&".join([f"{self.query.lower()}={SHA1(str(v))}" for v in selected])
+         else:
+             return ""
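The sharable link built by `QueryWrapper` stores an 8-character SHA-1 prefix of each selected value and maps it back through a hash table of the legal values. The following is a minimal, Streamlit-free sketch of that round trip. `short_hash`, `build_query`, and `resolve` are hypothetical names, and unlike `get_sharable_link()` the sketch skips empty selections directly instead of collapsing repeated `&` with a regular expression.

```python
from __future__ import annotations

import hashlib


def short_hash(msg: str) -> str:
    """First 8 hex characters of the SHA-1 digest, as in SHA1() above."""
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


def build_query(selections: dict[str, str | list[str] | None]) -> str:
    """Encode each selected value as '<name>=<hash>'; unset entries are skipped."""
    parts: list[str] = []
    for name, selected in selections.items():
        values = [selected] if isinstance(selected, str) else (selected or [])
        parts.extend(f"{name.lower()}={short_hash(str(v))}" for v in values)
    return "&".join(parts)


def resolve(hashed: str, legal_values: list[str]) -> str | None:
    """Map a hash from the URL back to a legal value, or None if it is unknown."""
    table = {short_hash(str(v)): v for v in legal_values}
    return table.get(hashed)
```

Hashing keeps long tournament prompts out of the URL, at the cost that a link only resolves while the same set of legal values is loaded. With an 8-character prefix, collisions between legal values are unlikely but possible.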
streamlit_app_local/requirements.txt ADDED
@@ -0,0 +1,5 @@
+ streamlit
+ plotly
+ scikit-learn
+ numpy
+ pandas
streamlit_app_local/run.sh ADDED
@@ -0,0 +1,3 @@
+ #!/bin/bash
+ # Run this from here (cwd = ./streamlit_app_components/)
+ VARCO_ARENA_RESULT_PATH=./user_submit streamlit run app.py
streamlit_app_local/user_submit/12-02_14:29:30/1.jsonl ADDED
@@ -0,0 +1,6 @@
1
+ {"source":"D: ๊ธธ๋“œ์—ฐ๊ตฌ ์ถ”์ฒœ ๋งˆํฌ ํ–ˆ์œผ๋‹ˆ\nD: ๋‹ค๋“ค ๋ˆ„๋ฅด๊ฑฐ๋ผ\nC: ๊ธธ๋“œ์žฅ๋‹˜ ์œ„์น˜๊ฐ€ ์–ด๋””์—์š”?\nC: ์ฐพ์•˜์Œ...\nD: ๋„ค ๊ธธ๋“œ ํƒœ๊ทธ ๋ถ™์˜€์œผ๋‹ˆ๊นŒ\nD: ๊ทธ์ชฝ์œผ๋กœ ์„ฑ ์ด๋™ ํ•˜์‹œ๊ณ \nD: ๋ณ‘๋ ฅ ๋„ฃ์œผ๋ผ๊ณ  ๋‚ ๋ž˜๋‚ ๋ž˜\nA: ํ•˜์ดํ•˜์ด\nD: ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค ์–ด์„œ์˜ค์‹ญ์…”\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ์ด๋ฆฌ๋กœ ์ด๋™ํ•ด์„œ ์˜ค์„ธ์š”\nC: ๋„ค๋„ค~~~~\nA: ํ•จ๊ป˜ํ•ด์š” ~\nC: ์˜ฎ๊ฒผ์Šต๋‹ˆ๋‹ค.\nA: โ™ก\nD: ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด๋งŽ์ด ์žก์•„์ฃผ์…”์•ผ\nD: ์‹œํƒ€๋ธ์˜ ์ž์›์ด ๋„˜์ณ๋‚ฉ๋‹ˆ๋‹ค.\nA: ์ž„๋ชจํƒ„๋‹˜\nA: ๋งํˆฌ๊ฐ€ ๋„ˆ๋ฌด ๋‚˜๊ธ‹๋‚˜๊ธ‹ํ•˜์‹ญ๋‹ˆ๋‹ค\nA: ์ฒดํ†ต์„ ์ง€์ผœ์ฃผ์‹œ์˜ต์†Œ์„œ\nD: ๋„ˆ๋‚˜ ์ž˜ํ•ด\nA: ์ผ€์ผ˜\nD: ์‚ฌ๋žŒ ์ข€ ๋ชจ์ด๋ฉด\nD: ํŒŒ์ˆ˜๊พผ ์žก์•„์—ฌ\nB: ๊ณต๊ฒฉ๋Œ€ ์ฐธ์—ฌ ๋งŽ์ด ํ•ด์ฃผ์„ธ์š”~!\nD: ๊ณต๊ฒฉ๋Œ€ ์ฐธ์—ฌํ•ด์•ผ ์ •๋ง ์ž์› ๋„˜์น˜๋Š” ์‹œํƒ€๋ธ์ด ๋ฉ๋‹ˆ๋‹ค.\nD: ์—ฌ๋Ÿฌ๋ถ„ ํ€˜์ŠคํŠธํ•˜์‹œ๋ฉด์„œ ์ง„ํ–‰ํ•˜์‹œ๋ฉด ๋น ๋ฅธ ์„ฑ์žฅ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.\nD: ์—ฌ๋Ÿฌ๋ถ„ ์˜์ฃผ ๋ช…์˜ˆ ๋ณด์ƒ๋„ ๋ฐ›์œผ์„ธ์š”~\nD: ์˜์ฃผ ๋ช…์˜ˆ ๋ ˆ๋ฒจ 3 ๋˜์„œ ๋ณด์ƒ ๋ฐ›์œผ๋ฉด\nA: ๋„ต ์ž„๋ชจํƒ„๋‹˜!!\nD: ๊ฑด์„ค์Šฌ๋กฏ 2๊ฐœ!\nA: ๊ธ€๋ฃจํ†ค์— ๊ฐ€์†์ด ๊ทธ๋“๊ทธ๋“ ๋“ค์—ˆ๋„ค์š”\nA: ํ•œ๋ถ€๋Œ€์”ฉ ๊ณต๊ฒฉ๋Œ€ ๊ฑธ์–ด์ฃผ์„ธ์š”\nA: ํ—ฌ๋กœ์šฐ ์›Œ๋ณด์ด\nD: ์–ด์„œ์˜ค์„ธ์š”~ :)\nA: ์นœ์ ˆํ•œ ๋ชจํƒ„์”จ\nD: ์˜ˆ์“ฐ\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ํ—ฌ๋กœ์šฐ ์›Œ๋ฝ€์ด\nD: ๊ธธ๋“œ ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์ด๋™ํ•ด ์˜ค์„ธ์š”\nA: ๊ธ€๋ฃจํ†ค ์ฐธ์—ฌํ•˜์‹œ๋ฉด\nA: ์‹œํƒ€๋ธ๋„ ํ’์กฑํ•ด์ง€๊ณ  ๊ฐ€์†๋„ ๋„˜์ณํ๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค์š”\nA: 10๋ ™์งœ๋ฆฌ\nA: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ๊ฐ™์ด ์žก์œผ์‹ค๋ถ„\nA: ์—ฌ๊ธฐ๋กœ ๋ถ€๋Œ€ ์ด๋™ํ•ด์ฃผ์„ธ์š” ์ตœ๋Œ€ํ•œ\nA: ShareCoord#๋นˆ ๋•…#145926.609375#523510.437500\nD: ์–ด์„œ์˜ค์„ธ์š”\nA: ์ปค๋ชฌ์ปค๋ชฌ ์›Œ๋ณด์ด\nD: 10๋ ˆ[๋ฒจ ์ •์˜ˆ๋ชน ๊ฐ™์ด ์žก์•„์šฉ\nA: 10๋ ˆ๋ฒจ ๊ฐ•ํ™”๋žฉํ† ๋ฃจ์Šค ํ•œ๋ฒˆ ๋ชจ์—ฌ์„œ ์žก์•„๋ณผ๊นŒ์š”","generated":"* D์ด ๊ธธ๋“œ ์—ฐ๊ตฌ ์ถ”์ฒœ์„ ๋งˆํฌํ–ˆ์œผ๋‹ˆ ๋‹ค๋“ค ๋ˆ„๋ฅด๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* D์ด ๊ธธ๋“œ ํƒœ๊ทธ ๋ถ™์˜€์œผ๋‹ˆ ๊ทธ ์ชฝ์œผ๋กœ ์„ฑ ์ด๋™ ํ•˜๊ณ  ๋ณ‘๋ ฅ์„ ๋‚ ๋ž˜๋‚ ๋ž˜ 
๋„ฃ์œผ๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* D์ด ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด๋งŽ์ด ์žก์•„์•ผ ์‹œํƒ€๋ธ์˜ ์ž์›์ด ๋„˜์ฒ˜๋‚œ๋‹ค๊ณ  ๊ฐ•์กฐํ–ˆ์Šต๋‹ˆ๋‹ค.\n* A์ด ์ž„๋ชจํƒ„๋‹˜์˜ ๋งํˆฌ๊ฐ€ ๋„ˆ๋ฌด ๋‚˜๊ธ‹๋‚˜๊ธ‹ํ•˜๋‹ˆ ์ฒดํ†ต์„ ์ง€์ผœ๋‹ฌ๋ผ๊ณ  ํ•˜์ž A์ด ๋„ˆ๋‚˜ ์ž˜ํ•ด๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* D์€ ์‚ฌ๋žŒ ์ข€ ๋ชจ์ด๋ฉด ํŒŒ์ˆ˜๊พผ์„ ์žก์ž๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* D์€ ์˜์ฃผ ๋ช…์˜ˆ ๋ ˆ๋ฒจ์ด 3 ๋˜์–ด์„œ ๋ณด์ƒ์œผ๋กœ ๊ฑด์„ค์Šฌ๋กฏ 2๊ฐœ๋ฅผ ๋ฐ›์„ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ๊ธ€๋ฃจํ†ค์— ๊ฐ€์†์ด ๊ทธ๋“๊ทธ๋“ ๋“ค์—ˆ์œผ๋‹ˆ ํ•œ ๋ถ€๋Œ€์”ฉ ๊ณต๊ฒฉ๋Œ€๋ฅผ ๊ฑธ์–ด๋‹ฌ๋ผ๊ณ  ์š”์ฒญํ–ˆ์Šต๋‹ˆ๋‹ค.\n* D์™€ A์ด 10๋ ˆ๋ฒจ ๊ฐ•ํ™”๋žฉํ† ๋ฃจ์Šค๋ฅผ ๋ชจ์—ฌ์„œ ์žก์•„๋ณด์ž๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
2
+ {"source":"A: ๋ถ€๋Œ€ ๋‚จ์œผ์‹œ๋Š”๋ถ„\nA: 8282\nA: ์ž ์ณ๋ณผ๊ฒŒ์š”\nA: ํ™”์ดํŒ…\nA: ๋ฐœํ• ๋ผ๋กœ~!!!\nA: ๊ธฐ์–ตํ•ด์ค˜!!! ์ •์˜ˆ 10๋ ˆ๋ฒจ ์žก์•˜๋‹ค๊ตฌ!!!\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ๋‚จ๋Š” ๋ถ€๋Œ€ 1๊ฐœ์”ฉ\nA: ๊ธธ๋“œ ์š”์ƒˆ ๊ฑด์„ค์— ๋„ฃ์–ด์ฃผ์„ธ์š”!\nA: ์šฐ๋ฆฌ์˜ ์‹œํƒ€๋ธ์„ ๊ฑด์„คํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค!\nC: ์›ฐ์ปด ํˆฌ ์‹œํƒ€๋ธ\nA: ํ—ฌ๋กœ์šฐ ์›Œ๋ฝ€์ด\nC: ๋ชจ๋‘ 1๋ถ€๋Œ€๋Š” ๊ธธ๋“œ ์š”์ƒˆ์— ๋„ฃ์–ด๋‹ฌ๋ผ์ž…๋‹ˆ๋‹ค.\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nC: ์‹œํƒ€๋ธ์˜ ์˜ํ† ๊ฐ€ ์‹œ๊ธ‰ํ•˜๋‹ค์˜ˆ์š”\nA: ์ด๋ฆฌ๋กœ ์˜ค์„ธ์š” ์–ด์„œ~!\nB: ์šฐ๋ฆฌ ๊ธธ๋“œ์š”์ƒˆ ๋นจ๋ฆฌ ๊ฑด์„ค ํ• ๋ ค๋ฉด ๋ณ‘๋ ฅ์„ ๋” ๋งŽ์ด ๋„ฃ์–ด์•ผ ํ•˜๋‹ˆ๊นŒ ์–ด์„œ์–ด์„œ ์ง€์›ํ•ด์ฃผ์„ธ์š”~ ^^\nA: ๋นจ๊ฐ„๋‚ด๋ณต์ตœ๊ณ ~!!\nC: ์–ด์„œ์˜ค์„ธ์š”~\nC: ใ…Žใ…Žใ…Žใ…Ž\nA: ์šฐ์™€\nC: ์›Œํ„ฐ๋ณด์ด๋‹˜ ์–ด์„œ์˜ค์‹ญ์…”\nA: ์›Œ(ํ„ฐ)๋ณด์ด ์˜ค์…จ๋‹ค\nC: ๋‹ค๋“ค ๋‚ ๋ž˜๋‚ ๋ž˜ ์—ฐ๋งค์šฉ์ƒˆ์— ๊ธฐ์‚ฌ๋‹จ ํ•˜๋‚˜์”ฉ ๋„ฃ์œผ๋ผ์šฐ\nA: ์ž ์šฐ๋ฆฌ ์ด์ œ\nA: ๊ฒฝํ—˜์น˜ 3๋ฐฐ ์ด๋ฒคํŠธ\nA: ๋‹ฌ๋ ค์•ผํ•ฉ๋‹ˆ๋‹ค\nA: ํ† ์ŠคํŠธ๋‹˜ ์–ด์„œ์˜ค์„ธ์š”~ ใ…Žใ…Ž\nA: ShareCoord#๋นˆ ๋•…#155862.171875#524460.062500\nA: ์—ฌ๊ธฐ๋กœ ๋ณ‘๋ ฅ์ง‘๊ฒฐํ•ด์ฃผ์„ธ์š”\nA: ์˜ค๋ฅธ์ชฝ์— ๋ณ‘๋ ฅ์Šฌ๋กฏ ๋‘๋ฒˆ๋ˆ„๋ฅด๋ฉด\nA: ์ „์ฒด์„ ํƒํ•ด์„œ ์—ฌ๋Ÿฌ ๊ธฐ์‚ฌ๋‹จ์„ ํ•œ๋ฒˆ์— ์šด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค\nA: ์ž์ž ์š”์ƒˆ ์˜ค๋ฅธ์ชฝ์—\nA: ๋ชจ์ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค\nA: ๊ฒฝํ—˜์น˜ 3๋ฐฐ ์ด๋ฒคํŠธ\nD: ๊ณ„์† ์ด๋™ํ•  ์ˆ˜ ์—†๋Š” ์œ„์น˜๋ผ๊ณ  ํ•˜๋Š”๋ฐ ์–ด๋–ป๊ฒŒ ํ•˜๋ฉด ๋˜๋‚˜์š”?\nA: ์ฐธ์—ฌํ•ด์„œ ๊ธฐ์‚ฌ ๋ ˆ๋ฒจ ์ญ‰์ญ‰ ์˜ฌ๋ฆฌ์…”์š”\nA: ๊ทธ ์„ฑ์„ ์ด๋ฆฌ์ €๋ฆฌ์˜ฎ๊ธฐ๋ฉด\nA: ์ดˆ๋ก์ƒ‰์œผ๋กœ ์„ฑ ๋ฐ”๋‹ฅ์ด ๋ฐ”๋€Œ๋Š”๋ฐ\nA: ๊ทธ๋•Œ ์ด๋™๋ˆ„๋ฅด์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค\nA: ์ž˜ ์•ˆ๋˜๋ฉด 2~3๋ฒˆ์ •๋„ ํŠธ๋ผ์ดํ•ด์ฃผ์‹œ๋ฉด ์ด๋™๋ ๊ฑฐ์—์š”\nA: ShareCoord#๋นˆ ๋•…#156695.468750#525127.875000\nD: ์•„...๊นƒ๋ฐœ์ด ๋– ์žˆ์–ด์„œ ๋ˆŒ๋Ÿฌ๋ดค๋Š”๋ฐ ๋ชป๊ฐ„๋‹ค๊ณ  ํ•ด์„œ ใ…Žใ…Ž;\nA: ์•„~~\nA: ๊นƒ๋ฐœ๋– ์žˆ๋Š”๊ณณ์œผ๋กœ ๊ธฐ์‚ฌ๋‹จ(๋ถ€๋Œ€)๋Š” ์ด๋™ํ•  ์ˆ˜ ์žˆ์–ด์š”!\nA: ๋ถ€๋Œ€ ์—ฌ๋Ÿฌ๊ฐœ ๋ณด๋‚ด์„œ ์š”๊ธฐ ์˜†์— ํŒŒ์ˆ˜๊พผ ์žก์œผ๋ฉด\nA: ๊ฒฝํ—˜์น˜ ํญํƒ„ ํŒกํŒก ํ„ฐ์ง‘๋‹ˆ๋‹ค\nA: ์–ด์„œ์˜ค์„ธ์š”\nD: ์žฅ์• 
๋ฌผ์ด ์žˆ์–ด์„œ ๋ชป๊ฐ„๋‹ค๊ณ  ํ•˜๋„ค์š”..\nA: ShareCoord#๋นˆ ๋•…#156578.359375#523799.718750","generated":"* C๊ฐ€ ์‹œํƒ€๋ธ์˜ ์˜ํ† ๊ฐ€ ์‹œ๊ธ‰ํ•˜๋‹ˆ ๋ชจ๋‘ 1๋ถ€๋Œ€๋Š” ๊ธธ๋“œ ์š”์ƒˆ์— ๋„ฃ์–ด๋‹ฌ๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ์š”๊ธฐ ์˜†์— ํŒŒ์ˆ˜๊พผ์„ ์žก์œผ๋ฉด ๊ฒฝํ—˜์น˜ ํญํƒ„ ํŒกํŒก ํ„ฐ์ง€๋‹ˆ ์–ด์„œ ์˜ค๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ๊ฒฝํ—˜์น˜ 3๋ฐฐ ์ด๋ฒคํŠธ๋ฅผ ๋‹ฌ๋ ค์•ผ ํ•˜๋‹ˆ ๋นˆ ๋•…์œผ๋กœ ๋ณ‘๋ ฅ์„ ์ง‘๊ฒฐํ•ด๋‹ฌ๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ๋ถ€๋Œ€ ์—ฌ๋Ÿฌ ๊ฐœ ๋ณด๋‚ด์„œ ์š”๊ธฐ ์˜†์— ํŒŒ์ˆ˜๊พผ ์žก์œผ๋ฉด ๊ฒฝํ—˜์น˜ ํญํƒ„ ํŒกํŒก ํ„ฐ์ง„๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
3
+ {"source":"A: 10์‹œ 10๋ถ„์—\nA: ํŒŒ์ˆ˜๊พผ์‚ฌ๋ƒฅ ์‹œ์ž‘ํ• ๊ฒŒ์š”\nA: ํŒŒ์ˆ˜๊พผ์— ๊ฒฝํ—˜์น˜ ๋งŽ์ด ๋“ค์–ด์žˆ์Šต๋‹ˆ๋‹ค\nC: 10์‹œ 10๋ถ„์— ํŒŒ์ˆ˜๊พผ ์žก์Šต๋‹ˆ๋‹ค.\nC: ๋‹ค๋“ค ์˜ค์„ธ์š”\nE: ์–ด๋””์ธ๊ฐ€์š”..?\nA: ShareCoord#๋นˆ ๋•…#156509.859375#524728.187500\nA: ์š”๊ธฐ๊ฐ€ ๋ถ€๋Œ€ ์ง‘๊ฒฐ์ง€๊ณ ์š”!\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ๊ธธ๋“œ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์„ฑ ์ด๋™ํ•˜์‹  ํ›„์— ๋ถ€๋Œ€ ์˜ฎ๊ธฐ์‹œ๋ฉด ํŽธํ•ฉ๋‹ˆ๋‹ค\nD: ๊ณต๊ฒฉ๋Œ€๋ฅผ ๊ทธ๋ƒฅ ๋ณด๋‚ด๋ฉด ๋˜๋‚˜์š”?\nA: ๊ธฐ์‚ฌ๋‹จ์„ ๋ณด๋‚ด๋‘์…จ๋‹ค๊ฐ€\nA: ์‹œ์ž‘ํ•˜๋ฉด ๋ถ€๋Œ€๋ฅผ ๋Œ์–ด์„œ ๋ชฌ์Šคํ„ฐ ์œ„์— ์ปค์„œ๋ฅผ ์˜ฌ๋ ค๋‘๋ฉด\nA: ๋‹ฌ๋ ค๊ฐ€์„œ ๊ณต๊ฒฉํ•˜๋Š”๋ฐ์š”\nA: ํ•œํƒ€์ด๋ฐ์— ๊ฐ™์ด ์žก์œผ๋ฉด ์ „ํˆฌ์— ์ฐธ์—ฌํ•œ ์‚ฌ๋žŒํ•œํ…Œ ๋ณด์ƒ์„ ๊ท ๋“ฑํ•˜๊ฒŒ ์ค˜์„œ\nA: ๋‹ค๊ฐ™์ด ๋ชฐ์•„์žก๋Š”๊ฒŒ ์ด๋“์ž…๋‹ˆ๋‹ค\nA: 10๋ถ„๋˜์–ด์„œ ํŒŒ์ˆ˜๊พผ ํ•œ๋ฒˆ ๋‹ฌ๋ ค๋ณผ๊ฒŒ์š”~!\nA: ๋ถ€๋Œ€ ์ตœ๋Œ€ํ•œ ๊บผ๋‚ด์„œ ๊ฐ€์ ธ์˜ค์‹  ํ›„์—\nA: ์šฐ์ธก ๋ถ€๋Œ€ ์Šฌ๋กฏ ๋ฒ„ํŠผ ๋”๋ธ”ํด๋ฆญํ•ด์„œ\nA: 3~5๋ถ€๋Œ€ ์šด์šฉํ•˜์…”์š”~!\nA: ์ € ๋ณด์‹œ๋ฉด 5๋ถ€๋Œ€ ์ „๋ถ€\nA: ํ•œ๋ฐฉ์— ์šด์šฉํ•ฉ๋‹ˆ๋‹ค\nA: ์ž์ž ์น ๊ฒŒ์š” ํŒŒ์ˆ˜๊พผ!!\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#155425.500000#527495.250000\nC: ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.\nA: ๊ณต๊ฒฉ๊ณต๊ฒฉ\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#159173.750000#528246.125000\nA: ๋ฉ‹์Ÿ์ด๋‹˜ ์ปค๋น„๋‹˜\nA: ๋ถ€๋Œ€ ์›€์ง์—ฌ์„œ ํŒŒ์ˆ˜๊พผ ๊ณต๊ฒฉํ•˜์…”์š”~~!\nB: ๊ฐ™์ด ๊ฐ€์œ ~~~ ํŒŒ์ˆ˜๊พผ\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#160192.187500#525461.625000\nA: ์–ด์„œ์˜ค์„ธ์š” ~~\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ์„ฑ ์ด๋™ํ•˜์‹œ๊ตฌ\nA: ํŒŒ์ˆ˜๊พผ์žก์œผ๋Ÿฌ ์ฐธ์—ฌํ•˜์…”์š” ์š”์ชฝ์œผ๋ฃจ '\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#165078.687500#525488.375000\nA: ๊ฐ™์ด ์žก์•„์•ผ ๋ณ‘๋ ฅ๋„ ๋œ์ฃฝ๊ณ  ๊ฒฝํ—˜์น˜๋„ ๊ฐ™์ด๋จน์–ด์š”~!\nA: ์–ด์„œ์˜ค์„ธ์š”์˜ค~~~~\nB: ์–ด์„œ์˜ค์„ธ์š”~\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ์„ฑ ์ด๋™ ์ขŒํ‘œ\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#164864.437500#527752.750000\nC: ์–ด์„œ์˜ค์„ธ์š”~ :)\nA: ๋ชฌ์Šคํ„ฐ ์ขŒํ‘œ\nA: ๊ณ„์† ํŒŒ์ˆ˜๊พผ ์ง„ํ–‰ ์ค‘์ž…๋‹ˆ๋”~!!\nA: ์–ด์„œ์™€์„œ ์žก์œผ์…”์šฉ~~\nA: ๋ถ€๋Œ€ ํšŒ๊ตฐํ•ด์„œ\nA: ์ 
•๋น„ ํ•œ๋ฒ„ ใ„ดํ•˜๊ณ \nA: ๋‹ค์‹œ ๋‹ฌ๋ฆด๊ฒŒ์š”\nA: ๋‹ค๋“ค ๋ถ€๋Œ€ ์ฒด๋ ฅ์ด ๊ฐ„๋‹น๊ฐ„๋‹นํ•˜์‹ค๋“ฏํ•˜์—ฌ..","generated":"* A์ด 10์‹œ 10๋ถ„์— ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ์„ ์‹œ์ž‘ํ•œ๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* D๊ฐ€ ๊ณต๊ฒฉ๋Œ€๋ฅผ ๊ทธ๋ƒฅ ๋ณด๋‚ด๋ฉด ๋˜๋ƒ๊ณ  ๋ฌป์ž A์€ ๋‹ค๊ฐ™์ด ๋ชฐ์•„์žก๋Š”๊ฒŒ ์ด๋“์ด๋‹ˆ 10๋ถ„ ๋˜์–ด์„œ ํŒŒ์ˆ˜๊พผ์„ ๋‹ฌ๋ ค๋ณด์ž๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์€ ๋ถ€๋Œ€๋ฅผ ์ตœ๋Œ€ํ•œ ๊บผ๋‚ด์„œ ๊ฐ€์ ธ์˜จ ํ›„์— ์šฐ์ธก ๋ถ€๋Œ€ ์Šฌ๋กฏ ๋ฒ„ํŠผ ๋”๋ธ”ํด๋ฆญ ํ•ด์„œ 3~5๋ถ€๋Œ€ ์šด์šฉํ•˜๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ํŒŒ์ˆ˜๊พผ์€ ๊ฐ™์ด ์žก์•„์•ผ ๋ณ‘๋ ฅ๋„ ๋œ ์ฃฝ๊ณ  ๊ฒฝํ—˜์น˜๋„ ๊ฐ™์ด ๋จน์„ ์ˆ˜ ์žˆ์œผ๋‹ˆ ์–ด์„œ์˜ค๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
+ {"source":"C: ์š”๊ฑฐ ์žก๊ณ  ์Šคํƒ‘ ์ž ์‹œ ์ •๋น„ํ•˜๊ณ  ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค.\nA: ShareCoord#๋นˆ ๋•…#159195.468750#530362.062500\nA: ์—ฌ๊ธฐ ์ง‘๊ฒฐํ• ๊ฒŒ์š”!!!\nA: 18๋ถ„์— ์Šคํƒ€ํŠธ ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค\nA: ๋ถ€๋Œ€ ํšŒ๊ตฐํ–ˆ๋‹ค๊ฐ€ ๋‹ค์ค‘ ํŽธ์„ฑ์œผ๋กœ ์ „๋ถ€ ๋Œ๊ณ  ๋‚˜์˜ค์…”์šฉ~!\nA: ๋‹ค๊ฐ™์ด ์žก์•„์•ผ ๊ฒฝํ—˜์น˜ ํ•จ๊ป˜๋จน๊ณ  ํญ์—… ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค\nA: ์–ด์„œ์–ด์„œ~!! ์›Œ๋ฝ€์ด๋“ค ๋นจ๋ฆฌ ๊ฐ•ํ•ด์ ธ์•ผ ๋‚˜์ค‘์— ์‹ธ์šธ๋•Œ ์žฌ๋ฐŒ์Šต๋‹ˆ๋‹ค~! ใ…Žใ…Žใ…Ž\nA: ShareCoord#๋นˆ ๋•…#159809.593750#529345.625000\nA: ์ด๋™์‹œ๊ฐ„๋•Œ๋ฌธ์— 20๋ถ„์— ๋ฐ”๋กœ ์‹œ์ž‘ํ• ๊ฒŒ์š”\nA: 10์‹œ 20๋ถ„์—” ๊ทธ๋ƒฅ ๋ฐ”๋กœ ์Šคํƒ€ํŠธํ•ฉ๋‹ˆ๋‹ต\nA: ์™€์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค ใ…Žใ…Ž ํ—คํ—ค\nA: ์ž ์น ๊ฒŒ์š”\nA: ๋‚จ์€๊ฑฐ ๋‹ค์žก๊ฒ ์Šต๋‹ˆ๋‹ค\nD: ๊ฐ€๋‚˜์š”?\nA: gogo!!!!\nA: ShareCoord#๋นˆ ๋•…#159518.640625#532655.562500\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#157515.046875#534659.812500\nA: gogo!!!\nA: ShareCoord#์—ํ…Œ๋ฅด ์›์„#160185.000000#531312.000000\nA: ํŒŒ์ˆ˜๊พผ์žก๊ณ ๋–จ์–ด์ง„\nA: ์—ํ…Œ๋ฅด ์›์„ ์ด๊ฑฐ ์‹œ๊ฐ„์ œ ๋ฒ„ํ”„๊ฐ™์œผ๋‹ˆ๊นŒ\nB: ๊ฑด๋ฌผ ์ง“๊ฑฐ๋‚˜ ๋ณ‘์‚ฌ ๋ฝ‘์„ ๋•Œ ์†๋ชจ์–‘ ๋„์›€ ๋ˆ„๋ฅด๋ฉด ์‹œ๊ฐ„์ด ๋นจ๋ผ์ง‘๋‹ˆ๋‹ค.\nA: ํ•„์š”ํ•˜์‹ ๊ฑฐ ๋น ๋ฅด๊ฒŒ ์ฑ„์ง‘ํ•ด์„œ ๋ฒ„ํ”„๋ฐ›์œผ์„ธ์š”\nB: ์ ๊ทน์ ์œผ๋กœ ํ™œ์šฉํ•ด์ฃผ์„ธ์š”\nC: ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ ์ˆ˜๊ณ ํ•˜์…จ์Šต๋‹ˆ๋‹ค.\nC: ์—ํ…Œ๋ฅด ์›์„ ํ™œ์šฉํ•ด์ฃผ์„ธ์š”! ์ธํ„ฐ๋ ‰์…˜ํ•˜๋ฉด ๋ฒ„ํ”„ ๋ฐ›์Šต๋‹ˆ๋‹ค.\nC: ๋‹ค์‹œ ๊ธ€๋ฃจํ†ค ์‚ฌ๋ƒฅ ๊ฐ‘์‹œ๋‹น\nC: ๊ธธ๋“œ ์—ฐ๊ตฌ๋„ ์‹œ์ž‘ํ–ˆ์œผ๋‹ˆ\nC: ๊ธฐ๋ถ€ ๋งŽ์ด ํ•ด์ฃผ์„ธ์š”\nC: ๊ธฐ๋ถ€๋ฅผ ํ•˜๋ฉด ๊ฐœ์ธ ํฌ์ธํŠธ๋„ ์–ป์œผ๋‹ˆ๊นŒ ์ƒ๋ถ€์ƒ์กฐ\nA: ์˜คํ›„์ฏค์ด๋ฉด ์Ÿํ•œ๏ฟฝ๏ฟฝ๏ฟฝ ํ•˜์ง€ ์‹ถ๋„ค์š”\nC: ๋‹ค๋“ค ๋ฐœํ• ๋ผ ๊ฐ€์ž\nA: ์„ฑ์žฅ ํ˜ธ๋‹ค๋‹ฅ ํ•ด์ฃผ์‹œ๊ตฌ ๊ถ๊ธˆํ•œ๊ฑฐ ๋ฌผ์–ด๋ด์ฃผ์„ธ์š”\nA: ๋„์›€ ํŒŒํŒŒํŒ ๋ˆŒ๋Ÿฌ์ฃผ์‹œ๊ตฌ์šฅ~!\nA: ๊ฐ•ํ•œ ์›Œ๋ฝ€์ด๊ฐ€ ๋˜์ž!!!!!!!\nB: ๊ธธ๋“œ ์š”์ƒˆ 60% ๋„˜์—ˆ๋„ค์š”.\nB: ์˜์ฐจ ์˜์ฐจ ์˜์ฐจ~\nA: ์–ด์šฐ ๋‹ฌ์—‰ ๋„์›€ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค\nC: ใ…Žใ…Žใ…Žใ…Ž\nD: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ๋„์›€๋„์›€~!!! 
์ตœ๊ณ ~!!!!!!\nA: ๊ฐ€์† ์•„๋ผ์ง€ ๋งˆ์‹œ๊ตฌ ํŒํŒ์“ฐ์„ธ์š”\nA: ๋งˆ์ง€๋ง‰๋‚ ๊นŒ์ง€ ๊ฐ€์† ๋‹ค ์“ฐ์ง€๋„ ๋ชปํ•ฉ๋‹ˆ๋‹ค ใ„ทใ„ท\nA: ๋„์›€-๊ฐ€์†-๋„์›€-๊ฐ€์†\nA: ์˜-์ฐจ-์˜-์ฐจ\nC: (ํก์กฑ)\nC: ์‹œํƒ€๋ธ ์š”์ƒˆ ์™„์„ฑ๊นŒ์ง€ 3๋ถ„๋‚จ์•˜์Šต๋‹ˆ๋‹ค.\nA: ์›Œ๋ฝ€์ด ์—ฌ๋Ÿฌ๋ถ„๋“ค\nA: ๊ถ๊ธˆํ•˜๊ฑฐ๋‚˜ ๋ง‰ํžˆ๋Š” ๊ฒƒ๋“ค ๋ง์”€์ฃผ์‹œ๋ฉด\nA: ๋‹ค ์•Œ๋ ค๋“œ๋ฆฌ๊ณ  ๋‹ค ๋„์™€๋“œ๋ฆฝ๋‹ˆ๋‹ค","generated":"* A์ด ํŒŒ์ˆ˜๊พผ์„ ์žก๊ณ  ๋–จ์–ด์ง„ ์—ํ…Œ๋ฅด ์›์„์€ ์‹œ๊ฐ„์ œ ๋ฒ„ํ”„๋กœ ๊ฑด๋ฌผ ์ง“๊ฑฐ๋‚˜ ๋ณ‘์‚ฌ ๋ฝ‘์„ ๋•Œ ์† ๋ชจ์–‘ ๋„์›€์„ ๋ˆ„๋ฅด๋ฉด ์‹œ๊ฐ„์ด ๋นจ๋ผ์ง€๋‹ˆ ํ•„์š”ํ•œ ๊ฒƒ์„ ๋น ๋ฅด๊ฒŒ ์ฑ„์ง‘ํ•ด ๋ฒ„ํ”„๋ฐ›์œผ๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* C๊ฐ€ ๊ธธ๋“œ ์—ฐ๊ตฌ์— ๊ธฐ๋ถ€ํ•˜๋ฉด ๊ฐœ์ธ ํฌ์ธํŠธ๋„ ์–ป์„ ์ˆ˜ ์žˆ์œผ๋‹ˆ ๊ธฐ๋ถ€๋ฅผ ๋งŽ์ด ํ•ด๋‹ฌ๋ผ๊ณ  ๋‹น๋ถ€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ์˜คํ›„ ์ฏค ์Ÿ ํ•œ๋ฒˆ ํ•˜์ง€ ์‹ถ์œผ๋‹ˆ ์„ฑ์žฅ ํ˜ธ๋‹ค๋‹ฅ ํ•˜๊ณ  ๊ถ๊ธˆํ•œ ๊ฒƒ ๋ฌผ์–ด๋ด๋‹ฌ๋ผ๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* B์ด ๊ธธ๋“œ ์š”์ƒˆ๊ฐ€ 60%๋ฅผ ๋„˜์—ˆ๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* A์ด ๊ถ๊ธˆํ•˜๊ฑฐ๋‚˜ ๋ง‰ํžˆ๋Š” ๊ฒƒ๋“ค ๋ง์”€ ์ฃผ์‹œ๋ฉด ๋‹ค ์•Œ๋ ค๋“œ๋ฆฌ๊ณ  ๋„์™€๋“œ๋ฆฐ๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
+ {"source":"E: ์—ฌ๋Ÿฌ๋ถ„!\nE: ๊ธ€๋ฃจํ†ค ๋งŽ์ด ์žก์•„์ฃผ์„ธ์š”!\nE: ๊ธธ๋“œ ์ž์›์ด ๋ถ€ ์กฑํ•ฉ๋‹ˆ๋‹ค.\nE: ์ –๊ณผ๊ฟ€์ด ํ๋ฅด๋Š” ์‹œํƒ€๋ธ์ด ๋˜๊ณ  ์‹ถ๋‹ค?\nE: ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด ์žก์•„์ฃผ์‹ญ์‡ผ\nG: ๊ฐ€๋‚œํ•œ ์‹œํƒ€๋ธ์€ ์ž‡์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค\nE: ํ”Œ๋ฆฌ์ฆˆ\nB: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ๋ฏธ์…˜ ๊ฑธ๋ฆฌ์‹ ๋ถ„?\nB: ๊ฐ™์ด ์žก์œผ์‹ค๋ถ„ ๊ณ„์‹ ๊ฐ€์š” ~\nF: ์ขŒํ‘œ์ฃผ์„ธ์š”\nB: ShareCoord#๋นˆ ๋•…#155225.781250#534257.875000\nB: ShareCoord#๋นˆ ๋•…#154322.968750#533640.812500\nB: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ์žก์œผ์‹ค๋ถ„ ๊ตฌํ•ด์š”~~~\nB: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nB: ์ถœ๋ฐœํ•ฉ๋‹ˆ๋‹ค\nB: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ~!\nB: ๋‚˜์ด์Šค์ปค๋น„๋‹˜\nF: ใ…Žใ…Ž\nB: ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ท\nF: ๊ตฟ๊ตฟ\nB: ๋„ˆ๋ฅธ ๋ฒŒํŒ์„ ๋›ฐ๋†€๊ณ  ์‹ถ์€\nC: ๊ธธ๋“œ ์—ฐ๊ตฌ ๊ธฐ๋ถ€ ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค.\nB: ์›Œ๋ณด์ด์˜ ๋งˆ์Œ์„ ์•„๋Š”๊ฐ€\nC: ๊ธธ๋“œ ๋ณด์‹œ๋ฉด ๊ธธ๋“œ ์—ฐ๊ตฌ ์žˆ์–ด์š”~\nB: ShareCoord#1๋ ˆ๋ฒจ ๋“œ๋ž˜๊ณค ๋ ˆ์–ด#138900.000000#535800.000000\nB: ์ „์ดˆ๊ธฐ์ง€ ์ง“๊ณ  ๋“œ๋ž˜๊ณค ์•Œ ํ•œ๋ฒˆ ์–ป์–ด๋ณผ๊ฒŒ์š”~~\nB: ์•Œ์„ ์–ป๊ณ  ์ข€ ํ‚ค์›Œ์•ผ ์šฉ์“ธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค~~~!!\nC: ์ „์ดˆ๊ธฐ์ง€ ๊ฑด์„ค์— ๊ธฐ์‚ฌ๋‹จ 1๊ฐœ์”ฉ ์ง€์› ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค\nE: ๊ธ€๋ฃจํ†ค๋„ ๋งŽ์ด ์žก์•„์ฃผ์„ธ์š”\nE: ๊ธธ๋“œ ์ž์›์„ ๋ชจ์•„์•ผ ํ•ฉ๋‹ˆ๋‹น\nB: ๋„ต ์ž„๋ชจํƒ„๋‹˜\nB: ์ถฉ์„ฑ์ถฉ์„ฑ์ถฉ์„ฑ XXXXXXXXXXX\nE: ๊ฐ€์ฆˆ์•„ ๋ฐœํ• ๋ผ!\nB: ํ™œ๋™ ํฌ์ธํŠธ ์–ป๋Š” ๋ฒ•\nB: 1. ๊ธธ๋“œ > ์—ฐ๊ตฌ > ๊ธธ๋“œ ์—ฐ๊ตฌ ๊ธฐ๋ถ€(์ถ”์ฒœ ์—ฐ๊ตฌ)\nB: 2. ๊ธธ๋“œ ๋ณธ๋ถ€ > ๋„์›€\nB: ํ™œ๋™ ํฌ์ธํŠธ ์‚ฌ์šฉ๋ฒ•\nB: 1. 
๊ธธ๋“œ > ๊ธธ๋“œ ์ƒ์  > ์ƒํ’ˆ ๊ตฌ๋งค(๊ธฐ์‚ฌ ๊ฒฝํ—˜์น˜ ๊ฟ€)\nB: ๊ธฐ์‚ฌ๋ ™์—…๋นจ๋ฆฌํ•ด์„œ ๊ดด์Žˆ ๋ชฌ์Šคํ„ฐ ๋นจ๋ฆฌ์žก๊ณ  ๊ณ ์„ฑ์žฅํ•ฉ์„ธ๋‹ค~!!\nB: ๋„์›€ ๊ณ ๋ง™์Šต๋‹ˆ๋‹ค ํ—คํ—ค\nB: ๊ณง ์‹์‚ฌ์‹œ๊ฐ„์ผํ…๋ฐ์š”\nB: ์ „์ดˆ๊ธฐ์ง€์— ๊ธฐ์‚ฌ๋‹จ 1๊ฐœ ๋„ฃ์–ด์ฃผ์‹œ๊ตฌ\nB: ๋‚˜๋จธ์ง€ ๋ถ€๋Œ€๋Š” 2๋ ˆ๋ฒจ ์ด์ƒ ์ฑ„์ง‘์ง€์— ๋„ฃ์–ด๋‘์‹  ํ›„ ์‹์‚ฌ๋‹ค๋…€์˜ค์‹œ๋ฉด\nB: ์ด๋ฒคํŠธ๋„ ํด๋ฆฌ์–ด๋˜๊ณ  ์ž์›๋„ ๋นต๋นตํ•ด์ง‘๋‹ˆ๋‹ค~~~\nB: ์•ˆ๋…•ํ•˜์„ธ์š”~~~~\nB: ๊ถ๊ธˆํ•œ๊ฑฐ ๋ฌผ์–ด๋ณด์…”์š”~~~ ๋‹ค ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค ~~\nD: ์šฐ๋ฆฌ ์ „์ดˆ๊ธฐ์ง€ 3๋ถ„ ๋‚จ์•˜๋Š”๋ฐ.\nA: ์ •์ฐฐ์€ ์–ด๋–ป๊ฒŒ ๋ณด๋‚ด๋‚˜์š”\nD: ์ข€๋งŒ ๋” ์˜์ฐจ์˜์ฐจ ํ•ด์ฃผ์‹œ๋ฉด\nD: ๋‚จ์˜ ์„ฑ ์ฐ๊ณ  ์ •์ฐฐ ๋ฒ„ํŠผ ํด๋ฆญํ•˜๋‹ˆ๊นŒ ์ •์ฐฐ ๋ณด๋‚ด์กŒ์–ด์š”.","generated":"* E๊ฐ€ ์ –๊ณผ๊ฟ€์ด ํ๋ฅด๋Š” ์‹œํƒ€๋ธ์ด ๋˜๊ณ  ์‹ถ๋‹ค๋ฉด ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด ์žก์•„๋‹ฌ๋ผ๊ณ  ๋ถ€ํƒํ–ˆ์Šต๋‹ˆ๋‹ค.\n* B์ด ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ๋ฏธ์…˜ ๊ฑธ๋ฆฌ์‹  ๋ถ„ ๊ฐ™์ด ์žก์ž๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* B์ด ์ „์ดˆ๊ธฐ์ง€๋ฅผ ์ง“๊ณ  ๋“œ๋ž˜๊ณค ์•Œ์„ ์–ป์–ด๋ณด๊ฒ ๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* E๊ฐ€ ๊ธธ๋“œ ์ž์›์„ ๋ชจ์•„์•ผ ํ•œ๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* B ํ™œ๋™ ํฌ์ธํŠธ ์–ป๋Š” ๋ฒ•๊ณผ ํ™œ๋™ ํฌ์ธํŠธ ์‚ฌ์šฉ๋ฒ•์„ ๊ณต์œ ํ•˜์˜€์Šต๋‹ˆ๋‹ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
+ {"source":"D: ๋ณดํ˜ธ๋ง‰์žˆ๋Š” ์„ฑ ๋นผ๊ณ \nD: ๋‹ค๋ฅธ ์‚ฌ๋žŒ์ด ์ฑ„์ง‘ํ•˜๋Š” ๊ณณ๋„ ์ •์ฐฐ ๋˜๋˜๋ฐ์š”.\nC: ์ •์ฐฐ์€ ๋งˆ์„์ด๋‚˜ ํƒ€์ธ์˜ ์„ฑ์„ ๋ˆ„๋ฅด๋ฉด ์ •์ฐฐ ๋˜๋Š”๋ฐ์š”\nE: ์ •์ฐฐ์€ ์˜์ง€์— ์ •์ฐฐ ๋ณธ๋ถ€๋ฅผ ๊ฑด์„คํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค\nC: ShareCoord#2๋ ˆ๋ฒจ ํ”„๋กœ์ž” ๋งˆ์„#148500.000000#524397.000000\nC: ๋ฏธ์…˜๋–„๋•Œ๋ฌธ์ด์‹œ๋ฉด ์ด ๋งˆ์„์ด๋‚˜\nA: ์˜ค ํŒŒ๋ž‘์ƒˆ ๋‚ ์•„๊ฐ„๋‹ค\nC: ์˜ค ๋˜์…จ๋‚˜์š”\nA: ใ…Žใ…Ž ๋„ค ใ„ณใ„ณ\nD: ํŒŒ๋ž‘์ƒˆ...\nC: ์‚ฌ์‹ค๋ถ€์—‰์ด์ž…๋‹ˆ๋‹ค ใ…‹ใ…‹\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ๋”ฑ๋‹ค๊ตฌ๋ฆฌ๋ผ๊ณ  ํ•ด๋„ ๋ฏฟ์„ ๏ฟฝ๏ฟฝ๋‘ฅ์ด ๊ธธ์ด์ง€๋งŒ ๊ทธ๋ ‡๋‹ค๊ณ  ํ•˜๋‹ˆ ๋ฏฟ๊ฒ ์Šต๋‹ˆ๋‹ค\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ํŒ๊ฒŒ์•„. ๊ทธ ์‹ ๋น„์˜ ์„ธ๊ณ„\nC: ์ž…์ด ์‚์ฃฝ ๋‚˜์™”์–ด์š”\nC: ์‚์ง„๋“ฏ์š”\nA: ๋‹ค์‹  ์ •์ฐฐ ์•ˆ๋ณด๋‚ผ๊ฒŒ ใ„ท\nE: ใ…Žใ…Žใ…Žใ…Žใ…Žใ…Žใ…Žใ…Ž\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nD: ๋\nC: ๊ตฐ์ฃผ๋‹˜ ๊ณ„์‹ ๊ฐ€์š”\nC: ์ŠคํŠธ๋ฆฟ ๊ตฐ์ฃผ๋‹˜\nC: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nC: ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์˜ค์‹œ๋ฉด ๊ธ€๋ค๋ฃจํ†ค ๊ณต๊ฒฉ๋Œ€ ์ง‘๊ฒฐ์ด ํ›จ์”ฌ ์šฉ์ดํ•ฉ๋‹ˆ๋‹ค!!\nC: ์ด์ชฝ์œผ๋กœ ์˜ค์…”์šฉ~!\nC: ์ขŒํ‘œ ๊ทผ์ฒ˜ ๋•…์„ ๊พธ์šฑ ๋ˆ„๋ฅด์‹  ํ›„ ์„ฑ ์ด๋™์„ ๋ˆ„๋ฅด์‹œ๋ฉด ์„ฑ์ด ์ด์ „๋ฉ๋‹ˆ๋‹ค\nE: 1์‹œ์— ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ ๊ฐ‘๋‹ˆ๋‹ค.\nE: ํƒœ๊ทธ ๋‹ฌ์•˜์œผ๋‹ˆ ์œ„์น˜ ํ™•์ธํ•ด์ฃผ์„ธ์š” :)\nB: ์•ˆ๋…•ํ•˜์„ธ์š” ๊ฐ€์ž…์ธ์‚ฌ ๋“œ๋ฆฝ๋‹ˆ๋‹ค\nE: ์•ˆ๋…•ํ•˜์„ธ์š”! ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹น :)\nC: ์•ˆ๋…•ํ•˜์„ธ์š”~~~\nC: 1์‹œ ํŒŒ์ˆ˜๊พผ ํ™•์ธํ–ˆ์Šต๋‹ˆ๋‹ค\nE: ์˜ˆ์••\nF: ์•ˆ๋…•ํ•˜์„ธ์š” ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค\nC: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nE: ์–ด์„œ์˜ค์‹ญ์‡ผ!\nE: ์ –๊ณผ ๊ฟ€์ด ํ๋ฅด๋Š” ์‹œํƒ€๋ธ์ž…๋‹ˆ๋‹ค.\nE: ๊ธธ๋“œ ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์„ฑ ์ด๋™ ํ•ด์ฃผ์„ธ์š”! 
:)\nD: ๋„ค~~~~~~\nC: ShareCoord#[STOP]์ „์ดˆ๊ธฐ์ง€#-152190.000000#492921.000000\nC: ์šฐ๋ฆฌ๋„ ์š”๊ธฐ์ฒ˜๋Ÿผ ์‹ธ์šฐ๊ฒŒ ๋ ์ง€๋„ ๋ชฐ๋ผ์š” ใ…Žใ…Ž\nC: ๋ฌด๋ ฅ์„ ๋นจ๋ฆฌ ํ‚ค์›Œ๋‘ก์‹œ๋‹ค~!!\nC: ํ—!\nC: ์ปค๋น„๋‹˜ ํ˜น์‹œ ์‹œํƒ€๋ธ ๋ฒ„๋ฆฌ๊ณ  ๊ฐ€์‹œ๋‚˜์š”\nC: ShareCoord#[CITA]๋‚จ์˜์ปค๋น„#-155181.000000#523710.000000\nC: ๋“ค์ผฐ์œผ๋‹ˆ ๋ณต๊ท€ํ•˜์‹œ์ฃต\nC: ใ…Žใ…Žใ…Žใ…Žใ…Žใ…Žใ…Ž\nC: ํƒ์‚ฌ ์ค€๋น„ํ•˜์„ธ์š”\nC: ShareCoord#[INSD]๋ถˆ์•ˆ์ด#-148347.000000#488490.000000","generated":"* D์ด ๋‹ค๋ฅธ ์‚ฌ๋žŒ์ด ์ฑ„์ง‘ํ•˜๋Š” ๊ณณ๋„ ์ •์ฐฐ์ด ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.\n* C์ด ์‹ธ์šฐ๊ฒŒ ๋ ์ง€ ๋ชจ๋ฅด๋‹ˆ ๋ฌด๋ ฅ์„ ๋นจ๋ฆฌ ํ‚ค์›Œ๋‘์ž๊ณ  ์ œ์•ˆํ–ˆ์Šต๋‹ˆ๋‹ค.\n* C์ด ์ปค๋น„๋‹˜์—๊ฒŒ ์‹œํƒ€๋ธ ๋ฒ„๋ฆฌ๊ณ  ๊ฐ€์‹œ๋ƒ๋ฉฐ ๋“ค์ผฐ์œผ๋‹ˆ ๋ณต๊ท€ํ•˜๋ผ๊ณ  ์ œ์•ˆํ–ˆ์Šต๋‹ˆ๋‹ค.\n* C์ด ํƒ์‚ฌ๋ฅผ ์ค€๋น„ํ•˜์ž๊ณ  ํ•˜์˜€์Šต๋‹ˆ๋‹ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
streamlit_app_local/user_submit/12-02_14:29:30/2.jsonl ADDED
@@ -0,0 +1,6 @@
+ {"generated":"* D๊ธธ๋“œ์žฅ์ด ๊ธธ๋“œ์—ฐ๊ตฌ ํƒœ๊ทธ๋ฅผ ์ถ”์ฒœํ•˜๋ฉฐ, C๊ฐ€ ์ž…์žฅํ–ˆ์„ ๋•Œ ์œ„์น˜๋ฅผ ์•ˆ๋‚ดํ•ฉ๋‹ˆ๋‹ค.\n* A๊ฐ€ D๊ธธ๋“œ์žฅ์—๊ฒŒ ์ธ์‚ฌ๋ฅผ ๊ฑด๋„ค๊ณ , C๊ฐ€ ๊ธธ๋“œ๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.\n* D๊ธธ๋“œ์žฅ์€ ๊ธ€๋ฃจํ†ค ์žก๋Š”๊ฒƒ์„ ๋…๋ คํ•˜๋ฉฐ ์‹œํƒ€๋ธ ์ž์› ํ’๋ถ€ํ•ด์ง์„ ์•ˆ๋‚ดํ•ฉ๋‹ˆ๋‹ค.\n* B, D ๊ณต๊ฒฉ๋Œ€ ์ฐธ์—ฌ๋ฅผ ๋…๋ คํ•˜๊ณ  ํ€˜์ŠคํŠธ ์ง„ํ–‰์„ ๊ถŒ์žฅํ•˜๋ฉฐ ์˜์ฃผ ๋ช…์˜ˆ ๋ณด์ƒ์˜ ์กด์žฌ๋ฅผ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.\n* A๋Š” ๊ธ€๋ฃจํ†ค์— ๊ฐ€์†์ด ํ’๋ถ€ํ•˜๋‹ค๋Š” ์ ์„ ๊ฐ•์กฐํ•˜๊ณ , ๋ถ€๋Œ€ ์ด๋™์„ ์•ˆ๋‚ดํ•˜๋ฉฐ, ์›œ๋ณด์ด์™€ ๊ทธ์˜ ์นœ๊ตฌ๋ฅผ ๋งž์ดํ•ฉ๋‹ˆ๋‹ค.\n* D๊ธธ๋“œ์žฅ์€ 10๋ ˆ๋ฒจ ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ(๋žฉํ† ๋ฃจ์Šค)๋ฅผ ํ•จ๊ป˜ ์žก์„ ๊ฒƒ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","source":"D: ๊ธธ๋“œ์—ฐ๊ตฌ ์ถ”์ฒœ ๋งˆํฌ ํ–ˆ์œผ๋‹ˆ\nD: ๋‹ค๋“ค ๋ˆ„๋ฅด๊ฑฐ๋ผ\nC: ๊ธธ๋“œ์žฅ๋‹˜ ์œ„์น˜๊ฐ€ ์–ด๋””์—์š”?\nC: ์ฐพ์•˜์Œ...\nD: ๋„ค ๊ธธ๋“œ ํƒœ๊ทธ ๋ถ™์˜€์œผ๋‹ˆ๊นŒ\nD: ๊ทธ์ชฝ์œผ๋กœ ์„ฑ ์ด๋™ ํ•˜์‹œ๊ณ \nD: ๋ณ‘๋ ฅ ๋„ฃ์œผ๋ผ๊ณ  ๋‚ ๋ž˜๋‚ ๋ž˜\nA: ํ•˜์ดํ•˜์ด\nD: ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค ์–ด์„œ์˜ค์‹ญ์…”\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ์ด๋ฆฌ๋กœ ์ด๋™ํ•ด์„œ ์˜ค์„ธ์š”\nC: ๋„ค๋„ค~~~~\nA: ํ•จ๊ป˜ํ•ด์š” ~\nC: ์˜ฎ๊ฒผ์Šต๋‹ˆ๋‹ค.\nA: โ™ก\nD: ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด๋งŽ์ด ์žก์•„์ฃผ์…”์•ผ\nD: ์‹œํƒ€๋ธ์˜ ์ž์›์ด ๋„˜์ณ๋‚ฉ๋‹ˆ๋‹ค.\nA: ์ž„๋ชจํƒ„๋‹˜\nA: ๋งํˆฌ๊ฐ€ ๋„ˆ๋ฌด ๋‚˜๊ธ‹๋‚˜๊ธ‹ํ•˜์‹ญ๋‹ˆ๋‹ค\nA: ์ฒดํ†ต์„ ์ง€์ผœ์ฃผ์‹œ์˜ต์†Œ์„œ\nD: ๋„ˆ๋‚˜ ์ž˜ํ•ด\nA: ์ผ€์ผ˜\nD: ์‚ฌ๋žŒ ์ข€ ๋ชจ์ด๋ฉด\nD: ํŒŒ์ˆ˜๊พผ ์žก์•„์—ฌ\nB: ๊ณต๊ฒฉ๋Œ€ ์ฐธ์—ฌ ๋งŽ์ด ํ•ด์ฃผ์„ธ์š”~!\nD: ๊ณต๊ฒฉ๋Œ€ ์ฐธ์—ฌํ•ด์•ผ ์ •๋ง ์ž์› ๋„˜์น˜๋Š” ์‹œํƒ€๋ธ์ด ๋ฉ๋‹ˆ๋‹ค.\nD: ์—ฌ๋Ÿฌ๋ถ„ ํ€˜์ŠคํŠธํ•˜์‹œ๋ฉด์„œ ์ง„ํ–‰ํ•˜์‹œ๋ฉด ๋น ๋ฅธ ์„ฑ์žฅ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.\nD: ์—ฌ๋Ÿฌ๋ถ„ ์˜์ฃผ ๋ช…์˜ˆ ๋ณด์ƒ๋„ ๋ฐ›์œผ์„ธ์š”~\nD: ์˜์ฃผ ๋ช…์˜ˆ ๋ ˆ๋ฒจ 3 ๋˜์„œ ๋ณด์ƒ ๋ฐ›์œผ๋ฉด\nA: ๋„ต ์ž„๋ชจํƒ„๋‹˜!!\nD: ๊ฑด์„ค์Šฌ๋กฏ 2๊ฐœ!\nA: ๊ธ€๋ฃจํ†ค์— ๊ฐ€์†์ด ๊ทธ๋“๊ทธ๋“ 
๋“ค์—ˆ๋„ค์š”\nA: ํ•œ๋ถ€๋Œ€์”ฉ ๊ณต๊ฒฉ๋Œ€ ๊ฑธ์–ด์ฃผ์„ธ์š”\nA: ํ—ฌ๋กœ์šฐ ์›Œ๋ณด์ด\nD: ์–ด์„œ์˜ค์„ธ์š”~ :)\nA: ์นœ์ ˆํ•œ ๋ชจํƒ„์”จ\nD: ์˜ˆ์“ฐ\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ํ—ฌ๋กœ์šฐ ์›Œ๋ฝ€์ด\nD: ๊ธธ๋“œ ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์ด๋™ํ•ด ์˜ค์„ธ์š”\nA: ๊ธ€๋ฃจํ†ค ์ฐธ์—ฌํ•˜์‹œ๋ฉด\nA: ์‹œํƒ€๋ธ๋„ ํ’์กฑํ•ด์ง€๊ณ  ๊ฐ€์†๋„ ๋„˜์ณํ๋ฅด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค์š”\nA: 10๋ ™์งœ๋ฆฌ\nA: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ๊ฐ™์ด ์žก์œผ์‹ค๋ถ„\nA: ์—ฌ๊ธฐ๋กœ ๋ถ€๋Œ€ ์ด๋™ํ•ด์ฃผ์„ธ์š” ์ตœ๋Œ€ํ•œ\nA: ShareCoord#๋นˆ ๋•…#145926.609375#523510.437500\nD: ์–ด์„œ์˜ค์„ธ์š”\nA: ์ปค๋ชฌ์ปค๋ชฌ ์›Œ๋ณด์ด\nD: 10๋ ˆ[๋ฒจ ์ •์˜ˆ๋ชน ๊ฐ™์ด ์žก์•„์šฉ\nA: 10๋ ˆ๋ฒจ ๊ฐ•ํ™”๋žฉํ† ๋ฃจ์Šค ํ•œ๋ฒˆ ๋ชจ์—ฌ์„œ ์žก์•„๋ณผ๊นŒ์š”","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+ {"generated":"* A๋Š” ๊ธธ๋“œ๊ฐ€ ๊ฑด์„คํ•˜๋Š” ์‹œํƒ€๋ธ ๊ธฐ๋… ์ฒซ ๊ฑด์ถ•๋ฌผ์ธ ๊ธธ๋“œ ์š”์ƒˆ๋กœ ๋ถ€๋Œ€๋ฅผ ์ด๋™์‹œ์ผœ๋‹ฌ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค. \n* ์‹œํƒ€๋ธ ๊ฑด์„ค์„ ์ถ•ํ•˜ํ•˜๋ฉฐ ๋ชจ๋‘๊ฐ€ ์ฐธ์—ฌํ•˜๋„๋ก ์œ ๋„ํ•ฉ๋‹ˆ๋‹ค.\n* ๋‹ค์Œ์œผ๋กœ ๊ฒฝํ—˜์น˜ 3๋ฐฐ ์ด๋ฒคํŠธ๋ฅผ ์•Œ๋ ค์ฃผ๊ณ  \n * ์ฐธ์—ฌ ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•˜๋ฉฐ ๋” ์‰ฝ๊ฒŒ ์ดํ•ดํ•˜๋„๋ก ๋„์™€์ค๋‹ˆ๋‹ค.\n* ๋žŒ์ธต์ด ์ข€ ๋‚ฏ์„ค์–ด ๋ณด์ด๋Š” D์—๊ฒŒ ์ด๋ฒคํŠธ ์ง€์—ญ์œผ๋กœ ์ด๋™ํ•˜๋ ค๋ฉด ๊นƒ๋ฐœ ์˜†์œผ๋กœ ๋ถ€๋Œ€๋ฅผ ๋ณด๋‚ด๋„๋ก ์•ˆ๋‚ดํ•˜๊ณ , \n* ๊ฒฝํ—˜์น˜๋ฅผ ๋น ๋ฅด๊ฒŒ ์–ป๊ธฐ ์œ„ํ•ด ํŒŒ์ˆ˜๊พผ์„ ์žก์œผ๋ฉด ์ด๋ฒคํŠธ ํšจ์œจ์ด ๋†’์•„์ง„๋‹ค๊ณ  ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","source":"A: ๋ถ€๋Œ€ ๋‚จ์œผ์‹œ๋Š”๋ถ„\nA: 8282\nA: ์ž ์ณ๋ณผ๊ฒŒ์š”\nA: ํ™”์ดํŒ…\nA: ๋ฐœํ• ๋ผ๋กœ~!!!\nA: ๊ธฐ์–ตํ•ด์ค˜!!! ์ •์˜ˆ 10๋ ˆ๋ฒจ ์žก์•˜๋‹ค๊ตฌ!!!\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ๋‚จ๋Š” ๋ถ€๋Œ€ 1๊ฐœ์”ฉ\nA: ๊ธธ๋“œ ์š”์ƒˆ ๊ฑด์„ค์— ๋„ฃ์–ด์ฃผ์„ธ์š”!\nA: ์šฐ๋ฆฌ์˜ ์‹œํƒ€๋ธ์„ ๊ฑด์„คํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค!\nC: ์›ฐ์ปด ํˆฌ ์‹œํƒ€๋ธ\nA: ํ—ฌ๋กœ์šฐ ์›Œ๋ฝ€์ด\nC: ๋ชจ๋‘ 1๋ถ€๋Œ€๋Š” ๊ธธ๋“œ ์š”์ƒˆ์— ๋„ฃ์–ด๋‹ฌ๋ผ์ž…๋‹ˆ๋‹ค.\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nC: ์‹œํƒ€๋ธ์˜ ์˜ํ† ๊ฐ€ ์‹œ๊ธ‰ํ•˜๋‹ค์˜ˆ์š”\nA: ์ด๋ฆฌ๋กœ ์˜ค์„ธ์š” ์–ด์„œ~!\nB: ์šฐ๋ฆฌ ๊ธธ๋“œ์š”์ƒˆ ๋นจ๋ฆฌ ๊ฑด์„ค ํ• ๋ ค๋ฉด ๋ณ‘๋ ฅ์„ ๋” ๋งŽ์ด ๋„ฃ์–ด์•ผ ํ•˜๋‹ˆ๊นŒ ์–ด์„œ์–ด์„œ ์ง€์›ํ•ด์ฃผ์„ธ์š”~ ^^\nA: ๋นจ๊ฐ„๋‚ด๋ณต์ตœ๊ณ ~!!\nC: ์–ด์„œ์˜ค์„ธ์š”~\nC: ใ…Žใ…Žใ…Žใ…Ž\nA: ์šฐ์™€\nC: ์›Œํ„ฐ๋ณด์ด๋‹˜ ์–ด์„œ์˜ค์‹ญ์…”\nA: ์›Œ(ํ„ฐ)๋ณด์ด ์˜ค์…จ๋‹ค\nC: ๋‹ค๋“ค ๋‚ ๋ž˜๋‚ ๋ž˜ ์—ฐ๋งค์šฉ์ƒˆ์— ๊ธฐ์‚ฌ๋‹จ ํ•˜๋‚˜์”ฉ ๋„ฃ์œผ๋ผ์šฐ\nA: ์ž ์šฐ๋ฆฌ ์ด์ œ\nA: ๊ฒฝํ—˜์น˜ 3๋ฐฐ ์ด๋ฒคํŠธ\nA: ๋‹ฌ๋ ค์•ผํ•ฉ๋‹ˆ๋‹ค\nA: ํ† ์ŠคํŠธ๋‹˜ ์–ด์„œ์˜ค์„ธ์š”~ ใ…Žใ…Ž\nA: ShareCoord#๋นˆ ๋•…#155862.171875#524460.062500\nA: ์—ฌ๊ธฐ๋กœ ๋ณ‘๋ ฅ์ง‘๊ฒฐํ•ด์ฃผ์„ธ์š”\nA: ์˜ค๋ฅธ์ชฝ์— ๋ณ‘๋ ฅ์Šฌ๋กฏ ๋‘๋ฒˆ๋ˆ„๋ฅด๋ฉด\nA: ์ „์ฒด์„ ํƒํ•ด์„œ 
์—ฌ๋Ÿฌ ๊ธฐ์‚ฌ๋‹จ์„ ํ•œ๋ฒˆ์— ์šด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค\nA: ์ž์ž ์š”์ƒˆ ์˜ค๋ฅธ์ชฝ์—\nA: ๋ชจ์ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค\nA: ๊ฒฝํ—˜์น˜ 3๋ฐฐ ์ด๋ฒคํŠธ\nD: ๊ณ„์† ์ด๋™ํ•  ์ˆ˜ ์—†๋Š” ์œ„์น˜๋ผ๊ณ  ํ•˜๋Š”๋ฐ ์–ด๋–ป๊ฒŒ ํ•˜๋ฉด ๋˜๋‚˜์š”?\nA: ์ฐธ์—ฌํ•ด์„œ ๊ธฐ์‚ฌ ๋ ˆ๋ฒจ ์ญ‰์ญ‰ ์˜ฌ๋ฆฌ์…”์š”\nA: ๊ทธ ์„ฑ์„ ์ด๋ฆฌ์ €๋ฆฌ์˜ฎ๊ธฐ๋ฉด\nA: ์ดˆ๋ก์ƒ‰์œผ๋กœ ์„ฑ ๋ฐ”๋‹ฅ์ด ๋ฐ”๋€Œ๋Š”๋ฐ\nA: ๊ทธ๋•Œ ์ด๋™๋ˆ„๋ฅด์‹œ๋ฉด ๋ฉ๋‹ˆ๋‹ค\nA: ์ž˜ ์•ˆ๋˜๋ฉด 2~3๋ฒˆ์ •๋„ ํŠธ๋ผ์ดํ•ด์ฃผ์‹œ๋ฉด ์ด๋™๋ ๊ฑฐ์—์š”\nA: ShareCoord#๋นˆ ๋•…#156695.468750#525127.875000\nD: ์•„...๊นƒ๋ฐœ์ด ๋– ์žˆ์–ด์„œ ๋ˆŒ๋Ÿฌ๋ดค๋Š”๋ฐ ๋ชป๊ฐ„๋‹ค๊ณ  ํ•ด์„œ ใ…Žใ…Ž;\nA: ์•„~~\nA: ๊นƒ๋ฐœ๋– ์žˆ๋Š”๊ณณ์œผ๋กœ ๊ธฐ์‚ฌ๋‹จ(๋ถ€๋Œ€)๋Š” ์ด๋™ํ•  ์ˆ˜ ์žˆ์–ด์š”!\nA: ๋ถ€๋Œ€ ์—ฌ๋Ÿฌ๊ฐœ ๋ณด๋‚ด์„œ ์š”๊ธฐ ์˜†์— ํŒŒ์ˆ˜๊พผ ์žก์œผ๋ฉด\nA: ๊ฒฝํ—˜์น˜ ํญํƒ„ ํŒกํŒก ํ„ฐ์ง‘๋‹ˆ๋‹ค\nA: ์–ด์„œ์˜ค์„ธ์š”\nD: ์žฅ์• ๋ฌผ์ด ์žˆ์–ด์„œ ๋ชป๊ฐ„๋‹ค๊ณ  ํ•˜๋„ค์š”..\nA: ShareCoord#๋นˆ ๋•…#156578.359375#523799.718750","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+ {"generated":"* A๋Š” ๊ธธ๋“œ์—์„œ ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ ์ด๋ฒคํŠธ๋ฅผ 10์‹œ 10๋ถ„์— ์‹œ์ž‘ํ•œ๋‹ค๊ณ  ์•Œ๋ฆฌ๊ณ  ์‚ฌ๋ƒฅ ์žฅ์†Œ ์œ„์น˜ ์ •๋ณด๋ฅผ ๊ณต์œ ํ•ฉ๋‹ˆ๋‹ค. \n* ์ดˆ๋ณด์ž๋ฅผ ์œ„ํ•œ ์•ˆ๋‚ด๋กœ, ๊ธธ๋“œ ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์ด๋™ํ•œ ํ›„ ๋ถ€๋Œ€๋ฅผ ์ด๋™์‹œ์ผœ ๋ชฌ์Šคํ„ฐ ์œ„์— ์ปค์„œ๋ฅผ ๋†“๊ณ  ๊ณต๊ฒฉํ•˜๋Š” ์ „๋žต์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ๊ณต๋™์ „ํˆฌ์— ์ฐธ์—ฌํ•˜๋ฉด ๊ฒฝํ—˜์น˜๊ฐ€ ๊ท ๋“ฑํ•˜๊ฒŒ ๋‚˜๋ˆ ์ง‘๋‹ˆ๋‹ค.\n* A๋Š” ํ”Œ๋ ˆ์ด์–ด๋“ค์ด ๋ถ€๋Œ€๋ฅผ ์šด์šฉํ•˜๊ณ  ํŒŒ์ˆ˜๊พผ์„ ๊ณต๊ฒฉํ•  ์ค€๋น„๋ฅผ ํ•˜๋„๋ก ์ง€์‹œํ•ฉ๋‹ˆ๋‹ค.\n* ํŒŒ์ˆ˜๊พผ ์œ„์น˜๋ฅผ ๊ณ„์† ๊ณต์œ ํ•˜๋ฉฐ ๋ถ€๋Œ€๊ฐ€ ์—†์–ด์ง€๋ฉด ํšŒ์ˆ˜ ํ›„ ๋‹ค์‹œ ์ง„ํ–‰ํ•œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","source":"A: 10์‹œ 10๋ถ„์—\nA: ํŒŒ์ˆ˜๊พผ์‚ฌ๋ƒฅ ์‹œ์ž‘ํ• ๊ฒŒ์š”\nA: ํŒŒ์ˆ˜๊พผ์— ๊ฒฝํ—˜์น˜ ๋งŽ์ด ๋“ค์–ด์žˆ์Šต๋‹ˆ๋‹ค\nC: 10์‹œ 10๋ถ„์— ํŒŒ์ˆ˜๊พผ ์žก์Šต๋‹ˆ๋‹ค.\nC: ๋‹ค๋“ค ์˜ค์„ธ์š”\nE: ์–ด๋””์ธ๊ฐ€์š”..?\nA: ShareCoord#๋นˆ ๋•…#156509.859375#524728.187500\nA: ์š”๊ธฐ๊ฐ€ ๋ถ€๋Œ€ ์ง‘๊ฒฐ์ง€๊ณ ์š”!\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ๊ธธ๋“œ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์„ฑ ์ด๋™ํ•˜์‹  ํ›„์— ๋ถ€๋Œ€ ์˜ฎ๊ธฐ์‹œ๋ฉด ํŽธํ•ฉ๋‹ˆ๋‹ค\nD: ๊ณต๊ฒฉ๋Œ€๋ฅผ ๊ทธ๋ƒฅ ๋ณด๋‚ด๋ฉด ๋˜๋‚˜์š”?\nA: ๊ธฐ์‚ฌ๋‹จ์„ ๋ณด๋‚ด๋‘์…จ๋‹ค๊ฐ€\nA: ์‹œ์ž‘ํ•˜๋ฉด ๋ถ€๋Œ€๋ฅผ ๋Œ์–ด์„œ ๋ชฌ์Šคํ„ฐ ์œ„์— ์ปค์„œ๋ฅผ ์˜ฌ๋ ค๋‘๋ฉด\nA: ๋‹ฌ๋ ค๊ฐ€์„œ ๊ณต๊ฒฉํ•˜๋Š”๋ฐ์š”\nA: ํ•œํƒ€์ด๋ฐ์— ๊ฐ™์ด ์žก์œผ๋ฉด ์ „ํˆฌ์— ์ฐธ์—ฌํ•œ ์‚ฌ๋žŒํ•œํ…Œ ๋ณด์ƒ์„ ๊ท ๋“ฑํ•˜๊ฒŒ ์ค˜์„œ\nA: ๋‹ค๊ฐ™์ด ๋ชฐ์•„์žก๋Š”๊ฒŒ ์ด๋“์ž…๋‹ˆ๋‹ค\nA: 10๋ถ„๋˜์–ด์„œ ํŒŒ์ˆ˜๊พผ ํ•œ๋ฒˆ ๋‹ฌ๋ ค๋ณผ๊ฒŒ์š”~!\nA: ๋ถ€๋Œ€ ์ตœ๋Œ€ํ•œ ๊บผ๋‚ด์„œ ๊ฐ€์ ธ์˜ค์‹  ํ›„์—\nA: ์šฐ์ธก ๋ถ€๋Œ€ ์Šฌ๋กฏ ๋ฒ„ํŠผ ๋”๋ธ”ํด๋ฆญํ•ด์„œ\nA: 3~5๋ถ€๋Œ€ ์šด์šฉํ•˜์…”์š”~!\nA: ์ € ๋ณด์‹œ๋ฉด 5๋ถ€๋Œ€ ์ „๋ถ€\nA: ํ•œ๋ฐฉ์— ์šด์šฉํ•ฉ๋‹ˆ๋‹ค\nA: ์ž์ž ์น ๊ฒŒ์š” ํŒŒ์ˆ˜๊พผ!!\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#155425.500000#527495.250000\nC: ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.\nA: ๊ณต๊ฒฉ๊ณต๊ฒฉ\nA: 
ShareCoord#ํŒŒ์ˆ˜๊พผ#159173.750000#528246.125000\nA: ๋ฉ‹์Ÿ์ด๋‹˜ ์ปค๋น„๋‹˜\nA: ๋ถ€๋Œ€ ์›€์ง์—ฌ์„œ ํŒŒ์ˆ˜๊พผ ๊ณต๊ฒฉํ•˜์…”์š”~~!\nB: ๊ฐ™์ด ๊ฐ€์œ ~~~ ํŒŒ์ˆ˜๊พผ\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#160192.187500#525461.625000\nA: ์–ด์„œ์˜ค์„ธ์š” ~~\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ์„ฑ ์ด๋™ํ•˜์‹œ๊ตฌ\nA: ํŒŒ์ˆ˜๊พผ์žก์œผ๋Ÿฌ ์ฐธ์—ฌํ•˜์…”์š” ์š”์ชฝ์œผ๋ฃจ '\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#165078.687500#525488.375000\nA: ๊ฐ™์ด ์žก์•„์•ผ ๋ณ‘๋ ฅ๋„ ๋œ์ฃฝ๊ณ  ๊ฒฝํ—˜์น˜๋„ ๊ฐ™์ด๋จน์–ด์š”~!\nA: ์–ด์„œ์˜ค์„ธ์š”์˜ค~~~~\nB: ์–ด์„œ์˜ค์„ธ์š”~\nA: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nA: ์„ฑ ์ด๋™ ์ขŒํ‘œ\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#164864.437500#527752.750000\nC: ์–ด์„œ์˜ค์„ธ์š”~ :)\nA: ๋ชฌ์Šคํ„ฐ ์ขŒํ‘œ\nA: ๊ณ„์† ํŒŒ์ˆ˜๊พผ ์ง„ํ–‰ ์ค‘์ž…๋‹ˆ๋”~!!\nA: ์–ด์„œ์™€์„œ ์žก์œผ์…”์šฉ~~\nA: ๋ถ€๋Œ€ ํšŒ๊ตฐํ•ด์„œ\nA: ์ •๋น„ ํ•œ๋ฒ„ ใ„ดํ•˜๊ณ \nA: ๋‹ค์‹œ ๋‹ฌ๋ฆด๊ฒŒ์š”\nA: ๋‹ค๋“ค ๋ถ€๋Œ€ ์ฒด๋ ฅ์ด ๊ฐ„๋‹น๊ฐ„๋‹นํ•˜์‹ค๋“ฏํ•˜์—ฌ..","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+ {"generated":"* A๋Š” ๋”œ๋ ˆ์ด ํŽ˜์ด์ง€ ์นด์šดํŠธ๋‹ค์šด์„ ์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ํ”Œ๋ ˆ์ด์–ด๋“ค์„ ๋ชจ์•„๋†“๊ณ  ์ค€๋น„ ์ƒํ™ฉ์„ ์•ˆ๋‚ดํ•ฉ๋‹ˆ๋‹ค. \n* ํŒŒ์ˆ˜๊พผ๊ณผ ์—ํ…Œ๋ฅด ์›์„ ์ถœํ˜„ ์œ„์น˜๋ฅผ ๊ณต์œ ํ•˜๊ณ , \nํŒŒ์ˆ˜๊พผ์„ ์žก๊ณ  ์—ํ…Œ๋ฅด ์›์„์„ ํ™œ์šฉํ•˜๋ฉด ์‹œ๊ฐ„ ์ œํ•œ ๋ฒ„ํ”„๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Œ์„ ์•Œ๋ฆฝ๋‹ˆ๋‹ค.\n* '๋„์›€' & '๊ฐ€์†' ๋„๊ตฌ๋ฅผ ์ ๊ทน์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋ผ๋Š” ์กฐ์–ธ๋„ ๋ง๋ถ™์ด๊ณ , C๊ฐ€ ์‹œํƒ€๋ธ ๊ฑด์„ค์ด 3๋ถ„ ๋‚จ์•˜๋‹ค๊ณ  ์•Œ๋ฆฌ๋ฉฐ ํ”Œ๋ ˆ์ด์–ด๊ฐ€ ์กฐ๊ธˆ์”ฉ ์—ญํ• ์„ ๋งˆ์น˜๊ณ  ์ƒํ™ฉ์ด ์ง„ํ–‰๋˜๋Š” ๋ชจ์Šต์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.\n* ํ”Œ๋ ˆ์ด์–ด๋“ค์€ ์„œ๋กœ ์‘์›ํ•˜๊ณ , ๊ฒŒ์ž„ ์ง„ํ–‰ ์ƒํ™ฉ์„ ๋‚˜๋ˆ„๋ฉฐ ๊ฐˆ ๊ธธ์„ ๊ณ„ํšํ•ฉ๋‹ˆ๋‹ค.","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","source":"C: ์š”๊ฑฐ ์žก๊ณ  ์Šคํƒ‘ ์ž ์‹œ ์ •๋น„ํ•˜๊ณ  ์˜ค๊ฒ ์Šต๋‹ˆ๋‹ค.\nA: ShareCoord#๋นˆ ๋•…#159195.468750#530362.062500\nA: ์—ฌ๊ธฐ ์ง‘๊ฒฐํ• ๊ฒŒ์š”!!!\nA: 18๋ถ„์— ์Šคํƒ€ํŠธ ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค\nA: ๋ถ€๋Œ€ ํšŒ๊ตฐํ–ˆ๋‹ค๊ฐ€ ๋‹ค์ค‘ ํŽธ์„ฑ์œผ๋กœ ์ „๋ถ€ ๋Œ๊ณ  ๋‚˜์˜ค์…”์šฉ~!\nA: ๋‹ค๊ฐ™์ด ์žก์•„์•ผ ๊ฒฝํ—˜์น˜ ํ•จ๊ป˜๋จน๊ณ  ํญ์—… ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค\nA: ์–ด์„œ์–ด์„œ~!! ์›Œ๋ฝ€์ด๋“ค ๋นจ๋ฆฌ ๊ฐ•ํ•ด์ ธ์•ผ ๋‚˜์ค‘์— ์‹ธ์šธ๋•Œ ์žฌ๋ฐŒ์Šต๋‹ˆ๋‹ค~! 
ใ…Žใ…Žใ…Ž\nA: ShareCoord#๋นˆ ๋•…#159809.593750#529345.625000\nA: ์ด๋™์‹œ๊ฐ„๋•Œ๋ฌธ์— 20๋ถ„์— ๋ฐ”๋กœ ์‹œ์ž‘ํ• ๊ฒŒ์š”\nA: 10์‹œ 20๋ถ„์—” ๊ทธ๋ƒฅ ๋ฐ”๋กœ ์Šคํƒ€ํŠธํ•ฉ๋‹ˆ๋‹ต\nA: ์™€์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค ใ…Žใ…Ž ํ—คํ—ค\nA: ์ž ์น ๊ฒŒ์š”\nA: ๋‚จ์€๊ฑฐ ๋‹ค์žก๊ฒ ์Šต๋‹ˆ๋‹ค\nD: ๊ฐ€๋‚˜์š”?\nA: gogo!!!!\nA: ShareCoord#๋นˆ ๋•…#159518.640625#532655.562500\nA: ShareCoord#ํŒŒ์ˆ˜๊พผ#157515.046875#534659.812500\nA: gogo!!!\nA: ShareCoord#์—ํ…Œ๋ฅด ์›์„#160185.000000#531312.000000\nA: ํŒŒ์ˆ˜๊พผ์žก๊ณ ๋–จ์–ด์ง„\nA: ์—ํ…Œ๋ฅด ์›์„ ์ด๊ฑฐ ์‹œ๊ฐ„์ œ ๋ฒ„ํ”„๊ฐ™์œผ๋‹ˆ๊นŒ\nB: ๊ฑด๋ฌผ ์ง“๊ฑฐ๋‚˜ ๋ณ‘์‚ฌ ๋ฝ‘์„ ๋•Œ ์†๋ชจ์–‘ ๋„์›€ ๋ˆ„๋ฅด๋ฉด ์‹œ๊ฐ„์ด ๋นจ๋ผ์ง‘๋‹ˆ๋‹ค.\nA: ํ•„์š”ํ•˜์‹ ๊ฑฐ ๋น ๋ฅด๊ฒŒ ์ฑ„์ง‘ํ•ด์„œ ๋ฒ„ํ”„๋ฐ›์œผ์„ธ์š”\nB: ์ ๊ทน์ ์œผ๋กœ ํ™œ์šฉํ•ด์ฃผ์„ธ์š”\nC: ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ ์ˆ˜๊ณ ํ•˜์…จ์Šต๋‹ˆ๋‹ค.\nC: ์—ํ…Œ๋ฅด ์›์„ ํ™œ์šฉํ•ด์ฃผ์„ธ์š”! ์ธํ„ฐ๋ ‰์…˜ํ•˜๋ฉด ๋ฒ„ํ”„ ๋ฐ›์Šต๋‹ˆ๋‹ค.\nC: ๋‹ค์‹œ ๊ธ€๋ฃจํ†ค ์‚ฌ๋ƒฅ ๊ฐ‘์‹œ๋‹น\nC: ๊ธธ๋“œ ์—ฐ๊ตฌ๋„ ์‹œ์ž‘ํ–ˆ์œผ๋‹ˆ\nC: ๊ธฐ๋ถ€ ๋งŽ์ด ํ•ด์ฃผ์„ธ์š”\nC: ๊ธฐ๋ถ€๋ฅผ ํ•˜๋ฉด ๊ฐœ์ธ ํฌ์ธํŠธ๋„ ์–ป์œผ๋‹ˆ๊นŒ ์ƒ๋ถ€์ƒ์กฐ\nA: ์˜คํ›„์ฏค์ด๋ฉด ์Ÿํ•œ๋ฒˆ ํ•˜์ง€ ์‹ถ๋„ค์š”\nC: ๋‹ค๋“ค ๋ฐœํ• ๋ผ ๊ฐ€์ž\nA: ์„ฑ์žฅ ํ˜ธ๋‹ค๋‹ฅ ํ•ด์ฃผ์‹œ๊ตฌ ๊ถ๊ธˆํ•œ๊ฑฐ ๋ฌผ์–ด๋ด์ฃผ์„ธ์š”\nA: ๋„์›€ ํŒŒํŒŒํŒ ๋ˆŒ๋Ÿฌ์ฃผ์‹œ๊ตฌ์šฅ~!\nA: ๊ฐ•ํ•œ ์›Œ๋ฝ€์ด๊ฐ€ ๋˜์ž!!!!!!!\nB: ๊ธธ๋“œ ์š”์ƒˆ 60% ๋„˜์—ˆ๋„ค์š”.\nB: ์˜์ฐจ ์˜์ฐจ ์˜์ฐจ~\nA: ์–ด์šฐ ๋‹ฌ์—‰ ๋„์›€ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค\nC: ใ…Žใ…Žใ…Žใ…Ž\nD: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ๋„์›€๋„์›€~!!! ์ตœ๊ณ ~!!!!!!\nA: ๊ฐ€์† ์•„๋ผ์ง€ ๋งˆ์‹œ๊ตฌ ํŒํŒ์“ฐ์„ธ์š”\nA: ๋งˆ์ง€๋ง‰๋‚ ๊นŒ์ง€ ๊ฐ€์† ๋‹ค ์“ฐ์ง€๋„ ๋ชปํ•ฉ๋‹ˆ๋‹ค ใ„ทใ„ท\nA: ๋„์›€-๊ฐ€์†-๋„์›€-๊ฐ€์†\nA: ์˜-์ฐจ-์˜-์ฐจ\nC: (ํก์กฑ)\nC: ์‹œํƒ€๋ธ ์š”์ƒˆ ์™„์„ฑ๊นŒ์ง€ 3๋ถ„๋‚จ์•˜์Šต๋‹ˆ๋‹ค.\nA: ์›Œ๋ฝ€์ด ์—ฌ๋Ÿฌ๋ถ„๋“ค\nA: ๊ถ๊ธˆํ•˜๊ฑฐ๋‚˜ ๋ง‰ํžˆ๋Š” ๊ฒƒ๋“ค ๋ง์”€์ฃผ์‹œ๋ฉด\nA: ๋‹ค ์•Œ๋ ค๋“œ๋ฆฌ๊ณ  ๋‹ค ๋„์™€๋“œ๋ฆฝ๋‹ˆ๋‹ค","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+ {"generated":"* E๋Š” ๊ธธ๋“œ ์ž์›์ด ๋ถ€์กฑํ•˜๋‹ค๋Š” ์ด์œ ๋กœ ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด ์žก์•„๋‹ฌ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค. \n* B๋Š” ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ๋ฏธ์…˜์„ ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ ๋ช‡๋ช‡ ์‚ฌ๋žŒ๋“ค์„ ๋ชจ์ง‘ํ•ฉ๋‹ˆ๋‹ค. B๋Š” ๋‹ค๋ฅธ ํ™œ๋™์„ ์•ˆ๋‚ดํ•˜๋ฉฐ F์™€ ํ•จ๊ป˜ ๋ฏธ์…˜์„ ์™„๋ฃŒํ•ฉ๋‹ˆ๋‹ค.\n* C๋Š” ๊ธธ๋“œ ์—ฐ๊ตฌ ๊ธฐ๋ถ€๋ฅผ ๋ถ€ํƒํ•˜๋ฉฐ ์ „์ดˆ๊ธฐ์ง€ ๊ฑด์„ค์„ ์ง€์›ํ•˜๊ธฐ๋ฅผ ์š”์ฒญํ•ฉ๋‹ˆ๋‹ค. B๋Š” C์˜ ์š”์ฒญ์„ ๋ฐ›์•„๋“ค์—ฌ ์ „์ดˆ๊ธฐ์ง€ ๊ฑด์„ค์— ๊ธฐ์‚ฌ๋‹จ์„ ๋ฐฐ์น˜ํ•˜๊ณ  2๋ ˆ๋ฒจ ์ด์ƒ์˜ ๋ถ€๋Œ€๋“ค์„ ์ฑ„์ง‘์ง€๋กœ ๋ณด๋‚ด ์ž์› ํ™•๋ณด๋ฅผ ์œ„ํ•œ ๊ณ„ํš์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.\n* D๋Š” ์ „์ดˆ๊ธฐ์ง€๊ฐ€ ๊ณง ์™„์„ฑ์ด ๋  ๊ฒƒ์ด๋ผ๊ณ  ๋งํ•˜๋ฉฐ, ๋‹ค๋ฅธ ์„ฑ์˜ ์ •์ฐฐ์„ ์–ด๋–ป๊ฒŒ ๋ณด๋‚ธ์ง€์— ๋Œ€ํ•ด A์—๊ฒŒ ๋ฌธ์˜ํ•ฉ๋‹ˆ๋‹ค.","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","source":"E: ์—ฌ๋Ÿฌ๋ถ„!\nE: ๊ธ€๋ฃจํ†ค ๋งŽ์ด ์žก์•„์ฃผ์„ธ์š”!\nE: ๊ธธ๋“œ ์ž์›์ด ๋ถ€ ์กฑํ•ฉ๋‹ˆ๋‹ค.\nE: ์ –๊ณผ๊ฟ€์ด ํ๋ฅด๋Š” ์‹œํƒ€๋ธ์ด ๋˜๊ณ  ์‹ถ๋‹ค?\nE: ๊ธ€๋ฃจํ†ค์„ ๋งŽ์ด ์žก์•„์ฃผ์‹ญ์‡ผ\nG: ๊ฐ€๋‚œํ•œ ์‹œํƒ€๋ธ์€ ์ž‡์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค\nE: ํ”Œ๋ฆฌ์ฆˆ\nB: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ๋ฏธ์…˜ ๊ฑธ๋ฆฌ์‹ ๋ถ„?\nB: ๊ฐ™์ด ์žก์œผ์‹ค๋ถ„ ๊ณ„์‹ ๊ฐ€์š” ~\nF: ์ขŒํ‘œ์ฃผ์„ธ์š”\nB: ShareCoord#๋นˆ ๋•…#155225.781250#534257.875000\nB: ShareCoord#๋นˆ ๋•…#154322.968750#533640.812500\nB: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ ์žก์œผ์‹ค๋ถ„ ๊ตฌํ•ด์š”~~~\nB: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nB: ์ถœ๋ฐœํ•ฉ๋‹ˆ๋‹ค\nB: ๊ฐ•ํ™”๋ชฌ์Šคํ„ฐ~!\nB: ๋‚˜์ด์Šค์ปค๋น„๋‹˜\nF: ใ…Žใ…Ž\nB: ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ท\nF: ๊ตฟ๊ตฟ\nB: ๋„ˆ๋ฅธ ๋ฒŒํŒ์„ ๋›ฐ๋†€๊ณ  ์‹ถ์€\nC: ๊ธธ๋“œ ์—ฐ๊ตฌ ๊ธฐ๋ถ€ ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค.\nB: ์›Œ๋ณด์ด์˜ ๋งˆ์Œ์„ ์•„๋Š”๊ฐ€\nC: ๊ธธ๋“œ ๋ณด์‹œ๋ฉด ๊ธธ๋“œ ์—ฐ๊ตฌ ์žˆ์–ด์š”~\nB: ShareCoord#1๋ ˆ๋ฒจ ๋“œ๋ž˜๊ณค ๋ ˆ์–ด#138900.000000#535800.000000\nB: ์ „์ดˆ๊ธฐ์ง€ ์ง“๊ณ  ๋“œ๋ž˜๊ณค ์•Œ ํ•œ๋ฒˆ ์–ป์–ด๋ณผ๊ฒŒ์š”~~\nB: ์•Œ์„ ์–ป๊ณ  ์ข€ ํ‚ค์›Œ์•ผ ์šฉ์“ธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค~~~!!\nC: ์ „์ดˆ๊ธฐ์ง€ ๊ฑด์„ค์— ๊ธฐ์‚ฌ๋‹จ 
1๊ฐœ์”ฉ ์ง€์› ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค\nE: ๊ธ€๋ฃจํ†ค๋„ ๋งŽ์ด ์žก์•„์ฃผ์„ธ์š”\nE: ๊ธธ๋“œ ์ž์›์„ ๋ชจ์•„์•ผ ํ•ฉ๋‹ˆ๋‹น\nB: ๋„ต ์ž„๋ชจํƒ„๋‹˜\nB: ์ถฉ์„ฑ์ถฉ์„ฑ์ถฉ์„ฑ XXXXXXXXXXX\nE: ๊ฐ€์ฆˆ์•„ ๋ฐœํ• ๋ผ!\nB: ํ™œ๋™ ํฌ์ธํŠธ ์–ป๋Š” ๋ฒ•\nB: 1. ๊ธธ๋“œ > ์—ฐ๊ตฌ > ๊ธธ๋“œ ์—ฐ๊ตฌ ๊ธฐ๋ถ€(์ถ”์ฒœ ์—ฐ๊ตฌ)\nB: 2. ๊ธธ๋“œ ๋ณธ๋ถ€ > ๋„์›€\nB: ํ™œ๋™ ํฌ์ธํŠธ ์‚ฌ์šฉ๋ฒ•\nB: 1. ๊ธธ๋“œ > ๊ธธ๋“œ ์ƒ์  > ์ƒํ’ˆ ๊ตฌ๋งค(๊ธฐ์‚ฌ ๊ฒฝํ—˜์น˜ ๊ฟ€)\nB: ๊ธฐ์‚ฌ๋ ™์—…๋นจ๋ฆฌํ•ด์„œ ๊ดด์Žˆ ๋ชฌ์Šคํ„ฐ ๋นจ๋ฆฌ์žก๊ณ  ๊ณ ์„ฑ์žฅํ•ฉ์„ธ๋‹ค~!!\nB: ๋„์›€ ๊ณ ๋ง™์Šต๋‹ˆ๋‹ค ํ—คํ—ค\nB: ๊ณง ์‹์‚ฌ์‹œ๊ฐ„์ผํ…๋ฐ์š”\nB: ์ „์ดˆ๊ธฐ์ง€์— ๊ธฐ์‚ฌ๋‹จ 1๊ฐœ ๋„ฃ์–ด์ฃผ์‹œ๊ตฌ\nB: ๋‚˜๋จธ์ง€ ๋ถ€๋Œ€๋Š” 2๋ ˆ๋ฒจ ์ด์ƒ ์ฑ„์ง‘์ง€์— ๋„ฃ์–ด๋‘์‹  ํ›„ ์‹์‚ฌ๋‹ค๋…€์˜ค์‹œ๋ฉด\nB: ์ด๋ฒคํŠธ๋„ ํด๋ฆฌ์–ด๋˜๊ณ  ์ž์›๋„ ๋นต๋นตํ•ด์ง‘๋‹ˆ๋‹ค~~~\nB: ์•ˆ๋…•ํ•˜์„ธ์š”~~~~\nB: ๊ถ๊ธˆํ•œ๊ฑฐ ๋ฌผ์–ด๋ณด์…”์š”~~~ ๋‹ค ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค ~~\nD: ์šฐ๋ฆฌ ์ „์ดˆ๊ธฐ์ง€ 3๋ถ„ ๋‚จ์•˜๋Š”๋ฐ.\nA: ์ •์ฐฐ์€ ์–ด๋–ป๊ฒŒ ๋ณด๋‚ด๋‚˜์š”\nD: ์ข€๋งŒ ๋” ์˜์ฐจ์˜์ฐจ ํ•ด์ฃผ์‹œ๋ฉด\nD: ๋‚จ์˜ ์„ฑ ์ฐ๊ณ  ์ •์ฐฐ ๋ฒ„ํŠผ ํด๋ฆญํ•˜๋‹ˆ๊นŒ ์ •์ฐฐ ๋ณด๋‚ด์กŒ์–ด์š”.","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+ {"generated":"* ์—ฌ๋Ÿฌ ์‚ฌ๋žŒ์ด ํ•จ๊ป˜ ๊ฒŒ์ž„์„ ํ•˜๊ณ  ์žˆ๋‹ค. \n* A๊ฐ€ ์ด์Šค๋ผ์—˜์— ์™€์„œ ์ด๋ฒคํŠธ์— ์ฐธ์—ฌํ•˜๋ผ๋Š” ๋ถ€ํƒ์„ ๋ฐ›๊ณ , ๊ทธ ํ›„ ๋‹ค๋ฅธ ํ”Œ๋ ˆ์ด์–ด๋“ค๊ณผ ๋Œ€ํ™”๋ฅผ ๋‚˜๋ˆˆ๋‹ค.\n* C๋Š” ๊ฒŒ์ž„ ๋‚ด ๋งˆ์„์˜ ์œ„์น˜ ์ •๋ณด๋ฅผ ๊ณต์œ ํ•˜๊ณ , ๋งˆ์„์„ ๋ฐฉ๋ฌธํ•˜๋ผ๊ณ  ๋‹ค๋ฅธ ํ”Œ๋ ˆ์ด์–ด D์—๊ฒŒ ๊ถŒ์œ ํ•œ๋‹ค. \n* E๋Š” 1์‹œ์— ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ์„ ํ•œ๋‹ค๊ณ  ์•Œ๋ฆฌ๊ณ  ์ขŒํ‘œ๋ฅผ ๊ณต์œ ํ•œ๋‹ค. \n* ์ƒˆ๋กœ์šด ํ”Œ๋ ˆ์ด์–ด B์™€ F๊ฐ€ ๊ฐ€์ž…ํ•˜๋ฉด C์™€ E๋Š” ํ™˜์˜ํ•œ๋‹ค.\n* C๋Š” ํ”Œ๋ ˆ์ด์–ด D๊ฐ€ ๋‹ค๋ฅธ ์ง€์—ญ์œผ๋กœ ์ด๋™ํ–ˆ๋Š”๋ฐ, ์ด์— ๋Œ€ํ•ด ๋†๋‹ด์„ ๋‚˜๋ˆˆ๋‹ค.\n* ๋งˆ์ง€๋ง‰์œผ๋กœ C๋Š” ๋‚จ์˜์ปค๋น„(ํ˜น์€ ์ปค๋น„?)๋ผ๋Š” ์ƒˆ๋กœ์šด ํด๋ฆฌ๊ณค์˜ ์œ„์น˜๋ฅผ ๊ณต๊ฐœํ•˜๊ณ  ์‹ ๋น„๋กœ์šด ํƒ์‚ฌ๋ฅผ ์ค€๋น„ํ•˜์ž๊ณ  ์ œ์•ˆํ•œ๋‹ค.","instruction":"์ฃผ์–ด์ง„ ์ฑ„ํŒ…์— ๋Œ€ํ•œ ์š”์•ฝ์„ ์ž‘์„ฑํ•ด์ค˜. ์‚ฌ๋žŒ๋“ค์˜ ๋ฐœํ™”๋ฅผ ํ†ตํ•ด์„œ ์–ด๋–ค ์ƒํ™ฉ์ธ์ง€ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ์—ˆ์œผ๋ฉด ์ข‹๊ฒ ์–ด. ์˜ˆ์‹œ์ฒ˜๋Ÿผ \"์š”์•ฝ:\" ๋’ค์—๋‹ค ์ž‘์„ฑํ•ด์ฃผ๋ฉด ๋ผ.","source":"D: ๋ณดํ˜ธ๋ง‰์žˆ๋Š” ์„ฑ ๋นผ๊ณ \nD: ๋‹ค๋ฅธ ์‚ฌ๋žŒ์ด ์ฑ„์ง‘ํ•˜๋Š” ๊ณณ๋„ ์ •์ฐฐ ๋˜๋˜๋ฐ์š”.\nC: ์ •์ฐฐ์€ ๋งˆ์„์ด๋‚˜ ํƒ€์ธ์˜ ์„ฑ์„ ๋ˆ„๋ฅด๋ฉด ์ •์ฐฐ ๋˜๋Š”๋ฐ์š”\nE: ์ •์ฐฐ์€ ์˜์ง€์— ์ •์ฐฐ ๋ณธ๋ถ€๋ฅผ ๊ฑด์„คํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค\nC: ShareCoord#2๋ ˆ๋ฒจ ํ”„๋กœ์ž” ๋งˆ์„#148500.000000#524397.000000\nC: ๋ฏธ์…˜๋–„๋•Œ๋ฌธ์ด์‹œ๋ฉด ์ด ๋งˆ์„์ด๋‚˜\nA: ์˜ค ํŒŒ๋ž‘์ƒˆ ๋‚ ์•„๊ฐ„๋‹ค\nC: ์˜ค ๋˜์…จ๋‚˜์š”\nA: ใ…Žใ…Ž ๋„ค ใ„ณใ„ณ\nD: ํŒŒ๋ž‘์ƒˆ...\nC: ์‚ฌ์‹ค๋ถ€์—‰์ด์ž…๋‹ˆ๋‹ค ใ…‹ใ…‹\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ๋”ฑ๋‹ค๊ตฌ๋ฆฌ๋ผ๊ณ  ํ•ด๋„ ๋ฏฟ์„ ์ฃผ๋‘ฅ์ด ๊ธธ์ด์ง€๋งŒ ๊ทธ๋ ‡๋‹ค๊ณ  ํ•˜๋‹ˆ ๋ฏฟ๊ฒ ์Šต๋‹ˆ๋‹ค\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nA: ํŒ๊ฒŒ์•„. 
๊ทธ ์‹ ๋น„์˜ ์„ธ๊ณ„\nC: ์ž…์ด ์‚์ฃฝ ๋‚˜์™”์–ด์š”\nC: ์‚์ง„๋“ฏ์š”\nA: ๋‹ค์‹  ์ •์ฐฐ ์•ˆ๋ณด๋‚ผ๊ฒŒ ใ„ท\nE: ใ…Žใ…Žใ…Žใ…Žใ…Žใ…Žใ…Žใ…Ž\nC: ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹\nD: ๋\nC: ๊ตฐ์ฃผ๋‹˜ ๊ณ„์‹ ๊ฐ€์š”\nC: ์ŠคํŠธ๋ฆฟ ๊ตฐ์ฃผ๋‹˜\nC: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nC: ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์˜ค์‹œ๋ฉด ๊ธ€๋ค๋ฃจํ†ค ๊ณต๊ฒฉ๋Œ€ ์ง‘๊ฒฐ์ด ํ›จ์”ฌ ์šฉ์ดํ•ฉ๋‹ˆ๋‹ค!!\nC: ์ด์ชฝ์œผ๋กœ ์˜ค์…”์šฉ~!\nC: ์ขŒํ‘œ ๊ทผ์ฒ˜ ๋•…์„ ๊พธ์šฑ ๋ˆ„๋ฅด์‹  ํ›„ ์„ฑ ์ด๋™์„ ๋ˆ„๋ฅด์‹œ๋ฉด ์„ฑ์ด ์ด์ „๋ฉ๋‹ˆ๋‹ค\nE: 1์‹œ์— ํŒŒ์ˆ˜๊พผ ์‚ฌ๋ƒฅ ๊ฐ‘๋‹ˆ๋‹ค.\nE: ํƒœ๊ทธ ๋‹ฌ์•˜์œผ๋‹ˆ ์œ„์น˜ ํ™•์ธํ•ด์ฃผ์„ธ์š” :)\nB: ์•ˆ๋…•ํ•˜์„ธ์š” ๊ฐ€์ž…์ธ์‚ฌ ๋“œ๋ฆฝ๋‹ˆ๋‹ค\nE: ์•ˆ๋…•ํ•˜์„ธ์š”! ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹น :)\nC: ์•ˆ๋…•ํ•˜์„ธ์š”~~~\nC: 1์‹œ ํŒŒ์ˆ˜๊พผ ํ™•์ธํ–ˆ์Šต๋‹ˆ๋‹ค\nE: ์˜ˆ์••\nF: ์•ˆ๋…•ํ•˜์„ธ์š” ๋ฐ˜๊ฐ‘์Šต๋‹ˆ๋‹ค\nC: ShareCoord#[CITA]๊ธธ๋“œ์š”์ƒˆ#149559.000000#521478.000000\nE: ์–ด์„œ์˜ค์‹ญ์‡ผ!\nE: ์ –๊ณผ ๊ฟ€์ด ํ๋ฅด๋Š” ์‹œํƒ€๋ธ์ž…๋‹ˆ๋‹ค.\nE: ๊ธธ๋“œ ์š”์ƒˆ ๊ทผ์ฒ˜๋กœ ์„ฑ ์ด๋™ ํ•ด์ฃผ์„ธ์š”! :)\nD: ๋„ค~~~~~~\nC: ShareCoord#[STOP]์ „์ดˆ๊ธฐ์ง€#-152190.000000#492921.000000\nC: ์šฐ๋ฆฌ๋„ ์š”๊ธฐ์ฒ˜๋Ÿผ ์‹ธ์šฐ๊ฒŒ ๋ ์ง€๋„ ๋ชฐ๋ผ์š” ใ…Žใ…Ž\nC: ๋ฌด๋ ฅ์„ ๋นจ๋ฆฌ ํ‚ค์›Œ๋‘ก์‹œ๋‹ค~!!\nC: ํ—!\nC: ์ปค๋น„๋‹˜ ํ˜น์‹œ ์‹œํƒ€๋ธ ๋ฒ„๋ฆฌ๊ณ  ๊ฐ€์‹œ๋‚˜์š”\nC: ShareCoord#[CITA]๋‚จ์˜์ปค๋น„#-155181.000000#523710.000000\nC: ๋“ค์ผฐ์œผ๋‹ˆ ๋ณต๊ท€ํ•˜์‹œ์ฃต\nC: ใ…Žใ…Žใ…Žใ…Žใ…Žใ…Žใ…Ž\nC: ํƒ์‚ฌ ์ค€๋น„ํ•˜์„ธ์š”\nC: ShareCoord#[INSD]๋ถˆ์•ˆ์ด#-148347.000000#488490.000000","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
streamlit_app_local/user_submit/llm/240413_dora.jsonl ADDED
@@ -0,0 +1,6 @@
+ {"model_id": "240413_dora", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ๋‚ด์šฉ์„ ์œ ์ง€ํ•œ ์ฑ„, ์–ดํœ˜๋ฅผ ๋” ๊ธธ๊ฒŒ ๋ฐ”๊พธ๊ฑฐ๋‚˜ ์‚ฌ์†Œํ•œ ๋‚ด์šฉ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ์ข‹์•„.", "generated": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ๋” ์ข‹์•„."}
+ {"model_id": "240413_dora", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ์ค‘๊ฐ„ ์ค‘๊ฐ„์— ์ƒˆ๋กœ์šด ๋‚ด์šฉ์ด๋‚˜ ๋ฌธ์žฅ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ธธ๊ฒŒ ๋Š˜๋ ค์„œ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ์ข‹์•„.", "generated": "๊ทผ๋ฐ ๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค ๊ฒจ์šธ์ด ๋” ์ข‹์•„. ์™œ๋ƒํ•˜๋ฉด ์‚ด์ง ์ถ”์šด ๋‚ ์”จ๊ฐ€ ์˜ท ์ž…๊ธฐ๋„ ์‰ฝ๊ณ , ํ‘น ์‰ฌ๊ณ  ์žˆ์„ ์ˆ˜๋„ ์žˆ์–ด์„œ ์ข‹๊ฑฐ๋“ ."}
+ {"model_id": "240413_dora", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ์–ดํˆฌ๋ฅผ ๋ฐ”๊พธ๊ฑฐ๋‚˜, ๋ฌธ์žฅ์„ ์—ฌ๋Ÿฌ๊ฐœ๋กœ ๋‚˜๋ˆ„์–ด์„œ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ์ข‹์•„.", "generated": "๋‚˜๋Š” ์—ฌ๋ฆ„๋ณด๋‹ค ๊ฒจ์šธ์ด ์ข‹์•„. ์™œ๋ƒํ•˜๋ฉด ๊ฒจ์šธ์—๋Š” ๋ฐ–์— ๋‚˜๊ฐ€์„œ ๋†€๊ธฐ๋„ ํŽธํ•˜๊ณ , ๋ˆˆ ๋ฎ์ธ ์‚ฐ์ด๋‚˜ ํ˜ธ์ˆ˜๋ฅผ ๋ณผ ์ˆ˜ ์žˆ์–ด์„œ ์ข‹๊ธฐ ๋•Œ๋ฌธ์ด์•ผ."}
+ {"model_id": "240413_dora", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ๋‚ด์šฉ์„ ์œ ์ง€ํ•œ ์ฑ„, ์–ดํœ˜๋ฅผ ๋” ๊ธธ๊ฒŒ ๋ฐ”๊พธ๊ฑฐ๋‚˜ ์‚ฌ์†Œํ•œ ๋‚ด์šฉ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "์˜ค๋Š˜์˜ ์ผ๊ธฐ:\n์˜ค๋Š˜ ๋‚˜๋Š” ํ•™๊ต์— ๊ฐ”๋‹ค. ์ง‘์— ์™€์„œ ๊ณต๋ถ€๋ฅผ ํ–ˆ๋‹ค. ์˜ค๋Š˜ ์ˆ™์ œ๋„ ๋๋‚ด๊ณ  ์ž์•ผ๊ฒ ๋‹ค.", "generated": "์˜ค๋Š˜์€ ํ•™๊ต๋ผ๋Š” ๊ณณ์— ๋ฐฉ๋ฌธํ–ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ง‘์— ๋Œ์•„์™€์„œ๋Š” ๋ฐ”๋กœ ์ˆ™์ œ ๋ฌธ์ œ์ง‘์„ ํ’€๊ธฐ ์‹œ์ž‘ํ–ˆ๋Š”๋ฐ, ์ด๊ฒŒ ์ง€๊ธˆ๊นŒ์ง€ ํ–ˆ๋˜ ๊ฒƒ ์ค‘์—์„œ ๊ฐ€์žฅ ์ง‘์ค‘์ด ์ž˜ ๋๋‹ค. ์ง€๊ธˆ์€ ์ˆ™์ œ๋„ ๋งˆ์น˜๊ณ  ์Šฌ์Šฌ ์ž ์ž๋ฆฌ์— ๋“œ๋Š” ์ค‘์ด๋‹ค."}
+ {"model_id": "240413_dora", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ์ค‘๊ฐ„ ์ค‘๊ฐ„์— ์ƒˆ๋กœ์šด ๋‚ด์šฉ์ด๋‚˜ ๋ฌธ์žฅ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ธธ๊ฒŒ ๋Š˜๋ ค์„œ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "์˜ค๋Š˜์˜ ์ผ๊ธฐ:\n์˜ค๋Š˜ ๋‚˜๋Š” ํ•™๊ต์— ๊ฐ”๋‹ค. ์ง‘์— ์™€์„œ ๊ณต๋ถ€๋ฅผ ํ–ˆ๋‹ค. ์˜ค๋Š˜ ์ˆ™์ œ๋„ ๋๋‚ด๊ณ  ์ž์•ผ๊ฒ ๋‹ค.", "generated": "์˜ค๋Š˜์€ ํ•˜๋ฃจ ์ข…์ผ ํ•™๊ต์— ๊ฐ”๋‹ค ์™”๋‹ค. ์ง‘์— ์™€์„œ ์ž ๊น ์•‰์•„ ํœด์‹์„ ์ทจํ•˜๊ณ , ์ˆ™์ œ๋„ ๋งˆ์ € ๋‹ค ๋๋ƒˆ๋‹ค. ์˜ค๋Š˜์€ ์ฑ…๋„ ๋งŽ์ด ์ฝ๊ณ , ๋งˆ์Œ๋„ ํŽธํ•˜๊ฒŒ ์‰ฌ์–ด์•ผ๊ฒ ๋‹ค."}
+ {"model_id": "240413_dora", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ์–ดํˆฌ๋ฅผ ๋ฐ”๊พธ๊ฑฐ๋‚˜, ๋ฌธ์žฅ์„ ์—ฌ๋Ÿฌ๊ฐœ๋กœ ๋‚˜๋ˆ„์–ด์„œ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "์˜ค๋Š˜์˜ ์ผ๊ธฐ:\n์˜ค๋Š˜ ๋‚˜๋Š” ํ•™๊ต์— ๊ฐ”๋‹ค. ์ง‘์— ์™€์„œ ๊ณต๋ถ€๋ฅผ ํ–ˆ๋‹ค. ์˜ค๋Š˜ ์ˆ™์ œ๋„ ๋๋‚ด๊ณ  ์ž์•ผ๊ฒ ๋‹ค.", "generated": "์˜ค๋Š˜์€ ํ•™๊ต์— ๊ฐ”๋‹ค. ์ง‘์— ์™€์„œ ์‰ฌ๋Š” ์‹œ๊ฐ„์— ๊ณต๋ถ€๋ฅผ ํ•˜๋ฉฐ ํ•˜๋ฃจ๋ฅผ ๋งˆ๋ฌด๋ฆฌ ์ง€์—ˆ๋‹ค."}
streamlit_app_local/user_submit/llm/240413_dora_epoch_5.jsonl ADDED
@@ -0,0 +1,6 @@
+ {"model_id": "240413_dora_epoch_5", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ๋‚ด์šฉ์„ ์œ ์ง€ํ•œ ์ฑ„, ์–ดํœ˜๋ฅผ ๋” ๊ธธ๊ฒŒ ๋ฐ”๊พธ๊ฑฐ๋‚˜ ์‚ฌ์†Œํ•œ ๋‚ด์šฉ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ์ข‹์•„.", "generated": "๋‚˜๋Š” ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ๋” ๋งˆ์Œ์— ๋“ค์–ด."}
+ {"model_id": "240413_dora_epoch_5", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ์ค‘๊ฐ„ ์ค‘๊ฐ„์— ์ƒˆ๋กœ์šด ๋‚ด์šฉ์ด๋‚˜ ๋ฌธ์žฅ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ธธ๊ฒŒ ๋Š˜๋ ค์„œ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ์ข‹์•„.", "generated": "๋‚˜๋Š” ์•„๋ฌด๋ž˜๋„ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ๋” ์ข‹๋‹ค. ์ด์œ ๋Š” ๋ฌผ๋ก  ์ถ”์›Œ์„œ์ด๊ธฐ๋„ ํ•˜์ง€๋งŒ, ๋‚ด๊ฐ€ ํ”ผ๋ถ€๋กœ ๋Š๋ผ๋Š” ๊ฒƒ๋“ค ๋•Œ๋ฌธ์ด๋‹ค. ํŠนํžˆ๋‚˜ ๋ˆˆ์€ ์ •๋ง ํ•˜์–—๊ณ  ๊นจ๋—ํ•˜๋‹ค. ์ด๊ฒƒ์€ ๊ฒจ์šธ์—๋งŒ ๋ณผ ์ˆ˜ ์žˆ๋Š” ๊ฐ€์žฅ ์•„๋ฆ„๋‹ค์šด ํ’๊ฒฝ ์ค‘ ํ•˜๋‚˜๋‹ค."}
+ {"model_id": "240413_dora_epoch_5", "task": "๋Š˜๋ ค์“ฐ๊ธฐ", "instruction": "์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ ๊ธ€์˜ ์–ดํˆฌ๋ฅผ ๋ฐ”๊พธ๊ฑฐ๋‚˜, ๋ฌธ์žฅ์„ ์—ฌ๋Ÿฌ๊ฐœ๋กœ ๋‚˜๋ˆ„์–ด์„œ ๋‹ค์‹œ ์จ์ฃผ์„ธ์š”.", "source": "๋‚œ ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ์ข‹์•„.", "generated": "๋‚˜๋Š” ์—ฌ๋ฆ„๋ณด๋‹ค๋Š” ๊ฒจ์šธ์ด ๋” ์ข‹์•„."}
+ {"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 방문했다. 집에 돌아와서 열심히 공부했는데 내일부터는 또 다른 숙제들이 기다리고 있을 것이다."}
+ {"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "아 오늘은 정말 바쁜 하루였다. 학교에 가서 열심히 공부했다. 그리고 집에 돌아와서는 숙제도 다 끝내놓고 자기로 했다."}
+ {"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 갔었다. 집에 와서 공부를 했다. 그리고 오늘 숙제도 다 마치고, 이제 잘 준비하려고 한다~"}
streamlit_app_local/user_submit/llm/240413_full.jsonl ADDED
@@ -0,0 +1,6 @@
+ {"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 훨씬 더 좋은 것 같아."}
+ {"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울을 훨씬 더 선호해. 여름의 높은 기온은 나를 지치게 만들거든. 시원한 바람과 함께 따뜻한 이불 속에서 푹 쉬다가 다시 새로운 마음으로 다시 시작하고 싶어."}
+ {"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 더 좋다."}
+ {"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘의 일상 기록:\n오늘 나는 학교에 갔다. 집에 돌아와서 공부를 했다. 오늘의 숙제도 끝마치고 이제 곧 잘 준비할 시간이다."}
+ {"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 정말 하루 종일 바빴다. 학교를 갔고, 집에 돌아와서 바로 공부를 시작했다. 오늘 할 숙제를 모두 마쳤고, 이제는 잠을 자기 위해 자리에 누워야겠다."}
+ {"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 갔었다. 집에 와서 공부를 시작했다. 오늘 과제도 다 끝마치고 이제 잠자리에 들 시간이다."}
streamlit_app_local/user_submit/llm/240413_full_epoch_5.jsonl ADDED
@@ -0,0 +1,6 @@
+ {"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 훨씬 더 좋은 것 같아."}
+ {"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 훨씬 더 좋다. 겨울에는 추운 날씨 덕분에 산책이나 운동을 하기에도 편하고, 겨울의 정취를 느낄 수 있어서 좋다. 그리고 겨울에는 눈이 내릴 때의 분위기가 정말 낭만적이다."}
+ {"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보단 겨울이 더 좋더라. 겨울에는 차가운 바람이 얼굴을 톡톡 두드리고, 눈이 내리면 세상이 하얗고 아름답게 보여."}
+ {"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 방문했다. 집에 돌아와 열심히 공부했다. 오늘의 과제도 모두 끝마치고, 이제는 잠에 들 시간이다."}
+ {"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 학교에 가는 날이었다. 집에서 아침을 먹고 학교에 도착했다. 수업이 시작되었고, 나는 열공 모드로 들어갔다. 숙제도 다 끝냈고 이제 자야겠다."}
+ {"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 갔다 왔다. 집에 와서 숙제를 했다. 오늘 숙제를 끝내면 자야겠다."}