This view is limited to 50 files because it contains too many changes.
See raw diff
- README.md +122 -0
- README_kr.md +123 -0
- __pycache__/analysis_utils.cpython-311.pyc +0 -0
- __pycache__/app.cpython-311.pyc +0 -0
- __pycache__/query_comp.cpython-311.pyc +0 -0
- __pycache__/view_utils.cpython-311.pyc +0 -0
- analysis_utils.py +338 -0
- app.py +452 -0
- eval_models_list.txt +3 -0
- eval_prompt_list.txt +3 -0
- guide_mds/input_jsonls_en.md +37 -0
- guide_mds/input_jsonls_kr.md +34 -0
- modules/__pycache__/nav.cpython-311.pyc +0 -0
- modules/nav.py +57 -0
- pages/__pycache__/see_results.cpython-311.pyc +0 -0
- pages/brief_intro.py +40 -0
- pages/quick_start_guide.py +14 -0
- pages/see_results.py +365 -0
- precommit.sh +5 -0
- query_comp.py +137 -0
- requirements.txt +14 -0
- run.sh +4 -0
- streamlit_app_local/.streamlit/config.toml +4 -0
- streamlit_app_local/README.md +24 -0
- streamlit_app_local/__pycache__/analysis_utils.cpython-311.pyc +0 -0
- streamlit_app_local/__pycache__/analysis_utils.cpython-38.pyc +0 -0
- streamlit_app_local/__pycache__/app.cpython-311.pyc +0 -0
- streamlit_app_local/__pycache__/app.cpython-38.pyc +0 -0
- streamlit_app_local/__pycache__/query_comp.cpython-311.pyc +0 -0
- streamlit_app_local/__pycache__/view_utils.cpython-311.pyc +0 -0
- streamlit_app_local/__pycache__/view_utils.cpython-38.pyc +0 -0
- streamlit_app_local/analysis_utils.py +338 -0
- streamlit_app_local/app.py +318 -0
- streamlit_app_local/eval_models_list.txt +3 -0
- streamlit_app_local/eval_prompt_list.txt +5 -0
- streamlit_app_local/guide_mds +1 -0
- streamlit_app_local/modules/__pycache__/nav.cpython-311.pyc +0 -0
- streamlit_app_local/modules/__pycache__/nav.cpython-38.pyc +0 -0
- streamlit_app_local/modules/nav.py +46 -0
- streamlit_app_local/pages/brief_intro.py +40 -0
- streamlit_app_local/pages/see_results.py +391 -0
- streamlit_app_local/query_comp.py +135 -0
- streamlit_app_local/requirements.txt +5 -0
- streamlit_app_local/run.sh +3 -0
- streamlit_app_local/user_submit/12-02_14:29:30/1.jsonl +6 -0
- streamlit_app_local/user_submit/12-02_14:29:30/2.jsonl +6 -0
- streamlit_app_local/user_submit/llm/240413_dora.jsonl +6 -0
- streamlit_app_local/user_submit/llm/240413_dora_epoch_5.jsonl +6 -0
- streamlit_app_local/user_submit/llm/240413_full.jsonl +6 -0
- streamlit_app_local/user_submit/llm/240413_full_epoch_5.jsonl +6 -0
README.md
CHANGED
@@ -12,3 +12,125 @@ short_description: VARCO Arena is a reference-free LLM benchmarking approach
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Varco Arena
Varco Arena runs a tournament between the models to be compared on each test-set prompt, ranking the models accurately at an affordable price. This is more accurate and cost-effective than rating win rates against reference outputs.

For more information, the following resources may help you understand how it works.
* [Paper](https://huggingface.co/papers/2411.01281)
* [Blog Post (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)


## Quickstart
### Running the Web Demo locally (streamlit, Recommended!)
```bash
git clone [THIS_REPO]
# install the requirements below; we recommend miniforge to manage the environment
cd streamlit_app_local
bash run.sh
```
For more details, see `[THIS_REPO]/streamlit_app_local/README.md`

### CLI use
* located at
  * `varco_arena/`
* debug configurations for vscode at
  * `varco_arena/.vscode`
```bash
## gpt-4o-mini as a judge
python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -m tournament -e "gpt-4o-mini"
## vllm-openai served LLM as a judge
python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -e SOME_MODEL_NAME_SERVED -m tournament -u "http://url_to/your/vllm_openai_server:someport"

# dbg lines
## openai api judge dbg
python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
## other testing lines
python main.py -i "rsc/inputs_for_dbg/[SOME_DIRECTORY]/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
## dummy judge dbg (checking errors without api requests)
python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e debug
```

## Requirements
We tested this on a `python = 3.11.9` environment: `requirements.txt`
```
openai>=1.17.0
munch
pandas
numpy
tqdm>=4.48.0
plotly
scikit-learn
kaleido
tiktoken>=0.7.0
pyyaml
transformers
streamlit>=1.40.2
openpyxl
git+https://github.com/shobrook/openlimit.git#egg=openlimit # do not install this from PyPI

# Linux
uvloop
# Windows
winloop
```

#### Arguments
- -i, --input : directory path containing the input jsonlines files (LLM outputs)
- -o, --output_dir : directory where results are written
- -e, --evaluation : judge model specification (e.g. "gpt-4o-2024-05-13", "gpt-4o-mini", \[vllm-served-model-name\])
- -k, --openai_api_key : OpenAI API Key
- -u, --openai_url : URL of an OpenAI-style LLM server (requests are made via the OpenAI SDK)

#### Advanced
- -j, --n_jobs : number of jobs passed to `asyncio.Semaphore(n=)`
- -p, --evalprompt : [see the directory](./varco_arena/prompts/*.yaml)
- -lr, --limit_requests : vLLM OpenAI server request limit (default: 7,680)
- -lt, --limit_tokens : vLLM OpenAI server token limit (default: 15,728,640)

#### Input Data Format
[input jsonl guides](./streamlit_app_local/guide_mds/input_jsonls_en.md)


## Contributing & Customizing
#### Do this after git clone and installation
```bash
pip install pre-commit
pre-commit install
```
#### Before commit
```bash
bash precommit.sh # the black formatter will reformat the code
```

## FAQ
* I want to apply my custom judge prompt to run Varco Arena.
  * [`./varco_arena/prompts/`](./varco_arena/prompts/__init__.py) defines the prompts as `yaml` files together with their class objects. Edit those as you need.
* I want tailored judge prompts for each row of the test set (e.g. rows up to 100 use `prompt1`, rows from 101 on use `prompt2`).
  * `load_prompt` at the link above takes `promptname` + `task` as parameters to load a prompt. The function is called in [`./varco_arena/manager.py:async_run`](./varco_arena/manager.py).
* I want more fields in my LLM output jsonl files for tailored use, i.e. fields beyond `instruction`, `source`, `generated`.
  * It gets a bit tricky, but briefly:
  * you may need to edit `varco_arena/eval_utils.py`:`async_eval_w_prompt` (this is where `PROMPT_OBJ.complete_prompt()` is called),
  * and all the related code will require revision.

## Special Thanks to (contributors)
- Minho Lee (@Dialogue Model Team, NCSOFT) [github](https://github.com/minolee/)
  - query wrapper
  - rag prompt
- Jumin Oh (@Generation Model Team, NCSOFT)
  - overall prototyping of the system in haste


## Citation
If you found our work helpful, consider citing our paper!
```
@misc{son2024varcoarenatournamentapproach,
      title={Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models},
      author={Seonil Son and Ju-Min Oh and Heegon Jin and Cheolhun Jang and Jeongbeom Jeong and Kuntae Kim},
      year={2024},
      eprint={2411.01281},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.01281},
}
```
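Each input jsonl file holds one model's outputs, one JSON record per test-set prompt (see the input jsonl guide for the authoritative schema). A minimal sketch of producing such a file, assuming only the `instruction`, `source`, and `generated` fields named in the FAQ — the filename and example rows here are illustrative, not part of the repo:

```python
import json

# Hypothetical rows: each line pairs one test-set prompt with one model's output.
# Field names follow the FAQ (`instruction`, `source`, `generated`); any extra
# fields your pipeline needs are outside this sketch.
rows = [
    {"instruction": "Summarize the text.", "source": "A long article...", "generated": "A short summary."},
    {"instruction": "Translate to French.", "source": "Good morning.", "generated": "Bonjour."},
]

with open("my_model.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        # one compact JSON object per line, newline-terminated
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# read it back to check the format round-trips
with open("my_model.jsonl", encoding="utf-8") as f:
    parsed = [json.loads(line) for line in f]
```

Every model's file should cover the same set of prompts so that the tournament brackets line up across models.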
README_kr.md
ADDED
@@ -0,0 +1,123 @@
# Varco Arena
Varco Arena runs a tournament between the models to be compared on each test-set prompt, ranking the models accurately. This is more accurate and somewhat cheaper than scoring win rates against reference outputs.

For more details, the links below may help.
* [Paper](https://huggingface.co/papers/2411.01281)
* [NCSOFT Tech Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)


## Quickstart
### Getting started with the Streamlit app locally (recommended!)
```bash
git clone [THIS_REPO]
# install requirements below. we recommend miniforge to manage environment
cd streamlit_app_local
bash run.sh
```
For more details, see `[THIS_REPO]/streamlit_app_local/README.md`!

### CLI use
* The CLI and the web app share the same code, located in the directory below.
  * `varco_arena/`
* Test commands per preset prompt for debugging in vscode are written in the following file.
  * `varco_arena/.vscode/launch.json`
```bash
## gpt-4o-mini as a judge
python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -m tournament -e "gpt-4o-mini"
## vllm-openai served LLM as a judge
python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -e SOME_MODEL_NAME_SERVED -m tournament -u "http://url_to/your/vllm_openai_server:someport"

# dbg lines
## openai api judge dbg
python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
## other testing lines
python main.py -i "rsc/inputs_for_dbg/[SOME_DIRECTORY]/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
## dummy judge dbg (checking errors without api requests)
python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e debug
```

## Requirements
Tested on `python = 3.11.9`. `requirements.txt`
```
openai>=1.17.0
munch
pandas
numpy
tqdm>=4.48.0
plotly
scikit-learn
kaleido
tiktoken>=0.7.0
pyyaml
transformers
streamlit>=1.40.2
openpyxl
git+https://github.com/shobrook/openlimit.git#egg=openlimit # do not install this from PyPI

# on Linux
uvloop
# on Windows
winloop
```

#### Arguments
- -i, --input : input file, directory, or a regular expression over filenames
- -o, --output_dir : directory where the output files are stored
- -e, --evaluation : judge model (e.g. "gpt-4o-2024-05-13", "gpt-4o-mini", the name of a model served on vllm, etc.)
- -m, --matching_method : matching method (default "tournament"; "league" is not recommended)
- -k, --openai_api_key : OpenAI API Key
- -u, --openai_url : URL (IP address + port) when using a local vLLM OpenAI server

#### Advanced
- -j, --n_jobs : argument passed to asyncio.Semaphore(). If the Arena stalls, try lowering it below the default of 32.
- -p, --evalprompt : [see this directory](./varco_arena/prompts/*.yaml)
- -lr, --limit_requests : vLLM OpenAI server request limit (default: 7,680)
- -lt, --limit_tokens : vLLM OpenAI server token limit (default: 15,728,640)

#### Input Data Format
[input jsonl guide](./streamlit_app_local/guide_mds/input_jsonls_kr.md)


## Contributing & Customizing
#### What to do after git clone and installing the dependencies
```bash
pip install pre-commit
pre-commit install
```
#### What to do before committing
```bash
bash precommit.sh # this will reformat all the code with black
```

Contact: Seonil Son
* I want to use my own judge prompt.
  * [`./varco_arena/prompts/`](./varco_arena_core/prompts/__init__.py) loads prompts defined as `yaml` files along with the various prompt classes. Write yours by referring to the presets.
* I want to use a different judge prompt per test set (e.g. a different prompt depending on the task).
  * Via `load_prompt` at the link above, prompts are loaded by `promptname` + `task` in [`./varco_arena_core/manager.py:async_run`](./varco_arena_core/manager.py).
* I want to add fields beyond `instruction`, `source`, `generated` to my input files.
  * It gets a bit complicated; please fix the following parts:
  * you may need to touch `async_eval_w_prompt` in `varco_arena/eval_utils.py` (this is where PROMPT_OBJ.complete_prompt() is called),
  * and fix the other related parts one by one...

## Special Thanks to (contributors)
- Minho Lee (@Dialogue Model Team, NCSOFT) [github](https://github.com/minolee/)
  - query wrapper
  - rag prompt
- Jumin Oh (@Generation Model Team, NCSOFT)
  - overall prototyping of the system in haste

## Citation
If our work has been helpful to you, could we get some help in return in the form of a citation?
```
@misc{son2024varcoarenatournamentapproach,
      title={Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models},
      author={Seonil Son and Ju-Min Oh and Heegon Jin and Cheolhun Jang and Jeongbeom Jeong and Kuntae Kim},
      year={2024},
      eprint={2411.01281},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.01281},
}
```
__pycache__/analysis_utils.cpython-311.pyc
ADDED
Binary file (17.7 kB)

__pycache__/app.cpython-311.pyc
ADDED
Binary file (22.3 kB)

__pycache__/query_comp.cpython-311.pyc
ADDED
Binary file (7.99 kB)

__pycache__/view_utils.cpython-311.pyc
ADDED
Binary file (18.3 kB)
analysis_utils.py
ADDED
@@ -0,0 +1,338 @@
from copy import deepcopy
from functools import partial
from typing import *

import pandas as pd
from fire import Fire

"""
This code assumes dealing with only one instruction
"""


# from varco_arena.tournament
def log2_power_of_two(n):
    # First, let's make sure n is indeed a power of 2
    if n & (n - 1) != 0 or n == 0:
        raise ValueError("n must be a positive power of 2")

    exponent = 0
    while n > 1:
        n >>= 1  # Right shift is like dividing by 2, but faster
        exponent += 1
    return exponent


def get_1st(df: pd.DataFrame, alpha2names: dict) -> Optional[str]:
    finals = df[df["round"] == "final"]
    if len(finals) == 1:
        first = finals.iloc[0].winner_resolved
    else:
        first = (
            None  # error case (no finals match or multiple finals (buggy result file))
        )

    return first


def get_unique_participants(df: pd.DataFrame) -> list:
    participants = pd.concat([df.model_a, df.model_b]).unique().tolist()
    participants = [p for p in participants if p]  # remove None
    participants = sorted(participants)  # make it sorted
    return participants


def _impute_byes(df):
    max_depth = df.depth.max()

    # init
    imputed_parts = dict()
    for depth in range(max_depth + 1):
        imputed_parts[depth] = df[df.depth == depth].copy()

    # reverse
    for depth in range(max_depth, 0, -1):  # we always have 1 proper match for depth=0
        null_v_null = {
            "model_a": "",
            "model_b": "",
            "winner": "model_a",
            "match_order_in_round": "-",
            "depth": depth,
        }

        # fill some_model vs null byes
        players = get_unique_participants(imputed_parts[depth])
        proceeded = get_unique_participants(imputed_parts[depth - 1])
        imputed = []
        for p in proceeded:
            if p not in players:
                p_v_null = deepcopy(null_v_null)
                p_v_null["model_a"] = p
                imputed.append(p_v_null)
        imputed_parts[depth] = pd.concat(
            [
                imputed_parts[depth],
                pd.DataFrame(imputed),
            ],
            axis="index",
        )

        # fill null vs null
        n_null_v_null = 2 ** (depth) - len(imputed_parts[depth])
        if n_null_v_null > 0:
            imputed = pd.DataFrame([null_v_null] * n_null_v_null)
            imputed_parts[depth] = pd.concat(
                [
                    imputed_parts[depth],
                    imputed,
                ],
                axis="index",
            )

    df_imputed = pd.concat(imputed_parts.values(), axis="index")
    df_imputed = df_imputed.sort_values(by="depth").reset_index(drop=True)

    return df_imputed


def index_test_scenario(df) -> pd.DataFrame:
    df["inst_src"] = "inst: " + df.instruction + "\n\nsrc: " + df.source
    df["idx_inst_src"] = df.apply(
        lambda row: f"{row.tournament_idx}:\n{row.inst_src}", axis=1
    )

    # later used for tournament bracket backtracking
    if "depth" not in df.columns:
        mappings = {
            "final": 0,
            "semi-final": 1,
            "quarter-final": 2,
        }

        def _convert_round_to_depth(rnd: str, mappings=None) -> int:
            if rnd is None:
                depth = None
            elif rnd in mappings.keys():
                depth = mappings[rnd]
            elif rnd.startswith("round-"):  # assume perfect power of two
                num = int(rnd.replace("round-", "").strip())
                depth = log2_power_of_two(num) - 1
            return depth

        conv = partial(_convert_round_to_depth, mappings=mappings)
        df["depth"] = df["round"].apply(conv)

    return df


def init_tournament_dataframe(df, alpha2names: dict = None) -> Tuple[pd.DataFrame, dict]:
    df = df.sort_values(by="depth").reset_index(drop=True)
    # make winner interpretable (A -> model_a, B -> model_b)
    df.winner = df.winner.apply(lambda txt: f"model_{txt.lower()}")

    # define alpha2names if not given (covers up to 168 participants)
    if alpha2names is None:
        alphabets = (
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳"
            "㉠㉡㉢㉣㉤㉥㉦㉧㉨㉩㉪㉫㉬㉭"
            "㉮㉯㉰㉱㉲㉳㉴㉵㉶㉷㉸㉹㉺㉻"
            "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
            "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
            "αβγδεζηθικλμνξοπρςστυφχψω"
        )
        model_full_names = get_unique_participants(df)
        alpha2names = dict(zip(alphabets, model_full_names))
        if len(alpha2names) < len(model_full_names):
            raise ValueError(
                f"Tournament viewer cannot visualize more than {len(alphabets)=} participants. ({len(model_full_names)=} is given)\n\nNothing but the tournament visualizer will be affected."
            )
    names2alpha = dict(zip(alpha2names.values(), alpha2names.keys()))
    df = _impute_byes(df)

    # preserve readables for later
    df = _make_readables(df, names2alpha)

    if len(df[df["round"] == "final"]) != 1:
        raise ValueError("there must be exactly one final match.")

    return df, alpha2names


def _make_readables(df, names2alpha):
    df["human_readable_model_a"] = df.model_a.copy()
    df["human_readable_model_b"] = df.model_b.copy()

    df.model_a = df.model_a.apply(
        lambda modelname: names2alpha[modelname] if modelname else "x"
    )
    df.model_b = df.model_b.apply(
        lambda modelname: names2alpha[modelname] if modelname else "x"
    )

    df["human_readable_idx"] = df.apply(
        lambda row: f"{row.name}: {row.human_readable_model_a} ({row.model_a}) vs. {row.human_readable_model_b} ({row.model_b if row.model_b else 'x'})",
        axis=1,
    )

    df["winner_resolved"] = df.apply(lambda row: row[row.winner], axis=1)
    df["winner_nodes"] = df.apply(
        lambda row: f"{row.winner_resolved}:{row.name}".ljust(4, " "), axis=1
    )  # later for figure representation of winner as a "node"
    return df


# draw
def draw(df: pd.DataFrame, alpha2names: dict = None) -> str:
    def _draw_round(
        df: pd.DataFrame,
        depth: int = None,
        winners_in_order: list = None,
    ) -> Tuple:
        df_now = df[df.depth == depth]
        max_depth = df.depth.max()

        width = 2 ** ((max_depth - depth) + 2)

        connect_left = "─" * (width)
        connect_left = connect_left[4:]
        connect_right = " " * (width)
        connect_right = "┐" + connect_right[1:]

        if winners_in_order is None:
            assert (
                depth == 0
            ), f"{winners_in_order=} is only allowed when drawing the top (=final match)"
            winners_in_order = df_now.winner_nodes

        round_drawing_parts = []
        descending_round_winners = []
        for node in winners_in_order:
            round_drawing_parts.append("".join([node, connect_left, connect_right]))
            # next round winners in sync with winner order
            row_now = df_now.query(f"winner_nodes=='{node}'")
            descending_round_winners.append(row_now.model_a.item())
            descending_round_winners.append(row_now.model_b.item())

        # find descending_round_winners within winner_nodes format (num:alpha)
        if depth == max_depth:
            pass  # keep the descending_round_winners intact
        else:
            df_descend = df[df.depth == depth + 1]
            for i, winner_alpha in enumerate(descending_round_winners):
                node_intr = df_descend.query(
                    f"winner_resolved=='{winner_alpha}'"
                ).winner_nodes.item()
                descending_round_winners[i] = node_intr

        round_drawing = "".join(round_drawing_parts)
        descending_unit = " " * width
        descending_unit = "│" + descending_unit[1:]
        descending_lines_parts = [descending_unit] * len(df_now) * 2
        descending_lines = "".join(descending_lines_parts)

        return round_drawing, descending_lines, descending_round_winners

    drawings = []
    winners_in_order = None
    max_depth = df.depth.max()
    for depth in range(max_depth + 1):
        winner_drw, lines_desc, winners_in_order = _draw_round(
            df,
            depth=depth,
            winners_in_order=winners_in_order,
        )
        drawings.append((winner_drw, lines_desc))
    # prepare bracket top
    champion_alphabet = drawings[0][0].split()[0].split(":")[0]
    champion_readable = alpha2names[champion_alphabet]
    bracket_top = [f"🥇winner: {champion_readable}", "│"]
    # prepare mid
    bracket_mid = "\n".join(["\n".join(tup) for tup in drawings])

    # prepare bot
    initial_participants = winners_in_order
    bracket_bot = (" " * 3).join(initial_participants)

    full_figure = "\n".join(bracket_top + [bracket_mid, bracket_bot])

    return full_figure


def number_breakdown_from_df(result_df: pd.DataFrame) -> Tuple[str, int, int]:
    n_models = len(get_unique_participants(result_df))
    size_testset = int(len(result_df) / (n_models - 1))
    interpretation = f"total {len(result_df)} matches = (n_models-1) * size_testset = ({n_models}-1) * {size_testset}"
    return interpretation, n_models, size_testset


def make_legend_str(df, alpha2names) -> str:
    first = get_1st(df, alpha2names)
    alpha2names = {k: v.replace("🥇 ", "") for k, v in alpha2names.items()}
    alpha_ordered = sorted(list(alpha2names.keys()))

    for k, v in alpha2names.items():
        if v == alpha2names[first]:
            alpha2names[k] = f"🥇 {v}"
    res_str = f"\n\nlegend:"
    for alpha in alpha_ordered:
        name_w_medal = alpha2names[alpha]
        res_str += f"\n{alpha}\t{name_w_medal}"
    return res_str


def main(
    jslname: str = "result.json",
):
    """
    Test code
    """

    df = pd.read_json(jslname, orient="records")
    df = df.drop(columns=["tstamp", "logs"])
    df = index_test_scenario(df)

    # The intermediate visualization(df) step is skipped here. If everything else
    # works when you follow along without it, any remaining problem is a viz issue
    # (though that seems unlikely).
    selections = df.idx_inst_src.unique()
    for i, sel in enumerate(selections):
        try:
            df_now = df[df.idx_inst_src == sel]
            df_now_processed, _alpha2names = init_tournament_dataframe(
                df_now, alpha2names=alpha2names if "alpha2names" in dir() else None
            )
            if "alpha2names" not in dir():
                alpha2names = _alpha2names
            assert "alpha2names" in dir()
            bracket_drawing = draw(df_now_processed, alpha2names=alpha2names)
            legend = make_legend_str(df_now_processed, alpha2names)
            print(bracket_drawing + legend)
            print(bracket_drawing + legend, file=open(f"{i}.txt", "w"))
            print(f"\n\n{sel}", file=open(f"{i}.txt", "a"))

            for match_idx_human in df_now_processed.human_readable_idx:
                match_idx = int(match_idx_human.split(": ")[0])
                row = df_now_processed.loc[match_idx]
                winner = row.winner
        except Exception as e:
            print(e, file=open(f"{i}_err.txt", "w"))
            print("", file=open(f"{i}_err.txt", "a"))
            print(sel, file=open(f"{i}_err.txt", "a"))
            df_now_processed[
                [
                    "depth",
                    "round",
                    "winner_nodes",
                    "winner_resolved",
                    "winner",
                    "model_a",
                    "model_b",
                ]
            ].to_json(f"{i}_err.jsonl", lines=True, orient="records")


if __name__ == "__main__":
    Fire(main)
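The accounting in `number_breakdown_from_df` relies on a single-elimination identity: a tournament over `n` models decides a winner in exactly `n - 1` matches, so a result file over a test set of size `s` holds `(n - 1) * s` rows. A quick sketch of that identity together with the round-label-to-depth mapping used in `index_test_scenario` (function names here are illustrative rewrites, not the module's API):

```python
import math

def matches_per_tournament(n_models: int) -> int:
    # single elimination: every match eliminates exactly one model,
    # and all models but the winner must be eliminated
    return n_models - 1

def round_label_to_depth(rnd: str) -> int:
    # mirrors _convert_round_to_depth above: "final" is depth 0,
    # "round-8" (8 remaining players) is depth log2(8) - 1 == 2, etc.
    named = {"final": 0, "semi-final": 1, "quarter-final": 2}
    if rnd in named:
        return named[rnd]
    num = int(rnd.removeprefix("round-"))  # assumes a perfect power of two
    return int(math.log2(num)) - 1

# e.g. 8 models over a 100-prompt test set -> 7 * 100 result rows
total_rows = matches_per_tournament(8) * 100
```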
app.py
ADDED
@@ -0,0 +1,452 @@
```python
# import shutil
import os
import select
import subprocess
import sys
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import *

import streamlit as st

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))

from varco_arena.varco_arena_core.prompts import load_prompt
from view_utils import (
    default_page_setting,
    escape_markdown,
    set_nav_bar,
    show_linebreak_in_md,
)

VA_ROOT = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))
USR_SUB = VA_ROOT.parts[-1]

import shutil

import pandas as pd

import analysis_utils as au
from view_utils import visualization


class DataCache:
    def __init__(self):
        self.cache = {}

    def store(self, key: str, data: dict):
        self.cache[key] = data

    def get(self, key: str) -> Optional[dict]:
        return self.cache.get(key)


# Initialize the cache in session state if it doesn't exist
if "data_cache" not in st.session_state:
    st.session_state.data_cache = DataCache()


def purge_user_sub_data(data_path_to_purge: Union[Path, str] = None):
    if data_path_to_purge is None:
        print("nothing to purge")
        return
    else:
        shutil.rmtree(data_path_to_purge)
        print(f"purged {str(data_path_to_purge)}")
        return


@st.cache_data
def load_and_cache_data(result_file_path: Optional[str] = None) -> Tuple[Dict, Dict]:
    """
    Load data from file, cache it in memory, then remove the file.
    Returns cached data on subsequent calls.

    Args:
        result_file_path: Path to the result JSON file

    Returns:
        Tuple of (all_result_dict, df_dict)
    """
    # Check if we already have cached data for this path
    if result_file_path:
        cache_key = str(Path(result_file_path))
        cached_data = st.session_state.data_cache.get(cache_key)
        if cached_data:
            return cached_data["all_result_dict"], cached_data["df_dict"]

    # Initialize empty dicts
    all_result_dict = {}
    df_dict = {}

    if result_file_path is not None:
        try:
            result_file_path = Path(result_file_path)

            # Read and process data
            df = pd.read_json(result_file_path)
            for col in ["tstamp", "logs"]:
                if col in df.columns:
                    df.drop(columns=[col], inplace=True)
            df = au.index_test_scenario(df)

            fig_dict_per_task = {}
            df_dict_per_task = {}

            # Process overall data
            fig_dict_per_task["Overall"] = visualization(df, is_overall=True)
            df_dict_per_task["Overall"] = df

            # Process per-task data
            for task in df["task"].unique():
                df_task = df[df["task"] == task]
                fig_dict_per_task[task] = visualization(df_task, is_overall=False)
                df_dict_per_task[task] = df_task

            # Create key from path components
            prm_name = result_file_path.parts[-2]
            exp_name = result_file_path.parts[-3]
            key = f"{exp_name}/{prm_name}"

            all_result_dict[key] = fig_dict_per_task
            df_dict[key] = df_dict_per_task

            # Store in cache before removing file
            cache_data = {"all_result_dict": all_result_dict, "df_dict": df_dict}
            st.session_state.data_cache.store(str(result_file_path), cache_data)

            # Remove user experiment directory
            purge_user_sub_data(data_path_to_purge=VA_ROOT)

        except Exception as e:
            st.error(f"Error processing data: {str(e)}")
            return {}, {}

    return all_result_dict, df_dict


def upload_files(uploaded_files) -> Path:
    # prep directory for user submission
    user_sub_root = VA_ROOT
    if user_sub_root.exists():
        if not user_sub_root.is_dir():
            raise ValueError(
                f"{user_sub_root} file exists and is not a directory. Consider renaming it."
            )
    else:
        user_sub_root.mkdir(parents=True)

    KST = timezone(timedelta(hours=9))
    tstamp = datetime.now(KST)
    tstr = tstamp.strftime("%m-%d_%H:%M:%S")
    files_dir_str = "./" + str(user_sub_root / tstr)
    files_dir = Path(files_dir_str)
    files_dir.mkdir(parents=True, exist_ok=True)
    uploaded_files = list(uploaded_files)

    if not uploaded_files:
        st.warning("❌ No files to upload. Please drag/drop or browse files to upload.")
        # purge_user_sub_data(data_path_to_purge=VA_ROOT)
    elif len(uploaded_files) < 2:
        st.error("❌ You need at least 2 jsonlines files to properly run VA.")
        purge_user_sub_data(data_path_to_purge=VA_ROOT)
    else:  # properly uploaded
        for file in uploaded_files:
            # Create a path for the file in the server directory
            file_path = files_dir / file.name

            # Save the file to the server directory
            with open(file_path, "wb") as f:
                f.write(file.getbuffer())

        jslfiles = list(files_dir.glob("*.jsonl"))
        st.success(f"✅ Successfully uploaded {len(jslfiles)} jsonl files.")
    return files_dir.resolve()


def run_varco_arena(
    price_estimation: bool = False,
    # upload_dir: Union[str, Path] = None,
    promptname: str = None,
    exp_name: str = None,
    api_key: Optional[str] = None,
    evaluation_model: str = "gpt-4o-mini",
    update_interval: float = 1.0,
):
    # Use environment variable for API key
    ptn = f"{str(st.session_state.upfiles_dir)}"
    outdir = Path(ptn)
    if exp_name:
        outdir = outdir / exp_name

    command = f"python varco_arena/main.py -i {ptn} -o {outdir} -k {api_key} -p {promptname} -e {evaluation_model} -j 64"
    if price_estimation:
        command = f"{command} -c"
    else:
        command = command.replace("python", "yes | python ")
    print(command)

    api_key = None  # clear immediately

    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        stdin=subprocess.PIPE,
        text=True,
        bufsize=1,
        shell=True,
    )

    # Set stdout and stdin to non-blocking mode
    os.set_blocking(process.stdout.fileno(), False)

    last_update_time = time.time()
    terminal_output = st.empty()
    full_output = f"{command}\n"
    while True:
        # Check if we have output to read
        if select.select([process.stdout], [], [], 0)[0]:
            output = process.stdout.readline()
            if output:
                full_output += output
                if price_estimation:
                    to_show = full_output
                    terminal_output.code(to_show, language="bash")
                else:
                    current_time = time.time()
                    if current_time - last_update_time > update_interval:
                        lines = full_output.split("\n")
                        if len(lines) < 5:
                            to_show = full_output
                        else:
                            to_show = "\n".join(["...\n..\n.\n"] + lines[-5:])
                        terminal_output.code(to_show, language="bash")
                        last_update_time = current_time
                print(output)
                time.sleep(0.1)
        # Check if the process has finished
        if process.poll() is not None:
            # Read any remaining output
            remaining_output = process.stdout.read()
            if remaining_output:
                lines = remaining_output.split("\n")
                if len(lines) > 10:
                    to_show += "\n".join(["\n...\n..\n.\n"] + lines[-10:])
                else:
                    to_show += remaining_output
                terminal_output.code(to_show, language="bash")
                print(remaining_output)
            break

    return_code = process.poll()
    return outdir, return_code


def main():
    # init lang
    st.session_state["korean"] = st.session_state.get("korean", False)

    sidebar_placeholder = default_page_setting()
    set_nav_bar(
        False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_init"
    )

    st.title("⚔️ VARCO ARENA ⚔️")
    if st.session_state.korean:
        st.write(
            "**VARCO Arena는 각 모델이 생성한 결과를 비교 평가하여 모델의 성능 순위를 제공하는 시스템입니다. 커스텀 테스트셋에 유용하게 활용할 수 있으며, 모범답안을 필요로 하지 않습니다.**"
        )
    else:
        st.write(
            "**VARCO Arena is an LLM benchmarking system that compares model responses across customized test scenarios without requiring reference answers.**"
        )

    st.divider()
    # Set up the file uploader
    if st.session_state.korean:
        st.markdown("### 1. 모델 출력파일 업로드")
    else:
        st.markdown("### 1. Upload LLM responses")
    uploaded_files = st.file_uploader(
        "Drag and Drop jsonlines files (.jsonl)", accept_multiple_files=True
    )
    if st.session_state.korean:
        st.info("업로드 후 파일은 자동으로 삭제되며 수집되거나 사용되지 않습니다.")
    else:
        st.info(
            "Your uploads will be removed automatically, not being collected nor reused for any purpose."
        )
    # upload state
    if "upfiles_dir" not in st.session_state:
        st.session_state.upfiles_dir = None
    if st.button("업로드하기" if st.session_state.korean else "Upload Files"):
        st.session_state.upfiles_dir = upload_files(uploaded_files)
    if st.button("업로드한 파일 지우기" if st.session_state.korean else "Purge my uploads"):
        st.session_state.upfiles_dir = None
        if VA_ROOT.is_dir():
            shutil.rmtree(VA_ROOT)
            st.success(
                "✅ 업로드한 파일을 서버에서 지웠습니다"
                if st.session_state.korean
                else "✅ Removed your uploads from the server successfully"
            )
        else:
            st.error(
                "❌ 지울 파일이 없습니다"
                if st.session_state.korean
                else "❌ You have nothing uploaded"
            )

    if st.session_state.korean:
        with st.expander("❓❔ 무엇을 업로드 하나요 ❔❓"):
            st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
    else:
        with st.expander("❓❔ What should I upload ❔❓"):
            st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())

    # Form for cost estimation
    with st.form("cost_estimation_form"):
        if st.session_state.korean:
            st.write("### 2. 가격 산정")
        else:
            st.write("### 2. Cost Estimation")
        eval_model = st.selectbox(
            "Select Judge",
            open("eval_models_list.txt", encoding="UTF8").read().split("\n"),
        )
        promptname = st.selectbox(
            "Select Evaluation Prompt",
            open("eval_prompt_list.txt", encoding="UTF8").read().split("\n"),
        )
        if st.session_state.korean:
            st.markdown("*`llmbar` 외 다른 프롬프트는 충분히 검증된 프롬프트는 아닙니다. (동작은 함)")
        else:
            st.markdown(
                "*Eval prompts other than `llmbar` are working examples, not the optimal ones."
            )
        if promptname == USR_SUB:
            raise ValueError(
                f"{USR_SUB=} is a name reserved for the system. Consider another name for the prompt or consider changing {VA_ROOT=} (USR_SUB == VA_ROOT.parts[-1])."
            )

        estimate_button = st.form_submit_button("Calculate Cost!")
        with st.expander(
            "LLM Judge에 활용되는 프롬프트 (`Calculate Cost!` 클릭 시 갱신)"
            if st.session_state.korean
            else "**Evaluation Prompt for LLM Judge (will refresh after `Calculate Cost!` clicked)**"
        ):
            prompt = load_prompt(promptname, task="-")
            kwargs = dict(
                inst="{inst}",
                src="{src}",
                out_a="{out_a}",
                out_b="{out_b}",
                task="-",
            )
            if promptname == "translation_pair":
                kwargs["source_lang"] = "{source_lang}"
                kwargs["target_lang"] = "{target_lang}"
            prompt_cmpl = prompt.complete_prompt(**kwargs)

            st.markdown(f"### Evaluation Prompt: {promptname}")
            for msg in prompt_cmpl:
                st.markdown(f"**{msg['role']}**")
                st.info(show_linebreak_in_md(escape_markdown(msg["content"])))

        if estimate_button:
            if st.session_state.get("upfiles_dir") is None:
                st.error(
                    "❌ Requirements: You have to upload jsonlines files first to proceed"
                )
            else:
                st.markdown("##### Estimated Cost")
                dummy_api_key = "dummy"
                dummy_exp_name = "dummy"
                result_file_path, return_code = run_varco_arena(
                    # upload_dir=st.session_state.upfiles_dir,
                    promptname=promptname,
                    api_key=dummy_api_key,
                    exp_name=dummy_exp_name,
                    price_estimation=True,
                    evaluation_model=eval_model,
                )
                if return_code:
                    st.error(
                        "❌ RuntimeError: An error occurred during cost estimation. **Restart from file upload!**"
                    )
                    purge_user_sub_data(data_path_to_purge=VA_ROOT)

                else:
                    st.success("✅ Cost estimation completed successfully")
                    st.session_state.cost_estimated = True

    # Form for actual run
    with st.form("run_arena_form"):
        if st.session_state.korean:
            st.write("### 3. Varco Arena 구동하기")
        else:
            st.write("### 3. Run Varco Arena")
        api_key = st.text_input("Enter your OpenAI API Key", type="password")

        # demo exp name fixated
        KST = timezone(timedelta(hours=9))
        tstamp = datetime.now(KST)
        tstr = tstamp.strftime("%m-%d_%H:%M:%S")
        exp_name = f"{tstr}_KST_submit"

        if st.session_state.korean:
            st.write("**주의**: `Ctrl+C` 버튼은 구현되지 않았습니다. 구동 전 숙고해주세요.")
        else:
            st.write("**Caution: `Ctrl+C` button hasn't been implemented.**")
        run_button = st.form_submit_button(
            "🔥 Run Arena!",
            disabled=(not st.session_state.get("cost_estimated", False))
            or "result_file_path"
            in st.session_state.keys(),  # run already performed once
        )

        if run_button:
            set_nav_bar(
                True,
                sidebar_placeholder=sidebar_placeholder,
                toggle_hashstr="app_during_run",
            )
            if st.session_state.get("upfiles_dir") is None:
                st.error(
                    "❌ Requirements: You have to upload jsonlines files first to proceed"
                )
            elif not api_key:
                st.error("❌ Requirements: OpenAI key required to run VA.")
            else:
                result_file_path, return_code = run_varco_arena(
                    # upload_dir=st.session_state.upfiles_dir,
                    promptname=promptname,
                    api_key=api_key,
                    exp_name=exp_name,
                    price_estimation=False,
                    evaluation_model=eval_model,
                )
                if return_code:
                    st.error(
                        "❌ RuntimeError: An error occurred during the Varco Arena run. Check the file and **restart from file upload!**"
                    )
                    purge_user_sub_data(data_path_to_purge=VA_ROOT)

                else:
                    st.success("✅ Varco Arena run completed successfully")
                    st.session_state.result_file_path = list(
                        result_file_path.glob("**/result.json")
                    )[-1]
            set_nav_bar(
                False,
                sidebar_placeholder=sidebar_placeholder,
                toggle_hashstr="app_run_done",
            )

    if st.session_state.get("result_file_path", None) is not None:
        print(f"{st.session_state.get('result_file_path', None)=}")
        load_and_cache_data(result_file_path=str(st.session_state.result_file_path))


if __name__ == "__main__":
    main()
```
eval_models_list.txt
ADDED
@@ -0,0 +1,3 @@

```
gpt-4o-mini
gpt-4o-2024-05-13
gpt-4o-2024-08-06
```
eval_prompt_list.txt
ADDED
@@ -0,0 +1,3 @@

```
llmbar
translation_pair
rag_pair_kr
```
guide_mds/input_jsonls_en.md
ADDED
@@ -0,0 +1,37 @@

#### \[EN\] Upload guide (`jsonl`)
**Basic Requirements**
* Upload one `jsonl` file per model (e.g., five files to compare five LLMs)
* ⚠️ Important: All `jsonl` files must have the same number of rows
* ⚠️ Important: The `model_id` field must be unique within and across all files

**Required Fields**
* Per-model fields
  * `model_id`: Unique identifier for the model (recommendation: keep it short)
  * `generated`: The LLM's response to the test instruction

* Required only for translation (the `translation_pair` prompt needs these; see `streamlit_app_local/user_submit/mt/llama5.jsonl`)
  * `source_lang`: input language (e.g. Korean, KR, kor, ...)
  * `target_lang`: output language (e.g. English, EN, ...)

* Common fields (must be identical across all files)
  * `instruction`: The input prompt or test instruction given to the model
  * `task`: Category label used to group results (useful when using different evaluation prompts per task)

**Example Format**
```python
# model1.jsonl
{"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
{"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}

# model2.jsonl
{"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
{"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
...
..
.
```
**Use Case Example**
If you want to compare different prompting strategies for the same model:
* Use the same `instruction` across files (using unified test scenarios).
* The `generated` responses of each prompting strategy will vary across the files.
* Use descriptive `model_id` values like "prompt1", "prompt2", etc.
guide_mds/input_jsonls_kr.md
ADDED
@@ -0,0 +1,34 @@

#### \[KR\] 집어넣을 jsonl 파일 가이드
비교할 모델이 다섯 개라면 다섯 개의 .jsonl 파일을 업로드하세요.
* 🔥 모든 jsonl은 같은 수의 행을 가져야 합니다.
* 🔥 `model_id` 필드는 파일마다 달라야 하며 파일 내에서는 유일해야 합니다.

**jsonl 필수 필드**
* 개별
  * `model_id`: 평가받는 모델의 이름입니다. (짧게 쓰는 것 추천)
  * `generated`: 모델이 testset instruction 에 생성한 응답을 넣으세요.

* 번역평가 프롬프트 사용 시 (`translation_pair`. `streamlit_app_local/user_submit/mt/llama5.jsonl` 에서 예시를 볼 수 있음)
  * `source_lang`: input language (e.g. Korean, KR, kor, ...)
  * `target_lang`: output language (e.g. English, EN, ...)

* 공통 부분 (**모든 파일에 대해 같아야 함**)
  * `instruction`: 모델에 집어넣는 `testset instruction` 혹은 `input`에 해당하는 무언가입니다.
  * `task`: 전체 결과를 subset으로 그룹지어서 보여줄 때 사용됩니다. `evaluation prompt`를 태스크별로 다르게 사용하고 싶을 때 활용될 수 있습니다.


각 jsonl 파일은 아래처럼 생겼습니다.
```python
# model1.jsonl
{"model_id": "모델1", "task": "길 묻기", "instruction": "어디로 가야하오", "generated": "저기로요"}
{"model_id": "모델1", "task": "산수", "instruction": "1+1", "generated": "2"} # 길 묻기와 산수의 경우 다른 평가 프롬프트를 사용하고 싶을 수 있겠죠?

# model2.jsonl - model1.jsonl과 `instruction`은 같고 `generated`, `model_id` 는 다릅니다!
{"model_id": "모델2", "task": "길 묻기", "instruction": "어디로 가야하오", "generated": "북쪽으로"}
{"model_id": "모델2", "task": "산수", "instruction": "1+1", "generated": "3"}

...
..
```

예를 들어, 한 가지 모델에 대해 다른 프롬프팅을 시도하여 다른 생성문을 얻었고 이를 비교하고 싶은 경우를 생각해봅시다. 이 때 평가받을 testset은 같으므로 `instruction`은 모두 같고 프롬프팅에 따라 `generated`는 달라지겠죠? `model_id` 는 `"prompt1"`, `"prompt2"` 등 취향에 맞게 적어주시면 됩니다.
modules/__pycache__/nav.cpython-311.pyc
ADDED
Binary file (3.8 kB).
modules/nav.py
ADDED
@@ -0,0 +1,57 @@

```python
import streamlit as st


def Navbar(sidebar_placeholder, toggle_hashstr: str = ""):
    with sidebar_placeholder.container():
        st.toggle(
            "🇰🇷 한국어로 보기",
            value=st.session_state.get("korean", False),
            key=f"korean_toggle_{toggle_hashstr}",  # Add this explicit key
            on_change=lambda: setattr(
                st.session_state,
                "korean",
                st.session_state.get(f"korean_toggle_{toggle_hashstr}", False),
            ),
        )

        st.page_link(
            "app.py",
            label="Varco Arena 구동" if st.session_state.korean else "Run VARCO Arena",
            icon="🔥",
        )
        st.page_link(
            "pages/see_results.py",
            label="결과 보기" if st.session_state.korean else "See Results",
            icon="📊",
            disabled=st.session_state.get("is_running", False),
        )
        st.page_link(
            "pages/brief_intro.py",
            label="어떻게 작동하나요?" if st.session_state.korean else "How it Works",
            icon="❓",
            disabled=st.session_state.get("is_running", False),
        )
        st.page_link(
            "pages/quick_start_guide.py",
            label="본격 사용하기 가이드" if st.session_state.korean else "Quick Start Guide",
            icon="🎯",
            disabled=st.session_state.get("is_running", False),
        )
        related_links_en = """
**About**
* [Paper](https://huggingface.co/papers/2411.01281)
* [Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
* [Inquiry](https://linkedin.com/in/deftson/)
""".strip()
        related_links_kr = """
**About**
* [논문](https://huggingface.co/papers/2411.01281)
* [블로그](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
* [문의](https://linkedin.com/in/deftson/)
""".strip()
        st.info(related_links_kr if st.session_state.korean else related_links_en)

        st.divider()
        demo_warning_kr = "❗❗**본 데모에서는 새로고침 후 평가 결과가 보존되지 않습니다**❗❗ 앱을 온전히 활용하시려면 개인 기기에서 호스팅하세요 (**🎯본격 사용하기 가이드** 참조)"
        demo_warning_en = "❗❗For this demo, **evaluation results will not be preserved after refreshing**❗❗ To fully utilize the app, please host it on your personal device (refer to **🎯Quick Start Guide**)"
        st.markdown(demo_warning_kr if st.session_state.korean else demo_warning_en)
```
pages/__pycache__/see_results.cpython-311.pyc
ADDED
Binary file (26.3 kB).
pages/brief_intro.py
ADDED
@@ -0,0 +1,40 @@

```python
import streamlit as st

from view_utils import default_page_setting, set_nav_bar

sidebar_placeholder = default_page_setting()
set_nav_bar(
    False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="brief_intro_init"
)

st.image("va_en.png")
if st.session_state.korean:
    st.info("여러분이 업로드하는 파일은 위 그림의 LLM 출력들입니다.")
    with st.expander("❓❔ 무엇을 업로드 하나요 ❔❓"):
        st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
else:
    st.info(
        "What you upload is the generated responses of LLMs regarding the test prompts."
    )
    with st.expander("❓❔ What should I upload ❔❓"):
        st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())


st.image("va_concept_new.png")
st.markdown(
    """
| |Current Practice|Varco Arena|
|-|-|-|
|Total no. matches|$$n_{\\text{model}}*\\|X\\|$$|$$(n_{\\text{model}}-1)*\\|X\\|$$|
|No. matches per LLM|$$\\|X\\|$$|$$\\left[\\|X\\|,\\|X\\|\\text{log}n_{\\text{model}}\\right]$$|
|Comparison type|reference-mediated (anchored) |direct|
"""
)
if st.session_state.korean:
    st.info(
        "Varco Arena는 신뢰성 있는 순위를 더 적은 횟수의 비교 내에 얻어내며, 이러한 특징은 LLM 직접 비교의 이점으로부터 기인합니다."
    )
else:
    st.info(
        "Varco Arena takes advantage of direct comparison between LLM responses to guarantee better reliability in a smaller number of total matches."
    )
```
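The match counts in the table above are easy to check numerically. A small sketch, assuming $|X| = 100$ test prompts, 5 models, and base-2 log for the tournament-depth upper bound (the base is an assumption; the table writes only $\log n_{\text{model}}$):

```python
import math

n_model = 5  # number of LLMs compared
X = 100      # size of the test set |X|

current_practice_total = n_model * X   # every model judged against a fixed reference
varco_arena_total = (n_model - 1) * X  # one single-elimination tournament per test prompt

# Per-LLM match counts: every model plays at least |X| matches (one per tournament);
# a model that keeps winning plays up to |X| * log2(n_model) matches.
per_llm_lower = X
per_llm_upper = X * math.log2(n_model)

print(current_practice_total, varco_arena_total)  # 500 400
print(per_llm_lower, round(per_llm_upper, 1))     # 100 232.2
```

So with five models, the tournament scheme spends 100 fewer judge calls in total while each model is still compared directly between 100 and ~232 times.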
pages/quick_start_guide.py
ADDED
@@ -0,0 +1,14 @@

```python
import streamlit as st

from view_utils import default_page_setting, set_nav_bar

sidebar_placeholder = default_page_setting()
set_nav_bar(
    False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="quickstart_init"
)


if st.session_state.korean:
    st.markdown(open("varco_arena/README_kr.md").read())
else:
    st.markdown(open("varco_arena/README_en.md").read())
```
pages/see_results.py
ADDED
@@ -0,0 +1,365 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import pandas as pd
|
2 |
+
import streamlit as st
|
3 |
+
|
4 |
+
import analysis_utils as au
|
5 |
+
from analysis_utils import number_breakdown_from_df
|
6 |
+
from app import load_and_cache_data
|
7 |
+
|
8 |
+
# from app import VA_ROOT
|
9 |
+
from query_comp import QueryWrapper, get_base_url
|
10 |
+
from varco_arena.varco_arena_core.prompts import load_prompt
|
11 |
+
from view_utils import (
|
12 |
+
default_page_setting,
|
13 |
+
escape_markdown,
|
14 |
+
set_nav_bar,
|
15 |
+
show_linebreak_in_md,
|
16 |
+
)
|
17 |
+
|
18 |
+
DEFAULT_LAYOUT_DICT = {
|
19 |
+
"title": {"font": {"size": 20, "family": "Gothic A1"}},
|
20 |
+
"font": {"size": 16, "family": "Gothic A1"},
|
21 |
+
"xaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
|
22 |
+
"yaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
|
23 |
+
"legend": {"font": {"size": 12, "family": "Gothic A1"}},
|
24 |
+
}
|
25 |
+
|
26 |
+
|
27 |
+
def navigate(t, source, key, val):
|
28 |
+
# print(key, val)
|
29 |
+
if source is None:
|
30 |
+
return
|
31 |
+
target_index = t.index(source) + val
|
32 |
+
if 0 <= target_index < len(t):
|
33 |
+
st.session_state[key] = t[target_index]
|
34 |
+
st.rerun()
|
35 |
+
|
36 |
+
|
37 |
+
def main():
|
38 |
+
sidebar_placeholder = default_page_setting(layout="wide")
|
39 |
+
set_nav_bar(
|
40 |
+
False,
|
41 |
+
sidebar_placeholder=sidebar_placeholder,
|
42 |
+
toggle_hashstr="see_results_init",
|
43 |
+
)
|
44 |
+
|
45 |
+
# load the data
|
46 |
+
# print(f"{st.session_state.get('result_file_path', None)=}")
|
47 |
+
most_recent_run = st.session_state.get("result_file_path", None)
|
48 |
+
most_recent_run = str(most_recent_run) if most_recent_run is not None else None
|
49 |
+
(
|
50 |
+
st.session_state["all_result_dict"],
|
51 |
+
st.session_state["df_dict"],
|
52 |
+
) = load_and_cache_data(result_file_path=most_recent_run)
|
53 |
+
|
54 |
+
# side bar
|
55 |
+
st.sidebar.title("Select Result:")
|
56 |
+
    result_select = QueryWrapper("expname")(
        st.sidebar.selectbox,
        list(st.session_state["all_result_dict"].keys()),
    )

    if result_select is None:
        st.stop()
    eval_prompt_name = result_select.split("/")[-1].strip()
    if st.sidebar.button("Clear Cache"):
        st.cache_data.clear()
        st.cache_resource.clear()
        st.rerun()

    if result_select:
        if "alpha2names" in st.session_state:
            del st.session_state["alpha2names"]

        fig_dict_per_task = st.session_state["all_result_dict"][result_select]
        task_list = list(fig_dict_per_task.keys())
        elo_rating_by_task = fig_dict_per_task["Overall"]["elo_rating_by_task"]
        # tabs = st.tabs(task_list)

        df_dict_per_task = st.session_state["df_dict"][result_select]

        default_layout_dict = DEFAULT_LAYOUT_DICT
        task = QueryWrapper("task", "Select Task")(st.selectbox, task_list)

        if task is None:
            st.stop()
        figure_dict = fig_dict_per_task[task]
        judgename = figure_dict["judgename"]
        df = df_dict_per_task[task]
        interpretation, n_models, size_testset = number_breakdown_from_df(df)
        if st.session_state.korean:
            st.markdown(f"## 결과 ({task})")
            st.markdown(f"##### Judge 모델: {judgename} / 평가프롬: {eval_prompt_name}")
            st.markdown(f"##### 테스트셋 사이즈: {int(size_testset)} 행")
        else:
            st.markdown(f"## Results ({task})")
            st.markdown(f"##### Judge Model: {judgename} / prompt: {eval_prompt_name}")
            st.markdown(f"##### Size of Testset: {int(size_testset)} rows")
        col1, col2 = st.columns(2)
        with col1:
            with st.container(border=True):
                st.markdown(f"#### Ratings ({task})")
                st.table(figure_dict["elo_rating"])
                st.write(show_linebreak_in_md(escape_markdown(interpretation)))

        with col2:
            with st.container(border=True):
                st.plotly_chart(
                    elo_rating_by_task.update_layout(**default_layout_dict),
                    use_container_width=True,
                    key=f"{task}_elo_rating_by_task",
                )

        st.divider()
        if st.session_state.korean:
            st.markdown("### 토너먼트 (테스트 시나리오) 별로 보기")
        else:
            st.markdown("### Tournament Results by Test Scenario")

        # with st.expander("볼 토너먼트 고르기"):
        d = list(df.idx_inst_src.unique())
        default_idx = st.session_state.get("selected_tournament", None)
        cols = st.columns((1, 18, 1))
        with cols[0]:
            if st.button("◀", key="prev_tournament"):
                navigate(d, default_idx, "selected_tournament", -1)
        with cols[1]:
            tournament_prm_select = QueryWrapper("tournament", "Select Tournament")(
                st.selectbox,
                d,
                default_idx,
                key=f"{task}_tournament_select",
                on_change=lambda: st.session_state.update(
                    selected_tournament=st.session_state.get(f"{task}_tournament_select"),
                    selected_match=None,
                ),
                label_visibility="collapsed",
            )
        with cols[2]:
            if st.button("▶", key="next_tournament"):
                navigate(d, default_idx, "selected_tournament", 1)

        st.session_state["selected_tournament"] = tournament_prm_select
        df_now_processed = None
        if tournament_prm_select:
            df_now = df[df.idx_inst_src == tournament_prm_select]
            df_now_processed, _alpha2names = au.init_tournament_dataframe(
                df_now,
                alpha2names=st.session_state["alpha2names"]
                if "alpha2names" in st.session_state.keys()
                else None,
            )
            if "alpha2names" not in st.session_state:
                st.session_state["alpha2names"] = _alpha2names

            try:
                bracket_drawing = au.draw(
                    df_now_processed,
                    alpha2names=st.session_state["alpha2names"],
                )
                legend = au.make_legend_str(
                    df_now_processed, st.session_state["alpha2names"]
                )
                st.code(bracket_drawing + legend)

                m = list(df_now_processed.human_readable_idx)
                default_idx = st.session_state.get("selected_match", None)
                cols = st.columns((1, 18, 1))
                with cols[0]:
                    if st.button("◀", key="prev_match"):
                        navigate(m, default_idx, "selected_match", -1)
                with cols[1]:
                    match_idx_human = QueryWrapper("match", "Select Match")(
                        st.selectbox,
                        m,
                        default_idx,
                        key=f"{task}_match_select",
                        label_visibility="collapsed",
                    )
                with cols[2]:
                    if st.button("▶", key="next_match"):
                        navigate(m, default_idx, "selected_match", 1)

                st.session_state["selected_match"] = match_idx_human
                if match_idx_human:
                    match_idx = int(match_idx_human.split(": ")[0])
                    row = df_now_processed.loc[match_idx]

                    st.markdown("#### Current Test Scenario:")

                    with st.expander(
                        f"### Evaluation Prompt (evalprompt: {eval_prompt_name}--{task})"
                    ):
                        prompt = load_prompt(eval_prompt_name, task=task)
                        kwargs = dict(
                            inst="{inst}",
                            src="{src}",
                            out_a="{out_a}",
                            out_b="{out_b}",
                            task=task,
                        )
                        if eval_prompt_name == "translation_pair":
                            kwargs["source_lang"] = "{source_lang}"
                            kwargs["target_lang"] = "{target_lang}"
                        prompt_cmpl = prompt.complete_prompt(**kwargs)
                        for msg in prompt_cmpl:
                            st.markdown(f"**{msg['role']}**")
                            st.info(show_linebreak_in_md(escape_markdown(msg["content"])))

                    st.info(show_linebreak_in_md(tournament_prm_select))

                    winner = row.winner
                    col1, col2 = st.columns(2)

                    winnerbox = st.success
                    loserbox = st.error
                    with col1:
                        iswinner = winner == "model_a"
                        writemsg = winnerbox if iswinner else loserbox
                        st.markdown(f"#### ({row.model_a}) {row.human_readable_model_a}")
                        writemsg(
                            show_linebreak_in_md(row.generated_a),
                            icon="✅" if iswinner else "❌",
                        )
                    with col2:
                        iswinner = winner == "model_b"
                        writemsg = winnerbox if iswinner else loserbox
                        st.markdown(f"#### ({row.model_b}) {row.human_readable_model_b}")
                        writemsg(
                            show_linebreak_in_md(row.generated_b),
                            icon="✅" if iswinner else "❌",
                        )
            except Exception as e:
                import traceback

                traceback.print_exc()
                st.markdown(
                    "**Bug: 아래 표를 복사해서 이슈로 남겨주시면 개선에 도움이 됩니다. 감사합니다🙏**"
                    if st.session_state.korean
                    else "Bug: Please open issue and attach the table output below to help me out. Thanks in advance.🙏"
                )
                st.error(e)
                st.info(tournament_prm_select)
                st.table(
                    df_now_processed[
                        [
                            "depth",
                            "round",
                            "winner_nodes",
                            "winner_resolved",
                            "winner",
                            "model_a",
                            "model_b",
                        ]
                    ]
                )
        st.write("Sharable link")
        st.code(f"{get_base_url()}/see_results?{QueryWrapper.get_sharable_link()}")
        st.divider()
        if st.session_state.korean:
            st.markdown("### 매치 통계")
        else:
            st.markdown("### Match Stats.")
        col1, col2 = st.columns(2)
        with col1:
            with st.container(border=True):
                st.plotly_chart(
                    figure_dict[
                        "fraction_of_model_a_wins_for_all_a_vs_b_matches"
                    ].update_layout(autosize=True, **default_layout_dict),
                    use_container_width=True,
                    key=f"{task}_fraction_of_model_a_wins_for_all_a_vs_b_matches",
                )
        with col2:
            with st.container(border=True):
                st.plotly_chart(
                    figure_dict["match_count_of_each_combination_of_models"].update_layout(
                        autosize=True, **default_layout_dict
                    ),
                    use_container_width=True,
                    key=f"{task}_match_count_of_each_combination_of_models",
                )

        with col1:
            with st.container(border=True):
                st.plotly_chart(
                    figure_dict["match_count_for_each_model"].update_layout(
                        **default_layout_dict
                    ),
                    use_container_width=True,
                    key=f"{task}_match_count_for_each_model",
                )
        with col2:
            pass

        if st.session_state.korean:
            st.markdown("### 참고용 LLM Judge 편향 정보")
        else:
            st.markdown("### FYI: How biased is your LLM Judge?")

        with st.expander("펼쳐서 보기" if st.session_state.korean else "Expand to show"):
            st.info(
                """
Varco Arena에서는 position bias의 영향을 최소화하기 위해 모든 모델이 A나 B 위치에 번갈아 위치하도록 하였습니다. 그러나 LLM Judge 혹은 Prompt의 성능이 부족하다고 느껴진다면, 아래 알려진 LLM Judge bias가 참고가 될 겁니다.
* position bias (왼쪽)
* length bias (오른쪽)

결과의 왜곡이 LLM Judge의 부족함 때문이었다는 점을 규명하려면 사용하신 LLM Judge와 Prompt의 binary classification 정확도를 측정해보시길 바랍니다 (Varco Arena를 활용하여 이를 수행해볼 수 있습니다!).""".strip()
                if st.session_state.korean
                else """
In Varco Arena, to minimize the effect of position bias, all models are alternately positioned in either position A or B. However, if you feel the LLM Judge or Prompt performance is insufficient, the following known LLM Judge biases may be helpful to reference:
* position bias (left)
* length bias (right)

To determine if result distortion was due to LLM Judge limitations, please measure the binary classification accuracy of your LLM Judge and Prompt (You could use Varco Arena for this purpose!).
""".strip()
            )
            st.markdown(f"#### {judgename} + prompt = {eval_prompt_name}")
            col1, col2 = st.columns(2)
            with col1:
                with st.container(border=True):
                    st.plotly_chart(
                        figure_dict["counts_of_match_winners"].update_layout(
                            **default_layout_dict
                        ),
                        use_container_width=True,
                        key=f"{task}_counts_of_match_winners",
                    )
            with col2:
                with st.container(border=True):
                    st.plotly_chart(
                        figure_dict["length_bias"].update_layout(**default_layout_dict),
                        use_container_width=True,
                        key=f"{task}_length_bias",
                    )
                    st.table(figure_dict["length_bias_df"].groupby("category").describe().T)


if __name__ == "__main__":
    main()
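The prev/next (◀ / ▶) buttons above step through a list via a `navigate(...)` helper that is defined outside this diff. A minimal, hypothetical standalone sketch of that list-navigation pattern (names here are illustrative, not the real helper):

```python
def navigate_idx(options, current, step):
    """Move `step` positions through `options`, clamping at both ends."""
    if current not in options:
        # unknown or unset selection falls back to the first option
        return options[0]
    i = options.index(current) + step
    i = max(0, min(i, len(options) - 1))  # clamp instead of wrapping around
    return options[i]


print(navigate_idx(["a", "b", "c"], "b", 1))   # steps forward to "c"
print(navigate_idx(["a", "b", "c"], "a", -1))  # already at the start, stays "a"
```

The real helper additionally writes the new selection into `st.session_state` so the selectbox rerenders with it.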
precommit.sh
ADDED
@@ -0,0 +1,5 @@
# reformat everything first so the pre-commit hooks cannot fail at commit time
git add .
pre-commit run --all-files
# unstage the files again so that one can choose what actually gets committed
git restore --staged .
query_comp.py
ADDED
@@ -0,0 +1,137 @@
# Defines a container that automatically reads and writes the URL query params

from __future__ import annotations

import hashlib
import re
import urllib.parse
from typing import Callable, TypeVar

import streamlit as st

__all__ = ["QueryWrapper", "get_base_url"]

T = TypeVar("T")


def SHA1(msg: str) -> str:
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


def get_base_url():
    session = st.runtime.get_instance()._session_mgr.list_active_sessions()[0]
    return urllib.parse.urlunparse(
        [session.client.request.protocol, session.client.request.host, "", "", "", ""]
    )


class QueryWrapper:
    queries: dict[str, _QueryWrapper] = {}  # registry of every wrapped query

    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
        self.__wrapper = QueryWrapper.queries[query] = _QueryWrapper(
            query, label, use_hash
        )

    def __call__(self, *args, **kwargs):
        return self.__wrapper(*args, **kwargs)

    @classmethod
    def get_sharable_link(cls):
        return re.sub(
            "&+", "&", "&".join([str(v) for k, v in cls.queries.items()])
        ).strip("&")


class _QueryWrapper:
    ILLEGAL_CHARS = "&/=?"

    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
        self.query = query
        self.label = label or query
        self.use_hash = use_hash
        self.hash_table = {}
        self.key = None

    def __call__(
        self,
        base_container: Callable,
        legal_list: list[T],
        default: T | list[T] | None = None,
        *,
        key: str | None = None,
        **kwargs,
    ) -> T | list[T] | None:
        val_from_query = st.query_params.get_all(self.query.lower())
        legal = len(val_from_query) > 0
        self.key = key or self.label

        self.hash_table = {SHA1(str(v)): v for v in legal_list}

        # filter out illegal values
        if legal and legal_list:
            val_from_query = [v for v in val_from_query if v in self.hash_table]
        if legal:
            selected = [self.hash_table[v] for v in val_from_query]
        elif default:
            selected = default
        elif self.label in st.session_state:
            selected = st.session_state[self.label]
            if legal_list:
                if isinstance(selected, list):
                    selected = [v for v in selected if v in legal_list]
                elif selected not in legal_list:
                    selected = []
        else:
            selected = []
        if selected is None:
            pass
        elif len(selected) == 1 and base_container in [st.selectbox, st.radio]:
            selected = selected[0]
        if base_container == st.checkbox:
            selected = base_container(
                self.label,
                legal_list,
                index=legal_list.index(selected) if selected in legal_list else None,
                key=self.key,
                **kwargs,
            )
        elif base_container == st.multiselect:
            selected = base_container(
                self.label, legal_list, default=selected, key=self.key, **kwargs
            )
        elif base_container == st.radio:
            selected = base_container(
                self.label,
                legal_list,
                index=legal_list.index(selected) if selected in legal_list else None,
                key=self.key,
                **kwargs,
            )
        elif base_container == st.selectbox:
            selected = base_container(
                self.label,
                legal_list,
                index=legal_list.index(selected) if selected in legal_list else None,
                key=self.key,
                **kwargs,
            )
        else:
            selected = base_container(self.label, legal_list, key=self.key, **kwargs)
        return st.session_state[self.key]

    def __str__(self):
        selected = st.session_state.get(self.key, None)
        if isinstance(selected, str):
            return f"{self.query.lower()}={SHA1(selected)}"
        elif isinstance(selected, list):
            return "&".join([f"{self.query.lower()}={SHA1(str(v))}" for v in selected])
        else:
            return ""
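`query_comp.py` above never puts raw option values into the URL: each legal option is mapped to an 8-character SHA-1 prefix, the sharable link carries only that token, and decoding a link is a dict lookup against the rebuilt hash table. A self-contained sketch of that scheme (option names here are illustrative only):

```python
import hashlib


def sha1_8(msg: str) -> str:
    # same 8-char short hash as SHA1() in query_comp.py
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


# options are hashed once per render; the URL stores sha1_8(selection)
options = ["model_x", "model_y", "model_z"]  # illustrative option names
hash_table = {sha1_8(v): v for v in options}

token = sha1_8("model_y")          # what would appear in ?match=<token>
print(hash_table[token])           # decoding is a plain dict lookup
```

This keeps links short and free of characters like `&/=?` (the `ILLEGAL_CHARS` the class guards against), at the cost of links only being decodable while the same option list is loaded.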
requirements.txt
ADDED
@@ -0,0 +1,14 @@
openai>=1.17.0
munch
pandas
numpy
tqdm>=4.48.0
plotly
scikit-learn
kaleido
tiktoken>=0.7.0
pyyaml
transformers
streamlit>=1.40.2
openpyxl
git+https://github.com/shobrook/openlimit.git#egg=openlimit
run.sh
ADDED
@@ -0,0 +1,4 @@
#!/bin/bash
# must be run from here (cwd = ./streamlit_app_components/)
# VARCO_ARENA_RESULT_PATH=./user_submit streamlit run app.py
streamlit run app.py
streamlit_app_local/.streamlit/config.toml
ADDED
@@ -0,0 +1,4 @@
[client]
showSidebarNavigation = false
[server]
headless = true
streamlit_app_local/README.md
ADDED
@@ -0,0 +1,24 @@
# Varco Arena web app
```bash
cd ./streamlit_app_local/
bash run.sh
```

## view: Running
![run_page](./page_run.png)

## view: Results
![result_page1](./page_result_1.png)
![result_page2](./page_result_2.png)
![result_page3](./page_result_3.png)
![result_page4](./page_result_4.png)

## how to use
1. upload LLM outputs in JSON Lines format
2. check the cost estimation
3. run
4. go to the see-results tab

## example run results at...
- `streamlit_app_local/user_submit/`
- visible on the web if you `bash run.sh`
streamlit_app_local/__pycache__/analysis_utils.cpython-311.pyc
ADDED
Binary file (17.7 kB)
streamlit_app_local/__pycache__/analysis_utils.cpython-38.pyc
ADDED
Binary file (9.12 kB)
streamlit_app_local/__pycache__/app.cpython-311.pyc
ADDED
Binary file (15.9 kB)
streamlit_app_local/__pycache__/app.cpython-38.pyc
ADDED
Binary file (6.32 kB)
streamlit_app_local/__pycache__/query_comp.cpython-311.pyc
ADDED
Binary file (8 kB)
streamlit_app_local/__pycache__/view_utils.cpython-311.pyc
ADDED
Binary file (18.3 kB)
streamlit_app_local/__pycache__/view_utils.cpython-38.pyc
ADDED
Binary file (9.91 kB)
streamlit_app_local/analysis_utils.py
ADDED
@@ -0,0 +1,338 @@
1 |
+
from copy import deepcopy
|
2 |
+
from functools import partial
|
3 |
+
from typing import *
|
4 |
+
|
5 |
+
import pandas as pd
|
6 |
+
from fire import Fire
|
7 |
+
|
8 |
+
"""
|
9 |
+
This code assumes dealing with only one instruction
|
10 |
+
"""
|
11 |
+
|
12 |
+
|
13 |
+
# from varco_arena.tournament
|
14 |
+
def log2_power_of_two(n):
|
15 |
+
# First, let's make sure n is indeed a power of 2
|
16 |
+
if n & (n - 1) != 0 or n == 0:
|
17 |
+
raise ValueError("n must be a positive power of 2")
|
18 |
+
|
19 |
+
exponent = 0
|
20 |
+
while n > 1:
|
21 |
+
n >>= 1 # Right shift is like dividing by 2, but faster
|
22 |
+
exponent += 1
|
23 |
+
return exponent
|
24 |
+
|
25 |
+
|
26 |
+
def get_1st(df: pd.DataFrame, alpha2names: dict) -> Optional[str]:
|
27 |
+
finals = df[df["round"] == "final"]
|
28 |
+
if len(finals) == 1:
|
29 |
+
first = finals.iloc[0].winner_resolved
|
30 |
+
else:
|
31 |
+
first = (
|
32 |
+
None # error case (no finals match or multiple finals (buggy result file))
|
33 |
+
)
|
34 |
+
|
35 |
+
return first
|
36 |
+
|
37 |
+
|
38 |
+
def get_unique_participants(df: pd.DataFrame) -> list:
|
39 |
+
participants = pd.concat([df.model_a, df.model_b]).unique().tolist()
|
40 |
+
participants = [p for p in participants if p] # remove None
|
41 |
+
participants = sorted(participants) # make it sorted
|
42 |
+
return participants
|
43 |
+
|
44 |
+
|
45 |
+
def _impute_byes(df):
|
46 |
+
max_depth = df.depth.max()
|
47 |
+
|
48 |
+
# init
|
49 |
+
imputed_parts = dict()
|
50 |
+
for depth in range(max_depth + 1):
|
51 |
+
imputed_parts[depth] = df[df.depth == depth].copy()
|
52 |
+
|
53 |
+
# reverse
|
54 |
+
for depth in range(max_depth, 0, -1): # always we have 1 proper match for depth=0
|
55 |
+
null_v_null = {
|
56 |
+
"model_a": "",
|
57 |
+
"model_b": "",
|
58 |
+
"winner": "model_a",
|
59 |
+
"match_order_in_round": "-",
|
60 |
+
"depth": depth,
|
61 |
+
}
|
62 |
+
|
63 |
+
# fill some_model vs null byes
|
64 |
+
players = get_unique_participants(imputed_parts[depth])
|
65 |
+
proceeded = get_unique_participants(imputed_parts[depth - 1])
|
66 |
+
imputed = []
|
67 |
+
for p in proceeded:
|
68 |
+
if p not in players:
|
69 |
+
p_v_null = deepcopy(null_v_null)
|
70 |
+
p_v_null["model_a"] = p
|
71 |
+
imputed.append(p_v_null)
|
72 |
+
imputed_parts[depth] = pd.concat(
|
73 |
+
[
|
74 |
+
imputed_parts[depth],
|
75 |
+
pd.DataFrame(imputed),
|
76 |
+
],
|
77 |
+
axis="index",
|
78 |
+
)
|
79 |
+
|
80 |
+
# fill null vs null
|
81 |
+
n_null_v_null = 2 ** (depth) - len(imputed_parts[depth])
|
82 |
+
if n_null_v_null > 0:
|
83 |
+
imputed = pd.DataFrame([null_v_null] * n_null_v_null)
|
84 |
+
imputed_parts[depth] = pd.concat(
|
85 |
+
[
|
86 |
+
imputed_parts[depth],
|
87 |
+
imputed,
|
88 |
+
],
|
89 |
+
axis="index",
|
90 |
+
)
|
91 |
+
|
92 |
+
df_imputed = pd.concat(imputed_parts.values(), axis="index")
|
93 |
+
df_imputed = df_imputed.sort_values(by="depth").reset_index(drop=True)
|
94 |
+
|
95 |
+
return df_imputed
|
96 |
+
|
97 |
+
|
98 |
+
def index_test_scenario(df) -> pd.DataFrame:
|
99 |
+
df["inst_src"] = "inst: " + df.instruction + "\n\nsrc: " + df.source
|
100 |
+
df["idx_inst_src"] = df.apply(
|
101 |
+
lambda row: f"{row.tournament_idx}:\n{row.inst_src}", axis=1
|
102 |
+
)
|
103 |
+
|
104 |
+
# later used for tournament bracket backtrackiung
|
105 |
+
if "depth" not in df.columns:
|
106 |
+
mappings = {
|
107 |
+
"final": 0,
|
108 |
+
"semi-final": 1,
|
109 |
+
"quarter-final": 2,
|
110 |
+
}
|
111 |
+
|
112 |
+
def _convert_round_to_depth(rnd: str, mappings=None) -> int:
|
113 |
+
if rnd is None:
|
114 |
+
depth = None
|
115 |
+
elif rnd in mappings.keys():
|
116 |
+
depth = mappings[rnd]
|
117 |
+
elif rnd.startswith("round-"): # assume perfect power of two
|
118 |
+
num = int(rnd.replace("round-", "").strip())
|
119 |
+
depth = log2_power_of_two(num) - 1
|
120 |
+
return depth
|
121 |
+
|
122 |
+
conv = partial(_convert_round_to_depth, mappings=mappings)
|
123 |
+
df["depth"] = df["round"].apply(conv)
|
124 |
+
|
125 |
+
return df
|
126 |
+
|
127 |
+
|
128 |
+
def init_tournament_dataframe(df, alpha2names: dict = None) -> pd.DataFrame:
|
129 |
+
df = df.sort_values(by="depth").reset_index(drop=True)
|
130 |
+
# make winner interpretable (A -> model_a, B -> model_b)
|
131 |
+
df.winner = df.winner.apply(lambda txt: f"model_{txt.lower()}")
|
132 |
+
|
133 |
+
# define alpha2names if not given (covers upto 168 participants)
|
134 |
+
if alpha2names is None:
|
135 |
+
alphabets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\
|
136 |
+
abcdefghijklmnopqrstuvwxyz\
|
137 |
+
โโโโโโโโโโโโโโโโโ โกโขโฃโคโฅโฆโงโจโฉ\
|
138 |
+
ใ ใกใขใฃใคใฅใฆใงใจใฉใชใซใฌใญ\
|
139 |
+
ใฎใฏใฐใฑใฒใณใดใตใถใทใธในใบใป\
|
140 |
+
ใฑใดใทในใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\
|
141 |
+
ฮฮฮฮฮฮฮฮฮฮฮฮฮฮฮฮ ฮกฮฃฮคฮฅฮฆฮงฮจฮฉ\
|
142 |
+
ฮฑฮฒฮณฮดฮตฮถฮทฮธฮนฮบฮปฮผฮฝฮพฮฟฯฯฯฯฯ
ฯฯฯฯ"
|
143 |
+
model_full_names = get_unique_participants(df)
|
144 |
+
alpha2names = dict(zip(alphabets, model_full_names))
|
145 |
+
if len(alpha2names) < len(model_full_names):
|
146 |
+
raise ValueError(
|
147 |
+
f"Tournament viewer cannot visualize more than {len(alphabets)=} participants. ({len(model_full_names)=} is given)\n\nOther features will not be affected but the tournament visualizer."
|
148 |
+
)
|
149 |
+
names2alpha = dict(zip(alpha2names.values(), alpha2names.keys()))
|
150 |
+
df = _impute_byes(df)
|
151 |
+
|
152 |
+
# preserve readables for later
|
153 |
+
df = _make_readables(df, names2alpha)
|
154 |
+
|
155 |
+
if len(df[df["round"] == "final"]) != 1:
|
156 |
+
raise ValueError(f"final match need to be one and only.")
|
157 |
+
|
158 |
+
return df, alpha2names
|
159 |
+
|
160 |
+
|
161 |
+
def _make_readables(df, names2alpha):
|
162 |
+
df["human_readable_model_a"] = df.model_a.copy()
|
163 |
+
df["human_readable_model_b"] = df.model_b.copy()
|
164 |
+
|
165 |
+
df.model_a = df.model_a.apply(
|
166 |
+
lambda modelname: names2alpha[modelname] if modelname else "x"
|
167 |
+
)
|
168 |
+
df.model_b = df.model_b.apply(
|
169 |
+
lambda modelname: names2alpha[modelname] if modelname else "x"
|
170 |
+
)
|
171 |
+
|
172 |
+
df["human_readable_idx"] = df.apply(
|
173 |
+
lambda row: f"{row.name}: {row.human_readable_model_a} ({row.model_a}) vs. {row.human_readable_model_b} ({row.model_b if row.model_b else 'x'})",
|
174 |
+
axis=1,
|
175 |
+
)
|
176 |
+
|
177 |
+
df["winner_resolved"] = df.apply(lambda row: row[row.winner], axis=1)
|
178 |
+
df["winner_nodes"] = df.apply(
|
179 |
+
lambda row: f"{row.winner_resolved}:{row.name}".ljust(4, " "), axis=1
|
180 |
+
) # later for figure representation of winner as a "node"
|
181 |
+
return df
|
182 |
+
|
183 |
+
|
184 |
+
# draw
|
185 |
+
def draw(df: pd.DataFrame, alpha2names: dict = None) -> str:
|
186 |
+
def _draw_round(
|
187 |
+
df: pd.DataFrame,
|
188 |
+
depth: int = None,
|
189 |
+
winners_in_order: list = None,
|
190 |
+
) -> Tuple:
|
191 |
+
df_now = df[df.depth == depth]
|
192 |
+
max_depth = df.depth.max()
|
193 |
+
|
194 |
+
width = 2 ** ((max_depth - depth) + 2)
|
195 |
+
|
196 |
+
connect_left = "โ" * (width)
|
197 |
+
connect_left = connect_left[4:]
|
198 |
+
connect_right = " " * (width)
|
199 |
+
connect_right = "โ" + connect_right[1:]
|
200 |
+
|
201 |
+
if winners_in_order is None:
|
202 |
+
assert (
|
203 |
+
depth == 0
|
204 |
+
), f"{winners_in_order=} is only allowed when drawing the top (=final match)"
|
205 |
+
winners_in_order = df_now.winner_nodes
|
206 |
+
|
207 |
+
round_drawing_parts = []
|
208 |
+
descending_round_winners = []
|
209 |
+
for node in winners_in_order:
|
210 |
+
round_drawing_parts.append("".join([node, connect_left, connect_right]))
|
211 |
+
# next round winners in sync with winner order
|
212 |
+
row_now = df_now.query(f"winner_nodes=='{node}'")
|
213 |
+
descending_round_winners.append(row_now.model_a.item())
|
214 |
+
descending_round_winners.append(row_now.model_b.item())
|
215 |
+
|
216 |
+
# find descending_round_winners within winner_nodes format (num:alpha)
|
217 |
+
if depth == max_depth:
|
218 |
+
pass # keep the descending_round_winners intact
|
219 |
+
else:
|
220 |
+
df_descend = df[df.depth == depth + 1]
|
221 |
+
for i, winner_alpha in enumerate(descending_round_winners):
|
222 |
+
node_intr = df_descend.query(
|
223 |
+
f"winner_resolved=='{winner_alpha}'"
|
224 |
+
).winner_nodes.item()
|
225 |
+
descending_round_winners[i] = node_intr
|
226 |
+
|
227 |
+
round_drawing = "".join(round_drawing_parts)
|
228 |
+
descending_unit = " " * width
|
229 |
+
descending_unit = "โ" + descending_unit[1:]
|
230 |
+
descending_lines_parts = [descending_unit] * len(df_now) * 2
|
231 |
+
descending_lines = "".join(descending_lines_parts)
|
232 |
+
|
233 |
+
return round_drawing, descending_lines, descending_round_winners
|
234 |
+
|
235 |
+
drawings = []
|
236 |
+
winners_in_order = None
|
237 |
+
max_depth = df.depth.max()
|
238 |
+
for depth in range(max_depth + 1):
|
239 |
+
max_depth = df.depth.max()
|
240 |
+
winner_drw, lines_desc, winners_in_order = _draw_round(
|
241 |
+
df,
|
242 |
+
depth=depth,
|
243 |
+
winners_in_order=winners_in_order,
|
244 |
+
)
|
245 |
+
drawings.append((winner_drw, lines_desc))
|
246 |
+
# prepare bracket top
|
247 |
+
champion_alphabet = drawings[0][0].split()[0].split(":")[0]
|
248 |
+
champion_readable = alpha2names[champion_alphabet]
|
249 |
+
bracket_top = [f"๐ฅwinner: {champion_readable}", "โ"]
|
250 |
+
# prepare mid
|
251 |
+
bracket_mid = "\n".join(["\n".join(tup) for tup in drawings])
|
252 |
+
|
253 |
+
# prepare bot
|
254 |
+
initial_participants = winners_in_order
|
255 |
+
bracket_bot = (" " * 3).join(initial_participants)
|
256 |
+
|
257 |
+
full_figure = "\n".join(bracket_top + [bracket_mid, bracket_bot])
|
258 |
+
|
259 |
+
return full_figure
|
260 |
+
|
261 |
+
|
262 |
+
def number_breakdown_from_df(result_df: pd.DataFrame) -> str:
|
263 |
+
n_models = len(get_unique_participants(result_df))
|
264 |
+
size_testset = int(len(result_df) / (n_models - 1))
|
265 |
+
interpretation = f"total {len(result_df)} matches = (n_models-1) * size_testset = ({n_models}-1) * {size_testset}"
|
266 |
+
return interpretation, n_models, size_testset
|
267 |
+
|
268 |
+
|
269 |
+
def make_legend_str(df, alpha2names) -> str:
|
270 |
+
first = get_1st(df, alpha2names)
|
271 |
+
alpha2names = {k: v.replace("๐ฅ ", "") for k, v in alpha2names.items()}
|
272 |
+
alpha_ordered = sorted(list(alpha2names.keys()))
|
273 |
+
# names_ordered = sorted(list(alpha2names.values()))
|
274 |
+
# name2alpha = {v: k for k, v in alpha2names.items()}
|
275 |
+
|
276 |
+
for k, v in alpha2names.items():
|
277 |
+
if v == alpha2names[first]:
|
278 |
+            alpha2names[k] = f"🥇 {v}"
+    res_str = f"\n\nlegend:"
+    # for name in names_ordered:
+    #     alpha = name2alpha[name]
+    for alpha in alpha_ordered:
+        name_w_medal = alpha2names[alpha]
+        res_str += f"\n{alpha}\t{name_w_medal}"
+    return res_str
+
+
+def main(
+    jslname: str = "result.json",
+):
+    """
+    test code
+    """
+
+    df = pd.read_json(jslname, orient="records")
+    df = df.drop(columns=["tstamp", "logs"])
+    df = index_test_scenario(df)
+
+    # visualization(df) is omitted here. If you followed everything else and a problem
+    # still occurs, the viz is at fault (though that seems unlikely).
+    selections = df.idx_inst_src.unique()
+    for i, sel in enumerate(selections):
+        try:
+            df_now = df[df.idx_inst_src == sel]
+            df_now_processed, _alpha2names = init_tournament_dataframe(
+                df_now, alpha2names=alpha2names if "alpha2names" in dir() else None
+            )
+            if "alpha2names" not in dir():
+                alpha2names = _alpha2names
+            assert "alpha2names" in dir()
+            bracket_drawing = draw(df_now_processed, alpha2names=alpha2names)
+            legend = make_legend_str(df_now_processed, alpha2names)
+            print(bracket_drawing + legend)
+            print(bracket_drawing + legend, file=open(f"{i}.txt", "w"))
+            print(f"\n\n{sel}", file=open(f"{i}.txt", "a"))
+
+            for match_idx_human in df_now_processed.human_readable_idx:
+                match_idx = int(match_idx_human.split(": ")[0])
+                row = df_now_processed.loc[match_idx]
+                winner = row.winner
+        except Exception as e:
+            print(e, file=open(f"{i}_err.txt", "w"))
+            print("", file=open(f"{i}_err.txt", "a"))
+            print(sel, file=open(f"{i}_err.txt", "a"))
+            df_now_processed[
+                [
+                    "depth",
+                    "round",
+                    "winner_nodes",
+                    "winner_resolved",
+                    "winner",
+                    "model_a",
+                    "model_b",
+                ]
+            ].to_json(f"{i}_err.jsonl", lines=True, orient="records")
+
+
+if __name__ == "__main__":
+    Fire(main)
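The error dump above uses `to_json(..., lines=True, orient="records")`, i.e. JSON Lines: one JSON object per row, one per line. A minimal stdlib sketch of reading such a dump back (the sample rows below are made up for illustration; only the field names mirror the columns dumped above):

```python
import io
import json

# Two made-up rows in the same shape as the error dump above.
sample = io.StringIO(
    '{"depth": 0, "round": "final", "winner": "model_a", "model_a": "A", "model_b": "B"}\n'
    '{"depth": 1, "round": "semi", "winner": "model_b", "model_a": "C", "model_b": "B"}\n'
)

# JSON Lines: each line parses independently, so a partial file is still usable.
rows = [json.loads(line) for line in sample if line.strip()]
print(len(rows))  # 2
print(rows[0]["winner"])  # model_a
```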
streamlit_app_local/app.py
ADDED
@@ -0,0 +1,318 @@
+# import shutil
+import os
+import select
+import subprocess
+import sys
+import time
+from datetime import datetime, timedelta, timezone
+from pathlib import Path
+from typing import *
+
+import streamlit as st
+
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+from varco_arena_core.prompts import load_prompt
+
+from view_utils import (
+    default_page_setting,
+    escape_markdown,
+    set_nav_bar,
+    show_linebreak_in_md,
+)
+
+# import sys
+# print(sys.executable)
+
+
+VA_ROOT = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))
+USR_SUB = VA_ROOT.parts[-1]
+
+
+def upload_files(uploaded_files) -> Path:
+    # prep directory for user submission
+    user_sub_root = VA_ROOT
+    if user_sub_root.exists():
+        if not user_sub_root.is_dir():
+            raise ValueError(
+                f"{user_sub_root} file exists and is not a directory. Consider renaming it."
+            )
+    else:
+        user_sub_root.mkdir(parents=True)
+
+    KST = timezone(timedelta(hours=9))
+    tstamp = datetime.now(KST)
+    tstr = tstamp.strftime("%m-%d_%H:%M:%S")
+    files_dir_str = "./" + str(user_sub_root / tstr)
+    files_dir = Path(files_dir_str)
+    files_dir.mkdir(parents=True, exist_ok=True)
+    uploaded_files = list(uploaded_files)
+
+    if not uploaded_files:
+        st.warning("❌ No files to upload. Please drag/drop or browse files to upload.")
+    elif len(uploaded_files) < 2:
+        st.error("❌ You need at least 2 jsonlines files to properly run VA.")
+    else:  # properly uploaded
+        for file in uploaded_files:
+            # Create a path for the file in the server directory
+            file_path = files_dir / file.name
+
+            # Save the file to the server directory
+            with open(file_path, "wb") as f:
+                f.write(file.getbuffer())
+
+        jslfiles = list(files_dir.glob("*.jsonl"))
+        st.success(f"✅ Successfully uploaded {len(jslfiles)} jsonl files.")
+    return files_dir.resolve()
+
+
+def run_varco_arena(
+    price_estimation: bool = False,
+    # upload_dir: Union[str, Path] = None,
+    promptname: str = None,
+    exp_name: str = None,
+    api_key: Optional[str] = None,
+    evaluation_model: str = "gpt-4o-mini",
+    update_interval: float = 1.0,
+):
+    # Use environment variable for API key
+    ptn = f"{str(st.session_state.upfiles_dir)}"
+    outdir = Path(ptn)
+    if exp_name:
+        outdir = outdir / exp_name
+
+    command = f"python ../varco_arena/main.py -i {ptn} -o {outdir} -k {api_key} -p {promptname} -e {evaluation_model} -j 64"
+    if price_estimation:
+        command = f"{command} -c"
+    else:
+        command = command.replace("python", "yes | python ")
+    print(command)
+
+    api_key = None  # clear immediately
+
+    process = subprocess.Popen(
+        command,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
+        stdin=subprocess.PIPE,
+        text=True,
+        bufsize=1,
+        shell=True,
+    )
+
+    # Set stdout to non-blocking mode
+    os.set_blocking(process.stdout.fileno(), False)
+
+    last_update_time = time.time()
+    terminal_output = st.empty()
+    full_output = f"{command}\n"
+    while True:
+        # Check if we have output to read
+        if select.select([process.stdout], [], [], 0)[0]:
+            output = process.stdout.readline()
+            if output:
+                full_output += output
+                if price_estimation:
+                    to_show = full_output
+                    terminal_output.code(to_show, language="bash")
+                else:
+                    current_time = time.time()
+                    if current_time - last_update_time > update_interval:
+                        lines = full_output.split("\n")
+                        if len(lines) < 5:
+                            to_show = full_output
+                        else:
+                            to_show = "\n".join(["...\n..\n.\n"] + lines[-5:])
+                        terminal_output.code(to_show, language="bash")
+                        last_update_time = current_time
+                print(output)
+                time.sleep(0.1)
+        # Check if the process has finished
+        if process.poll() is not None:
+            # Read any remaining output
+            remaining_output = process.stdout.read()
+            if remaining_output:
+                lines = remaining_output.split("\n")
+                if len(lines) > 10:
+                    to_show += "\n".join(["\n...\n..\n.\n"] + lines[-10:])
+                else:
+                    to_show += remaining_output
+                terminal_output.code(to_show, language="bash")
+                print(remaining_output)
+            break
+
+    return_code = process.poll()
+    return outdir, return_code
+
+
+def main():
+    # init lang
+    st.session_state["korean"] = st.session_state.get("korean", False)
+
+    sidebar_placeholder = default_page_setting()
+    set_nav_bar(
+        False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_init"
+    )
+
+    st.title("⚔️ VARCO ARENA ⚔️")
+    if st.session_state.korean:
+        st.write(
+            "**VARCO Arena는 각 모델의 생성된 결과를 비교 평가하여 모델의 성능 순위를 제공하는 시스템입니다. 커스텀 테스트셋에 유용하게 활용할 수 있으며, 모범답안을 필요로 하지 않습니다.**"
+        )
+    else:
+        st.write(
+            "**VARCO Arena is an LLM benchmarking system that compares model responses across customized test scenarios without requiring reference answers.**"
+        )
+
+    st.divider()
+    # Set up the file uploader
+    if st.session_state.korean:
+        st.markdown("### 1. 모델 출력파일 업로드")
+    else:
+        st.markdown("### 1. Upload LLM responses")
+    uploaded_files = st.file_uploader(
+        "Drag and Drop jsonlines files (.jsonl)", accept_multiple_files=True
+    )
+
+    # upload state
+    if "upfiles_dir" not in st.session_state:
+        st.session_state.upfiles_dir = None
+    if st.button("Upload Files"):
+        st.session_state.upfiles_dir = upload_files(uploaded_files)
+        # st.success(st.session_state.upfiles_dir)
+
+    # st.markdown("**🔥주의: 중복된 테스트 시나리오는 오류로 처리됩니다🔥**")
+    if st.session_state.korean:
+        with st.expander("❓❔ 무엇을 업로드 하나요 ❔❓"):
+            st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
+    else:
+        with st.expander("❓❔ What should I upload ❔❓"):
+            st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
+
+    # Form for cost estimation
+    with st.form("cost_estimation_form"):
+        if st.session_state.korean:
+            st.write("### 2. 가격 산정")
+        else:
+            st.write("### 2. Cost Estimation")
+        eval_model = st.selectbox(
+            "Select Judge",
+            open("eval_models_list.txt", encoding="UTF8").read().split("\n"),
+        )
+        promptname = st.selectbox(
+            "Select Evaluation Prompt",
+            open("eval_prompt_list.txt", encoding="UTF8").read().split("\n"),
+        )
+        if promptname == USR_SUB:
+            raise ValueError(
+                f"{USR_SUB=} is a reserved name for the system. Consider another naming for the prompt or consider changing {VA_ROOT=} (USR_SUB == VA_ROOT.parts[-1])."
+            )
+        estimate_button = st.form_submit_button("Calculate Cost!")
+        with st.expander(
+            "LLM Judge에 활용되는 프롬프트 (`Calculate Cost!` 클릭시 갱신)"
+            if st.session_state.korean
+            else "**Evaluation Prompt for LLM Judge (will refresh after `Calculate Cost!` clicked)**"
+        ):
+            prompt = load_prompt(promptname, task="-")
+            kwargs = dict(
+                inst="{inst}",
+                src="{src}",
+                out_a="{out_a}",
+                out_b="{out_b}",
+                task="-",
+            )
+            if promptname == "translation_pair":
+                kwargs["source_lang"] = "{source_lang}"
+                kwargs["target_lang"] = "{target_lang}"
+            prompt_cmpl = prompt.complete_prompt(**kwargs)
+
+            st.markdown(f"### Evaluation Prompt: {promptname}")
+            for msg in prompt_cmpl:
+                st.markdown(f"**{msg['role']}**")
+                st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
+
+    if estimate_button:
+        if st.session_state.get("upfiles_dir") is None:
+            st.error(
+                "❌ Requirements: You have to upload jsonlines files first to proceed"
+            )
+        else:
+            st.markdown("##### Estimated Cost")
+            dummy_api_key = "dummy"
+            dummy_exp_name = "dummy"
+            result_file_path, return_code = run_varco_arena(
+                # upload_dir=st.session_state.upfiles_dir,
+                promptname=promptname,
+                api_key=dummy_api_key,
+                exp_name=dummy_exp_name,
+                price_estimation=True,
+                evaluation_model=eval_model,
+            )
+            if return_code:
+                st.error("❌ RuntimeError: An error occurred during cost estimation")
+            else:
+                st.success("✅ Cost estimation completed successfully")
+                st.session_state.cost_estimated = True
+
+    # Form for actual run
+    with st.form("run_arena_form"):
+        if st.session_state.korean:
+            st.write("### 3. Varco Arena 구동하기")
+        else:
+            st.write("### 3. Run Varco Arena")
+        api_key = st.text_input("Enter your OpenAI API Key", type="password")
+        exp_name = st.text_input("(Optional) Enter Exp. name")
+        exp_name = exp_name.replace(
+            "..", "_"
+        )  # May cause rmtree problem later. Block it.
+        exp_name = exp_name.replace(
+            USR_SUB, f"-{USR_SUB}-"
+        )  # May cause rmtree problem later. Block it.
+        exp_name = exp_name.replace("/", "-")
+        exp_name = exp_name.replace(" ", "_")
+        exp_name = exp_name.replace("~", "_")
+
+        if st.session_state.korean:
+            st.write("**주의**: `Ctrl+C` 버튼은 구현되지 않았습니다. 구동 전 숙고해주세요.")
+        else:
+            st.write("**Caution: `Ctrl+C` button hasn't been implemented.**")
+        run_button = st.form_submit_button(
+            "🔥 Run Arena!",
+            disabled=(not st.session_state.get("cost_estimated", False))
+            or "result_file_path"
+            in st.session_state.keys(),  # run already performed once
+        )
+
+    if run_button:
+        set_nav_bar(
+            True,
+            sidebar_placeholder=sidebar_placeholder,
+            toggle_hashstr="app_during_run",
+        )
+        if st.session_state.get("upfiles_dir") is None:
+            st.error(
+                "❌ Requirements: You have to upload jsonlines files first to proceed"
+            )
+        elif not api_key:
+            st.error("❌ Requirements: OpenAI key required to run VA.")
+        else:
+            result_file_path, return_code = run_varco_arena(
+                # upload_dir=st.session_state.upfiles_dir,
+                promptname=promptname,
+                api_key=api_key,
+                exp_name=exp_name,
+                price_estimation=False,
+                evaluation_model=eval_model,
+            )
+            if return_code:
+                st.error("❌ RuntimeError: An error occurred during Varco Arena run")
+            else:
+                st.success("✅ Varco Arena run completed successfully")
+                st.session_state.result_file_path = result_file_path
+        set_nav_bar(
+            False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_run_done"
+        )
+
+
+if __name__ == "__main__":
+    main()
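The streaming loop in `run_varco_arena` above combines a non-blocking stdout pipe with `select()` polling so the Streamlit loop never stalls on `readline()`. A minimal, self-contained sketch of the same pattern (the child process here is a throwaway one-liner chosen for illustration):

```python
import os
import select
import subprocess
import sys

# Spawn a short-lived child that writes two lines to stdout.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('line1'); print('line2')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
# Put the pipe in non-blocking mode, as run_varco_arena does.
os.set_blocking(proc.stdout.fileno(), False)

collected = ""
while True:
    # Zero timeout turns select() into a pure readiness check.
    if select.select([proc.stdout], [], [], 0)[0]:
        chunk = proc.stdout.readline()
        if chunk:
            collected += chunk
    if proc.poll() is not None:
        # Drain whatever is left in the pipe buffer after exit.
        remaining = proc.stdout.read()
        if remaining:
            collected += remaining
        break

print(collected, end="")
```

Note this works on POSIX systems; on Windows, `select()` does not accept pipe objects, which is one reason the app targets a Linux-style environment.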
streamlit_app_local/eval_models_list.txt
ADDED
@@ -0,0 +1,3 @@
+gpt-4o-mini
+gpt-4o-2024-05-13
+gpt-4o-2024-08-06
streamlit_app_local/eval_prompt_list.txt
ADDED
@@ -0,0 +1,5 @@
+llmbar
+llmbar_brief
+translation_pair
+rag_pair_kr
+contextual (WIP)
streamlit_app_local/guide_mds
ADDED
@@ -0,0 +1 @@
+../guide_mds/
streamlit_app_local/modules/__pycache__/nav.cpython-311.pyc
ADDED
Binary file (2.85 kB)
streamlit_app_local/modules/__pycache__/nav.cpython-38.pyc
ADDED
Binary file (889 Bytes)
streamlit_app_local/modules/nav.py
ADDED
@@ -0,0 +1,46 @@
+import streamlit as st
+
+
+def Navbar(sidebar_placeholder, toggle_hashstr: str = ""):
+    with sidebar_placeholder.container():
+        st.toggle(
+            "🇰🇷 한국어로 보기",
+            value=st.session_state.get("korean", False),
+            key=f"korean_toggle_{toggle_hashstr}",  # Add this explicit key
+            on_change=lambda: setattr(
+                st.session_state,
+                "korean",
+                st.session_state.get(f"korean_toggle_{toggle_hashstr}", False),
+            ),
+        )
+
+        st.page_link(
+            "app.py",
+            label="Varco Arena 구동" if st.session_state.korean else "Run VARCO Arena",
+            icon="🔥",
+        )
+        st.page_link(
+            "pages/see_results.py",
+            label="결과 보기" if st.session_state.korean else "See Results",
+            icon="📊",
+            disabled=st.session_state.get("is_running", False),
+        )
+        st.page_link(
+            "pages/brief_intro.py",
+            label="어떻게 작동하나요?" if st.session_state.korean else "How it Works",
+            icon="❓",
+            disabled=st.session_state.get("is_running", False),
+        )
+        related_links_en = """
+**About**
+* [Paper](https://huggingface.co/papers/2411.01281)
+* [Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+* [Inquiry](https://linkedin.com/in/deftson/)
+""".strip()
+        related_links_kr = """
+**About**
+* [논문](https://huggingface.co/papers/2411.01281)
+* [블로그](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+* [문의](https://linkedin.com/in/deftson/)
+""".strip()
+        st.info(related_links_kr if st.session_state.korean else related_links_en)
streamlit_app_local/pages/brief_intro.py
ADDED
@@ -0,0 +1,40 @@
+import streamlit as st
+
+from view_utils import default_page_setting, set_nav_bar
+
+sidebar_placeholder = default_page_setting()
+set_nav_bar(
+    False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="brief_intro_init"
+)
+
+st.image("va_en.png")
+if st.session_state.korean:
+    st.info("여러분이 업로드하는 파일은 위 그림의 LLM 출력들입니다.")
+    with st.expander("❓❔ 무엇을 업로드 하나요 ❔❓"):
+        st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
+else:
+    st.info(
+        "What you upload is the generated responses of LLMs regarding the test prompts."
+    )
+    with st.expander("❓❔ What should I upload ❔❓"):
+        st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
+
+
+st.image("va_concept_new.png")
+st.markdown(
+    """
+| |Current Practice|Varco Arena|
+|-|-|-|
+|Total no. matches|$$n_{\\text{model}}*\\|X\\|$$|$$(n_{\\text{model}}-1)*\\|X\\|$$|
+|No. matches per LLM|$$\\|X\\|$$|$$\\left[\\|X\\|,\\|X\\|\\text{log}n_{\\text{model}}\\right]$$|
+|Comparison type|reference-mediated (anchored) |direct|
+"""
+)
+if st.session_state.korean:
+    st.info(
+        "Varco Arena는 신뢰성 있는 순위를 더 적은 횟수의 비교 내에 얻어내며, 이러한 특징은 LLM 직접 비교의 이점으로부터 기인합니다."
+    )
+else:
+    st.info(
+        "Varco Arena takes advantage of direct comparison between LLM responses to guarantee better reliability in fewer number of total matches."
+    )
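The match-count table rendered by brief_intro.py above can be sanity-checked with a quick calculation. The numbers below (8 models, a 100-row testset) are made up for illustration:

```python
import math

n_model = 8  # number of models (illustrative)
X = 100      # testset size |X| (illustrative)

# Current practice: every model is judged against a reference on every row.
total_current = n_model * X

# Varco Arena: a single-elimination tournament per row takes n_model - 1 matches.
total_arena = (n_model - 1) * X

# Per-LLM match count in the tournament ranges from |X| (always knocked out
# in round one) to |X| * log2(n_model) (always reaching the final).
per_llm_lo = X
per_llm_hi = X * math.log2(n_model)

print(total_current, total_arena, per_llm_lo, per_llm_hi)
# 800 700 100 300.0
```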
streamlit_app_local/pages/see_results.py
ADDED
@@ -0,0 +1,391 @@
+import pandas as pd
+import streamlit as st
+from varco_arena_core.prompts import load_prompt
+
+import analysis_utils as au
+from analysis_utils import number_breakdown_from_df
+from app import VA_ROOT
+from query_comp import QueryWrapper, get_base_url
+from view_utils import (
+    default_page_setting,
+    escape_markdown,
+    set_nav_bar,
+    show_linebreak_in_md,
+    visualization,
+)
+
+DEFAULT_LAYOUT_DICT = {
+    "title": {"font": {"size": 20, "family": "Gothic A1"}},
+    "font": {"size": 16, "family": "Gothic A1"},
+    "xaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
+    "yaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
+    "legend": {"font": {"size": 12, "family": "Gothic A1"}},
+}
+
+
+@st.cache_data
+def load_data():
+    all_result_dict = {}
+    df_dict = {}
+    # result_dir_list = os.listdir(VA_ROOT)
+    # result_dir_list = sorted(result_dir_list, key=lambda x: os.stat(os.path.join(VA_ROOT, x)).st_mtime, reverse=True)
+    for f in VA_ROOT.glob("**/result.json"):
+        print(f)
+
+    for resjsonf in VA_ROOT.glob("**/result.json"):
+        df = pd.read_json(resjsonf)
+        for col in ["tstamp", "logs"]:
+            if col in df.columns:
+                df.drop(columns=[col], inplace=True)
+        df = au.index_test_scenario(df)
+
+        fig_dict_per_task = {}
+        df_dict_per_task = {}
+        fig_dict_per_task["Overall"] = visualization(df, is_overall=True)
+        df_dict_per_task["Overall"] = df
+        for task in df["task"].unique().tolist():
+            df_task = df[df["task"] == task]
+            fig_dict_per_task[task] = visualization(df_task, is_overall=False)
+            df_dict_per_task[task] = df_task
+
+        prm_name = resjsonf.parts[-2]
+        exp_name = resjsonf.parts[-3]
+        key = f"{exp_name}/{prm_name}"
+        all_result_dict[key] = fig_dict_per_task
+        df_dict[key] = df_dict_per_task
+    return all_result_dict, df_dict
+
+
+def navigate(t, source, key, val):
+    # print(key, val)
+    if source is None:
+        return
+    target_index = t.index(source) + val
+    if 0 <= target_index < len(t):
+        st.session_state[key] = t[target_index]
+        st.rerun()
+
+
+def main():
+    sidebar_placeholder = default_page_setting(layout="wide")
+    set_nav_bar(
+        False,
+        sidebar_placeholder=sidebar_placeholder,
+        toggle_hashstr="see_results_init",
+    )
+    # load the data
+    st.session_state["all_result_dict"], st.session_state["df_dict"] = load_data()
+
+    # side bar
+    st.sidebar.title("Select Result:")
+    result_select = QueryWrapper("expname", "")(
+        st.sidebar.selectbox,
+        list(st.session_state["all_result_dict"].keys()),
+    )
+
+    if result_select is None:
+        st.stop()
+    eval_prompt_name = result_select.split("/")[-1].strip()
+    if st.sidebar.button("Clear Cache"):
+        st.cache_data.clear()
+        st.cache_resource.clear()
+        st.rerun()
+
+    if result_select:
+        if "alpha2names" in st.session_state:
+            del st.session_state["alpha2names"]
+
+        fig_dict_per_task = st.session_state["all_result_dict"][result_select]
+        task_list = list(fig_dict_per_task.keys())
+        elo_rating_by_task = fig_dict_per_task["Overall"]["elo_rating_by_task"]
+        # tabs = st.tabs(task_list)
+
+        df_dict_per_task = st.session_state["df_dict"][result_select]
+
+        default_layout_dict = DEFAULT_LAYOUT_DICT
+        task = QueryWrapper("task", "Select Task")(st.selectbox, task_list)
+
+        if task is None:
+            st.stop()
+        figure_dict = fig_dict_per_task[task]
+        judgename = figure_dict["judgename"]
+        df = df_dict_per_task[task]
+        interpretation, n_models, size_testset = number_breakdown_from_df(df)
+        if st.session_state.korean:
+            st.markdown(f"## 결과 ({task})")
+            st.markdown(f"##### Judge 모델: {judgename} / 평가프롬: {eval_prompt_name}")
+            st.markdown(f"##### 테스트셋 사이즈: {int(size_testset)} 행")
+        else:
+            st.markdown(f"## Results ({task})")
+            st.markdown(f"##### Judge Model: {judgename} / prompt: {eval_prompt_name}")
+            st.markdown(f"##### Size of Testset: {int(size_testset)} rows")
+        col1, col2 = st.columns(2)
+        with col1:
+            with st.container(border=True):
+                st.markdown(f"#### Ratings ({task})")
+                st.table(figure_dict["elo_rating"])
+                st.write(show_linebreak_in_md(escape_markdown(interpretation)))
+                st.markdown(f"**files at:** \n`{str(VA_ROOT.resolve()/result_select)}`")
+
+        with col2:
+            with st.container(border=True):
+                st.plotly_chart(
+                    elo_rating_by_task.update_layout(**default_layout_dict),
+                    use_container_width=True,
+                    key=f"{task}_elo_rating_by_task",
+                )
+
+        st.divider()
+
+        if st.session_state.korean:
+            st.markdown("### 토너먼트 (테스트 시나리오) 별로 보기")
+        else:
+            st.markdown("### Tournament Results by Test Scenario")
+
+        # with st.expander("볼 토너먼트 고르기"):
+        d = list(df.idx_inst_src.unique())
+        default_idx = st.session_state.get("selected_tournament", None)
+        cols = st.columns((1, 18, 1))
+        with cols[0]:
+            if st.button("◀", key="prev_tournament"):
+                navigate(d, default_idx, "selected_tournament", -1)
+        with cols[1]:
+            tournament_prm_select = QueryWrapper("tournament", "Select Tournament")(
+                st.selectbox,
+                d,
+                default_idx,
+                key=f"{task}_tournament_select",
+                on_change=lambda: st.session_state.update(
+                    selected_tournament=st.session_state.get(f"{task}_tournament_select"),
+                    selected_match=None,
+                ),
+                label_visibility="collapsed",
+            )
+        with cols[2]:
+            if st.button("▶", key="next_tournament"):
+                navigate(d, default_idx, "selected_tournament", 1)
+
+        # tournament_prm_select = st.selectbox(
+        #     "Select Tournament",
+        #     df.idx_inst_src.unique(),
+        #     index=d.index(st.session_state.get("selected_tournament")),
+        #     key=f"{task}_tournament_{result_select}",
+        # )
+
+        # print(tournament_prm_select, type(tournament_prm_select))
+        st.session_state["selected_tournament"] = tournament_prm_select
+        df_now_processed = None
+        if tournament_prm_select:
+            df_now = df[df.idx_inst_src == tournament_prm_select]
+            df_now_processed, _alpha2names = au.init_tournament_dataframe(
+                df_now,
+                alpha2names=st.session_state["alpha2names"]
+                if "alpha2names" in st.session_state.keys()
+                else None,
+            )
+            if "alpha2names" not in st.session_state:
+                st.session_state["alpha2names"] = _alpha2names
+
+            try:
+                bracket_drawing = au.draw(
+                    df_now_processed,
+                    alpha2names=st.session_state["alpha2names"],
+                )
+                legend = au.make_legend_str(
+                    df_now_processed, st.session_state["alpha2names"]
+                )
+                st.code(bracket_drawing + legend)
+
+                m = list(df_now_processed.human_readable_idx)
+                default_idx = st.session_state.get("selected_match", None)
+                cols = st.columns((1, 18, 1))
+                with cols[0]:
+                    if st.button("◀", key="prev_match"):
+                        navigate(m, default_idx, "selected_match", -1)
+                with cols[1]:
+                    match_idx_human = QueryWrapper("match", "Select Match")(
+                        st.selectbox,
+                        m,
+                        default_idx,
+                        key=f"{task}_match_select",
+                        label_visibility="collapsed",
+                    )
+                with cols[2]:
+                    if st.button("▶", key="next_match"):
+                        navigate(m, default_idx, "selected_match", 1)
+
+                # print(match_idx_human)
+                st.session_state["selected_match"] = match_idx_human
+                if match_idx_human:
+                    match_idx = int(match_idx_human.split(": ")[0])
+                    row = df_now_processed.loc[match_idx]
+
+                    st.markdown("#### Current Test Scenario:")
+
+                    with st.expander(
+                        f"### Evaluation Prompt (evalprompt: {eval_prompt_name}--{task})"
+                    ):
+                        prompt = load_prompt(eval_prompt_name, task=task)
+                        kwargs = dict(
+                            inst="{inst}",
+                            src="{src}",
+                            out_a="{out_a}",
+                            out_b="{out_b}",
+                            task=task,
+                        )
+                        if eval_prompt_name == "translation_pair":
+                            kwargs["source_lang"] = "{source_lang}"
+                            kwargs["target_lang"] = "{target_lang}"
+                        prompt_cmpl = prompt.complete_prompt(**kwargs)
+                        for msg in prompt_cmpl:
+                            st.markdown(f"**{msg['role']}**")
+                            st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
+
+                    st.info(show_linebreak_in_md(tournament_prm_select))
+
+                    winner = row.winner
+                    col1, col2 = st.columns(2)
+
+                    winnerbox = st.success
+                    loserbox = st.error
+                    with col1:
+                        iswinner = winner == "model_a"
+                        writemsg = winnerbox if iswinner else loserbox
+                        st.markdown(f"#### ({row.model_a}) {row.human_readable_model_a}")
+                        writemsg(
+                            show_linebreak_in_md(row.generated_a),
+                            icon="✅" if iswinner else "❌",
+                        )
+                    with col2:
+                        iswinner = winner == "model_b"
+                        writemsg = winnerbox if iswinner else loserbox
+                        st.markdown(f"#### ({row.model_b}) {row.human_readable_model_b}")
+                        writemsg(
+                            show_linebreak_in_md(row.generated_b),
+                            icon="✅" if iswinner else "❌",
+                        )
+            except Exception as e:
+                import traceback
+
+                traceback.print_exc()
+                st.markdown(
+                    "**Bug: 아래 표를 복사해서 이슈로 남겨주시면 개선에 도움이 됩니다. 감사합니다🙏**"
+                    if st.session_state.korean
+                    else "Bug: Please open issue and attach the table output below to help me out. Thanks in advance.🙏"
+                )
+                st.error(e)
+                st.info(tournament_prm_select)
+                st.table(
+                    df_now_processed[
+                        [
+                            "depth",
+                            "round",
+                            "winner_nodes",
+                            "winner_resolved",
+                            "winner",
+                            "model_a",
+                            "model_b",
+                        ]
+                    ]
+                )
+        st.write("Sharable link")
+        st.code(f"{get_base_url()}/see_results?{QueryWrapper.get_sharable_link()}")
+        st.divider()
+
+        if st.session_state.korean:
+            st.markdown("### 매치 통계")
+        else:
+            st.markdown("### Match Stats.")
+        col1, col2 = st.columns(2)
+        with col1:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict[
+                        "fraction_of_model_a_wins_for_all_a_vs_b_matches"
+                    ].update_layout(autosize=True, **default_layout_dict),
+                    use_container_width=True,
+                    key=f"{task}_fraction_of_model_a_wins_for_all_a_vs_b_matches",
+                )
+        with col2:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict["match_count_of_each_combination_of_models"].update_layout(
+                        autosize=True, **default_layout_dict
+                    ),
+                    use_container_width=True,
+                    key=f"{task}_match_count_of_each_combination_of_models",
+                )
+
+        with col1:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict["match_count_for_each_model"].update_layout(
+                        **default_layout_dict
+                    ),
+                    use_container_width=True,
+                    key=f"{task}_match_count_for_each_model",
+                )
+        with col2:
+            pass
+
+        if st.session_state.korean:
+            st.markdown("### 참고용 LLM Judge 편향 정보")
+        else:
+            st.markdown("### FYI: How biased is your LLM Judge?")
+
+        with st.expander("펼쳐서 보기" if st.session_state.korean else "Expand to show"):
+            st.info(
+                """
+Varco Arena에서는 position bias의 영향을 최소화하기 위해 모든 모델이 A나 B 위치에 번갈아 위치하도록 하였습니다. 그러나 LLM Judge 혹은 Prompt의 성능이 부족하다고 느껴진다면, 아래 알려진 LLM Judge bias가 참고가 될 겁니다.
+* position bias (왼쪽)
+* length bias (오른쪽)
+
+결과의 왜곡이 LLM Judge의 부족함 때문이었다는 점을 규명하려면 사용하신 LLM Judge와 Prompt의 binary classification 정확도를 측정해보시길 바랍니다 (Varco Arena를 활용하여 이를 수행해볼 수 있습니다!).""".strip()
+                if st.session_state.korean
+                else """
+In Varco Arena, to minimize the effect of position bias, all models are alternately positioned in either position A or B. However, if you feel the LLM Judge or Prompt performance is insufficient, the following known LLM Judge biases may be helpful to reference:
+* position bias (left)
+* length bias (right)
+
+To determine if result distortion was due to LLM Judge limitations, please measure the binary classification accuracy of your LLM Judge and Prompt (You could use Varco Arena for this purpose!).
|
367 |
+
""".strip()
|
368 |
+
)
|
369 |
+
st.markdown(f"#### {judgename} + prompt = {eval_prompt_name}")
|
370 |
+
col1, col2 = st.columns(2)
|
371 |
+
with col1:
|
372 |
+
with st.container(border=True):
|
373 |
+
st.plotly_chart(
|
374 |
+
figure_dict["counts_of_match_winners"].update_layout(
|
375 |
+
**default_layout_dict
|
376 |
+
),
|
377 |
+
use_container_width=True,
|
378 |
+
key=f"{task}_counts_of_match_winners",
|
379 |
+
)
|
380 |
+
with col2:
|
381 |
+
with st.container(border=True):
|
382 |
+
st.plotly_chart(
|
383 |
+
figure_dict["length_bias"].update_layout(**default_layout_dict),
|
384 |
+
use_container_width=True,
|
385 |
+
key=f"{task}_length_bias",
|
386 |
+
)
|
387 |
+
st.table(figure_dict["length_bias_df"].groupby("category").describe().T)
|
388 |
+
|
389 |
+
|
390 |
+
if __name__ == "__main__":
|
391 |
+
main()
|
streamlit_app_local/query_comp.py
ADDED
@@ -0,0 +1,135 @@
# Defines a container that automatically reads and writes the URL query params
from __future__ import annotations

import hashlib
import re
import urllib.parse
from typing import Callable, TypeVar

import streamlit as st

__all__ = ["QueryWrapper", "get_base_url"]

T = TypeVar("T")


def SHA1(msg: str) -> str:
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


def get_base_url():
    session = st.runtime.get_instance()._session_mgr.list_active_sessions()[0]
    return urllib.parse.urlunparse(
        [session.client.request.protocol, session.client.request.host, "", "", "", ""]
    )


class QueryWrapper:
    queries: dict[str, _QueryWrapper] = {}  # for record-keeping

    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
        self.__wrapper = QueryWrapper.queries[query] = _QueryWrapper(
            query, label, use_hash
        )

    def __call__(self, *args, **kwargs):
        return self.__wrapper(*args, **kwargs)

    @classmethod
    def get_sharable_link(cls):
        return re.sub(
            "&+", "&", "&".join(str(v) for v in cls.queries.values())
        ).strip("&")


class _QueryWrapper:
    ILLEGAL_CHARS = "&/=?"

    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
        self.query = query
        self.label = label or query
        self.use_hash = use_hash
        self.hash_table = {}
        self.key = None

    def __call__(
        self,
        base_container: Callable,
        legal_list: list[T],
        default: T | list[T] | None = None,
        *,
        key: str | None = None,
        **kwargs,
    ) -> T | list[T] | None:
        val_from_query = st.query_params.get_all(self.query.lower())
        legal = len(val_from_query) > 0
        self.key = key or self.label

        self.hash_table = {SHA1(str(v)): v for v in legal_list}

        # filter out illegal values
        if legal and legal_list:
            val_from_query = [v for v in val_from_query if v in self.hash_table]
        if legal:
            selected = [self.hash_table[v] for v in val_from_query]
        elif default:
            selected = default
        elif self.label in st.session_state:
            selected = st.session_state[self.label]
            if legal_list:
                if isinstance(selected, list):
                    selected = [v for v in selected if v in legal_list]
                elif selected not in legal_list:
                    selected = []
        else:
            selected = []
        if len(selected) == 1 and base_container in [st.selectbox, st.radio]:
            selected = selected[0]
        if base_container == st.checkbox:
            selected = base_container(
                self.label,
                legal_list,
                index=legal_list.index(selected) if selected in legal_list else None,
                key=self.key,
                **kwargs,
            )
        elif base_container == st.multiselect:
            selected = base_container(
                self.label, legal_list, default=selected, key=self.key, **kwargs
            )
        elif base_container == st.radio:
            selected = base_container(
                self.label,
                legal_list,
                index=legal_list.index(selected) if selected in legal_list else None,
                key=self.key,
                **kwargs,
            )
        elif base_container == st.selectbox:
            selected = base_container(
                self.label,
                legal_list,
                index=legal_list.index(selected) if selected in legal_list else None,
                key=self.key,
                **kwargs,
            )
        else:
            selected = base_container(self.label, legal_list, key=self.key, **kwargs)
        return st.session_state[self.key]

    def __str__(self):
        selected = st.session_state.get(self.key, None)
        if isinstance(selected, str):
            return f"{self.query.lower()}={SHA1(selected)}"
        elif isinstance(selected, list):
            return "&".join(f"{self.query.lower()}={SHA1(str(v))}" for v in selected)
        else:
            return ""
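`_QueryWrapper` encodes each selected widget value into the URL as the first 8 hex characters of its SHA-1 digest, and decodes incoming query params back through a hash table built from the legal values. A minimal stdlib sketch of that round-trip (the `model` query key and the model names here are hypothetical, not from the app):

```python
import hashlib


def sha1_8(msg: str) -> str:
    # first 8 hex chars of SHA-1, mirroring query_comp.SHA1
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


legal_list = ["gpt-4", "claude-3", "llama-3"]
# hash -> value, as built in _QueryWrapper.__call__
hash_table = {sha1_8(v): v for v in legal_list}

# encode a selection into a query string, then decode it back
selected = ["gpt-4", "llama-3"]
query = "&".join(f"model={sha1_8(v)}" for v in selected)
decoded = [hash_table[p.split("=")[1]] for p in query.split("&")]
assert decoded == selected
```

Hashing keeps the link free of characters listed in `ILLEGAL_CHARS` ("&/=?") regardless of what the model names contain; unknown hashes simply fail the `hash_table` lookup, which is how illegal query values get filtered out.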
streamlit_app_local/requirements.txt
ADDED
@@ -0,0 +1,5 @@
streamlit
plotly
scikit-learn
numpy
pandas
streamlit_app_local/run.sh
ADDED
@@ -0,0 +1,3 @@
#!/bin/bash
# Must be run from this directory (cwd = ./streamlit_app_components/)
VARCO_ARENA_RESULT_PATH=./user_submit streamlit run app.py
streamlit_app_local/user_submit/12-02_14:29:30/1.jsonl
ADDED
@@ -0,0 +1,6 @@
{"source":"D: ๊ธธ๋์ฐ๊ตฌ ์ถ์ฒ ๋งํฌ ํ์ผ๋\nD: ๋ค๋ค ๋๋ฅด๊ฑฐ๋ผ\nC: ๊ธธ๋์ฅ๋ ์์น๊ฐ ์ด๋์์?\nC: ์ฐพ์์...\nD: ๋ค ๊ธธ๋ ํ๊ทธ ๋ถ์์ผ๋๊น\nD: ๊ทธ์ชฝ์ผ๋ก ์ฑ ์ด๋ ํ์๊ณ \nD: ๋ณ๋ ฅ ๋ฃ์ผ๋ผ๊ณ ๋ ๋๋ ๋\nA: ํ์ดํ์ด\nD: ๋ฐ๊ฐ์ต๋๋ค ์ด์์ค์ญ์
\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ์ด๋ฆฌ๋ก ์ด๋ํด์ ์ค์ธ์\nC: ๋ค๋ค~~~~\nA: ํจ๊ปํด์ ~\nC: ์ฎ๊ฒผ์ต๋๋ค.\nA: โก\nD: ๊ธ๋ฃจํค์ ๋ง์ด๋ง์ด ์ก์์ฃผ์
์ผ\nD: ์ํ๋ธ์ ์์์ด ๋์ณ๋ฉ๋๋ค.\nA: ์๋ชจํ๋\nA: ๋งํฌ๊ฐ ๋๋ฌด ๋๊ธ๋๊ธํ์ญ๋๋ค\nA: ์ฒดํต์ ์ง์ผ์ฃผ์์ต์์\nD: ๋๋ ์ํด\nA: ์ผ์ผ\nD: ์ฌ๋ ์ข ๋ชจ์ด๋ฉด\nD: ํ์๊พผ ์ก์์ฌ\nB: ๊ณต๊ฒฉ๋ ์ฐธ์ฌ ๋ง์ด ํด์ฃผ์ธ์~!\nD: ๊ณต๊ฒฉ๋ ์ฐธ์ฌํด์ผ ์ ๋ง ์์ ๋์น๋ ์ํ๋ธ์ด ๋ฉ๋๋ค.\nD: ์ฌ๋ฌ๋ถ ํ์คํธํ์๋ฉด์ ์งํํ์๋ฉด ๋น ๋ฅธ ์ฑ์ฅ ๊ฐ๋ฅํฉ๋๋ค.\nD: ์ฌ๋ฌ๋ถ ์์ฃผ ๋ช
์ ๋ณด์๋ ๋ฐ์ผ์ธ์~\nD: ์์ฃผ ๋ช
์ ๋ ๋ฒจ 3 ๋์ ๋ณด์ ๋ฐ์ผ๋ฉด\nA: ๋ต ์๋ชจํ๋!!\nD: ๊ฑด์ค์ฌ๋กฏ 2๊ฐ!\nA: ๊ธ๋ฃจํค์ ๊ฐ์์ด ๊ทธ๋๊ทธ๋ ๋ค์๋ค์\nA: ํ๋ถ๋์ฉ ๊ณต๊ฒฉ๋ ๊ฑธ์ด์ฃผ์ธ์\nA: ํฌ๋ก์ฐ ์๋ณด์ด\nD: ์ด์์ค์ธ์~ :)\nA: ์น์ ํ ๋ชจํ์จ\nD: ์์ฐ\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ํฌ๋ก์ฐ ์๋ฝ์ด\nD: ๊ธธ๋ ์์ ๊ทผ์ฒ๋ก ์ด๋ํด ์ค์ธ์\nA: ๊ธ๋ฃจํค ์ฐธ์ฌํ์๋ฉด\nA: ์ํ๋ธ๋ ํ์กฑํด์ง๊ณ ๊ฐ์๋ ๋์ณํ๋ฅด๊ฒ ๋ฉ๋๋ค์\nA: 10๋ ์ง๋ฆฌ\nA: ๊ฐํ๋ชฌ์คํฐ ๊ฐ์ด ์ก์ผ์ค๋ถ\nA: ์ฌ๊ธฐ๋ก ๋ถ๋ ์ด๋ํด์ฃผ์ธ์ ์ต๋ํ\nA: ShareCoord#๋น ๋
#145926.609375#523510.437500\nD: ์ด์์ค์ธ์\nA: ์ปค๋ชฌ์ปค๋ชฌ ์๋ณด์ด\nD: 10๋ [๋ฒจ ์ ์๋ชน ๊ฐ์ด ์ก์์ฉ\nA: 10๋ ๋ฒจ ๊ฐํ๋ฉํ ๋ฃจ์ค ํ๋ฒ ๋ชจ์ฌ์ ์ก์๋ณผ๊น์","generated":"* D์ด ๊ธธ๋ ์ฐ๊ตฌ ์ถ์ฒ์ ๋งํฌํ์ผ๋ ๋ค๋ค ๋๋ฅด๋ผ๊ณ ํ์์ต๋๋ค.\n* D์ด ๊ธธ๋ ํ๊ทธ ๋ถ์์ผ๋ ๊ทธ ์ชฝ์ผ๋ก ์ฑ ์ด๋ ํ๊ณ ๋ณ๋ ฅ์ ๋ ๋๋ ๋ ๋ฃ์ผ๋ผ๊ณ ํ์์ต๋๋ค.\n* D์ด ๊ธ๋ฃจํค์ ๋ง์ด๋ง์ด ์ก์์ผ ์ํ๋ธ์ ์์์ด ๋์ฒ๋๋ค๊ณ ๊ฐ์กฐํ์ต๋๋ค.\n* A์ด ์๋ชจํ๋์ ๋งํฌ๊ฐ ๋๋ฌด ๋๊ธ๋๊ธํ๋ ์ฒดํต์ ์ง์ผ๋ฌ๋ผ๊ณ ํ์ A์ด ๋๋ ์ํด๋ผ๊ณ ํ์์ต๋๋ค.\n* D์ ์ฌ๋ ์ข ๋ชจ์ด๋ฉด ํ์๊พผ์ ์ก์๊ณ ํ์์ต๋๋ค.\n* D์ ์์ฃผ ๋ช
์ ๋ ๋ฒจ์ด 3 ๋์ด์ ๋ณด์์ผ๋ก ๊ฑด์ค์ฌ๋กฏ 2๊ฐ๋ฅผ ๋ฐ์ ์ ์๋ค๊ณ ํ์์ต๋๋ค.\n* A์ด ๊ธ๋ฃจํค์ ๊ฐ์์ด ๊ทธ๋๊ทธ๋ ๋ค์์ผ๋ ํ ๋ถ๋์ฉ ๊ณต๊ฒฉ๋๋ฅผ ๊ฑธ์ด๋ฌ๋ผ๊ณ ์์ฒญํ์ต๋๋ค.\n* D์ A์ด 10๋ ๋ฒจ ๊ฐํ๋ฉํ ๋ฃจ์ค๋ฅผ ๋ชจ์ฌ์ ์ก์๋ณด์๊ณ ํ์์ต๋๋ค","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
{"source":"A: ๋ถ๋ ๋จ์ผ์๋๋ถ\nA: 8282\nA: ์ ์ณ๋ณผ๊ฒ์\nA: ํ์ดํ
\nA: ๋ฐํ ๋ผ๋ก~!!!\nA: ๊ธฐ์ตํด์ค!!! ์ ์ 10๋ ๋ฒจ ์ก์๋ค๊ตฌ!!!\nC: ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nA: ๋จ๋ ๋ถ๋ 1๊ฐ์ฉ\nA: ๊ธธ๋ ์์ ๊ฑด์ค์ ๋ฃ์ด์ฃผ์ธ์!\nA: ์ฐ๋ฆฌ์ ์ํ๋ธ์ ๊ฑด์คํด์ผํฉ๋๋ค!\nC: ์ฐ์ปด ํฌ ์ํ๋ธ\nA: ํฌ๋ก์ฐ ์๋ฝ์ด\nC: ๋ชจ๋ 1๋ถ๋๋ ๊ธธ๋ ์์์ ๋ฃ์ด๋ฌ๋ผ์
๋๋ค.\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nC: ์ํ๋ธ์ ์ํ ๊ฐ ์๊ธํ๋ค์์\nA: ์ด๋ฆฌ๋ก ์ค์ธ์ ์ด์~!\nB: ์ฐ๋ฆฌ ๊ธธ๋์์ ๋นจ๋ฆฌ ๊ฑด์ค ํ ๋ ค๋ฉด ๋ณ๋ ฅ์ ๋ ๋ง์ด ๋ฃ์ด์ผ ํ๋๊น ์ด์์ด์ ์ง์ํด์ฃผ์ธ์~ ^^\nA: ๋นจ๊ฐ๋ด๋ณต์ต๊ณ ~!!\nC: ์ด์์ค์ธ์~\nC: ใ
ใ
ใ
ใ
\nA: ์ฐ์\nC: ์ํฐ๋ณด์ด๋ ์ด์์ค์ญ์
\nA: ์(ํฐ)๋ณด์ด ์ค์
จ๋ค\nC: ๋ค๋ค ๋ ๋๋ ๋ ์ฐ๋งค์ฉ์์ ๊ธฐ์ฌ๋จ ํ๋์ฉ ๋ฃ์ผ๋ผ์ฐ\nA: ์ ์ฐ๋ฆฌ ์ด์ \nA: ๊ฒฝํ์น 3๋ฐฐ ์ด๋ฒคํธ\nA: ๋ฌ๋ ค์ผํฉ๋๋ค\nA: ํ ์คํธ๋ ์ด์์ค์ธ์~ ใ
ใ
\nA: ShareCoord#๋น ๋
#155862.171875#524460.062500\nA: ์ฌ๊ธฐ๋ก ๋ณ๋ ฅ์ง๊ฒฐํด์ฃผ์ธ์\nA: ์ค๋ฅธ์ชฝ์ ๋ณ๋ ฅ์ฌ๋กฏ ๋๋ฒ๋๋ฅด๋ฉด\nA: ์ ์ฒด์ ํํด์ ์ฌ๋ฌ ๊ธฐ์ฌ๋จ์ ํ๋ฒ์ ์ด์ฉํ ์ ์์ต๋๋ค\nA: ์์ ์์ ์ค๋ฅธ์ชฝ์\nA: ๋ชจ์ด๊ณ ์์ต๋๋ค\nA: ๊ฒฝํ์น 3๋ฐฐ ์ด๋ฒคํธ\nD: ๊ณ์ ์ด๋ํ ์ ์๋ ์์น๋ผ๊ณ ํ๋๋ฐ ์ด๋ป๊ฒ ํ๋ฉด ๋๋์?\nA: ์ฐธ์ฌํด์ ๊ธฐ์ฌ ๋ ๋ฒจ ์ญ์ญ ์ฌ๋ฆฌ์
์\nA: ๊ทธ ์ฑ์ ์ด๋ฆฌ์ ๋ฆฌ์ฎ๊ธฐ๋ฉด\nA: ์ด๋ก์์ผ๋ก ์ฑ ๋ฐ๋ฅ์ด ๋ฐ๋๋๋ฐ\nA: ๊ทธ๋ ์ด๋๋๋ฅด์๋ฉด ๋ฉ๋๋ค\nA: ์ ์๋๋ฉด 2~3๋ฒ์ ๋ ํธ๋ผ์ดํด์ฃผ์๋ฉด ์ด๋๋ ๊ฑฐ์์\nA: ShareCoord#๋น ๋
#156695.468750#525127.875000\nD: ์...๊น๋ฐ์ด ๋ ์์ด์ ๋๋ฌ๋ดค๋๋ฐ ๋ชป๊ฐ๋ค๊ณ ํด์ ใ
ใ
;\nA: ์~~\nA: ๊น๋ฐ๋ ์๋๊ณณ์ผ๋ก ๊ธฐ์ฌ๋จ(๋ถ๋)๋ ์ด๋ํ ์ ์์ด์!\nA: ๋ถ๋ ์ฌ๋ฌ๊ฐ ๋ณด๋ด์ ์๊ธฐ ์์ ํ์๊พผ ์ก์ผ๋ฉด\nA: ๊ฒฝํ์น ํญํ ํกํก ํฐ์ง๋๋ค\nA: ์ด์์ค์ธ์\nD: ์ฅ์ ๋ฌผ์ด ์์ด์ ๋ชป๊ฐ๋ค๊ณ ํ๋ค์..\nA: ShareCoord#๋น ๋
#156578.359375#523799.718750","generated":"* C๊ฐ ์ํ๋ธ์ ์ํ ๊ฐ ์๊ธํ๋ ๋ชจ๋ 1๋ถ๋๋ ๊ธธ๋ ์์์ ๋ฃ์ด๋ฌ๋ผ๊ณ ํ์์ต๋๋ค.\n* A์ด ์๊ธฐ ์์ ํ์๊พผ์ ์ก์ผ๋ฉด ๊ฒฝํ์น ํญํ ํกํก ํฐ์ง๋ ์ด์ ์ค๋ผ๊ณ ํ์์ต๋๋ค.\n* A์ด ๊ฒฝํ์น 3๋ฐฐ ์ด๋ฒคํธ๋ฅผ ๋ฌ๋ ค์ผ ํ๋ ๋น ๋
์ผ๋ก ๋ณ๋ ฅ์ ์ง๊ฒฐํด๋ฌ๋ผ๊ณ ํ์์ต๋๋ค.\n* A์ด ๋ถ๋ ์ฌ๋ฌ ๊ฐ ๋ณด๋ด์ ์๊ธฐ ์์ ํ์๊พผ ์ก์ผ๋ฉด ๊ฒฝํ์น ํญํ ํกํก ํฐ์ง๋ค๊ณ ํ์์ต๋๋ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
{"source":"A: 10์ 10๋ถ์\nA: ํ์๊พผ์ฌ๋ฅ ์์ํ ๊ฒ์\nA: ํ์๊พผ์ ๊ฒฝํ์น ๋ง์ด ๋ค์ด์์ต๋๋ค\nC: 10์ 10๋ถ์ ํ์๊พผ ์ก์ต๋๋ค.\nC: ๋ค๋ค ์ค์ธ์\nE: ์ด๋์ธ๊ฐ์..?\nA: ShareCoord#๋น ๋
#156509.859375#524728.187500\nA: ์๊ธฐ๊ฐ ๋ถ๋ ์ง๊ฒฐ์ง๊ณ ์!\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ๊ธธ๋์์ ๊ทผ์ฒ๋ก ์ฑ ์ด๋ํ์ ํ์ ๋ถ๋ ์ฎ๊ธฐ์๋ฉด ํธํฉ๋๋ค\nD: ๊ณต๊ฒฉ๋๋ฅผ ๊ทธ๋ฅ ๋ณด๋ด๋ฉด ๋๋์?\nA: ๊ธฐ์ฌ๋จ์ ๋ณด๋ด๋์
จ๋ค๊ฐ\nA: ์์ํ๋ฉด ๋ถ๋๋ฅผ ๋์ด์ ๋ชฌ์คํฐ ์์ ์ปค์๋ฅผ ์ฌ๋ ค๋๋ฉด\nA: ๋ฌ๋ ค๊ฐ์ ๊ณต๊ฒฉํ๋๋ฐ์\nA: ํํ์ด๋ฐ์ ๊ฐ์ด ์ก์ผ๋ฉด ์ ํฌ์ ์ฐธ์ฌํ ์ฌ๋ํํ
๋ณด์์ ๊ท ๋ฑํ๊ฒ ์ค์\nA: ๋ค๊ฐ์ด ๋ชฐ์์ก๋๊ฒ ์ด๋์
๋๋ค\nA: 10๋ถ๋์ด์ ํ์๊พผ ํ๋ฒ ๋ฌ๋ ค๋ณผ๊ฒ์~!\nA: ๋ถ๋ ์ต๋ํ ๊บผ๋ด์ ๊ฐ์ ธ์ค์ ํ์\nA: ์ฐ์ธก ๋ถ๋ ์ฌ๋กฏ ๋ฒํผ ๋๋ธํด๋ฆญํด์\nA: 3~5๋ถ๋ ์ด์ฉํ์
์~!\nA: ์ ๋ณด์๋ฉด 5๋ถ๋ ์ ๋ถ\nA: ํ๋ฐฉ์ ์ด์ฉํฉ๋๋ค\nA: ์์ ์น ๊ฒ์ ํ์๊พผ!!\nA: ShareCoord#ํ์๊พผ#155425.500000#527495.250000\nC: ๊ณ ํฉ๋๋ค.\nA: ๊ณต๊ฒฉ๊ณต๊ฒฉ\nA: ShareCoord#ํ์๊พผ#159173.750000#528246.125000\nA: ๋ฉ์์ด๋ ์ปค๋น๋\nA: ๋ถ๋ ์์ง์ฌ์ ํ์๊พผ ๊ณต๊ฒฉํ์
์~~!\nB: ๊ฐ์ด ๊ฐ์ ~~~ ํ์๊พผ\nA: ShareCoord#ํ์๊พผ#160192.187500#525461.625000\nA: ์ด์์ค์ธ์ ~~\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ์ฑ ์ด๋ํ์๊ตฌ\nA: ํ์๊พผ์ก์ผ๋ฌ ์ฐธ์ฌํ์
์ ์์ชฝ์ผ๋ฃจ '\nA: ShareCoord#ํ์๊พผ#165078.687500#525488.375000\nA: ๊ฐ์ด ์ก์์ผ ๋ณ๋ ฅ๋ ๋์ฃฝ๊ณ ๊ฒฝํ์น๋ ๊ฐ์ด๋จน์ด์~!\nA: ์ด์์ค์ธ์์ค~~~~\nB: ์ด์์ค์ธ์~\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ์ฑ ์ด๋ ์ขํ\nA: ShareCoord#ํ์๊พผ#164864.437500#527752.750000\nC: ์ด์์ค์ธ์~ :)\nA: ๋ชฌ์คํฐ ์ขํ\nA: ๊ณ์ ํ์๊พผ ์งํ ์ค์
๋๋~!!\nA: ์ด์์์ ์ก์ผ์
์ฉ~~\nA: ๋ถ๋ ํ๊ตฐํด์\nA: ์ ๋น ํ๋ฒ ใดํ๊ณ \nA: ๋ค์ ๋ฌ๋ฆด๊ฒ์\nA: ๋ค๋ค ๋ถ๋ ์ฒด๋ ฅ์ด ๊ฐ๋น๊ฐ๋นํ์ค๋ฏํ์ฌ..","generated":"* A์ด 10์ 10๋ถ์ ํ์๊พผ ์ฌ๋ฅ์ ์์ํ๋ค๊ณ ํ์์ต๋๋ค.\n* D๊ฐ ๊ณต๊ฒฉ๋๋ฅผ ๊ทธ๋ฅ ๋ณด๋ด๋ฉด ๋๋๊ณ ๋ฌป์ A์ ๋ค๊ฐ์ด ๋ชฐ์์ก๋๊ฒ ์ด๋์ด๋ 10๋ถ ๋์ด์ ํ์๊พผ์ ๋ฌ๋ ค๋ณด์๊ณ ํ์์ต๋๋ค.\n* A์ ๋ถ๋๋ฅผ ์ต๋ํ ๊บผ๋ด์ ๊ฐ์ ธ์จ ํ์ ์ฐ์ธก ๋ถ๋ ์ฌ๋กฏ ๋ฒํผ ๋๋ธํด๋ฆญ ํด์ 3~5๋ถ๋ ์ด์ฉํ๋ผ๊ณ ํ์์ต๋๋ค.\n* A์ด ํ์๊พผ์ ๊ฐ์ด ์ก์์ผ ๋ณ๋ ฅ๋ ๋ ์ฃฝ๊ณ ๊ฒฝํ์น๋ ๊ฐ์ด ๋จน์ ์ ์์ผ๋ ์ด์์ค๋ผ๊ณ ํ์์ต๋๋ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
{"source":"C: ์๊ฑฐ ์ก๊ณ ์คํ ์ ์ ์ ๋นํ๊ณ ์ค๊ฒ ์ต๋๋ค.\nA: ShareCoord#๋น ๋
#159195.468750#530362.062500\nA: ์ฌ๊ธฐ ์ง๊ฒฐํ ๊ฒ์!!!\nA: 18๋ถ์ ์คํํธ ํ๊ฒ ์ต๋๋ค\nA: ๋ถ๋ ํ๊ตฐํ๋ค๊ฐ ๋ค์ค ํธ์ฑ์ผ๋ก ์ ๋ถ ๋๊ณ ๋์ค์
์ฉ~!\nA: ๋ค๊ฐ์ด ์ก์์ผ ๊ฒฝํ์น ํจ๊ป๋จน๊ณ ํญ์
๊ฐ๋ฅํฉ๋๋ค\nA: ์ด์์ด์~!! ์๋ฝ์ด๋ค ๋นจ๋ฆฌ ๊ฐํด์ ธ์ผ ๋์ค์ ์ธ์ธ๋ ์ฌ๋ฐ์ต๋๋ค~! ใ
ใ
ใ
\nA: ShareCoord#๋น ๋
#159809.593750#529345.625000\nA: ์ด๋์๊ฐ๋๋ฌธ์ 20๋ถ์ ๋ฐ๋ก ์์ํ ๊ฒ์\nA: 10์ 20๋ถ์ ๊ทธ๋ฅ ๋ฐ๋ก ์คํํธํฉ๋๋ต\nA: ์์ฃผ์
์ ๊ฐ์ฌํฉ๋๋ค ใ
ใ
ํคํค\nA: ์ ์น ๊ฒ์\nA: ๋จ์๊ฑฐ ๋ค์ก๊ฒ ์ต๋๋ค\nD: ๊ฐ๋์?\nA: gogo!!!!\nA: ShareCoord#๋น ๋
#159518.640625#532655.562500\nA: ShareCoord#ํ์๊พผ#157515.046875#534659.812500\nA: gogo!!!\nA: ShareCoord#์ํ
๋ฅด ์์#160185.000000#531312.000000\nA: ํ์๊พผ์ก๊ณ ๋จ์ด์ง\nA: ์ํ
๋ฅด ์์ ์ด๊ฑฐ ์๊ฐ์ ๋ฒํ๊ฐ์ผ๋๊น\nB: ๊ฑด๋ฌผ ์ง๊ฑฐ๋ ๋ณ์ฌ ๋ฝ์ ๋ ์๋ชจ์ ๋์ ๋๋ฅด๋ฉด ์๊ฐ์ด ๋นจ๋ผ์ง๋๋ค.\nA: ํ์ํ์ ๊ฑฐ ๋น ๋ฅด๊ฒ ์ฑ์งํด์ ๋ฒํ๋ฐ์ผ์ธ์\nB: ์ ๊ทน์ ์ผ๋ก ํ์ฉํด์ฃผ์ธ์\nC: ํ์๊พผ ์ฌ๋ฅ ์๊ณ ํ์
จ์ต๋๋ค.\nC: ์ํ
๋ฅด ์์ ํ์ฉํด์ฃผ์ธ์! ์ธํฐ๋ ์
ํ๋ฉด ๋ฒํ ๋ฐ์ต๋๋ค.\nC: ๋ค์ ๊ธ๋ฃจํค ์ฌ๋ฅ ๊ฐ์๋น\nC: ๊ธธ๋ ์ฐ๊ตฌ๋ ์์ํ์ผ๋\nC: ๊ธฐ๋ถ ๋ง์ด ํด์ฃผ์ธ์\nC: ๊ธฐ๋ถ๋ฅผ ํ๋ฉด ๊ฐ์ธ ํฌ์ธํธ๋ ์ป์ผ๋๊น ์๋ถ์์กฐ\nA: ์คํ์ฏค์ด๋ฉด ์ํ๏ฟฝ๏ฟฝ๏ฟฝ ํ์ง ์ถ๋ค์\nC: ๋ค๋ค ๋ฐํ ๋ผ ๊ฐ์\nA: ์ฑ์ฅ ํธ๋ค๋ฅ ํด์ฃผ์๊ตฌ ๊ถ๊ธํ๊ฑฐ ๋ฌผ์ด๋ด์ฃผ์ธ์\nA: ๋์ ํํํ ๋๋ฌ์ฃผ์๊ตฌ์ฅ~!\nA: ๊ฐํ ์๋ฝ์ด๊ฐ ๋์!!!!!!!\nB: ๊ธธ๋ ์์ 60% ๋์๋ค์.\nB: ์์ฐจ ์์ฐจ ์์ฐจ~\nA: ์ด์ฐ ๋ฌ์ ๋์ ๊ฐ์ฌํฉ๋๋ค\nC: ใ
ใ
ใ
ใ
\nD: ใ
ใ
ใ
ใ
ใ
\nA: ๋์๋์~!!! ์ต๊ณ ~!!!!!!\nA: ๊ฐ์ ์๋ผ์ง ๋ง์๊ตฌ ํํ์ฐ์ธ์\nA: ๋ง์ง๋ง๋ ๊น์ง ๊ฐ์ ๋ค ์ฐ์ง๋ ๋ชปํฉ๋๋ค ใทใท\nA: ๋์-๊ฐ์-๋์-๊ฐ์\nA: ์-์ฐจ-์-์ฐจ\nC: (ํก์กฑ)\nC: ์ํ๋ธ ์์ ์์ฑ๊น์ง 3๋ถ๋จ์์ต๋๋ค.\nA: ์๋ฝ์ด ์ฌ๋ฌ๋ถ๋ค\nA: ๊ถ๊ธํ๊ฑฐ๋ ๋งํ๋ ๊ฒ๋ค ๋ง์์ฃผ์๋ฉด\nA: ๋ค ์๋ ค๋๋ฆฌ๊ณ ๋ค ๋์๋๋ฆฝ๋๋ค","generated":"* A์ด ํ์๊พผ์ ์ก๊ณ ๋จ์ด์ง ์ํ
๋ฅด ์์์ ์๊ฐ์ ๋ฒํ๋ก ๊ฑด๋ฌผ ์ง๊ฑฐ๋ ๋ณ์ฌ ๋ฝ์ ๋ ์ ๋ชจ์ ๋์์ ๋๋ฅด๋ฉด ์๊ฐ์ด ๋นจ๋ผ์ง๋ ํ์ํ ๊ฒ์ ๋น ๋ฅด๊ฒ ์ฑ์งํด ๋ฒํ๋ฐ์ผ๋ผ๊ณ ํ์์ต๋๋ค.\n* C๊ฐ ๊ธธ๋ ์ฐ๊ตฌ์ ๊ธฐ๋ถํ๋ฉด ๊ฐ์ธ ํฌ์ธํธ๋ ์ป์ ์ ์์ผ๋ ๊ธฐ๋ถ๋ฅผ ๋ง์ด ํด๋ฌ๋ผ๊ณ ๋น๋ถํ์์ต๋๋ค.\n* A์ด ์คํ ์ฏค ์ ํ๋ฒ ํ์ง ์ถ์ผ๋ ์ฑ์ฅ ํธ๋ค๋ฅ ํ๊ณ ๊ถ๊ธํ ๊ฒ ๋ฌผ์ด๋ด๋ฌ๋ผ๊ณ ํ์์ต๋๋ค.\n* B์ด ๊ธธ๋ ์์๊ฐ 60%๋ฅผ ๋์๋ค๊ณ ํ์์ต๋๋ค.\n* A์ด ๊ถ๊ธํ๊ฑฐ๋ ๋งํ๋ ๊ฒ๋ค ๋ง์ ์ฃผ์๋ฉด ๋ค ์๋ ค๋๋ฆฌ๊ณ ๋์๋๋ฆฐ๋ค๊ณ ํ์์ต๋๋ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
{"source":"E: ์ฌ๋ฌ๋ถ!\nE: ๊ธ๋ฃจํค ๋ง์ด ์ก์์ฃผ์ธ์!\nE: ๊ธธ๋ ์์์ด ๋ถ ์กฑํฉ๋๋ค.\nE: ์ ๊ณผ๊ฟ์ด ํ๋ฅด๋ ์ํ๋ธ์ด ๋๊ณ ์ถ๋ค?\nE: ๊ธ๋ฃจํค์ ๋ง์ด ์ก์์ฃผ์ญ์ผ\nG: ๊ฐ๋ํ ์ํ๋ธ์ ์์ ์ ์์ต๋๋ค\nE: ํ๋ฆฌ์ฆ\nB: ๊ฐํ๋ชฌ์คํฐ ๋ฏธ์
๊ฑธ๋ฆฌ์ ๋ถ?\nB: ๊ฐ์ด ์ก์ผ์ค๋ถ ๊ณ์ ๊ฐ์ ~\nF: ์ขํ์ฃผ์ธ์\nB: ShareCoord#๋น ๋
#155225.781250#534257.875000\nB: ShareCoord#๋น ๋
#154322.968750#533640.812500\nB: ๊ฐํ๋ชฌ์คํฐ ์ก์ผ์ค๋ถ ๊ตฌํด์~~~\nB: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nB: ์ถ๋ฐํฉ๋๋ค\nB: ๊ฐํ๋ชฌ์คํฐ~!\nB: ๋์ด์ค์ปค๋น๋\nF: ใ
ใ
\nB: ๊ฐ์ฌํฉ๋๋ท\nF: ๊ตฟ๊ตฟ\nB: ๋๋ฅธ ๋ฒํ์ ๋ฐ๋๊ณ ์ถ์\nC: ๊ธธ๋ ์ฐ๊ตฌ ๊ธฐ๋ถ ๋ถํ๋๋ฆฝ๋๋ค.\nB: ์๋ณด์ด์ ๋ง์์ ์๋๊ฐ\nC: ๊ธธ๋ ๋ณด์๋ฉด ๊ธธ๋ ์ฐ๊ตฌ ์์ด์~\nB: ShareCoord#1๋ ๋ฒจ ๋๋๊ณค ๋ ์ด#138900.000000#535800.000000\nB: ์ ์ด๊ธฐ์ง ์ง๊ณ ๋๋๊ณค ์ ํ๋ฒ ์ป์ด๋ณผ๊ฒ์~~\nB: ์์ ์ป๊ณ ์ข ํค์์ผ ์ฉ์ธ ์ ์์ต๋๋ค~~~!!\nC: ์ ์ด๊ธฐ์ง ๊ฑด์ค์ ๊ธฐ์ฌ๋จ 1๊ฐ์ฉ ์ง์ ๋ถํ๋๋ฆฝ๋๋ค\nE: ๊ธ๋ฃจํค๋ ๋ง์ด ์ก์์ฃผ์ธ์\nE: ๊ธธ๋ ์์์ ๋ชจ์์ผ ํฉ๋๋น\nB: ๋ต ์๋ชจํ๋\nB: ์ถฉ์ฑ์ถฉ์ฑ์ถฉ์ฑ XXXXXXXXXXX\nE: ๊ฐ์ฆ์ ๋ฐํ ๋ผ!\nB: ํ๋ ํฌ์ธํธ ์ป๋ ๋ฒ\nB: 1. ๊ธธ๋ > ์ฐ๊ตฌ > ๊ธธ๋ ์ฐ๊ตฌ ๊ธฐ๋ถ(์ถ์ฒ ์ฐ๊ตฌ)\nB: 2. ๊ธธ๋ ๋ณธ๋ถ > ๋์\nB: ํ๋ ํฌ์ธํธ ์ฌ์ฉ๋ฒ\nB: 1. ๊ธธ๋ > ๊ธธ๋ ์์ > ์ํ ๊ตฌ๋งค(๊ธฐ์ฌ ๊ฒฝํ์น ๊ฟ)\nB: ๊ธฐ์ฌ๋ ์
๋นจ๋ฆฌํด์ ๊ดด์ ๋ชฌ์คํฐ ๋นจ๋ฆฌ์ก๊ณ ๊ณ ์ฑ์ฅํฉ์ธ๋ค~!!\nB: ๋์ ๊ณ ๋ง์ต๋๋ค ํคํค\nB: ๊ณง ์์ฌ์๊ฐ์ผํ
๋ฐ์\nB: ์ ์ด๊ธฐ์ง์ ๊ธฐ์ฌ๋จ 1๊ฐ ๋ฃ์ด์ฃผ์๊ตฌ\nB: ๋๋จธ์ง ๋ถ๋๋ 2๋ ๋ฒจ ์ด์ ์ฑ์ง์ง์ ๋ฃ์ด๋์ ํ ์์ฌ๋ค๋
์ค์๋ฉด\nB: ์ด๋ฒคํธ๋ ํด๋ฆฌ์ด๋๊ณ ์์๋ ๋นต๋นตํด์ง๋๋ค~~~\nB: ์๋
ํ์ธ์~~~~\nB: ๊ถ๊ธํ๊ฑฐ ๋ฌผ์ด๋ณด์
์~~~ ๋ค ์๋ ค๋๋ฆฝ๋๋ค ~~\nD: ์ฐ๋ฆฌ ์ ์ด๊ธฐ์ง 3๋ถ ๋จ์๋๋ฐ.\nA: ์ ์ฐฐ์ ์ด๋ป๊ฒ ๋ณด๋ด๋์\nD: ์ข๋ง ๋ ์์ฐจ์์ฐจ ํด์ฃผ์๋ฉด\nD: ๋จ์ ์ฑ ์ฐ๊ณ ์ ์ฐฐ ๋ฒํผ ํด๋ฆญํ๋๊น ์ ์ฐฐ ๋ณด๋ด์ก์ด์.","generated":"* E๊ฐ ์ ๊ณผ๊ฟ์ด ํ๋ฅด๋ ์ํ๋ธ์ด ๋๊ณ ์ถ๋ค๋ฉด ๊ธ๋ฃจํค์ ๋ง์ด ์ก์๋ฌ๋ผ๊ณ ๋ถํํ์ต๋๋ค.\n* B์ด ๊ฐํ๋ชฌ์คํฐ ๋ฏธ์
๊ฑธ๋ฆฌ์ ๋ถ ๊ฐ์ด ์ก์๊ณ ํ์์ต๋๋ค.\n* B์ด ์ ์ด๊ธฐ์ง๋ฅผ ์ง๊ณ ๋๋๊ณค ์์ ์ป์ด๋ณด๊ฒ ๋ค๊ณ ํ์์ต๋๋ค.\n* E๊ฐ ๊ธธ๋ ์์์ ๋ชจ์์ผ ํ๋ค๊ณ ํ์์ต๋๋ค.\n* B ํ๋ ํฌ์ธํธ ์ป๋ ๋ฒ๊ณผ ํ๋ ํฌ์ธํธ ์ฌ์ฉ๋ฒ์ ๊ณต์ ํ์์ต๋๋ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
{"source":"D: ๋ณดํธ๋ง์๋ ์ฑ ๋นผ๊ณ \nD: ๋ค๋ฅธ ์ฌ๋์ด ์ฑ์งํ๋ ๊ณณ๋ ์ ์ฐฐ ๋๋๋ฐ์.\nC: ์ ์ฐฐ์ ๋ง์์ด๋ ํ์ธ์ ์ฑ์ ๋๋ฅด๋ฉด ์ ์ฐฐ ๋๋๋ฐ์\nE: ์ ์ฐฐ์ ์์ง์ ์ ์ฐฐ ๋ณธ๋ถ๋ฅผ ๊ฑด์คํด์ผํฉ๋๋ค\nC: ShareCoord#2๋ ๋ฒจ ํ๋ก์ ๋ง์#148500.000000#524397.000000\nC: ๋ฏธ์
๋๋๋ฌธ์ด์๋ฉด ์ด ๋ง์์ด๋\nA: ์ค ํ๋์ ๋ ์๊ฐ๋ค\nC: ์ค ๋์
จ๋์\nA: ใ
ใ
๋ค ใณใณ\nD: ํ๋์...\nC: ์ฌ์ค๋ถ์์ด์
๋๋ค ใ
ใ
\nC: ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nA: ๋ฑ๋ค๊ตฌ๋ฆฌ๋ผ๊ณ ํด๋ ๋ฏฟ์ ๏ฟฝ๏ฟฝ๋ฅ์ด ๊ธธ์ด์ง๋ง ๊ทธ๋ ๋ค๊ณ ํ๋ ๋ฏฟ๊ฒ ์ต๋๋ค\nC: ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nA: ํ๊ฒ์. ๊ทธ ์ ๋น์ ์ธ๊ณ\nC: ์
์ด ์์ฃฝ ๋์์ด์\nC: ์์ง๋ฏ์\nA: ๋ค์ ์ ์ฐฐ ์๋ณด๋ผ๊ฒ ใท\nE: ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nC: ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nD: ๋\nC: ๊ตฐ์ฃผ๋ ๊ณ์ ๊ฐ์\nC: ์คํธ๋ฆฟ ๊ตฐ์ฃผ๋\nC: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nC: ์์ ๊ทผ์ฒ๋ก ์ค์๋ฉด ๊ธ๋ค๋ฃจํค ๊ณต๊ฒฉ๋ ์ง๊ฒฐ์ด ํจ์ฌ ์ฉ์ดํฉ๋๋ค!!\nC: ์ด์ชฝ์ผ๋ก ์ค์
์ฉ~!\nC: ์ขํ ๊ทผ์ฒ ๋
์ ๊พธ์ฑ ๋๋ฅด์ ํ ์ฑ ์ด๋์ ๋๋ฅด์๋ฉด ์ฑ์ด ์ด์ ๋ฉ๋๋ค\nE: 1์์ ํ์๊พผ ์ฌ๋ฅ ๊ฐ๋๋ค.\nE: ํ๊ทธ ๋ฌ์์ผ๋ ์์น ํ์ธํด์ฃผ์ธ์ :)\nB: ์๋
ํ์ธ์ ๊ฐ์
์ธ์ฌ ๋๋ฆฝ๋๋ค\nE: ์๋
ํ์ธ์! ๋ฐ๊ฐ์ต๋๋น :)\nC: ์๋
ํ์ธ์~~~\nC: 1์ ํ์๊พผ ํ์ธํ์ต๋๋ค\nE: ์์\nF: ์๋
ํ์ธ์ ๋ฐ๊ฐ์ต๋๋ค\nC: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nE: ์ด์์ค์ญ์ผ!\nE: ์ ๊ณผ ๊ฟ์ด ํ๋ฅด๋ ์ํ๋ธ์
๋๋ค.\nE: ๊ธธ๋ ์์ ๊ทผ์ฒ๋ก ์ฑ ์ด๋ ํด์ฃผ์ธ์! :)\nD: ๋ค~~~~~~\nC: ShareCoord#[STOP]์ ์ด๊ธฐ์ง#-152190.000000#492921.000000\nC: ์ฐ๋ฆฌ๋ ์๊ธฐ์ฒ๋ผ ์ธ์ฐ๊ฒ ๋ ์ง๋ ๋ชฐ๋ผ์ ใ
ใ
\nC: ๋ฌด๋ ฅ์ ๋นจ๋ฆฌ ํค์๋ก์๋ค~!!\nC: ํ!\nC: ์ปค๋น๋ ํน์ ์ํ๋ธ ๋ฒ๋ฆฌ๊ณ ๊ฐ์๋์\nC: ShareCoord#[CITA]๋จ์์ปค๋น#-155181.000000#523710.000000\nC: ๋ค์ผฐ์ผ๋ ๋ณต๊ทํ์์ฃต\nC: ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nC: ํ์ฌ ์ค๋นํ์ธ์\nC: ShareCoord#[INSD]๋ถ์์ด#-148347.000000#488490.000000","generated":"* D์ด ๋ค๋ฅธ ์ฌ๋์ด ์ฑ์งํ๋ ๊ณณ๋ ์ ์ฐฐ์ด ๊ฐ๋ฅํ๋ค๊ณ ํ์์ต๋๋ค.\n* C์ด ์ธ์ฐ๊ฒ ๋ ์ง ๋ชจ๋ฅด๋ ๋ฌด๋ ฅ์ ๋นจ๋ฆฌ ํค์๋์๊ณ ์ ์ํ์ต๋๋ค.\n* C์ด ์ปค๋น๋์๊ฒ ์ํ๋ธ ๋ฒ๋ฆฌ๊ณ ๊ฐ์๋๋ฉฐ ๋ค์ผฐ์ผ๋ ๋ณต๊ทํ๋ผ๊ณ ์ ์ํ์ต๋๋ค.\n* C์ด ํ์ฌ๋ฅผ ์ค๋นํ์๊ณ ํ์์ต๋๋ค.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","how_prompted":""}
streamlit_app_local/user_submit/12-02_14:29:30/2.jsonl
ADDED
@@ -0,0 +1,6 @@
{"generated":"* D๊ธธ๋์ฅ์ด ๊ธธ๋์ฐ๊ตฌ ํ๊ทธ๋ฅผ ์ถ์ฒํ๋ฉฐ, C๊ฐ ์
์ฅํ์ ๋ ์์น๋ฅผ ์๋ดํฉ๋๋ค.\n* A๊ฐ D๊ธธ๋์ฅ์๊ฒ ์ธ์ฌ๋ฅผ ๊ฑด๋ค๊ณ , C๊ฐ ๊ธธ๋๋ก ์ด๋ํฉ๋๋ค.\n* D๊ธธ๋์ฅ์ ๊ธ๋ฃจํค ์ก๋๊ฒ์ ๋
๋ คํ๋ฉฐ ์ํ๋ธ ์์ ํ๋ถํด์ง์ ์๋ดํฉ๋๋ค.\n* B, D ๊ณต๊ฒฉ๋ ์ฐธ์ฌ๋ฅผ ๋
๋ คํ๊ณ ํ์คํธ ์งํ์ ๊ถ์ฅํ๋ฉฐ ์์ฃผ ๋ช
์ ๋ณด์์ ์กด์ฌ๋ฅผ ์ค๋ช
ํฉ๋๋ค.\n* A๋ ๊ธ๋ฃจํค์ ๊ฐ์์ด ํ๋ถํ๋ค๋ ์ ์ ๊ฐ์กฐํ๊ณ , ๋ถ๋ ์ด๋์ ์๋ดํ๋ฉฐ, ์๋ณด์ด์ ๊ทธ์ ์น๊ตฌ๋ฅผ ๋ง์ดํฉ๋๋ค.\n* D๊ธธ๋์ฅ์ 10๋ ๋ฒจ ๊ฐํ๋ชฌ์คํฐ(๋ฉํ ๋ฃจ์ค)๋ฅผ ํจ๊ป ์ก์ ๊ฒ์ ์ ์ํฉ๋๋ค.","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","source":"D: ๊ธธ๋์ฐ๊ตฌ ์ถ์ฒ ๋งํฌ ํ์ผ๋\nD: ๋ค๋ค ๋๋ฅด๊ฑฐ๋ผ\nC: ๊ธธ๋์ฅ๋ ์์น๊ฐ ์ด๋์์?\nC: ์ฐพ์์...\nD: ๋ค ๊ธธ๋ ํ๊ทธ ๋ถ์์ผ๋๊น\nD: ๊ทธ์ชฝ์ผ๋ก ์ฑ ์ด๋ ํ์๊ณ \nD: ๋ณ๋ ฅ ๋ฃ์ผ๋ผ๊ณ ๋ ๋๋ ๋\nA: ํ์ดํ์ด\nD: ๋ฐ๊ฐ์ต๋๋ค ์ด์์ค์ญ์
\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ์ด๋ฆฌ๋ก ์ด๋ํด์ ์ค์ธ์\nC: ๋ค๋ค~~~~\nA: ํจ๊ปํด์ ~\nC: ์ฎ๊ฒผ์ต๋๋ค.\nA: โก\nD: ๊ธ๋ฃจํค์ ๋ง์ด๋ง์ด ์ก์์ฃผ์
์ผ\nD: ์ํ๋ธ์ ์์์ด ๋์ณ๋ฉ๋๋ค.\nA: ์๋ชจํ๋\nA: ๋งํฌ๊ฐ ๋๋ฌด ๋๊ธ๋๊ธํ์ญ๋๋ค\nA: ์ฒดํต์ ์ง์ผ์ฃผ์์ต์์\nD: ๋๋ ์ํด\nA: ์ผ์ผ\nD: ์ฌ๋ ์ข ๋ชจ์ด๋ฉด\nD: ํ์๊พผ ์ก์์ฌ\nB: ๊ณต๊ฒฉ๋ ์ฐธ์ฌ ๋ง์ด ํด์ฃผ์ธ์~!\nD: ๊ณต๊ฒฉ๋ ์ฐธ์ฌํด์ผ ์ ๋ง ์์ ๋์น๋ ์ํ๋ธ์ด ๋ฉ๋๋ค.\nD: ์ฌ๋ฌ๋ถ ํ์คํธํ์๋ฉด์ ์งํํ์๋ฉด ๋น ๋ฅธ ์ฑ์ฅ ๊ฐ๋ฅํฉ๋๋ค.\nD: ์ฌ๋ฌ๋ถ ์์ฃผ ๋ช
์ ๋ณด์๋ ๋ฐ์ผ์ธ์~\nD: ์์ฃผ ๋ช
์ ๋ ๋ฒจ 3 ๋์ ๋ณด์ ๋ฐ์ผ๋ฉด\nA: ๋ต ์๋ชจํ๋!!\nD: ๊ฑด์ค์ฌ๋กฏ 2๊ฐ!\nA: ๊ธ๋ฃจํค์ ๊ฐ์์ด ๊ทธ๋๊ทธ๋ ๋ค์๋ค์\nA: ํ๋ถ๋์ฉ ๊ณต๊ฒฉ๋ ๊ฑธ์ด์ฃผ์ธ์\nA: ํฌ๋ก์ฐ ์๋ณด์ด\nD: ์ด์์ค์ธ์~ :)\nA: ์น์ ํ ๋ชจํ์จ\nD: ์์ฐ\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ํฌ๋ก์ฐ ์๋ฝ์ด\nD: ๊ธธ๋ ์์ ๊ทผ์ฒ๋ก ์ด๋ํด ์ค์ธ์\nA: ๊ธ๋ฃจํค ์ฐธ์ฌํ์๋ฉด\nA: ์ํ๋ธ๋ ํ์กฑํด์ง๊ณ ๊ฐ์๋ ๋์ณํ๋ฅด๊ฒ ๋ฉ๋๋ค์\nA: 10๋ ์ง๋ฆฌ\nA: ๊ฐํ๋ชฌ์คํฐ ๊ฐ์ด ์ก์ผ์ค๋ถ\nA: ์ฌ๊ธฐ๋ก ๋ถ๋ ์ด๋ํด์ฃผ์ธ์ ์ต๋ํ\nA: ShareCoord#๋น ๋
#145926.609375#523510.437500\nD: ์ด์์ค์ธ์\nA: ์ปค๋ชฌ์ปค๋ชฌ ์๋ณด์ด\nD: 10๋ [๋ฒจ ์ ์๋ชน ๊ฐ์ด ์ก์์ฉ\nA: 10๋ ๋ฒจ ๊ฐํ๋ฉํ ๋ฃจ์ค ํ๋ฒ ๋ชจ์ฌ์ ์ก์๋ณผ๊น์","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
{"generated":"* A๋ ๊ธธ๋๊ฐ ๊ฑด์คํ๋ ์ํ๋ธ ๊ธฐ๋
์ฒซ ๊ฑด์ถ๋ฌผ์ธ ๊ธธ๋ ์์๋ก ๋ถ๋๋ฅผ ์ด๋์์ผ๋ฌ๋ผ๊ณ ํฉ๋๋ค. \n* ์ํ๋ธ ๊ฑด์ค์ ์ถํํ๋ฉฐ ๋ชจ๋๊ฐ ์ฐธ์ฌํ๋๋ก ์ ๋ํฉ๋๋ค.\n* ๋ค์์ผ๋ก ๊ฒฝํ์น 3๋ฐฐ ์ด๋ฒคํธ๋ฅผ ์๋ ค์ฃผ๊ณ \n * ์ฐธ์ฌ ๋ฐฉ๋ฒ์ ์ค๋ช
ํ๋ฉฐ ๋ ์ฝ๊ฒ ์ดํดํ๋๋ก ๋์์ค๋๋ค.\n* ๋์ธต์ด ์ข ๋ฏ์ค์ด ๋ณด์ด๋ D์๊ฒ ์ด๋ฒคํธ ์ง์ญ์ผ๋ก ์ด๋ํ๋ ค๋ฉด ๊น๋ฐ ์์ผ๋ก ๋ถ๋๋ฅผ ๋ณด๋ด๋๋ก ์๋ดํ๊ณ , \n* ๊ฒฝํ์น๋ฅผ ๋น ๋ฅด๊ฒ ์ป๊ธฐ ์ํด ํ์๊พผ์ ์ก์ผ๋ฉด ์ด๋ฒคํธ ํจ์จ์ด ๋์์ง๋ค๊ณ ์ค๋ช
ํฉ๋๋ค.","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","source":"A: ๋ถ๋ ๋จ์ผ์๋๋ถ\nA: 8282\nA: ์ ์ณ๋ณผ๊ฒ์\nA: ํ์ดํ
\nA: ๋ฐํ ๋ผ๋ก~!!!\nA: ๊ธฐ์ตํด์ค!!! ์ ์ 10๋ ๋ฒจ ์ก์๋ค๊ตฌ!!!\nC: ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
ใ
\nA: ๋จ๋ ๋ถ๋ 1๊ฐ์ฉ\nA: ๊ธธ๋ ์์ ๊ฑด์ค์ ๋ฃ์ด์ฃผ์ธ์!\nA: ์ฐ๋ฆฌ์ ์ํ๋ธ์ ๊ฑด์คํด์ผํฉ๋๋ค!\nC: ์ฐ์ปด ํฌ ์ํ๋ธ\nA: ํฌ๋ก์ฐ ์๋ฝ์ด\nC: ๋ชจ๋ 1๋ถ๋๋ ๊ธธ๋ ์์์ ๋ฃ์ด๋ฌ๋ผ์
๋๋ค.\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nC: ์ํ๋ธ์ ์ํ ๊ฐ ์๊ธํ๋ค์์\nA: ์ด๋ฆฌ๋ก ์ค์ธ์ ์ด์~!\nB: ์ฐ๋ฆฌ ๊ธธ๋์์ ๋นจ๋ฆฌ ๊ฑด์ค ํ ๋ ค๋ฉด ๋ณ๋ ฅ์ ๋ ๋ง์ด ๋ฃ์ด์ผ ํ๋๊น ์ด์์ด์ ์ง์ํด์ฃผ์ธ์~ ^^\nA: ๋นจ๊ฐ๋ด๋ณต์ต๊ณ ~!!\nC: ์ด์์ค์ธ์~\nC: ใ
ใ
ใ
ใ
\nA: ์ฐ์\nC: ์ํฐ๋ณด์ด๋ ์ด์์ค์ญ์
\nA: ์(ํฐ)๋ณด์ด ์ค์
จ๋ค\nC: ๋ค๋ค ๋ ๋๋ ๋ ์ฐ๋งค์ฉ์์ ๊ธฐ์ฌ๋จ ํ๋์ฉ ๋ฃ์ผ๋ผ์ฐ\nA: ์ ์ฐ๋ฆฌ ์ด์ \nA: ๊ฒฝํ์น 3๋ฐฐ ์ด๋ฒคํธ\nA: ๋ฌ๋ ค์ผํฉ๋๋ค\nA: ํ ์คํธ๋ ์ด์์ค์ธ์~ ใ
ใ
\nA: ShareCoord#๋น ๋
#155862.171875#524460.062500\nA: ์ฌ๊ธฐ๋ก ๋ณ๋ ฅ์ง๊ฒฐํด์ฃผ์ธ์\nA: ์ค๋ฅธ์ชฝ์ ๋ณ๋ ฅ์ฌ๋กฏ ๋๋ฒ๋๋ฅด๋ฉด\nA: ์ ์ฒด์ ํํด์ ์ฌ๋ฌ ๊ธฐ์ฌ๋จ์ ํ๋ฒ์ ์ด์ฉํ ์ ์์ต๋๋ค\nA: ์์ ์์ ์ค๋ฅธ์ชฝ์\nA: ๋ชจ์ด๊ณ ์์ต๋๋ค\nA: ๊ฒฝํ์น 3๋ฐฐ ์ด๋ฒคํธ\nD: ๊ณ์ ์ด๋ํ ์ ์๋ ์์น๋ผ๊ณ ํ๋๋ฐ ์ด๋ป๊ฒ ํ๋ฉด ๋๋์?\nA: ์ฐธ์ฌํด์ ๊ธฐ์ฌ ๋ ๋ฒจ ์ญ์ญ ์ฌ๋ฆฌ์
์\nA: ๊ทธ ์ฑ์ ์ด๋ฆฌ์ ๋ฆฌ์ฎ๊ธฐ๋ฉด\nA: ์ด๋ก์์ผ๋ก ์ฑ ๋ฐ๋ฅ์ด ๋ฐ๋๋๋ฐ\nA: ๊ทธ๋ ์ด๋๋๋ฅด์๋ฉด ๋ฉ๋๋ค\nA: ์ ์๋๋ฉด 2~3๋ฒ์ ๋ ํธ๋ผ์ดํด์ฃผ์๋ฉด ์ด๋๋ ๊ฑฐ์์\nA: ShareCoord#๋น ๋
#156695.468750#525127.875000\nD: ์...๊น๋ฐ์ด ๋ ์์ด์ ๋๋ฌ๋ดค๋๋ฐ ๋ชป๊ฐ๋ค๊ณ ํด์ ใ
ใ
;\nA: ์~~\nA: ๊น๋ฐ๋ ์๋๊ณณ์ผ๋ก ๊ธฐ์ฌ๋จ(๋ถ๋)๋ ์ด๋ํ ์ ์์ด์!\nA: ๋ถ๋ ์ฌ๋ฌ๊ฐ ๋ณด๋ด์ ์๊ธฐ ์์ ํ์๊พผ ์ก์ผ๋ฉด\nA: ๊ฒฝํ์น ํญํ ํกํก ํฐ์ง๋๋ค\nA: ์ด์์ค์ธ์\nD: ์ฅ์ ๋ฌผ์ด ์์ด์ ๋ชป๊ฐ๋ค๊ณ ํ๋ค์..\nA: ShareCoord#๋น ๋
#156578.359375#523799.718750","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
{"generated":"* A๋ ๊ธธ๋์์ ํ์๊พผ ์ฌ๋ฅ ์ด๋ฒคํธ๋ฅผ 10์ 10๋ถ์ ์์ํ๋ค๊ณ ์๋ฆฌ๊ณ ์ฌ๋ฅ ์ฅ์ ์์น ์ ๋ณด๋ฅผ ๊ณต์ ํฉ๋๋ค. \n* ์ด๋ณด์๋ฅผ ์ํ ์๋ด๋ก, ๊ธธ๋ ์์ ๊ทผ์ฒ๋ก ์ด๋ํ ํ ๋ถ๋๋ฅผ ์ด๋์์ผ ๋ชฌ์คํฐ ์์ ์ปค์๋ฅผ ๋๊ณ ๊ณต๊ฒฉํ๋ ์ ๋ต์ ์ค๋ช
ํฉ๋๋ค. ๊ณต๋์ ํฌ์ ์ฐธ์ฌํ๋ฉด ๊ฒฝํ์น๊ฐ ๊ท ๋ฑํ๊ฒ ๋๋ ์ง๋๋ค.\n* A๋ ํ๋ ์ด์ด๋ค์ด ๋ถ๋๋ฅผ ์ด์ฉํ๊ณ ํ์๊พผ์ ๊ณต๊ฒฉํ ์ค๋น๋ฅผ ํ๋๋ก ์ง์ํฉ๋๋ค.\n* ํ์๊พผ ์์น๋ฅผ ๊ณ์ ๊ณต์ ํ๋ฉฐ ๋ถ๋๊ฐ ์์ด์ง๋ฉด ํ์ ํ ๋ค์ ์งํํ๋ค๊ณ ํฉ๋๋ค.","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","source":"A: 10์ 10๋ถ์\nA: ํ์๊พผ์ฌ๋ฅ ์์ํ ๊ฒ์\nA: ํ์๊พผ์ ๊ฒฝํ์น ๋ง์ด ๋ค์ด์์ต๋๋ค\nC: 10์ 10๋ถ์ ํ์๊พผ ์ก์ต๋๋ค.\nC: ๋ค๋ค ์ค์ธ์\nE: ์ด๋์ธ๊ฐ์..?\nA: ShareCoord#๋น ๋
#156509.859375#524728.187500\nA: ์๊ธฐ๊ฐ ๋ถ๋ ์ง๊ฒฐ์ง๊ณ ์!\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ๊ธธ๋์์ ๊ทผ์ฒ๋ก ์ฑ ์ด๋ํ์ ํ์ ๋ถ๋ ์ฎ๊ธฐ์๋ฉด ํธํฉ๋๋ค\nD: ๊ณต๊ฒฉ๋๋ฅผ ๊ทธ๋ฅ ๋ณด๋ด๋ฉด ๋๋์?\nA: ๊ธฐ์ฌ๋จ์ ๋ณด๋ด๋์
จ๋ค๊ฐ\nA: ์์ํ๋ฉด ๋ถ๋๋ฅผ ๋์ด์ ๋ชฌ์คํฐ ์์ ์ปค์๋ฅผ ์ฌ๋ ค๋๋ฉด\nA: ๋ฌ๋ ค๊ฐ์ ๊ณต๊ฒฉํ๋๋ฐ์\nA: ํํ์ด๋ฐ์ ๊ฐ์ด ์ก์ผ๋ฉด ์ ํฌ์ ์ฐธ์ฌํ ์ฌ๋ํํ
๋ณด์์ ๊ท ๋ฑํ๊ฒ ์ค์\nA: ๋ค๊ฐ์ด ๋ชฐ์์ก๋๊ฒ ์ด๋์
๋๋ค\nA: 10๋ถ๋์ด์ ํ์๊พผ ํ๋ฒ ๋ฌ๋ ค๋ณผ๊ฒ์~!\nA: ๋ถ๋ ์ต๋ํ ๊บผ๋ด์ ๊ฐ์ ธ์ค์ ํ์\nA: ์ฐ์ธก ๋ถ๋ ์ฌ๋กฏ ๋ฒํผ ๋๋ธํด๋ฆญํด์\nA: 3~5๋ถ๋ ์ด์ฉํ์
์~!\nA: ์ ๋ณด์๋ฉด 5๋ถ๋ ์ ๋ถ\nA: ํ๋ฐฉ์ ์ด์ฉํฉ๋๋ค\nA: ์์ ์น ๊ฒ์ ํ์๊พผ!!\nA: ShareCoord#ํ์๊พผ#155425.500000#527495.250000\nC: ๊ณ ํฉ๋๋ค.\nA: ๊ณต๊ฒฉ๊ณต๊ฒฉ\nA: ShareCoord#ํ์๊พผ#159173.750000#528246.125000\nA: ๋ฉ์์ด๋ ์ปค๋น๋\nA: ๋ถ๋ ์์ง์ฌ์ ํ์๊พผ ๊ณต๊ฒฉํ์
์~~!\nB: ๊ฐ์ด ๊ฐ์ ~~~ ํ์๊พผ\nA: ShareCoord#ํ์๊พผ#160192.187500#525461.625000\nA: ์ด์์ค์ธ์ ~~\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ์ฑ ์ด๋ํ์๊ตฌ\nA: ํ์๊พผ์ก์ผ๋ฌ ์ฐธ์ฌํ์
์ ์์ชฝ์ผ๋ฃจ '\nA: ShareCoord#ํ์๊พผ#165078.687500#525488.375000\nA: ๊ฐ์ด ์ก์์ผ ๋ณ๋ ฅ๋ ๋์ฃฝ๊ณ ๊ฒฝํ์น๋ ๊ฐ์ด๋จน์ด์~!\nA: ์ด์์ค์ธ์์ค~~~~\nB: ์ด์์ค์ธ์~\nA: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nA: ์ฑ ์ด๋ ์ขํ\nA: ShareCoord#ํ์๊พผ#164864.437500#527752.750000\nC: ์ด์์ค์ธ์~ :)\nA: ๋ชฌ์คํฐ ์ขํ\nA: ๊ณ์ ํ์๊พผ ์งํ ์ค์
๋๋~!!\nA: ์ด์์์ ์ก์ผ์
์ฉ~~\nA: ๋ถ๋ ํ๊ตฐํด์\nA: ์ ๋น ํ๋ฒ ใดํ๊ณ \nA: ๋ค์ ๋ฌ๋ฆด๊ฒ์\nA: ๋ค๋ค ๋ถ๋ ์ฒด๋ ฅ์ด ๊ฐ๋น๊ฐ๋นํ์ค๋ฏํ์ฌ..","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
{"generated":"* A๋ ๋๋ ์ด ํ์ด์ง ์นด์ดํธ๋ค์ด์ ์์ํ๊ธฐ ์ ์ ํ๋ ์ด์ด๋ค์ ๋ชจ์๋๊ณ ์ค๋น ์ํฉ์ ์๋ดํฉ๋๋ค. \n* ํ์๊พผ๊ณผ ์ํ
๋ฅด ์์ ์ถํ ์์น๋ฅผ ๊ณต์ ํ๊ณ , \nํ์๊พผ์ ์ก๊ณ ์ํ
๋ฅด ์์์ ํ์ฉํ๋ฉด ์๊ฐ ์ ํ ๋ฒํ๋ฅผ ์ป์ ์ ์์์ ์๋ฆฝ๋๋ค.\n* '๋์' & '๊ฐ์' ๋๊ตฌ๋ฅผ ์ ๊ทน์ ์ผ๋ก ์ฌ์ฉํ๋ผ๋ ์กฐ์ธ๋ ๋ง๋ถ์ด๊ณ , C๊ฐ ์ํ๋ธ ๊ฑด์ค์ด 3๋ถ ๋จ์๋ค๊ณ ์๋ฆฌ๋ฉฐ ํ๋ ์ด์ด๊ฐ ์กฐ๊ธ์ฉ ์ญํ ์ ๋ง์น๊ณ ์ํฉ์ด ์งํ๋๋ ๋ชจ์ต์ ๋ณด์ฌ์ค๋๋ค.\n* ํ๋ ์ด์ด๋ค์ ์๋ก ์์ํ๊ณ , ๊ฒ์ ์งํ ์ํฉ์ ๋๋๋ฉฐ ๊ฐ ๊ธธ์ ๊ณํํฉ๋๋ค.","instruction":"์ฃผ์ด์ง ์ฑํ
์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์
ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","source":"C: ์๊ฑฐ ์ก๊ณ ์คํ ์ ์ ์ ๋นํ๊ณ ์ค๊ฒ ์ต๋๋ค.\nA: ShareCoord#๋น ๋
#159195.468750#530362.062500\nA: ์ฌ๊ธฐ ์ง๊ฒฐํ ๊ฒ์!!!\nA: 18๋ถ์ ์คํํธ ํ๊ฒ ์ต๋๋ค\nA: ๋ถ๋ ํ๊ตฐํ๋ค๊ฐ ๋ค์ค ํธ์ฑ์ผ๋ก ์ ๋ถ ๋๊ณ ๋์ค์
์ฉ~!\nA: ๋ค๊ฐ์ด ์ก์์ผ ๊ฒฝํ์น ํจ๊ป๋จน๊ณ ํญ์
๊ฐ๋ฅํฉ๋๋ค\nA: ์ด์์ด์~!! ์๋ฝ์ด๋ค ๋นจ๋ฆฌ ๊ฐํด์ ธ์ผ ๋์ค์ ์ธ์ธ๋ ์ฌ๋ฐ์ต๋๋ค~! ใ
ใ
ใ
\nA: ShareCoord#๋น ๋
#159809.593750#529345.625000\nA: ์ด๋์๊ฐ๋๋ฌธ์ 20๋ถ์ ๋ฐ๋ก ์์ํ ๊ฒ์\nA: 10์ 20๋ถ์ ๊ทธ๋ฅ ๋ฐ๋ก ์คํํธํฉ๋๋ต\nA: ์์ฃผ์
์ ๊ฐ์ฌํฉ๋๋ค ใ
ใ
ํคํค\nA: ์ ์น ๊ฒ์\nA: ๋จ์๊ฑฐ ๋ค์ก๊ฒ ์ต๋๋ค\nD: ๊ฐ๋์?\nA: gogo!!!!\nA: ShareCoord#๋น ๋
#159518.640625#532655.562500\nA: ShareCoord#ํ์๊พผ#157515.046875#534659.812500\nA: gogo!!!\nA: ShareCoord#์ํ
๋ฅด ์์#160185.000000#531312.000000\nA: ํ์๊พผ์ก๊ณ ๋จ์ด์ง\nA: ์ํ
๋ฅด ์์ ์ด๊ฑฐ ์๊ฐ์ ๋ฒํ๊ฐ์ผ๋๊น\nB: ๊ฑด๋ฌผ ์ง๊ฑฐ๋ ๋ณ์ฌ ๋ฝ์ ๋ ์๋ชจ์ ๋์ ๋๋ฅด๋ฉด ์๊ฐ์ด ๋นจ๋ผ์ง๋๋ค.\nA: ํ์ํ์ ๊ฑฐ ๋น ๋ฅด๊ฒ ์ฑ์งํด์ ๋ฒํ๋ฐ์ผ์ธ์\nB: ์ ๊ทน์ ์ผ๋ก ํ์ฉํด์ฃผ์ธ์\nC: ํ์๊พผ ์ฌ๋ฅ ์๊ณ ํ์
จ์ต๋๋ค.\nC: ์ํ
๋ฅด ์์ ํ์ฉํด์ฃผ์ธ์! ์ธํฐ๋ ์
ํ๋ฉด ๋ฒํ ๋ฐ์ต๋๋ค.\nC: ๋ค์ ๊ธ๋ฃจํค ์ฌ๋ฅ ๊ฐ์๋น\nC: ๊ธธ๋ ์ฐ๊ตฌ๋ ์์ํ์ผ๋\nC: ๊ธฐ๋ถ ๋ง์ด ํด์ฃผ์ธ์\nC: ๊ธฐ๋ถ๋ฅผ ํ๋ฉด ๊ฐ์ธ ํฌ์ธํธ๋ ์ป์ผ๋๊น ์๋ถ์์กฐ\nA: ์คํ์ฏค์ด๋ฉด ์ํ๋ฒ ํ์ง ์ถ๋ค์\nC: ๋ค๋ค ๋ฐํ ๋ผ ๊ฐ์\nA: ์ฑ์ฅ ํธ๋ค๋ฅ ํด์ฃผ์๊ตฌ ๊ถ๊ธํ๊ฑฐ ๋ฌผ์ด๋ด์ฃผ์ธ์\nA: ๋์ ํํํ ๋๋ฌ์ฃผ์๊ตฌ์ฅ~!\nA: ๊ฐํ ์๋ฝ์ด๊ฐ ๋์!!!!!!!\nB: ๊ธธ๋ ์์ 60% ๋์๋ค์.\nB: ์์ฐจ ์์ฐจ ์์ฐจ~\nA: ์ด์ฐ ๋ฌ์ ๋์ ๊ฐ์ฌํฉ๋๋ค\nC: ใ
ใ
ใ
ใ
\nD: ใ
ใ
ใ
ใ
ใ
\nA: ๋์๋์~!!! ์ต๊ณ ~!!!!!!\nA: ๊ฐ์ ์๋ผ์ง ๋ง์๊ตฌ ํํ์ฐ์ธ์\nA: ๋ง์ง๋ง๋ ๊น์ง ๊ฐ์ ๋ค ์ฐ์ง๋ ๋ชปํฉ๋๋ค ใทใท\nA: ๋์-๊ฐ์-๋์-๊ฐ์\nA: ์-์ฐจ-์-์ฐจ\nC: (ํก์กฑ)\nC: ์ํ๋ธ ์์ ์์ฑ๊น์ง 3๋ถ๋จ์์ต๋๋ค.\nA: ์๋ฝ์ด ์ฌ๋ฌ๋ถ๋ค\nA: ๊ถ๊ธํ๊ฑฐ๋ ๋งํ๋ ๊ฒ๋ค ๋ง์์ฃผ์๋ฉด\nA: ๋ค ์๋ ค๋๋ฆฌ๊ณ ๋ค ๋์๋๋ฆฝ๋๋ค","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
{"generated":"* E๋ ๊ธธ๋ ์์์ด ๋ถ์กฑํ๋ค๋ ์ด์ ๋ก ๊ธ๋ฃจํค์ ๋ง์ด ์ก์๋ฌ๋ผ๊ณ ํฉ๋๋ค. \n* B๋ ๊ฐํ๋ชฌ์คํฐ ๋ฏธ์
์ ํ๊ณ ์์ผ๋ฉฐ ๋ช๋ช ์ฌ๋๋ค์ ๋ชจ์งํฉ๋๋ค. B๋ ๋ค๋ฅธ ํ๋์ ์๋ดํ๋ฉฐ F์ ํจ๊ป ๋ฏธ์
์ ์๋ฃํฉ๋๋ค.\n* C๋ ๊ธธ๋ ์ฐ๊ตฌ ๊ธฐ๋ถ๋ฅผ ๋ถํํ๋ฉฐ ์ ์ด๊ธฐ์ง ๊ฑด์ค์ ์ง์ํ๊ธฐ๋ฅผ ์์ฒญํฉ๋๋ค. B๋ C์ ์์ฒญ์ ๋ฐ์๋ค์ฌ ์ ์ด๊ธฐ์ง ๊ฑด์ค์ ๊ธฐ์ฌ๋จ์ ๋ฐฐ์นํ๊ณ 2๋ ๋ฒจ ์ด์์ ๋ถ๋๋ค์ ์ฑ์ง์ง๋ก ๋ณด๋ด ์์ ํ๋ณด๋ฅผ ์ํ ๊ณํ์ ์ค๋ชํฉ๋๋ค.\n* D๋ ์ ์ด๊ธฐ์ง๊ฐ ๊ณง ์์ฑ์ด ๋ ๊ฒ์ด๋ผ๊ณ ๋งํ๋ฉฐ, ๋ค๋ฅธ ์ฑ์ ์ ์ฐฐ์ ์ด๋ป๊ฒ ๋ณด๋ธ์ง์ ๋ํด A์๊ฒ ๋ฌธ์ํฉ๋๋ค.","instruction":"์ฃผ์ด์ง ์ฑํ์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","source":"E: ์ฌ๋ฌ๋ถ!\nE: ๊ธ๋ฃจํค ๋ง์ด ์ก์์ฃผ์ธ์!\nE: ๊ธธ๋ ์์์ด ๋ถ ์กฑํฉ๋๋ค.\nE: ์ ๊ณผ๊ฟ์ด ํ๋ฅด๋ ์ํ๋ธ์ด ๋๊ณ ์ถ๋ค?\nE: ๊ธ๋ฃจํค์ ๋ง์ด ์ก์์ฃผ์ญ์ผ\nG: ๊ฐ๋ํ ์ํ๋ธ์ ์์ ์ ์์ต๋๋ค\nE: ํ๋ฆฌ์ฆ\nB: ๊ฐํ๋ชฌ์คํฐ ๋ฏธ์๊ฑธ๋ฆฌ์ ๋ถ?\nB: ๊ฐ์ด ์ก์ผ์ค๋ถ ๊ณ์ ๊ฐ์ ~\nF: ์ขํ์ฃผ์ธ์\nB: ShareCoord#๋น ๋#155225.781250#534257.875000\nB: ShareCoord#๋น ๋#154322.968750#533640.812500\nB: ๊ฐํ๋ชฌ์คํฐ ์ก์ผ์ค๋ถ ๊ตฌํด์~~~\nB: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nB: ์ถ๋ฐํฉ๋๋ค\nB: ๊ฐํ๋ชฌ์คํฐ~!\nB: ๋์ด์ค์ปค๋น๋\nF: ใใ\nB: ๊ฐ์ฌํฉ๋๋ท\nF: ๊ตฟ๊ตฟ\nB: ๋๋ฅธ ๋ฒํ์ ๋ฐ๋๊ณ ์ถ์\nC: ๊ธธ๋ ์ฐ๊ตฌ ๊ธฐ๋ถ ๋ถํ๋๋ฆฝ๋๋ค.\nB: ์๋ณด์ด์ ๋ง์์ ์๋๊ฐ\nC: ๊ธธ๋ ๋ณด์๋ฉด ๊ธธ๋ ์ฐ๊ตฌ ์์ด์~\nB: ShareCoord#1๋ ๋ฒจ ๋๋๊ณค ๋ ์ด#138900.000000#535800.000000\nB: ์ ์ด๊ธฐ์ง ์ง๊ณ ๋๋๊ณค ์ ํ๋ฒ ์ป์ด๋ณผ๊ฒ์~~\nB: ์์ ์ป๊ณ ์ข ํค์์ผ ์ฉ์ธ ์ ์์ต๋๋ค~~~!!\nC: ์ ์ด๊ธฐ์ง ๊ฑด์ค์ ๊ธฐ์ฌ๋จ 1๊ฐ์ฉ ์ง์ ๋ถํ๋๋ฆฝ๋๋ค\nE: ๊ธ๋ฃจํค๋ ๋ง์ด ์ก์์ฃผ์ธ์\nE: ๊ธธ๋ ์์์ ๋ชจ์์ผ ํฉ๋๋น\nB: ๋ต ์๋ชจํ๋\nB: ์ถฉ์ฑ์ถฉ์ฑ์ถฉ์ฑ XXXXXXXXXXX\nE: ๊ฐ์ฆ์ ๋ฐํ ๋ผ!\nB: ํ๋ ํฌ์ธํธ ์ป๋ ๋ฒ\nB: 1. ๊ธธ๋ > ์ฐ๊ตฌ > ๊ธธ๋ ์ฐ๊ตฌ ๊ธฐ๋ถ(์ถ์ฒ ์ฐ๊ตฌ)\nB: 2. ๊ธธ๋ ๋ณธ๋ถ > ๋์\nB: ํ๋ ํฌ์ธํธ ์ฌ์ฉ๋ฒ\nB: 1. ๊ธธ๋ > ๊ธธ๋ ์์ > ์ํ ๊ตฌ๋งค(๊ธฐ์ฌ ๊ฒฝํ์น ๊ฟ)\nB: ๊ธฐ์ฌ๋ ์๋นจ๋ฆฌํด์ ๊ดด์ ๋ชฌ์คํฐ ๋นจ๋ฆฌ์ก๊ณ ๊ณ ์ฑ์ฅํฉ์ธ๋ค~!!\nB: ๋์ ๊ณ ๋ง์ต๋๋ค ํคํค\nB: ๊ณง ์์ฌ์๊ฐ์ผํ๋ฐ์\nB: ์ ์ด๊ธฐ์ง์ ๊ธฐ์ฌ๋จ 1๊ฐ ๋ฃ์ด์ฃผ์๊ตฌ\nB: ๋๋จธ์ง ๋ถ๋๋ 2๋ ๋ฒจ ์ด์ ์ฑ์ง์ง์ ๋ฃ์ด๋์ ํ ์์ฌ๋ค๋์ค์๋ฉด\nB: ์ด๋ฒคํธ๋ ํด๋ฆฌ์ด๋๊ณ ์์๋ ๋นต๋นตํด์ง๋๋ค~~~\nB: ์๋ํ์ธ์~~~~\nB: ๊ถ๊ธํ๊ฑฐ ๋ฌผ์ด๋ณด์์~~~ ๋ค ์๋ ค๋๋ฆฝ๋๋ค ~~\nD: ์ฐ๋ฆฌ ์ ์ด๊ธฐ์ง 3๋ถ ๋จ์๋๋ฐ.\nA: ์ ์ฐฐ์ ์ด๋ป๊ฒ ๋ณด๋ด๋์\nD: ์ข๋ง ๋ ์์ฐจ์์ฐจ ํด์ฃผ์๋ฉด\nD: ๋จ์ ์ฑ ์ฐ๊ณ ์ ์ฐฐ ๋ฒํผ ํด๋ฆญํ๋๊น ์ ์ฐฐ ๋ณด๋ด์ก์ด์.","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
6 |
+
{"generated":"* ์ฌ๋ฌ ์ฌ๋์ด ํจ๊ป ๊ฒ์์ ํ๊ณ ์๋ค. \n* A๊ฐ ์ด์ค๋ผ์์ ์์ ์ด๋ฒคํธ์ ์ฐธ์ฌํ๋ผ๋ ๋ถํ์ ๋ฐ๊ณ , ๊ทธ ํ ๋ค๋ฅธ ํ๋ ์ด์ด๋ค๊ณผ ๋ํ๋ฅผ ๋๋๋ค.\n* C๋ ๊ฒ์ ๋ด ๋ง์์ ์์น ์ ๋ณด๋ฅผ ๊ณต์ ํ๊ณ , ๋ง์์ ๋ฐฉ๋ฌธํ๋ผ๊ณ ๋ค๋ฅธ ํ๋ ์ด์ด D์๊ฒ ๊ถ์ ํ๋ค. \n* E๋ 1์์ ํ์๊พผ ์ฌ๋ฅ์ ํ๋ค๊ณ ์๋ฆฌ๊ณ ์ขํ๋ฅผ ๊ณต์ ํ๋ค. \n* ์๋ก์ด ํ๋ ์ด์ด B์ F๊ฐ ๊ฐ์ํ๋ฉด C์ E๋ ํ์ํ๋ค.\n* C๋ ํ๋ ์ด์ด D๊ฐ ๋ค๋ฅธ ์ง์ญ์ผ๋ก ์ด๋ํ๋๋ฐ, ์ด์ ๋ํด ๋๋ด์ ๋๋๋ค.\n* ๋ง์ง๋ง์ผ๋ก C๋ ๋จ์์ปค๋น(ํน์ ์ปค๋น?)๋ผ๋ ์๋ก์ด ํด๋ฆฌ๊ณค์ ์์น๋ฅผ ๊ณต๊ฐํ๊ณ ์ ๋น๋ก์ด ํ์ฌ๋ฅผ ์ค๋นํ์๊ณ ์ ์ํ๋ค.","instruction":"์ฃผ์ด์ง ์ฑํ์ ๋ํ ์์ฝ์ ์์ฑํด์ค. ์ฌ๋๋ค์ ๋ฐํ๋ฅผ ํตํด์ ์ด๋ค ์ํฉ์ธ์ง ํ์ํ ์ ์์์ผ๋ฉด ์ข๊ฒ ์ด. ์์์ฒ๋ผ \"์์ฝ:\" ๋ค์๋ค ์์ฑํด์ฃผ๋ฉด ๋ผ.","source":"D: ๋ณดํธ๋ง์๋ ์ฑ ๋นผ๊ณ \nD: ๋ค๋ฅธ ์ฌ๋์ด ์ฑ์งํ๋ ๊ณณ๋ ์ ์ฐฐ ๋๋๋ฐ์.\nC: ์ ์ฐฐ์ ๋ง์์ด๋ ํ์ธ์ ์ฑ์ ๋๋ฅด๋ฉด ์ ์ฐฐ ๋๋๋ฐ์\nE: ์ ์ฐฐ์ ์์ง์ ์ ์ฐฐ ๋ณธ๋ถ๋ฅผ ๊ฑด์คํด์ผํฉ๋๋ค\nC: ShareCoord#2๋ ๋ฒจ ํ๋ก์ ๋ง์#148500.000000#524397.000000\nC: ๋ฏธ์๋๋๋ฌธ์ด์๋ฉด ์ด ๋ง์์ด๋\nA: ์ค ํ๋์ ๋ ์๊ฐ๋ค\nC: ์ค ๋์จ๋์\nA: ใใ๋ค ใณใณ\nD: ํ๋์...\nC: ์ฌ์ค๋ถ์์ด์๋๋ค ใใ\nC: ใใใใใใใใใใ\nA: ๋ฑ๋ค๊ตฌ๋ฆฌ๋ผ๊ณ ํด๋ ๋ฏฟ์ ์ฃผ๋ฅ์ด ๊ธธ์ด์ง๋ง ๊ทธ๋ ๋ค๊ณ ํ๋ ๋ฏฟ๊ฒ ์ต๋๋ค\nC: ใใใใใใใใใใใใใใใใใใ\nA: ํ๊ฒ์. ๊ทธ ์ ๋น์ ์ธ๊ณ\nC: ์์ด ์์ฃฝ ๋์์ด์\nC: ์์ง๋ฏ์\nA: ๋ค์ ์ ์ฐฐ ์๋ณด๋ผ๊ฒ ใท\nE: ใใใใใใใใ\nC: ใใใใใใใใใใ\nD: ๋\nC: ๊ตฐ์ฃผ๋ ๊ณ์ ๊ฐ์\nC: ์คํธ๋ฆฟ ๊ตฐ์ฃผ๋\nC: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nC: ์์ ๊ทผ์ฒ๋ก ์ค์๋ฉด ๊ธ๋ค๋ฃจํค ๊ณต๊ฒฉ๋ ์ง๊ฒฐ์ด ํจ์ฌ ์ฉ์ดํฉ๋๋ค!!\nC: ์ด์ชฝ์ผ๋ก ์ค์์ฉ~!\nC: ์ขํ ๊ทผ์ฒ ๋์ ๊พธ์ฑ ๋๋ฅด์ ํ ์ฑ ์ด๋์ ๋๋ฅด์๋ฉด ์ฑ์ด ์ด์ ๋ฉ๋๋ค\nE: 1์์ ํ์๊พผ ์ฌ๋ฅ ๊ฐ๋๋ค.\nE: ํ๊ทธ ๋ฌ์์ผ๋ ์์น ํ์ธํด์ฃผ์ธ์ :)\nB: ์๋ํ์ธ์ ๊ฐ์์ธ์ฌ ๋๋ฆฝ๋๋ค\nE: ์๋ํ์ธ์! ๋ฐ๊ฐ์ต๋๋น :)\nC: ์๋ํ์ธ์~~~\nC: 1์ ํ์๊พผ ํ์ธํ์ต๋๋ค\nE: ์์\nF: ์๋ํ์ธ์ ๋ฐ๊ฐ์ต๋๋ค\nC: ShareCoord#[CITA]๊ธธ๋์์#149559.000000#521478.000000\nE: ์ด์์ค์ญ์ผ!\nE: ์ ๊ณผ ๊ฟ์ด ํ๋ฅด๋ ์ํ๋ธ์๋๋ค.\nE: ๊ธธ๋ ์์ ๊ทผ์ฒ๋ก ์ฑ ์ด๋ ํด์ฃผ์ธ์! :)\nD: ๋ค~~~~~~\nC: ShareCoord#[STOP]์ ์ด๊ธฐ์ง#-152190.000000#492921.000000\nC: ์ฐ๋ฆฌ๋ ์๊ธฐ์ฒ๋ผ ์ธ์ฐ๊ฒ ๋ ์ง๋ ๋ชฐ๋ผ์ ใใ\nC: ๋ฌด๋ ฅ์ ๋นจ๋ฆฌ ํค์๋ก์๋ค~!!\nC: ํ!\nC: ์ปค๋น๋ ํน์ ์ํ๋ธ ๋ฒ๋ฆฌ๊ณ ๊ฐ์๋์\nC: ShareCoord#[CITA]๋จ์์ปค๋น#-155181.000000#523710.000000\nC: ๋ค์ผฐ์ผ๋ ๋ณต๊ทํ์์ฃต\nC: ใใใใใใใ\nC: ํ์ฌ ์ค๋นํ์ธ์\nC: ShareCoord#[INSD]๋ถ์์ด#-148347.000000#488490.000000","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
streamlit_app_local/user_submit/llm/240413_dora.jsonl
ADDED
@@ -0,0 +1,6 @@
1 |
+
{"model_id": "240413_dora", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ๋ ์ข์."}
2 |
+
{"model_id": "240413_dora", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๊ทผ๋ฐ ๋ ์ฌ๋ฆ๋ณด๋ค ๊ฒจ์ธ์ด ๋ ์ข์. ์๋ํ๋ฉด ์ด์ง ์ถ์ด ๋ ์จ๊ฐ ์ท ์๊ธฐ๋ ์ฝ๊ณ , ํน ์ฌ๊ณ ์์ ์๋ ์์ด์ ์ข๊ฑฐ๋ ."}
3 |
+
{"model_id": "240413_dora", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค ๊ฒจ์ธ์ด ์ข์. ์๋ํ๋ฉด ๊ฒจ์ธ์๋ ๋ฐ์ ๋๊ฐ์ ๋๊ธฐ๋ ํธํ๊ณ , ๋ ๋ฎ์ธ ์ฐ์ด๋ ํธ์๋ฅผ ๋ณผ ์ ์์ด์ ์ข๊ธฐ ๋๋ฌธ์ด์ผ."}
4 |
+
{"model_id": "240413_dora", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋์ ํ๊ต๋ผ๋ ๊ณณ์ ๋ฐฉ๋ฌธํ๋ค. ๊ทธ๋ฆฌ๊ณ ์ง์ ๋์์์๋ ๋ฐ๋ก ์์ ๋ฌธ์ ์ง์ ํ๊ธฐ ์์ํ๋๋ฐ, ์ด๊ฒ ์ง๊ธ๊น์ง ํ๋ ๊ฒ ์ค์์ ๊ฐ์ฅ ์ง์ค์ด ์ ๋๋ค. ์ง๊ธ์ ์์ ๋ ๋ง์น๊ณ ์ฌ์ฌ ์ ์๋ฆฌ์ ๋๋ ์ค์ด๋ค."}
5 |
+
{"model_id": "240413_dora", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋์ ํ๋ฃจ ์ข์ผ ํ๊ต์ ๊ฐ๋ค ์๋ค. ์ง์ ์์ ์ ๊น ์์ ํด์์ ์ทจํ๊ณ , ์์ ๋ ๋ง์ ๋ค ๋๋๋ค. ์ค๋์ ์ฑ๋ ๋ง์ด ์ฝ๊ณ , ๋ง์๋ ํธํ๊ฒ ์ฌ์ด์ผ๊ฒ ๋ค."}
6 |
+
{"model_id": "240413_dora", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋์ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ์ฌ๋ ์๊ฐ์ ๊ณต๋ถ๋ฅผ ํ๋ฉฐ ํ๋ฃจ๋ฅผ ๋ง๋ฌด๋ฆฌ ์ง์๋ค."}
streamlit_app_local/user_submit/llm/240413_dora_epoch_5.jsonl
ADDED
@@ -0,0 +1,6 @@
1 |
+
{"model_id": "240413_dora_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ๋ ๋ง์์ ๋ค์ด."}
2 |
+
{"model_id": "240413_dora_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์๋ฌด๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ๋ ์ข๋ค. ์ด์ ๋ ๋ฌผ๋ก ์ถ์์์ด๊ธฐ๋ ํ์ง๋ง, ๋ด๊ฐ ํผ๋ถ๋ก ๋๋ผ๋ ๊ฒ๋ค ๋๋ฌธ์ด๋ค. ํนํ๋ ๋์ ์ ๋ง ํ์๊ณ ๊นจ๋ํ๋ค. ์ด๊ฒ์ ๊ฒจ์ธ์๋ง ๋ณผ ์ ์๋ ๊ฐ์ฅ ์๋ฆ๋ค์ด ํ๊ฒฝ ์ค ํ๋๋ค."}
3 |
+
{"model_id": "240413_dora_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ๋ ์ข์."}
4 |
+
{"model_id": "240413_dora_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋ ๋๋ ํ๊ต์ ๋ฐฉ๋ฌธํ๋ค. ์ง์ ๋์์์ ์ด์ฌํ ๊ณต๋ถํ๋๋ฐ ๋ด์ผ๋ถํฐ๋ ๋ ๋ค๋ฅธ ์์ ๋ค์ด ๊ธฐ๋ค๋ฆฌ๊ณ ์์ ๊ฒ์ด๋ค."}
5 |
+
{"model_id": "240413_dora_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ ์ค๋์ ์ ๋ง ๋ฐ์ ํ๋ฃจ์๋ค. ํ๊ต์ ๊ฐ์ ์ด์ฌํ ๊ณต๋ถํ๋ค. ๊ทธ๋ฆฌ๊ณ ์ง์ ๋์์์๋ ์์ ๋ ๋ค ๋๋ด๋๊ณ ์๊ธฐ๋ก ํ๋ค."}
6 |
+
{"model_id": "240413_dora_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋ ๋๋ ํ๊ต์ ๊ฐ์๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ๊ทธ๋ฆฌ๊ณ ์ค๋ ์์ ๋ ๋ค ๋ง์น๊ณ , ์ด์ ์ ์ค๋นํ๋ ค๊ณ ํ๋ค~"}
streamlit_app_local/user_submit/llm/240413_full.jsonl
ADDED
@@ -0,0 +1,6 @@
1 |
+
{"model_id": "240413_full", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ํจ์ฌ ๋ ์ข์ ๊ฒ ๊ฐ์."}
2 |
+
{"model_id": "240413_full", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ ํจ์ฌ ๋ ์ ํธํด. ์ฌ๋ฆ์ ๋์ ๊ธฐ์จ์ ๋๋ฅผ ์ง์น๊ฒ ๋ง๋ค๊ฑฐ๋ . ์์ํ ๋ฐ๋๊ณผ ํจ๊ป ๋ฐ๋ปํ ์ด๋ถ ์์์ ํน ์ฌ๋ค๊ฐ ๋ค์ ์๋ก์ด ๋ง์์ผ๋ก ๋ค์ ์์ํ๊ณ ์ถ์ด."}
3 |
+
{"model_id": "240413_full", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ๋ ์ข๋ค."}
4 |
+
{"model_id": "240413_full", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋์ ์ผ์ ๊ธฐ๋ก:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ๋์์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋์ ์์ ๋ ๋๋ง์น๊ณ ์ด์ ๊ณง ์ ์ค๋นํ ์๊ฐ์ด๋ค."}
5 |
+
{"model_id": "240413_full", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋์ ์ ๋ง ํ๋ฃจ ์ข์ผ ๋ฐ๋นด๋ค. ํ๊ต๋ฅผ ๊ฐ๊ณ , ์ง์ ๋์์์ ๋ฐ๋ก ๊ณต๋ถ๋ฅผ ์์ํ๋ค. ์ค๋ ํ ์์ ๋ฅผ ๋ชจ๋ ๋ง์ณค๊ณ , ์ด์ ๋ ์ ์ ์๊ธฐ ์ํด ์๋ฆฌ์ ๋์์ผ๊ฒ ๋ค."}
6 |
+
{"model_id": "240413_full", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋ ๋๋ ํ๊ต์ ๊ฐ์๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ์์ํ๋ค. ์ค๋ ๊ณผ์ ๋ ๋ค ๋๋ง์น๊ณ ์ด์ ์ ์๋ฆฌ์ ๋ค ์๊ฐ์ด๋ค."}
streamlit_app_local/user_submit/llm/240413_full_epoch_5.jsonl
ADDED
@@ -0,0 +1,6 @@
1 |
+
{"model_id": "240413_full_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ํจ์ฌ ๋ ์ข์ ๊ฒ ๊ฐ์."}
2 |
+
{"model_id": "240413_full_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ํจ์ฌ ๋ ์ข๋ค. ๊ฒจ์ธ์๋ ์ถ์ด ๋ ์จ ๋๋ถ์ ์ฐ์ฑ์ด๋ ์ด๋์ ํ๊ธฐ์๋ ํธํ๊ณ , ๊ฒจ์ธ์ ์ ์ทจ๋ฅผ ๋๋ ์ ์์ด์ ์ข๋ค. ๊ทธ๋ฆฌ๊ณ ๊ฒจ์ธ์๋ ๋์ด ๋ด๋ฆด ๋์ ๋ถ์๊ธฐ๊ฐ ์ ๋ง ๋ญ๋ง์ ์ด๋ค."}
3 |
+
{"model_id": "240413_full_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "๋ ์ฌ๋ฆ๋ณด๋ค๋ ๊ฒจ์ธ์ด ์ข์.", "generated": "๋๋ ์ฌ๋ฆ๋ณด๋จ ๊ฒจ์ธ์ด ๋ ์ข๋๋ผ. ๊ฒจ์ธ์๋ ์ฐจ๊ฐ์ด ๋ฐ๋์ด ์ผ๊ตด์ ํกํก ๋๋๋ฆฌ๊ณ , ๋์ด ๋ด๋ฆฌ๋ฉด ์ธ์์ด ํ์๊ณ ์๋ฆ๋ต๊ฒ ๋ณด์ฌ."}
4 |
+
{"model_id": "240413_full_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ๋ด์ฉ์ ์ ์งํ ์ฑ, ์ดํ๋ฅผ ๋ ๊ธธ๊ฒ ๋ฐ๊พธ๊ฑฐ๋ ์ฌ์ํ ๋ด์ฉ์ ์ถ๊ฐํ์ฌ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋ ๋๋ ํ๊ต์ ๋ฐฉ๋ฌธํ๋ค. ์ง์ ๋์์ ์ด์ฌํ ๊ณต๋ถํ๋ค. ์ค๋์ ๊ณผ์ ๋ ๋ชจ๋ ๋๋ง์น๊ณ , ์ด์ ๋ ์ ์ ๋ค ์๊ฐ์ด๋ค."}
5 |
+
{"model_id": "240413_full_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ค๊ฐ ์ค๊ฐ์ ์๋ก์ด ๋ด์ฉ์ด๋ ๋ฌธ์ฅ์ ์ถ๊ฐํ์ฌ ๊ธธ๊ฒ ๋๋ ค์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋์ ํ๊ต์ ๊ฐ๋ ๋ ์ด์๋ค. ์ง์์ ์์นจ์ ๋จน๊ณ ํ๊ต์ ๋์ฐฉํ๋ค. ์์์ด ์์๋์๊ณ , ๋๋ ์ด๊ณต ๋ชจ๋๋ก ๋ค์ด๊ฐ๋ค. ์์ ๋ ๋ค ๋๋๊ณ ์ด์ ์์ผ๊ฒ ๋ค."}
6 |
+
{"model_id": "240413_full_epoch_5", "task": "๋๋ ค์ฐ๊ธฐ", "instruction": "์๋ ฅ์ผ๋ก ์ฃผ์ด์ง ๊ธ์ ์ดํฌ๋ฅผ ๋ฐ๊พธ๊ฑฐ๋, ๋ฌธ์ฅ์ ์ฌ๋ฌ๊ฐ๋ก ๋๋์ด์ ๋ค์ ์จ์ฃผ์ธ์.", "source": "์ค๋์ ์ผ๊ธฐ:\n์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค. ์ง์ ์์ ๊ณต๋ถ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ ๋๋ด๊ณ ์์ผ๊ฒ ๋ค.", "generated": "์ค๋ ๋๋ ํ๊ต์ ๊ฐ๋ค ์๋ค. ์ง์ ์์ ์์ ๋ฅผ ํ๋ค. ์ค๋ ์์ ๋ฅผ ๋๋ด๋ฉด ์์ผ๊ฒ ๋ค."}