@@ -17,7 +17,9 @@ tags:
 # 100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter
 <!-- Provide a quick summary of what the model is/does. -->
-This Repo contains **LoRA (Low-Rank Adaptation) Adapter** for [unsloth/qwen2.5-coder-32b-instruct]
 This adapter was created through **instruction tuning**.
@@ -29,7 +31,6 @@ This adapter was created through **instruction tuning**.
 <!-- Provide a longer summary of what this model is. -->
 - **Base Model:** unsloth/Qwen2.5-Coder-32B-Instruct
 - **Task:** Instruction Following(Korean)
 - **Language:** English (or relevant language)
@@ -47,13 +48,92 @@ To use this LoRA adapter, refer to the following code:
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 ```
 ```
 ### Inference
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 ```
 ```
 ## Bias, Risks, and Limitations

 # 100suping/Qwen2.5-Coder-34B-Instruct-kosql-adapter
 <!-- Provide a quick summary of what the model is/does. -->
+This repo contains a **LoRA (Low-Rank Adaptation) adapter** for [unsloth/qwen2.5-coder-32b-instruct-bnb-4bit].
+The adapter was trained to improve the model's SQL generation from Korean questions in a multi-database context.
 This adapter was created through **instruction tuning**.
 <!-- Provide a longer summary of what this model is. -->
 - **Base Model:** unsloth/Qwen2.5-Coder-32B-Instruct
 - **Task:** Instruction Following(Korean)
 - **Language:** English (or relevant language)
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 ```
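+# English gloss: "You are a team member in an organization who converts user input into MySQL queries.
+# Your task is to use the (context) below, which contains the DB name and the metadata of the tables in the DB,
+# to write a MySQL query that fits the given question (user_question)."
+GENERAL_QUERY_PREFIX = """당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
+당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.
+(context)
+{context}
+"""
+# English gloss: "Please write a syntactically correct MySQL query for the given question (user_question)."
+GENERATE_QUERY_INSTRUCTIONS = """
+주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
+"""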
+```
+### Preprocess Functions
+```
+def get_conversation_data(examples):
+    # Expects a batch: each key maps to a list of values.
+    questions = examples["question"]
+    schemas = examples["schema"]
+    sql_queries = examples["SQL"]
+    convos = []
+    for question, schema, sql in zip(questions, schemas, sql_queries):
+        conv = [
+            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
+            {"role": "user", "content": question},
+            # The target is the SQL wrapped in a ```sql fence, terminated with ";".
+            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
+        ]
+        convos.append(conv)
+    return {"conversation": convos}
+
+
+def formatting_prompts_func(examples):
+    # `tokenizer` is assumed to be the base model's tokenizer, loaded beforehand.
+    convos = examples["conversation"]
+    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
+    return {"text": texts}
+```
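
As a quick sanity check of the preprocessing above, the following sketch runs the same conversation-building logic on a hand-made one-row batch, without a tokenizer. The sample row and its shortened schema are illustrative assumptions, not rows from the training data.

```python
GENERAL_QUERY_PREFIX = """당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.
(context)
{context}
"""
GENERATE_QUERY_INSTRUCTIONS = """
주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
"""


def get_conversation_data(examples):
    """Turn a batch (dict of column lists) into system/user/assistant conversations."""
    convos = []
    for question, schema, sql in zip(examples["question"], examples["schema"], examples["SQL"]):
        convos.append([
            {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=schema) + GENERATE_QUERY_INSTRUCTIONS},
            {"role": "user", "content": question},
            {"role": "assistant", "content": "```sql\n" + sql + ";\n```"},
        ])
    return {"conversation": convos}


# Hypothetical one-row batch, shaped like the columns the function reads.
batch = {
    "question": ["가장 인기 있는 영화는 무엇인가요?"],
    "schema": ["DB: movie_platform\ntable DDL: CREATE TABLE `movies` ( `movie_id` INTEGER `movie_title` TEXT `movie_popularity` INTEGER PRIMARY KEY (movie_id) );"],
    "SQL": ["SELECT movie_title FROM movies ORDER BY movie_popularity DESC LIMIT 1"],
}

conversation = get_conversation_data(batch)["conversation"][0]
for message in conversation:
    print(message["role"], "->", len(message["content"]), "chars")
```

Each conversation has three turns: the system turn carries the schema context, the user turn carries the question, and the assistant turn carries the fenced SQL target.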
+### Example input
+```
+<|im_start|>system
+당신은 사용자의 입력을 MySQL 쿼리문으로 바꾸어주는 조직의 팀원입니다.
+당신의 임무는 DB 이름 그리고 DB내 테이블의 메타 정보가 담긴 아래의 (context)를 이용해서 주어진 질문(user_question)에 걸맞는 MySQL 쿼리문을 작성하는 것입니다.
+(context)
+DB: movie_platform
+table DDL: CREATE TABLE `movies` ( `movie_id` INTEGER `movie_title` TEXT `movie_release_year` INTEGER `movie_url` TEXT `movie_title_language` TEXT `movie_popularity` INTEGER `movie_image_url` TEXT `director_id` TEXT `director_name` TEXT `director_url` TEXT PRIMARY KEY (movie_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists`(user_id) FOREIGN KEY (list_id) REFERENCES `lists`(list_id) FOREIGN KEY (user_id) REFERENCES `ratings_users`(user_id) FOREIGN KEY (user_id) REFERENCES `lists_users`(user_id) FOREIGN KEY (movie_id) REFERENCES `movies`(movie_id) );
+주어진 질문(user_question)에 대해서 문법적으로 올바른 MySQL 쿼리문을 작성해 주세요.
+<|im_end|>
+<|im_start|>user
+가장 인기 있는 영화는 무엇인가요? 그 영화는 언제 개봉되었고 누가 감독인가요?<|im_end|>
+<|im_start|>assistant
+```sql
+SELECT movie_title, movie_release_year, director_name FROM movies ORDER BY movie_popularity DESC LIMIT 1 ;
+```<|im_end|>
 ```
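
The `(context)` part of the example input follows a `DB: <name>` line with a `table DDL: <statement>` line. A small helper along these lines could assemble that string; `build_context` is a hypothetical name, and emitting one `table DDL:` line per table is an assumption, since the example above only shows a single table.

```python
def build_context(db_name, table_ddls):
    """Render one database as 'DB: <name>' plus one 'table DDL:' line per table.

    Assumption: multiple tables each get their own 'table DDL:' line.
    """
    lines = [f"DB: {db_name}"]
    for ddl in table_ddls:
        # Collapse internal whitespace so each DDL statement stays on one line.
        lines.append("table DDL: " + " ".join(ddl.split()))
    return "\n".join(lines)


context = build_context(
    "movie_platform",
    ["CREATE TABLE `movies` (\n  `movie_id` INTEGER,\n  `movie_title` TEXT,\n  PRIMARY KEY (movie_id)\n);"],
)
print(context)
```

The resulting string can be passed as the `context` argument when formatting `GENERAL_QUERY_PREFIX`.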
 ### Inference
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 ```
+# Assumes `model`, `tokenizer`, `context`, `user_question`, and `max_new_tokens` are already defined.
+messages = [
+    {"role": "system", "content": GENERAL_QUERY_PREFIX.format(context=context) + GENERATE_QUERY_INSTRUCTIONS},
+    {"role": "user", "content": "user_question: " + user_question},
+]
+# Render the chat template as text and append the assistant generation prompt.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=max_new_tokens,
+)
+# Drop the prompt tokens so only the newly generated tokens are decoded.
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
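
Because the training targets wrap the query in a `sql` code fence, the decoded `response` will likely arrive in the same shape. The sketch below shows one way to pull out the bare query; `extract_sql` is a hypothetical helper, and falling back to the raw text when no fence is found is an assumption about how to treat unexpected outputs.

```python
import re

# Matches the body of the first ```sql ... ``` fence in a response.
_SQL_FENCE = re.compile(r"```sql\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_sql(response):
    """Return the SQL inside the first sql fence, or the stripped text if none is found."""
    match = _SQL_FENCE.search(response)
    sql = match.group(1) if match else response
    return sql.strip()


# Sample response in the same format as the assistant turn of the example input.
response = "```sql\nSELECT movie_title, movie_release_year, director_name FROM movies ORDER BY movie_popularity DESC LIMIT 1 ;\n```"
print(extract_sql(response))
```

The extracted string can then be handed to a MySQL client; validating or sandboxing the query before execution is left to the caller.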
 ## Bias, Risks, and Limitations