---
license: other
license_name: qwen
language:
- th
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- openthaigpt
- qwen
---

# 🇹🇭 OpenThaiGPT 72b 1.5 Instruct
![OpenThaiGPT](https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2Fb8eiMDaqiEQL6ahbAY0h%2Fimage.png?alt=media&token=6fce78fd-2cca-4c0a-9648-bd5518e644ce)

[More Info](https://openthaigpt.aieat.or.th/)

🇹🇭 **OpenThaiGPT 72b Version 1.5** is an advanced 72-billion-parameter Thai language chat model based on Qwen v2.5 released on September 30, 2024. It has been specifically fine-tuned on over 2,000,000 Thai instruction pairs and is capable of answering Thai-specific domain questions.

## Online Demo:
https://demo72b.aieat.or.th/

## Example code for API Calling
https://github.com/OpenThaiGPT/openthaigpt1.5_api_examples

## Highlights
- **State-of-the-art Thai language LLM**, achieving the highest average scores across various Thai language exams compared to other open-source Thai LLMs.
- **Multi-turn conversation support** for extended dialogues.
- **Retrieval Augmented Generation (RAG) compatibility** for enhanced response generation.
- **Impressive context handling**: Processes up to 131,072 tokens of input and generates up to 8,192 tokens, enabling detailed and complex interactions.
- **Tool calling support**: Enables users to efficiently call various functions through intelligent responses.

## Benchmark on [OpenThaiGPT Eval](https://huggingface.co/datasets/openthaigpt/openthaigpt_eval)
** Please take a look at ``openthaigpt/openthaigpt1.5-72b-instruct`` for this model's evaluation result.

| **Exam names** | **scb10x/llama-3-typhoon-v1.5x-70b-instruct** | **meta-llama/Llama-3.1-70B-Instruct** | **Qwen/Qwen2.5-72B-Instruct** | **openthaigpt/openthaigpt1.5-72b-instruct** |
|:------------------------------:|:---------------------------------------------:|:-------------------------------------:|:-----------------------------:|:----------------------------------:|
| **01_a_level** | 59.17% | 61.67% | 75.00% | 76.67% |
| **02_tgat** | 46.00% | 40.00% | 48.00% | 46.00% |
| **03_tpat1** | 52.50% | 50.00% | 55.00% | 55.00% |
| **04_investment_consult** | 60.00% | 52.00% | 80.00% | 72.00% |
| **05_facebook_beleble_th_200** | 87.50% | 88.00% | 90.00% | 90.00% |
| **06_xcopa_th_200** | 84.50% | 85.50% | 90.00% | 90.50% |
| **07_xnli2.0_th_200** | 62.50% | 63.00% | 65.50% | 70.50% |
| **08_onet_m3_thai** | 76.00% | 56.00% | 76.00% | 84.00% |
| **09_onet_m3_social** | 95.00% | 95.00% | 90.00% | 95.00% |
| **10_onet_m3_math** | 43.75% | 25.00% | 37.50% | 37.50% |
| **11_onet_m3_science** | 53.85% | 61.54% | 65.38% | 73.08% |
| **12_onet_m3_english** | 93.33% | 93.33% | 96.67% | 96.67% |
| **13_onet_m6_thai** | 55.38% | 60.00% | 60.00% | 56.92% |
| **14_onet_m6_math** | 41.18% | 58.82% | 23.53% | 41.18% |
| **15_onet_m6_social** | 67.27% | 76.36% | 63.64% | 65.45% |
| **16_onet_m6_science** | 50.00% | 57.14% | 64.29% | 67.86% |
| **17_onet_m6_english** | 73.08% | 82.69% | 86.54% | 90.38% |
| **Micro Average** | 69.97% | 71.09% | 75.02% | 76.73% |

Thai language multiple choice exams, Test on unseen test set, Zero-shot learning.
Benchmark source code and exam information: https://github.com/OpenThaiGPT/openthaigpt_eval

(Updated on: 30 September 2024)

## Benchmark on [scb10x/thai_exam](https://huggingface.co/datasets/scb10x/thai_exam)

| Models | **Thai Exam (Acc)** |
|:----------------------------------------------------------:|:-------------------:|
| **api/claude-3-5-sonnet-20240620** | 69.2 |
| **openthaigpt/openthaigpt1.5-72b-instruct*** | 64.07 |
| **api/gpt-4o-2024-05-13** | 63.89 |
| **hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4** | 63.54 |
| **Qwen/Qwen2-72B-Instruct** | 58.23 |
| **meta-llama/Meta-Llama-3.1-70B-Instruct** | 58.23 |
| **scb10x/llama-3-typhoon-v1.5x-70b-instruct** | 58.76 |
| **Qwen/Qwen2.5-14B-Instruct** | 57.35 |
| **api/gpt-4o-mini-2024-07-18** | 54.51 |
| **openthaigpt/openthaigpt1.5-7b-instruct*** | 52.04 |
| **SeaLLMs/SeaLLMs-v3-7B-Chat** | 51.33 |
| **openthaigpt/openthaigpt-1.0.0-70b-chat** | 50.09 |

\* Evaluated by the OpenThaiGPT team using [scb10x/thai_exam](https://huggingface.co/datasets/scb10x/thai_exam).

## Licenses
* Built with Qwen
* Qwen License: **Research** and **commercial use** are allowed, but if your user base exceeds 100 million monthly active users, you need to negotiate a separate commercial license. Please see the LICENSE file for more information.
## Support
- Official website: https://openthaigpt.aieat.or.th
- Facebook page: https://web.facebook.com/groups/openthaigpt
- A Discord server for discussion and support [here](https://discord.gg/rUTp6dfVUF)
- E-mail: kobkrit@aieat.or.th

## Prompt Format
Prompt format is based on ChatML.
```
<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
```

### System prompt:
```
คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์
```
(English: "You are a smart and honest question-answering assistant.")

### Examples

#### Single Turn Conversation Example
```
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\n
```

#### Single Turn Conversation with Context (RAG) Example
```
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nกรุงเทพมหานคร เป็นเมืองหลวง นครและมหานครที่มีประชากรมากที่สุดของประเทศไทย กรุงเทพมหานครมีพื้นที่ทั้งหมด 1,568.737 ตร.กม. มีประชากรตามทะเบียนราษฎรกว่า 8 ล้านคน\nกรุงเทพมหานครมีพื้นที่เท่าไร่<|im_end|>\n<|im_start|>assistant\n
```

#### Multi Turn Conversation Example

##### First turn
```
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\n
```

##### Second turn
```
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\nสวัสดีครับ ยินดีต้อนรับครับ คุณต้องการให้ฉันช่วยอะไรครับ?<|im_end|>\n<|im_start|>user\nกรุงเทพมหานคร ชื่อเต็มยาวๆคืออะไร<|im_end|>\n<|im_start|>assistant\n
```

##### Result
```
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\nสวัสดีครับ ยินดีต้อนรับครับ คุณต้องการให้ฉันช่วยอะไรครับ?<|im_end|>\n<|im_start|>user\nกรุงเทพมหานคร ชื่อเต็มยาวๆคืออะไร<|im_end|>\n<|im_start|>assistant\nชื่อเต็มของกรุงเทพมหานครคือ \"กรุงเทพมหานคร อมรรัตนโกสินทร์ มหินทรายุธยา มหาดิลกภพ นพรัตนราชธานีบูรีรมย์ อุดมราชนิเวศน์มหาสถาน อมรพิมานอวตารสถิต สักกะทัตติยวิษณุกรรมประสิทธิ์\"
```

## How to use

### Free API Service (hosted by Siam.Ai and Float16.cloud)

#### Siam.AI
```bash
curl https://api.aieat.or.th/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": ".",
    "prompt": "<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nกรุงเทพมหานครคืออะไร<|im_end|>\n<|im_start|>assistant\n",
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 40,
    "stop": ["<|im_end|>"]
  }'
```

#### Float16
```bash
curl -X POST https://api.float16.cloud/dedicate/78y8fJLuzE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer float16-AG0F8yNce5s1DiXm1ujcNrTaZquEdaikLwhZBRhyZQNeS7Dv0X" \
  -d '{
    "model": "openthaigpt/openthaigpt1.5-7b-instruct",
    "messages": [
      {
        "role": "system",
        "content": "คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์"
      },
      {
        "role": "user",
        "content": "สวัสดี"
      }
    ]
  }'
```

### OpenAI Client Library (Hosted by vLLM, please see below.)
```python
from openai import OpenAI  # requires openai>=1.0

# Configure the OpenAI client to use the local vLLM server
client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="dummy",  # vLLM doesn't require a real API key
)

prompt = "<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nกรุงเทพมหานครคืออะไร<|im_end|>\n<|im_start|>assistant\n"

try:
    response = client.completions.create(
        # Must match the model name that vLLM is serving
        model="openthaigpt/openthaigpt1.5-72b-instruct",
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,
        top_p=0.8,
        stop=["<|im_end|>"],
        # top_k is a vLLM sampling extension, so it is sent via extra_body
        extra_body={"top_k": 40},
    )
    print("Generated Text:", response.choices[0].text)
except Exception as e:
    print("Error:", str(e))
```

### Huggingface
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openthaigpt/openthaigpt1.5-72b-instruct"

# Load the model and tokenizer; device_map="auto" spreads weights across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "ประเทศไทยคืออะไร"
messages = [
    {"role": "system", "content": "คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์"},
    {"role": "user", "content": prompt}
]

# Render the messages into the ChatML prompt format
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### vLLM

1. Install vLLM (https://github.com/vllm-project/vllm)

2. Run server
```bash
vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4
```
* Note: change ``--tensor-parallel-size 4`` to the number of available GPU cards.

3. Run inference (CURL example)
```bash
curl -X POST 'http://127.0.0.1:8000/v1/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openthaigpt/openthaigpt1.5-72b-instruct",
    "prompt": "<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\n",
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 40,
    "stop": ["<|im_end|>"]
  }'
```

### Processing Long Texts

The current `config.json` is set for a context length of up to 32,768 tokens.
To handle inputs exceeding 32,768 tokens, we use [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, to maintain performance on lengthy texts. With a scaling factor of 4.0, the 32,768-token window extends to 4 × 32,768 = 131,072 tokens.

For supported frameworks, you can add the following to `config.json` to enable YaRN:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

### Tool Calling
The Tool Calling feature in OpenThaiGPT 1.5 enables users to efficiently call various functions through intelligent responses. This includes making external API calls to retrieve real-time data, such as current temperature information, or predicting future data simply by submitting a query.
For example, a user can ask OpenThaiGPT, "What is the current temperature in San Francisco?" and the AI will execute a pre-defined function to provide an immediate response without the need for additional coding.
This feature also allows for broader applications with external data sources, including the ability to call APIs for services such as weather updates, stock market information, or data from within the user's own system.
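
The flow described above has a client-side half: once the model decides a function should be called, the application has to look that function up and run it. The sketch below illustrates one way to do that dispatch step without any network access. `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical names introduced here for illustration, and the tool-call payload (a function name plus a JSON string of arguments) is assumed to follow the OpenAI-compatible shape; it is not taken from the OpenThaiGPT codebase.

```python
import json


def get_temperature(location, date=None, unit="celsius"):
    """Stub tool: returns fixed values instead of calling a real weather API."""
    if date:
        return {"temperature": 25.9, "location": location, "date": date, "unit": unit}
    return {"temperature": 26.1, "location": location, "unit": unit}


# Hypothetical registry mapping tool names (as advertised to the model)
# to the local Python callables that implement them.
TOOL_REGISTRY = {"get_temperature": get_temperature}


def dispatch_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Run one model-requested tool call and return the result as a JSON string."""
    if name not in registry:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps(registry[name](**kwargs), ensure_ascii=False)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature
        return json.dumps({"error": str(exc)})


# Assumed payload shape: the function name and a JSON-encoded argument string.
example_call = {
    "name": "get_temperature",
    "arguments": '{"location": "San Francisco", "unit": "celsius"}',
}
print(dispatch_tool_call(example_call["name"], example_call["arguments"]))
```

Returning errors as JSON instead of raising keeps the result in a form that can be sent back to the model as a tool message, so the model can recover from a bad call.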
#### Example:
```python
from openai import OpenAI  # requires openai>=1.0

# Assumption: the Siam.AI endpoint from the Free API Service section;
# replace with any OpenAI-compatible server that has tool calling enabled.
client = OpenAI(base_url="https://api.aieat.or.th/v1", api_key="dummy")

def get_temperature(location, date=None, unit="celsius"):
    """Get temperature for a location (current or specific date)."""
    if date:
        return {"temperature": 25.9, "location": location, "date": date, "unit": unit}
    return {"temperature": 26.1, "location": location, "unit": unit}

# Tool definition in the OpenAI function-calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get temperature for a location (current or by date).",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "date": {"type": "string"},  # optional
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

messages = [{"role": "user", "content": "อุณหภูมิที่ San Francisco วันนี้และพรุ่งนี้คือเท่าไร่?"}]

# Simulated response flow using OpenThaiGPT Tool Calling
response = client.chat.completions.create(
    model=".",
    messages=messages,
    tools=tools,
    temperature=0.7,
    max_tokens=512,
)

print(response)
```
**Full example**: https://github.com/OpenThaiGPT/openthaigpt1.5_api_examples/blob/main/api_tool_calling_powered_by_siamai.py

### GPU Memory Requirements
| **Number of Parameters** | **FP 16 bits** | **8 bits (Quantized)** | **4 bits (Quantized)** | **Example Graphic Card for 4 bits** |
|------------------|----------------|------------------------|------------------------|---------------------------------------------|
| **7b** | 24 GB | 12 GB | 6 GB | Nvidia RTX 4060 8GB |
| **13b** | 48 GB | 24 GB | 12 GB | Nvidia RTX 4070 16GB |
| **72b** | 192 GB | 96 GB | 48 GB | Nvidia RTX 4090 24GB x 2 cards |

### Authors
* Sumeth Yuenyong (sumeth.yue@mahidol.edu)
* Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)
* Apivadee Piyatumrong (apivadee.piy@nectec.or.th)
* Jillaphat Jaroenkantasima (autsadang41@gmail.com)
* Thaweewat Rugsujarit (thaweewr@scg.com)
* Norapat Buppodom (new@norapat.com)
* Koravich Sangkaew (kwankoravich@gmail.com)
* Peerawat Rojratchadakorn (peerawat.roj@gmail.com)
* Surapon Nonesung (nonesungsurapon@gmail.com)
* Chanon Utupon (chanon.utupon@gmail.com)
* Sadhis Wongprayoon (sadhis.tae@gmail.com)
* Nucharee Thongthungwong (nuchhub@hotmail.com)
* Chawakorn Phiantham (mondcha1507@gmail.com)
* Patteera Triamamornwooth (patt.patteera@gmail.com)
* Nattarika Juntarapaoraya (natt.juntara@gmail.com)
* Kriangkrai Saetan (kraitan.ss21@gmail.com)
* Pitikorn Khlaisamniang (pitikorn32@gmail.com)

Disclaimer: Provided responses are not guaranteed.