Update README.md
README.md CHANGED
@@ -270,9 +270,14 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 2. Run server
 ```bash
-vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4
+vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4
 ```
 * Note: change ``--tensor-parallel-size 4`` to the number of available GPU cards.
+
+If you wish to enable the tool-calling feature, add ``--enable-auto-tool-choice --tool-call-parser hermes`` to the command, e.g.:
+```bash
+vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes
+```
 
 3. Run inference (CURL example)
 ```bash
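As a hedged sketch to accompany the hunk above (not part of this PR): `vllm serve` exposes an OpenAI-compatible API, so the served model can be queried at `/v1/chat/completions` on vLLM's default port 8000. The payload below is illustrative, and the prompt text is a placeholder.

```shell
# Build an OpenAI-compatible chat-completions request for the served model.
# (Illustrative sketch: /v1/chat/completions and port 8000 are vLLM's
# OpenAI-compatible defaults; the prompt is a placeholder.)
cat > request.json <<'EOF'
{
  "model": "openthaigpt/openthaigpt1.5-72b-instruct",
  "messages": [{"role": "user", "content": "สวัสดีครับ"}]
}
EOF

# Check the payload is well-formed JSON before sending it:
python3 -m json.tool request.json

# Send it to the server started in step 2 (uncomment once the server is up):
# curl http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @request.json
```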