Update README.md
README.md CHANGED
@@ -270,9 +270,14 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 2. Run server
 ```bash
-vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4
+vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4
 ```
 * Note: change ``--tensor-parallel-size 4`` to the number of available GPU cards.
+
+If you wish to enable the tool-calling feature, add ``--enable-auto-tool-choice --tool-call-parser hermes`` to the command, e.g.:
+```bash
+vllm serve openthaigpt/openthaigpt1.5-72b-instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes
+```
 
 3. Run inference (CURL example)
 ```bash
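As a hedged sketch to accompany the hunk above (not part of this PR): `vllm serve` exposes an OpenAI-compatible API, so the served model can be queried at `/v1/chat/completions` on vLLM's default port 8000. The payload below is illustrative, and the prompt text is a placeholder.

```shell
# Build an OpenAI-compatible chat-completions request for the served model.
# (Illustrative sketch: /v1/chat/completions and port 8000 are vLLM's
# OpenAI-compatible defaults; the prompt is a placeholder.)
cat > request.json <<'EOF'
{
  "model": "openthaigpt/openthaigpt1.5-72b-instruct",
  "messages": [{"role": "user", "content": "สวัสดีครับ"}]
}
EOF

# Check the payload is well-formed JSON before sending it:
python3 -m json.tool request.json

# Send it to the server started in step 2 (uncomment once the server is up):
# curl http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @request.json
```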