alvarobartt HF staff committed on
Commit
e4777d6
·
verified ·
1 Parent(s): 51e41de

Update README.md

  1. README.md +32 -35
README.md CHANGED
@@ -127,13 +127,13 @@ Then you just need to run the TGI v2.2.0 (or higher) Docker container as follows
 
  ```bash
  docker run --gpus all --shm-size 1g -ti -p 8080:80 \
- -e MODEL_ID=hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
- -e NUM_SHARD=4 \
- -e QUANTIZE=awq \
- -e HF_TOKEN=$(cat ~/.cache/huggingface/token) \
- -e MAX_INPUT_LENGTH=4000 \
- -e MAX_TOTAL_TOKENS=4096 \
- ghcr.io/huggingface/text-generation-inference:2.2.0
  ```
 
  > [!NOTE]
@@ -143,42 +143,39 @@ To send request to the deployed TGI endpoint compatible with [OpenAI specificati
 
  ```bash
  curl 0.0.0.0:8080/v1/chat/completions \
- -X POST \
- -H 'Content-Type: application/json' \
- -d '{
- "model": "tgi",
- "messages": [
- {
- "role": "system",
- "content": "You are a helpful assistant."
- },
- {
- "role": "user",
- "content": "What is Deep Learning?"
- }
- ],
- "max_tokens": 128
- }'
  ```
 
- Or via the `openai` Python SDK (see [installation notes](https://github.com/openai/openai-python?tab=readme-ov-file#installation)) as:
 
  ```python
  import os
- from openai import OpenAI
 
- client = OpenAI(
- base_url="http://0.0.0.0:8080/v1/",
- api_key=os.getenv("HF_TOKEN"),
- )
 
  chat_completion = client.chat.completions.create(
- model="tgi",
- messages=[
- {"role": "system", "content": "You are a helpful assistant."},
- {"role": "user", "content": "What is Deep Learning?"},
- ],
- max_tokens=128,
  )
  ```
 
 
 
  ```bash
  docker run --gpus all --shm-size 1g -ti -p 8080:80 \
+ -e MODEL_ID=hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
+ -e NUM_SHARD=4 \
+ -e QUANTIZE=awq \
+ -e HF_TOKEN=$(cat ~/.cache/huggingface/token) \
+ -e MAX_INPUT_LENGTH=4000 \
+ -e MAX_TOTAL_TOKENS=4096 \
+ ghcr.io/huggingface/text-generation-inference:2.2.0
  ```
 
  > [!NOTE]
 
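The `MAX_INPUT_LENGTH=4000` and `MAX_TOTAL_TOKENS=4096` values in the `docker run` command bound what a single request can ask for. The sketch below works through that arithmetic, assuming (as the TGI launcher options are usually described) that the prompt may hold at most `MAX_INPUT_LENGTH` tokens and that prompt tokens plus generated tokens may not exceed `MAX_TOTAL_TOKENS`. The helper name `max_new_tokens_allowed` is hypothetical and not part of TGI.

```python
# Values copied from the `docker run` command.
MAX_INPUT_LENGTH = 4000
MAX_TOTAL_TOKENS = 4096


def max_new_tokens_allowed(prompt_tokens: int) -> int:
    """Return how many tokens may still be generated for a prompt of this size.

    Hypothetical helper for illustration. Assumes that prompt tokens plus
    generated tokens must not exceed MAX_TOTAL_TOKENS.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > MAX_INPUT_LENGTH:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; the limit is {MAX_INPUT_LENGTH}"
        )
    return MAX_TOTAL_TOKENS - prompt_tokens


print(max_new_tokens_allowed(4000))  # 96
print(max_new_tokens_allowed(100))   # 3996
```

Under this assumption, a prompt that fills the whole input window leaves room for only 96 generated tokens, so a request with `max_tokens` set to 128 would only fit for prompts of 3968 tokens or fewer.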
 
  ```bash
  curl 0.0.0.0:8080/v1/chat/completions \
+ -X POST \
+ -H 'Content-Type: application/json' \
+ -d '{
+ "model": "tgi",
+ "messages": [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant."
+ },
+ {
+ "role": "user",
+ "content": "What is Deep Learning?"
+ }
+ ],
+ "max_tokens": 128
+ }'
  ```
 
+ Or programmatically via the `huggingface_hub` Python client as follows (TGI is fully compatible with OpenAI, so the `openai` SDK can also be used):
 
  ```python
  import os
+ from huggingface_hub import InferenceClient # Instead of `from openai import OpenAI`
 
+ client = InferenceClient(base_url="http://0.0.0.0:8080/v1", api_key=os.getenv("HF_TOKEN", "-")) # Instead of `client = OpenAI(base_url=..., api_key=...)`
 
  chat_completion = client.chat.completions.create(
+ model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4", # Instead of `model="tgi"`
+ messages=[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "What is Deep Learning?"},
+ ],
+ max_tokens=128,
  )
  ```
 
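The Python snippet stops once `chat_completion` is returned. Because the endpoint follows the OpenAI chat completions format, the generated text should sit under `choices[0].message.content`. The sketch below builds the same request as the curl example using only the Python standard library, and applies a hypothetical `extract_reply` helper to a hand-written sample response (not real model output), so it runs without a live TGI server. The function names and the sample payload are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint mirrors the curl example; adjust host and port to your deployment.
URL = "http://0.0.0.0:8080/v1/chat/completions"


def build_request(user_message: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    return response_body["choices"][0]["message"]["content"]


request = build_request("What is Deep Learning?")

# Hand-written sample in the OpenAI response shape, for illustration only.
sample_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "(sample reply)"}}
    ]
}

print(request.get_method(), request.full_url)
print(extract_reply(sample_response))
```

Against a running container, the request could be sent with `urllib.request.urlopen(request)` and the body parsed with `json.load` before passing it to `extract_reply`. With the `huggingface_hub` or `openai` clients, the equivalent lookup is `chat_completion.choices[0].message.content`.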