Commit a16695d by Hansimov · 1 Parent(s): f3b1386

:pencil: [Doc] Readme: Features, deployment, api usage examples, and huggingface space configs

Files changed (1)
  1. README.md +160 -0
README.md ADDED
@@ -0,0 +1,160 @@
+ ---
+ title: HF LLM API
+ emoji: ☯️
+ colorFrom: gray
+ colorTo: gray
+ sdk: docker
+ app_port: 23333
+ ---
+
+ ## HF-LLM-API
+ API for LLM inference in Hugging Face Spaces.
+
+
+ ## Features
+
+ ✅ Implemented:
+
+ - Supported models
+   - `mixtral-8x7b`
+ - Supports the OpenAI API format
+   - The API endpoint can be used via the official `openai-python` package
+ - Supports streaming responses
+ - Supports multi-round chat with no limit on the number of rounds
+ - Supports Docker deployment
+
+ 🔨 In progress:
+ - [ ] Support more models
+
+ ## Run API service
+
+ ### Run in Command Line
+
+ **Install dependencies:**
+
+ ```bash
+ # pipreqs . --force --mode no-pin
+ pip install -r requirements.txt
+ ```
+
+ **Run API:**
+
+ ```bash
+ python -m apis.chat_api
+ ```
+
+ ### Run via Docker
+
+ **Docker build:**
+
+ ```bash
+ sudo docker build -t hf-llm-api:1.0 . --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
+ ```
+
+ **Docker run:**
+
+ ```bash
+ # no proxy
+ sudo docker run -p 23333:23333 hf-llm-api:1.0
+
+ # with proxy
+ sudo docker run -p 23333:23333 --env http_proxy="http://<server>:<port>" hf-llm-api:1.0
+ ```
+
+ ## API Usage
+
+ ### Using `openai-python`
+
+ See: [examples/chat_with_openai.py](https://github.com/Hansimov/hf-llm-api/blob/main/examples/chat_with_openai.py)
+
+ ```py
+ from openai import OpenAI
+
+ # If running this service with a proxy, you might need to unset `http(s)_proxy`.
+ base_url = "http://127.0.0.1:23333"
+ api_key = "sk-xxxxx"
+
+ client = OpenAI(base_url=base_url, api_key=api_key)
+ response = client.chat.completions.create(
+     model="mixtral-8x7b",
+     messages=[
+         {
+             "role": "user",
+             "content": "what is your model",
+         }
+     ],
+     stream=True,
+ )
+
+ for chunk in response:
+     if chunk.choices[0].delta.content is not None:
+         print(chunk.choices[0].delta.content, end="", flush=True)
+     elif chunk.choices[0].finish_reason == "stop":
+         print()
+     else:
+         pass
+ ```
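The feature list mentions multi-round chat, but the example above sends a single user message. A minimal sketch of carrying a conversation across rounds follows. It assumes the service keeps no conversation state, so the full history has to be sent on every request, as is usual for the OpenAI chat format; the README does not confirm this. The helper names `collect_stream` and `next_round` are hypothetical and are not part of this repository.

```python
def collect_stream(chunks):
    """Join the text deltas from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content is not None:  # role-only and final chunks carry no text
            parts.append(content)
    return "".join(parts)


def next_round(messages, assistant_reply, user_message):
    """Return a new message list extended with the last reply and the next question."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_message},
    ]
```

With the client from the example above, `reply = collect_stream(response)` would gather the streamed answer, and `next_round(messages, reply, "next question")` would build the `messages` argument for the following request.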
+
+ ### Using POST requests
+
+ See: [examples/chat_with_post.py](https://github.com/Hansimov/hf-llm-api/blob/main/examples/chat_with_post.py)
+
+
+ ```py
+ import ast
+ import httpx
+ import json
+ import re
+
+ # If running this service with a proxy, you might need to unset `http(s)_proxy`.
+ chat_api = "http://127.0.0.1:23333"
+ api_key = "sk-xxxxx"
+ requests_headers = {}
+ requests_payload = {
+     "model": "mixtral-8x7b",
+     "messages": [
+         {
+             "role": "user",
+             "content": "what is your model",
+         }
+     ],
+     "stream": True,
+ }
+
+ with httpx.stream(
+     "POST",
+     chat_api + "/chat/completions",
+     headers=requests_headers,
+     json=requests_payload,
+     timeout=httpx.Timeout(connect=20, read=60, write=20, pool=None),
+ ) as response:
+     # https://docs.aiohttp.org/en/stable/streams.html
+     # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb
+     response_content = ""
+     for line in response.iter_lines():
+         remove_patterns = [r"^\s*data:\s*", r"^\s*\[DONE\]\s*"]
+         for pattern in remove_patterns:
+             line = re.sub(pattern, "", line).strip()
+
+         if line:
+             try:
+                 line_data = json.loads(line)
+             except Exception as e:
+                 try:
+                     line_data = ast.literal_eval(line)
+                 except Exception:
+                     print(f"Error: {line}")
+                     raise e
+             # print(f"line: {line_data}")
+             delta_data = line_data["choices"][0]["delta"]
+             finish_reason = line_data["choices"][0]["finish_reason"]
+             if "role" in delta_data:
+                 role = delta_data["role"]
+             if "content" in delta_data:
+                 delta_content = delta_data["content"]
+                 response_content += delta_content
+                 print(delta_content, end="", flush=True)
+             if finish_reason == "stop":
+                 print()
+
+ ```
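The line handling in the loop above can also be pulled out into small functions that are easy to test without a running server. This is a sketch only: it assumes each event arrives as a `data: <json>` line and that the stream ends with a `data: [DONE]` line, which is what the patterns in the example suggest. The names `parse_sse_line` and `delta_text` are hypothetical, and the `ast.literal_eval` fallback from the example is left out.

```python
import json


def parse_sse_line(line):
    """Decode one streamed line; return None for blank lines and the [DONE] marker."""
    line = line.strip()
    if line.startswith("data:"):
        line = line[len("data:"):].strip()
    if not line or line == "[DONE]":
        return None
    return json.loads(line)


def delta_text(payload):
    """Return the text carried by one decoded chunk, or an empty string if it has none."""
    delta = payload["choices"][0]["delta"]
    return delta.get("content") or ""
```

Inside the `for line in response.iter_lines()` loop, `payload = parse_sse_line(line)` followed by `if payload is not None: print(delta_text(payload), end="", flush=True)` would print the same text as the example.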