🐛 Bug:
1. Fix the bug where the client to close could not be found when closing the request client.
2. Fix the bug where, with only one provider but multiple API keys, an error prevented switching to the next API key.
✨ Feature: Add support for API key request rate limiting, and automatically cool down an API key after it receives a 429 status code.
📖 Docs: Update documentation.
- .github/workflows/main.yml +1 -0
- README.md +42 -1
- README_CN.md +42 -1
- main.py +28 -59
- utils.py +124 -5
.github/workflows/main.yml
CHANGED

```diff
@@ -21,6 +21,7 @@ on:
 jobs:
   build-and-push:
     runs-on: ubuntu-latest
+    if: ${{ secrets.DOCKER_HUB_USERNAME != '' && secrets.DOCKER_HUB_ACCESS_TOKEN != '' }}

     steps:
       - name: Checkout repository
```
README.md
CHANGED

```diff
@@ -80,12 +80,18 @@ providers:
 
   - provider: gemini
     base_url: https://generativelanguage.googleapis.com/v1beta # base_url supports v1beta/v1, only for Gemini model use, required
-    api:
+    api: # Supports multiple API Keys; multiple keys automatically enable round-robin load balancing, at least one key, required
+      - AIzaSyAN2k6IRdgw123
+      - AIzaSyAN2k6IRdgw456
+      - AIzaSyAN2k6IRdgw789
     model:
       - gemini-1.5-pro
       - gemini-1.5-flash-exp-0827: gemini-1.5-flash # After renaming, the original model name gemini-1.5-flash-exp-0827 cannot be used; to keep using the original name, add it to model — adding the line below makes the original name usable again
      - gemini-1.5-flash-exp-0827 # Add this line, and both gemini-1.5-flash-exp-0827 and gemini-1.5-flash can be requested
     tools: true
+    preferences:
+      API_KEY_RATE_LIMIT: 15/min # Each API Key can be requested at most 15 times per minute, optional. The default is 999999/min.
+      API_KEY_COOLDOWN_PERIOD: 60 # Each API Key cools down for 60 seconds after encountering a 429 error, optional. The default is 60 seconds.
 
   - provider: vertex
     project_id: gen-lang-client-xxxxxxxxxxxxxx # Description: Your Google Cloud project ID. Format: String, usually composed of lowercase letters, numbers, and hyphens. How to obtain: You can find your project ID in the project selector of the Google Cloud Console.
@@ -338,6 +344,41 @@ Thank you for your support!
 
 Setting ENABLE_MODERATION to false will fix this issue. When ENABLE_MODERATION is true, the API must be able to use the text-moderation-latest model, and if you have not provided text-moderation-latest in the provider model settings, an error will occur indicating that the model cannot be found.
 
+- How do I prioritize requests to a specific channel, i.e. set a channel's priority?
+
+Simply set the channel order in api_keys. No other settings are required. Sample configuration file:
+
+```yaml
+providers:
+  - provider: ai1
+    base_url: https://xxx/v1/chat/completions
+    api: sk-xxx
+
+  - provider: ai2
+    base_url: https://xxx/v1/chat/completions
+    api: sk-xxx
+
+api_keys:
+  - api: sk-1234
+    model:
+      - ai2/*
+      - ai1/*
+```
+
+With this setup, ai2 is requested first, and if it fails, ai1 is requested.
+
+- What is the behavior of the various scheduling algorithms, e.g. fixed_priority, weighted_round_robin, lottery, random, round_robin?
+
+All scheduling algorithms are enabled by setting api_keys.(api).preferences.SCHEDULING_ALGORITHM in the configuration file to any of: fixed_priority, weighted_round_robin, lottery, random, round_robin.
+
+1. fixed_priority: Fixed-priority scheduling. All requests always go to the first channel that has the user-requested model. On error, it switches to the next channel. This is the default scheduling algorithm.
+
+2. weighted_round_robin: Weighted round-robin load balancing. Channels that have the user-requested model are requested in the weight order set in api_keys.(api).model.
+
+3. lottery: Lottery load balancing. A channel that has the user-requested model is picked at random, weighted by the weights set in api_keys.(api).model.
+
+4. round_robin: Round-robin load balancing. Channels that have the user-requested model are requested in the order configured in api_keys.(api).model. See the previous question on how to set channel priority.
+
 ## ⭐ Star History
 
 <a href="https://github.com/yym68686/uni-api/stargazers">
```
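The scheduling behaviors described in the FAQ above can be illustrated with a small sketch. This is illustrative only: the channel names and weights are made up, and these helpers are not the project's actual implementation.

```python
import random

def fixed_priority_pick(channels_in_order, failed=()):
    # 'fixed_priority' idea: always the first configured channel
    # that has not failed yet.
    for name in channels_in_order:
        if name not in failed:
            return name
    return None

def lottery_pick(channel_weights):
    # 'lottery' idea: pick a channel at random, proportionally
    # to its configured weight.
    names = list(channel_weights)
    weights = [channel_weights[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

print(fixed_priority_pick(["ai2/*", "ai1/*"]))             # ai2/*
print(fixed_priority_pick(["ai2/*", "ai1/*"], {"ai2/*"}))  # ai1/*
print(lottery_pick({"ai2/*": 3, "ai1/*": 1}) in {"ai2/*", "ai1/*"})  # True
```

With the sample priority configuration above, fixed_priority keeps sending traffic to ai2 until it errors, then falls through to ai1.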
README_CN.md
CHANGED

Same diff as README.md, applied to the Chinese translation: the gemini provider example now shows multiple API keys with round-robin load balancing plus the API_KEY_RATE_LIMIT and API_KEY_COOLDOWN_PERIOD preferences, and the FAQ gains the entries on channel priority and scheduling algorithms.
main.py
CHANGED

```diff
@@ -1,6 +1,5 @@
 from log_config import logger
 
-import re
 import copy
 import httpx
 import secrets
@@ -19,7 +18,18 @@ from fastapi.exceptions import RequestValidationError
 from models import RequestModel, ImageGenerationRequest, AudioTranscriptionRequest, ModerationRequest, UnifiedRequest, EmbeddingRequest
 from request import get_payload
 from response import fetch_response, fetch_response_stream
-from utils import
+from utils import (
+    safe_get,
+    load_config,
+    save_api_yaml,
+    get_model_dict,
+    post_all_models,
+    get_user_rate_limit,
+    circular_list_encoder,
+    error_handling_wrapper,
+    rate_limiter,
+    provider_api_circular_list,
+)
 
 from collections import defaultdict
 from typing import List, Dict, Union
@@ -542,6 +552,7 @@ class ClientManager:
     @asynccontextmanager
     async def get_client(self, timeout_value):
         # Get or create the client directly, without using a lock
+        timeout_value = int(timeout_value)
         if timeout_value not in self.clients:
             timeout = httpx.Timeout(
                 connect=15.0,
@@ -558,8 +569,10 @@ class ClientManager:
         try:
             yield self.clients[timeout_value]
         except Exception as e:
-
-
+            if timeout_value in self.clients:
+                tmp_client = self.clients[timeout_value]
+                del self.clients[timeout_value]  # drop the reference first
+                await tmp_client.aclose()        # then close the client
             raise e
 
     async def close(self):
@@ -955,8 +968,13 @@ class ModelRequestHandler:
         auto_retry = safe_get(config, 'api_keys', api_index, "preferences", "AUTO_RETRY", default=True)
 
         index = 0
+        if num_matching_providers == 1 and (count := provider_api_circular_list[matching_providers[0]['provider']].get_items_count()) > 1:
+            retry_count = count
+        else:
+            retry_count = 0
+
         while True:
-            if index >= num_matching_providers:
+            if index >= num_matching_providers + retry_count:
                 break
             current_index = (start_index + index) % num_matching_providers
             index += 1
@@ -995,6 +1013,10 @@ class ModelRequestHandler:
             num_matching_providers = len(matching_providers)
             index = 0
 
+            if status_code == 429:
+                current_api = await provider_api_circular_list[channel_id].after_next_current()
+                await provider_api_circular_list[channel_id].set_cooling(current_api, cooling_time=safe_get(provider, "preferences", "API_KEY_COOLDOWN_PERIOD", default=60))
+
             logger.error(f"Error {status_code} with provider {channel_id}: {error_message}")
             if is_debug:
                 import traceback
@@ -1012,59 +1034,6 @@ class ModelRequestHandler:
 
 model_handler = ModelRequestHandler()
 
-def parse_rate_limit(limit_string):
-    # Map time units to seconds
-    time_units = {
-        's': 1, 'sec': 1, 'second': 1,
-        'm': 60, 'min': 60, 'minute': 60,
-        'h': 3600, 'hr': 3600, 'hour': 3600,
-        'd': 86400, 'day': 86400,
-        'mo': 2592000, 'month': 2592000,
-        'y': 31536000, 'year': 31536000
-    }
-
-    # Use a regular expression to match the count and unit
-    match = re.match(r'^(\d+)/(\w+)$', limit_string)
-    if not match:
-        raise ValueError(f"Invalid rate limit format: {limit_string}")
-
-    count, unit = match.groups()
-    count = int(count)
-
-    # Convert the unit to seconds
-    if unit not in time_units:
-        raise ValueError(f"Unknown time unit: {unit}")
-
-    seconds = time_units[unit]
-
-    return (count, seconds)
-
-class InMemoryRateLimiter:
-    def __init__(self):
-        self.requests = defaultdict(list)
-
-    async def is_rate_limited(self, key: str, limit: int, period: int) -> bool:
-        now = time()
-        self.requests[key] = [req for req in self.requests[key] if req > now - period]
-        if len(self.requests[key]) >= limit:
-            return True
-        self.requests[key].append(now)
-        return False
-
-rate_limiter = InMemoryRateLimiter()
-
-async def get_user_rate_limit(api_index: str = None):
-    # The logic for looking up a user's rate limit by token should live here
-    # Example: returns (count, seconds)
-    config = app.state.config
-    raw_rate_limit = safe_get(config, 'api_keys', api_index, "preferences", "RATE_LIMIT")
-
-    if not api_index or not raw_rate_limit:
-        return (30, 60)
-
-    rate_limit = parse_rate_limit(raw_rate_limit)
-    return rate_limit
-
 security = HTTPBearer()
 
 async def rate_limit_dependency(request: Request, credentials: HTTPAuthorizationCredentials = Depends(security)):
@@ -1076,7 +1045,7 @@ async def rate_limit_dependency(request: Request, credentials: HTTPAuthorization
         print("error: Invalid or missing API Key:", token)
         api_index = None
         token = None
-    limit, period = await get_user_rate_limit(api_index)
+    limit, period = await get_user_rate_limit(app, api_index)
 
     # Use the IP address and the token (if any) as the rate-limit key
     client_ip = request.client.host
```
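The retry change in main.py (one extra attempt per key when a single provider holds several keys) boils down to the sketch below. `retry_budget` is a made-up helper name for illustration, not a function in main.py:

```python
def retry_budget(num_matching_providers, key_counts):
    # Total attempts the retry loop allows: normally one per matching
    # provider; when exactly one provider matches and it has several API
    # keys, extend the budget by the key count so key rotation can kick in.
    if num_matching_providers == 1 and key_counts[0] > 1:
        return num_matching_providers + key_counts[0]
    return num_matching_providers

print(retry_budget(3, [1, 1, 1]))  # 3 providers, one attempt each -> 3
print(retry_budget(1, [4]))        # 1 provider with 4 keys -> 5 attempts
```

This is what lets the second bug fix in the commit message work: with one provider and multiple keys, a failure no longer exhausts the loop before the next key is tried.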
utils.py
CHANGED

```diff
@@ -3,22 +3,135 @@ from fastapi import HTTPException
 import httpx
 
 from log_config import logger
+
+import re
+from time import time
+def parse_rate_limit(limit_string):
+    # Map time units to seconds
+    time_units = {
+        's': 1, 'sec': 1, 'second': 1,
+        'm': 60, 'min': 60, 'minute': 60,
+        'h': 3600, 'hr': 3600, 'hour': 3600,
+        'd': 86400, 'day': 86400,
+        'mo': 2592000, 'month': 2592000,
+        'y': 31536000, 'year': 31536000
+    }
+
+    # Use a regular expression to match the count and unit
+    match = re.match(r'^(\d+)/(\w+)$', limit_string)
+    if not match:
+        raise ValueError(f"Invalid rate limit format: {limit_string}")
+
+    count, unit = match.groups()
+    count = int(count)
+
+    # Convert the unit to seconds
+    if unit not in time_units:
+        raise ValueError(f"Unknown time unit: {unit}")
+
+    seconds = time_units[unit]
+
+    return (count, seconds)
+
 from collections import defaultdict
+class InMemoryRateLimiter:
+    def __init__(self):
+        self.requests = defaultdict(list)
+
+    async def is_rate_limited(self, key: str, limit: int, period: int) -> bool:
+        now = time()
+        self.requests[key] = [req for req in self.requests[key] if req > now - period]
+        if len(self.requests[key]) >= limit:
+            return True
+        self.requests[key].append(now)
+        return False
+
+rate_limiter = InMemoryRateLimiter()
+
+async def get_user_rate_limit(app, api_index: str = None):
+    # The logic for looking up a user's rate limit by token should live here
+    # Example: returns (count, seconds)
+    config = app.state.config
+    raw_rate_limit = safe_get(config, 'api_keys', api_index, "preferences", "RATE_LIMIT")
+
+    if api_index is None or not raw_rate_limit:
+        return (30, 60)
+
+    rate_limit = parse_rate_limit(raw_rate_limit)
+    return rate_limit
 
 import asyncio
 
 class ThreadSafeCircularList:
-    def __init__(self, items):
+    def __init__(self, items, rate_limit="99999/min"):
         self.items = items
         self.index = 0
         self.lock = asyncio.Lock()
+        self.requests = defaultdict(list)        # tracks request timestamps per API key
+        self.cooling_until = defaultdict(float)  # cooldown end time per item
+        count, period = parse_rate_limit(rate_limit)
+        self.rate_limit = count
+        self.period = period
+
+    async def set_cooling(self, item: str, cooling_time: int = 60):
+        """Put an item into the cooling state
+
+        Args:
+            item: the item to cool down
+            cooling_time: cooldown in seconds, 60 by default
+        """
+        now = time()
+        async with self.lock:
+            self.cooling_until[item] = now + cooling_time
+            # Clear the item's request history
+            self.requests[item] = []
+        logger.warning(f"API key {item} is cooling down for {cooling_time} seconds")
+
+    async def is_rate_limited(self, item) -> bool:
+        now = time()
+        # Check whether the item is still cooling down
+        if now < self.cooling_until[item]:
+            return True
+
+        self.requests[item] = [req for req in self.requests[item] if req > now - self.period]
+        if len(self.requests[item]) >= self.rate_limit:
+            return True
+        self.requests[item].append(now)
+        return False
 
     async def next(self):
         async with self.lock:
-
-
+            start_index = self.index
+            while True:
+                item = self.items[self.index]
+                self.index = (self.index + 1) % len(self.items)
+
+                if not await self.is_rate_limited(item):
+                    return item
+
+                logger.warning(f"API key {item} has hit its rate limit ({self.rate_limit} per {self.period}s)")
+
+                # Every API key has been checked and all of them are limited
+                if self.index == start_index:
+                    logger.warning(f"All API keys have hit the rate limit ({self.rate_limit} per {self.period}s)")
+                    return None
+
+    async def after_next_current(self):
+        # Return the API that next() just handed out; since next() has already
+        # been called, the current API sits at the previous index
+        async with self.lock:
+            item = self.items[(self.index - 1) % len(self.items)]
             return item
 
+    def get_items_count(self) -> int:
+        """Return the number of items in the list
+
+        Returns:
+            int: length of the items list
+        """
+        return len(self.items)
+
 def circular_list_encoder(obj):
     if isinstance(obj, ThreadSafeCircularList):
         return obj.to_dict()
@@ -84,9 +197,15 @@ def update_config(config_data, use_config_url=False):
             provider_api = provider.get('api', None)
             if provider_api:
                 if isinstance(provider_api, str):
-                    provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
+                    provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
+                        [provider_api],
+                        safe_get(provider, "preferences", "API_KEY_RATE_LIMIT", default="999999/min")
+                    )
                 if isinstance(provider_api, list):
-                    provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
+                    provider_api_circular_list[provider['provider']] = ThreadSafeCircularList(
+                        provider_api,
+                        safe_get(provider, "preferences", "API_KEY_RATE_LIMIT", default="999999/min")
+                    )
 
             if not provider.get("model"):
                 model_list = update_initial_model(provider['base_url'], provider['api'])
```
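The combination that the new ThreadSafeCircularList implements — a per-key sliding-window rate limit plus a 429 cooldown, wrapped around round-robin key selection — can be summarized in a small synchronous sketch. Names like `CooldownAwareKeyPool` are made up for illustration; the real class is async and lock-protected.

```python
import time
from collections import defaultdict

class CooldownAwareKeyPool:
    """Round-robin over API keys, skipping keys that are rate-limited
    (sliding window) or cooling down after a 429."""

    def __init__(self, keys, limit=15, period=60.0):
        self.keys = keys
        self.index = 0
        self.limit = limit        # max requests per key per window
        self.period = period      # window length in seconds
        self.requests = defaultdict(list)        # key -> request timestamps
        self.cooling_until = defaultdict(float)  # key -> cooldown end time

    def set_cooling(self, key, cooling_time=60.0):
        # Called when a key receives a 429: park it and clear its window.
        self.cooling_until[key] = time.time() + cooling_time
        self.requests[key] = []

    def _is_limited(self, key, now):
        if now < self.cooling_until[key]:
            return True
        # Drop timestamps that fell out of the sliding window.
        self.requests[key] = [t for t in self.requests[key] if t > now - self.period]
        if len(self.requests[key]) >= self.limit:
            return True
        self.requests[key].append(now)
        return False

    def next(self):
        # One full pass over the ring; None means every key is unavailable.
        now = time.time()
        for _ in range(len(self.keys)):
            key = self.keys[self.index]
            self.index = (self.index + 1) % len(self.keys)
            if not self._is_limited(key, now):
                return key
        return None

pool = CooldownAwareKeyPool(["key-a", "key-b"], limit=1, period=60.0)
print(pool.next())  # key-a
print(pool.next())  # key-b
pool.set_cooling("key-a")
print(pool.next())  # None: key-a is cooling, key-b used up its window
```

The production version additionally serializes access with an asyncio.Lock and logs a warning whenever a key is skipped or cooled down.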