refactor(provider): use LiteLLM as unified LLM requester backend (#2150)

* refactor(provider): use LiteLLM as unified LLM requester backend

- Replace 23+ individual requester implementations with unified litellmchat.py
- Add litellm_provider field to 27 YAML manifests for provider routing
- Delete redundant requester subclasses
- Add unit tests for LiteLLMRequester (29 tests)
- Fix num_retries parameter name (was max_retries)
- Fix exception handling order for subclass exceptions

LiteLLM provides unified API for 100+ providers, eliminating need for provider-specific requesters.

* fix: ruff format provider.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(provider): simplify LiteLLM requester usage handling

- Remove unused Anthropic-specific tool schema generation
- Share completion argument construction between normal and streaming calls
- Use LiteLLM/OpenAI native usage fields for monitoring
- Collect stream token usage from LiteLLM stream_options
- Update LiteLLM requester tests for unified usage fields

* restore: restore deleted provider requester files

Restore individual provider requester implementations that were removed in de61b5d3. These files coexist with the unified litellmchat.py backend.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: update requesters and improve provider selection UI

- Added `litellm_provider` field to various requesters' YAML configurations.
- Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
- Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
- Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
- Updated localization files to include search provider text for both English and Chinese.

* fix(provider): align litellm rebase with master

* fix(provider): capture streaming token usage; add token observability

The LiteLLM streaming requester only captured usage when a chunk had an empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and providers send the final usage payload in a chunk that still carries an empty-delta choice, so streamed calls always recorded 0 tokens in the monitoring logs/dashboard (non-streaming worked).

- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics endpoint: summary, per-model breakdown, token timeseries, and a zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart, per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization

Verified end-to-end against a real OpenAI-compatible endpoint with gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both streaming and non-streaming paths.

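The streaming fix described above can be illustrated with a minimal, dependency-free sketch. It is not the code from this commit: the function names `normalize_usage` and `collect_stream_usage` and the chunk shapes are assumptions chosen for illustration. It only shows the two ideas named in the message, namely accepting dict- or object-shaped usage payloads (deriving `total_tokens` when it is missing) and keeping usage from any chunk that carries it, even when that chunk still has a choice with an empty delta.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def normalize_usage(usage: Any) -> dict:
    """Coerce a dict- or object-shaped usage payload into integer token counts.

    Hypothetical helper for illustration; field names follow the OpenAI-style
    usage object (prompt_tokens, completion_tokens, total_tokens).
    """
    if usage is None:
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}

    def read(name: str):
        # Support both mapping access and attribute access.
        if isinstance(usage, dict):
            return usage.get(name)
        return getattr(usage, name, None)

    prompt = int(read("prompt_tokens") or 0)
    completion = int(read("completion_tokens") or 0)
    total = read("total_tokens")
    # Derive the total only when the provider did not send one.
    total = prompt + completion if total is None else int(total)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }


def collect_stream_usage(chunks: Iterable[Any]) -> dict:
    """Return the last usage payload seen on any chunk, regardless of choices."""
    last_usage = None
    for chunk in chunks:
        if isinstance(chunk, dict):
            chunk_usage = chunk.get("usage")
        else:
            chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            last_usage = chunk_usage
    return normalize_usage(last_usage)


# Example: the final chunk carries usage alongside an empty-delta choice.
example_chunks = [
    {"choices": [{"delta": {"content": "hi"}}]},
    {"choices": [{"delta": {}}], "usage": {"prompt_tokens": 7, "completion_tokens": 3}},
]
example_usage = collect_stream_usage(example_chunks)
object_usage = normalize_usage(SimpleNamespace(prompt_tokens=1, completion_tokens=2, total_tokens=5))
```

A check that only reads usage from chunks with an empty `choices` list would skip the second chunk in this example and record zero tokens, which matches the symptom the commit message reports.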
* refactor(provider): simplify litellm capabilities

* style: simplify wrapped expressions

* feat(models): persist context metadata

* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester

- invoke_embedding: support both object- and dict-shaped response.data entries (OpenAI-compatible gateways like new-api return dicts)
- invoke_rerank: litellm.arerank rejects the 'openai' provider, so for openai-compatible (or unspecified) providers call the standard Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
- accept both 'relevance_score' and 'score' fields in rerank results
- add unit tests for the openai-compatible HTTP rerank path

* feat(provider): enforce requester support_type when adding models

- frontend: AddModelPopover only shows model-type tabs (llm/embedding/rerank) that the provider's requester declares in its manifest support_type; ModelsDialog fetches requester manifests and maps requester -> support_type, passed down through ProviderCard
- backend: add _validate_provider_supports guard in create_llm_model / create_embedding_model / create_rerank_model so a model cannot be attached to a provider whose requester does not support that type, even if the frontend restriction is bypassed (manifests without support_type are allowed for backward compatibility)
- manifests: correct support_type for providers that do not offer all three model types:
  - llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
  - llm + text-embedding: openai, gemini, mistral
  - add rerank to new-api (verified working via /v1/rerank)
  - set llm + text-embedding + rerank for aggregator/unknown gateways

* feat(provider): add searchable alias to requester manifests

- add a free-text 'alias' field to every requester manifest spec, containing the vendor's English/Chinese names, pinyin, common nicknames and flagship model-series names (e.g. moonshot -> kimi, 月之暗面; zhipu -> glm, 智谱清言)
- frontend: ProviderForm requester search now also matches against alias (substring/contains), so searching 'kimi' surfaces Moonshot, '硅基' surfaces SiliconFlow, etc.
- also fix support_type: openrouter (relay) supports embedding+rerank; LangBot Space gains rerank (coming soon)

* fix(provider): make support_type guard defensive against incomplete model_mgr

- _validate_provider_supports now uses getattr to gracefully skip when model_mgr / provider_dict / manifest lookup is unavailable, instead of raising AttributeError (fixes unit tests that mock ap.model_mgr as a bare SimpleNamespace)
- add TestValidateProviderSupports covering: allow supported type, reject unsupported type, allow when support_type missing, allow when provider unknown, degrade safely when model_mgr is incomplete

* fix(persistence): guard 0004 migration against missing llm_models table

The 0004_add_llm_model_context_length migration called inspector.get_columns('llm_models') unconditionally, raising NoSuchTableError when the table does not exist (e.g. migrating a fresh/empty DB, as exercised by the integration tests where create_all() registers no tables because the ORM models are not imported). Every other migration guards with a table-existence check first; add the same guard here for both upgrade and downgrade. Also restore the test head assertion to 0004 (it had been lowered to 0003 to mask this failure).

* Merge branch 'master' into feat/litellm

Resolve conflicts:

- uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid (ours) with openai bump (master).
- Alembic migrations: master added 0004_add_mcp_readme while this branch added 0004_add_llm_model_context_length, both as children of 0003 (would create multiple heads). Re-chain the litellm migration as 0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme for a single linear head. Update test head assertion accordingly.

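The defensive support_type guard described in the entries above can be sketched as follows. This is a rough illustration under stated assumptions, not the project's `_validate_provider_supports`: the function name, the `provider_dict` and `manifest` shapes, and the use of `ValueError` are all hypothetical. It covers the cases listed for TestValidateProviderSupports: allow a supported type, reject an unsupported one, and skip the check when the provider, the manifest, or `support_type` is unavailable.

```python
from types import SimpleNamespace
from typing import Any


def validate_provider_supports(model_mgr: Any, provider_id: str, model_type: str) -> None:
    """Raise ValueError if the provider's manifest excludes model_type.

    Hypothetical sketch: every lookup goes through getattr or dict.get so an
    incomplete model manager degrades to "allow" instead of AttributeError.
    """
    provider_dict = getattr(model_mgr, "provider_dict", None)
    if not isinstance(provider_dict, dict):
        return  # model manager is incomplete: skip the check

    provider = provider_dict.get(provider_id)
    if provider is None:
        return  # unknown provider: leave the decision to other validation

    manifest = getattr(provider, "manifest", None)
    support_type = manifest.get("support_type") if isinstance(manifest, dict) else None
    if not support_type:
        return  # manifests without support_type stay allowed for compatibility

    if model_type not in support_type:
        raise ValueError(
            f"provider {provider_id!r} does not support model type {model_type!r}"
        )


# Example setup: one llm-only provider and one legacy manifest without support_type.
example_mgr = SimpleNamespace(
    provider_dict={
        "llm-only": SimpleNamespace(manifest={"support_type": ["llm"]}),
        "legacy": SimpleNamespace(manifest={}),
    }
)
validate_provider_supports(example_mgr, "llm-only", "llm")      # allowed
validate_provider_supports(example_mgr, "legacy", "rerank")     # allowed, no support_type
validate_provider_supports(SimpleNamespace(), "any", "rerank")  # allowed, bare namespace
```

Failing open keeps older manifests and partially mocked test fixtures working, at the cost of not rejecting anything when the manifest cannot be found.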
* fix(persistence): shorten migration revision id to fit varchar(32)

PostgreSQL stores alembic_version.version_num as varchar(32). '0005_add_llm_model_context_length' (33 chars) overflowed it, raising StringDataRightTruncationError in the PG migration tests. Rename the revision (and file) to '0005_add_llm_context_length' (27 chars) and update the head assertions in both SQLite and PostgreSQL migration tests.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
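The revision-id length problem from the last fix can be checked with a few lines of Python. This is an illustrative sketch only: the constant and helper names are hypothetical, and the 32-character limit is taken from the varchar(32) width stated in the message rather than read from a database.

```python
# Width of alembic_version.version_num as stated in the commit message.
VERSION_NUM_MAX_LEN = 32


def fits_version_column(revision_id: str, limit: int = VERSION_NUM_MAX_LEN) -> bool:
    """Return True when the revision id fits in the version column."""
    return len(revision_id) <= limit


old_revision = "0005_add_llm_model_context_length"
new_revision = "0005_add_llm_context_length"

old_fits = fits_version_column(old_revision)  # False: 33 characters
new_fits = fits_version_column(new_revision)  # True: 27 characters
```

A check like this could run in a unit test over all migration files, so an overlong revision id is caught before the PostgreSQL tests run.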
2026-06-18 19:44:21 +00:00 · 2026-06-13 16:59:48 +08:00
parent 7965d333ac
commit 9ecb587ac0
123 changed files with 4098 additions and 4513 deletions
@@ -201,6 +201,9 @@ const enUS = {
    selectModelAbilities: 'Select model abilities',
    visionAbility: 'Vision Ability',
    functionCallAbility: 'Function Call',
+    contextLength: 'Context Window',
+    contextLengthPlaceholder: 'Unknown',
+    contextLengthInvalid: 'Context window must be a positive integer',
    extraParameters: 'Extra Parameters',
    addParameter: 'Add Parameter',
    keyName: 'Key Name',
@@ -258,6 +261,7 @@ const enUS = {
    selectProvider: 'Select Provider',
    requester: 'Provider Type',
    selectRequester: 'Select Provider Type',
+    searchProviders: 'Search providers...',
    langbotModelsDescription: 'Cloud models powered by LangBot Space',
    credits: 'Credits',
    loginWithSpace: 'Login with Space',
@@ -1201,6 +1205,7 @@ const enUS = {
      llmCalls: 'LLM Calls',
      embeddingCalls: 'Embedding Calls',
      modelCalls: 'Model Calls',
+      tokens: 'Token Monitoring',
      feedback: 'User Feedback',
      sessions: 'Session Analysis',
      errors: 'Error Logs',
@@ -1239,6 +1244,30 @@ const enUS = {
      avgDuration: 'Avg Duration',
      calls: 'Calls',
    },
+    tokens: {
+      totalTokens: 'Total Tokens',
+      inputTokens: 'Input Tokens',
+      outputTokens: 'Output Tokens',
+      avgPerCall: 'Avg / Call',
+      throughput: 'Throughput',
+      tokensPerSec: 'tokens/sec',
+      errorCalls: 'Failed Calls',
+      acrossCalls: 'across {{count}} calls',
+      ofTotal: 'of {{count}} total',
+      usageOverTime: 'Token Usage Over Time',
+      byModel: 'By Model',
+      model: 'Model',
+      calls: 'Calls',
+      avgLatency: 'Avg Latency',
+      noData: 'No token usage in the selected time range',
+      loadError: 'Failed to load token statistics: {{error}}',
+      zeroTokenWarning:
+        '{{count}} successful call(s) reported zero token usage. This usually means the upstream provider did not return usage info — check the model provider configuration.',
+      bucket: {
+        hour: 'Hourly',
+        day: 'Daily',
+      },
+    },
    embeddingCalls: {
      title: 'Embedding Calls',
      model: 'Model',
@@ -206,6 +206,9 @@ const esES = {
    selectModelAbilities: 'Seleccionar capacidades del modelo',
    visionAbility: 'Capacidad de visión',
    functionCallAbility: 'Llamada a funciones',
+    contextLength: 'Ventana de contexto',
+    contextLengthPlaceholder: 'Desconocido',
+    contextLengthInvalid: 'La ventana de contexto debe ser un entero positivo',
    extraParameters: 'Parámetros adicionales',
    addParameter: 'Añadir parámetro',
    keyName: 'Nombre de la clave',
@@ -204,6 +204,10 @@ const jaJP = {
    selectModelAbilities: 'モデル機能を選択',
    visionAbility: '視覚機能',
    functionCallAbility: '関数呼び出し',
+    contextLength: 'コンテキストウィンドウ',
+    contextLengthPlaceholder: '不明',
+    contextLengthInvalid:
+      'コンテキストウィンドウは正の整数である必要があります',
    extraParameters: '追加パラメータ',
    addParameter: 'パラメータを追加',
    keyName: 'キー名',
@@ -203,6 +203,10 @@ const ruRU = {
    selectModelAbilities: 'Выберите возможности модели',
    visionAbility: 'Распознавание изображений',
    functionCallAbility: 'Вызов функций',
+    contextLength: 'Контекстное окно',
+    contextLengthPlaceholder: 'Неизвестно',
+    contextLengthInvalid:
+      'Контекстное окно должно быть положительным целым числом',
    extraParameters: 'Дополнительные параметры',
    addParameter: 'Добавить параметр',
    keyName: 'Имя ключа',
@@ -199,6 +199,9 @@ const thTH = {
    selectModelAbilities: 'เลือกความสามารถของโมเดล',
    visionAbility: 'ความสามารถด้านภาพ',
    functionCallAbility: 'การเรียกฟังก์ชัน',
+    contextLength: 'หน้าต่างบริบท',
+    contextLengthPlaceholder: 'ไม่ทราบ',
+    contextLengthInvalid: 'หน้าต่างบริบทต้องเป็นจำนวนเต็มบวก',
    extraParameters: 'พารามิเตอร์เพิ่มเติม',
    addParameter: 'เพิ่มพารามิเตอร์',
    keyName: 'ชื่อคีย์',
@@ -203,6 +203,9 @@ const viVN = {
    selectModelAbilities: 'Chọn khả năng mô hình',
    visionAbility: 'Khả năng thị giác',
    functionCallAbility: 'Gọi hàm',
+    contextLength: 'Cửa sổ ngữ cảnh',
+    contextLengthPlaceholder: 'Không rõ',
+    contextLengthInvalid: 'Cửa sổ ngữ cảnh phải là số nguyên dương',
    extraParameters: 'Tham số bổ sung',
    addParameter: 'Thêm tham số',
    keyName: 'Tên khóa',
@@ -193,6 +193,9 @@ const zhHans = {
    selectModelAbilities: '选择模型能力',
    visionAbility: '视觉能力',
    functionCallAbility: '函数调用',
+    contextLength: '上下文窗口',
+    contextLengthPlaceholder: '未知',
+    contextLengthInvalid: '上下文窗口必须是正整数',
    extraParameters: '额外参数',
    addParameter: '添加参数',
    keyName: '键名',
@@ -248,6 +251,7 @@ const zhHans = {
    selectProvider: '选择供应商',
    requester: '供应商类型',
    selectRequester: '选择供应商类型',
+    searchProviders: '搜索供应商...',
    langbotModelsDescription: 'LangBot Space 提供的云端模型',
    credits: '积分',
    loginWithSpace: '通过 Space 登录',
@@ -1144,6 +1148,7 @@ const zhHans = {
      llmCalls: 'LLM调用',
      embeddingCalls: 'Embedding调用',
      modelCalls: '模型调用',
+      tokens: 'Token 监控',
      feedback: '用户反馈',
      sessions: '会话分析',
      errors: '错误日志',
@@ -1182,6 +1187,30 @@ const zhHans = {
      avgDuration: '平均耗时',
      calls: '调用次数',
    },
+    tokens: {
+      totalTokens: '总 Token 数',
+      inputTokens: '输入 Token',
+      outputTokens: '输出 Token',
+      avgPerCall: '平均每次调用',
+      throughput: '吞吐量',
+      tokensPerSec: 'Token/秒',
+      errorCalls: '失败调用',
+      acrossCalls: '共 {{count}} 次调用',
+      ofTotal: '共 {{count}} 次',
+      usageOverTime: 'Token 用量趋势',
+      byModel: '按模型统计',
+      model: '模型',
+      calls: '调用次数',
+      avgLatency: '平均延迟',
+      noData: '所选时间范围内暂无 Token 用量数据',
+      loadError: '加载 Token 统计失败：{{error}}',
+      zeroTokenWarning:
+        '检测到 {{count}} 次成功调用未上报 Token 用量（记为 0）。这通常表示上游未返回 usage 信息，请检查模型供应商配置。',
+      bucket: {
+        hour: '按小时',
+        day: '按天',
+      },
+    },
    embeddingCalls: {
      title: 'Embedding调用',
      model: '模型',
@@ -193,6 +193,9 @@ const zhHant = {
    selectModelAbilities: '選擇模型能力',
    visionAbility: '視覺能力',
    functionCallAbility: '函數呼叫',
+    contextLength: '上下文視窗',
+    contextLengthPlaceholder: '未知',
+    contextLengthInvalid: '上下文視窗必須是正整數',
    extraParameters: '額外參數',
    addParameter: '新增參數',
    keyName: '鍵名',