refactor(provider): use LiteLLM as unified LLM requester backend (#2150)

* refactor(provider): use LiteLLM as unified LLM requester backend

  - Replace 23+ individual requester implementations with unified litellmchat.py
  - Add litellm_provider field to 27 YAML manifests for provider routing
  - Delete redundant requester subclasses
  - Add unit tests for LiteLLMRequester (29 tests)
  - Fix num_retries parameter name (was max_retries)
  - Fix exception handling order for subclass exceptions

  LiteLLM provides a unified API for 100+ providers, eliminating the need
  for provider-specific requesters.
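
  The exception-ordering fix listed above follows a general Python rule:
  `except` clauses are tried top to bottom, so a handler for a base class
  placed first shadows the handlers for its subclasses. A minimal sketch,
  assuming two hypothetical exception classes (these are illustrative and
  are not LiteLLM's actual hierarchy):

```python
class ProviderError(Exception):
    """Hypothetical base error for any provider failure."""


class RateLimitError(ProviderError):
    """Hypothetical subclass: the provider throttled the request."""


def classify(exc: Exception) -> str:
    """Map an exception to a label, checking subclasses before the base."""
    try:
        raise exc
    except RateLimitError:
        # Must come first: a RateLimitError is also a ProviderError.
        return 'rate_limited'
    except ProviderError:
        return 'provider_error'
    except Exception:
        return 'unknown'
```

  If the `ProviderError` clause were listed first, the `RateLimitError`
  branch would never run.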

* fix: ruff format provider.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(provider): simplify LiteLLM requester usage handling

  - Remove unused Anthropic-specific tool schema generation
  - Share completion argument construction between normal and streaming calls
  - Use LiteLLM/OpenAI native usage fields for monitoring
  - Collect stream token usage from LiteLLM stream_options
  - Update LiteLLM requester tests for unified usage fields

* restore: restore deleted provider requester files

Restore individual provider requester implementations that were
removed in de61b5d3. These files coexist with the unified
litellmchat.py backend.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: update requesters and improve provider selection UI

- Added `litellm_provider` field to various requesters' YAML configurations.
- Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
- Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
- Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
- Updated localization files to include search provider text for both English and Chinese.

* fix(provider): align litellm rebase with master

* fix(provider): capture streaming token usage; add token observability

The LiteLLM streaming requester captured usage only when a chunk had an
empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and
providers send the final usage payload in a chunk that still carries an
empty-delta choice, so streamed calls through them always recorded 0
tokens in the monitoring logs/dashboard (non-streaming calls were
unaffected).

- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics
  endpoint: summary, per-model breakdown, token timeseries, and a
  zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart,
  per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization

Verified end-to-end against a real OpenAI-compatible endpoint with
gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both
streaming and non-streaming paths.
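
The capture rule and the normalization step described above can be
sketched roughly as follows. The function names and chunk shapes here are
assumptions made for illustration; they are not the project's
`_normalize_usage` or its streaming loop.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def normalize_usage(usage: Any) -> Optional[dict]:
    """Return usage as a plain dict, deriving total_tokens when it is missing."""
    if usage is None:
        return None

    def read(name: str) -> Any:
        # Accept both dict-shaped and object-shaped usage payloads.
        if isinstance(usage, dict):
            return usage.get(name)
        return getattr(usage, name, None)

    prompt = read('prompt_tokens') or 0
    completion = read('completion_tokens') or 0
    total = read('total_tokens')
    if total is None:
        total = prompt + completion
    return {
        'prompt_tokens': prompt,
        'completion_tokens': completion,
        'total_tokens': total,
    }


def collect_stream_usage(chunks: Iterable[Any]) -> Optional[dict]:
    """Keep the last usage payload seen, whether or not the chunk has choices."""
    usage = None
    for chunk in chunks:
        raw = getattr(chunk, 'usage', None)
        if raw is not None:
            usage = normalize_usage(raw)
    return usage
```

The point of the sketch is that the presence of `usage` decides whether it
is recorded; the contents of `choices` are not consulted.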

* refactor(provider): simplify litellm capabilities

* style: simplify wrapped expressions

* feat(models): persist context metadata

* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester

- invoke_embedding: support both object- and dict-shaped response.data
  entries (OpenAI-compatible gateways like new-api return dicts)
- invoke_rerank: litellm.arerank rejects the 'openai' provider, so for
  openai-compatible (or unspecified) providers call the standard
  Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
- accept both 'relevance_score' and 'score' fields in rerank results
- add unit tests for the openai-compatible HTTP rerank path
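
A minimal sketch of the shape-tolerant parsing described above, assuming
hypothetical helper names (`read_field`, `extract_embeddings`,
`extract_rerank_score`) that are not taken from the actual requester:

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def read_field(item: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a dict-shaped or an object-shaped entry."""
    if isinstance(item, dict):
        return item.get(name, default)
    return getattr(item, name, default)


def extract_embeddings(data: Iterable[Any]) -> List[Any]:
    """Collect the embedding vector from each response.data entry."""
    return [read_field(entry, 'embedding') for entry in data]


def extract_rerank_score(result: Any) -> Optional[float]:
    """Prefer 'relevance_score' and fall back to 'score' when it is absent."""
    score = read_field(result, 'relevance_score')
    if score is None:
        # Compare against None so a legitimate score of 0.0 is kept.
        score = read_field(result, 'score')
    return score
```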

* feat(provider): enforce requester support_type when adding models

- frontend: AddModelPopover only shows model-type tabs (llm/embedding/
  rerank) that the provider's requester declares in its manifest
  support_type; ModelsDialog fetches requester manifests and maps
  requester -> support_type, passed down through ProviderCard
- backend: add _validate_provider_supports guard in create_llm_model /
  create_embedding_model / create_rerank_model so a model cannot be
  attached to a provider whose requester does not support that type,
  even if the frontend restriction is bypassed (manifests without
  support_type are allowed for backward compatibility)
- manifests: correct support_type for providers that do not offer all
  three model types:
  - llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
  - llm + text-embedding: openai, gemini, mistral
  - add rerank to new-api (verified working via /v1/rerank)
  - set llm + text-embedding + rerank for aggregator/unknown gateways

* feat(provider): add searchable alias to requester manifests

- add a free-text 'alias' field to every requester manifest spec,
  containing the vendor's English/Chinese names, pinyin, common
  nicknames and flagship model-series names (e.g. moonshot -> kimi,
  月之暗面; zhipu -> glm, 智谱清言)
- frontend: ProviderForm requester search now also matches against
  alias (substring/contains), so searching 'kimi' surfaces Moonshot,
  '硅基' surfaces SiliconFlow, etc.
- also fix support_type: openrouter (relay) supports embedding+rerank;
  LangBot Space gains rerank (coming soon)
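
The alias matching itself lives in the frontend ProviderForm; the logic
can be sketched in Python for illustration. The function name, the
case-insensitive comparison, and the sample alias strings in the usage
note are all assumptions:

```python
def matches_requester(query: str, label: str, alias: str = '') -> bool:
    """Return True when the query is a substring of the label or the alias."""
    needle = query.strip().lower()
    if not needle:
        # An empty search box matches every requester.
        return True
    return needle in label.lower() or needle in alias.lower()
```

With an alias such as `'kimi 月之暗面'` on the Moonshot entry,
`matches_requester('kimi', 'Moonshot', 'kimi 月之暗面')` is true even
though the label itself does not contain the query.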

* fix(provider): make support_type guard defensive against incomplete model_mgr

- _validate_provider_supports now uses getattr to gracefully skip when
  model_mgr / provider_dict / manifest lookup is unavailable, instead of
  raising AttributeError (fixes unit tests that mock ap.model_mgr as a
  bare SimpleNamespace)
- add TestValidateProviderSupports covering: allow supported type,
  reject unsupported type, allow when support_type missing, allow when
  provider unknown, degrade safely when model_mgr is incomplete
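
The defensive lookup described above might look roughly like this. It is
a sketch under stated assumptions: it is written synchronously for brevity
(the tests in this change await the real guard), the error message is
invented, and only the attribute names that appear in the tests are taken
from the source.

```python
from types import SimpleNamespace


def validate_provider_supports(ap, provider_uuid: str, model_type: str) -> None:
    """Raise ValueError only when a manifest explicitly excludes model_type."""
    model_mgr = getattr(ap, 'model_mgr', None)
    provider_dict = getattr(model_mgr, 'provider_dict', None) or {}
    lookup = getattr(model_mgr, 'get_available_requester_manifest_by_name', None)

    runtime_provider = provider_dict.get(provider_uuid)
    if runtime_provider is None or lookup is None:
        return  # Unknown provider or incomplete model_mgr: do not block.

    entity = getattr(runtime_provider, 'provider_entity', None)
    requester = getattr(entity, 'requester', None)
    manifest = lookup(requester) if requester else None
    spec = getattr(manifest, 'spec', None) or {}

    support_type = spec.get('support_type')
    if not support_type:
        return  # Manifests without support_type stay allowed.
    if model_type not in support_type:
        raise ValueError(f'requester {requester} does not support {model_type}')
```

Every lookup goes through `getattr` with a default, so a bare
`SimpleNamespace()` standing in for `model_mgr` skips the check instead of
raising `AttributeError`.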

* fix(persistence): guard 0004 migration against missing llm_models table

The 0004_add_llm_model_context_length migration called
inspector.get_columns('llm_models') unconditionally, raising
NoSuchTableError when the table does not exist (e.g. migrating a
fresh/empty DB, as exercised by the integration tests where
create_all() registers no tables because the ORM models are not
imported). Every other migration guards with a table-existence check
first; add the same guard here for both upgrade and downgrade.

Also restore the test head assertion to 0004 (it had been lowered to
0003 to mask this failure).
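
The guard pattern can be illustrated with the standard-library `sqlite3`
module. This is a stand-in and not the migration's code: the real
migration uses an Alembic/SQLAlchemy inspector, and the function names and
the `INTEGER` column type below are assumptions.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    """Check sqlite_master so a missing table is detected without raising."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def add_context_length_column(conn: sqlite3.Connection) -> bool:
    """Add llm_models.context_length; return False when there is nothing to do."""
    if not table_exists(conn, 'llm_models'):
        return False  # Fresh or empty database: skip instead of failing.
    columns = [row[1] for row in conn.execute('PRAGMA table_info(llm_models)')]
    if 'context_length' in columns:
        return False  # Column already present.
    conn.execute('ALTER TABLE llm_models ADD COLUMN context_length INTEGER')
    return True
```

Checking for the table first makes the step safe to run against an empty
database, and checking for the column makes it safe to run twice.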

* Merge branch 'master' into feat/litellm

Resolve conflicts:
- uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid
  (ours) with openai bump (master).
- Alembic migrations: master added 0004_add_mcp_readme while this
  branch added 0004_add_llm_model_context_length, both as children of
  0003 (would create multiple heads). Re-chain the litellm migration as
  0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme
  for a single linear head. Update test head assertion accordingly.

* fix(persistence): shorten migration revision id to fit varchar(32)

PostgreSQL stores alembic_version.version_num as varchar(32).
'0005_add_llm_model_context_length' (33 chars) overflowed it, raising
StringDataRightTruncationError in the PG migration tests. Rename the
revision (and file) to '0005_add_llm_context_length' (27 chars) and
update the head assertions in both SQLite and PostgreSQL migration
tests.
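
A small check like the following could catch an over-long revision id
before it reaches PostgreSQL. The helper name and constant are
hypothetical; the 32-character limit is the one cited above.

```python
VERSION_NUM_MAX_LEN = 32  # Width of alembic_version.version_num cited above.


def fits_version_num(revision_id: str, limit: int = VERSION_NUM_MAX_LEN) -> bool:
    """Return True when the revision id fits in the version_num column."""
    return len(revision_id) <= limit
```

Applied to the two ids in this change, the original name fails the check
and the shortened one passes.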

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
huanghuoguoguo
2026-06-13 16:59:48 +08:00
committed by GitHub
parent 7965d333ac
commit 9ecb587ac0
123 changed files with 4098 additions and 4513 deletions


@@ -23,6 +23,7 @@ from langbot.pkg.api.http.service.model import (
RerankModelsService,
_parse_provider_api_keys,
_runtime_model_data,
_validate_provider_supports,
)
from langbot.pkg.entity.persistence.model import LLMModel, EmbeddingModel, RerankModel, ModelProvider
@@ -35,6 +36,7 @@ def _create_mock_llm_model(
name: str = 'Test LLM',
provider_uuid: str = 'provider-uuid',
abilities: list = None,
context_length: int | None = None,
extra_args: dict = None,
) -> Mock:
"""Helper to create mock LLMModel entity."""
@@ -43,6 +45,7 @@ def _create_mock_llm_model(
model.name = name
model.provider_uuid = provider_uuid
model.abilities = abilities or []
model.context_length = context_length
model.extra_args = extra_args or {}
return model
@@ -142,10 +145,12 @@ class TestRuntimeModelData:
'name': 'Model',
'provider_uuid': 'provider',
'abilities': ['vision'],
'context_length': 128000,
'extra_args': {'temp': 0.7},
}
result = _runtime_model_data('uuid', update_payload)
assert result['abilities'] == ['vision']
assert result['context_length'] == 128000
assert result['extra_args'] == {'temp': 0.7}
@@ -188,7 +193,7 @@ class TestLLMModelsServiceGetLLMModels:
ap = SimpleNamespace()
ap.persistence_mgr = SimpleNamespace()
-model = _create_mock_llm_model()
+model = _create_mock_llm_model(context_length=128000)
provider = _create_mock_provider()
mock_model_result = _create_mock_result([model])
@@ -206,6 +211,7 @@ class TestLLMModelsServiceGetLLMModels:
'uuid': entity.uuid,
'name': entity.name,
'provider_uuid': entity.provider_uuid if hasattr(entity, 'provider_uuid') else None,
'context_length': getattr(entity, 'context_length', None),
'api_keys': entity.api_keys if hasattr(entity, 'api_keys') else None,
}
)
@@ -218,6 +224,7 @@ class TestLLMModelsServiceGetLLMModels:
# Verify
assert len(result) == 1
assert result[0]['name'] == 'Test LLM'
assert result[0]['context_length'] == 128000
async def test_get_llm_models_hide_secret_keys(self):
"""Hides secret API keys when include_secret=False."""
@@ -265,7 +272,7 @@ class TestLLMModelsServiceGetLLMModel:
ap = SimpleNamespace()
ap.persistence_mgr = SimpleNamespace()
-model = _create_mock_llm_model(model_uuid='found-uuid')
+model = _create_mock_llm_model(model_uuid='found-uuid', context_length=128000)
provider = _create_mock_provider()
mock_model_result = _create_mock_result([], first_item=model)
@@ -279,11 +286,12 @@ class TestLLMModelsServiceGetLLMModel:
ap.persistence_mgr.execute_async = AsyncMock(side_effect=mock_execute)
ap.persistence_mgr.serialize_model = Mock(
-return_value={
-'uuid': 'found-uuid',
-'name': 'Test LLM',
-'provider_uuid': 'provider-uuid',
-'provider': {'uuid': 'provider-uuid', 'api_keys': ['key']},
+side_effect=lambda model_cls, entity: {
+'uuid': entity.uuid,
+'name': entity.name,
+'provider_uuid': getattr(entity, 'provider_uuid', None),
+'context_length': getattr(entity, 'context_length', None),
+'api_keys': getattr(entity, 'api_keys', None),
}
)
@@ -295,6 +303,7 @@ class TestLLMModelsServiceGetLLMModel:
# Verify
assert result is not None
assert result['uuid'] == 'found-uuid'
assert result['context_length'] == 128000
async def test_get_llm_model_not_found(self):
"""Returns None when model not found."""
@@ -402,6 +411,39 @@ class TestLLMModelsServiceCreateLLMModel:
# Verify
assert model_uuid == 'preserved-uuid'
    async def test_create_llm_model_persists_context_length_as_column(self):
        """Creates LLM model with context_length outside extra_args."""
        ap = SimpleNamespace()
        ap.persistence_mgr = SimpleNamespace()
        ap.model_mgr = SimpleNamespace()
        ap.model_mgr.provider_dict = {'provider-uuid': Mock()}
        ap.model_mgr.llm_models = []
        ap.model_mgr.load_llm_model_with_provider = AsyncMock(return_value=Mock())
        ap.pipeline_service = SimpleNamespace(update_pipeline=AsyncMock())

        mock_result = _create_mock_result([])
        ap.persistence_mgr.execute_async = AsyncMock(return_value=mock_result)

        service = LLMModelsService(ap)
        await service.create_llm_model(
            {
                'uuid': 'model-with-context',
                'name': 'Context Model',
                'provider_uuid': 'provider-uuid',
                'abilities': ['func_call'],
                'context_length': 128000,
                'extra_args': {'temperature': 0.2},
            },
            preserve_uuid=True,
            auto_set_to_default_pipeline=False,
        )

        runtime_entity = ap.model_mgr.load_llm_model_with_provider.await_args.args[0]
        assert runtime_entity.context_length == 128000
        assert runtime_entity.extra_args == {'temperature': 0.2}
        assert 'context_length' not in runtime_entity.extra_args

async def test_create_llm_model_provider_not_found_raises_error(self):
"""Raises Exception when provider not found in runtime."""
# Setup
@@ -512,6 +554,35 @@ class TestLLMModelsServiceUpdateLLMModel:
'provider_uuid': 'nonexistent-provider',
})
    async def test_update_llm_model_reloads_context_length_as_column(self):
        """Updates runtime model with context_length outside extra_args."""
        ap = SimpleNamespace()
        ap.persistence_mgr = SimpleNamespace(execute_async=AsyncMock())
        ap.model_mgr = SimpleNamespace()
        ap.model_mgr.provider_dict = {'provider-uuid': Mock()}
        ap.model_mgr.llm_models = []
        ap.model_mgr.remove_llm_model = AsyncMock()
        ap.model_mgr.load_llm_model_with_provider = AsyncMock(return_value=Mock())

        service = LLMModelsService(ap)
        await service.update_llm_model(
            'existing-uuid',
            {
                'name': 'Updated Name',
                'provider_uuid': 'provider-uuid',
                'abilities': ['vision'],
                'context_length': 64000,
                'extra_args': {'temperature': 0.4},
            },
        )

        runtime_entity = ap.model_mgr.load_llm_model_with_provider.await_args.args[0]
        assert runtime_entity.uuid == 'existing-uuid'
        assert runtime_entity.context_length == 64000
        assert runtime_entity.extra_args == {'temperature': 0.4}
        assert 'context_length' not in runtime_entity.extra_args

class TestLLMModelsServiceDeleteLLMModel:
"""Tests for LLMModelsService.delete_llm_model method."""
@@ -961,4 +1032,56 @@ class TestRerankModelsServiceGetRerankModelsByProvider:
result = await service.get_rerank_models_by_provider('provider-uuid')
# Verify
assert len(result) == 2

class TestValidateProviderSupports:
    """Tests for _validate_provider_supports guard."""

    @staticmethod
    def _make_ap(requester_name: str, support_type):
        """Build a fake ap whose model_mgr resolves a manifest with support_type."""
        manifest = SimpleNamespace(spec={'support_type': support_type})
        runtime_provider = SimpleNamespace(
            provider_entity=SimpleNamespace(requester=requester_name)
        )
        model_mgr = SimpleNamespace(
            provider_dict={'p1': runtime_provider},
            get_available_requester_manifest_by_name=lambda name: manifest
            if name == requester_name
            else None,
        )
        return SimpleNamespace(model_mgr=model_mgr)

    async def test_allows_supported_type(self):
        ap = self._make_ap('cohere-rerank', ['rerank'])
        # Should not raise
        await _validate_provider_supports(ap, 'p1', 'rerank')

    async def test_rejects_unsupported_type(self):
        ap = self._make_ap('cohere-rerank', ['rerank'])
        with pytest.raises(ValueError, match='does not support llm'):
            await _validate_provider_supports(ap, 'p1', 'llm')

    async def test_allows_when_support_type_missing(self):
        # Manifest without support_type must not block (backward compatible)
        manifest = SimpleNamespace(spec={})
        runtime_provider = SimpleNamespace(
            provider_entity=SimpleNamespace(requester='legacy')
        )
        model_mgr = SimpleNamespace(
            provider_dict={'p1': runtime_provider},
            get_available_requester_manifest_by_name=lambda name: manifest,
        )
        ap = SimpleNamespace(model_mgr=model_mgr)
        await _validate_provider_supports(ap, 'p1', 'rerank')

    async def test_allows_when_provider_unknown(self):
        ap = self._make_ap('cohere-rerank', ['rerank'])
        # Unknown provider uuid -> no entry -> no block
        await _validate_provider_supports(ap, 'missing', 'llm')

    async def test_degrades_when_model_mgr_incomplete(self):
        # A bare ap without a usable model_mgr must not raise (defensive)
        ap = SimpleNamespace(model_mgr=SimpleNamespace())
        await _validate_provider_supports(ap, 'p1', 'llm')