refactor(provider): use LiteLLM as unified LLM requester backend (#2150)

* refactor(provider): use LiteLLM as unified LLM requester backend

  - Replace 23+ individual requester implementations with unified litellmchat.py
  - Add litellm_provider field to 27 YAML manifests for provider routing
  - Delete redundant requester subclasses
  - Add unit tests for LiteLLMRequester (29 tests)
  - Fix num_retries parameter name (was max_retries)
  - Fix exception handling order for subclass exceptions

  LiteLLM provides unified API for 100+ providers, eliminating need for provider-specific requesters.

* fix: ruff format provider.py

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(provider): simplify LiteLLM requester usage handling

  - Remove unused Anthropic-specific tool schema generation
  - Share completion argument construction between normal and streaming calls
  - Use LiteLLM/OpenAI native usage fields for monitoring
  - Collect stream token usage from LiteLLM stream_options
  - Update LiteLLM requester tests for unified usage fields

* restore: restore deleted provider requester files

  Restore individual provider requester implementations that were removed in de61b5d3. These files coexist with the unified litellmchat.py backend.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: update requesters and improve provider selection UI

  - Added `litellm_provider` field to various requesters' YAML configurations.
  - Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
  - Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
  - Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
  - Updated localization files to include search provider text for both English and Chinese.

* fix(provider): align litellm rebase with master

* fix(provider): capture streaming token usage; add token observability

  The LiteLLM streaming requester only captured usage when a chunk had an empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and providers send the final usage payload in a chunk that still carries an empty-delta choice, so streamed calls always recorded 0 tokens in the monitoring logs/dashboard (non-streaming worked).

  - Capture stream usage whenever a chunk carries it, regardless of choices
  - Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
  - Register litellm in bootutils/deps.py (was in pyproject only)
  - Add MonitoringService.get_token_statistics + /monitoring/token-statistics endpoint: summary, per-model breakdown, token timeseries, and a zero-token-success data-quality signal
  - Add TokenMonitoring dashboard tab (summary tiles, stacked token chart, per-model table) + i18n (en/zh)
  - Regression tests for stream usage capture and usage normalization

  Verified end-to-end against a real OpenAI-compatible endpoint with gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both streaming and non-streaming paths.

* refactor(provider): simplify litellm capabilities

* style: simplify wrapped expressions

* feat(models): persist context metadata

* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester

  - invoke_embedding: support both object- and dict-shaped response.data entries (OpenAI-compatible gateways like new-api return dicts)
  - invoke_rerank: litellm.arerank rejects the 'openai' provider, so for openai-compatible (or unspecified) providers call the standard Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
  - accept both 'relevance_score' and 'score' fields in rerank results
  - add unit tests for the openai-compatible HTTP rerank path

* feat(provider): enforce requester support_type when adding models

  - frontend: AddModelPopover only shows model-type tabs (llm/embedding/rerank) that the provider's requester declares in its manifest support_type; ModelsDialog fetches requester manifests and maps requester -> support_type, passed down through ProviderCard
  - backend: add _validate_provider_supports guard in create_llm_model / create_embedding_model / create_rerank_model so a model cannot be attached to a provider whose requester does not support that type, even if the frontend restriction is bypassed (manifests without support_type are allowed for backward compatibility)
  - manifests: correct support_type for providers that do not offer all three model types:
    - llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
    - llm + text-embedding: openai, gemini, mistral
    - add rerank to new-api (verified working via /v1/rerank)
    - set llm + text-embedding + rerank for aggregator/unknown gateways

* feat(provider): add searchable alias to requester manifests

  - add a free-text 'alias' field to every requester manifest spec, containing the vendor's English/Chinese names, pinyin, common nicknames and flagship model-series names (e.g. moonshot -> kimi, 月之暗面; zhipu -> glm, 智谱清言)
  - frontend: ProviderForm requester search now also matches against alias (substring/contains), so searching 'kimi' surfaces Moonshot, '硅基' surfaces SiliconFlow, etc.
  - also fix support_type: openrouter (relay) supports embedding+rerank; LangBot Space gains rerank (coming soon)

* fix(provider): make support_type guard defensive against incomplete model_mgr

  - _validate_provider_supports now uses getattr to gracefully skip when model_mgr / provider_dict / manifest lookup is unavailable, instead of raising AttributeError (fixes unit tests that mock ap.model_mgr as a bare SimpleNamespace)
  - add TestValidateProviderSupports covering: allow supported type, reject unsupported type, allow when support_type missing, allow when provider unknown, degrade safely when model_mgr is incomplete

* fix(persistence): guard 0004 migration against missing llm_models table

  The 0004_add_llm_model_context_length migration called inspector.get_columns('llm_models') unconditionally, raising NoSuchTableError when the table does not exist (e.g. migrating a fresh/empty DB, as exercised by the integration tests where create_all() registers no tables because the ORM models are not imported). Every other migration guards with a table-existence check first; add the same guard here for both upgrade and downgrade. Also restore the test head assertion to 0004 (it had been lowered to 0003 to mask this failure).

* Merge branch 'master' into feat/litellm

  Resolve conflicts:

  - uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid (ours) with openai bump (master).
  - Alembic migrations: master added 0004_add_mcp_readme while this branch added 0004_add_llm_model_context_length, both as children of 0003 (would create multiple heads). Re-chain the litellm migration as 0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme for a single linear head. Update test head assertion accordingly.

* fix(persistence): shorten migration revision id to fit varchar(32)

  PostgreSQL stores alembic_version.version_num as varchar(32). '0005_add_llm_model_context_length' (33 chars) overflowed it, raising StringDataRightTruncationError in the PG migration tests. Rename the revision (and file) to '0005_add_llm_context_length' (27 chars) and update the head assertions in both SQLite and PostgreSQL migration tests.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
2026-06-14 01:36:03 +00:00 · 2026-06-13 16:59:48 +08:00
parent 7965d333ac
commit 9ecb587ac0
123 changed files with 4098 additions and 4513 deletions
--- a/tests/unit_tests/api/service/test_model_service.py
+++ b/tests/unit_tests/api/service/test_model_service.py
@@ -23,6 +23,7 @@ from langbot.pkg.api.http.service.model import (
    RerankModelsService,
    _parse_provider_api_keys,
    _runtime_model_data,
+    _validate_provider_supports,
 )
 from langbot.pkg.entity.persistence.model import LLMModel, EmbeddingModel, RerankModel, ModelProvider

@@ -35,6 +36,7 @@ def _create_mock_llm_model(
    name: str = 'Test LLM',
    provider_uuid: str = 'provider-uuid',
    abilities: list = None,
+    context_length: int | None = None,
    extra_args: dict = None,
 ) -> Mock:
    """Helper to create mock LLMModel entity."""
@@ -43,6 +45,7 @@ def _create_mock_llm_model(
    model.name = name
    model.provider_uuid = provider_uuid
    model.abilities = abilities or []
+    model.context_length = context_length
    model.extra_args = extra_args or {}
    return model

@@ -142,10 +145,12 @@ class TestRuntimeModelData:
            'name': 'Model',
            'provider_uuid': 'provider',
            'abilities': ['vision'],
+            'context_length': 128000,
            'extra_args': {'temp': 0.7},
        }
        result = _runtime_model_data('uuid', update_payload)
        assert result['abilities'] == ['vision']
+        assert result['context_length'] == 128000
        assert result['extra_args'] == {'temp': 0.7}


@@ -188,7 +193,7 @@ class TestLLMModelsServiceGetLLMModels:
        ap = SimpleNamespace()
        ap.persistence_mgr = SimpleNamespace()

-        model = _create_mock_llm_model()
+        model = _create_mock_llm_model(context_length=128000)
        provider = _create_mock_provider()

        mock_model_result = _create_mock_result([model])
@@ -206,6 +211,7 @@ class TestLLMModelsServiceGetLLMModels:
                'uuid': entity.uuid,
                'name': entity.name,
                'provider_uuid': entity.provider_uuid if hasattr(entity, 'provider_uuid') else None,
+                'context_length': getattr(entity, 'context_length', None),
                'api_keys': entity.api_keys if hasattr(entity, 'api_keys') else None,
            }
        )
@@ -218,6 +224,7 @@ class TestLLMModelsServiceGetLLMModels:
        # Verify
        assert len(result) == 1
        assert result[0]['name'] == 'Test LLM'
+        assert result[0]['context_length'] == 128000

    async def test_get_llm_models_hide_secret_keys(self):
        """Hides secret API keys when include_secret=False."""
@@ -265,7 +272,7 @@ class TestLLMModelsServiceGetLLMModel:
        ap = SimpleNamespace()
        ap.persistence_mgr = SimpleNamespace()

-        model = _create_mock_llm_model(model_uuid='found-uuid')
+        model = _create_mock_llm_model(model_uuid='found-uuid', context_length=128000)
        provider = _create_mock_provider()

        mock_model_result = _create_mock_result([], first_item=model)
@@ -279,11 +286,12 @@ class TestLLMModelsServiceGetLLMModel:

        ap.persistence_mgr.execute_async = AsyncMock(side_effect=mock_execute)
        ap.persistence_mgr.serialize_model = Mock(
-            return_value={
-                'uuid': 'found-uuid',
-                'name': 'Test LLM',
-                'provider_uuid': 'provider-uuid',
-                'provider': {'uuid': 'provider-uuid', 'api_keys': ['key']},
+            side_effect=lambda model_cls, entity: {
+                'uuid': entity.uuid,
+                'name': entity.name,
+                'provider_uuid': getattr(entity, 'provider_uuid', None),
+                'context_length': getattr(entity, 'context_length', None),
+                'api_keys': getattr(entity, 'api_keys', None),
            }
        )

@@ -295,6 +303,7 @@ class TestLLMModelsServiceGetLLMModel:
        # Verify
        assert result is not None
        assert result['uuid'] == 'found-uuid'
+        assert result['context_length'] == 128000

    async def test_get_llm_model_not_found(self):
        """Returns None when model not found."""
@@ -402,6 +411,39 @@ class TestLLMModelsServiceCreateLLMModel:
        # Verify
        assert model_uuid == 'preserved-uuid'

+    async def test_create_llm_model_persists_context_length_as_column(self):
+        """Creates LLM model with context_length outside extra_args."""
+        ap = SimpleNamespace()
+        ap.persistence_mgr = SimpleNamespace()
+        ap.model_mgr = SimpleNamespace()
+        ap.model_mgr.provider_dict = {'provider-uuid': Mock()}
+        ap.model_mgr.llm_models = []
+        ap.model_mgr.load_llm_model_with_provider = AsyncMock(return_value=Mock())
+        ap.pipeline_service = SimpleNamespace(update_pipeline=AsyncMock())
+
+        mock_result = _create_mock_result([])
+        ap.persistence_mgr.execute_async = AsyncMock(return_value=mock_result)
+
+        service = LLMModelsService(ap)
+
+        await service.create_llm_model(
+            {
+                'uuid': 'model-with-context',
+                'name': 'Context Model',
+                'provider_uuid': 'provider-uuid',
+                'abilities': ['func_call'],
+                'context_length': 128000,
+                'extra_args': {'temperature': 0.2},
+            },
+            preserve_uuid=True,
+            auto_set_to_default_pipeline=False,
+        )
+
+        runtime_entity = ap.model_mgr.load_llm_model_with_provider.await_args.args[0]
+        assert runtime_entity.context_length == 128000
+        assert runtime_entity.extra_args == {'temperature': 0.2}
+        assert 'context_length' not in runtime_entity.extra_args
+
    async def test_create_llm_model_provider_not_found_raises_error(self):
        """Raises Exception when provider not found in runtime."""
        # Setup
@@ -512,6 +554,35 @@ class TestLLMModelsServiceUpdateLLMModel:
                'provider_uuid': 'nonexistent-provider',
            })

+    async def test_update_llm_model_reloads_context_length_as_column(self):
+        """Updates runtime model with context_length outside extra_args."""
+        ap = SimpleNamespace()
+        ap.persistence_mgr = SimpleNamespace(execute_async=AsyncMock())
+        ap.model_mgr = SimpleNamespace()
+        ap.model_mgr.provider_dict = {'provider-uuid': Mock()}
+        ap.model_mgr.llm_models = []
+        ap.model_mgr.remove_llm_model = AsyncMock()
+        ap.model_mgr.load_llm_model_with_provider = AsyncMock(return_value=Mock())
+
+        service = LLMModelsService(ap)
+
+        await service.update_llm_model(
+            'existing-uuid',
+            {
+                'name': 'Updated Name',
+                'provider_uuid': 'provider-uuid',
+                'abilities': ['vision'],
+                'context_length': 64000,
+                'extra_args': {'temperature': 0.4},
+            },
+        )
+
+        runtime_entity = ap.model_mgr.load_llm_model_with_provider.await_args.args[0]
+        assert runtime_entity.uuid == 'existing-uuid'
+        assert runtime_entity.context_length == 64000
+        assert runtime_entity.extra_args == {'temperature': 0.4}
+        assert 'context_length' not in runtime_entity.extra_args
+

 class TestLLMModelsServiceDeleteLLMModel:
    """Tests for LLMModelsService.delete_llm_model method."""
@@ -961,4 +1032,56 @@ class TestRerankModelsServiceGetRerankModelsByProvider:
        result = await service.get_rerank_models_by_provider('provider-uuid')

        # Verify
-        assert len(result) == 2
+        assert len(result) == 2
+
+
+class TestValidateProviderSupports:
+    """Tests for _validate_provider_supports guard."""
+
+    @staticmethod
+    def _make_ap(requester_name: str, support_type):
+        """Build a fake ap whose model_mgr resolves a manifest with support_type."""
+        manifest = SimpleNamespace(spec={'support_type': support_type})
+        runtime_provider = SimpleNamespace(
+            provider_entity=SimpleNamespace(requester=requester_name)
+        )
+        model_mgr = SimpleNamespace(
+            provider_dict={'p1': runtime_provider},
+            get_available_requester_manifest_by_name=lambda name: manifest
+            if name == requester_name
+            else None,
+        )
+        return SimpleNamespace(model_mgr=model_mgr)
+
+    async def test_allows_supported_type(self):
+        ap = self._make_ap('cohere-rerank', ['rerank'])
+        # Should not raise
+        await _validate_provider_supports(ap, 'p1', 'rerank')
+
+    async def test_rejects_unsupported_type(self):
+        ap = self._make_ap('cohere-rerank', ['rerank'])
+        with pytest.raises(ValueError, match='does not support llm'):
+            await _validate_provider_supports(ap, 'p1', 'llm')
+
+    async def test_allows_when_support_type_missing(self):
+        # Manifest without support_type must not block (backward compatible)
+        manifest = SimpleNamespace(spec={})
+        runtime_provider = SimpleNamespace(
+            provider_entity=SimpleNamespace(requester='legacy')
+        )
+        model_mgr = SimpleNamespace(
+            provider_dict={'p1': runtime_provider},
+            get_available_requester_manifest_by_name=lambda name: manifest,
+        )
+        ap = SimpleNamespace(model_mgr=model_mgr)
+        await _validate_provider_supports(ap, 'p1', 'rerank')
+
+    async def test_allows_when_provider_unknown(self):
+        ap = self._make_ap('cohere-rerank', ['rerank'])
+        # Unknown provider uuid -> no entry -> no block
+        await _validate_provider_supports(ap, 'missing', 'llm')
+
+    async def test_degrades_when_model_mgr_incomplete(self):
+        # A bare ap without a usable model_mgr must not raise (defensive)
+        ap = SimpleNamespace(model_mgr=SimpleNamespace())
+        await _validate_provider_supports(ap, 'p1', 'llm')
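
For readers who want the behavior pinned down by TestValidateProviderSupports in one place, the following is a minimal sketch of a guard that would satisfy those five cases. It is not the code from langbot.pkg.api.http.service.model, which this diff does not show; the names `validate_provider_supports_sketch` and `make_fake_ap` are hypothetical, and the exact error wording is an assumption chosen only to match the `does not support llm` pattern used in the tests.

```python
import asyncio
from types import SimpleNamespace


async def validate_provider_supports_sketch(ap, provider_uuid: str, model_type: str) -> None:
    """Hypothetical stand-in for the guard; raises ValueError only on a clear mismatch."""
    # Every lookup goes through getattr so an incomplete model_mgr skips the check
    # instead of raising AttributeError.
    model_mgr = getattr(ap, 'model_mgr', None)
    provider_dict = getattr(model_mgr, 'provider_dict', None) or {}

    runtime_provider = provider_dict.get(provider_uuid)
    if runtime_provider is None:
        return  # unknown provider uuid: nothing to validate against

    provider_entity = getattr(runtime_provider, 'provider_entity', None)
    requester_name = getattr(provider_entity, 'requester', None)
    lookup = getattr(model_mgr, 'get_available_requester_manifest_by_name', None)
    if requester_name is None or lookup is None:
        return

    manifest = lookup(requester_name)
    spec = getattr(manifest, 'spec', None) or {}
    support_type = spec.get('support_type')
    if not support_type:
        return  # manifests without support_type are allowed (backward compatible)

    if model_type not in support_type:
        # Assumed message wording; only the "does not support <type>" part is from the tests.
        raise ValueError(f'Requester {requester_name!r} does not support {model_type} models')


def make_fake_ap(requester_name: str, support_type):
    """Build a fake ap shaped like the one in the test helper _make_ap."""
    manifest = SimpleNamespace(spec={'support_type': support_type})
    runtime_provider = SimpleNamespace(provider_entity=SimpleNamespace(requester=requester_name))
    model_mgr = SimpleNamespace(
        provider_dict={'p1': runtime_provider},
        get_available_requester_manifest_by_name=lambda name: manifest if name == requester_name else None,
    )
    return SimpleNamespace(model_mgr=model_mgr)
```

Calling `asyncio.run(validate_provider_supports_sketch(make_fake_ap('cohere-rerank', ['rerank']), 'p1', 'llm'))` raises ValueError, while the same call with `'rerank'`, with an unknown provider uuid, or with a bare `SimpleNamespace(model_mgr=SimpleNamespace())` returns without raising. The sketch fails open on every missing piece, which mirrors the backward-compatibility note in the commit message.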