Files
LangBot/tests/unit_tests/provider/test_model_service.py
huanghuoguoguo 9ecb587ac0 refactor(provider): use LiteLLM as unified LLM requester backend (#2150)
* refactor(provider): use LiteLLM as unified LLM requester backend

  - Replace 23+ individual requester implementations with unified litellmchat.py
  - Add litellm_provider field to 27 YAML manifests for provider routing
  - Delete redundant requester subclasses
  - Add unit tests for LiteLLMRequester (29 tests)
  - Fix num_retries parameter name (was max_retries)
  - Fix exception handling order for subclass exceptions

  LiteLLM provides unified API for 100+ providers, eliminating need for
  provider-specific requesters.

* fix: ruff format provider.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(provider): simplify LiteLLM requester usage handling

  - Remove unused Anthropic-specific tool schema generation
  - Share completion argument construction between normal and streaming calls
  - Use LiteLLM/OpenAI native usage fields for monitoring
  - Collect stream token usage from LiteLLM stream_options
  - Update LiteLLM requester tests for unified usage fields

* restore: restore deleted provider requester files

Restore individual provider requester implementations that were
removed in de61b5d3. These files coexist with the unified
litellmchat.py backend.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: update requesters and improve provider selection UI

- Added `litellm_provider` field to various requesters' YAML configurations.
- Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
- Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
- Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
- Updated localization files to include search provider text for both English and Chinese.

* fix(provider): align litellm rebase with master

* fix(provider): capture streaming token usage; add token observability

The LiteLLM streaming requester only captured usage when a chunk had an
empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and
providers send the final usage payload in a chunk that still carries an
empty-delta choice, so streamed calls always recorded 0 tokens in the
monitoring logs/dashboard (non-streaming worked).

- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics
  endpoint: summary, per-model breakdown, token timeseries, and a
  zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart,
  per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization

Verified end-to-end against a real OpenAI-compatible endpoint with
gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both
streaming and non-streaming paths.

* refactor(provider): simplify litellm capabilities

* style: simplify wrapped expressions

* feat(models): persist context metadata

* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester

- invoke_embedding: support both object- and dict-shaped response.data
  entries (OpenAI-compatible gateways like new-api return dicts)
- invoke_rerank: litellm.arerank rejects the 'openai' provider, so for
  openai-compatible (or unspecified) providers call the standard
  Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
- accept both 'relevance_score' and 'score' fields in rerank results
- add unit tests for the openai-compatible HTTP rerank path

* feat(provider): enforce requester support_type when adding models

- frontend: AddModelPopover only shows model-type tabs (llm/embedding/
  rerank) that the provider's requester declares in its manifest
  support_type; ModelsDialog fetches requester manifests and maps
  requester -> support_type, passed down through ProviderCard
- backend: add _validate_provider_supports guard in create_llm_model /
  create_embedding_model / create_rerank_model so a model cannot be
  attached to a provider whose requester does not support that type,
  even if the frontend restriction is bypassed (manifests without
  support_type are allowed for backward compatibility)
- manifests: correct support_type for providers that do not offer all
  three model types:
  - llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
  - llm + text-embedding: openai, gemini, mistral
  - add rerank to new-api (verified working via /v1/rerank)
  - set llm + text-embedding + rerank for aggregator/unknown gateways

* feat(provider): add searchable alias to requester manifests

- add a free-text 'alias' field to every requester manifest spec,
  containing the vendor's English/Chinese names, pinyin, common
  nicknames and flagship model-series names (e.g. moonshot -> kimi,
  月之暗面; zhipu -> glm, 智谱清言)
- frontend: ProviderForm requester search now also matches against
  alias (substring/contains), so searching 'kimi' surfaces Moonshot,
  '硅基' surfaces SiliconFlow, etc.
- also fix support_type: openrouter (relay) supports embedding+rerank;
  LangBot Space gains rerank (coming soon)

* fix(provider): make support_type guard defensive against incomplete model_mgr

- _validate_provider_supports now uses getattr to gracefully skip when
  model_mgr / provider_dict / manifest lookup is unavailable, instead of
  raising AttributeError (fixes unit tests that mock ap.model_mgr as a
  bare SimpleNamespace)
- add TestValidateProviderSupports covering: allow supported type,
  reject unsupported type, allow when support_type missing, allow when
  provider unknown, degrade safely when model_mgr is incomplete

* fix(persistence): guard 0004 migration against missing llm_models table

The 0004_add_llm_model_context_length migration called
inspector.get_columns('llm_models') unconditionally, raising
NoSuchTableError when the table does not exist (e.g. migrating a
fresh/empty DB, as exercised by the integration tests where
create_all() registers no tables because the ORM models are not
imported). Every other migration guards with a table-existence check
first; add the same guard here for both upgrade and downgrade.

Also restore the test head assertion to 0004 (it had been lowered to
0003 to mask this failure).

* Merge branch 'master' into feat/litellm

Resolve conflicts:
- uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid
  (ours) with openai bump (master).
- Alembic migrations: master added 0004_add_mcp_readme while this
  branch added 0004_add_llm_model_context_length, both as children of
  0003 (would create multiple heads). Re-chain the litellm migration as
  0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme
  for a single linear head. Update test head assertion accordingly.

* fix(persistence): shorten migration revision id to fit varchar(32)

PostgreSQL stores alembic_version.version_num as varchar(32).
'0005_add_llm_model_context_length' (33 chars) overflowed it, raising
StringDataRightTruncationError in the PG migration tests. Rename the
revision (and file) to '0005_add_llm_context_length' (27 chars) and
update the head assertions in both SQLite and PostgreSQL migration
tests.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
2026-06-13 16:59:48 +08:00

205 lines
7.2 KiB
Python

from __future__ import annotations
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
import langbot_plugin.api.entities.builtin.platform.entities as platform_entities
import langbot_plugin.api.entities.builtin.platform.events as platform_events
import langbot_plugin.api.entities.builtin.platform.message as platform_message
import langbot_plugin.api.entities.builtin.provider.session as provider_session
from langbot.pkg.api.http.service.model import _runtime_model_data
from langbot.pkg.api.http.service.provider import ModelProviderService
from langbot.pkg.entity.persistence import model as persistence_model
from langbot.pkg.pipeline.preproc.preproc import PreProcessor
from langbot.pkg.provider.modelmgr import requester
from langbot.pkg.provider.modelmgr.modelmgr import ModelManager
from langbot.pkg.provider.modelmgr.token import TokenManager
from langbot.pkg.provider.runners.localagent import LocalAgentRunner
def test_runtime_llm_model_data_preserves_uuid_after_update_payload_uuid_removed():
update_payload = {
'name': 'Qwen3.5-27B',
'provider_uuid': 'provider-uuid',
'abilities': [],
'extra_args': {},
}
runtime_entity = persistence_model.LLMModel(**_runtime_model_data('model-uuid', update_payload))
assert runtime_entity.uuid == 'model-uuid'
assert runtime_entity.name == 'Qwen3.5-27B'
def test_runtime_embedding_model_data_preserves_uuid_after_update_payload_uuid_removed():
update_payload = {
'name': 'embedding-model',
'provider_uuid': 'provider-uuid',
'extra_args': {},
}
runtime_entity = persistence_model.EmbeddingModel(**_runtime_model_data('embedding-uuid', update_payload))
assert runtime_entity.uuid == 'embedding-uuid'
assert runtime_entity.name == 'embedding-model'
def test_runtime_rerank_model_data_preserves_uuid_after_update_payload_uuid_removed():
update_payload = {
'name': 'rerank-model',
'provider_uuid': 'provider-uuid',
'extra_args': {},
}
runtime_entity = persistence_model.RerankModel(**_runtime_model_data('rerank-uuid', update_payload))
assert runtime_entity.uuid == 'rerank-uuid'
assert runtime_entity.name == 'rerank-model'
def test_normalize_space_provider_api_keys_filters_blank_values():
assert ModelProviderService._normalize_api_keys('space-key') == ['space-key']
assert ModelProviderService._normalize_api_keys(' trimmed-key ') == ['trimmed-key']
assert ModelProviderService._normalize_api_keys('') == []
assert ModelProviderService._normalize_api_keys(' ') == []
assert ModelProviderService._normalize_api_keys(None) == []
assert ModelProviderService._normalize_api_keys([' first-key ', '', 'first-key', 'second-key']) == [
'first-key',
'second-key',
]
def test_token_manager_filters_blank_and_duplicate_tokens():
token_mgr = TokenManager('provider-uuid', [' first-key ', '', 'first-key', 'second-key', ' '])
assert token_mgr.tokens == ['first-key', 'second-key']
assert token_mgr.get_token() == 'first-key'
def test_token_manager_next_token_ignores_empty_token_list():
token_mgr = TokenManager('provider-uuid', [])
token_mgr.next_token()
assert token_mgr.get_token() == ''
assert token_mgr.using_token_index == 0
@pytest.mark.asyncio
async def test_updated_llm_model_is_immediately_usable_by_local_agent_pipeline():
from langbot.pkg.api.http.service.model import LLMModelsService
model_uuid = 'qwen-model-uuid'
provider_uuid = 'ollama-provider-uuid'
ap = SimpleNamespace()
ap.logger = Mock()
ap.persistence_mgr = SimpleNamespace(execute_async=AsyncMock())
ap.tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
ap.skill_mgr = None # PreProcessor only uses skill_mgr for the local-agent skill-binding branch
ap.plugin_connector = SimpleNamespace(
emit_event=AsyncMock(return_value=SimpleNamespace(event=SimpleNamespace(default_prompt=[], prompt=[])))
)
ap.model_mgr = ModelManager(ap)
runtime_provider = Mock()
ap.model_mgr.provider_dict = {provider_uuid: runtime_provider}
ap.model_mgr.llm_models = [
requester.RuntimeLLMModel(
model_entity=persistence_model.LLMModel(
uuid=model_uuid,
name='old-qwen-name',
provider_uuid=provider_uuid,
abilities=[],
extra_args={},
),
provider=runtime_provider,
)
]
await LLMModelsService(ap).update_llm_model(
model_uuid,
{
'name': 'Qwen3.5-27B',
'provider_uuid': provider_uuid,
'abilities': [],
'extra_args': {},
},
)
runtime_model = await ap.model_mgr.get_model_by_uuid(model_uuid)
assert runtime_model.model_entity.uuid == model_uuid
assert runtime_model.model_entity.name == 'Qwen3.5-27B'
session = SimpleNamespace(
launcher_type=provider_session.LauncherTypes.PERSON,
launcher_id=12345,
)
conversation = SimpleNamespace(
uuid='conversation-uuid',
create_time=None,
update_time=None,
prompt=SimpleNamespace(messages=[], copy=Mock(return_value=SimpleNamespace(messages=[]))),
messages=[],
)
ap.sess_mgr = SimpleNamespace(
get_session=AsyncMock(return_value=session),
get_conversation=AsyncMock(return_value=conversation),
)
message_chain = platform_message.MessageChain([platform_message.Plain(text='hello')])
sender = platform_entities.Friend(id=12345, nickname='Tester', remark=None)
message_event = platform_events.FriendMessage(
type='FriendMessage',
sender=sender,
message_chain=message_chain,
time=1710000000,
)
pipeline_config = {
'ai': {
'runner': {'runner': 'local-agent'},
'local-agent': {
'model': {'primary': model_uuid, 'fallbacks': []},
'prompt': [],
'knowledge-bases': [],
},
},
'trigger': {'misc': {'combine-quote-message': False}},
'output': {'misc': {'remove-think': False}},
}
query = pipeline_query.Query.model_construct(
query_id='query-id',
launcher_type=provider_session.LauncherTypes.PERSON,
launcher_id=12345,
sender_id=12345,
message_chain=message_chain,
message_event=message_event,
adapter=AsyncMock(),
pipeline_uuid='pipeline-uuid',
bot_uuid='bot-uuid',
pipeline_config=pipeline_config,
session=None,
prompt=None,
messages=[],
user_message=None,
use_funcs=[],
use_llm_model_uuid=None,
variables={},
resp_messages=[],
resp_message_chain=None,
current_stage_name=None,
)
result = await PreProcessor(ap).process(query, 'PreProcessor')
processed_query = result.new_query
assert processed_query.use_llm_model_uuid == model_uuid
runner = SimpleNamespace(ap=ap, pipeline_config=pipeline_config)
candidates = await LocalAgentRunner._get_model_candidates(runner, processed_query)
assert [model.model_entity.uuid for model in candidates] == [model_uuid]