LangBot/tests/unit_tests/provider/test_tool_manager.py
huanghuoguoguo 9ecb587ac0 refactor(provider): use LiteLLM as unified LLM requester backend (#2150)
* refactor(provider): use LiteLLM as unified LLM requester backend

  - Replace 23+ individual requester implementations with unified litellmchat.py
  - Add litellm_provider field to 27 YAML manifests for provider routing
  - Delete redundant requester subclasses
  - Add unit tests for LiteLLMRequester (29 tests)
  - Fix num_retries parameter name (was max_retries)
  - Fix exception handling order for subclass exceptions

  LiteLLM provides a unified API for 100+ providers, eliminating the need
  for provider-specific requesters.

* fix: ruff format provider.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(provider): simplify LiteLLM requester usage handling

  - Remove unused Anthropic-specific tool schema generation
  - Share completion argument construction between normal and streaming calls
  - Use LiteLLM/OpenAI native usage fields for monitoring
  - Collect stream token usage from LiteLLM stream_options
  - Update LiteLLM requester tests for unified usage fields

* restore: restore deleted provider requester files

Restore individual provider requester implementations that were
removed in de61b5d3. These files coexist with the unified
litellmchat.py backend.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: update requesters and improve provider selection UI

- Added `litellm_provider` field to various requesters' YAML configurations.
- Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
- Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
- Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
- Updated localization files to include search provider text for both English and Chinese.

* fix(provider): align litellm rebase with master

* fix(provider): capture streaming token usage; add token observability

The LiteLLM streaming requester only captured usage when a chunk had an
empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and
providers send the final usage payload in a chunk that still carries an
empty-delta choice, so streamed calls always recorded 0 tokens in the
monitoring logs/dashboard (non-streaming worked).

- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics
  endpoint: summary, per-model breakdown, token timeseries, and a
  zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart,
  per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization

Verified end-to-end against a real OpenAI-compatible endpoint with
gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both
streaming and non-streaming paths.
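
The usage fix described above can be pictured with a small sketch. This is not the project's `_normalize_usage`: the names `normalize_usage` and `collect_stream_usage` are hypothetical, and the field names `prompt_tokens`, `completion_tokens` and `total_tokens` are assumed to follow the OpenAI-style usage shape.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def normalize_usage(usage: Any) -> dict:
    """Return plain token counts from a dict- or object-shaped usage payload."""

    def read(key: str) -> Optional[int]:
        # Gateways may hand back a dict; SDK objects expose attributes instead.
        if isinstance(usage, dict):
            return usage.get(key)
        return getattr(usage, key, None)

    prompt = read('prompt_tokens') or 0
    completion = read('completion_tokens') or 0
    total = read('total_tokens')
    if not total:
        # Derive the total when the provider omits it.
        total = prompt + completion
    return {'prompt_tokens': prompt, 'completion_tokens': completion, 'total_tokens': total}


def collect_stream_usage(chunks: Iterable[Any]) -> dict:
    """Keep the last usage payload seen on any chunk, whether or not it has choices."""
    latest = normalize_usage(None)
    for chunk in chunks:
        usage = getattr(chunk, 'usage', None)
        if usage is not None:
            latest = normalize_usage(usage)
    return latest
```

The point of the second helper is that it never inspects `choices`, so a final chunk that carries both an empty-delta choice and a usage payload is still counted.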

* refactor(provider): simplify litellm capabilities

* style: simplify wrapped expressions

* feat(models): persist context metadata

* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester

- invoke_embedding: support both object- and dict-shaped response.data
  entries (OpenAI-compatible gateways like new-api return dicts)
- invoke_rerank: litellm.arerank rejects the 'openai' provider, so for
  openai-compatible (or unspecified) providers call the standard
  Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
- accept both 'relevance_score' and 'score' fields in rerank results
- add unit tests for the openai-compatible HTTP rerank path
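
A rough sketch of the shape handling listed above, under stated assumptions: the helper names `_field`, `extract_embeddings` and `extract_rerank_scores` are hypothetical, and the `index` key is assumed from the Jina/Cohere-style rerank response rather than taken from the requester's code.

```python
from types import SimpleNamespace
from typing import Any, List


def _field(item: Any, key: str, default: Any = None) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(item, dict):
        return item.get(key, default)
    return getattr(item, key, default)


def extract_embeddings(data: List[Any]) -> List[List[float]]:
    """Collect embedding vectors from object- or dict-shaped response entries."""
    return [list(_field(entry, 'embedding', [])) for entry in data]


def extract_rerank_scores(results: List[Any]) -> List[dict]:
    """Accept 'relevance_score' first and fall back to 'score'."""
    scores = []
    for result in results:
        score = _field(result, 'relevance_score')
        if score is None:
            score = _field(result, 'score', 0.0)
        scores.append({'index': _field(result, 'index'), 'score': float(score)})
    return scores
```

Checking `is None` rather than truthiness keeps a legitimate `relevance_score` of `0.0` from being replaced by the fallback field.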

* feat(provider): enforce requester support_type when adding models

- frontend: AddModelPopover only shows model-type tabs (llm/embedding/
  rerank) that the provider's requester declares in its manifest
  support_type; ModelsDialog fetches requester manifests and maps
  requester -> support_type, passed down through ProviderCard
- backend: add _validate_provider_supports guard in create_llm_model /
  create_embedding_model / create_rerank_model so a model cannot be
  attached to a provider whose requester does not support that type,
  even if the frontend restriction is bypassed (manifests without
  support_type are allowed for backward compatibility)
- manifests: correct support_type for providers that do not offer all
  three model types:
  - llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
  - llm + text-embedding: openai, gemini, mistral
  - add rerank to new-api (verified working via /v1/rerank)
  - set llm + text-embedding + rerank for aggregator/unknown gateways

* feat(provider): add searchable alias to requester manifests

- add a free-text 'alias' field to every requester manifest spec,
  containing the vendor's English/Chinese names, pinyin, common
  nicknames and flagship model-series names (e.g. moonshot -> kimi,
  月之暗面; zhipu -> glm, 智谱清言)
- frontend: ProviderForm requester search now also matches against
  alias (substring/contains), so searching 'kimi' surfaces Moonshot,
  '硅基' surfaces SiliconFlow, etc.
- also fix support_type: openrouter (relay) supports embedding+rerank;
  LangBot Space gains rerank (coming soon)

* fix(provider): make support_type guard defensive against incomplete model_mgr

- _validate_provider_supports now uses getattr to gracefully skip when
  model_mgr / provider_dict / manifest lookup is unavailable, instead of
  raising AttributeError (fixes unit tests that mock ap.model_mgr as a
  bare SimpleNamespace)
- add TestValidateProviderSupports covering: allow supported type,
  reject unsupported type, allow when support_type missing, allow when
  provider unknown, degrade safely when model_mgr is incomplete
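
The guard behaviour enumerated above might look roughly like the following. It is a sketch only: the real `_validate_provider_supports` signature is not shown in this commit message, and the layout assumed here (a `provider_dict` mapping whose values carry a dict-shaped `manifest`) is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_support_types(model_mgr: Any, provider_uuid: str) -> Optional[list]:
    """Return the declared support_type list, or None when any lookup step is missing."""
    provider_dict = getattr(model_mgr, 'provider_dict', None)
    if not isinstance(provider_dict, dict):
        return None
    provider = provider_dict.get(provider_uuid)
    manifest = getattr(provider, 'manifest', None)
    if not isinstance(manifest, dict):
        return None
    return manifest.get('support_type')


def validate_provider_supports(ap: Any, provider_uuid: str, model_type: str) -> None:
    """Raise ValueError only when the provider explicitly excludes model_type."""
    support_types = get_support_types(getattr(ap, 'model_mgr', None), provider_uuid)
    if support_types is None:
        # Unknown provider, incomplete manager, or no support_type declared: allow.
        return
    if model_type not in support_types:
        raise ValueError(f'provider {provider_uuid} does not support model type {model_type}')
```

Because every lookup goes through `getattr` with a default, a test double such as a bare `SimpleNamespace()` for `model_mgr` falls into the allow path instead of raising `AttributeError`.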

* fix(persistence): guard 0004 migration against missing llm_models table

The 0004_add_llm_model_context_length migration called
inspector.get_columns('llm_models') unconditionally, raising
NoSuchTableError when the table does not exist (e.g. migrating a
fresh/empty DB, as exercised by the integration tests where
create_all() registers no tables because the ORM models are not
imported). Every other migration guards with a table-existence check
first; add the same guard here for both upgrade and downgrade.

Also restore the test head assertion to 0004 (it had been lowered to
0003 to mask this failure).
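
The table-existence guard can be illustrated without Alembic. The sketch below swaps in the standard-library `sqlite3` module for the SQLAlchemy inspector the migration actually uses, shows only the upgrade direction, and assumes the added column is named `context_length` (inferred from the migration name, not confirmed by the source).

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    """Check sqlite_master for the table before touching its columns."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def column_names(conn: sqlite3.Connection, table: str) -> set:
    """Return the column names of an existing table (field 1 of PRAGMA table_info)."""
    return {row[1] for row in conn.execute(f'PRAGMA table_info({table})')}


def upgrade(conn: sqlite3.Connection) -> None:
    """Add the column only when the table exists and does not already have it."""
    if not table_exists(conn, 'llm_models'):
        return  # fresh or empty database: nothing to migrate
    if 'context_length' not in column_names(conn, 'llm_models'):
        conn.execute('ALTER TABLE llm_models ADD COLUMN context_length INTEGER')
```

Running `upgrade` against an empty in-memory database is then a no-op, which mirrors the fresh-database case the integration tests exercise.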

* Merge branch 'master' into feat/litellm

Resolve conflicts:
- uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid
  (ours) with openai bump (master).
- Alembic migrations: master added 0004_add_mcp_readme while this
  branch added 0004_add_llm_model_context_length, both as children of
  0003 (would create multiple heads). Re-chain the litellm migration as
  0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme
  for a single linear head. Update test head assertion accordingly.

* fix(persistence): shorten migration revision id to fit varchar(32)

PostgreSQL stores alembic_version.version_num as varchar(32).
'0005_add_llm_model_context_length' (33 chars) overflowed it, raising
StringDataRightTruncationError in the PG migration tests. Rename the
revision (and file) to '0005_add_llm_context_length' (27 chars) and
update the head assertions in both SQLite and PostgreSQL migration
tests.
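
The length arithmetic above is easy to check mechanically. A minimal sketch, where `fits_version_column` is a hypothetical helper and the limit of 32 comes from the varchar(32) column mentioned in the message:

```python
ALEMBIC_VERSION_NUM_MAX = 32  # width of alembic_version.version_num on PostgreSQL, per the note above


def fits_version_column(revision_id: str, limit: int = ALEMBIC_VERSION_NUM_MAX) -> bool:
    """Return True when the revision id fits in the version_num column."""
    return len(revision_id) <= limit


old_id = '0005_add_llm_model_context_length'
new_id = '0005_add_llm_context_length'
print(len(old_id), fits_version_column(old_id))  # 33 False
print(len(new_id), fits_version_column(new_id))  # 27 True
```

A check like this could run in a unit test over every revision file so an overlong id is caught before the PostgreSQL migration tests hit it.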

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
2026-06-13 16:59:48 +08:00


"""Unit tests for ToolManager.

Tests cover:
- Tool schema generation for OpenAI/LiteLLM
- Tool execution dispatch
"""

from __future__ import annotations

import pytest
from unittest.mock import Mock, AsyncMock
from importlib import import_module

import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query


def get_toolmgr_module():
    """Lazy import to avoid circular import issues."""
    return import_module('langbot.pkg.provider.tools.toolmgr')


class TestToolManagerInit:
    """Tests for ToolManager initialization."""

    def test_init_stores_app_reference(self):
        """Test that __init__ stores the Application reference."""
        toolmgr = get_toolmgr_module()

        mock_app = Mock()
        manager = toolmgr.ToolManager(mock_app)

        assert manager.ap is mock_app

    def test_init_no_tool_loaders(self):
        """Test that tool loaders are not initialized before initialize()."""
        toolmgr = get_toolmgr_module()

        mock_app = Mock()
        manager = toolmgr.ToolManager(mock_app)

        assert hasattr(manager, 'plugin_tool_loader') is False or manager.plugin_tool_loader is None


class TestToolManagerSchemaGeneration:
    """Tests for tool schema generation methods."""

    @pytest.fixture
    def mock_app(self):
        """Create mock app."""
        mock_app = Mock()
        mock_app.logger = Mock()
        return mock_app

    @pytest.fixture
    def sample_tools(self):
        """Create sample LLMTool list for testing."""

        def dummy_weather_func(**kwargs):
            return 'weather result'

        def dummy_calc_func(**kwargs):
            return 'calc result'

        tools = [
            resource_tool.LLMTool(
                name='get_weather',
                human_desc='Get current weather for a location',
                description='Get current weather for a location',
                parameters={
                    'type': 'object',
                    'properties': {'location': {'type': 'string', 'description': 'City name'}},
                    'required': ['location'],
                },
                func=dummy_weather_func,
            ),
            resource_tool.LLMTool(
                name='calculate',
                human_desc='Perform a calculation',
                description='Perform a calculation',
                parameters={
                    'type': 'object',
                    'properties': {'expression': {'type': 'string', 'description': 'Math expression'}},
                    'required': ['expression'],
                },
                func=dummy_calc_func,
            ),
        ]
        return tools

    @pytest.mark.asyncio
    async def test_generate_tools_for_openai(self, mock_app, sample_tools):
        """Test that generate_tools_for_openai produces correct schema."""
        toolmgr = get_toolmgr_module()

        manager = toolmgr.ToolManager(mock_app)
        result = await manager.generate_tools_for_openai(sample_tools)

        assert len(result) == 2

        # Verify first tool schema
        tool1 = result[0]
        assert tool1['type'] == 'function'
        assert tool1['function']['name'] == 'get_weather'
        assert tool1['function']['description'] == 'Get current weather for a location'
        assert 'parameters' in tool1['function']
        assert tool1['function']['parameters']['type'] == 'object'

        # Verify second tool schema
        tool2 = result[1]
        assert tool2['type'] == 'function'
        assert tool2['function']['name'] == 'calculate'

    @pytest.mark.asyncio
    async def test_generate_tools_empty_list(self, mock_app):
        """Test that generating tools from empty list returns empty list."""
        toolmgr = get_toolmgr_module()

        manager = toolmgr.ToolManager(mock_app)

        openai_result = await manager.generate_tools_for_openai([])
        assert openai_result == []

    @pytest.mark.asyncio
    async def test_openai_schema_fields_complete(self, mock_app, sample_tools):
        """Test that OpenAI schema includes all required fields."""
        toolmgr = get_toolmgr_module()

        manager = toolmgr.ToolManager(mock_app)
        result = await manager.generate_tools_for_openai(sample_tools)

        for tool_schema in result:
            assert 'type' in tool_schema
            assert tool_schema['type'] == 'function'
            assert 'function' in tool_schema
            func = tool_schema['function']
            assert 'name' in func
            assert 'description' in func
            assert 'parameters' in func


class TestToolManagerExecuteFuncCall:
    """Tests for execute_func_call method."""

    @pytest.fixture
    def mock_app_with_loaders(self):
        """Create mock app with mock tool loaders.

        Returns (app, plugin_loader, mcp_loader). The native and skill loaders
        are attached directly to the app for tests that don't need to assert
        against them; they all default to ``has_tool == False`` so the
        execute_func_call probe falls through to the plugin/mcp pair.
        """
        mock_app = Mock()
        mock_app.logger = Mock()

        def _make_inert_loader():
            loader = Mock()
            loader.has_tool = AsyncMock(return_value=False)
            loader.invoke_tool = AsyncMock(return_value=None)
            loader.initialize = AsyncMock()
            loader.shutdown = AsyncMock()
            return loader

        # Create mock plugin loader
        mock_plugin_loader = _make_inert_loader()
        mock_plugin_loader.invoke_tool = AsyncMock(return_value='plugin_result')

        # Create mock MCP loader
        mock_mcp_loader = _make_inert_loader()
        mock_mcp_loader.invoke_tool = AsyncMock(return_value='mcp_result')

        # Stash inert native/skill loaders so the ToolManager probe order
        # (native → plugin → mcp → skill) doesn't AttributeError. Tests that
        # need to override these can replace the attributes on the manager.
        mock_app._inert_native_loader = _make_inert_loader()
        mock_app._inert_skill_loader = _make_inert_loader()

        return mock_app, mock_plugin_loader, mock_mcp_loader

    @staticmethod
    def _wire_loaders(manager, mock_app, plugin_loader, mcp_loader):
        """Attach all four loaders (native + plugin + mcp + skill) to manager."""
        manager.native_tool_loader = mock_app._inert_native_loader
        manager.plugin_tool_loader = plugin_loader
        manager.mcp_tool_loader = mcp_loader
        manager.skill_tool_loader = mock_app._inert_skill_loader

    @pytest.fixture
    def sample_query(self):
        """Create sample query for testing."""
        query = Mock(spec=pipeline_query.Query)
        return query

    @pytest.mark.asyncio
    async def test_execute_calls_plugin_loader_when_has_tool(self, mock_app_with_loaders, sample_query):
        """Test that execute_func_call uses plugin loader when tool exists there."""
        toolmgr = get_toolmgr_module()

        mock_app, mock_plugin_loader, mock_mcp_loader = mock_app_with_loaders
        mock_plugin_loader.has_tool = AsyncMock(return_value=True)

        manager = toolmgr.ToolManager(mock_app)
        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

        result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)

        assert result == 'plugin_result'
        mock_plugin_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)
        # MCP loader should not be called
        mock_mcp_loader.invoke_tool.assert_not_called()

    @pytest.mark.asyncio
    async def test_execute_calls_mcp_loader_when_plugin_not_found(self, mock_app_with_loaders, sample_query):
        """Test that execute_func_call uses MCP loader when plugin doesn't have tool."""
        toolmgr = get_toolmgr_module()

        mock_app, mock_plugin_loader, mock_mcp_loader = mock_app_with_loaders
        mock_plugin_loader.has_tool = AsyncMock(return_value=False)
        mock_mcp_loader.has_tool = AsyncMock(return_value=True)

        manager = toolmgr.ToolManager(mock_app)
        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

        result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)

        assert result == 'mcp_result'
        mock_mcp_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)

    @pytest.mark.asyncio
    async def test_execute_raises_when_tool_not_found(self, mock_app_with_loaders, sample_query):
        """Test that execute_func_call raises ValueError when tool not found."""
        toolmgr = get_toolmgr_module()

        mock_app, mock_plugin_loader, mock_mcp_loader = mock_app_with_loaders
        mock_plugin_loader.has_tool = AsyncMock(return_value=False)
        mock_mcp_loader.has_tool = AsyncMock(return_value=False)

        manager = toolmgr.ToolManager(mock_app)
        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

        with pytest.raises(ValueError, match='未找到工具'):
            await manager.execute_func_call('unknown_tool', {}, sample_query)

    @pytest.mark.asyncio
    async def test_plugin_loader_checked_first(self, mock_app_with_loaders, sample_query):
        """Test that plugin loader is checked before MCP loader."""
        toolmgr = get_toolmgr_module()

        mock_app, mock_plugin_loader, mock_mcp_loader = mock_app_with_loaders
        # Both loaders have the tool, but plugin should be used
        mock_plugin_loader.has_tool = AsyncMock(return_value=True)
        mock_mcp_loader.has_tool = AsyncMock(return_value=True)

        manager = toolmgr.ToolManager(mock_app)
        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

        await manager.execute_func_call('test_tool', {}, sample_query)

        # Plugin loader should be invoked, MCP should not
        mock_plugin_loader.invoke_tool.assert_called_once()
        mock_mcp_loader.invoke_tool.assert_not_called()


class TestToolManagerShutdown:
    """Tests for shutdown method."""

    @pytest.mark.asyncio
    async def test_shutdown_calls_loader_shutdown(self):
        """Test that shutdown calls shutdown on every registered loader."""
        toolmgr = get_toolmgr_module()

        mock_app = Mock()

        def _make_loader():
            loader = Mock()
            loader.shutdown = AsyncMock()
            return loader

        mock_native_loader = _make_loader()
        mock_plugin_loader = _make_loader()
        mock_mcp_loader = _make_loader()
        mock_skill_loader = _make_loader()

        manager = toolmgr.ToolManager(mock_app)
        manager.native_tool_loader = mock_native_loader
        manager.plugin_tool_loader = mock_plugin_loader
        manager.mcp_tool_loader = mock_mcp_loader
        manager.skill_tool_loader = mock_skill_loader

        await manager.shutdown()

        mock_native_loader.shutdown.assert_called_once()
        mock_plugin_loader.shutdown.assert_called_once()
        mock_mcp_loader.shutdown.assert_called_once()
        mock_skill_loader.shutdown.assert_called_once()