Files
LangBot/src/langbot/pkg/api/http/controller/groups/monitoring.py
huanghuoguoguo 9ecb587ac0 refactor(provider): use LiteLLM as unified LLM requester backend (#2150)
* refactor(provider): use LiteLLM as unified LLM requester backend

  - Replace 23+ individual requester implementations with unified litellmchat.py
  - Add litellm_provider field to 27 YAML manifests for provider routing
  - Delete redundant requester subclasses
  - Add unit tests for LiteLLMRequester (29 tests)
  - Fix num_retries parameter name (was max_retries)
  - Fix exception handling order for subclass exceptions

  LiteLLM provides unified API for 100+ providers, eliminating need for
  provider-specific requesters.

* fix: ruff format provider.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(provider): simplify LiteLLM requester usage handling

  - Remove unused Anthropic-specific tool schema generation
  - Share completion argument construction between normal and streaming calls
  - Use LiteLLM/OpenAI native usage fields for monitoring
  - Collect stream token usage from LiteLLM stream_options
  - Update LiteLLM requester tests for unified usage fields

* restore: restore deleted provider requester files

Restore individual provider requester implementations that were
removed in de61b5d3. These files coexist with the unified
litellmchat.py backend.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: update requesters and improve provider selection UI

- Added `litellm_provider` field to various requesters' YAML configurations.
- Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
- Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
- Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
- Updated localization files to include search provider text for both English and Chinese.

* fix(provider): align litellm rebase with master

* fix(provider): capture streaming token usage; add token observability

The LiteLLM streaming requester only captured usage when a chunk had an
empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and
providers send the final usage payload in a chunk that still carries an
empty-delta choice, so streamed calls always recorded 0 tokens in the
monitoring logs/dashboard (non-streaming worked).

- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics
  endpoint: summary, per-model breakdown, token timeseries, and a
  zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart,
  per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization

Verified end-to-end against a real OpenAI-compatible endpoint with
gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both
streaming and non-streaming paths.

* refactor(provider): simplify litellm capabilities

* style: simplify wrapped expressions

* feat(models): persist context metadata

* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester

- invoke_embedding: support both object- and dict-shaped response.data
  entries (OpenAI-compatible gateways like new-api return dicts)
- invoke_rerank: litellm.arerank rejects the 'openai' provider, so for
  openai-compatible (or unspecified) providers call the standard
  Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
- accept both 'relevance_score' and 'score' fields in rerank results
- add unit tests for the openai-compatible HTTP rerank path

* feat(provider): enforce requester support_type when adding models

- frontend: AddModelPopover only shows model-type tabs (llm/embedding/
  rerank) that the provider's requester declares in its manifest
  support_type; ModelsDialog fetches requester manifests and maps
  requester -> support_type, passed down through ProviderCard
- backend: add _validate_provider_supports guard in create_llm_model /
  create_embedding_model / create_rerank_model so a model cannot be
  attached to a provider whose requester does not support that type,
  even if the frontend restriction is bypassed (manifests without
  support_type are allowed for backward compatibility)
- manifests: correct support_type for providers that do not offer all
  three model types:
  - llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
  - llm + text-embedding: openai, gemini, mistral
  - add rerank to new-api (verified working via /v1/rerank)
  - set llm + text-embedding + rerank for aggregator/unknown gateways

* feat(provider): add searchable alias to requester manifests

- add a free-text 'alias' field to every requester manifest spec,
  containing the vendor's English/Chinese names, pinyin, common
  nicknames and flagship model-series names (e.g. moonshot -> kimi,
  月之暗面; zhipu -> glm, 智谱清言)
- frontend: ProviderForm requester search now also matches against
  alias (substring/contains), so searching 'kimi' surfaces Moonshot,
  '硅基' surfaces SiliconFlow, etc.
- also fix support_type: openrouter (relay) supports embedding+rerank;
  LangBot Space gains rerank (coming soon)

* fix(provider): make support_type guard defensive against incomplete model_mgr

- _validate_provider_supports now uses getattr to gracefully skip when
  model_mgr / provider_dict / manifest lookup is unavailable, instead of
  raising AttributeError (fixes unit tests that mock ap.model_mgr as a
  bare SimpleNamespace)
- add TestValidateProviderSupports covering: allow supported type,
  reject unsupported type, allow when support_type missing, allow when
  provider unknown, degrade safely when model_mgr is incomplete

* fix(persistence): guard 0004 migration against missing llm_models table

The 0004_add_llm_model_context_length migration called
inspector.get_columns('llm_models') unconditionally, raising
NoSuchTableError when the table does not exist (e.g. migrating a
fresh/empty DB, as exercised by the integration tests where
create_all() registers no tables because the ORM models are not
imported). Every other migration guards with a table-existence check
first; add the same guard here for both upgrade and downgrade.

Also restore the test head assertion to 0004 (it had been lowered to
0003 to mask this failure).

* Merge branch 'master' into feat/litellm

Resolve conflicts:
- uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid
  (ours) with openai bump (master).
- Alembic migrations: master added 0004_add_mcp_readme while this
  branch added 0004_add_llm_model_context_length, both as children of
  0003 (would create multiple heads). Re-chain the litellm migration as
  0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme
  for a single linear head. Update test head assertion accordingly.

* fix(persistence): shorten migration revision id to fit varchar(32)

PostgreSQL stores alembic_version.version_num as varchar(32).
'0005_add_llm_model_context_length' (33 chars) overflowed it, raising
StringDataRightTruncationError in the PG migration tests. Rename the
revision (and file) to '0005_add_llm_context_length' (27 chars) and
update the head assertions in both SQLite and PostgreSQL migration
tests.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
2026-06-13 16:59:48 +08:00

598 lines
24 KiB
Python

from __future__ import annotations
import datetime
import quart
from .. import group
def parse_iso_datetime(datetime_str: str | None) -> datetime.datetime | None:
"""Parse ISO 8601 datetime string, handling 'Z' suffix for UTC timezone"""
if not datetime_str:
return None
# Replace 'Z' with '+00:00' for Python 3.10 compatibility
if datetime_str.endswith('Z'):
datetime_str = datetime_str[:-1] + '+00:00'
dt = datetime.datetime.fromisoformat(datetime_str)
# Convert to UTC and remove timezone info to match database storage (which stores UTC as naive datetime)
if dt.tzinfo is not None:
# Convert to UTC and remove timezone info
dt = dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
return dt
@group.group_class('monitoring', '/api/v1/monitoring')
class MonitoringRouterGroup(group.RouterGroup):
async def initialize(self) -> None:
@self.route('/overview', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_overview() -> str:
"""Get overview metrics"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
metrics = await self.ap.monitoring_service.get_overview_metrics(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
)
return self.success(data=metrics)
@self.route('/token-statistics', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_token_statistics() -> str:
"""Get detailed token usage statistics (summary, per-model, timeseries)."""
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
bucket = quart.request.args.get('bucket', 'hour')
if bucket not in ('hour', 'day'):
bucket = 'hour'
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
stats = await self.ap.monitoring_service.get_token_statistics(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
bucket=bucket,
)
return self.success(data=stats)
@self.route('/messages', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_messages() -> str:
"""Get message logs"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
session_ids = quart.request.args.getlist('sessionId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
limit = int(quart.request.args.get('limit', 100))
offset = int(quart.request.args.get('offset', 0))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
messages, total = await self.ap.monitoring_service.get_messages(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
session_ids=session_ids if session_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=offset,
)
return self.success(
data={
'messages': messages,
'total': total,
'limit': limit,
'offset': offset,
}
)
@self.route('/llm-calls', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_llm_calls() -> str:
"""Get LLM call records"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
limit = int(quart.request.args.get('limit', 100))
offset = int(quart.request.args.get('offset', 0))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
llm_calls, total = await self.ap.monitoring_service.get_llm_calls(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=offset,
)
return self.success(
data={
'llm_calls': llm_calls,
'total': total,
'limit': limit,
'offset': offset,
}
)
@self.route('/embedding-calls', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_embedding_calls() -> str:
"""Get embedding call records"""
# Parse query parameters
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
knowledge_base_id = quart.request.args.get('knowledgeBaseId')
limit = int(quart.request.args.get('limit', 100))
offset = int(quart.request.args.get('offset', 0))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
embedding_calls, total = await self.ap.monitoring_service.get_embedding_calls(
start_time=start_time,
end_time=end_time,
knowledge_base_id=knowledge_base_id if knowledge_base_id else None,
limit=limit,
offset=offset,
)
return self.success(
data={
'embedding_calls': embedding_calls,
'total': total,
'limit': limit,
'offset': offset,
}
)
@self.route('/sessions', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_sessions() -> str:
"""Get session information"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
is_active_str = quart.request.args.get('isActive')
limit = int(quart.request.args.get('limit', 100))
offset = int(quart.request.args.get('offset', 0))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
# Parse is_active
is_active = None
if is_active_str:
is_active = is_active_str.lower() == 'true'
sessions, total = await self.ap.monitoring_service.get_sessions(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
is_active=is_active,
limit=limit,
offset=offset,
)
return self.success(
data={
'sessions': sessions,
'total': total,
'limit': limit,
'offset': offset,
}
)
@self.route('/errors', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_errors() -> str:
"""Get error logs"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
limit = int(quart.request.args.get('limit', 100))
offset = int(quart.request.args.get('offset', 0))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
errors, total = await self.ap.monitoring_service.get_errors(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=offset,
)
return self.success(
data={
'errors': errors,
'total': total,
'limit': limit,
'offset': offset,
}
)
@self.route('/data', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_all_data() -> str:
"""Get all monitoring data in a single request"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
limit = int(quart.request.args.get('limit', 50))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
# Get overview metrics
overview = await self.ap.monitoring_service.get_overview_metrics(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
)
# Get messages
messages, messages_total = await self.ap.monitoring_service.get_messages(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=0,
)
# Get LLM calls
llm_calls, llm_calls_total = await self.ap.monitoring_service.get_llm_calls(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=0,
)
# Get sessions
sessions, sessions_total = await self.ap.monitoring_service.get_sessions(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
is_active=None,
limit=limit,
offset=0,
)
# Get errors
errors, errors_total = await self.ap.monitoring_service.get_errors(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=0,
)
# Get embedding calls
embedding_calls, embedding_calls_total = await self.ap.monitoring_service.get_embedding_calls(
start_time=start_time,
end_time=end_time,
limit=limit,
offset=0,
)
return self.success(
data={
'overview': overview,
'messages': messages,
'llmCalls': llm_calls,
'embeddingCalls': embedding_calls,
'sessions': sessions,
'errors': errors,
'totalCount': {
'messages': messages_total,
'llmCalls': llm_calls_total,
'embeddingCalls': embedding_calls_total,
'sessions': sessions_total,
'errors': errors_total,
},
}
)
@self.route('/sessions/<session_id>/analysis', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_session_analysis(session_id: str) -> str:
"""Get detailed analysis for a specific session"""
analysis = await self.ap.monitoring_service.get_session_analysis(session_id)
# Always return success with the analysis data
# The frontend will handle the 'found: false' case
return self.success(data=analysis)
@self.route('/messages/<message_id>/details', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_message_details(message_id: str) -> str:
"""Get detailed information for a specific message"""
details = await self.ap.monitoring_service.get_message_details(message_id)
if not details.get('found'):
return self.error(message=f'Message {message_id} not found', code=404)
return self.success(data=details)
@self.route('/export', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def export_data() -> tuple[str, int]:
"""Export monitoring data as CSV"""
# Parse query parameters
export_type = quart.request.args.get('type', 'messages')
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
limit = int(quart.request.args.get('limit', 100000))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
# Get data based on export type
if export_type == 'messages':
data = await self.ap.monitoring_service.export_messages(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
)
headers = [
'id',
'timestamp',
'bot_id',
'bot_name',
'pipeline_id',
'pipeline_name',
'runner_name',
'message_content',
'message_text',
'session_id',
'status',
'level',
'platform',
'user_id',
]
elif export_type == 'llm-calls':
data = await self.ap.monitoring_service.export_llm_calls(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
)
headers = [
'id',
'timestamp',
'model_name',
'input_tokens',
'output_tokens',
'total_tokens',
'duration_ms',
'cost',
'status',
'bot_id',
'bot_name',
'pipeline_id',
'pipeline_name',
'session_id',
'message_id',
'error_message',
]
elif export_type == 'embedding-calls':
data = await self.ap.monitoring_service.export_embedding_calls(
start_time=start_time,
end_time=end_time,
limit=limit,
)
headers = [
'id',
'timestamp',
'model_name',
'prompt_tokens',
'total_tokens',
'duration_ms',
'input_count',
'status',
'error_message',
'knowledge_base_id',
'query_text',
'session_id',
'message_id',
'call_type',
]
elif export_type == 'errors':
data = await self.ap.monitoring_service.export_errors(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
)
headers = [
'id',
'timestamp',
'error_type',
'error_message',
'bot_id',
'bot_name',
'pipeline_id',
'pipeline_name',
'session_id',
'message_id',
'stack_trace',
]
elif export_type == 'sessions':
data = await self.ap.monitoring_service.export_sessions(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
)
headers = [
'session_id',
'bot_id',
'bot_name',
'pipeline_id',
'pipeline_name',
'message_count',
'start_time',
'last_activity',
'is_active',
'platform',
'user_id',
]
elif export_type == 'feedback':
data = await self.ap.monitoring_service.export_feedback(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
limit=limit,
)
headers = [
'id',
'timestamp',
'feedback_id',
'feedback_type',
'feedback_content',
'inaccurate_reasons',
'bot_id',
'bot_name',
'pipeline_id',
'pipeline_name',
'session_id',
'message_id',
'stream_id',
'user_id',
'platform',
]
else:
return self.error(message=f'Invalid export type: {export_type}', code=400)
# Generate CSV content with UTF-8 BOM for Excel compatibility
import io
output = io.StringIO()
# Write UTF-8 BOM for Excel
output.write('\ufeff')
# Write header
output.write(','.join(headers) + '\n')
# Escape and write each row
for row in data:
escaped_values = []
for header in headers:
value = row.get(header, '')
escaped_values.append(self.ap.monitoring_service._escape_csv_field(value))
output.write(','.join(escaped_values) + '\n')
csv_content = output.getvalue()
# Return as file download
response = await quart.make_response(csv_content)
response.headers['Content-Type'] = 'text/csv; charset=utf-8'
response.headers['Content-Disposition'] = (
f'attachment; filename="monitoring-{export_type}-{int(datetime.datetime.now().timestamp())}.csv"'
)
return response, 200
@self.route('/feedback/stats', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_feedback_stats() -> str:
"""Get feedback statistics"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
stats = await self.ap.monitoring_service.get_feedback_stats(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
)
return self.success(data=stats)
@self.route('/feedback', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_feedback() -> str:
"""Get feedback list"""
# Parse query parameters
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
feedback_type_str = quart.request.args.get('feedbackType')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
limit = int(quart.request.args.get('limit', 100))
offset = int(quart.request.args.get('offset', 0))
# Parse datetime
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
# Parse feedback type
feedback_type = int(feedback_type_str) if feedback_type_str else None
feedback_list, total = await self.ap.monitoring_service.get_feedback_list(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
feedback_type=feedback_type,
start_time=start_time,
end_time=end_time,
limit=limit,
offset=offset,
)
return self.success(
data={
'feedback': feedback_list,
'total': total,
'limit': limit,
'offset': offset,
}
)