fix(provider): capture streaming token usage; add token observability

The LiteLLM streaming requester only captured usage when a chunk had an
empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and
providers send the final usage payload in a chunk that still carries an
empty-delta choice, so streamed calls always recorded 0 tokens in the
monitoring logs/dashboard (non-streaming worked).

- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics
  endpoint: summary, per-model breakdown, token timeseries, and a
  zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart,
  per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization

Verified end-to-end against a real OpenAI-compatible endpoint with
gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both
streaming and non-streaming paths.
This commit is contained in:
RockChinQ
2026-06-05 09:13:57 -04:00
parent d450226701
commit 39673444d2
10 changed files with 986 additions and 15 deletions

View File

@@ -46,6 +46,30 @@ class MonitoringRouterGroup(group.RouterGroup):
return self.success(data=metrics)
@self.route('/token-statistics', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_token_statistics() -> str:
"""Get detailed token usage statistics (summary, per-model, timeseries)."""
bot_ids = quart.request.args.getlist('botId')
pipeline_ids = quart.request.args.getlist('pipelineId')
start_time_str = quart.request.args.get('startTime')
end_time_str = quart.request.args.get('endTime')
bucket = quart.request.args.get('bucket', 'hour')
if bucket not in ('hour', 'day'):
bucket = 'hour'
start_time = parse_iso_datetime(start_time_str)
end_time = parse_iso_datetime(end_time_str)
stats = await self.ap.monitoring_service.get_token_statistics(
bot_ids=bot_ids if bot_ids else None,
pipeline_ids=pipeline_ids if pipeline_ids else None,
start_time=start_time,
end_time=end_time,
bucket=bucket,
)
return self.success(data=stats)
@self.route('/messages', methods=['GET'], auth_type=group.AuthType.USER_TOKEN)
async def get_messages() -> str:
"""Get message logs"""