* refactor(provider): use LiteLLM as unified LLM requester backend
- Replace 23+ individual requester implementations with unified litellmchat.py
- Add litellm_provider field to 27 YAML manifests for provider routing
- Delete redundant requester subclasses
- Add unit tests for LiteLLMRequester (29 tests)
- Fix num_retries parameter name (was max_retries)
- Fix exception handling order for subclass exceptions
LiteLLM provides unified API for 100+ providers, eliminating need for
provider-specific requesters.
* fix: ruff format provider.py
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(provider): simplify LiteLLM requester usage handling
- Remove unused Anthropic-specific tool schema generation
- Share completion argument construction between normal and streaming calls
- Use LiteLLM/OpenAI native usage fields for monitoring
- Collect stream token usage from LiteLLM stream_options
- Update LiteLLM requester tests for unified usage fields
* restore: restore deleted provider requester files
Restore individual provider requester implementations that were
removed in de61b5d3. These files coexist with the unified
litellmchat.py backend.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat: update requesters and improve provider selection UI
- Added `litellm_provider` field to various requesters' YAML configurations.
- Removed obsolete Python requester files for OpenRouter, PPIO, QHAIGC, ShengSuanYun, SiliconFlow, Space, TokenPony, VolcArk, and Xai.
- Introduced new requesters for Tencent and Together AI with corresponding YAML configurations and SVG icons.
- Enhanced the ProviderForm component to include a searchable dropdown for selecting providers, improving user experience.
- Updated localization files to include search provider text for both English and Chinese.
* fix(provider): align litellm rebase with master
* fix(provider): capture streaming token usage; add token observability
The LiteLLM streaming requester only captured usage when a chunk had an
empty `choices` list. Many OpenAI-compatible gateways (e.g. new-api) and
providers send the final usage payload in a chunk that still carries an
empty-delta choice, so streamed calls always recorded 0 tokens in the
monitoring logs/dashboard (non-streaming worked).
- Capture stream usage whenever a chunk carries it, regardless of choices
- Add robust _normalize_usage (dict/obj shapes, derive missing total_tokens)
- Register litellm in bootutils/deps.py (was in pyproject only)
- Add MonitoringService.get_token_statistics + /monitoring/token-statistics
endpoint: summary, per-model breakdown, token timeseries, and a
zero-token-success data-quality signal
- Add TokenMonitoring dashboard tab (summary tiles, stacked token chart,
per-model table) + i18n (en/zh)
- Regression tests for stream usage capture and usage normalization
Verified end-to-end against a real OpenAI-compatible endpoint with
gpt-5.5 and claude-opus-4-8: tokens now recorded non-zero for both
streaming and non-streaming paths.
* refactor(provider): simplify litellm capabilities
* style: simplify wrapped expressions
* feat(models): persist context metadata
* fix(provider): handle dict embeddings and openai-compatible rerank in LiteLLMRequester
- invoke_embedding: support both object- and dict-shaped response.data
entries (OpenAI-compatible gateways like new-api return dicts)
- invoke_rerank: litellm.arerank rejects the 'openai' provider, so for
openai-compatible (or unspecified) providers call the standard
Jina/Cohere-style POST /v1/rerank endpoint directly over HTTP
- accept both 'relevance_score' and 'score' fields in rerank results
- add unit tests for the openai-compatible HTTP rerank path
* feat(provider): enforce requester support_type when adding models
- frontend: AddModelPopover only shows model-type tabs (llm/embedding/
rerank) that the provider's requester declares in its manifest
support_type; ModelsDialog fetches requester manifests and maps
requester -> support_type, passed down through ProviderCard
- backend: add _validate_provider_supports guard in create_llm_model /
create_embedding_model / create_rerank_model so a model cannot be
attached to a provider whose requester does not support that type,
even if the frontend restriction is bypassed (manifests without
support_type are allowed for backward compatibility)
- manifests: correct support_type for providers that do not offer all
three model types:
- llm only: anthropic, deepseek, groq, moonshot, openrouter, xai
- llm + text-embedding: openai, gemini, mistral
- add rerank to new-api (verified working via /v1/rerank)
- set llm + text-embedding + rerank for aggregator/unknown gateways
* feat(provider): add searchable alias to requester manifests
- add a free-text 'alias' field to every requester manifest spec,
containing the vendor's English/Chinese names, pinyin, common
nicknames and flagship model-series names (e.g. moonshot -> kimi,
月之暗面; zhipu -> glm, 智谱清言)
- frontend: ProviderForm requester search now also matches against
alias (substring/contains), so searching 'kimi' surfaces Moonshot,
'硅基' surfaces SiliconFlow, etc.
- also fix support_type: openrouter (relay) supports embedding+rerank;
LangBot Space gains rerank (coming soon)
* fix(provider): make support_type guard defensive against incomplete model_mgr
- _validate_provider_supports now uses getattr to gracefully skip when
model_mgr / provider_dict / manifest lookup is unavailable, instead of
raising AttributeError (fixes unit tests that mock ap.model_mgr as a
bare SimpleNamespace)
- add TestValidateProviderSupports covering: allow supported type,
reject unsupported type, allow when support_type missing, allow when
provider unknown, degrade safely when model_mgr is incomplete
* fix(persistence): guard 0004 migration against missing llm_models table
The 0004_add_llm_model_context_length migration called
inspector.get_columns('llm_models') unconditionally, raising
NoSuchTableError when the table does not exist (e.g. migrating a
fresh/empty DB, as exercised by the integration tests where
create_all() registers no tables because the ORM models are not
imported). Every other migration guards with a table-existence check
first; add the same guard here for both upgrade and downgrade.
Also restore the test head assertion to 0004 (it had been lowered to
0003 to mask this failure).
* Merge branch 'master' into feat/litellm
Resolve conflicts:
- uv.lock: regenerated via 'uv lock' to reconcile litellm/fastuuid
(ours) with openai bump (master).
- Alembic migrations: master added 0004_add_mcp_readme while this
branch added 0004_add_llm_model_context_length, both as children of
0003 (would create multiple heads). Re-chain the litellm migration as
0005_add_llm_model_context_length with down_revision=0004_add_mcp_readme
for a single linear head. Update test head assertion accordingly.
* fix(persistence): shorten migration revision id to fit varchar(32)
PostgreSQL stores alembic_version.version_num as varchar(32).
'0005_add_llm_model_context_length' (33 chars) overflowed it, raising
StringDataRightTruncationError in the PG migration tests. Rename the
revision (and file) to '0005_add_llm_context_length' (27 chars) and
update the head assertions in both SQLite and PostgreSQL migration
tests.
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: fdc310 <2213070223@qq.com>
Co-authored-by: RockChinQ <rockchinq@gmail.com>
LangBot Test Suite
This directory contains the test suite for LangBot, with a focus on comprehensive unit testing of pipeline stages.
Quality Gate Layers
LangBot uses a layered quality gate system for developers and CI:
| Layer | Command | What it runs | When to use |
|---|---|---|---|
| Quick | make test-quick or bash scripts/test-quick.sh |
Ruff lint + Unit tests + Smoke tests | Before every commit |
| Fast Integration | make test-integration-fast or bash scripts/test-integration-fast.sh |
SQLite/API/Pipeline integration (no external services) | Before PR, weekly |
| Coverage Gate | make test-coverage or bash scripts/test-coverage.sh |
All tests with coverage, threshold: 18% | Before merge, CI |
| Full Local | make test-all-local |
Quick + Integration + Coverage | Before major changes |
Note: PostgreSQL migration tests and slow tests are NOT in local default gates. They run in separate CI workflows.
Developer Workflow
# Daily: Quick self-test
bash scripts/test-quick.sh
# Before PR: Full local gate
make test-all-local
# Or run each layer separately:
bash scripts/test-quick.sh # ~2 min
bash scripts/test-integration-fast.sh # ~3 min
bash scripts/test-coverage.sh # ~8 min
Coverage Baseline
Current coverage threshold: 18% Actual coverage: 30%
This is a conservative baseline to prevent coverage regression. It does NOT represent the final quality target. Key modules have higher coverage:
pipeline.preproc.preproc: 53%pipeline.process.process: 96%pipeline.respback.respback: 88%telemetry.telemetry: 87%provider.session.sessionmgr: 100%provider.tools.toolmgr: 83%storage.providers.s3storage: 80%
Important Note
Due to circular import dependencies in the pipeline module structure, the test files use lazy imports via importlib.import_module() instead of direct imports. This ensures tests can run without triggering circular import errors.
Structure
tests/
├── __init__.py
├── factories/ # Shared test factories
│ ├── __init__.py # Factory exports
│ ├── app.py # FakeApp factory
│ ├── message.py # Message/query factories
│ ├── provider.py # FakeProvider factory
│ └── platform.py # FakePlatform factory
├── integration/ # Integration tests (real resources)
│ ├── __init__.py
│ ├── api/ # HTTP API tests
│ │ ├── __init__.py
│ │ └── test_smoke.py # API smoke tests
│ ├── pipeline/ # Pipeline stage-chain tests
│ │ ├── __init__.py
│ │ └── test_full_flow.py # Full flow integration
│ └── persistence/ # Database/persistence tests
│ ├── __init__.py
│ └── test_migrations.py # Alembic migration tests
├── smoke/ # Smoke tests (quick validation)
│ └── test_fake_message_flow.py
├── unit_tests/ # Unit tests
│ ├── box/ # Box module tests
│ ├── config/ # Configuration tests
│ ├── pipeline/ # Pipeline stage tests
│ │ └── conftest.py # Shared fixtures and test infrastructure
│ ├── platform/ # Platform adapter tests
│ ├── plugin/ # Plugin system tests
│ │ └── test_handler_actions.py # Action handler tests
│ ├── provider/ # Provider tests
│ │ ├── test_session_manager.py # SessionManager tests
│ │ └── test_tool_manager.py # ToolManager tests
│ ├── rag/ # RAG tests
│ │ └── test_file_storage.py # File/ZIP storage tests
│ ├── storage/ # Storage tests
│ │ └── test_s3storage.py # S3StorageProvider tests
│ ├── vector/ # Vector tests
│ │ └── test_vdb_filter_conversion.py # VDB filter tests
│ └── telemetry/ # Telemetry tests (rewritten)
├── utils/ # Test utilities
│ ├── __init__.py
│ └── import_isolation.py # sys.modules isolation for circular imports
└── README.md # This file
Test Factories
The tests/factories/ package provides reusable test factories:
from tests.factories import (
FakeApp, # Mock application
FakeProvider, # Fake LLM provider
FakePlatform, # Fake platform adapter
text_query, # Create text query
group_text_query, # Create group query
command_query, # Create command query
)
# Create fake app
app = FakeApp()
# Create query with text
query = text_query("hello world")
# Create fake provider that returns specific response
provider = FakeProvider().returns("test response")
# Create fake platform for outbound capture
platform = FakePlatform()
await platform.reply_message(query.message_event, reply_chain)
outbound = platform.get_outbound_messages()
See tests/factories/__init__.py for all available factories.
Test Architecture
Fixtures (conftest.py)
The test suite uses a centralized fixture system that provides:
- MockApplication: Comprehensive mock of the Application object with all dependencies
- Mock objects: Pre-configured mocks for Session, Conversation, Model, Adapter
- Sample data: Ready-to-use Query objects, message chains, and configurations
- Helper functions: Utilities for creating results and common assertions
Design Principles
- Isolation: Each test is independent and doesn't rely on external systems
- Mocking: All external dependencies are mocked to ensure fast, reliable tests
- Coverage: Tests cover happy paths, edge cases, and error conditions
- Extensibility: Easy to add new tests by reusing existing fixtures
Running Tests
Quick self-test for developers
For local branch validation without real provider keys:
make test-quick
or
bash scripts/test-quick.sh
This runs:
- Ruff lint check
- Unit tests
- Smoke tests
Suitable for quick validation before committing.
Using the test runner script (recommended for full coverage)
bash run_tests.sh
This script automatically:
- Activates the virtual environment
- Installs test dependencies if needed
- Runs tests with coverage
- Generates HTML coverage report
Manual test execution
Run all unit tests
uv run pytest tests/unit_tests/ --cov=langbot --cov-report=xml --cov-report=term
Run specific test module
uv run pytest tests/unit_tests/pipeline/ -v
Run specific test file
uv run pytest tests/unit_tests/pipeline/test_bansess.py -v
Run with coverage
uv run pytest tests/unit_tests/pipeline/ --cov=langbot --cov-report=html
Run specific test
uv run pytest tests/unit_tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v
Using markers
# Run only unit tests
uv run pytest tests/unit_tests/ -m unit
# Run only integration tests
uv run pytest tests/integration/ -m integration
# Run integration tests excluding slow ones
uv run pytest tests/integration/ -m "not slow" -q
# Skip slow tests
uv run pytest tests/unit_tests/ -m "not slow"
Running integration tests
Integration tests validate real system behavior with actual database/network resources.
# Run all integration tests (excluding slow ones)
uv run pytest tests/integration/ -m "not slow" -q
# Run SQLite migration integration tests
uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short
# Run API smoke integration tests
uv run pytest tests/integration/api/test_smoke.py -q
# Run pipeline full-flow integration tests
uv run pytest tests/integration/pipeline/test_full_flow.py -q
# Run with verbose output
uv run pytest tests/integration/ -v
Note: Integration tests use:
- Temporary databases (tmp_path) for persistence tests
- Fake app/services for API tests (no real provider/platform)
- Fake runner/provider for pipeline tests (no real LLM API)
- Do not require external services
Running migration tests locally
SQLite migration tests can be run locally without any external dependencies:
# SQLite migration tests (uses tmp_path, no external DB needed)
uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short
PostgreSQL migration tests require an external PostgreSQL database:
# PostgreSQL migration tests (requires PostgreSQL service)
# Tests are marked as slow and skipped if TEST_POSTGRES_URL is not set
TEST_POSTGRES_URL=postgresql+asyncpg://user:pass@localhost:5432/test_db \
uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short
# Or skip by default (no PostgreSQL available)
uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short
# Output: SKIPPED (TEST_POSTGRES_URL not set)
Note: PostgreSQL tests are not included in fast integration gate because they:
- Require external PostgreSQL service
- Are marked with
@pytest.mark.slow - Need
TEST_POSTGRES_URLenvironment variable
CI workflow .github/workflows/test-migrations.yml runs:
- SQLite tests in
test-migrations-sqlitejob (fast, no external services) - PostgreSQL tests in
test-migrations-postgresjob (uses PostgreSQL service container)
Running pipeline integration tests locally
Pipeline full-flow integration tests validate real stage interactions:
# Run pipeline integration tests (uses fake runner, no real LLM API)
uv run pytest tests/integration/pipeline/test_full_flow.py -q --tb=short
# Run with coverage for pipeline modules
uv run pytest tests/integration/pipeline \
--cov=langbot.pkg.pipeline.preproc.preproc \
--cov=langbot.pkg.pipeline.process.process \
--cov=langbot.pkg.pipeline.respback.respback \
--cov-report=term -q
These tests:
- Use
FakeRunnerclass to simulate LLM responses without real API calls - Import real
PreProcessor,MessageProcessor,SendResponseBackStagestages - Validate stage chain: PreProcessor → Processor → SendResponseBackStage
- Test prevent_default, exception handling, and full message flow
- Do not require real LLM provider keys
Known Issues
Some tests may encounter circular import errors. This is a known issue with the current module structure. The test infrastructure is designed to work around this using lazy imports, but if you encounter issues:
- Make sure you're running from the project root directory
- Ensure dependencies are installed:
uv sync --dev - Try running a simple test first to verify the test infrastructure works
CI/CD Integration
Tests are automatically run on:
- Pull request opened
- Pull request marked ready for review
- Push to PR branch
- Push to master/develop branches
The workflow runs tests on Python 3.11, 3.12, and 3.13 to ensure compatibility.
Adding New Tests
1. For a new pipeline stage
Create a new test file test_<stage_name>.py:
"""
<StageName> stage unit tests
"""
import pytest
from langbot.pkg.pipeline.<module>.<stage> import <StageClass>
from langbot.pkg.pipeline import entities as pipeline_entities
@pytest.mark.asyncio
async def test_stage_basic_flow(mock_app, sample_query):
"""Test basic flow"""
stage = <StageClass>(mock_app)
await stage.initialize({})
result = await stage.process(sample_query, '<StageName>')
assert result.result_type == pipeline_entities.ResultType.CONTINUE
2. For additional fixtures
Add new fixtures to the appropriate conftest.py:
@pytest.fixture
def my_custom_fixture():
"""Description of fixture"""
return create_test_data()
3. For test data
Use the helper functions in conftest.py:
from tests.unit_tests.pipeline.conftest import create_stage_result, assert_result_continue
result = create_stage_result(
result_type=pipeline_entities.ResultType.CONTINUE,
query=sample_query
)
assert_result_continue(result)
Best Practices
- Test naming: Use descriptive names that explain what's being tested
- Arrange-Act-Assert: Structure tests clearly with setup, execution, and verification
- One assertion per test: Focus each test on a single behavior
- Mock appropriately: Mock external dependencies, not the code under test
- Use fixtures: Reuse common test data through fixtures
- Document tests: Add docstrings explaining what each test validates
Troubleshooting
Import errors
Make sure you've installed the package in development mode:
uv sync --dev
Async test failures
Ensure you're using @pytest.mark.asyncio decorator for async tests.
Mock not working
Check that you're mocking at the right level and using AsyncMock for async functions.
Future Enhancements
- Add integration tests for database migrations (SQLite)
- Add PostgreSQL migration integration tests (G-003)
- Add integration tests for full pipeline execution
- Add API smoke integration tests
- Add E2E tests
- Add performance benchmarks
- Add mutation testing for better coverage quality
- Add property-based testing with Hypothesis