mirror of https://github.com/langbot-app/LangBot.git synced 2026-06-02 03:55:55 +00:00

huanghuoguoguo 17bbc8bf10 Feat/test build (#2174)

* fix(ci): update unit-test workflow paths to match current source layout

Replace stale pkg/** filter with src/langbot/** and add uv.lock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(tests): update README to reflect current test layout

- Fix stale paths: tests/pipeline → tests/unit_tests/pipeline
- Update CI Python versions: 3.11, 3.12, 3.13
- Add test directory structure for box, config, platform, plugin, provider, storage
- Document pytest markers and uv commands
- Mention planned E2E tests

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add shared test factories package

Create tests/factories/ with reusable test factories:
- FakeApp: mock application with all dependencies
- Message chains: text_chain, mention_chain, image_chain
- Query factories: text_query, group_text_query, command_query, etc.

No test changes - maintains backward compatibility.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add fake provider factory

Add tests/factories/provider.py with:
- FakeProvider: deterministic fake LLM provider
- Error simulation: timeout, auth, rate-limit, malformed
- Request capture for assertions
- fake_model: mock model with attached provider

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add fake platform factory

Add tests/factories/platform.py with:
- FakePlatform: simulated platform adapter
- Inbound message construction: friend/group/image
- Mention-bot flag simulation
- Outbound message capture for assertions
- Streaming output support simulation
- Send failure simulation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add comprehensive message/query factories

Extend tests/factories/message.py with:
- file_query: file attachment query
- unsupported_query: unknown message segment
- voice_query: audio/voice query
- at_all_query: group @All mention
- query_with_session: query with session object
- query_with_config: query with custom pipeline config

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add fake message flow smoke test

Create tests/smoke/test_fake_message_flow.py:
- TestFakeMessageFlow: factory verification tests
- TestMessageFlowIntegration: minimal flow smoke test
- Tests FakeApp, FakeProvider, FakePlatform, query factories
- Verifies LANGBOT_FAKE_PONG marker response
- Captures outbound messages for assertions

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add developer test-quick command

Add scripts/test-quick.sh and Makefile with:
- test-quick: runs ruff check + unit tests + smoke tests
- No real provider keys or platform accounts required
- Suitable for local branch self-test

Update tests/README.md:
- Document test-quick command
- Document test factories package
- Add smoke tests and factories directory structure

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(test): make test-quick reliable as developer gate

Fixes for D-001 acceptance issues:
1. test-quick.sh: use set -euo pipefail, uv run ruff, no tail pipe
2. Remove unused imports in factories (app.py, platform.py, provider.py)
3. Fix unused variable in smoke test
4. Add noqa: E402 to test_n8nsvapi.py lazy imports
5. Update smoke test docs: "minimal fake flow" not full pipeline

Now test-quick is a reliable gate: lint failures exit 1, test failures propagate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): add preproc and taskmgr unit tests

U-001: Pipeline Preprocessor tests
- Normal text message processing
- Empty message handling
- Image segment with/without vision model
- Model selection and fallback
- Variable extraction

U-004: Core Task Manager tests (pattern-based)
- Task creation and tracking patterns
- Task cancellation patterns
- Scope-based cancellation
- Task type filtering
- Pruning completed tasks
- Wait all tasks

Taskmgr tests use pattern-based approach to avoid circular import
in source code (taskmgr → app → http_controller → migration → taskmgr).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): add config loader unit tests

U-005: Config Loader tests
- Valid YAML config loading
- Valid JSON config loading
- Invalid YAML/JSON error behavior
- Missing config file creation from template
- Template completion for missing keys
- ConfigManager load/dump operations
- Exists check for both YAML and JSON

All tests use tmp_path fixture, no real project config.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): add chat and command handler pattern tests

U-002: Chat Handler tests (pattern-based)
- Normal message event emission pattern
- prevent_default handling
- User message alteration pattern
- Runner selection pattern
- Streaming/non-streaming response patterns
- Exception handling modes (show-error, show-hint, hide)
- Message history update pattern
- Telemetry payload pattern

U-003: Command Handler tests (pattern-based)
- Command parsing and text extraction
- Event creation pattern
- Privilege/admin check pattern
- Command result handling (text, error, image)
- prevent_default handling
- String truncation helper

Uses pattern-based testing to avoid circular import issues in source code.
Direct imports of handler modules trigger circular import chain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style: fix unused imports after ruff auto-fix

Remove unused imports in test files:
- test_config_loader.py: remove unused os
- test_taskmgr.py: remove unused Mock
- test_preproc.py: remove unused unsupported_query, image_chain

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(unit): improve taskmgr tests to test real classes

U-004 improved: Tests now import and test actual classes:
- TaskContext: new(), trace(), to_dict(), placeholder()
- TaskWrapper: task creation, context, exception/result capture, cancel, to_dict
- AsyncTaskManager: create_task, create_user_task, cancel_task, cancel_by_scope
- Task pruning behavior

Uses pre-mocking technique:
- Mock langbot.pkg.core.app before import (breaks circular chain)
- Mock langbot.pkg.core.entities with proper Enum

All 24 tests now test real class behavior, not patterns.
taskmgr.py coverage should improve significantly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(test): consolidate FakeApp and add sys.modules isolation utility

- Extract tests/utils/import_isolation.py with isolated_sys_modules context manager
- Extend tests/factories/app.py FakeApp with handler-specific attributes
- Refactor test_chat_handler.py to use centralized FakeApp and cached imports
- Refactor test_command_handler.py with mock_execute_factory fixture
- Refactor test_smoke.py to move import-time sys.modules manipulation into fixture
- Add SQLite migration integration tests (G-002)
- Add HTTP API smoke integration tests (G-005)
- Update CI workflow to call pytest for SQLite migrations (G-004)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add developer quality gate consolidation (G-007)

- Add scripts/test-integration-fast.sh for fast integration tests
- Add scripts/test-coverage.sh with 12% baseline threshold
- Update Makefile with test-integration-fast, test-coverage, test-all-local
- Update CI workflow with integration and coverage jobs
- Add smoke marker to pytest.ini
- Update tests/README.md with quality gate layers documentation
- Add tests/integration/pipeline/ for pipeline stage-chain tests

Quality gate layers:
- Quick: ruff + unit + smoke (~2 min)
- Fast Integration: SQLite/API/Pipeline (~3 min)
- Coverage: 12% threshold gate (~8 min)
- Full Local: all three combined

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): add PostgreSQL migration slow integration tests (G-003)

- Add tests/integration/persistence/test_migrations_postgres.py
- All tests marked with @pytest.mark.slow
- Tests skip when TEST_POSTGRES_URL is not set (no local PostgreSQL)
- Database isolation via clean_tables and clean_alembic_version fixtures
- Update CI workflow to use pytest instead of inline Python script
- Remove TODO(G-003) comment
- Update tests/README.md with PostgreSQL test documentation

Covered scenarios:
- Baseline stamp sets revision
- Upgrade from baseline to head
- Upgrade idempotent
- Get current on unstamped DB returns None

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(test): Phase 1.5 coverage expansion - COV-001 to COV-013

Coverage baseline raised from 13.65% to 26% (+12.35%)
Gate raised from 12% to 18%

Tasks completed:
- COV-001: Command system unit tests (100% coverage)
- COV-002: API service unit tests batch 1 (user/apikey/model/provider)
- COV-003: Provider model manager unit tests
- COV-004: Pipeline remaining stage tests (aggregator/cntfilter/longtext/msgtrun)
- COV-005: Storage and utils coverage pass
- COV-006: Gate ratchet 12%→15%
- COV-007: Gate ratchet 15%→18%
- COV-008: API service batch 2 (bot/pipeline/webhook/space/maintenance/mcp)
- COV-009: Blocked - API controller circular import issue documented
- COV-010: Plugin runtime unit tests (+0.08%)
- COV-011: RAG and vector unit tests (+0.68%)
- COV-012: Core boot and migration unit tests
- COV-013: Provider requester logic unit tests (+0.62%)

Key additions:
- tests/utils/import_isolation.py: sys.modules isolation for circular imports
- Provider requester mock tests: proved HTTP-dependent code can be tested locally
- Vector filter utilities: 100% coverage on pure functions
- API services: fake persistence pattern for unit testing

Blocked issue COV-009 documented in langbot-test-plan/1.5/issues/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(phase1): add unit tests for telemetry, plugin, rag, persistence

Add initial unit tests for Phase 1 of test coverage improvement:
- telemetry: test initialization, payload sanitization, early returns (14.3% → 62.9%)
- plugin: test _parse_plugin_id static method
- rag: test _to_i18n_name static method
- persistence: test serialize_model with datetime handling

Overall core coverage: 41.9% → 42.2%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(phase2): add unit tests for core, persistence, plugin, utils

- Add test_handler_helpers.py for plugin handler helpers (7 tests)
- Add test_mgr_methods.py for persistence manager (5 tests)
- Add test_app_config_validation.py for core app config (12 tests)
- Add test_knowledge_service.py for API knowledge service (22 tests)
- Add test_kbmgr.py for RAG knowledge base manager (39 tests)
- Add test_survey_manager.py for survey manager (22 tests)
- Add test_connector_methods.py for plugin connector (24 tests)
- Add test_funcschema.py for utils function schema (9 tests)
- Add test_platform.py for utils platform detection (7 tests)
- Add test_extract_deps.py for plugin deps extraction (7 tests)
- Add test_database_decorator.py for persistence decorator (7 tests)
- Add test_load_config.py for core config loading (19 tests)
- Add COVERAGE_EXCLUSIONS.md documenting external adapter exclusions
- Fix test_chat_session_limit.py path for portability

Coverage: core 28% → 30%, persistence 24% → 24.4%, plugin 27% → 28%
Total: 1082 tests passed, core module coverage 45.5%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(integration): add API controller integration tests

- Add test_pipelines.py (10 tests) covering pipelines CRUD operations
  - GET/POST/PUT/DELETE on /api/v1/pipelines
  - Extensions endpoint
  - Metadata endpoint
  - Coverage: pipelines controller 27% → 80%

- Add test_providers.py (10 tests) covering provider/model management
  - Provider CRUD with model counts
  - LLM model CRUD
  - Coverage: providers controller 23% → 81%, models 29% → 45%

Tests use Quart TestClient with mocked services for real HTTP behavior
without external dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(integration): add knowledge, bots, and model endpoints tests

- Add test_knowledge.py (10 tests) covering knowledge base management
  - CRUD operations on /api/v1/knowledge/bases
  - Files management endpoints
  - Retrieve endpoint with validation
  - Coverage: knowledge/base.py 26% → 91%

- Add test_bots.py (9 tests) covering bot management
  - CRUD operations on /api/v1/platform/bots
  - Logs endpoint
  - Send message endpoint with validation
  - Coverage: platform/bots.py 24% → 87%

- Extend test_providers.py (+4 tests) for embedding/rerank models
  - Embedding models CRUD
  - Rerank models CRUD
  - Coverage: provider/models.py 29% → 60%

Total integration tests: 53 (smoke 12 + pipelines 10 + providers 14 + knowledge 10 + bots 9)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(integration): add embed and monitoring endpoint tests

Add integration tests for embed widget and monitoring API endpoints:
- test_embed.py: 15 tests for widget.js, logo, turnstile, messages, reset, feedback
- test_monitoring.py: 15 tests for overview, messages, llm-calls, sessions, errors, export

Coverage improvements:
- embed.py: 17% → 56%
- monitoring.py: 17% → 93%

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(e2e): add minimal startup E2E tests

Add E2E tests for LangBot startup flow:
- tests/e2e/utils/config_factory.py: minimal config generation
- tests/e2e/utils/process_manager.py: LangBot subprocess management
- tests/e2e/conftest.py: E2E fixtures (session-scoped process)
- tests/e2e/test_startup.py: 12 tests for startup verification

Tests verify:
- boot.py + stages execution
- database initialization (SQLite)
- API availability
- migrations applied

Uses embedded databases (SQLite, Chroma) - no external dependencies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(quality): fix fake tests and add missing coverage

P0 fixes:
- telemetry: rewrite fake tests with real behavior verification (25 tests)
- config: delete copied-source tests, use proper imports (2 deleted)
- persistence: fix try-except pass to verify specific errors

P1 fixes:
- pipeline: add real FixedWindowAlgo tests instead of mocks (12 tests)
- provider: add SessionManager and ToolManager tests (25 tests)
- storage: add S3StorageProvider tests with moto mock (16 tests)
- plugin: add handler action tests for setting inheritance (15 tests)
- rag: add file storage and ZIP processing tests (21 tests)
- vector: add VDB filter conversion tests (30 tests)

P2 fixes:
- pipeline/msgtrun: strengthen assertions for exact message count
- api: add response structure validation in integration tests

New test files:
- provider/test_session_manager.py
- provider/test_tool_manager.py
- storage/test_s3storage.py
- plugin/test_handler_actions.py
- rag/test_file_storage.py
- vector/test_vdb_filter_conversion.py

Source code bugs documented:
- provider: TokenManager.next_token() ZeroDivisionError
- telemetry: send_tasks class variable shared state
- command: empty command IndexError, unused parameters
- utils: funcschema KeyError
- entity: vector.py independent declarative_base

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(test): update coverage stats and test structure

- Update coverage from 22% to 30%
- Add new test files to structure:
  - provider: session_manager, tool_manager
  - storage: s3storage
  - plugin: handler_actions
  - rag: file_storage
  - vector: vdb_filter_conversion
  - telemetry: rewritten tests
- Update module coverage percentages

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: add 105 new unit tests for untested core functionality

Add comprehensive tests for B-class issues (core functionality untested):

Pipeline:
- test_pool.py: QueryPool ID generation, caching, async context (12 tests)
- test_ratelimit.py: Fixed timing-sensitive test tolerance
- test_pipelinemgr.py: Use real Pydantic StageProcessResult instead of Mock

Utils:
- test_version.py: Version comparison functions (20 tests)
- test_logcache.py: Log page management and retrieval (18 tests)
- test_httpclient.py: HTTP session pool management (10 tests)
- test_proxy.py: Proxy configuration from env and config (10 tests)
- test_image.py: URL parsing and base64 extraction (12 tests)
- test_pkgmgr.py: Pip command generation (8 tests)

Discover:
- test_engine.py: I18nString, Metadata, Component manifest (15 tests)

Test count: 1193 → 1298 (+105 tests)

Note: Some B-class issues cannot be tested due to circular import bugs
filed as GitHub issues #2175 (pipeline) and #2176 (persistence).

* test: tighten phase 1 coverage contracts

* test: align ci integration isolation

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-16 12:05:54 +08:00

Name	Last commit	Date
e2e	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00
factories	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00
integration	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00
smoke	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00
unit_tests	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00
utils	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00
__init__.py	feat: add comprehensive unit tests for pipeline stages (#1701)	2025-10-01 10:56:59 +08:00
README.md	Feat/test build (#2174)	2026-05-16 12:05:54 +08:00

README.md

LangBot Test Suite

This directory contains the test suite for LangBot, with a focus on comprehensive unit testing of pipeline stages.

Quality Gate Layers

LangBot uses a layered quality gate system for developers and CI:

Layer	Command	What it runs	When to use
Quick	`make test-quick` or `bash scripts/test-quick.sh`	Ruff lint + Unit tests + Smoke tests	Before every commit
Fast Integration	`make test-integration-fast` or `bash scripts/test-integration-fast.sh`	SQLite/API/Pipeline integration (no external services)	Before PR, weekly
Coverage Gate	`make test-coverage` or `bash scripts/test-coverage.sh`	All tests with coverage, threshold: 18%	Before merge, CI
Full Local	`make test-all-local`	Quick + Integration + Coverage	Before major changes

Note: PostgreSQL migration tests and slow tests are NOT in local default gates. They run in separate CI workflows.

Developer Workflow

# Daily: Quick self-test
bash scripts/test-quick.sh

# Before PR: Full local gate
make test-all-local

# Or run each layer separately:
bash scripts/test-quick.sh           # ~2 min
bash scripts/test-integration-fast.sh # ~3 min
bash scripts/test-coverage.sh         # ~8 min

Coverage Baseline

Current coverage threshold: 18%
Actual coverage: 30%

This is a conservative baseline to prevent coverage regression. It does NOT represent the final quality target. Key modules have higher coverage:

pipeline.preproc.preproc: 53%
pipeline.process.process: 96%
pipeline.respback.respback: 88%
telemetry.telemetry: 87%
provider.session.sessionmgr: 100%
provider.tools.toolmgr: 83%
storage.providers.s3storage: 80%

Important Note

Due to circular import dependencies in the pipeline module structure, the test files use lazy imports via importlib.import_module() instead of direct imports. This ensures tests can run without triggering circular import errors.
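The lazy-import approach described above can be sketched as follows. This is a minimal illustration rather than the suite's actual helper: `lazy_import` is a hypothetical name, and the standard-library `json` module stands in for a real pipeline module path.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_import(module_path: str):
    """Import a module on first use instead of at module import time.

    Hypothetical helper: because the import happens inside the test body,
    merely loading the test file does not pull in the target module.
    """
    return importlib.import_module(module_path)


def test_uses_module_lazily():
    # "json" is a stand-in; a real test would pass a dotted langbot module path.
    mod = lazy_import("json")
    assert mod.loads("[1, 2]") == [1, 2]


test_uses_module_lazily()
```

Caching the result keeps repeated lookups cheap and guarantees every test sees the same module object.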

Structure

tests/
├── __init__.py
├── factories/                    # Shared test factories
│   ├── __init__.py              # Factory exports
│   ├── app.py                   # FakeApp factory
│   ├── message.py               # Message/query factories
│   ├── provider.py              # FakeProvider factory
│   └── platform.py              # FakePlatform factory
├── integration/                  # Integration tests (real resources)
│   ├── __init__.py
│   ├── api/                     # HTTP API tests
│   │   ├── __init__.py
│   │   └── test_smoke.py        # API smoke tests
│   ├── pipeline/                # Pipeline stage-chain tests
│   │   ├── __init__.py
│   │   └── test_full_flow.py    # Full flow integration
│   └── persistence/             # Database/persistence tests
│       ├── __init__.py
│       └── test_migrations.py   # Alembic migration tests
├── smoke/                        # Smoke tests (quick validation)
│   └── test_fake_message_flow.py
├── unit_tests/                   # Unit tests
│   ├── box/                      # Box module tests
│   ├── config/                   # Configuration tests
│   ├── pipeline/                 # Pipeline stage tests
│   │   └── conftest.py          # Shared fixtures and test infrastructure
│   ├── platform/                 # Platform adapter tests
│   ├── plugin/                   # Plugin system tests
│   │   └── test_handler_actions.py # Action handler tests
│   ├── provider/                 # Provider tests
│   │   ├── test_session_manager.py # SessionManager tests
│   │   └── test_tool_manager.py    # ToolManager tests
│   ├── rag/                      # RAG tests
│   │   └── test_file_storage.py   # File/ZIP storage tests
│   ├── storage/                  # Storage tests
│   │   └── test_s3storage.py      # S3StorageProvider tests
│   ├── vector/                   # Vector tests
│   │   └── test_vdb_filter_conversion.py # VDB filter tests
│   └── telemetry/                # Telemetry tests (rewritten)
├── utils/                        # Test utilities
│   ├── __init__.py
│   └── import_isolation.py      # sys.modules isolation for circular imports
└── README.md                     # This file
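The tree lists `tests/utils/import_isolation.py` as a `sys.modules` isolation utility. The sketch below shows one way such isolation could work, assuming a context manager that installs stub modules and restores the previous state on exit. The names `isolated_modules` and `heavy_pkg_sketch` are invented for this example, and the real helper may differ.

```python
import sys
import types
from contextlib import contextmanager
from unittest.mock import MagicMock

_MISSING = object()  # marks "this name was not in sys.modules before"


@contextmanager
def isolated_modules(stubs):
    """Temporarily install stub modules in sys.modules, then restore."""
    saved = {name: sys.modules.get(name, _MISSING) for name in stubs}
    sys.modules.update(stubs)
    try:
        yield
    finally:
        for name, original in saved.items():
            if original is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = original


# "heavy_pkg_sketch" is a made-up module name used only for this demo.
stub = types.ModuleType("heavy_pkg_sketch")
stub.Application = MagicMock(name="Application")

with isolated_modules({"heavy_pkg_sketch": stub}):
    import heavy_pkg_sketch  # resolves to the stub; nothing is loaded from disk

    inside = heavy_pkg_sketch.Application is stub.Application

outside = "heavy_pkg_sketch" in sys.modules
```

Restoring in a `finally` block means a failing test cannot leak stubbed modules into later tests.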

Test Factories

The tests/factories/ package provides reusable test factories:

from tests.factories import (
    FakeApp,          # Mock application
    FakeProvider,     # Fake LLM provider
    FakePlatform,     # Fake platform adapter
    text_query,       # Create text query
    group_text_query, # Create group query
    command_query,    # Create command query
)

# Create fake app
app = FakeApp()

# Create query with text
query = text_query("hello world")

# Create fake provider that returns specific response
provider = FakeProvider().returns("test response")

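# Create fake platform for outbound capture
# (run inside an async test; reply_chain is the message chain to send,
# e.g. one built with the text_chain factory)
platform = FakePlatform()
await platform.reply_message(query.message_event, reply_chain)
outbound = platform.get_outbound_messages()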

See tests/factories/__init__.py for all available factories.
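To illustrate the pattern behind a deterministic fake provider (canned response, error simulation, request capture), here is a small self-contained sketch. `SketchProvider` is not the real `FakeProvider`: apart from `returns()`, which appears in the usage above, every method and attribute name is an assumption made for this example.

```python
import asyncio


class SketchProvider:
    """Hypothetical stand-in showing the fake-provider pattern."""

    def __init__(self):
        self._response = ""
        self._error = None
        self.requests = []  # every prompt seen, kept for assertions

    def returns(self, text):
        self._response = text
        return self  # allows SketchProvider().returns("...") chaining

    def raises(self, error):
        self._error = error
        return self

    async def invoke(self, prompt):
        self.requests.append(prompt)
        if self._error is not None:
            raise self._error
        return self._response


async def demo():
    ok = SketchProvider().returns("test response")
    reply = await ok.invoke("hello world")

    failing = SketchProvider().raises(TimeoutError("simulated timeout"))
    try:
        await failing.invoke("hello again")
        timed_out = False
    except TimeoutError:
        timed_out = True
    return reply, ok.requests, timed_out


reply, captured, timed_out = asyncio.run(demo())
```

Because the fake records each request, a test can assert on what was sent as well as on what came back.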

Test Architecture

Fixtures (`conftest.py`)

The test suite uses a centralized fixture system that provides:

MockApplication: Comprehensive mock of the Application object with all dependencies
Mock objects: Pre-configured mocks for Session, Conversation, Model, Adapter
Sample data: Ready-to-use Query objects, message chains, and configurations
Helper functions: Utilities for creating results and common assertions
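As a rough sketch of what a centralized mock application might look like, the builder below uses only `unittest.mock`. The attribute names (`logger`, `sess_mgr`, `get_session`) are illustrative assumptions, not the real Application interface or the contents of the actual `conftest.py`.

```python
import asyncio
from unittest.mock import AsyncMock, Mock


def make_mock_app():
    """Build a mock application object for stage tests (illustrative only)."""
    app = Mock(name="Application")
    app.logger = Mock(name="logger")
    # Async collaborators need AsyncMock so that `await` works on them.
    app.sess_mgr = Mock(name="sess_mgr")
    app.sess_mgr.get_session = AsyncMock(return_value={"id": "session-1"})
    return app


# In a conftest.py this would typically be exposed as a fixture:
#
#     @pytest.fixture
#     def mock_app():
#         return make_mock_app()


async def demo():
    app = make_mock_app()
    session = await app.sess_mgr.get_session("query")
    app.logger.info("got session %s", session["id"])
    return app, session


app, session = asyncio.run(demo())
```

Building the mock in one place keeps individual tests short and makes it easy to add a new dependency for every test at once.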

Design Principles

Isolation: Each test is independent and doesn't rely on external systems
Mocking: All external dependencies are mocked to ensure fast, reliable tests
Coverage: Tests cover happy paths, edge cases, and error conditions
Extensibility: Easy to add new tests by reusing existing fixtures

Running Tests

Quick self-test for developers

For local branch validation without real provider keys:

make test-quick
# or
bash scripts/test-quick.sh

This runs:

Ruff lint check
Unit tests
Smoke tests

Suitable for quick validation before committing.

Using the test runner script (recommended for full coverage)

bash run_tests.sh

This script automatically:

Activates the virtual environment
Installs test dependencies if needed
Runs tests with coverage
Generates HTML coverage report

Manual test execution

Run all unit tests

uv run pytest tests/unit_tests/ --cov=langbot --cov-report=xml --cov-report=term

Run specific test module

uv run pytest tests/unit_tests/pipeline/ -v

Run specific test file

uv run pytest tests/unit_tests/pipeline/test_bansess.py -v

Run with coverage

uv run pytest tests/unit_tests/pipeline/ --cov=langbot --cov-report=html

Run specific test

uv run pytest tests/unit_tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v

Using markers

# Run only unit tests
uv run pytest tests/unit_tests/ -m unit

# Run only integration tests
uv run pytest tests/integration/ -m integration

# Run integration tests excluding slow ones
uv run pytest tests/integration/ -m "not slow" -q

# Skip slow tests
uv run pytest tests/unit_tests/ -m "not slow"

Running integration tests

Integration tests validate real system behavior against actual local resources, such as a temporary SQLite database or an in-process HTTP app, rather than fully mocked components.

# Run all integration tests (excluding slow ones)
uv run pytest tests/integration/ -m "not slow" -q

# Run SQLite migration integration tests
uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short

# Run API smoke integration tests
uv run pytest tests/integration/api/test_smoke.py -q

# Run pipeline full-flow integration tests
uv run pytest tests/integration/pipeline/test_full_flow.py -q

# Run with verbose output
uv run pytest tests/integration/ -v

Note: Integration tests:

Use temporary databases (tmp_path) for persistence tests
Use a fake app and fake services for API tests (no real provider/platform)
Use a fake runner/provider for pipeline tests (no real LLM API)
Do not require external services
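The temporary-database idea can be sketched with the standard library alone. This example uses `tempfile` and `sqlite3` as stand-ins for pytest's `tmp_path` fixture and the real migration runner; the `bots` table is a hypothetical placeholder, and `run_in_temp_db` is a name invented for this sketch.

```python
import sqlite3
import tempfile
from pathlib import Path


def run_in_temp_db(statements):
    """Apply SQL statements to a throwaway SQLite file and list its tables."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "test.db"
        conn = sqlite3.connect(db_path)
        try:
            for stmt in statements:
                conn.execute(stmt)
            conn.commit()
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()  # close before the directory is removed
    return [name for (name,) in rows]


tables = run_in_temp_db([
    "CREATE TABLE alembic_version (version_num TEXT NOT NULL)",
    "CREATE TABLE bots (id INTEGER PRIMARY KEY, name TEXT)",
])
```

Each call gets its own database file, so tests cannot interfere with one another or with a developer's real data.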

Running migration tests locally

SQLite migration tests can be run locally without any external dependencies:

# SQLite migration tests (uses tmp_path, no external DB needed)
uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short

PostgreSQL migration tests require an external PostgreSQL database:

# PostgreSQL migration tests (requires PostgreSQL service)
# Tests are marked as slow and skipped if TEST_POSTGRES_URL is not set
TEST_POSTGRES_URL=postgresql+asyncpg://user:pass@localhost:5432/test_db \
    uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short

# Or skip by default (no PostgreSQL available)
uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short
# Output: SKIPPED (TEST_POSTGRES_URL not set)

Note: PostgreSQL tests are not included in the fast integration gate because they:

Require external PostgreSQL service
Are marked with @pytest.mark.slow
Need TEST_POSTGRES_URL environment variable
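The skip-when-unset behavior can be sketched as a small helper. `postgres_skip_reason` is a hypothetical name, and the commented pytest wiring shows one plausible way to apply it, not necessarily how `test_migrations_postgres.py` does it.

```python
import os


def postgres_skip_reason(env=None):
    """Return a skip reason when no PostgreSQL URL is configured, else None."""
    env = os.environ if env is None else env
    if not env.get("TEST_POSTGRES_URL"):
        return "TEST_POSTGRES_URL not set"
    return None


# In a pytest module the result could gate every test in the file, e.g.:
#
#     pytestmark = [
#         pytest.mark.slow,
#         pytest.mark.skipif(postgres_skip_reason() is not None,
#                            reason="TEST_POSTGRES_URL not set"),
#     ]

no_db = postgres_skip_reason({})
with_db = postgres_skip_reason(
    {"TEST_POSTGRES_URL": "postgresql+asyncpg://user:pass@localhost:5432/test_db"}
)
```

Passing the environment as a parameter lets the helper itself be tested without touching the real process environment.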

The CI workflow .github/workflows/test-migrations.yml runs:

SQLite tests in test-migrations-sqlite job (fast, no external services)
PostgreSQL tests in test-migrations-postgres job (uses PostgreSQL service container)

Running pipeline integration tests locally

Pipeline full-flow integration tests validate real stage interactions:

# Run pipeline integration tests (uses fake runner, no real LLM API)
uv run pytest tests/integration/pipeline/test_full_flow.py -q --tb=short

# Run with coverage for pipeline modules
uv run pytest tests/integration/pipeline \
    --cov=langbot.pkg.pipeline.preproc.preproc \
    --cov=langbot.pkg.pipeline.process.process \
    --cov=langbot.pkg.pipeline.respback.respback \
    --cov-report=term -q

These tests:

Use FakeRunner class to simulate LLM responses without real API calls
Import real PreProcessor, MessageProcessor, SendResponseBackStage stages
Validate stage chain: PreProcessor → Processor → SendResponseBackStage
Test prevent_default, exception handling, and full message flow
Do not require real LLM provider keys
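The shape of such a stage-chain test can be sketched in isolation. Everything here is a simplified assumption: the three functions stand in for the real stage classes, `FakeRunnerSketch` is not the suite's `FakeRunner`, queries are plain dicts, and the `INTERRUPT` member is assumed for this example.

```python
import asyncio
from enum import Enum


class ResultType(Enum):
    CONTINUE = "continue"
    INTERRUPT = "interrupt"  # assumed member, used here to stop the chain


class FakeRunnerSketch:
    """Yields canned replies instead of calling an LLM API."""

    def __init__(self, replies):
        self.replies = list(replies)

    async def run(self, query):
        for reply in self.replies:
            yield reply


async def preprocess(query, runner):
    query["text"] = query["text"].strip()
    return ResultType.CONTINUE if query["text"] else ResultType.INTERRUPT


async def process(query, runner):
    query["responses"] = [r async for r in runner.run(query)]
    return ResultType.CONTINUE


async def send_back(query, runner):
    query["sent"] = list(query["responses"])
    return ResultType.CONTINUE


async def run_chain(query, runner):
    # Run stages in order and stop as soon as one interrupts.
    for stage in (preprocess, process, send_back):
        if await stage(query, runner) is ResultType.INTERRUPT:
            break
    return query


runner = FakeRunnerSketch(["pong"])
normal = asyncio.run(run_chain({"text": " ping "}, runner))
empty = asyncio.run(run_chain({"text": "   "}, runner))
```

The second call shows the interrupt path: an empty message stops at the first stage, so nothing is processed or sent.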

Known Issues

Some tests may encounter circular import errors. This is a known issue with the current module structure. The test infrastructure is designed to work around this using lazy imports, but if you encounter issues:

Make sure you're running from the project root directory
Ensure dependencies are installed: uv sync --dev
Try running a simple test first to verify the test infrastructure works

CI/CD Integration

Tests are automatically run on:

Pull request opened
Pull request marked ready for review
Push to PR branch
Push to master/develop branches

The workflow runs tests on Python 3.11, 3.12, and 3.13 to ensure compatibility.

Adding New Tests

1. For a new pipeline stage

Create a new test file test_<stage_name>.py:

"""
<StageName> stage unit tests
"""

import pytest
from langbot.pkg.pipeline.<module>.<stage> import <StageClass>
from langbot.pkg.pipeline import entities as pipeline_entities


@pytest.mark.asyncio
async def test_stage_basic_flow(mock_app, sample_query):
    """Test basic flow"""
    stage = <StageClass>(mock_app)
    await stage.initialize({})

    result = await stage.process(sample_query, '<StageName>')

    assert result.result_type == pipeline_entities.ResultType.CONTINUE

2. For additional fixtures

Add new fixtures to the appropriate conftest.py:

@pytest.fixture
def my_custom_fixture():
    """Description of fixture"""
    return create_test_data()

3. For test data

Use the helper functions in conftest.py:

from tests.unit_tests.pipeline.conftest import create_stage_result, assert_result_continue

result = create_stage_result(
    result_type=pipeline_entities.ResultType.CONTINUE,
    query=sample_query
)

assert_result_continue(result)

Best Practices

Test naming: Use descriptive names that explain what's being tested
Arrange-Act-Assert: Structure tests clearly with setup, execution, and verification
One assertion per test: Focus each test on a single behavior
Mock appropriately: Mock external dependencies, not the code under test
Use fixtures: Reuse common test data through fixtures
Document tests: Add docstrings explaining what each test validates

Troubleshooting

Import errors

Make sure you've installed the package in development mode:

uv sync --dev

Async test failures

Ensure you're using the @pytest.mark.asyncio decorator for async tests.

Mock not working

Check that you're mocking at the right level and using AsyncMock for async functions.
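A common failure mode is using a plain `Mock` where the code under test awaits the result. The sketch below contrasts the two; `fetch_reply` and the `send` method are hypothetical names used only for this demonstration.

```python
import asyncio
from unittest.mock import AsyncMock, Mock


async def fetch_reply(client):
    """Code under test: awaits an async method on its collaborator."""
    return await client.send("ping")


async def demo():
    good = Mock()
    good.send = AsyncMock(return_value="pong")
    reply = await fetch_reply(good)
    good.send.assert_awaited_once_with("ping")

    bad = Mock()
    bad.send = Mock(return_value="pong")  # returns a str, which is not awaitable
    try:
        await fetch_reply(bad)
        failed = False
    except TypeError:
        failed = True
    return reply, failed


reply, failed = asyncio.run(demo())
```

If a test fails with a `TypeError` about an object that cannot be used in an `await` expression, a plain `Mock` standing in for an async function is a likely cause.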

Future Enhancements

Add integration tests for database migrations (SQLite)
Add PostgreSQL migration integration tests (G-003)
Add integration tests for full pipeline execution
Add API smoke integration tests
Add E2E tests
Add performance benchmarks
Add mutation testing for better coverage quality
Add property-based testing with Hypothesis
