# LangBot Test Suite This directory contains the LangBot backend test suite, including unit tests, integration tests, startup E2E tests, and container-backed Box runtime tests. ## Quality Gate Layers LangBot uses a layered quality gate system for developers and CI: | Layer | Command | What it runs | When to use | |-------|---------|--------------|-------------| | **Quick** | `make test-quick` or `bash scripts/test-quick.sh` | Ruff lint + Unit tests + Smoke tests | Before every commit | | **Fast Integration** | `make test-integration-fast` or `bash scripts/test-integration-fast.sh` | SQLite/API/Pipeline integration (no external services) | Before PR, weekly | | **Backend E2E** | `uv run --python 3.12 pytest tests/e2e -q --tb=short` | Starts a real LangBot process with minimal config | Before release, CI | | **Box Integration** | `uv run --python 3.12 pytest tests/integration_tests -q --tb=short` | Real Box sandbox/runtime integration | Before Box/runtime changes, CI | | **Frontend E2E** | `cd web && pnpm test:e2e` | Playwright smoke tests with mocked backend and Space APIs | Before web changes, CI | | **Coverage Gate** | `make test-coverage` or `bash scripts/test-coverage.sh` | All tests with coverage, threshold: 18% | Before merge, CI | | **Full Local** | `make test-all-local` | Quick + Integration + Coverage | Before major changes | **Note**: PostgreSQL migration tests and slow tests are NOT in local default gates. They run in separate CI workflows. Frontend Playwright tests live under `web/tests/e2e` and are documented in `web/README.md`. ### Developer Workflow ```bash # Daily: Quick self-test bash scripts/test-quick.sh # Before PR: Full local gate make test-all-local # Or run each layer separately: bash scripts/test-quick.sh # ~2 min bash scripts/test-integration-fast.sh # ~3 min bash scripts/test-coverage.sh # ~8 min uv run --python 3.12 pytest tests/e2e -q --tb=short uv run --python 3.12 pytest tests/integration_tests -q --tb=short cd web && pnpm test:e2e ``` ### Coverage Baseline Current coverage threshold: **18%** Actual coverage: **30%** This is a conservative baseline to prevent coverage regression. It does NOT represent the final quality target. Key modules have higher coverage: - `pipeline.preproc.preproc`: 53% - `pipeline.process.process`: 96% - `pipeline.respback.respback`: 88% - `telemetry.telemetry`: 87% - `provider.session.sessionmgr`: 100% - `provider.tools.toolmgr`: 83% - `storage.providers.s3storage`: 80% ## Important Note Due to circular import dependencies in the pipeline module structure, the test files use **lazy imports** via `importlib.import_module()` instead of direct imports. This ensures tests can run without triggering circular import errors. ## Structure ``` tests/ ├── __init__.py ├── factories/ # Shared test factories │ ├── __init__.py # Factory exports │ ├── app.py # FakeApp factory │ ├── message.py # Message/query factories │ ├── provider.py # FakeProvider factory │ └── platform.py # FakePlatform factory ├── integration/ # Integration tests (real resources) │ ├── __init__.py │ ├── api/ # HTTP API tests │ │ ├── __init__.py │ │ └── test_smoke.py # API smoke tests │ ├── pipeline/ # Pipeline stage-chain tests │ │ ├── __init__.py │ │ └── test_full_flow.py # Full flow integration │ └── persistence/ # Database/persistence tests │ ├── __init__.py │ └── test_migrations.py # Alembic migration tests ├── e2e/ # Real LangBot startup E2E tests │ ├── conftest.py │ ├── test_startup.py │ └── utils/ ├── integration_tests/ # Container-backed integration tests │ └── box/ # Box runtime and MCP process tests ├── smoke/ # Smoke tests (quick validation) │ └── test_fake_message_flow.py ├── unit_tests/ # Unit tests │ ├── box/ # Box module tests │ ├── config/ # Configuration tests │ ├── pipeline/ # Pipeline stage tests │ │ └── conftest.py # Shared fixtures and test infrastructure │ ├── platform/ # Platform adapter tests │ ├── plugin/ # Plugin system tests │ │ └── test_handler_actions.py # Action handler tests │ ├── provider/ # Provider tests │ │ ├── test_session_manager.py # SessionManager tests │ │ └── test_tool_manager.py # ToolManager tests │ ├── rag/ # RAG tests │ │ └── test_file_storage.py # File/ZIP storage tests │ ├── storage/ # Storage tests │ │ └── test_s3storage.py # S3StorageProvider tests │ ├── vector/ # Vector tests │ │ └── test_vdb_filter_conversion.py # VDB filter tests │ └── telemetry/ # Telemetry tests (rewritten) ├── utils/ # Test utilities │ ├── __init__.py │ └── import_isolation.py # sys.modules isolation for circular imports └── README.md # This file ``` ## Test Factories The `tests/factories/` package provides reusable test factories: ```python from tests.factories import ( FakeApp, # Mock application FakeProvider, # Fake LLM provider FakePlatform, # Fake platform adapter text_query, # Create text query group_text_query, # Create group query command_query, # Create command query ) # Create fake app app = FakeApp() # Create query with text query = text_query("hello world") # Create fake provider that returns specific response provider = FakeProvider().returns("test response") # Create fake platform for outbound capture platform = FakePlatform() await platform.reply_message(query.message_event, reply_chain) outbound = platform.get_outbound_messages() ``` See `tests/factories/__init__.py` for all available factories. ## Test Architecture ### Fixtures (`conftest.py`) The test suite uses a centralized fixture system that provides: - **MockApplication**: Comprehensive mock of the Application object with all dependencies - **Mock objects**: Pre-configured mocks for Session, Conversation, Model, Adapter - **Sample data**: Ready-to-use Query objects, message chains, and configurations - **Helper functions**: Utilities for creating results and common assertions ### Design Principles 1. **Isolation**: Each test is independent and doesn't rely on external systems 2. **Mocking**: All external dependencies are mocked to ensure fast, reliable tests 3. **Coverage**: Tests cover happy paths, edge cases, and error conditions 4. **Extensibility**: Easy to add new tests by reusing existing fixtures ## Running Tests ### Quick self-test for developers For local branch validation without real provider keys: ```bash make test-quick ``` or ```bash bash scripts/test-quick.sh ``` This runs: 1. Ruff lint check 2. Unit tests 3. Smoke tests Suitable for quick validation before committing. ### Using the test runner script (recommended for full coverage) ```bash bash run_tests.sh ``` This script automatically: - Activates the virtual environment - Installs test dependencies if needed - Runs tests with coverage - Generates HTML coverage report ### Manual test execution #### Run all unit tests ```bash uv run pytest tests/unit_tests/ --cov=langbot --cov-report=xml --cov-report=term ``` #### Run specific test module ```bash uv run pytest tests/unit_tests/pipeline/ -v ``` #### Run specific test file ```bash uv run pytest tests/unit_tests/pipeline/test_bansess.py -v ``` #### Run with coverage ```bash uv run pytest tests/unit_tests/pipeline/ --cov=langbot --cov-report=html ``` #### Run specific test ```bash uv run pytest tests/unit_tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v ``` ### Using markers ```bash # Run only unit tests uv run pytest tests/unit_tests/ -m unit # Run only integration tests uv run pytest tests/integration/ -m integration # Run integration tests excluding slow ones uv run pytest tests/integration/ -m "not slow" -q # Skip slow tests uv run pytest tests/unit_tests/ -m "not slow" ``` ### Running integration tests Integration tests validate real system behavior with actual database/network resources. ```bash # Run all integration tests (excluding slow ones) uv run pytest tests/integration/ -m "not slow" -q # Run SQLite migration integration tests uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short # Run API smoke integration tests uv run pytest tests/integration/api/test_smoke.py -q # Run pipeline full-flow integration tests uv run pytest tests/integration/pipeline/test_full_flow.py -q # Run with verbose output uv run pytest tests/integration/ -v ``` Note: Integration tests use: - Temporary databases (tmp_path) for persistence tests - Fake app/services for API tests (no real provider/platform) - Fake runner/provider for pipeline tests (no real LLM API) - Do not require external services ### Running migration tests locally SQLite migration tests can be run locally without any external dependencies: ```bash # SQLite migration tests (uses tmp_path, no external DB needed) uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short ``` PostgreSQL migration tests require an external PostgreSQL database: ```bash # PostgreSQL migration tests (requires PostgreSQL service) # Tests are marked as slow and skipped if TEST_POSTGRES_URL is not set TEST_POSTGRES_URL=postgresql+asyncpg://user:pass@localhost:5432/test_db \ uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short # Or skip by default (no PostgreSQL available) uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short # Output: SKIPPED (TEST_POSTGRES_URL not set) ``` Note: PostgreSQL tests are **not** included in fast integration gate because they: - Require external PostgreSQL service - Are marked with `@pytest.mark.slow` - Need `TEST_POSTGRES_URL` environment variable CI workflow `.github/workflows/test-migrations.yml` runs: - SQLite tests in `test-migrations-sqlite` job (fast, no external services) - PostgreSQL tests in `test-migrations-postgres` job (uses PostgreSQL service container) ### Running pipeline integration tests locally Pipeline full-flow integration tests validate real stage interactions: ```bash # Run pipeline integration tests (uses fake runner, no real LLM API) uv run pytest tests/integration/pipeline/test_full_flow.py -q --tb=short # Run with coverage for pipeline modules uv run pytest tests/integration/pipeline \ --cov=langbot.pkg.pipeline.preproc.preproc \ --cov=langbot.pkg.pipeline.process.process \ --cov=langbot.pkg.pipeline.respback.respback \ --cov-report=term -q ``` These tests: - Use `FakeRunner` class to simulate LLM responses without real API calls - Import real `PreProcessor`, `MessageProcessor`, `SendResponseBackStage` stages - Validate stage chain: PreProcessor → Processor → SendResponseBackStage - Test prevent_default, exception handling, and full message flow - Do not require real LLM provider keys ### Running backend E2E startup tests Backend E2E tests start a real LangBot process with a generated minimal `data/config.yaml`, SQLite database, local storage, and embedded Chroma path. They do not require provider keys or external services. ```bash uv run --python 3.12 pytest tests/e2e -q --tb=short ``` These tests verify startup orchestration, migrations, API route registration, and the minimal no-LLM startup path. The E2E process manager disables ambient proxy variables for subprocess startup and uses direct localhost HTTP clients, so local proxy settings should not affect the health checks. ### Running Box integration tests Box integration tests exercise the real sandbox runtime path, including command execution, session persistence, managed process WebSocket attachment, and cleanup behavior. ```bash uv run --python 3.12 pytest tests/integration_tests -q --tb=short ``` These tests require a working Docker or Podman runtime. In CI, the dedicated Box integration job checks Docker availability before running the tests. ### Running frontend E2E tests Frontend E2E tests live in `web/tests/e2e` and use Playwright. They start Vite and mock the LangBot backend and Space APIs, so no backend process is required. ```bash cd web pnpm test:e2e ``` ### Known Issues Some tests may encounter circular import errors. This is a known issue with the current module structure. The test infrastructure is designed to work around this using lazy imports, but if you encounter issues: 1. Make sure you're running from the project root directory 2. Ensure dependencies are installed: `uv sync --dev` 3. Try running a simple test first to verify the test infrastructure works ## CI/CD Integration Tests are automatically run on: - Pull request opened - Pull request marked ready for review - Push to PR branch - Push to master/develop branches The workflow runs tests on Python 3.11, 3.12, and 3.13 to ensure compatibility. Startup E2E and Box integration tests run as separate Python 3.12 jobs because they exercise process/container behavior instead of pure Python compatibility. Frontend Playwright smoke tests run in `.github/workflows/frontend-tests.yml`. ## Adding New Tests ### 1. For a new pipeline stage Create a new test file `test_.py`: ```python """ stage unit tests """ import pytest from langbot.pkg.pipeline.. import from langbot.pkg.pipeline import entities as pipeline_entities @pytest.mark.asyncio async def test_stage_basic_flow(mock_app, sample_query): """Test basic flow""" stage = (mock_app) await stage.initialize({}) result = await stage.process(sample_query, '') assert result.result_type == pipeline_entities.ResultType.CONTINUE ``` ### 2. For additional fixtures Add new fixtures to the appropriate `conftest.py`: ```python @pytest.fixture def my_custom_fixture(): """Description of fixture""" return create_test_data() ``` ### 3. For test data Use the helper functions in `conftest.py`: ```python from tests.unit_tests.pipeline.conftest import create_stage_result, assert_result_continue result = create_stage_result( result_type=pipeline_entities.ResultType.CONTINUE, query=sample_query ) assert_result_continue(result) ``` ## Best Practices 1. **Test naming**: Use descriptive names that explain what's being tested 2. **Arrange-Act-Assert**: Structure tests clearly with setup, execution, and verification 3. **One assertion per test**: Focus each test on a single behavior 4. **Mock appropriately**: Mock external dependencies, not the code under test 5. **Use fixtures**: Reuse common test data through fixtures 6. **Document tests**: Add docstrings explaining what each test validates ## Troubleshooting ### Import errors Make sure you've installed the package in development mode: ```bash uv sync --dev ``` ### Async test failures Ensure you're using `@pytest.mark.asyncio` decorator for async tests. ### Mock not working Check that you're mocking at the right level and using `AsyncMock` for async functions. ## Future Enhancements - [x] Add integration tests for database migrations (SQLite) - [x] Add PostgreSQL migration integration tests (G-003) - [x] Add integration tests for full pipeline execution - [x] Add API smoke integration tests - [ ] Add E2E tests - [ ] Add performance benchmarks - [ ] Add mutation testing for better coverage quality - [ ] Add property-based testing with Hypothesis