LangBot/tests
youhuanghe e8aa7b2e6d feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security
## Summary

  When Podman/Docker is available, all stdio-mode MCP servers now automatically
  run inside Box containers with dependency installation, path rewriting, and
  lifecycle management. When no container runtime exists, LangBot starts normally
  and stdio MCP falls back to host-direct execution.

  ## What changed

  ### MCP stdio → Box integration (mcp.py)
  - Add `MCPServerBoxConfig` pydantic model for structured box configuration
    with validation and defaults (network, host_path_mode, timeouts, resources)
  - Auto-infer `host_path` from command/args with venv detection: recognizes
    `.venv/bin/python` patterns and walks up to the project root
  - Rewrite host paths to container `/workspace` paths transparently
  - Replace venv python commands with container-native `python`
  - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
    `pip install` inside the container before starting the MCP server
  - Copy project to `/tmp` before install to handle read-only mounts
  - Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
  - Add Box managed process health monitoring (poll every 5s)
  - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
    block of `_lifecycle_loop`, covering all exit paths
  - Fix retry logic: `_ready_event` is set only on success or after all
    retries are exhausted, not on first failure
  - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
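
The retry and `_ready_event` behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in mcp.py: `start_with_retry`, `connect`, and the injectable `sleep` parameter are hypothetical names, and only the retry count (3) and the 2s/4s/8s delays come from the list above.

```python
import asyncio


async def start_with_retry(connect, ready_event, max_retries=3,
                           base_delay=2.0, sleep=asyncio.sleep):
    """Call connect() once, then retry up to max_retries times.

    The delay doubles after each failure: 2s, 4s, 8s with the defaults.
    ready_event is set only on success or once all retries are used up.
    """
    try:
        for attempt in range(max_retries + 1):
            try:
                await connect()
                return True
            except Exception:  # sketch only; real code should narrow this
                if attempt < max_retries:
                    await sleep(base_delay * 2 ** attempt)
        return False
    finally:
        # Runs on every exit path, so waiters are never left hanging,
        # but never after just the first failed attempt.
        ready_event.set()
```

Passing `sleep` as a parameter is a choice made for this sketch so that the delays can be checked in a test without waiting.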

  ### Box security (security.py — new)
  - `validate_sandbox_security()` blocks dangerous host paths:
    `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
    docker.sock, podman socket
  - Called at the start of `CLISandboxBackend.start_session()`
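
As an illustration of the kind of check described above, the sketch below rejects the listed roots and anything beneath them. It is not the contents of security.py: `is_blocked_host_path` is a hypothetical helper, and matching the sockets by file name (`docker.sock`, `podman.sock`) is an assumption.

```python
from pathlib import PurePosixPath

# Directory roots named in the list above.
BLOCKED_ROOTS = ("/etc", "/proc", "/sys", "/dev", "/root", "/boot", "/run")
# Assumption: sockets are matched by file name; the real check may differ.
BLOCKED_SOCKET_NAMES = ("docker.sock", "podman.sock")


def is_blocked_host_path(host_path: str) -> bool:
    """Return True for a blocked root, any subpath of one, or a runtime socket."""
    path = PurePosixPath(host_path)
    if path.name in BLOCKED_SOCKET_NAMES:
        return True
    for root in BLOCKED_ROOTS:
        root_path = PurePosixPath(root)
        # Compare whole path components so "/etcetera" is not caught by "/etc".
        if path == root_path or root_path in path.parents:
            return True
    return False
```

This sketch does not resolve `..` segments or symlinks; a real validator would likely normalize the path before comparing.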

  ### Box models (models.py)
  - Add `BoxHostMountMode.NONE` — skips volume mount entirely
  - Adjust `validate_host_mount_consistency` to allow arbitrary workdir
    when `host_path_mode=NONE`

  ### Box backend (backend.py)
  - Add `validate_sandbox_security()` call in `start_session()`
  - Add `langbot.box.config_hash` label on containers for drift detection
  - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
  - Add `cleanup_orphaned_containers()` to base class (no-op default) and
    CLI implementation (single batched `rm -f` command)

  ### Box runtime (runtime.py)
  - Call `cleanup_orphaned_containers()` during `initialize()` to remove
    lingering containers from previous runs

  ### Box service (service.py)
  - Graceful degradation: `initialize()` catches runtime errors and sets
    `available=False` instead of crashing LangBot startup
  - Add `available` property and guard on `execute_sandbox_tool()`
  - Add `skip_host_mount_validation` parameter to `build_spec()` and
    `create_session()` — MCP paths are admin-configured and trusted,
    bypassing `allowed_host_mount_roots` restrictions meant for
    LLM-generated sandbox_exec commands

  ### Default behavior
  - stdio MCP servers automatically use Box when `box_service.available`
    is True (Podman/Docker detected); no explicit `box` config needed
  - When no container runtime exists, falls back to host-direct stdio
  - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
    (for site-packages), `host_path_mode=ro`, `startup_timeout=120s`

  ### Tests
  - `test_box_security.py`: blocked paths, safe paths, subpath rejection
  - `test_mcp_box_integration.py`: config model, path rewriting, venv
    unwrap, host_path inference, payload building, runtime info, box
    availability check
  - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
2026-05-04 21:23:23 +08:00

LangBot Test Suite

This directory contains the test suite for LangBot, with a focus on comprehensive unit testing of pipeline stages.

Important Note

Due to circular import dependencies in the pipeline module structure, the test files use lazy imports via importlib.import_module() instead of direct imports. This ensures tests can run without triggering circular import errors.
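
A minimal sketch of the lazy-import pattern is shown below. The helper name `lazy` is hypothetical, and the standard-library `json` module stands in for a pipeline module so the snippet runs anywhere; in the real tests the argument would be a dotted path inside the project.

```python
from importlib import import_module


def lazy(module_path: str):
    """Import a module at call time instead of at test-collection time."""
    # import_module consults sys.modules first, so repeated calls are cheap
    # and return the same module object.
    return import_module(module_path)


def test_roundtrip():
    # Stand-in for a pipeline module; the import happens inside the test.
    json = lazy("json")
    assert json.loads(json.dumps({"a": 1})) == {"a": 1}
```

Because nothing is imported at module level, collecting the test file does not pull in the modules under test, so import order stays under the test's control.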

Structure

tests/
├── pipeline/                      # Pipeline stage tests
│   ├── conftest.py               # Shared fixtures and test infrastructure
│   ├── test_simple.py            # Basic infrastructure tests (always pass)
│   ├── test_bansess.py           # BanSessionCheckStage tests
│   ├── test_ratelimit.py         # RateLimit stage tests
│   ├── test_preproc.py           # PreProcessor stage tests
│   ├── test_respback.py          # SendResponseBackStage tests
│   ├── test_resprule.py          # GroupRespondRuleCheckStage tests
│   ├── test_pipelinemgr.py       # PipelineManager tests
│   └── test_stages_integration.py # Integration tests
└── README.md                      # This file

Test Architecture

Fixtures (conftest.py)

The test suite uses a centralized fixture system that provides:

  • MockApplication: Comprehensive mock of the Application object with all dependencies
  • Mock objects: Pre-configured mocks for Session, Conversation, Model, Adapter
  • Sample data: Ready-to-use Query objects, message chains, and configurations
  • Helper functions: Utilities for creating results and common assertions
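
To give a feel for what such a fixture might look like, here is a small sketch built only on `unittest.mock`. The attribute names (`logger`, `sess_mgr`, `get_session`) and the `make_mock_app` helper are illustrative assumptions, not the actual contents of conftest.py.

```python
from unittest.mock import AsyncMock, MagicMock


def make_mock_app():
    """Build a stand-in Application object whose dependencies are all mocks."""
    app = MagicMock(name="Application")
    # Hypothetical attributes, for illustration only.
    app.logger = MagicMock()
    app.sess_mgr = MagicMock()
    # Async methods need AsyncMock so that they can be awaited.
    app.sess_mgr.get_session = AsyncMock(return_value=MagicMock(name="Session"))
    return app


# In conftest.py this would typically be exposed as a fixture:
#
# @pytest.fixture
# def mock_app():
#     return make_mock_app()
```

Keeping the construction in a plain function makes it usable from helper code as well as from fixtures.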

Design Principles

  1. Isolation: Each test is independent and doesn't rely on external systems
  2. Mocking: All external dependencies are mocked to ensure fast, reliable tests
  3. Coverage: Tests cover happy paths, edge cases, and error conditions
  4. Extensibility: Easy to add new tests by reusing existing fixtures

Running Tests

bash run_tests.sh

This script automatically:

  • Activates the virtual environment
  • Installs test dependencies if needed
  • Runs tests with coverage
  • Generates an HTML coverage report

Manual test execution

Run all tests

pytest tests/pipeline/

Run only simple tests (no imports, always pass)

pytest tests/pipeline/test_simple.py -v

Run specific test file

pytest tests/pipeline/test_bansess.py -v

Run with coverage

pytest tests/pipeline/ --cov=pkg/pipeline --cov-report=html

Run specific test

pytest tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v

Known Issues

Some tests may encounter circular import errors. This is a known issue with the current module structure. The test infrastructure is designed to work around this using lazy imports, but if you encounter issues:

  1. Make sure you're running from the project root directory
  2. Ensure the virtual environment is activated
  3. Try running test_simple.py first to verify the test infrastructure works

CI/CD Integration

Tests are automatically run on:

  • Pull request opened
  • Pull request marked ready for review
  • Push to PR branch
  • Push to master/develop branches

The workflow runs tests on Python 3.10, 3.11, and 3.12 to ensure compatibility.

Adding New Tests

1. For a new pipeline stage

Create a new test file test_<stage_name>.py:

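"""
<StageName> stage unit tests
"""

from importlib import import_module

import pytest


def get_modules():
    """Import lazily to avoid circular import errors (helper name is illustrative)"""
    stage_module = import_module('pkg.pipeline.<module>.<stage>')
    entities = import_module('pkg.pipeline.entities')
    return stage_module, entities


@pytest.mark.asyncio
async def test_stage_basic_flow(mock_app, sample_query):
    """Test basic flow"""
    stage_module, pipeline_entities = get_modules()
    stage = stage_module.<StageClass>(mock_app)
    await stage.initialize({})

    result = await stage.process(sample_query, '<StageName>')

    assert result.result_type == pipeline_entities.ResultType.CONTINUE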

2. For additional fixtures

Add new fixtures to conftest.py:

@pytest.fixture
def my_custom_fixture():
    """Description of fixture"""
    return create_test_data()

3. For test data

Use the helper functions in conftest.py:

from tests.pipeline.conftest import create_stage_result, assert_result_continue

result = create_stage_result(
    result_type=pipeline_entities.ResultType.CONTINUE,
    query=sample_query
)

assert_result_continue(result)

Best Practices

  1. Test naming: Use descriptive names that explain what's being tested
  2. Arrange-Act-Assert: Structure tests clearly with setup, execution, and verification
  3. One assertion per test: Focus each test on a single behavior
  4. Mock appropriately: Mock external dependencies, not the code under test
  5. Use fixtures: Reuse common test data through fixtures
  6. Document tests: Add docstrings explaining what each test validates
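
The sketch below applies several of these practices at once. `notify` is a hypothetical function under test, and `asyncio.run` is used instead of `@pytest.mark.asyncio` only to keep the example free of third-party dependencies.

```python
import asyncio
from unittest.mock import AsyncMock


async def notify(sender, user_id):
    """Hypothetical code under test: send a greeting through an async sender."""
    await sender.send(user_id, "hello")


def test_notify_sends_one_greeting_to_user():
    """notify() sends exactly one greeting to the given user."""
    # Arrange: mock the external dependency, not the code under test.
    sender = AsyncMock()

    # Act
    asyncio.run(notify(sender, 42))

    # Assert: a single behavior is verified.
    sender.send.assert_awaited_once_with(42, "hello")
```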

Troubleshooting

Import errors

Make sure you've installed the package in development mode:

uv pip install -e .

Async test failures

Ensure you're using the @pytest.mark.asyncio decorator for async tests.

Mock not working

Check that you're mocking at the right level and using AsyncMock for async functions.
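
A short sketch of the difference, using only `unittest.mock` from the standard library. `fetch_reply` and `client.ask` are hypothetical names chosen for illustration.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_reply(client):
    """Hypothetical code under test that awaits an async client method."""
    return await client.ask("ping")


def demo():
    wrong = None
    # A MagicMock method returns another MagicMock, which cannot be awaited.
    try:
        asyncio.run(fetch_reply(MagicMock()))
    except TypeError as exc:
        wrong = type(exc).__name__

    # AsyncMock returns an awaitable that resolves to return_value.
    client = MagicMock()
    client.ask = AsyncMock(return_value="pong")
    right = asyncio.run(fetch_reply(client))
    return wrong, right
```

If a test fails with a `TypeError` mentioning an `await` expression, a synchronous mock standing in for an async function is a likely cause.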

Future Enhancements

  • Add integration tests for full pipeline execution
  • Add performance benchmarks
  • Add mutation testing for better coverage quality
  • Add property-based testing with Hypothesis