mirror of https://github.com/langbot-app/LangBot.git synced 2026-06-03 04:24:36 +00:00

Files

youhuanghe e8aa7b2e6d feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security

## Summary

  When Podman/Docker is available, all stdio-mode MCP servers now automatically
  run inside Box containers with dependency installation, path rewriting, and
  lifecycle management. When no container runtime exists, LangBot starts normally
  and stdio MCP falls back to host-direct execution.

  ## What changed

  ### MCP stdio → Box integration (mcp.py)
  - Add `MCPServerBoxConfig` pydantic model for structured box configuration
    with validation and defaults (network, host_path_mode, timeouts, resources)
  - Auto-infer `host_path` from command/args with venv detection: recognizes
    `.venv/bin/python` patterns and walks up to the project root
  - Rewrite host paths to container `/workspace` paths transparently
  - Replace venv python commands with container-native `python`
  - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
    `pip install` inside the container before starting the MCP server
  - Copy project to `/tmp` before install to handle read-only mounts
  - Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
  - Add Box managed process health monitoring (poll every 5s)
  - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
    block of `_lifecycle_loop`, covering all exit paths
  - Fix retry logic: `_ready_event` is only set after all retries exhaust
    or on success, not on first failure
  - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`

  ### Box security (security.py — new)
  - `validate_sandbox_security()` blocks dangerous host paths:
    `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
    docker.sock, podman socket
  - Called at the start of `CLISandboxBackend.start_session()`

  ### Box models (models.py)
  - Add `BoxHostMountMode.NONE` — skips volume mount entirely
  - Adjust `validate_host_mount_consistency` to allow arbitrary workdir
    when `host_path_mode=NONE`

  ### Box backend (backend.py)
  - Add `validate_sandbox_security()` call in `start_session()`
  - Add `langbot.box.config_hash` label on containers for drift detection
  - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
  - Add `cleanup_orphaned_containers()` to base class (no-op default) and
    CLI implementation (single batched `rm -f` command)

  ### Box runtime (runtime.py)
  - Call `cleanup_orphaned_containers()` during `initialize()` to remove
    lingering containers from previous runs

  ### Box service (service.py)
  - Graceful degradation: `initialize()` catches runtime errors and sets
    `available=False` instead of crashing LangBot startup
  - Add `available` property and guard on `execute_sandbox_tool()`
  - Add `skip_host_mount_validation` parameter to `build_spec()` and
    `create_session()` — MCP paths are admin-configured and trusted,
    bypassing `allowed_host_mount_roots` restrictions meant for
    LLM-generated sandbox_exec commands

  ### Default behavior
  - stdio MCP servers automatically use Box when `box_service.available`
    is True (Podman/Docker detected); no explicit `box` config needed
  - When no container runtime exists, falls back to host-direct stdio
  - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
    (for site-packages), `host_path_mode=ro`, `startup_timeout=120s`

  ### Tests
  - `test_box_security.py`: blocked paths, safe paths, subpath rejection
  - `test_mcp_box_integration.py`: config model, path rewriting, venv
    unwrap, host_path inference, payload building, runtime info, box
    availability check
  - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests

2026-05-04 21:23:23 +08:00

integration_tests

fix: fix box intergration test

2026-05-04 21:23:23 +08:00

unit_tests

feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security

2026-05-04 21:23:23 +08:00

__init__.py

feat: add comprehensive unit tests for pipeline stages (#1701 )

2025-10-01 10:56:59 +08:00

README.md

feat: add comprehensive unit tests for pipeline stages (#1701 )

2025-10-01 10:56:59 +08:00

README.md

LangBot Test Suite

This directory contains the test suite for LangBot, with a focus on comprehensive unit testing of pipeline stages.

Important Note

Due to circular import dependencies in the pipeline module structure, the test files use lazy imports via importlib.import_module() instead of direct imports. This ensures tests can run without triggering circular import errors.

Structure

tests/
├── pipeline/                      # Pipeline stage tests
│   ├── conftest.py               # Shared fixtures and test infrastructure
│   ├── test_simple.py            # Basic infrastructure tests (always pass)
│   ├── test_bansess.py           # BanSessionCheckStage tests
│   ├── test_ratelimit.py         # RateLimit stage tests
│   ├── test_preproc.py           # PreProcessor stage tests
│   ├── test_respback.py          # SendResponseBackStage tests
│   ├── test_resprule.py          # GroupRespondRuleCheckStage tests
│   ├── test_pipelinemgr.py       # PipelineManager tests
│   └── test_stages_integration.py # Integration tests
└── README.md                      # This file

Test Architecture

Fixtures (`conftest.py`)

The test suite uses a centralized fixture system that provides:

MockApplication: Comprehensive mock of the Application object with all dependencies
Mock objects: Pre-configured mocks for Session, Conversation, Model, Adapter
Sample data: Ready-to-use Query objects, message chains, and configurations
Helper functions: Utilities for creating results and common assertions

Design Principles

Isolation: Each test is independent and doesn't rely on external systems
Mocking: All external dependencies are mocked to ensure fast, reliable tests
Coverage: Tests cover happy paths, edge cases, and error conditions
Extensibility: Easy to add new tests by reusing existing fixtures

Running Tests

Using the test runner script (recommended)

bash run_tests.sh

This script automatically:

Activates the virtual environment
Installs test dependencies if needed
Runs tests with coverage
Generates HTML coverage report

Manual test execution

Run all tests

pytest tests/pipeline/

Run only simple tests (no imports, always pass)

pytest tests/pipeline/test_simple.py -v

Run specific test file

pytest tests/pipeline/test_bansess.py -v

Run with coverage

pytest tests/pipeline/ --cov=pkg/pipeline --cov-report=html

Run specific test

pytest tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v

Known Issues

Some tests may encounter circular import errors. This is a known issue with the current module structure. The test infrastructure is designed to work around this using lazy imports, but if you encounter issues:

Make sure you're running from the project root directory
Ensure the virtual environment is activated
Try running test_simple.py first to verify the test infrastructure works

CI/CD Integration

Tests are automatically run on:

Pull request opened
Pull request marked ready for review
Push to PR branch
Push to master/develop branches

The workflow runs tests on Python 3.10, 3.11, and 3.12 to ensure compatibility.

Adding New Tests

1. For a new pipeline stage

Create a new test file test_<stage_name>.py:

"""
<StageName> stage unit tests
"""

import pytest
from pkg.pipeline.<module>.<stage> import <StageClass>
from pkg.pipeline import entities as pipeline_entities


@pytest.mark.asyncio
async def test_stage_basic_flow(mock_app, sample_query):
    """Test basic flow"""
    stage = <StageClass>(mock_app)
    await stage.initialize({})

    result = await stage.process(sample_query, '<StageName>')

    assert result.result_type == pipeline_entities.ResultType.CONTINUE

2. For additional fixtures

Add new fixtures to conftest.py:

@pytest.fixture
def my_custom_fixture():
    """Description of fixture"""
    return create_test_data()

3. For test data

Use the helper functions in conftest.py:

from tests.pipeline.conftest import create_stage_result, assert_result_continue

result = create_stage_result(
    result_type=pipeline_entities.ResultType.CONTINUE,
    query=sample_query
)

assert_result_continue(result)

Best Practices

Test naming: Use descriptive names that explain what's being tested
Arrange-Act-Assert: Structure tests clearly with setup, execution, and verification
One assertion per test: Focus each test on a single behavior
Mock appropriately: Mock external dependencies, not the code under test
Use fixtures: Reuse common test data through fixtures
Document tests: Add docstrings explaining what each test validates

Troubleshooting

Import errors

Make sure you've installed the package in development mode:

uv pip install -e .

Async test failures

Ensure you're using @pytest.mark.asyncio decorator for async tests.

Mock not working

Check that you're mocking at the right level and using AsyncMock for async functions.

Future Enhancements

Add integration tests for full pipeline execution
Add performance benchmarks
Add mutation testing for better coverage quality
Add property-based testing with Hypothesis

README.md

LangBot Test Suite

Important Note

Structure

Test Architecture

Fixtures (conftest.py)

Design Principles

Running Tests

Using the test runner script (recommended)

Manual test execution

Run all tests

Run only simple tests (no imports, always pass)

Run specific test file

Run with coverage

Run specific test

Known Issues

CI/CD Integration

Adding New Tests

1. For a new pipeline stage

2. For additional fixtures

3. For test data

Best Practices

Troubleshooting

Import errors

Async test failures

Mock not working

Future Enhancements

Fixtures (`conftest.py`)