## Changes
### Precise orphan container cleanup
- The runtime generates a unique `instance_id` on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
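
The label-based selection above can be sketched as follows. This is a minimal illustration, not the actual backend code: `ContainerInfo`, `select_orphans`, and `build_rm_command` are hypothetical names, the input list is assumed to already be restricted to Box-managed containers, and `docker` is only a placeholder for whichever container CLI the backend uses. Only the label name `langbot.box.instance_id` comes from the change list.

```python
from dataclasses import dataclass, field

INSTANCE_LABEL = "langbot.box.instance_id"  # label name from the change list


@dataclass
class ContainerInfo:
    """Hypothetical, simplified view of a container as a backend might list it."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def select_orphans(containers: list[ContainerInfo], current_instance_id: str) -> list[str]:
    """Return names of containers not owned by the current runtime instance.

    A container counts as an orphan when its instance label is missing
    (created by an older version) or differs from the current instance ID.
    """
    return [
        c.name
        for c in containers
        if c.labels.get(INSTANCE_LABEL) != current_instance_id
    ]


def build_rm_command(names: list[str], runtime: str = "docker") -> list[str]:
    """Build one batched `rm -f` invocation; empty when nothing needs removal."""
    return [runtime, "rm", "-f", *names] if names else []


containers = [
    ContainerInfo("box-a", {INSTANCE_LABEL: "current"}),   # owned by this instance
    ContainerInfo("box-b", {INSTANCE_LABEL: "previous"}),  # left by an earlier instance
    ContainerInfo("box-c"),                                # no label: older version
]
orphans = select_orphans(containers, "current")
command = build_rm_command(orphans)
```

Comparing against the current ID, rather than against a list of known-dead IDs, is what lets unlabeled containers fall into the orphan set without a special case.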
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each step in `_init_box_stdio_server()` records its error phase
before re-raising, so a failure can be traced to the exact step
- `retry_count` is tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
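
One way to picture the phase tracking is a context manager that records the phase when the wrapped step raises. The sketch below is an assumption about the shape, not the real code: the enum values come from the list above, while the member names, the `SessionState` class, and its `phase()` helper are hypothetical.

```python
import enum
from contextlib import contextmanager
from typing import Optional


class MCPSessionErrorPhase(enum.Enum):
    # Values are the seven phases listed above; member names are assumed.
    SESSION_CREATE = "session_create"
    DEP_INSTALL = "dep_install"
    PROCESS_START = "process_start"
    RELAY_CONNECT = "relay_connect"
    MCP_INIT = "mcp_init"
    RUNTIME = "runtime"
    TOOL_CALL = "tool_call"


class SessionState:
    """Hypothetical holder for the diagnostic fields."""

    def __init__(self) -> None:
        self.error_phase: Optional[MCPSessionErrorPhase] = None
        self.retry_count = 0

    @contextmanager
    def phase(self, phase: MCPSessionErrorPhase):
        """Record `phase` if the wrapped step raises, then re-raise."""
        try:
            yield
        except Exception:
            self.error_phase = phase
            raise

    def get_runtime_info_dict(self) -> dict:
        return {
            "error_phase": self.error_phase.value if self.error_phase else None,
            "retry_count": self.retry_count,
        }


state = SessionState()
try:
    with state.phase(MCPSessionErrorPhase.DEP_INSTALL):
        raise RuntimeError("dependency install failed")  # simulated failure
except RuntimeError:
    state.retry_count += 1  # the caller decides whether to retry

info = state.get_runtime_info_dict()
```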
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
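
The fallback behavior described above might look roughly like this. It is a simplified, synchronous sketch: only the `available` flag and the idea of checking it come from the change list. The `connect` callable, the `"mode"` config key, and the free function `uses_box_stdio` are hypothetical, and the real `initialize()` may well be asynchronous.

```python
from typing import Callable


class BoxService:
    """Hypothetical stand-in that models only the `available` flag."""

    def __init__(self, connect: Callable[[], None]) -> None:
        self._connect = connect  # raises when no container runtime is reachable
        self.available = False

    def initialize(self) -> None:
        try:
            self._connect()
            self.available = True
        except Exception:
            # Degrade gracefully instead of aborting application startup.
            self.available = False


def uses_box_stdio(server_config: dict, box_service: BoxService) -> bool:
    """Route stdio servers through Box whenever the runtime is available."""
    return server_config.get("mode") == "stdio" and box_service.available


def _no_runtime() -> None:
    raise ConnectionError("no container runtime found")


healthy = BoxService(lambda: None)
healthy.initialize()

degraded = BoxService(_no_runtime)
degraded.initialize()  # does not raise
```

With `degraded`, `uses_box_stdio` returns `False`, which is the point at which a caller would fall back to host-direct execution.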
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached the `_resolve_host_path()` result: computed once, then passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
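
To illustrate why `host_path` belongs in the config hash, here is a small sketch. It assumes the hash is a digest over selected config fields and is used to detect configuration changes; the function name `config_hash`, the choice of SHA-256, and every field other than `host_path` are hypothetical.

```python
import hashlib
import json

# Assumed field list; only `host_path` is confirmed by the change list.
HASHED_FIELDS = ("command", "args", "env", "host_path")


def config_hash(server_config: dict) -> str:
    """Return a stable SHA-256 hex digest over the selected config fields."""
    relevant = {key: server_config.get(key) for key in HASHED_FIELDS}
    # sort_keys makes the digest independent of dict insertion order.
    payload = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {"command": "uvx", "args": ["server"], "host_path": "/data/a"}
reordered = {"host_path": "/data/a", "args": ["server"], "command": "uvx"}
moved = dict(base, host_path="/data/b")
```

Under these assumptions, `base` and `reordered` hash identically while `moved` does not, so a changed `host_path` is no longer invisible to whatever compares the hashes.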
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
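
The effect of moving cleanup into a `finally` block can be shown with a small `asyncio` sketch. The method names mirror the change list, but the class, the retry policy, and the always-failing `_run_once` are hypothetical; whether the real `finally` wraps each attempt or the whole loop is not stated here, so this version assumes per-attempt cleanup.

```python
import asyncio


class StdioSession:
    """Hypothetical session wrapper used only to demonstrate the pattern."""

    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries
        self.cleanup_calls = 0

    async def _run_once(self, attempt: int) -> None:
        # Stand-in for connecting and serving; always fails in this sketch.
        raise ConnectionError(f"attempt {attempt} failed")

    async def _cleanup_box_stdio_session(self) -> None:
        self.cleanup_calls += 1

    async def _lifecycle_loop(self) -> None:
        for attempt in range(self.max_retries + 1):
            try:
                await self._run_once(attempt)
                return  # normal shutdown
            except ConnectionError:
                if attempt == self.max_retries:
                    raise  # final failure
            finally:
                # Runs after success, before each retry, and on final failure.
                await self._cleanup_box_stdio_session()


session = StdioSession(max_retries=2)
try:
    asyncio.run(session._lifecycle_loop())
    failed = False
except ConnectionError:
    failed = True
```

Because `finally` executes on `return`, on a swallowed exception, and on a re-raised one, no exit path can skip the cleanup call.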
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety

## LangBot Test Suite
This directory contains the test suite for LangBot, with a focus on comprehensive unit testing of pipeline stages.
### Important Note
Due to circular import dependencies in the pipeline module structure, the test files use lazy imports via `importlib.import_module()` instead of direct imports. This ensures tests can run without triggering circular import errors.
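
A minimal sketch of the lazy-import pattern is shown below. The helper name `get_module` is hypothetical, and the standard-library `json` module stands in for a pipeline module so the snippet runs anywhere; a real test would pass the dotted path of the stage module instead.

```python
import importlib
from types import ModuleType


def get_module(dotted_path: str) -> ModuleType:
    """Import a module when called, rather than when the test file is loaded."""
    return importlib.import_module(dotted_path)


def test_lazy_import_example():
    # `json` is a stand-in; substitute the pipeline module path in a real test.
    mod = get_module("json")
    assert mod.dumps({"ok": True}) == '{"ok": true}'
```

Deferring the import into the test body means the module is resolved only after the rest of the package has finished loading.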
### Structure

    tests/
    ├── pipeline/                        # Pipeline stage tests
    │   ├── conftest.py                  # Shared fixtures and test infrastructure
    │   ├── test_simple.py               # Basic infrastructure tests (always pass)
    │   ├── test_bansess.py              # BanSessionCheckStage tests
    │   ├── test_ratelimit.py            # RateLimit stage tests
    │   ├── test_preproc.py              # PreProcessor stage tests
    │   ├── test_respback.py             # SendResponseBackStage tests
    │   ├── test_resprule.py             # GroupRespondRuleCheckStage tests
    │   ├── test_pipelinemgr.py          # PipelineManager tests
    │   └── test_stages_integration.py   # Integration tests
    └── README.md                        # This file

### Test Architecture
#### Fixtures (`conftest.py`)
The test suite uses a centralized fixture system that provides:
- `MockApplication`: Comprehensive mock of the `Application` object with all dependencies
- Mock objects: Pre-configured mocks for `Session`, `Conversation`, `Model`, `Adapter`
- Sample data: Ready-to-use `Query` objects, message chains, and configurations
- Helper functions: Utilities for creating results and common assertions
#### Design Principles
- Isolation: Each test is independent and doesn't rely on external systems
- Mocking: All external dependencies are mocked to ensure fast, reliable tests
- Coverage: Tests cover happy paths, edge cases, and error conditions
- Extensibility: Easy to add new tests by reusing existing fixtures
### Running Tests
#### Using the test runner script (recommended)

    bash run_tests.sh

This script automatically:
- Activates the virtual environment
- Installs test dependencies if needed
- Runs tests with coverage
- Generates HTML coverage report
#### Manual test execution

    # Run all tests
    pytest tests/pipeline/

    # Run only simple tests (no imports, always pass)
    pytest tests/pipeline/test_simple.py -v

    # Run specific test file
    pytest tests/pipeline/test_bansess.py -v

    # Run with coverage
    pytest tests/pipeline/ --cov=pkg/pipeline --cov-report=html

    # Run specific test
    pytest tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v

### Known Issues
Some tests may hit circular import errors, a known issue with the current module structure. The test infrastructure works around this with lazy imports, but if you still encounter problems:
- Make sure you're running from the project root directory
- Ensure the virtual environment is activated
- Try running `test_simple.py` first to verify the test infrastructure works
### CI/CD Integration
Tests are automatically run on:
- Pull request opened
- Pull request marked ready for review
- Push to PR branch
- Push to master/develop branches
The workflow runs tests on Python 3.10, 3.11, and 3.12 to ensure compatibility.
### Adding New Tests
#### 1. For a new pipeline stage
Create a new test file `test_<stage_name>.py`:

    """
    <StageName> stage unit tests
    """

    import pytest

    # If this direct import triggers a circular import, load the module lazily
    # with importlib.import_module() instead (see Important Note).
    from pkg.pipeline.<module>.<stage> import <StageClass>
    from pkg.pipeline import entities as pipeline_entities


    @pytest.mark.asyncio
    async def test_stage_basic_flow(mock_app, sample_query):
        """Test basic flow"""
        stage = <StageClass>(mock_app)
        await stage.initialize({})

        result = await stage.process(sample_query, '<StageName>')

        assert result.result_type == pipeline_entities.ResultType.CONTINUE

#### 2. For additional fixtures
Add new fixtures to `conftest.py`:

    import pytest


    @pytest.fixture
    def my_custom_fixture():
        """Description of fixture"""
        # create_test_data() is a placeholder for your own setup code
        return create_test_data()

#### 3. For test data
Use the helper functions in `conftest.py`:

    from pkg.pipeline import entities as pipeline_entities
    from tests.pipeline.conftest import create_stage_result, assert_result_continue

    # Inside a test that receives the sample_query fixture:
    result = create_stage_result(
        result_type=pipeline_entities.ResultType.CONTINUE,
        query=sample_query,
    )

    assert_result_continue(result)

### Best Practices
- Test naming: Use descriptive names that explain what's being tested
- Arrange-Act-Assert: Structure tests clearly with setup, execution, and verification
- One assertion per test: Focus each test on a single behavior
- Mock appropriately: Mock external dependencies, not the code under test
- Use fixtures: Reuse common test data through fixtures
- Document tests: Add docstrings explaining what each test validates
### Troubleshooting
#### Import errors
Make sure you've installed the package in development mode:

    uv pip install -e .

#### Async test failures
Ensure you're using the `@pytest.mark.asyncio` decorator for async tests.
#### Mock not working
Check that you're mocking at the right level and using `AsyncMock` for async functions.
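
As a quick illustration of the `AsyncMock` point, the sketch below uses only `unittest.mock` from the standard library. The function `send_reply` and the method name `reply_message` are hypothetical stand-ins for code under test and its async dependency.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def send_reply(adapter, text: str) -> str:
    """Code under test: awaits an async method on its dependency."""
    return await adapter.reply_message(text)


adapter = MagicMock()
# Replace the async method with AsyncMock so that calling it yields an awaitable.
adapter.reply_message = AsyncMock(return_value="sent")

result = asyncio.run(send_reply(adapter, "hello"))
adapter.reply_message.assert_awaited_once_with("hello")
```

Leaving `reply_message` as a plain `MagicMock` attribute would make the call return a non-awaitable mock, and the `await` would fail with a `TypeError`.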
### Future Enhancements
- Add integration tests for full pipeline execution
- Add performance benchmarks
- Add mutation testing for better coverage quality
- Add property-based testing with Hypothesis