mirror of https://github.com/langbot-app/LangBot.git synced 2026-07-24 21:36:06 +00:00

Junyan Chin 8e558ad3a1 Feat/saas sandbox adaptation (#2234 )

* fix(box): trust Box-reported skill paths when filesystem is not shared

In separated deployments (Docker Compose, k8s sidecar, --standalone-box,
remote runtime.endpoint) the Box runtime owns its own filesystem, so the
skill package_root it reports via list_skills is not resolvable on the
LangBot side. LangBot's reload_skills and build_skill_extra_mounts
validated those paths with os.path.isdir() against its own filesystem,
which silently dropped every skill in such deployments — breaking the
sandbox skill feature for the nsjail/SaaS backend.

Add BoxService.shares_filesystem_with_box, derived from the connector
transport (stdio = shared, WebSocket = separated), with an explicit
override seam for tests/embedders. Gate both isdir() guards on it: keep
local validation in shared-fs stdio mode, trust Box-reported paths
otherwise. The Box runtime only reports skills found on its own
filesystem, so those paths are valid there by construction.

Adds topology-derivation tests (real connector, no mocks) and
skill-retention tests for both shared and separated filesystems.

* build(docker): ship a self-contained nsjail sandbox backend in the image

Compile nsjail 3.6 from source in a dedicated multi-stage build and carry
only the binary plus its runtime libs (libprotobuf32, libnl-route-3-200)
into the final image. This lets the Box runtime isolate sandboxed code via
nsjail user/mount/pid/net namespaces without a host Docker socket — the
prerequisite for running Box on LangBot Cloud (k8s), where mounting
docker.sock would grant node root and is not acceptable for multi-tenant.

The build toolchain (build-essential/bison/flex/protobuf-dev/libnl-dev)
stays in the nsjail-build stage and is not present in the shipped image.

Verified: image builds (583MB), nsjail --help exits 0, libraries resolve,
and the real NsjailBackend executes an isolated command end-to-end on a
v6.1/cgroup2 host matching LangBot Cloud prod (rlimit fallback path, since
container /sys/fs/cgroup is read-only; PID-namespace isolation confirmed).

* feat(box): SaaS guard to force a single global sandbox scope

Add system.limitation.force_box_session_id_template: when non-empty it
overrides every pipeline's box-session-id-template at resolve time, pinning
all queries to one shared sandbox (e.g. {global}). This is the authoritative,
unbypassable guard — it runs on every exec call, so editing the pipeline
config via API cannot escape it. The web UI locks the Sandbox Scope selector
via a combined box_scope_editable flag (box available AND not forced).

* build(deps): pin langbot-plugin==0.4.2b1 (nsjail cgroup container-safety beta)

* fix(web): show forced sandbox scope + make disabled tooltip tap-friendly

When a SaaS deployment pins every pipeline to a fixed sandbox scope via
system.limitation.force_box_session_id_template, the Sandbox Scope selector was
correctly locked but still displayed the pipeline's stored value (e.g. the
per-chat default), misrepresenting the scope that the runtime actually enforces
on every exec. Coerce the displayed/saved value to the forced template so the
locked selector truthfully shows the active scope (e.g. Global).

Also fix the disabled_tooltip being invisible on touch devices: hover-only Radix
tooltips never open without a pointer, so the explanation of why the field is
locked could not be read on mobile. Wrap the info icon so a tap toggles the
tooltip while desktop hover still works.

* feat(web): hide sidebar new-version prompt for edition=cloud

Cloud instances are upgraded centrally by the operator, so surfacing a GitHub
'new version available' badge to tenants is misleading and actionable only by
the operator. Skip the release check entirely when edition=cloud.

* style(web): prettier formatting for DisabledTooltipIcon ternary

* chore(deps): bump langbot-plugin to 0.4.2b2

Picks up the SDK fix that creates a read-write host_path before the
nsjail bind-mount, fixing the SaaS MCP shared-workspace sandbox failure
(exec exit 255 with empty output when host_path didn't exist).

* chore(deps): bump langbot-plugin to 0.4.2b3

Picks up the nsjail /dev-node fix so stdio MCP servers (uvx-launched) can
start under force_global_sandbox instead of failing with 'Connection closed
/ please check URL'.

* fix(web): show real MCP runtime status on installed extensions list

The installed-extensions list badge keyed solely off the enable flag, so a
server that was still CONNECTING (or in ERROR) was shown as 'Connected'.
Reflect the actual runtime_info.status (connecting/connected/error/disabled)
with matching colors, and poll quietly every 3s while any MCP server is
connecting so the badge transitions without a manual refresh.

* chore(deps): bump langbot-plugin to 0.4.2b4

Picks up the 30s start_managed_process timeout so cold uvx MCP bootstraps
don't get torn down mid-install.

* style(web): satisfy prettier — parenthesize nullish-coalescing in ternary

* fix(mcp): isolate transient test sessions from the shared Box session

A config-page 'test' (server_name='_', no persisted UUID) ran in the same
shared 'mcp-shared' Box session as live MCP servers. A failing test (e.g.
empty args) churned that shared session and tore down healthy, already-
connected servers — leaving them stuck after exhausting their retries.

Mark UUID-less sessions as transient, give them their own isolated Box
session ('mcp-test-<uuid>'), and fully delete that session on cleanup so
tests can never disturb live servers and don't leak sessions.

* fix(mcp): tear down transient test session after test completes

A successful config-page test left its isolated 'mcp-test-<uuid>' Box
session running (the lifecycle task blocks until shutdown). Wrap the
transient test coroutine so it always shuts the session down afterward,
preventing isolated test sessions from leaking.

2026-06-09 19:30:17 +08:00

e2e

Feat/test build (#2174 )

2026-05-16 12:05:54 +08:00

factories

Feat/sandbox (#2072 )

2026-06-03 11:12:39 +08:00

integration

fix(ci): bump migration head assertion to 0004, apply prettier

2026-06-06 03:56:14 -04:00

integration_tests

Feat/sandbox (#2072 )

2026-06-03 11:12:39 +08:00

smoke

Feat/test build (#2174 )

2026-05-16 12:05:54 +08:00

unit_tests

Feat/saas sandbox adaptation (#2234 )

2026-06-09 19:30:17 +08:00

utils

Feat/test build (#2174 )

2026-05-16 12:05:54 +08:00

__init__.py

feat: add comprehensive unit tests for pipeline stages (#1701 )

2025-10-01 10:56:59 +08:00

README.md

Feat/test build (#2174 )

2026-05-16 12:05:54 +08:00

test_cwe94_debug_exec.py

fix: remove /debug/exec endpoint that allows authenticated RCE via exec() (#2178 )

2026-05-19 00:53:39 +08:00

README.md

LangBot Test Suite

This directory contains the test suite for LangBot, with a focus on comprehensive unit testing of pipeline stages.

Quality Gate Layers

LangBot uses a layered quality gate system for developers and CI:

| Layer | Command | What it runs | When to use |
| --- | --- | --- | --- |
| Quick | `make test-quick` or `bash scripts/test-quick.sh` | Ruff lint + Unit tests + Smoke tests | Before every commit |
| Fast Integration | `make test-integration-fast` or `bash scripts/test-integration-fast.sh` | SQLite/API/Pipeline integration (no external services) | Before PR, weekly |
| Coverage Gate | `make test-coverage` or `bash scripts/test-coverage.sh` | All tests with coverage, threshold: 18% | Before merge, CI |
| Full Local | `make test-all-local` | Quick + Integration + Coverage | Before major changes |

Note: PostgreSQL migration tests and slow tests are NOT in local default gates. They run in separate CI workflows.

Developer Workflow

# Daily: Quick self-test
bash scripts/test-quick.sh

# Before PR: Full local gate
make test-all-local

# Or run each layer separately:
bash scripts/test-quick.sh           # ~2 min
bash scripts/test-integration-fast.sh # ~3 min
bash scripts/test-coverage.sh         # ~8 min

Coverage Baseline

Current coverage threshold: 18%
Actual coverage: 30%

This is a conservative baseline to prevent coverage regression. It does NOT represent the final quality target. Key modules have higher coverage:

pipeline.preproc.preproc: 53%
pipeline.process.process: 96%
pipeline.respback.respback: 88%
telemetry.telemetry: 87%
provider.session.sessionmgr: 100%
provider.tools.toolmgr: 83%
storage.providers.s3storage: 80%

Important Note

Due to circular import dependencies in the pipeline module structure, the test files use lazy imports via importlib.import_module() instead of direct imports. This ensures tests can run without triggering circular import errors.
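
As a minimal sketch of the lazy-import pattern described above: the helper name `lazy_import` is hypothetical, and the standard-library `json` module stands in for a real pipeline module path. The point is only that the import happens when the function is called, not when the test file is loaded.

```python
import importlib
from types import ModuleType


def lazy_import(module_path: str) -> ModuleType:
    """Import a module by dotted path at call time rather than at file import time."""
    # Deferring the import until a test actually runs avoids triggering a
    # circular import chain while pytest is still collecting test files.
    return importlib.import_module(module_path)


def get_json_module() -> ModuleType:
    # "json" is a stand-in (assumption) for a pipeline stage module path;
    # it is resolved only when this function is called.
    return lazy_import("json")
```

In a real test, the call to `lazy_import` would typically sit inside a fixture or a small getter function so that each test resolves the module on demand.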

Structure

tests/
├── __init__.py
├── factories/                    # Shared test factories
│   ├── __init__.py              # Factory exports
│   ├── app.py                   # FakeApp factory
│   ├── message.py               # Message/query factories
│   ├── provider.py              # FakeProvider factory
│   └── platform.py              # FakePlatform factory
├── integration/                  # Integration tests (real resources)
│   ├── __init__.py
│   ├── api/                     # HTTP API tests
│   │   ├── __init__.py
│   │   └── test_smoke.py        # API smoke tests
│   ├── pipeline/                # Pipeline stage-chain tests
│   │   ├── __init__.py
│   │   └── test_full_flow.py    # Full flow integration
│   └── persistence/             # Database/persistence tests
│       ├── __init__.py
│       └── test_migrations.py   # Alembic migration tests
├── smoke/                        # Smoke tests (quick validation)
│   └── test_fake_message_flow.py
├── unit_tests/                   # Unit tests
│   ├── box/                      # Box module tests
│   ├── config/                   # Configuration tests
│   ├── pipeline/                 # Pipeline stage tests
│   │   └── conftest.py          # Shared fixtures and test infrastructure
│   ├── platform/                 # Platform adapter tests
│   ├── plugin/                   # Plugin system tests
│   │   └── test_handler_actions.py # Action handler tests
│   ├── provider/                 # Provider tests
│   │   ├── test_session_manager.py # SessionManager tests
│   │   └── test_tool_manager.py    # ToolManager tests
│   ├── rag/                      # RAG tests
│   │   └── test_file_storage.py   # File/ZIP storage tests
│   ├── storage/                  # Storage tests
│   │   └── test_s3storage.py      # S3StorageProvider tests
│   ├── vector/                   # Vector tests
│   │   └── test_vdb_filter_conversion.py # VDB filter tests
│   └── telemetry/                # Telemetry tests (rewritten)
├── utils/                        # Test utilities
│   ├── __init__.py
│   └── import_isolation.py      # sys.modules isolation for circular imports
└── README.md                     # This file
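
The tree above describes `tests/utils/import_isolation.py` as providing `sys.modules` isolation. The following is not that file's actual contents, only a sketch of what such isolation could look like; the name `isolated_modules` and its prefix-matching behavior are assumptions.

```python
import sys
from contextlib import contextmanager


@contextmanager
def isolated_modules(*prefixes: str):
    """Restore sys.modules entries matching the given prefixes on exit."""

    def matches(name: str) -> bool:
        return any(name == p or name.startswith(p + ".") for p in prefixes)

    # Remember what was loaded before the block ran.
    saved = {name: mod for name, mod in sys.modules.items() if matches(name)}
    try:
        yield
    finally:
        # Drop anything imported (or stubbed) inside the block...
        for name in [n for n in sys.modules if matches(n)]:
            del sys.modules[name]
        # ...then put back whatever was there before.
        sys.modules.update(saved)
```

Wrapping a test body in `with isolated_modules("some_package"):` keeps stubbed or partially imported modules from leaking into later tests.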

Test Factories

The tests/factories/ package provides reusable test factories:

from tests.factories import (
    FakeApp,          # Mock application
    FakeProvider,     # Fake LLM provider
    FakePlatform,     # Fake platform adapter
    text_query,       # Create text query
    group_text_query, # Create group query
    command_query,    # Create command query
)

# Create fake app
app = FakeApp()

# Create query with text
query = text_query("hello world")

# Create fake provider that returns specific response
provider = FakeProvider().returns("test response")

# Create fake platform for outbound capture
platform = FakePlatform()
await platform.reply_message(query.message_event, reply_chain)
outbound = platform.get_outbound_messages()

See tests/factories/__init__.py for all available factories.

Test Architecture

Fixtures (`conftest.py`)

The test suite uses a centralized fixture system that provides:

MockApplication: Comprehensive mock of the Application object with all dependencies
Mock objects: Pre-configured mocks for Session, Conversation, Model, Adapter
Sample data: Ready-to-use Query objects, message chains, and configurations
Helper functions: Utilities for creating results and common assertions

Design Principles

Isolation: Each test is independent and doesn't rely on external systems
Mocking: All external dependencies are mocked to ensure fast, reliable tests
Coverage: Tests cover happy paths, edge cases, and error conditions
Extensibility: Easy to add new tests by reusing existing fixtures

Running Tests

Quick self-test for developers

For local branch validation without real provider keys:

make test-quick

or, equivalently:

bash scripts/test-quick.sh

This runs:

Ruff lint check
Unit tests
Smoke tests

Suitable for quick validation before committing.

Using the test runner script (recommended for full coverage)

bash run_tests.sh

This script automatically:

Activates the virtual environment
Installs test dependencies if needed
Runs tests with coverage
Generates HTML coverage report

Manual test execution

Run all unit tests

uv run pytest tests/unit_tests/ --cov=langbot --cov-report=xml --cov-report=term

Run specific test module

uv run pytest tests/unit_tests/pipeline/ -v

Run specific test file

uv run pytest tests/unit_tests/pipeline/test_bansess.py -v

Run with coverage

uv run pytest tests/unit_tests/pipeline/ --cov=langbot --cov-report=html

Run specific test

uv run pytest tests/unit_tests/pipeline/test_bansess.py::test_bansess_whitelist_allow -v

Using markers

# Run only unit tests
uv run pytest tests/unit_tests/ -m unit

# Run only integration tests
uv run pytest tests/integration/ -m integration

# Run integration tests excluding slow ones
uv run pytest tests/integration/ -m "not slow" -q

# Skip slow tests
uv run pytest tests/unit_tests/ -m "not slow"

Running integration tests

Integration tests validate real system behavior using local resources such as temporary SQLite databases; fake providers and platforms stand in for external services.

# Run all integration tests (excluding slow ones)
uv run pytest tests/integration/ -m "not slow" -q

# Run SQLite migration integration tests
uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short

# Run API smoke integration tests
uv run pytest tests/integration/api/test_smoke.py -q

# Run pipeline full-flow integration tests
uv run pytest tests/integration/pipeline/test_full_flow.py -q

# Run with verbose output
uv run pytest tests/integration/ -v

Note: Integration tests do not require external services. They use:

Temporary databases (tmp_path) for persistence tests
Fake app/services for API tests (no real provider/platform)
Fake runner/provider for pipeline tests (no real LLM API)
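
To illustrate the temporary-database approach, here is a small sketch that uses only the standard library. The schema and function names are hypothetical and unrelated to LangBot's real tables or its Alembic migrations; `tempfile.TemporaryDirectory` plays the role that pytest's `tmp_path` fixture plays in the actual tests.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def create_schema(db_path: Path) -> None:
    """Create a tiny illustrative schema (hypothetical, not LangBot's tables)."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.commit()


def count_tables(db_path: Path) -> int:
    """Count user tables recorded in the SQLite catalog."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
    return row[0]


def run_in_temp_db() -> int:
    # Each run gets a fresh database file that is deleted afterwards,
    # so tests never share state and need no external database.
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "test.db"
        create_schema(db_path)
        return count_tables(db_path)
```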

Running migration tests locally

SQLite migration tests can be run locally without any external dependencies:

# SQLite migration tests (uses tmp_path, no external DB needed)
uv run pytest tests/integration/persistence/test_migrations.py -q --tb=short

PostgreSQL migration tests require an external PostgreSQL database:

# PostgreSQL migration tests (requires PostgreSQL service)
# Tests are marked as slow and skipped if TEST_POSTGRES_URL is not set
TEST_POSTGRES_URL=postgresql+asyncpg://user:pass@localhost:5432/test_db \
    uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short

# Or skip by default (no PostgreSQL available)
uv run pytest tests/integration/persistence/test_migrations_postgres.py -q --tb=short
# Output: SKIPPED (TEST_POSTGRES_URL not set)

Note: PostgreSQL tests are not included in the fast integration gate because they:

Require external PostgreSQL service
Are marked with @pytest.mark.slow
Need TEST_POSTGRES_URL environment variable
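
The skip-when-unset behavior can be sketched as a small helper. The function name `postgres_skip_reason` is hypothetical; only the `TEST_POSTGRES_URL` variable name and the skip message come from this README, and the real test module may implement the check differently.

```python
import os
from typing import Mapping, Optional

POSTGRES_ENV_VAR = "TEST_POSTGRES_URL"


def postgres_skip_reason(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason when no PostgreSQL URL is configured, else None."""
    url = environ.get(POSTGRES_ENV_VAR, "").strip()
    if not url:
        return f"{POSTGRES_ENV_VAR} not set"
    return None
```

In a pytest module, a helper like this could feed `pytest.mark.skipif(postgres_skip_reason() is not None, reason="TEST_POSTGRES_URL not set")`, alongside `@pytest.mark.slow`.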

The CI workflow .github/workflows/test-migrations.yml runs:

SQLite tests in the test-migrations-sqlite job (fast, no external services)
PostgreSQL tests in the test-migrations-postgres job (uses a PostgreSQL service container)

Running pipeline integration tests locally

Pipeline full-flow integration tests validate real stage interactions:

# Run pipeline integration tests (uses fake runner, no real LLM API)
uv run pytest tests/integration/pipeline/test_full_flow.py -q --tb=short

# Run with coverage for pipeline modules
uv run pytest tests/integration/pipeline \
    --cov=langbot.pkg.pipeline.preproc.preproc \
    --cov=langbot.pkg.pipeline.process.process \
    --cov=langbot.pkg.pipeline.respback.respback \
    --cov-report=term -q

These tests:

Use FakeRunner class to simulate LLM responses without real API calls
Import real PreProcessor, MessageProcessor, SendResponseBackStage stages
Validate stage chain: PreProcessor → Processor → SendResponseBackStage
Test prevent_default, exception handling, and full message flow
Do not require real LLM provider keys
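
The idea of a fake runner driving a stage chain can be sketched as follows. This is a simplified stand-in: the `Query` dataclass, the three stage functions, and this `FakeRunner` implementation are assumptions for illustration and do not reproduce the real classes in `test_full_flow.py`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical stand-in for a pipeline query object."""
    text: str
    responses: list = field(default_factory=list)
    sent: list = field(default_factory=list)


class FakeRunner:
    """Returns a canned reply instead of calling a real LLM API."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []

    async def run(self, query: Query) -> str:
        self.calls.append(query.text)  # record what the runner was asked
        return self.reply


async def preprocess(query: Query) -> Query:
    query.text = query.text.strip()
    return query


async def process(query: Query, runner: FakeRunner) -> Query:
    query.responses.append(await runner.run(query))
    return query


async def send_back(query: Query) -> Query:
    query.sent.extend(query.responses)
    return query


async def run_chain(text: str, runner: FakeRunner) -> Query:
    # Mirrors the PreProcessor -> Processor -> SendResponseBackStage order.
    query = await preprocess(Query(text))
    query = await process(query, runner)
    return await send_back(query)
```

Because the runner records its calls, a test can assert both on what reached the "LLM" and on what was finally sent back.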

Known Issues

Some tests may encounter circular import errors. This is a known issue with the current module structure. The test infrastructure is designed to work around this using lazy imports, but if you encounter issues:

Make sure you're running from the project root directory
Ensure dependencies are installed: uv sync --dev
Try running a simple test first to verify the test infrastructure works

CI/CD Integration

Tests are automatically run on:

Pull request opened
Pull request marked ready for review
Push to PR branch
Push to master/develop branches

The workflow runs tests on Python 3.11, 3.12, and 3.13 to ensure compatibility.

Adding New Tests

1. For a new pipeline stage

Create a new test file test_<stage_name>.py:

"""
<StageName> stage unit tests
"""

import pytest
from langbot.pkg.pipeline.<module>.<stage> import <StageClass>
from langbot.pkg.pipeline import entities as pipeline_entities


@pytest.mark.asyncio
async def test_stage_basic_flow(mock_app, sample_query):
    """Test basic flow"""
    stage = <StageClass>(mock_app)
    await stage.initialize({})

    result = await stage.process(sample_query, '<StageName>')

    assert result.result_type == pipeline_entities.ResultType.CONTINUE

2. For additional fixtures

Add new fixtures to the appropriate conftest.py:

import pytest


@pytest.fixture
def my_custom_fixture():
    """Description of fixture"""
    # create_test_data() is a placeholder for your own setup helper
    return create_test_data()

3. For test data

Use the helper functions in conftest.py:

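from langbot.pkg.pipeline import entities as pipeline_entities
from tests.unit_tests.pipeline.conftest import create_stage_result, assert_result_continue

# sample_query is the fixture provided by conftest.py
result = create_stage_result(
    result_type=pipeline_entities.ResultType.CONTINUE,
    query=sample_query,
)

assert_result_continue(result)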

Best Practices

Test naming: Use descriptive names that explain what's being tested
Arrange-Act-Assert: Structure tests clearly with setup, execution, and verification
One assertion per test: Focus each test on a single behavior
Mock appropriately: Mock external dependencies, not the code under test
Use fixtures: Reuse common test data through fixtures
Document tests: Add docstrings explaining what each test validates
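
A short sketch of the Arrange-Act-Assert layout with a descriptive name and docstring. The function under test, `is_whitelisted`, is hypothetical and exists only to make the example self-contained.

```python
def is_whitelisted(session_id: str, whitelist: set) -> bool:
    """Hypothetical function under test."""
    return session_id in whitelist


def test_whitelisted_session_is_allowed():
    """A session present in the whitelist is allowed."""
    # Arrange: build the inputs
    whitelist = {"group_123", "person_456"}

    # Act: call the code under test once
    allowed = is_whitelisted("group_123", whitelist)

    # Assert: verify a single behavior
    assert allowed is True
```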

Troubleshooting

Import errors

Make sure you've installed the package in development mode:

uv sync --dev

Async test failures

Ensure you're using @pytest.mark.asyncio decorator for async tests.

Mock not working

Check that you're mocking at the right level and using AsyncMock for async functions.
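
The difference between `Mock` and `AsyncMock` can be seen in a small sketch. The `client.complete` method and `fetch_reply` function are hypothetical; the `unittest.mock` calls are from the standard library.

```python
import asyncio
from unittest.mock import AsyncMock, Mock


async def fetch_reply(client) -> str:
    """Hypothetical code under test: awaits an async client method."""
    return await client.complete("hi")


def run_with_async_mock() -> str:
    client = Mock()
    # AsyncMock makes client.complete(...) return an awaitable.
    client.complete = AsyncMock(return_value="ok")
    result = asyncio.run(fetch_reply(client))
    client.complete.assert_awaited_once_with("hi")
    return result


def run_with_plain_mock() -> str:
    # With a plain Mock, client.complete(...) returns another Mock,
    # which cannot be awaited, so the await raises TypeError.
    client = Mock()
    try:
        asyncio.run(fetch_reply(client))
    except TypeError as exc:
        return type(exc).__name__
    return "no error"
```

If a test fails with a `TypeError` about an object that cannot be used in an `await` expression, a plain `Mock` standing in for an async function is a likely cause.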

Future Enhancements

Add integration tests for database migrations (SQLite)
Add PostgreSQL migration integration tests (G-003)
Add integration tests for full pipeline execution
Add API smoke integration tests
Add E2E tests
Add performance benchmarks
Add mutation testing for better coverage quality
Add property-based testing with Hypothesis
