Feat/sandbox (#2072)

* feat: add mcp and skills

* feat: add filter

* feat: modify frontend

* feat(box): add sandbox_exec tool loop for local-agent calculations

* feat(box): add host workspace mounting and sandbox_exec guidance

* feat(box): add BoxProfile with resource limits and improved output truncation

- Implement head+tail output truncation (60/40 split) so the LLM sees both the beginning and the final results; add streaming byte-limited reads in the backend to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define a BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles (default, offline_readonly, network_basic, network_extended) with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit, read_only_rootfs) and pass the corresponding container CLI flags (--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Load the profile from config (box.profile) and apply it in the service layer before BoxSpec validation; locked fields cannot be overridden by tool-call parameters

* feat(box): add obs

* refactor(box): unify box service lifecycle and local runtime management

* refactor(box): remove legacy in-process runtime code and clean up smells

After the architecture settled on always using an independent Box Runtime service, several pieces of compatibility code and design shortcuts were left behind. This commit cleans them up:

- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from production code (moved to a test-only helper).
- Remove the unused `_clip_bytes` method from the backend.
- Remove the `__langbot_session_placeholder__` hack by making `BoxSpec.cmd` default to empty and validating that it is non-empty only in `runtime.execute()`.
- Extract a `get_box_config()` helper to eliminate 5× duplicated config-access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls `box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove the redundant `BOX__RUNTIME_URL` env var from docker-compose (auto-detected by code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add test

* fix: fix box integration test

* feat(box/mcp): integrate MCP stdio with Box sandbox: auto-isolation, dep install, security

## Summary

When Podman/Docker is available, all stdio-mode MCP servers now automatically run inside Box containers with dependency installation, path rewriting, and lifecycle management. When no container runtime exists, LangBot starts normally and stdio MCP falls back to host-direct execution.

## What changed

### MCP stdio → Box integration (mcp.py)

- Add `MCPServerBoxConfig` pydantic model for structured box configuration with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes `.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with the container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run `pip install` inside the container before starting the MCP server
- Copy the project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed-process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in the `finally` block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is set only on success or after all retries are exhausted, not on the first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`

### Box security (security.py, new)

- `validate_sandbox_security()` blocks dangerous host paths: `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`, docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`

### Box models (models.py)

- Add `BoxHostMountMode.NONE`, which skips the volume mount entirely
- Adjust `validate_host_mount_consistency` to allow an arbitrary workdir when `host_path_mode=NONE`

### Box backend (backend.py)

- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE`: skip the `-v` mount arg
- Add `cleanup_orphaned_containers()` to the base class (no-op default) and the CLI implementation (single batched `rm -f` command)

### Box runtime (runtime.py)

- Call `cleanup_orphaned_containers()` during `initialize()` to remove lingering containers from previous runs

### Box service (service.py)

- Graceful degradation: `initialize()` catches runtime errors and sets `available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and `create_session()`: MCP paths are admin-configured and trusted, so they bypass the `allowed_host_mount_roots` restrictions meant for LLM-generated sandbox_exec commands

### Default behavior

- stdio MCP servers automatically use Box when `box_service.available` is True (Podman/Docker detected); no explicit `box` config is needed
- When no container runtime exists, stdio MCP falls back to host-direct execution
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false` (for site-packages), `host_path_mode=ro`, `startup_timeout=120s`

### Tests

- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv unwrap, host_path inference, payload building, runtime info, box availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests

* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests

## Changes

### Precise orphan container cleanup

- The runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` is added to `BaseSandboxBackend` as a no-op default method, removing hasattr duck-typing

### Fine-grained MCP error classification

- New `MCPSessionErrorPhase` enum with 7 phases: session_create, dep_install, process_start, relay_connect, mcp_init, runtime, tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase before re-raising, enabling precise failure diagnosis
- `retry_count` is tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`

### GET /v1/sessions/{id} API

- `BoxRuntime.get_session()` returns session details, including managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl

### stdio defaults to Box when runtime is available

- `_uses_box_stdio()` checks `box_service.available` instead of requiring an explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully and sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to host-direct execution

### Code quality (from /simplify review)

- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached the `_resolve_host_path()` result: computed once, passed through
- Config hash now includes the `host_path` field
- Batched orphan cleanup into a single `rm -f` command

### Session leak fix

- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s finally block, covering all exit paths (normal shutdown, error, retry, final failure)

### Integration tests

- 6 end-to-end tests covering managed process lifecycle, WebSocket stdio bidirectional IO, session cleanup verification, single session query, process exit detection, and orphan cleanup safety

* refactor: use rpc

* fix: import

* refactor(box): clean up sandbox subsystem code quality and efficiency

- Fix O(n²) stderr trimming in runtime.py with a running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task, unused config_hash computation, unused imports
- Deduplicate the connection callback in BoxRuntimeConnector and parse the URL once
- Use enum comparison instead of the stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache the NativeToolLoader tool definition and sandbox system guidance
- Extract an _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining it
- Add a JSON startswith guard in logging_utils to skip futile json.loads calls
- Fix ruff lint errors (F401 unused imports, F841 unused variables)

* fix: ruff

* refactor(sandbox): keep box logic out of pipeline and localagent

- Move sandbox system-prompt guidance from LocalAgentRunner into BoxService.get_system_guidance() so all box domain knowledge stays in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into the MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were unrelated to this feature.
- Skip two test files affected by a pre-existing circular import (runner ↔ app) until the import cycle is resolved in a separate PR.
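The host-path checks described for `security.py` (blocked roots plus subpath rejection), together with the shared `_is_path_under()` helper, can be sketched roughly as below. This is a hypothetical, lexical-only version: `BLOCKED_HOST_PATHS`, `normalize_host_path` and `validate_host_path` are illustrative names, the two socket locations are assumed common defaults, and rejecting ancestors of a blocked path (such as `/` or `/var`) is an extra assumption, not documented behavior of `validate_sandbox_security()`.

```python
import posixpath
from pathlib import PurePosixPath

# Roots listed in the commit message. The two socket paths are ASSUMED
# common defaults; the real SDK list may differ.
BLOCKED_HOST_PATHS = (
    "/etc", "/proc", "/sys", "/dev", "/root", "/boot", "/run",
    "/var/run/docker.sock",
    "/run/podman/podman.sock",
)


def is_path_under(path: str, root: str) -> bool:
    """True if `path` equals `root` or is lexically nested beneath it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def normalize_host_path(host_path: str) -> str:
    """Collapse '.', '..' and repeated slashes without touching the filesystem."""
    if not host_path.startswith("/"):
        raise ValueError(f"host path must be absolute: {host_path!r}")
    # posixpath.normpath keeps a leading '//' as-is, so strip and re-add one '/'.
    return "/" + posixpath.normpath(host_path).lstrip("/")


def validate_host_path(host_path: str) -> str:
    """Return the normalized path, or raise ValueError if it is unsafe to mount."""
    path = normalize_host_path(host_path)
    for blocked in BLOCKED_HOST_PATHS:
        # Reject the blocked path itself and anything inside it ...
        if is_path_under(path, blocked):
            raise ValueError(f"{host_path!r} is inside blocked path {blocked!r}")
        # ... and (extra assumption of this sketch) any ancestor that would
        # expose it, such as '/' or '/var'.
        if is_path_under(blocked, path):
            raise ValueError(f"{host_path!r} would expose blocked path {blocked!r}")
    return path
```

Comparing `PurePosixPath` parents instead of raw string prefixes keeps `/etcetera` from matching `/etc`. A production check would probably also resolve symlinks on the host, which this sketch deliberately does not do.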
* fix: ruff

* refactor(box): move box runtime to langbot-plugin-sdk

Extract the self-contained box runtime modules (actions, backend, client, errors, models, runtime, security, server) to langbot-plugin-sdk and update all imports to use `langbot_plugin.box.*`. Keep only service and connector in LangBot core, as they depend on the Application context.

- Update docker-compose to use the `langbot_plugin.box.server` entry point
- Update pyproject.toml to use the local SDK via `tool.uv.sources`
- Remove migrated source files and their unit/integration tests
- Update remaining test imports to match the new module paths

* fix: ruff

* feat: enhance sandbox api

* refactor(box): derive paths from shared host root

* fix(box): tighten sandbox exposure and restore box integration coverage

* refactor(types): remove quoted annotations under postponed evaluation

* feat(box): unify native agent tools around exec/read/write/edit

* chore(sandbox): move MCP loader changes to follow-up branch

* feat(box): add session workspace quota enforcement and SDK quota metadata

* feat(skills): add Agent Skills management system (#1917)

* feat(skills): add Agent Skills management system

Implement a comprehensive skills management feature inspired by the agentskills spec:

Backend:
- Add Skill and SkillPipelineBinding database entities
- Add database migration (dbm018) for skills tables
- Implement SkillManager for skill loading, matching, and resolution
- Implement SkillService for CRUD operations
- Add skills API endpoints for skill and pipeline binding management
- Integrate skill index injection into the pipeline preprocessor
- Add skill activation detection in LocalAgentRunner

Frontend:
- Add Skills page with listing, search, and type filter
- Add SkillDetailDialog for create/edit with preview
- Add SkillCard and SkillForm components
- Add skills API methods to BackendClient
- Add skills entry to sidebar navigation
- Add i18n translations (en-US, zh-Hans)

Features:
- Support skill and workflow types
- Sub-skill composition via {{INVOKE_SKILL: name}} syntax
- Progressive disclosure (index in prompt, full instructions on activation)
- Pipeline-specific skill bindings with priority

* fix: resolve cherry-pick conflicts for agentskills onto sandbox

- Remove non-existent external_kb service import
- Add skill_mgr mock to localagent sandbox_exec tests
- Keep database version at 24 (sandbox branch's latest)

* feat(skills): upgrade to package-backed skills with sandbox execution

Evolve the skills system from purely prompt-based to package-backed, with sandbox tool execution support:

- Add source_type/package_root/entry_file/skill_tools fields to the Skill entity
- SkillManager loads SKILL.md from local package directories
- SkillToolLoader as the 4th dispatch layer in ToolManager (query-scoped)
- LocalAgent injects skill tools into use_funcs on skill activation
- BoxService.execute_skill_tool() runs scripts in the sandbox (ro mount, env params)
- Skill tool names are auto-namespaced as skill__{skill}__{tool}
- API validation for the package_root allowlist and entry path traversal
- Frontend source_type toggle, package_root input, skill_tools editor
- Migration renumbered to 025 with an ALTER TABLE fallback for existing DBs
- Fix unclosed limitation section in i18n files
- Fix skills API methods misplaced outside the BackendClient class

* fix: test info

* feat(skills): switch skills to package-backed storage and add import tooling

- Converge skills from the inline/package dual track to package-first
- Write instructions to, and read them from, SKILL.md
- Add local directory scanning and skill installation from GitHub
- Frontend: merge skills into the plugins page; add SkillsComponent and a GitHub import dialog
- Skill form: drop the source_type / type filters in favor of directory scanning
- Change the Box skill tool mount mode from ro to rw
- Update tests and Chinese/English copy accordingly

* feat: simplify langbot skill create and import

* refactor(skills): clean up legacy skill API and harden activation flow

* refactor(skills): remove skill dependency expansion and add skill_get

* fix: lint

* fix: delete

* fix(skills): align tool manager loader initialization

* refactor: remove sandbox execute skill

* fix(skills): hide activation markers and isolate skill activation flow

* refactor(skills): switch skill model to filesystem-backed packages

* refactor(skills): unify runtime skill access around filesystem paths

* feat(skills): align rw package design and fix skill activation, visibility, and lint issues

* refactor(skills): replace rich authoring API with import/reload flow and update Box design doc

* refactor(plugins): simplify GitHub install flow to default master archive

* revert(api): restore plugin GitHub import flow in plugins controller

* Improve data-root handling and skill install previews

* Add managed skill authoring tools for local agents

* Refactor the skills UI around sidebar detail pages

* Document why managed skill authoring tools bypass box

* fix: lint

* feat(web): refactor plugin/skill install flows and fix skills page

- Fix sidebar skill icon
- Add skills route and error page component
- Refactor plugin GitHub install from a dialog modal to an inline card
- Add skill install dropdown menu (create/upload/github) in the sidebar
- Wire sidebar → skills page communication via the pendingSkillInstallAction context
- Add i18n keys for the error page and skill install actions

* fix(web): persist sidebar collapsible section open state on navigation

Sections opened via sub-item navigation now retain their expanded state when the user switches to a different section, instead of collapsing because the isActive fallback becomes false.
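The two validations listed under the package-backed skills commit, namespacing skill tools as `skill__{skill}__{tool}` and rejecting entry paths that traverse out of the package root, could look roughly like this. The helper names and the naming rule (alphanumeric segments joined by a single `-` or `_`, so a name can never contain the `__` separator) are assumptions of this sketch, not the actual SkillManager or API code.

```python
import posixpath
import re
from typing import Optional, Tuple

SKILL_TOOL_PREFIX = "skill__"
# ASSUMED naming rule: alphanumeric runs joined by single '-' or '_', so a
# name can never contain (or end in) the '__' separator.
_NAME_RE = re.compile(r"[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*")


def namespaced_tool_name(skill: str, tool: str) -> str:
    """Build skill__{skill}__{tool}, rejecting names that would be ambiguous."""
    for part in (skill, tool):
        if not _NAME_RE.fullmatch(part):
            raise ValueError(f"invalid skill/tool name: {part!r}")
    return f"{SKILL_TOOL_PREFIX}{skill}__{tool}"


def split_tool_name(name: str) -> Optional[Tuple[str, str]]:
    """Inverse of namespaced_tool_name; returns None for non-skill tools."""
    if not name.startswith(SKILL_TOOL_PREFIX):
        return None
    skill, sep, tool = name[len(SKILL_TOOL_PREFIX):].partition("__")
    if not sep or not skill or not tool:
        return None
    return skill, tool


def resolve_entry_path(package_root: str, entry_file: str) -> str:
    """Join entry_file onto package_root, refusing anything that escapes it.

    Purely lexical (POSIX paths, no symlink resolution).
    """
    if posixpath.isabs(entry_file):
        raise ValueError(f"entry path must be relative: {entry_file!r}")
    root = posixpath.normpath(package_root)
    candidate = posixpath.normpath(posixpath.join(root, entry_file))
    # commonpath compares whole components, so '/a/weatherx' is not under '/a/weather'.
    if candidate == root or posixpath.commonpath([root, candidate]) != root:
        raise ValueError(f"entry path escapes package root: {entry_file!r}")
    return candidate
```

Disallowing `__` inside names keeps the namespaced form reversible with a single `partition`, so a tool call can be routed back to its skill without a lookup table.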
---------

Co-authored-by: youhuanghe <1051233107@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>

* feat(sandbox): add MCP box integration on top of sandbox base (#2083)

* refactor(mcp): extract box stdio runtime helper

* refactor(box): introduce reusable workspace session helper

* refactor(box): run Box Runtime as subprocess inside LangBot container

Remove the separate langbot_box_runtime Docker service. Box Runtime now always launches as a local stdio subprocess, regardless of whether LangBot runs in Docker. The WebSocket transport path is kept only for explicit runtime_url configuration (remote deployment).

This simplifies deployment by eliminating cross-container path mapping and network hops. Box Runtime is a pure scheduling process (it talks to the Docker socket / nsjail); it does not execute user code or touch the filesystem, so container isolation is unnecessary, unlike for Plugin Runtime.

* fix(web): prevent first-emission snapshot from swallowing unsaved changes in pipeline editor

When switching runners (e.g. local-agent → n8n), the newly mounted stage's first emit would re-capture the saved snapshot, erasing the dirty state caused by the runner change, so the save button would incorrectly go dim.
- Skip snapshot re-capture in handleDynamicFormEmit when the form is already dirty
- Add a mount-time emit to N8nAuthFormComponent (matching DynamicFormComponent)
- Use a stable onSubmitRef to prevent useEffect subscription churn
- Add a previousInitialValues guard to prevent initialValues echo loops

* style(web): align plugin list header button heights

* docs(review): update Box architecture review documents

Replace the old review docs with 5 focused documents:
- box-architecture.md: deep architecture analysis (LangBot + SDK)
- box-issues.md: 22 issues rated P0/P1/P2
- box-test-coverage.md: test coverage analysis
- box-tob-analysis.md: toB commercialization analysis
- box-vs-plugin-runtime.md: Box vs Plugin runtime comparison

* feat(web): improve login error layout and add Terms of Service link

- Improve the backend connection error display with a bordered container, inline icon, and better visual hierarchy
- Extract the actual error message from the axios response object
- Add a Terms of Service link (https://langbot.app/terms) to the login footer
- Add the termsOfService i18n key for all 7 locales

* refactor(web): replace all hardcoded SVG icons with lucide-react

Unify icon usage across the entire frontend by replacing 67 hardcoded SVG icons with lucide-react components across ~25 files. This improves consistency and maintainability and reduces bundle duplication.

Key replacements:
- Sidebar nav: Zap, LayoutDashboard, Bot, Workflow, BookMarked, etc.
- MCP forms: Loader2, XCircle, Trash2
- Monitoring: Sparkles, MessageSquare, CheckCircle2, RefreshCw, etc.
- Cards: Clock, Star, Workflow, Hexagon, Puzzle, Github, etc.
- Misc: Paperclip, AudioLines, CloudUpload, Layers, Heart, Smile

Zero hardcoded <svg> tags remain in .tsx files.

* fix(web): stop polling plugin tasks when no active installs

The PluginInstallTaskProvider was unconditionally polling getAsyncTasks every 3s on all /home/* routes. Now it syncs once on mount and starts periodic polling only when there are active (non-terminal) install tasks.

* fix(deps): update langbot-plugin version and add new dependencies

* refactor: use Space API for release checks and stop idle polling

- version.py: switch the release list API from GitHub to space.langbot.app, remove unused in-place update logic (update_all, compare_version_str), translate all comments/logs to English
- PluginInstallTaskContext: only poll when active install tasks exist

* feat(box): add --standalone-box flag and 3-way transport decision for Box runtime

Align the Box runtime connection logic with the Plugin runtime's pattern:
- Docker: WebSocket to the langbot_box container (ws://langbot_box:5411)
- --standalone-box: WebSocket to an external Box process (ws://localhost:5411)
- Windows: subprocess + WebSocket (workaround for the async stdio limitation)
- Unix/macOS: subprocess + stdio pipe (unchanged)

BoxRuntimeConnector now inherits ManagedRuntimeConnector for subprocess lifecycle reuse. Add the langbot_box service to docker-compose.yaml.

* refactor(box): use single port with path-based routing for Box WS

Update the connector to use ws://host:5410/rpc/ws instead of ws://host:5411. Update the review docs to reflect the single-port architecture.

* feat(web): show Box runtime status in plugin debug info popover

Add a Box status section to the debug info popover on the plugin list page, displaying connection status, backend info, profile, active sessions, and recent error count. It is fetched from GET /api/v1/box/status in parallel with the plugin debug info. Includes i18n for all 8 supported languages.

* fix(web): remove ephemeral sandbox count from Box status display

The active_sessions count reflects transient sandbox containers that expire after 5 minutes of inactivity, which makes it misleading in the UI. Keep only connection status, backend, profile, and error count.
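The transport choice from the `--standalone-box` commit, combined with the later single-port change (which presumably turns the Docker and standalone URLs into `ws://langbot_box:5410/rpc/ws` and `ws://localhost:5410/rpc/ws`) and the explicit `runtime_url` override from the earlier subprocess commit, can be sketched as a pure function. Everything here is hypothetical: `choose_box_transport` and `BoxTransport` are illustrative names, the order of checks is assumed, and the Windows subprocess port is assumed to match the others.

```python
import sys
from dataclasses import dataclass
from typing import Optional

BOX_WS_PORT = 5410       # single port after the path-based routing change
BOX_WS_PATH = "/rpc/ws"


@dataclass(frozen=True)
class BoxTransport:
    kind: str                 # "websocket" or "stdio"
    spawn_subprocess: bool    # whether LangBot launches the Box runtime itself
    url: Optional[str] = None


def _ws_url(host: str) -> str:
    return f"ws://{host}:{BOX_WS_PORT}{BOX_WS_PATH}"


def choose_box_transport(
    *,
    runtime_url: Optional[str] = None,
    in_docker: bool = False,
    standalone_box: bool = False,
    platform: str = sys.platform,
) -> BoxTransport:
    """Hypothetical decision helper; the check order is an assumption."""
    if runtime_url:
        # Explicit remote deployment wins over auto-detection.
        return BoxTransport("websocket", False, runtime_url)
    if in_docker:
        # Separate langbot_box service in docker-compose.
        return BoxTransport("websocket", False, _ws_url("langbot_box"))
    if standalone_box:
        # --standalone-box: an externally started Box process on this host.
        return BoxTransport("websocket", False, _ws_url("localhost"))
    if platform.startswith("win"):
        # Windows: spawn the runtime but talk over WebSocket instead of async stdio.
        return BoxTransport("websocket", True, _ws_url("localhost"))
    # Unix/macOS: spawn the runtime and speak RPC over its stdin/stdout.
    return BoxTransport("stdio", True)
```

Keeping the decision free of I/O makes every branch testable by passing flags, for example `choose_box_transport(platform="darwin")` selects the stdio subprocess path.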
* feat(box): configurable sandbox scope and unified skill containers

Replace the per-message session_id with a template-based system, configurable per pipeline via 'Sandbox Scope' in the local-agent panel. The default scope is per-chat ({launcher_type}_{launcher_id}).

Unify skill exec into the same container as default exec: skills are mounted at /workspace/.skills/{name}/ via extra_mounts instead of getting separate containers. All pipeline-bound skills are injected at container creation time.

- Add box-session-id-template to pipeline metadata (select, 4 options, 8 languages)
- Add resolve_box_session_id() and build_skill_extra_mounts() to BoxService
- Rewrite the native.py skill exec path to use execute_tool with a shared session
- Update tests for the new session_id format
- Add design doc: docs/review/box-session-scope.md

* feat(web): show active sandbox details in Box status popover

Display the sandbox count and a detailed list of active sessions, including session ID, image, backend, resources (CPU/memory), network mode, and last used time. Fetched from GET /api/v1/box/sessions in parallel. Includes i18n for all 8 supported languages.

* feat(box): add startup and availability logging for sandbox tools

Log the Box runtime initialization result (success with profile info, or a failure warning). Log native tool availability at ToolManager startup so it is immediately clear whether the exec/read/write/edit tools are registered for the LLM.

* feat(box): support custom sandbox container image via config.yaml

Add an 'image' field to the box config section. When set, it overrides the profile default image (python:3.11-slim) for all sandbox containers. Priority: caller-specified > config.yaml image > profile default.

* feat(box): add heartbeat and reconnection for Box runtime connector

Add a 20-second heartbeat ping loop to detect silent Box runtime disconnections. On disconnect, set available=false and attempt reconnection after 3 seconds via the disconnect callback chain.
- BoxRuntimeConnector: heartbeat loop, disconnect callback parameter, disconnect detection in connection callback and WS failure handler - BoxService: wire disconnect callback to toggle available state and re-initialize the connector on reconnection * feat(web): move runtime status to dashboard, clean up plugin debug popover Add SystemStatusCards component to the monitoring dashboard showing Plugin Runtime and Box Runtime connection status with details (backend, profile, sandbox count). Remove all Box/session status from the plugin page debug popover — it now only shows debug URL and key. Includes i18n for all 8 supported languages. * refactor(web): compact system status into a single card alongside metrics Replace the separate two-card row with a single compact 'System Status' card placed as the 5th column in the metrics grid. Shows green/red dots for Plugin Runtime and Box Runtime. Click to expand a popover with connection details (backend, profile, sandbox count). * feat: show connector error details for Plugin and Box runtime status Record Box connector error in BoxService and expose it as 'connector_error' in GET /api/v1/box/status when unavailable. Display error messages in the dashboard System Status popover for both Plugin Runtime (plugin_connector_error) and Box Runtime (connector_error) when they are disconnected. * fix(web): auto-refresh system status and show disconnect errors in real time Poll Plugin Runtime and Box Runtime status every 30 seconds so the dashboard reflects disconnections without a manual page refresh. Also re-fetch when the popover is opened for immediate feedback. * fix(box): handle RPC failure in get_status/get_sessions gracefully When the Box runtime disconnects, there is a race between the heartbeat flipping _available=false and the frontend polling get_status(). 
If the poll arrived first, client.get_status() threw a ConnectionClosedError, which propagated as a 500 and caused the frontend to show a grey dot (null status) instead of a red dot with error details. Now get_status() catches RPC errors and returns available=false with the exception message as connector_error. get_sessions() returns an empty list when unavailable or on RPC failure. * fix(box): add persistent reconnection loop with exponential backoff The previous disconnect handler only retried once and then gave up. It now spawns a background task that retries with exponential backoff (3s, 6s, 12s, ... up to 60s) until the Box runtime is reachable again. A _reconnecting guard prevents duplicate loops, and connector.dispose() is called before each retry to clean up stale tasks. * fix(box): detect disconnect when handler.run() returns normally The generic Handler.run() catches ConnectionClosedError and breaks out of its loop (normal return) instead of raising, because it has no disconnect_callback. The old code only triggered reconnection in the except branch, so a clean WebSocket close was never detected. Now treat handler.run() returning normally (after a successful handshake) as a disconnect event, triggering the reconnection callback. * fix(web): refresh system status card when clicking Refresh Data button Pass a refreshKey prop through OverviewCards to SystemStatusCard that increments on each Refresh Data click, triggering a re-fetch of Plugin and Box runtime status alongside the monitoring data refresh. * fix(web): fix system status card stuck in loading state fetchStatus(showLoading=false) never called setLoading(false), so the initial loading=true was never cleared. Simplify to always call setLoading in the finally block; the spinner only shows on the very first load because subsequent fetches complete near-instantly.
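The reconnection behavior in the commit above has three parts: exponential backoff of 3s, 6s, 12s, ... capped at 60s; a `_reconnecting` guard against duplicate loops; and a `dispose()` before each retry. It can be sketched with asyncio. `Reconnector`, `connect`, and `dispose` are hypothetical stand-ins for the BoxService/connector pair. `sleep` is injectable only so the loop can be tested without real delays.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Optional


def backoff_delays(initial: float = 3.0, cap: float = 60.0) -> Iterator[float]:
    """Yield 3, 6, 12, ... seconds, doubling each time and never exceeding `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


class Reconnector:
    """Illustrative stand-in for the reconnect logic; all names are hypothetical."""

    def __init__(
        self,
        connect: Callable[[], Awaitable[bool]],
        dispose: Callable[[], Awaitable[None]],
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self._connect = connect
        self._dispose = dispose
        self._sleep = sleep
        self.available = True
        self._reconnecting = False  # guard: at most one reconnect loop at a time
        self._task: Optional[asyncio.Task] = None

    def on_disconnect(self) -> None:
        """Disconnect callback; must be called from inside a running event loop."""
        self.available = False
        if self._reconnecting:
            return
        self._reconnecting = True
        self._task = asyncio.get_running_loop().create_task(self._reconnect_loop())

    async def _reconnect_loop(self) -> None:
        try:
            for delay in backoff_delays():
                await self._sleep(delay)
                await self._dispose()  # clean up stale tasks before each retry
                if await self._connect():
                    self.available = True
                    return
        finally:
            self._reconnecting = False
```

The generator keeps the delay schedule separate from the loop, so the cap can be checked without touching any connection code.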
* feat(web): show active sandbox details in dashboard Box status popover Fetch box sessions alongside status and display each active sandbox in the popover with session ID, image, resources (CPU/memory), and last used time. * feat(box): add global sandbox scope option Add a 'Global (shared by all)' option to the sandbox scope selector. Uses a constant '{global}' template variable that always resolves to 'global', so all users and chats share one sandbox container. * refactor(web): replace popover with dialog for system status details Replace the dropdown popover with a proper Dialog for runtime status details. Add a small info button on the System Status card that opens the dialog. Session details now show in a spacious 2-column grid layout with full image name, backend, CPU/memory, network, mount path, and created/last-used timestamps. * fix(web): widen system status dialog and fix scroll border issue Use max-w-2xl (matching other dialogs) instead of max-w-lg. Move overflow-y-auto to an inner container with overflow-hidden on DialogContent to prevent padding bleed at scroll edges. * feat(web): add tooltips for truncated fields in system status dialog Wrap session_id, image, and mount path fields with Tooltip components so hovering over truncated text shows the full value. * feat: add download button * feat: successfully install * feat: delete old filter * feat: optimize frontend * fix: align box runtime launch args * feat: translate * feat: refactor market * feat: optimize frontend * chore: rename extension zh translation * feat(extensions): unify extensions endpoint and refresh extensions page UX - Rename /home/plugins route to /home/extensions and update all sidebar links. - Add unified GET /api/v1/extensions returning plugins, MCP servers and skills, sorted by name; replace the three separate frontend fetches with this single call.
- Migrate the extensions page to shadcn primitives (Tabs/Card/Alert/Badge/Skeleton/ Switch/Label) and clean up hardcoded color tokens on the extension card. - Add a localStorage-persisted "Group by type" switch that, when enabled in the All Types tab, renders extensions grouped by type with a compact section header. - Show a spinner while loading and rename the empty-state copy from "No plugins installed" to "No extensions installed". - Rename the "格式 / Formats" filter label to "类型 / Types" across all 8 locales. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extensions): fallback lucide icon when extension icon is missing Render a tinted lucide icon (Puzzle / Server / Sparkles) on the extension card when the icon URL is empty or the image fails to load. Picked icons distinct from EventListener (AudioWaveform) and KnowledgeEngine (Book) to avoid visual collision with plugin component badges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sidebar): unify installed-extensions list with plugins, MCP and skills - Render plugins, MCP servers and skills together under the "Installed Extensions" sidebar entry, alphabetically sorted to match the list page. - Resolve per-item routes by extension type (plugin -> /home/extensions, mcp -> /home/mcp, skill -> /home/skills) and gate the plugin-only hover context menu on extensionType === 'plugin'. - Lift the "group by type" toggle into SidebarDataContext (still persisted in localStorage) so the sidebar groups items with section headers whenever the list page has the toggle enabled. - Show lucide fallback icons (Server / Sparkles / Puzzle) tinted in the LangBot blue for MCP, skill, and missing-icon plugin items, overriding the SidebarMenuSubButton svg color rule. 
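The sidebar logic above is frontend code, but its core rules fit in a few lines: one alphabetical list across plugins, MCP servers and skills, a route per extension type, and optional grouping by type. This is a language-neutral sketch in Python. `Extension`, `sidebar_sections`, the case-insensitive sort, and the plugin/MCP/skill section order are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Route per extension type, as listed in the sidebar commit.
ROUTES = {"plugin": "/home/extensions", "mcp": "/home/mcp", "skill": "/home/skills"}
TYPE_ORDER = ("plugin", "mcp", "skill")  # assumed section order when grouping


@dataclass(frozen=True)
class Extension:
    name: str
    extension_type: str  # "plugin", "mcp" or "skill"


def route_for(ext: Extension) -> str:
    return ROUTES[ext.extension_type]


def sidebar_sections(
    extensions: Iterable[Extension], group_by_type: bool
) -> List[Tuple[Optional[str], List[Extension]]]:
    """Alphabetical list, optionally split into per-type sections."""
    items = sorted(extensions, key=lambda e: e.name.lower())
    if not group_by_type:
        return [(None, items)]  # one untitled section
    return [
        (t, [e for e in items if e.extension_type == t])
        for t in TYPE_ORDER
        if any(e.extension_type == t for e in items)
    ]
```

Sorting once and then filtering per type preserves alphabetical order inside each section. The flat list page and the grouped view therefore agree.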
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extensions): mobile-friendly layout for extensions and add-extension pages - Stack the extensions page header vertically on small screens, let the filter Tabs scroll horizontally if they overflow, hide the debug button label below sm and let the install/debug controls wrap. - Constrain the debug popover and its inputs to the viewport width so they no longer overflow on phone-sized screens. - Drop the card grid from a fixed 30rem column to a min(100%, 22rem) column at base / 28rem at sm, and reduce the gap, so cards render cleanly at 360px+ widths in both flat and grouped views. - Make the add-extension header actions wrap on lg- viewports and the install dialog responsive instead of a hard 500px box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: change ui * feat: delete version for mcp and skills * fix: constrain home page content width * fix: preserve monitoring card borders under sticky filters * fix(box): restore sandbox config and shared mcp runtime * fix(box): harden sandbox session isolation * fix(skill): remove auto activation setting * feat(skill): align skill system with Claude Code's Tool Call design - Replace text marker activation with `activate` tool (Tool Call mechanism) - Replace 7 authoring tools with 2: `activate` + `register_skill` - Add builtin skills loading from templates/skills/ - Add create-skill as first builtin skill - Remove SKILL_ACTIVATION_MARKER and text detection methods - Tool Result returns SKILL.md content (protects KV Cache) This aligns with Claude Code's progressive disclosure pattern: - Metadata (name+description) always visible in tool description - SKILL.md body loaded on activate via Tool Call - Bundled resources accessible through virtual path mapping Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(tools): add glob and grep native sandbox tools Add file discovery and content search capabilities to the 
sandbox: - glob: Find files by pattern (supports ** recursive matching) - grep: Search file contents with regex patterns Both tools respect skill package paths and include safety limits (max 100 files for glob, max 200 matches for grep). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * feat(skill): add skill file browsing capability - Add API endpoints for listing/reading/writing skill files - Add FileTree component in SkillForm for directory browsing - Users can now view scripts/, references/, assets/ directories - Files can be selected and edited in the instructions textarea - Add translations for new file browsing features Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(skill): copy builtin skills to data/skills on startup - Builtin skills (templates/skills/) are now copied to data/skills/ - Users can view and manage builtin skills in the UI - Rename SkillAuthoringToolLoader to SkillToolLoader Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(skill): improve file browsing and fix path handling - Fix nested directory display in skill file tree (preserve root entries) - Fix file content display when clicking files in skill browser - Add skill manager and tool manager as proper package modules - Separate fileContent state to allow editing non-SKILL.md files Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(toolmgr): correct skill_tool_loader attribute name Rename skill_authoring_tool_loader to skill_tool_loader in execute_func_call and shutdown methods to match the attribute defined in initialize(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(native): update tool descriptions to use register_skill Replace references to removed import_skill_from_directory with register_skill in exec/write/edit tool descriptions. 
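The glob and grep tools described above pair a pattern search with hard result caps of 100 files and 200 matches. This is a host-side sketch using `pathlib` and `re`. The real tools run inside the sandbox and respect skill package paths. `glob_files` and `grep_files` are hypothetical names.

```python
import re
from pathlib import Path
from typing import List, Tuple

MAX_GLOB_FILES = 100    # limit stated in the commit message
MAX_GREP_MATCHES = 200  # limit stated in the commit message


def glob_files(root: str, pattern: str, limit: int = MAX_GLOB_FILES) -> List[str]:
    """Return up to `limit` file paths (relative to root) matching `pattern`.

    pathlib treats `**` as "this directory and all subdirectories".
    """
    base = Path(root)
    results: List[str] = []
    for path in sorted(base.glob(pattern)):
        if path.is_file():
            results.append(path.relative_to(base).as_posix())
            if len(results) >= limit:
                break
    return results


def grep_files(root: str, regex: str, limit: int = MAX_GREP_MATCHES) -> List[Tuple[str, int, str]]:
    """Return (relative_path, line_number, line) for lines matching `regex`."""
    compiled = re.compile(regex)
    base = Path(root)
    matches: List[Tuple[str, int, str]] = []
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if compiled.search(line):
                matches.append((path.relative_to(base).as_posix(), lineno, line))
                if len(matches) >= limit:
                    return matches  # stop early once the cap is hit
    return matches
```

Capping inside the loop, rather than truncating afterwards, also bounds the work done on very large workspaces.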
* feat(toolmgr): enhance tool initialization with backend availability checks * refactor: remove unused imports and clean up code in various files * feat: polish extension detail pages * feat: persist sidebar list expansion * fix: refine extension ui and backend errors * fix: align add extension marketplace ui * feat: manage skills through box runtime * feat: support github skill installation * fix: import github skill directories * feat: install market extensions from card click * feat(web): improve skill import flow * feat: polish extension import flow * fix(mcp): stabilize shared box managed processes * fix(web): improve backend retry and sidebar scrolling * docs(review): refresh box architecture review for feat/sandbox Sync the docs/review/ suite to the current state of the feat/sandbox branch (both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review. - box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b} config schema, add E2B backend, 6 native tools (incl. 
glob/grep), Skill Tool Call activation, shared multi-process MCP container, SkillManager, BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect - box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail image conflict, frontend monitoring card) into a Resolved section; add new P0 (INIT/backend ordering), P1 (extra_mounts immutability after container creation), P2 (skill_store test gap, integration tests not in CI) - box-session-scope.md: add §0 Implementation Status: Phase 1 shipped, and MCP unification landed earlier than originally scoped - box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC), add 7 new test files including SDK backend_selection/e2b/skill_store - box-tob-analysis.md: connection recovery now meets basic requirements; add E2B and backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat - box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned with Plugin Runtime; revise remaining gaps (WS auth, shared base class) * refactor(box): use unified env-override mechanism for box.local config The box module hand-rolled its own LANGBOT_BOX_LOCAL_* env parsing in two places (connector._get_box_config and service._local_config), duplicating logic that LoadConfigStage._apply_env_overrides_to_config already provides generically via the SECTION__SUBSECTION__KEY convention.
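The generic convention referenced here maps an environment variable such as `BOX__LOCAL__ALLOWED_MOUNT_ROOTS` onto the nested `box.local` config section. This is a minimal sketch that assumes lower-cased keys and raw-string values. `apply_env_overrides` and `load_mount_roots` are hypothetical and are not `LoadConfigStage._apply_env_overrides_to_config` itself, which may restrict or coerce values differently.

```python
from typing import Any, Dict, List, Mapping


def apply_env_overrides(config: Dict[str, Any], environ: Mapping[str, str]) -> Dict[str, Any]:
    """Hypothetical SECTION__SUBSECTION__KEY override pass.

    BOX__LOCAL__ALLOWED_MOUNT_ROOTS=/a,/b ends up as
    config["box"]["local"]["allowed_mount_roots"] == "/a,/b" (a raw string).
    """
    for env_key, raw_value in environ.items():
        parts = [p.lower() for p in env_key.split("__")]
        if len(parts) < 2 or any(not p for p in parts):
            continue  # not in SECTION__KEY form (e.g. HOME, or names starting with "__")
        *sections, leaf = parts
        node = config
        for section in sections:
            child = node.get(section)
            if not isinstance(child, dict):
                child = {}  # create missing sections on the fly
                node[section] = child
            node = child
        node[leaf] = raw_value  # no type coercion in this sketch
    return config


def load_mount_roots(value: Any) -> List[str]:
    """Accept a YAML list or a comma-separated string from an env override."""
    if value is None:
        return []
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    return [str(part) for part in value]
```

Because a freshly created key arrives as a plain string, any consumer that expects a list needs a tolerant loader like `load_mount_roots`.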
- Drop the bespoke LANGBOT_BOX_LOCAL_* parsing; read box.local straight from instance_config (the unified BOX__LOCAL__* overrides are already applied before BoxService initializes) - Harden _load_allowed_mount_roots to accept a comma-separated string, since the generic mechanism stores a freshly-created key as a raw string when config.yaml has no box.local.allowed_mount_roots entry - docker-compose: rename the langbot container env vars to BOX__LOCAL__* (the canonical convention); remove them entirely from the langbot_box container — the Box runtime never reads box.local from env/config.yaml, it is configured via the INIT RPC action Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: repair stale skill/sandbox tests for feat/sandbox The skill subsystem moved to Tool-Call activation and a Box-managed skill store; several tests still asserted removed APIs and a sys.modules stub leaked across the suite. Full unit suite now green (was 23 failing). - test_skill_tools: drop TestSkillManagerActivation (text-marker API removed); rewrite TestSkillActivationHelper around the current skill.activation.register_activated_skill; replace the CRUD TestSkillAuthoringToolLoader with TestSkillToolLoader covering the current activate/register_skill tools and sandbox-availability gating - test_tool_manager_native: ToolManager attr is skill_tool_loader (not skill_authoring_tool_loader); native loader now exposes 6 tools (exec/read/write/edit/glob/grep) and requires initialize() with a backend-available get_status() - test_localagent_sandbox_exec: remove obsolete activation-marker leakage tests and their helper providers - test_model_service / pipeline conftest: give the mocks skill_mgr=None so PreProcessor's local-agent skill-binding guard short-circuits - test_n8nsvapi: stop permanently overwriting sys.modules ('langbot.pkg.provider.runner' etc.); save and restore around the import so other modules get the real LocalAgentRunner base class Co-Authored-By: Claude 
Opus 4.7 (1M context) <noreply@anthropic.com> * ci(tests): run unit tests on every push to feat/** branches - Add feat/** to push branches so long-lived feature branches are tested on every push (they accumulate large changes before a PR) - Drop the push path filter entirely: every push to master/develop/ feat/** now runs the full unit suite (the old 'pkg/**' filter never matched the real source path 'src/langbot/pkg/**', so backend-only pushes silently skipped tests) - Fix the same broken path glob on the pull_request trigger ('pkg/**' -> 'src/langbot/pkg/**') Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skill): harden mount/reload paths and HTTP errors against stale skill cache The Box backends behave inconsistently when extra_mounts reference a missing host directory (nsjail aborts the entire sandbox start, Docker silently creates a root-owned empty dir on the host, E2B silently skips the upload). The cache in skill_mgr.skills is only refreshed on in-process mutations, so out-of-band changes — container rebuilds, manual rm in the box volume, anything the LangBot API didn't drive — leave a stale skill that later produces one of those bad mount paths. 
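One way to keep a stale cache from producing bad mounts is to check each skill's `package_root` on the LangBot-visible filesystem before building mounts, logging and skipping missing ones. This is a minimal sketch. It assumes a dict of cached skills and the `/workspace/.skills/{name}` mount target from the sandbox-scope commit. `CachedSkill`, `build_skill_mounts`, and the mount dict keys are hypothetical.

```python
import logging
import os
from dataclasses import dataclass
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class CachedSkill:
    """Hypothetical shape of an entry in the in-memory skill cache."""

    name: str
    package_root: str


def build_skill_mounts(skills: Optional[Dict[str, CachedSkill]]) -> List[Dict[str, str]]:
    """Return one mount per skill whose package_root still exists on this host."""
    mounts: List[Dict[str, str]] = []
    for name, skill in (skills or {}).items():
        root = skill.package_root
        if not root or not os.path.isdir(root):
            # Stale cache entry: skip it instead of handing a bad path to the backend.
            logger.warning("Skipping skill %r: package_root %r is missing", name, root)
            continue
        mounts.append({"host_path": root, "container_path": f"/workspace/.skills/{name}"})
    return mounts
```

Filtering on the LangBot side gives all three backends the same behavior, whatever each one would have done with a missing directory.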
- box/service.py: build_skill_extra_mounts now filters skills whose package_root is not isdir on the LangBot-visible filesystem and logs a warning, instead of passing the bad mount through to the backend - skill/manager.py: reload_skills (Box path) drops skills whose package_root is missing on the LangBot-side filesystem before they reach the in-memory cache, with a summary warning - api/http/controller/groups/skills.py: file/CRUD handlers now also catch BoxError (RuntimeError subclass, previously slipping past ``except ValueError`` and surfacing as 500); list/get handlers gain a try/except so a transient Box RPC failure becomes a clean 400 instead of a stack trace Tests added for build_skill_extra_mounts (skip missing, skip empty, no skill manager) and SkillManager.reload_skills (drop missing on Box path). Full unit suite: 279 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(box): add box.enabled toggle and gate consumers on availability Make the Box sandbox runtime optional. When ``box.enabled`` is false in config (or when an enabled Box fails to connect), every dependent feature degrades to the same disabled-state UX rather than crashing or silently falling back to less safe code paths. 
Backend: - config.yaml: new top-level ``box.enabled: true`` flag (default true) - BoxService: - Read box.enabled on construction - initialize() short-circuits when disabled — no remote WS connect, no stdio subprocess fork - _on_runtime_disconnect is a no-op when disabled (no reconnect loop on a deliberately-off service) - get_status() now exposes ``enabled`` so the frontend can tell "disabled in config" from "configured but failed" - MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to be available, not just installed - MCP _init_stdio_python_server: when ap.box_service exists but is unavailable, refuse the stdio server with an actionable error instead of silently falling through to host-stdio (which bypasses the sandbox the operator asked for). Setups without ap.box_service installed at all keep the legacy host-stdio fallback for pre-Box dev mode - SkillService._require_box_for_write: refuses create/update/install/ write_skill_file when ap.box_service is installed but unavailable. Distinguishes disabled vs failed in the error message so the UI can surface the right hint. 
Legacy setups (no ap.box_service) keep the local fallback path — that distinction is what keeps the existing local-skills tests valid Tests: - Box disabled-state behavior (4 cases) - Skill write refusal in disabled & failed states (7 cases) - MCP stdio runtime info policy updated to match new refuse-when-down behavior Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(web): surface Box disabled/unavailable state across consumers When Box is disabled in config (``box.enabled = false``) or fails to connect, every dependent UI surface now degrades visibly: - ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``, ``disabled`` (config-off) and a single ``hint`` key so callers don't have to re-derive the three states - ``BoxUnavailableNotice`` reusable Alert banner driven by that hint - Dashboard SystemStatusCards: three-state dot + label (connected / disabled-gray / disconnected-red); disabled state shows the ``boxDisabled`` hint, failed state continues to show the connector error. 
Plugin block kept untouched - Skills page (create view) and SkillDetailContent (edit view): Save button disabled and banner inserted above the form when Box is unavailable, matching the backend gate added in the previous commit - PipelineExtension skill section: ``enable_all_skills`` switch, Add Skill button and Remove buttons all gate on Box availability; banner inline under the section header - PipelineFormComponent: banner above the ``local-agent`` stage card when Box is unavailable, since that stage carries the sandbox-bound ``box-session-id-template`` field - Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale files updated with ``boxDisabled`` / ``boxUnavailable`` / ``boxRequiredHint`` strings Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(box): document the box.enabled toggle and gate behavior matrix - docker-compose: move ``langbot_box`` under compose profiles (``box`` and ``all``) so ``docker compose up`` no longer requires the sandbox container. Inline comment explains how to pair the profile choice with ``box.enabled`` so the langbot service does not thrash trying to reach a runtime that was never started - docs/review/box-architecture.md: - Annotate ``box.enabled`` in the config.yaml example, listing the exact side effects (no remote/stdio connect; tools/skills/MCP stdio off; reads still work) - Replace the bare compose snippet with the actual profile-driven invocation and the BOX__ENABLED pairing - New "关闭/连接失败时的行为矩阵" ("Behavior matrix when disabled / on connection failure") section: a single table mapping every consumer (native tools, activate/register_skill, stdio MCP, skill list/CRUD, pipeline AI config, extensions page, dashboard) to its disabled-state behavior, plus the legacy ``ap.box_service`` distinguisher note Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(pipeline-form): swap Box banner for field-level disable_if + tooltip The previous commit hard-coded a BoxUnavailableNotice banner above the ``local-agent`` stage card.
That works, but it shouts at the user about every field in that stage when in reality only one field — ``box-session-id-template`` — depends on the sandbox. Use the dynamic-form schema's existing variable-injection mechanism (``__system.*`` references via ``systemContext``) and add a sibling to ``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays visible, becomes inert, and an info icon next to its label exposes the reason on hover. The rest of the AI tab is left untouched. - entities/form/dynamic.ts: extend IDynamicFormItemSchema with ``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject`` - DynamicFormComponent: evaluate disable_if with the same resolver as show_if; OR the result into isFieldDisabled; render an Info tooltip trigger next to the label when the condition matches - ai.yaml metadata: attach disable_if (__system.box_available eq false) and a localized disabled_tooltip to box-session-id-template - PipelineFormComponent: drop the BoxUnavailableNotice import and the per-stage banner; pass ``systemContext={ box_available: boxAvailable }`` only for the local-agent stage so other stages aren't paying the re-render cost Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(mcp): friendly UI message when stdio MCP refused by Box state Previously the MCP detail dialog dumped the raw RuntimeError text from ``_init_stdio_python_server`` — English-only, prefixed with "Failed after 4 attempts", and exposing internal config names. The retry wrapper also kept retrying a refusal that is deterministically going to fail again, polluting logs. Replace the raw text with a structured signal: - New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. 
The stdio refusal path sets it before raising and uses a short opaque discriminator (``box_disabled_in_config`` / ``box_unavailable``) as the message body — never user-facing - ``_lifecycle_loop_with_retry`` short-circuits on ``BOX_UNAVAILABLE``: surfaces the error immediately, no retries, no "Failed after N attempts" prefix. Silences the warning storm seen during smoke-testing - ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``, ``retry_count``, ``box_session_id``, ``box_enabled`` to match what the backend already returns in get_runtime_info_dict() - Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and ``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect ``error_phase === 'box_unavailable'`` and render a two-line localized notice: state line ("Box disabled / unreachable") plus remediation line ("enable Box or switch to http/sse") - 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get ``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``, ``mcp.boxStdioRefusedSuggestion`` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(mcp-web): block stdio MCP creation at the form when Box is unavailable When Box is disabled in config (``box.enabled = false``) or unreachable, saving a new MCP server in stdio mode produced one that could never start — the user would only learn that from the runtime error on the detail page. Stop the user before they save instead. Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog ``MCPFormDialog.tsx``) now: - Disable the ``stdio`` option in the mode select when Box is unavailable, with a small "(requires Box)" suffix so the reason is obvious. 
Existing stdio configs still display their current value - Show ``BoxUnavailableNotice`` inline under the mode select when the currently-selected mode is stdio and Box is unavailable, so editing a stale stdio config makes the cause visible - Disable the Save / Submit button while stdio is selected under that condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop so the parent ``MCPDetailContent`` can disable both its Submit and Save buttons. ``MCPFormDialog`` disables its Save button locally - Refuse the submit handler too (Enter-key path) with a toast carrying the same i18n message i18n: ``mcp.boxRequired`` (short tag in the disabled option) and ``mcp.stdioBlockedByBoxToast`` added to all 8 locales. Backend runtime gate (``_init_stdio_python_server`` refusal + ``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place as the last line of defence for API bypass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): prevent plugin config form overflow * refactor(skill): remove all local-filesystem fallbacks; Box is the sole source Skills now flow exclusively through the Box runtime. Every read and write method funnels through ``_box_service()``; when Box is unavailable (disabled in config, connection failed, or simply not installed) the operation either returns an empty surface (``list_skills`` → []) or raises with a clear ``Box runtime ... not initialised / disabled / unavailable: ...`` message via the new ``_require_box(action)`` helper. Why: the legacy local-fallback path scanned ``data/skills/``, but Box manages its own ``box.local.skills_root`` (default ``data/box/skills/``). The two diverging directories caused stale / phantom skill lists when Box flapped, and the local-fallback writes silently bypassed all the sandboxing the operator had configured. SkillService (``api/http/service/skill.py``): - New ``_require_box(action)`` returns the box service or raises a structured ValueError. 
``_require_box_for_write`` kept as alias - ``list_skills`` → returns [] when Box is down so the UI can render the disabled banner cleanly - ``get_skill`` / ``get_skill_by_name`` → return None - All read-file / write-file / scan-dir / create / update / delete / install / preview methods → ``_require_box`` then box delegate. Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview pipelines) removed entirely SkillManager (``pkg/skill/manager.py``): - ``reload_skills`` returns early with empty cache when Box is down. data/skills/ discovery loop removed - ``refresh_skill_from_disk`` now just reports cache presence; the on-disk re-parse is gone since Box is the only writer Tests: - Drop 11 obsolete test_skill_service.py tests that exercised the removed local-fallback paths (create/install/file/delete/update) - Add list-empty + read-refused tests; flip the legacy-allow test to legacy-refuses-too - Rewrite refresh_skill_from_disk test to match the new behaviour Several helper methods (_managed_skill_path, _resolve_skill_path, _preview_skill_candidates, _install_preview_candidates, etc.) are now unreachable; a follow-up commit will prune them so this diff stays reviewable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(skill): prune dead local-filesystem helpers left over from Box migration Follow-up to the Box-only refactor. The previous commit removed the local-fallback BRANCHES from every public method; this one removes the HELPERS those branches called, which are now unreachable. 
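The Box-only gate from the previous commit might look roughly like this: `_require_box(action)` raises a structured `ValueError`, and `list_skills` degrades to an empty list. The function names, attribute checks, and message wording here are assumptions, not the actual `SkillService` code.

```python
from typing import Any, List, Optional


def require_box(box_service: Optional[Any], action: str) -> Any:
    """Return the Box service or raise ValueError naming the failing state."""
    if box_service is None:
        raise ValueError(f"Box runtime not initialised; cannot {action}")
    if not box_service.enabled:
        raise ValueError(f"Box runtime disabled in config (box.enabled = false); cannot {action}")
    if not box_service.available:
        raise ValueError(f"Box runtime unavailable; cannot {action}")
    return box_service


def list_skills(box_service: Optional[Any]) -> List[Any]:
    """Read path: an empty list (not an error) when Box is down."""
    try:
        box = require_box(box_service, "list skills")
    except ValueError:
        return []  # the UI renders the disabled banner instead
    return box.list_skills()
```

Distinguishing the three failure states in the message lets the UI show "disabled" and "unreachable" differently without a second status call.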
SkillService (service/skill.py): 787 → 449 lines Removed: scan_directory (sync), _read_skill_package, _write_skill_md, _resolve_create_field, _managed_skill_path, _managed_install_root_for_package, _normalize_package_root, _resolve_skill_path, _find_skill_entry, _discover_skill_directories, _safe_extract_zip, _extract_uploaded_skill_to_temp, _download_github_skill_to_temp, _resolve_github_source_root, _build_preview_target_dir, _preview_skill_candidates, _select_preview_candidates, _install_preview_candidates, _preview_source_root, _resolve_installed_skills, plus the module-level _FRONTMATTER_FIELDS and _build_skill_md. Kept (still needed by the surviving GitHub-import path): _download_github_asset, _download_github_skill_directory_as_zip, _find_github_skill_archive_entry, _copy_github_skill_directory_to_zip, _is_github_skill_md_url, _parse_github_skill_md_url, _resolve_github_skill_md_package_name, _validate_github_asset_url, _uploaded_skill_target_stem, _validate_skill_name. Imports dropped: shutil, tempfile, yaml, ....utils.paths. SkillManager (skill/manager.py): 187 → 88 lines Removed: get_managed_skills_root, _discover_skill_directories, _find_skill_entry, _load_skill_file, _normalize_package_root. Imports dropped: datetime, parse_frontmatter, paths. Tests: - test_skill_service.py: drop the 3 sync scan_directory tests + skill_service fixture + _create_skill_file helper - test_skill_tools.py: drop test_load_skill_file_success; rename TestSkillManagerPackageLoading → TestSkillManagerCache Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skill): re-inject skill index into local-agent system prompt The contributor's original PR (#1917) appended an ``Available Skills`` index to the system prompt before the LLM saw the user message, so the LLM could decide whether to activate a skill. 
``7145447b`` removed the text-marker activation flow and, together with it, the entire system prompt injection — but the Tool Call replacement only put the available skills inside the ``activate`` tool's description. In practice the LLM ignores tool descriptions for selection and goes straight to native tools, so user-visible skill activation silently broke. Restore the injection, adapted for the Tool Call era: - SkillManager regains ``get_skill_index(bound_skills)`` and ``build_skill_aware_prompt_addition(bound_skills)``. The addendum carries only ``name (display_name): description`` for each pipeline-visible skill plus one instruction line pointing at the ``activate`` tool. No SKILL.md contents — KV cache stays clean - PreProcessor appends the addendum to the first system message (or inserts a new one) of ``query.prompt.messages`` for the local-agent runner. Handles plain-string and ContentElement[] bodies. Skips cleanly when no skills are visible - 3 new test_preproc cases: injection happens, bound-skills subset honoured, empty addendum touches nothing. 280 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(box): downgrade get_status.available when backend probed unavailable Until now ``BoxService.get_status`` returned ``available: true`` whenever the runtime connector was healthy, even if the runtime itself reported ``backend: { available: false }`` (operator selected nsjail without the binary, Docker daemon crashed mid-session, E2B credentials wrong, ...). The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the top-level flag and showed "connected" while every actual call to native exec or skill management would fail. The native-tool loader already polled ``status.backend.available`` independently and hid its tools correctly, but every other consumer (dashboard banner, the disabled-state hint, the LLM-facing message) disagreed with it. 
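The restored skill-index injection can be sketched as below. It emits one `name (display_name): description` line per visible skill plus one pointer to the `activate` tool, then appends the result to the first system message or inserts a new one. `SkillMeta`, `Message`, `ContentElement`, the heading text, and the rule that `bound=None` means all skills are simplified assumptions, not LangBot's entities.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Union


@dataclass
class SkillMeta:
    name: str
    display_name: str
    description: str


@dataclass
class ContentElement:
    """Simplified stand-in for one part of a multimodal message body."""

    type: str
    text: str = ""


@dataclass
class Message:
    role: str
    content: Union[str, List[ContentElement]]


def build_prompt_addition(skills: Dict[str, SkillMeta], bound: Optional[Sequence[str]] = None) -> str:
    """Metadata-only index: no SKILL.md bodies, so the KV cache stays clean."""
    visible = [s for n, s in sorted(skills.items(), key=lambda kv: kv[0]) if bound is None or n in bound]
    if not visible:
        return ""
    lines = ["Available Skills:"]
    lines += [f"- {s.name} ({s.display_name}): {s.description}" for s in visible]
    lines.append("Call the `activate` tool with a skill name before following its instructions.")
    return "\n".join(lines)


def inject_into_system_prompt(messages: List[Message], addition: str) -> None:
    """Append to the first system message, or insert a new one at the front."""
    if not addition:
        return  # no visible skills: leave the prompt untouched
    for msg in messages:
        if msg.role == "system":
            if isinstance(msg.content, str):
                msg.content = f"{msg.content}\n\n{addition}" if msg.content else addition
            else:
                msg.content.append(ContentElement(type="text", text=addition))
            return
    messages.insert(0, Message(role="system", content=addition))
```

Handling both plain-string and list bodies mirrors the two message shapes the commit mentions. Returning early on an empty addendum keeps skill-less pipelines byte-for-byte unchanged.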
Combine the two in the payload: ``available = self._available AND status.backend.available``. When ``backend.available`` is false we now also surface a ``connector_error`` that names the backend ("Configured sandbox backend \"nsjail\" is unavailable") so the dialog shows the actionable reason instead of an empty error pane. The detailed ``backend`` object is preserved unchanged for the dialog.

Internal ``box_service.available`` (used by ``skill_service`` writes, ``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally NOT changed: it still tracks connector health only, so a backend blip does not trigger spurious reconnect loops.

Tests:
- ``test_get_status_downgrades_available_when_backend_dead``: exercise the new branch (connector OK, backend.available=false → top-level available=false, connector_error mentions the backend name)
- ``test_get_status_keeps_available_true_when_backend_ok``: guard against regressing the happy path

Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary): ``GET /api/v1/box/status`` now returns ``available: false`` with the named connector_error, instead of the previous misleading ``available: true``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): surface the specific Box failure reason in unavailable banner

When Box is configured but the runtime reports its backend is dead (e.g. ``box.backend = nsjail`` but the binary is missing, or Docker daemon crashed), the backend now returns a structured ``connector_error`` like ``Configured sandbox backend "nsjail" is unavailable``. The previous notice only said "Box sandbox is unavailable" + a generic "enable Box" hint, hiding the actionable detail.

- ``useBoxStatus``: derive ``reason`` from ``status.connector_error``.
  Only exposed for the failed-state (``hint === 'boxUnavailable'``), since the disabled-by-config message already carries its reason
- ``BoxUnavailableNotice``: insert the reason as a small monospaced line between the state message and the action hint. The disabled variant is unchanged (operator chose the state)
- Wire ``reason`` through every existing call site (Skills page + detail, PipelineExtension, both MCP forms). Old unused ``context`` prop dropped

Net layout (3 lines, still compact):

  ⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ...
  Configured sandbox backend "nsjail" is unavailable
  This feature requires the Box runtime. Enable it in config ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reconcile master's unit tests with feat/sandbox refactors

The merge from master brought in new unit tests that target pre-refactor APIs on feat/sandbox. Reconcile each:

- factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills dict + inert prompt-addition builder) and a Mock pipeline_service so the PreProcessor skill-index injection branch can run end-to-end in tests.
- pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so pipeline.stage is fully initialised before any individual stage test (preproc, longtext, ...) tries to lazy-load it. Without this preload, running test_preproc.py in isolation hit a circular-import error via the stage -> app -> pipelinemgr -> stage chain.
- provider/test_tool_manager.py: ToolManager now probes four loaders (native -> plugin -> mcp -> skill). Inject inert native + skill mocks in the execute_func_call fixture and assert all four shutdowns fire.
- utils/test_paths.py: drop the three cwd-dependent _check_if_source_install cases. The refactor walks Path(__file__).resolve().parents looking for pyproject.toml + main.py, so cwd no longer factors in and there's no file read to mock-fail. The positive case and caching test still apply.
- utils/test_version.py: delete entirely. is_newer and compare_version_str were removed when VersionManager was refactored to use the Space API for release checks (1b4107a9); the tests targeted a surface that no longer exists.

* refactor(box): launch box runtime via the lbp CLI subcommand

Mirror the plugin runtime: box is now started through the same CLI entry point (langbot_plugin.cli) instead of the box module directly.

- docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box` (WebSocket is the default transport, so no flag is needed; this matches `rt`).
- box/connector.py: both subprocess launch sites (_start_local_stdio and the Windows _start_subprocess_then_ws path) invoke `langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport.
- docs/review: update stale `-m langbot_plugin.box[.server]` references.

Pairs with the SDK change that removes box's direct-launch entry points (python -m langbot_plugin.box / .box.server) and the legacy --mode flag.

* chore: bump langbot-plugin beta 1

* fix(ci): resolve langbot-plugin from PyPI and clear lint failures

CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image. Root causes and fixes:

- pyproject.toml had a [tool.uv.sources] editable override pinning langbot-plugin to ../langbot-plugin-sdk. That path only exists in a paired local checkout, so `uv sync` failed on every CI runner ("Distribution not found"). Remove the override and regenerate uv.lock so langbot-plugin==0.4.0b1 resolves from PyPI, matching master.
- tests/integration/api/test_pipelines.py: the pipeline extensions endpoint now calls ap.skill_service.list_skills(); add the missing skill_service mock to the fake_pipeline_app fixture (the test came from master, the endpoint change from feat/sandbox).
- Apply ruff format to three src files and prettier to three web files whose committed formatting drift made `ruff format --check` and `pnpm lint` fail.
* chore: bump beta version

* docs: remove BOX_BACKEND override reference

* fix(pipelines): stop attributing dashboard debug WS to bound web_page_bot

The dashboard pipeline-debug WebSocket (/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket (/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths, but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when the same pipeline happened to be bound to a web_page_bot, passed that bot as `owner_bot` into `handle_websocket_message`. The adapter then used the page bot's listeners + adapter for the request, so debug sessions were logged as "page bot" activity in the dashboard.

Debug sessions must always run under the built-in websocket_proxy_bot. Remove `_find_owner_bot`, drop the `owner_bot` parameter from the debug-path `_handle_receive`, and call `handle_websocket_message` without it so the adapter takes its default proxy-bot branch. The embed handler still resolves and passes its `runtime_bot` for the page-bot path, so attribution there is unchanged.

* fix(plugin): install marketplace MCP from canonical mode + extra_args

_install_mcp_from_marketplace read the dropped `mcp_data.config` field and reconstructed mode/extra_args by guessing from the URL, which lost stdio's command/args/env/box entirely, so stdio MCP installs from the marketplace always failed.

Use the Space record's canonical `mode` and `extra_args` directly (the same shape stored in mcp_servers), and gate the install on `mode` instead of the removed `config`. After a successful install, best-effort POST to the marketplace install endpoint to bump install_count.

* feat(web): show recommendation lists in plugin market; mixed-type icons

The marketplace recommendation lists (curated rows from Space) were never mounted in the plugin market page. Wire them in:

- fetch recommendation lists on mount and render them above the extension grid, only when no search/filter is active.
Recommendation lists now mix plugins, MCPs and skills, so resolve each card's icon by type (plugin / mcp / skill marketplace icon URL) instead of always using the plugin icon endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): auto-open install dialog from one-click deep link

Accept a deep link from LangBot Space's one-click install:

  /home/add-extension?install=1&extension_type=<plugin|mcp|skill>&author=&name=&version=

On mount, populate the install info, open the confirm dialog directly, and strip the params from the URL. Reuses the existing marketplace install flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: push marketplace URL to runtime; fix market client base race

- On connecting to the plugin runtime, push the configured space.url via the new SET_RUNTIME_CONFIG action so the runtime downloads plugins from the same Space, instead of relying on its own CLOUD_SERVICE_URL env/default. Wrapped in try/except so an older SDK without the action degrades gracefully.
- web: the plugin market fetched recommendation lists (and listings) via the sync cloud client before its baseURL was resolved from system info, so it hit the default space.langbot.app. Await getCloudServiceClient() before the initial fetches and for the recommendation list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(web): don't show MCP "connection failed" while still connecting

The MCP status UI rendered "连接失败" ("connection failed") for any non-connected state, so during a normal connection attempt the subtitle showed "连接失败" while the status pill below it showed "连接中..." ("connecting..."), which is contradictory.

Only treat an explicit ERROR (or box-unavailable) status as failed; a CONNECTING or initial/unresolved status now shows "连接中" ("connecting"). Applied to the MCP detail form (subtitle + StatusDisplay) and the MCP server card.
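The status rule from the MCP fix can be written as a small function. This is an illustration in Python only (the real logic lives in the TypeScript web client); `MCPStatus` and `status_label` are hypothetical names, and `BOX_UNAVAILABLE` is an assumed encoding of the box-unavailable state.

```python
from __future__ import annotations

from enum import Enum


class MCPStatus(str, Enum):
    """Hypothetical status values mirroring the states named in the commit."""

    CONNECTING = 'connecting'
    CONNECTED = 'connected'
    ERROR = 'error'
    BOX_UNAVAILABLE = 'box_unavailable'


def status_label(status: MCPStatus | None) -> str:
    """Map a server status to the label the UI should show."""
    # Only an explicit error (or box-unavailable) counts as a failure.
    if status in (MCPStatus.ERROR, MCPStatus.BOX_UNAVAILABLE):
        return 'failed'
    if status is MCPStatus.CONNECTED:
        return 'connected'
    # CONNECTING and initial/unresolved (None) states are still in progress,
    # so they must not be rendered as "connection failed".
    return 'connecting'
```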
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): type-aware install dialog + refresh sidebar after install

The marketplace install confirm dialog was hardcoded to "安装插件 / 确定要安装插件 X 吗" ("Install plugin / Are you sure you want to install plugin X?") for every type. Make it type-aware (plugin / MCP / skill) and show more info: type chip, author/name id, and version when present.

Also refresh all sidebar extension lists (plugins, MCP servers, skills) when an install task completes, so the newly-installed extension appears immediately regardless of type (previously only refreshPlugins ran).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): richer install dialog (icon + name + description), drop redundant type row

The install dialog already states the type in its title, so the "类型" ("Type") row was redundant. Replace the info box with the extension's icon (avatar), display name, author/name id + version, and description, built from the PluginV4 for in-app installs and from the icon endpoint by type for the one-click deep link.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(web): TDZ crash in add-extension (installIconURL before installInfo)

installIconURL was computed above the useState declaration of installInfo, causing "Cannot access 'installInfo' before initialization" (500) on the add-extension page. Move the computation below the state declarations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): redesign install-progress dialog for MCP/skill

The progress dialog showed plugin-only stages (download + dependency install) for every type. MCP/skill have no such steps, so show a single "installing → done/failed" row for them (MCP: adding & connecting the server; skill: installing the package) while keeping the detailed download/deps stages for plugins.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(web): add missing market.componentName i18n keys

The marketplace component filter (and component badges) used market.componentName.{Tool,Command,EventListener,KnowledgeEngine,Parser,Page} but those keys only existed under plugins.componentName, so the market UI showed raw keys. Add a componentName block to the market namespace (zh-Hans + en-US; other locales fall back to zh-Hans).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): sidebar extensions refresh button + full-name tooltip

- Add a refresh button to the installed-extensions category header in the sidebar; it re-fetches plugins + MCP servers + skills and spins while loading.
- The sidebar item tooltip now shows the extension's full name (with the description below when present), so truncated MCP/extension names are readable on hover.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(plugin-market): rename component filter to "插件组件" with hint tooltip + persist filters

- Rename the in-app plugin market component filter label to "插件组件" / "Plugin Component"
- Add an Info icon tooltip explaining what plugin components are (Tool / Command / EventListener, etc.)
- Persist filter selections (type / component / tags / sort) in localStorage so they survive reloads; restored on mount (URL type param still wins)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(plugin-market): restore missing "页面"(Page) component filter option

The market component-filter list on this branch was a diverged rewrite that dropped the Page component kind master had added. The i18n key (market.componentName.Page) already existed; re-add the Page entry to the componentOptions list so plugins providing Page components can be filtered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(i18n): reword plugin component filter hint

Drop the redundant "插件组件是" ("plugin components are") lead-in and mention that components extend LangBot's capabilities; mirror the wording in en-US.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(i18n): backfill missing market/addExtension keys in 6 locales

check-i18n surfaced that market.componentName.*, market.filterByComponentHint and the addExtension.install* keys existed only in en-US/zh-Hans. Backfill them for es-ES, ja-JP, ru-RU, th-TH, vi-VN and zh-Hant (reusing each locale's existing component-name translations) and align the filterByComponent label with the new "Plugin Component" wording. check-i18n now passes for all locales.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* i18n(plugins): relabel "group by type" as "group by format"

The installed-extensions grouping is by extension format (plugin / MCP / skill), so rename the toggle label accordingly across all 8 locales (key unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(plugin-market): cursor-pointer on tag filter trigger

The TagsFilter Select trigger used the default cursor; add cursor-pointer so the tag filter is clearly clickable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(sidebar): show edition badge (Community / Cloud) in logo area

Add a small badge next to the LangBot name in the sidebar header that reflects systemInfo.edition: a neutral "Community" badge for the community edition and a blue "Cloud" badge for the cloud edition. Adds sidebar.editionCommunity / sidebar.editionCloud across all 8 locales.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* i18n(sidebar): unify zh-Hans cloud edition label to 云端版 ("Cloud edition")

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sidebar): edition badge - drop hover, use "Cloud" in all locales

The edition badge is not interactive, so remove the hover background on the cloud badge. Also use the literal "Cloud" label uniformly across all locales instead of localized variants (云端版/クラウド版/...).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(box): cap tool-call loop and run workspace-quota walk off the event loop

Two robustness fixes that bite under normal sandbox usage (not just attack), hardening the self-hosted community edition before release:

- localagent: cap the tool-call loop at MAX_TOOL_CALL_ROUNDS (128). A looping or adversarial model could otherwise emit tool calls indefinitely (each potentially a sandbox exec), producing a non-terminating request and runaway cost. The cap is generous enough not to interrupt legitimate multi-step agentic workflows.
- box.service: make _enforce_workspace_quota async and run the recursive workspace scan via asyncio.to_thread. It ran on every quota-enforced exec and a large workspace would block the whole asyncio runtime (all bots/pipelines).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(review): refresh box docs; trim issue list to SaaS blockers only

Community self-hosted edition is release-ready, so the box review docs are updated to current state (date 2026-06-02 + status note) and box-issues.md is rewritten to keep only the SaaS / multi-tenant / network-exposed release blockers (S1-S8): unauthenticated control plane, no per-pipeline exec authorization, unbounded sessions + no reaper, no kernel-level quota, mount validation gaps (/ + extra_mounts), missing container hardening, lock-around-cold-start, and the lower-severity follow-ups.
Resolved items (tool-call loop cap, async quota scan, host_path mount allowlist, _is_path_under dedup) moved to a short "resolved before community release" record; community-only and pure-cleanup items dropped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(deps): pin langbot-plugin to 0.4.0

Track the stable SDK release (0.4.0b1 -> 0.4.0); regenerate uv.lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: WangCham <651122857@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: fdc310 <82008029+fdc310@users.noreply.github.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
2026-07-21 03:46:11 +00:00 · 2026-06-03 11:12:39 +08:00
parent 4054ba2a76
commit 96b041846d
161 changed files with 22518 additions and 4029 deletions
@@ -15,7 +15,7 @@ class FakeApp:
    def __init__(
        self,
        *,
-        command_prefix: list[str] = ["/", "!"],
+        command_prefix: list[str] = ['/', '!'],
        command_enable: bool = True,
        pipeline_concurrency: int = 10,
        admins: list[str] | None = None,
@@ -40,6 +40,8 @@ class FakeApp:
        self.telemetry = self._create_mock_telemetry()
        self.survey = None
        self.cmd_mgr = self._create_mock_cmd_mgr()
+        self.skill_mgr = self._create_mock_skill_mgr()
+        self.pipeline_service = self._create_mock_pipeline_service()

        # Apply any extra attributes for specific test scenarios
        for name, value in extra_attrs.items():
@@ -98,9 +100,9 @@ class FakeApp:
    ):
        instance_config = Mock()
        instance_config.data = {
-            "command": {"prefix": command_prefix, "enable": command_enable},
-            "concurrency": {"pipeline": pipeline_concurrency},
-            "admins": admins,
+            'command': {'prefix': command_prefix, 'enable': command_enable},
+            'concurrency': {'pipeline': pipeline_concurrency},
+            'admins': admins,
        }
        return instance_config

@@ -119,6 +121,20 @@ class FakeApp:
        cmd_mgr.execute = AsyncMock()
        return cmd_mgr

+    def _create_mock_skill_mgr(self):
+        """Mock SkillManager that returns no skill index addition by default."""
+        skill_mgr = Mock()
+        skill_mgr.skills = {}
+        skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
+        skill_mgr.get_skill_index = Mock(return_value=[])
+        return skill_mgr
+
+    def _create_mock_pipeline_service(self):
+        """Mock PipelineService.get_pipeline returning empty extensions prefs."""
+        pipeline_service = AsyncMock()
+        pipeline_service.get_pipeline = AsyncMock(return_value={'extensions_preferences': {}})
+        return pipeline_service
+
    def capture_message(self, message):
        """Capture an outbound message for test assertions."""
        self._outbound_messages.append(message)
@@ -134,4 +150,4 @@ class FakeApp:

 def fake_app(**kwargs) -> FakeApp:
    """Create a FakeApp instance with optional overrides."""
-    return FakeApp(**kwargs)
+    return FakeApp(**kwargs)
@@ -20,6 +20,7 @@ pytestmark = pytest.mark.integration

 # ============== FIXTURE FOR SYS.MODULES ISOLATION ==============

+
@pytest.fixture(scope='module')
 def mock_circular_import_chain():
    """Break circular import chain for API controller."""
@@ -53,21 +54,25 @@ def mock_circular_import_chain():
    ):
        # Import groups after mocking to populate preregistered_groups
        import langbot.pkg.api.http.controller.groups.pipelines.pipelines as _pipelines  # noqa: E402, F401
+
        yield


 # ============== FAKE APPLICATION WITH PIPELINE SERVICES ==============

+
@pytest.fixture(scope='module')
 def fake_pipeline_app():
    """Create FakeApp with pipeline-specific services (module scope for reuse)."""
    app = FakeApp()

    # Pipeline config
-    app.instance_config.data.update({
-        'api': {'port': 5300},
-        'system': {'allow_modify_login_info': True, 'limitation': {}},
-    })
+    app.instance_config.data.update(
+        {
+            'api': {'port': 5300},
+            'system': {'allow_modify_login_info': True, 'limitation': {}},
+        }
+    )

    # Auth services
    app.user_service = Mock()
@@ -79,25 +84,31 @@ def fake_pipeline_app():

    # Pipeline service
    app.pipeline_service = Mock()
-    app.pipeline_service.get_pipeline_metadata = AsyncMock(return_value=[
-        {'name': 'trigger', 'stages': []},
-        {'name': 'ai', 'stages': []},
-    ])
-    app.pipeline_service.get_pipelines = AsyncMock(return_value=[
-        {
+    app.pipeline_service.get_pipeline_metadata = AsyncMock(
+        return_value=[
+            {'name': 'trigger', 'stages': []},
+            {'name': 'ai', 'stages': []},
+        ]
+    )
+    app.pipeline_service.get_pipelines = AsyncMock(
+        return_value=[
+            {
+                'uuid': 'test-pipeline-uuid',
+                'name': 'Test Pipeline',
+                'description': 'Test description',
+                'created_at': '2024-01-01T00:00:00',
+                'updated_at': '2024-01-01T00:00:00',
+                'is_default': False,
+            }
+        ]
+    )
+    app.pipeline_service.get_pipeline = AsyncMock(
+        return_value={
            'uuid': 'test-pipeline-uuid',
            'name': 'Test Pipeline',
-            'description': 'Test description',
-            'created_at': '2024-01-01T00:00:00',
-            'updated_at': '2024-01-01T00:00:00',
-            'is_default': False,
+            'config': {},
        }
-    ])
-    app.pipeline_service.get_pipeline = AsyncMock(return_value={
-        'uuid': 'test-pipeline-uuid',
-        'name': 'Test Pipeline',
-        'config': {},
-    })
+    )
    app.pipeline_service.create_pipeline = AsyncMock(return_value={'uuid': 'new-pipeline-uuid'})
    app.pipeline_service.update_pipeline = AsyncMock(return_value={})
    app.pipeline_service.delete_pipeline = AsyncMock()
@@ -112,6 +123,10 @@ def fake_pipeline_app():
    app.mcp_service = Mock()
    app.mcp_service.get_mcp_servers = AsyncMock(return_value=[])

+    # Skill service (for extensions endpoint)
+    app.skill_service = Mock()
+    app.skill_service.list_skills = AsyncMock(return_value=[])
+
    # Plugin connector (for extensions endpoint)
    app.plugin_connector.list_plugins = AsyncMock(return_value=[])

@@ -130,6 +145,7 @@ async def quart_test_client(fake_pipeline_app, http_controller_cls):

 # ============== PIPELINE ENDPOINT TESTS ==============

+
@pytest.mark.usefixtures('mock_circular_import_chain')
 class TestPipelineMetadataEndpoint:
    """Tests for /api/v1/pipelines/_/metadata endpoint."""
@@ -138,8 +154,7 @@ class TestPipelineMetadataEndpoint:
    async def test_get_pipeline_metadata_success(self, quart_test_client):
        """GET /api/v1/pipelines/_/metadata returns metadata list."""
        response = await quart_test_client.get(
-            '/api/v1/pipelines/_/metadata',
-            headers={'Authorization': 'Bearer test_token'}
+            '/api/v1/pipelines/_/metadata', headers={'Authorization': 'Bearer test_token'}
        )

        assert response.status_code == 200
@@ -162,10 +177,7 @@ class TestPipelinesListEndpoint:
    @pytest.mark.asyncio
    async def test_get_pipelines_success(self, quart_test_client):
        """GET /api/v1/pipelines returns pipeline list."""
-        response = await quart_test_client.get(
-            '/api/v1/pipelines',
-            headers={'Authorization': 'Bearer test_token'}
-        )
+        response = await quart_test_client.get('/api/v1/pipelines', headers={'Authorization': 'Bearer test_token'})

        assert response.status_code == 200
        data = await response.get_json()
@@ -176,8 +188,7 @@ class TestPipelinesListEndpoint:
    async def test_get_pipelines_with_sort_param(self, quart_test_client):
        """GET pipelines with sort parameter."""
        response = await quart_test_client.get(
-            '/api/v1/pipelines?sort_by=created_at&sort_order=DESC',
-            headers={'Authorization': 'Bearer test_token'}
+            '/api/v1/pipelines?sort_by=created_at&sort_order=DESC', headers={'Authorization': 'Bearer test_token'}
        )

        assert response.status_code == 200
@@ -193,8 +204,7 @@ class TestPipelinesCRUDEndpoints:
    async def test_get_single_pipeline_success(self, quart_test_client):
        """GET /api/v1/pipelines/{uuid} returns pipeline."""
        response = await quart_test_client.get(
-            '/api/v1/pipelines/test-pipeline-uuid',
-            headers={'Authorization': 'Bearer test_token'}
+            '/api/v1/pipelines/test-pipeline-uuid', headers={'Authorization': 'Bearer test_token'}
        )

        assert response.status_code == 200
@@ -208,7 +218,7 @@ class TestPipelinesCRUDEndpoints:
        response = await quart_test_client.post(
            '/api/v1/pipelines',
            headers={'Authorization': 'Bearer test_token'},
-            json={'name': 'New Pipeline', 'config': {}}
+            json={'name': 'New Pipeline', 'config': {}},
        )

        assert response.status_code == 200
@@ -222,7 +232,7 @@ class TestPipelinesCRUDEndpoints:
        response = await quart_test_client.put(
            '/api/v1/pipelines/test-pipeline-uuid',
            headers={'Authorization': 'Bearer test_token'},
-            json={'name': 'Updated Pipeline'}
+            json={'name': 'Updated Pipeline'},
        )

        assert response.status_code == 200
@@ -233,8 +243,7 @@ class TestPipelinesCRUDEndpoints:
    async def test_delete_pipeline_success(self, quart_test_client):
        """DELETE /api/v1/pipelines/{uuid} deletes pipeline."""
        response = await quart_test_client.delete(
-            '/api/v1/pipelines/test-pipeline-uuid',
-            headers={'Authorization': 'Bearer test_token'}
+            '/api/v1/pipelines/test-pipeline-uuid', headers={'Authorization': 'Bearer test_token'}
        )

        assert response.status_code == 200
@@ -245,8 +254,7 @@ class TestPipelinesCRUDEndpoints:
    async def test_copy_pipeline_success(self, quart_test_client):
        """POST /api/v1/pipelines/{uuid}/copy copies pipeline."""
        response = await quart_test_client.post(
-            '/api/v1/pipelines/test-pipeline-uuid/copy',
-            headers={'Authorization': 'Bearer test_token'}
+            '/api/v1/pipelines/test-pipeline-uuid/copy', headers={'Authorization': 'Bearer test_token'}
        )

        assert response.status_code == 200
@@ -263,8 +271,7 @@ class TestPipelineExtensionsEndpoint:
    async def test_get_extensions(self, quart_test_client):
        """GET /api/v1/pipelines/{uuid}/extensions."""
        response = await quart_test_client.get(
-            '/api/v1/pipelines/test-pipeline-uuid/extensions',
-            headers={'Authorization': 'Bearer test_token'}
+            '/api/v1/pipelines/test-pipeline-uuid/extensions', headers={'Authorization': 'Bearer test_token'}
        )

        # Should return 200 if pipeline found
@@ -0,0 +1,329 @@
+"""Integration tests for LangBot Box.
+
+These tests verify the end-to-end behavior of the Box sandbox execution
+system.  Tests decorated with ``requires_container`` need a real container
+runtime (Podman or Docker) and are skipped otherwise.
+
+CI only runs ``tests/unit_tests/``, so these tests never execute in the
+CI pipeline.  Run them locally with::
+
+    pytest tests/integration_tests/ -v
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import shutil
+import socket
+import subprocess
+from types import SimpleNamespace
+
+import pytest
+
+from langbot.pkg.box.service import BoxService
+from langbot_plugin.box.backend import BaseSandboxBackend
+from langbot_plugin.box.client import ActionRPCBoxClient
+from langbot_plugin.box.errors import BoxBackendUnavailableError
+from langbot_plugin.box.models import BoxExecutionStatus, BoxNetworkMode, BoxSpec
+from langbot_plugin.box.runtime import BoxRuntime
+from langbot_plugin.box.server import BoxServerHandler
+
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+
+_logger = logging.getLogger('test.box.integration')
+
+# Default image for integration tests — small and fast to pull.
+_TEST_IMAGE = 'alpine:latest'
+
+
+# ── Skip helpers ──────────────────────────────────────────────────────
+
+
+def _has_container_runtime() -> bool:
+    for cmd in ('podman', 'docker'):
+        if shutil.which(cmd) is None:
+            continue
+        try:
+            result = subprocess.run(
+                [cmd, 'info'],
+                capture_output=True,
+                timeout=10,
+            )
+            if result.returncode == 0:
+                return True
+        except Exception:
+            continue
+    return False
+
+
+def _can_open_test_socket() -> bool:
+    try:
+        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    except OSError:
+        return False
+    sock.close()
+    return True
+
+
+requires_container = pytest.mark.skipif(
+    not _has_container_runtime(),
+    reason='no container runtime (podman/docker) available',
+)
+
+requires_socket = pytest.mark.skipif(
+    not _can_open_test_socket(),
+    reason='local test environment does not permit opening TCP sockets',
+)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+
+class _QueueConnection:
+    """In-process Connection backed by asyncio Queues — no real IO."""
+
+    def __init__(self, rx: asyncio.Queue[str], tx: asyncio.Queue[str]):
+        self._rx = rx
+        self._tx = tx
+
+    async def send(self, message: str) -> None:
+        await self._tx.put(message)
+
+    async def receive(self) -> str:
+        return await self._rx.get()
+
+    async def close(self) -> None:
+        pass
+
+
+async def _make_rpc_pair(runtime: BoxRuntime):
+    """Create an in-process (ActionRPCBoxClient, server_task, client_task) connected via queues."""
+    from langbot_plugin.runtime.io.handler import Handler
+
+    c2s: asyncio.Queue[str] = asyncio.Queue()
+    s2c: asyncio.Queue[str] = asyncio.Queue()
+    client_conn = _QueueConnection(rx=s2c, tx=c2s)
+    server_conn = _QueueConnection(rx=c2s, tx=s2c)
+
+    server_handler = BoxServerHandler(server_conn, runtime)
+    server_task = asyncio.create_task(server_handler.run())
+
+    client_handler = Handler.__new__(Handler)
+    Handler.__init__(client_handler, client_conn)
+    client_task = asyncio.create_task(client_handler.run())
+
+    client = ActionRPCBoxClient(logger=_logger)
+    client.set_handler(client_handler)
+
+    return client, server_task, client_task
+
+
+# ── Fixtures ──────────────────────────────────────────────────────────
+
+
+@pytest.fixture
+async def box_client():
+    """Yield an ActionRPCBoxClient backed by a real BoxRuntime via in-process RPC."""
+    runtime = BoxRuntime(logger=_logger)
+    await runtime.initialize()
+    client, server_task, client_task = await _make_rpc_pair(runtime)
+    yield client
+    server_task.cancel()
+    client_task.cancel()
+    await runtime.shutdown()
+
+
+# ── 1. Simple command execution ───────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_exec_simple_command(box_client: ActionRPCBoxClient):
+    """Box starts a simple command and returns stdout."""
+    spec = BoxSpec(
+        cmd='echo hello-box',
+        session_id='int-simple',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    result = await box_client.execute(spec)
+
+    assert result.status == BoxExecutionStatus.COMPLETED
+    assert result.exit_code == 0
+    assert 'hello-box' in result.stdout
+
+
+# ── 2. Session file persistence ───────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_session_persists_files(box_client: ActionRPCBoxClient):
+    """Write a file in one exec, read it back in a second exec on the same session."""
+    sid = 'int-persist'
+
+    write_result = await box_client.execute(
+        BoxSpec(
+            cmd='echo "hello from file" > /tmp/testfile.txt',
+            session_id=sid,
+            workdir='/tmp',
+            image=_TEST_IMAGE,
+        )
+    )
+    assert write_result.exit_code == 0
+
+    read_result = await box_client.execute(
+        BoxSpec(
+            cmd='cat /tmp/testfile.txt',
+            session_id=sid,
+            workdir='/tmp',
+            image=_TEST_IMAGE,
+        )
+    )
+    assert read_result.exit_code == 0
+    assert 'hello from file' in read_result.stdout
+
+
+# ── 3. Timeout handling ───────────────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_timeout_kills_command(box_client: ActionRPCBoxClient):
+    """A long-running command is killed after timeout_sec and its session is removed."""
+    session_id = 'int-timeout'
+    spec = BoxSpec(
+        cmd='sleep 120',
+        session_id=session_id,
+        workdir='/tmp',
+        timeout_sec=3,
+        image=_TEST_IMAGE,
+    )
+    result = await box_client.execute(spec)
+
+    assert result.status == BoxExecutionStatus.TIMED_OUT
+    assert result.exit_code is None
+
+    sessions = await box_client.get_sessions()
+    assert all(session['session_id'] != session_id for session in sessions)
+
+
+# ── 4. Network isolation ─────────────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_offline_cannot_reach_network(box_client: ActionRPCBoxClient):
+    """With network=OFF the sandbox cannot reach the internet."""
+    spec = BoxSpec(
+        cmd='wget -q -O /dev/null --timeout=3 http://1.1.1.1 2>&1; exit $?',
+        session_id='int-offline',
+        workdir='/tmp',
+        network=BoxNetworkMode.OFF,
+        image=_TEST_IMAGE,
+    )
+    result = await box_client.execute(spec)
+
+    assert result.exit_code != 0
+
+
+# ── 5. Backend unavailable ───────────────────────────────────────────
+
+
+class _UnavailableBackend(BaseSandboxBackend):
+    """A backend that always reports itself as unavailable."""
+
+    name = 'unavailable'
+
+    def __init__(self):
+        super().__init__(logging.getLogger('test'))
+
+    async def is_available(self) -> bool:
+        return False
+
+    async def start_session(self, spec):
+        raise NotImplementedError
+
+    async def exec(self, session, spec):
+        raise NotImplementedError
+
+    async def stop_session(self, session):
+        pass
+
+
+@requires_socket
+@pytest.mark.asyncio
+async def test_backend_unavailable_returns_error():
+    """When no backend is available the full RPC path returns BoxBackendUnavailableError."""
+    runtime = BoxRuntime(logger=_logger, backends=[_UnavailableBackend()])
+    await runtime.initialize()
+    client, server_task, client_task = await _make_rpc_pair(runtime)
+    try:
+        spec = BoxSpec(
+            cmd='echo hello',
+            session_id='int-no-backend',
+            workdir='/tmp',
+        )
+        with pytest.raises(BoxBackendUnavailableError):
+            await client.execute(spec)
+    finally:
+        server_task.cancel()
+        client_task.cancel()
+        await runtime.shutdown()
+
+
+# ── 6. Full service-to-runtime path ──────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_full_service_to_remote_runtime(tmp_path):
+    """BoxService -> ActionRPCBoxClient -> RPC -> BoxRuntime -> real backend."""
+    runtime = BoxRuntime(logger=_logger)
+    await runtime.initialize()
+    client, server_task, client_task = await _make_rpc_pair(runtime)
+    try:
+        host_dir = tmp_path / 'workspace'
+        host_dir.mkdir()
+
+        mock_ap = SimpleNamespace(
+            logger=_logger,
+            instance_config=SimpleNamespace(
+                data={
+                    'box': {
+                        'backend': 'local',
+                        'runtime': {'endpoint': ''},
+                        'local': {
+                            'profile': 'default',
+                            'allowed_mount_roots': [str(tmp_path)],
+                            'default_workspace': str(host_dir),
+                        },
+                        'e2b': {'api_key': '', 'api_url': '', 'template': ''},
+                    }
+                }
+            ),
+        )
+
+        service = BoxService(mock_ap, client=client)
+        await service.initialize()
+
+        query = pipeline_query.Query.model_construct(query_id=42)
+        result = await service.execute_tool(
+            {'command': 'echo service-path'},
+            query,
+        )
+
+        assert result['ok'] is True
+        assert result['status'] == 'completed'
+        assert 'service-path' in result['stdout']
+        assert result['session_id'] == 'query_42'
+    finally:
+        server_task.cancel()
+        client_task.cancel()
+        await runtime.shutdown()
@@ -0,0 +1,368 @@
+"""Integration tests for Box MCP-related features.
+
+These tests verify managed process lifecycle, WebSocket stdio attach,
+session cleanup, and the single-session query API using a real container
+runtime.
+
+CI only runs ``tests/unit_tests/``, so these tests never execute in the
+CI pipeline.  Run them locally with::
+
+    pytest tests/integration_tests/box/test_box_mcp_integration.py -v
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import shutil
+import socket
+import subprocess
+
+import aiohttp
+import pytest
+from aiohttp.test_utils import TestServer
+
+from langbot_plugin.box.client import ActionRPCBoxClient
+from langbot_plugin.box.errors import BoxManagedProcessNotFoundError, BoxSessionNotFoundError
+from langbot_plugin.box.models import BoxManagedProcessSpec, BoxManagedProcessStatus, BoxSpec
+from langbot_plugin.box.runtime import BoxRuntime
+from langbot_plugin.box.server import BoxServerHandler, create_ws_relay_app
+
+_logger = logging.getLogger('test.box.mcp_integration')
+
+_TEST_IMAGE = 'alpine:latest'
+
+
+# ── Skip helpers ──────────────────────────────────────────────────────
+
+
+def _has_container_runtime() -> bool:
+    for cmd in ('podman', 'docker'):
+        if shutil.which(cmd) is None:
+            continue
+        try:
+            result = subprocess.run([cmd, 'info'], capture_output=True, timeout=10)
+            if result.returncode == 0:
+                return True
+        except Exception:
+            continue
+    return False
+
+
+def _can_open_test_socket() -> bool:
+    try:
+        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    except OSError:
+        return False
+    sock.close()
+    return True
+
+
+requires_container = pytest.mark.skipif(
+    not _has_container_runtime(),
+    reason='no container runtime (podman/docker) available',
+)
+
+requires_socket = pytest.mark.skipif(
+    not _can_open_test_socket(),
+    reason='local test environment does not permit opening TCP sockets',
+)
+
+
+# ── Helpers ──────────────────────────────────────────────────────────
+
+
+class _QueueConnection:
+    """In-process Connection backed by asyncio Queues — no real IO."""
+
+    def __init__(self, rx: asyncio.Queue[str], tx: asyncio.Queue[str]):
+        self._rx = rx
+        self._tx = tx
+
+    async def send(self, message: str) -> None:
+        await self._tx.put(message)
+
+    async def receive(self) -> str:
+        return await self._rx.get()
+
+    async def close(self) -> None:
+        pass
+
+
+async def _make_rpc_pair(runtime: BoxRuntime):
+    """Create an in-process RPC pair connected via queues; return (client, server_task, client_task)."""
+    from langbot_plugin.runtime.io.handler import Handler
+
+    c2s: asyncio.Queue[str] = asyncio.Queue()
+    s2c: asyncio.Queue[str] = asyncio.Queue()
+    client_conn = _QueueConnection(rx=s2c, tx=c2s)
+    server_conn = _QueueConnection(rx=c2s, tx=s2c)
+
+    server_handler = BoxServerHandler(server_conn, runtime)
+    server_task = asyncio.create_task(server_handler.run())
+
+    client_handler = Handler.__new__(Handler)
+    Handler.__init__(client_handler, client_conn)
+    client_task = asyncio.create_task(client_handler.run())
+
+    client = ActionRPCBoxClient(logger=_logger)
+    client.set_handler(client_handler)
+
+    return client, server_task, client_task
+
+
+# ── Fixtures ──────────────────────────────────────────────────────────
+
+
+@pytest.fixture
+async def box_server():
+    """Yield a (ws_relay_url, ActionRPCBoxClient) tuple backed by a real BoxRuntime."""
+    runtime = BoxRuntime(logger=_logger)
+    await runtime.initialize()
+
+    # Start the WebSocket relay used to attach to managed processes' stdio
+    ws_app = create_ws_relay_app(runtime)
+    ws_server = TestServer(ws_app)
+    await ws_server.start_server()
+
+    client, server_task, client_task = await _make_rpc_pair(runtime)
+
+    ws_relay_url = str(ws_server.make_url(''))
+    yield ws_relay_url, client
+
+    server_task.cancel()
+    client_task.cancel()
+    await runtime.shutdown()
+    await ws_server.close()
+
+
+# ── 1. Managed process lifecycle ─────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_managed_process_start_and_query(box_server):
+    """Start a managed process and query its status."""
+    ws_relay_url, client = box_server
+
+    # Create session
+    spec = BoxSpec(
+        cmd='',
+        session_id='mcp-int-lifecycle',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    await client.create_session(spec)
+
+    # Start a managed process that stays alive
+    proc_spec = BoxManagedProcessSpec(
+        command='sh',
+        args=['-c', 'while true; do sleep 1; done'],
+        cwd='/tmp',
+    )
+    info = await client.start_managed_process('mcp-int-lifecycle', proc_spec)
+    assert info.status == BoxManagedProcessStatus.RUNNING
+
+    # Query it
+    info2 = await client.get_managed_process('mcp-int-lifecycle')
+    assert info2.status == BoxManagedProcessStatus.RUNNING
+    assert info2.command == 'sh'
+
+    # Stop only the managed process while keeping the session available
+    await client.stop_managed_process('mcp-int-lifecycle')
+    with pytest.raises(BoxManagedProcessNotFoundError):
+        await client.get_managed_process('mcp-int-lifecycle')
+    session_info = await client.get_session('mcp-int-lifecycle')
+    assert session_info['session_id'] == 'mcp-int-lifecycle'
+
+    # Cleanup
+    await client.delete_session('mcp-int-lifecycle')
+
+
+# ── 2. WebSocket stdio attach ────────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_ws_stdio_attach_echo(box_server):
+    """Attach to a managed process via WebSocket and verify bidirectional IO."""
+    ws_relay_url, client = box_server
+
+    spec = BoxSpec(
+        cmd='',
+        session_id='mcp-int-ws',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    await client.create_session(spec)
+
+    # Start a cat process (echoes stdin to stdout)
+    proc_spec = BoxManagedProcessSpec(
+        command='cat',
+        args=[],
+        cwd='/tmp',
+    )
+    await client.start_managed_process('mcp-int-ws', proc_spec)
+
+    # Connect via WebSocket (ws relay)
+    ws_url = client.get_managed_process_websocket_url('mcp-int-ws', ws_relay_url)
+    session = aiohttp.ClientSession()
+    try:
+        async with session.ws_connect(ws_url) as ws:
+            # Send a line
+            await ws.send_str('hello from test')
+
+            # Expect to receive it back (cat echoes)
+            msg = await asyncio.wait_for(ws.receive(), timeout=5)
+            assert msg.type == aiohttp.WSMsgType.TEXT
+            assert 'hello from test' in msg.data
+    finally:
+        await session.close()
+
+    await client.delete_session('mcp-int-ws')
+
+
+# ── 3. Session cleanup removes container ─────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_delete_session_cleans_up(box_server):
+    """After deleting a session, it should no longer exist."""
+    ws_relay_url, client = box_server
+
+    spec = BoxSpec(
+        cmd='',
+        session_id='mcp-int-cleanup',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    await client.create_session(spec)
+
+    # Start a process
+    proc_spec = BoxManagedProcessSpec(
+        command='sleep',
+        args=['3600'],
+        cwd='/tmp',
+    )
+    await client.start_managed_process('mcp-int-cleanup', proc_spec)
+
+    # Delete
+    await client.delete_session('mcp-int-cleanup')
+
+    # Session should be gone
+    with pytest.raises(BoxSessionNotFoundError):
+        await client.get_session('mcp-int-cleanup')
+
+
+# ── 4. Get session details ───────────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_get_session_returns_details(box_server):
+    """get_session returns the session details, plus managed-process info once one is running."""
+    ws_relay_url, client = box_server
+
+    spec = BoxSpec(
+        cmd='',
+        session_id='mcp-int-get',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    await client.create_session(spec)
+
+    # Query without managed process
+    info = await client.get_session('mcp-int-get')
+    assert info['session_id'] == 'mcp-int-get'
+    assert info['image'] == _TEST_IMAGE
+    assert 'managed_process' not in info
+
+    # Start a process and query again
+    proc_spec = BoxManagedProcessSpec(
+        command='sleep',
+        args=['3600'],
+        cwd='/tmp',
+    )
+    await client.start_managed_process('mcp-int-get', proc_spec)
+
+    info2 = await client.get_session('mcp-int-get')
+    assert info2['session_id'] == 'mcp-int-get'
+    assert 'managed_process' in info2
+    assert info2['managed_process']['status'] == BoxManagedProcessStatus.RUNNING.value
+
+    await client.delete_session('mcp-int-get')
+
+
+# ── 5. Process exit detected ────────────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_process_exit_detected(box_server):
+    """When a managed process exits, its status should reflect EXITED."""
+    ws_relay_url, client = box_server
+
+    spec = BoxSpec(
+        cmd='',
+        session_id='mcp-int-exit',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    await client.create_session(spec)
+
+    # Start a process that exits immediately
+    proc_spec = BoxManagedProcessSpec(
+        command='sh',
+        args=['-c', 'echo done && exit 0'],
+        cwd='/tmp',
+    )
+    await client.start_managed_process('mcp-int-exit', proc_spec)
+
+    # Wait a bit for process to exit
+    await asyncio.sleep(2)
+
+    info = await client.get_managed_process('mcp-int-exit')
+    assert info.status == BoxManagedProcessStatus.EXITED
+    assert info.exit_code == 0
+
+    await client.delete_session('mcp-int-exit')
+
+
+# ── 6. Instance ID orphan cleanup ───────────────────────────────────
+
+
+@requires_container
+@requires_socket
+@pytest.mark.asyncio
+async def test_orphan_cleanup_preserves_own_containers(box_server):
+    """Orphan cleanup should not remove containers belonging to the current instance."""
+    ws_relay_url, client = box_server
+
+    # Create a session (container gets current instance ID label)
+    spec = BoxSpec(
+        cmd='',
+        session_id='mcp-int-orphan',
+        workdir='/tmp',
+        image=_TEST_IMAGE,
+    )
+    await client.create_session(spec)
+
+    # Verify session exists
+    sessions = await client.get_sessions()
+    assert any(s['session_id'] == 'mcp-int-orphan' for s in sessions)
+
+    # Trigger a status check; it must not clean up this instance's own containers
+    status = await client.get_status()
+    assert status['active_sessions'] >= 1
+
+    # Our session should still exist
+    sessions = await client.get_sessions()
+    assert any(s['session_id'] == 'mcp-int-orphan' for s in sessions)
+
+    await client.delete_session('mcp-int-orphan')
@@ -0,0 +1,106 @@
+from __future__ import annotations
+
+from types import SimpleNamespace
+from unittest.mock import Mock
+
+import pytest
+
+from langbot_plugin.box.client import ActionRPCBoxClient
+from langbot.pkg.box.connector import BoxRuntimeConnector
+
+
+def make_app(logger: Mock, runtime_endpoint: str = ''):
+    return SimpleNamespace(
+        logger=logger,
+        instance_config=SimpleNamespace(
+            data={
+                'box': {
+                    'backend': 'local',
+                    'runtime': {'endpoint': runtime_endpoint},
+                    'local': {
+                        'profile': 'default',
+                        'allowed_mount_roots': [],
+                        'default_workspace': '',
+                    },
+                    'e2b': {'api_key': '', 'api_url': '', 'template': ''},
+                }
+            }
+        ),
+    )
+
+
+def test_box_runtime_connector_stdio_when_no_url(monkeypatch: pytest.MonkeyPatch):
+    """Without runtime.endpoint, on a non-Docker Unix platform, use stdio."""
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
+    connector = BoxRuntimeConnector(make_app(Mock()))
+
+    assert connector._uses_websocket() is False
+    assert isinstance(connector.client, ActionRPCBoxClient)
+
+
+def test_box_runtime_connector_ws_when_url_configured(monkeypatch: pytest.MonkeyPatch):
+    """With an explicit runtime.endpoint, always use WebSocket."""
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
+    logger = Mock()
+    connector = BoxRuntimeConnector(make_app(logger, runtime_endpoint='http://box-runtime:5410'))
+
+    assert connector._uses_websocket() is True
+    assert isinstance(connector.client, ActionRPCBoxClient)
+
+
+def test_box_runtime_connector_ws_in_docker(monkeypatch: pytest.MonkeyPatch):
+    """Inside Docker (no explicit URL), use WebSocket to reach a sibling container."""
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'docker')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
+    connector = BoxRuntimeConnector(make_app(Mock()))
+
+    assert connector._uses_websocket() is True
+    assert connector.ws_relay_base_url == 'http://langbot_box:5410'
+
+
+def test_box_runtime_connector_ws_with_standalone_flag(monkeypatch: pytest.MonkeyPatch):
+    """With the --standalone-box flag, use WebSocket even on a local Unix platform."""
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', True)
+    connector = BoxRuntimeConnector(make_app(Mock()))
+
+    assert connector._uses_websocket() is True
+
+
+def test_box_runtime_connector_ws_relay_url_default(monkeypatch: pytest.MonkeyPatch):
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
+    connector = BoxRuntimeConnector(make_app(Mock()))
+
+    assert connector.ws_relay_base_url == 'http://127.0.0.1:5410'
+
+
+def test_box_runtime_connector_ws_relay_url_explicit(monkeypatch: pytest.MonkeyPatch):
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
+    connector = BoxRuntimeConnector(make_app(Mock(), runtime_endpoint='http://box-runtime:5410'))
+    assert connector.ws_relay_base_url == 'http://box-runtime:5410'
+
+
+def test_box_runtime_connector_dispose_terminates_subprocess(monkeypatch: pytest.MonkeyPatch):
+    monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
+    monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
+    logger = Mock()
+    connector = BoxRuntimeConnector(make_app(logger))
+    subprocess = Mock()
+    subprocess.returncode = None
+    handler_task = Mock()
+    ctrl_task = Mock()
+    connector._subprocess = subprocess
+    connector._handler_task = handler_task
+    connector._ctrl_task = ctrl_task
+
+    connector.dispose()
+
+    subprocess.terminate.assert_called_once()
+    handler_task.cancel.assert_called_once()
+    ctrl_task.cancel.assert_called_once()
+    assert connector._handler_task is None
+    assert connector._ctrl_task is None
@@ -0,0 +1,147 @@
+from __future__ import annotations
+
+import os
+import tempfile
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock
+
+import pytest
+
+from langbot.pkg.box.workspace import (
+    BoxWorkspaceSession,
+    classify_python_workspace,
+    infer_workspace_host_path,
+    rewrite_mounted_path,
+    wrap_python_command_with_env,
+)
+
+
+def test_rewrite_mounted_path_translates_host_prefix():
+    result = rewrite_mounted_path('/tmp/demo/project/app.py', '/tmp/demo/project')
+    assert result == '/workspace/app.py'
+
+
+def test_infer_workspace_host_path_unwraps_virtualenv_bin_dir():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        project_root = os.path.join(tmpdir, 'project')
+        os.makedirs(os.path.join(project_root, '.venv', 'bin'))
+        python_bin = os.path.join(project_root, '.venv', 'bin', 'python')
+        script = os.path.join(project_root, 'server.py')
+
+        with open(python_bin, 'w', encoding='utf-8') as handle:
+            handle.write('')
+        with open(script, 'w', encoding='utf-8') as handle:
+            handle.write('print("ok")\n')
+
+        result = infer_workspace_host_path(python_bin, [script])
+
+        assert result == os.path.realpath(project_root)
+
+
+def test_classify_python_workspace_detects_package_and_requirements():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        assert classify_python_workspace(tmpdir) is None
+
+        with open(os.path.join(tmpdir, 'requirements.txt'), 'w', encoding='utf-8') as handle:
+            handle.write('requests\n')
+        assert classify_python_workspace(tmpdir) == 'requirements'
+
+        with open(os.path.join(tmpdir, 'pyproject.toml'), 'w', encoding='utf-8') as handle:
+            handle.write('[project]\nname = "demo"\n')
+        assert classify_python_workspace(tmpdir) == 'package'
+
+
+def test_wrap_python_command_with_env_contains_bootstrap_and_command():
+    command = wrap_python_command_with_env('python script.py')
+
+    assert 'python -m venv "$_LB_VENV_DIR"' in command
+    assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
+    assert command.rstrip().endswith('python script.py')
+
+
+@pytest.mark.asyncio
+async def test_workspace_session_execute_for_query_uses_session_payload():
+    box_service = SimpleNamespace(execute_spec_payload=AsyncMock(return_value={'ok': True}))
+    workspace = BoxWorkspaceSession(
+        box_service,
+        'skill-person_123-demo',
+        host_path='/tmp/project',
+        host_path_mode='rw',
+        env={'FOO': 'bar'},
+    )
+
+    query = SimpleNamespace(query_id='q1')
+    result = await workspace.execute_for_query(query, 'python run.py', workdir='/workspace', timeout_sec=30)
+
+    assert result == {'ok': True}
+    payload = box_service.execute_spec_payload.await_args.args[0]
+    assert payload == {
+        'session_id': 'skill-person_123-demo',
+        'workdir': '/workspace',
+        'env': {'FOO': 'bar'},
+        'persistent': False,
+        'host_path': '/tmp/project',
+        'host_path_mode': 'rw',
+        'cmd': 'python run.py',
+        'timeout_sec': 30,
+    }
+
+
+@pytest.mark.asyncio
+async def test_workspace_session_start_managed_process_rewrites_command_and_args():
+    box_service = SimpleNamespace(start_managed_process=AsyncMock(return_value={'status': 'running'}))
+    workspace = BoxWorkspaceSession(
+        box_service,
+        'mcp-u1',
+        host_path='/tmp/project',
+        host_path_mode='ro',
+    )
+
+    result = await workspace.start_managed_process(
+        '/tmp/project/.venv/bin/python',
+        ['/tmp/project/server.py', '--config', '/tmp/project/config.json'],
+        env={'TOKEN': '1'},
+    )
+
+    assert result == {'status': 'running'}
+    session_id = box_service.start_managed_process.await_args.args[0]
+    payload = box_service.start_managed_process.await_args.args[1]
+    assert session_id == 'mcp-u1'
+    assert payload == {
+        'command': 'python',
+        'args': ['/workspace/server.py', '--config', '/workspace/config.json'],
+        'env': {'TOKEN': '1'},
+        'cwd': '/workspace',
+        'process_id': 'default',
+    }
+
+
+def test_workspace_session_build_session_payload_keeps_generic_workspace_shape():
+    workspace = BoxWorkspaceSession(
+        Mock(),
+        'workspace-1',
+        host_path='/tmp/project',
+        host_path_mode='rw',
+        env={'FOO': 'bar'},
+        network='on',
+        read_only_rootfs=False,
+        image='python:3.11',
+        cpus=1.0,
+        memory_mb=512,
+        pids_limit=128,
+    )
+
+    assert workspace.build_session_payload() == {
+        'session_id': 'workspace-1',
+        'workdir': '/workspace',
+        'env': {'FOO': 'bar'},
+        'persistent': False,
+        'network': 'on',
+        'read_only_rootfs': False,
+        'host_path': '/tmp/project',
+        'host_path_mode': 'rw',
+        'image': 'python:3.11',
+        'cpus': 1.0,
+        'memory_mb': 512,
+        'pids_limit': 128,
+    }
@@ -12,6 +12,12 @@ from __future__ import annotations
 import pytest
 from unittest.mock import AsyncMock, Mock

+# Preload pipelinemgr so the pipeline.stage module is fully initialised before
+# any individual stage test (e.g. preproc, longtext) tries to import it. Without
+# this, running a stage test in isolation triggers a circular-import error:
+#   stage.py → core.app → pipelinemgr → stage.stage_class (not yet bound).
+import langbot.pkg.pipeline.pipelinemgr  # noqa: F401
+
 import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
 import langbot_plugin.api.entities.builtin.platform.message as platform_message
 import langbot_plugin.api.entities.builtin.platform.events as platform_events
@@ -34,6 +40,9 @@ class MockApplication:
        self.query_pool = self._create_mock_query_pool()
        self.instance_config = self._create_mock_instance_config()
        self.task_mgr = self._create_mock_task_manager()
+        # Skill manager is optional; PreProcessor only touches it for the
+        # local-agent runner. None keeps the skill-binding branch inert.
+        self.skill_mgr = None

    def _create_mock_logger(self):
        logger = Mock()
@@ -0,0 +1,78 @@
+from __future__ import annotations
+
+from unittest.mock import Mock
+
+import pytest
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+# TODO: unskip once the handler ↔ app circular import is resolved
+pytest.skip(
+    'circular import in handler ↔ app; will be unblocked once resolved',
+    allow_module_level=True,
+)
+
+from langbot.pkg.pipeline.process.handler import MessageHandler  # noqa: E402
+
+
+class _StubHandler(MessageHandler):
+    async def handle(self, query):
+        raise NotImplementedError
+
+
+handler = _StubHandler(ap=Mock())
+
+
+def test_chat_handler_formats_tool_call_request_log():
+    result = provider_message.Message(
+        role='assistant',
+        content='',
+        tool_calls=[
+            provider_message.ToolCall(
+                id='call-1',
+                type='function',
+                function=provider_message.FunctionCall(name='exec', arguments='{}'),
+            )
+        ],
+    )
+
+    summary = handler.format_result_log(result)
+
+    assert summary == 'assistant: requested tools: exec'
+
+
+def test_chat_handler_formats_tool_result_log():
+    result = provider_message.Message(
+        role='tool',
+        content='{"status":"completed","exit_code":0,"backend":"podman","stdout":"42\\n"}',
+        tool_call_id='call-1',
+    )
+
+    summary = handler.format_result_log(result)
+
+    # Tool results use generic cut_str truncation
+    assert summary is not None
+    assert summary.startswith('tool: {"status":"com')
+    assert summary.endswith('...')
+
+
+def test_chat_handler_formats_tool_error_log():
+    result = provider_message.MessageChunk(
+        role='tool',
+        content='err: host_path must point to an existing directory on the host',
+        tool_call_id='call-1',
+        is_final=True,
+    )
+
+    summary = handler.format_result_log(result)
+
+    assert summary is not None
+    assert summary.startswith('tool error: err: host_path must')
+    assert summary.endswith('...')
+
+
+def test_chat_handler_skips_empty_assistant_log():
+    result = provider_message.Message(role='assistant', content='')
+
+    summary = handler.format_result_log(result)
+
+    assert summary is None
@@ -14,33 +14,43 @@ import json
 import sys
 from unittest.mock import AsyncMock, MagicMock, Mock, patch

-# Break the circular import chain before importing n8nsvapi:
+import pytest
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+
+# Break the circular import chain while importing n8nsvapi:
 #   n8nsvapi → runner → app → pipelinemgr → all runners → runner (partially init)
-_mock_runner = MagicMock()
-_mock_runner.runner_class = lambda name: (lambda cls: cls)  # no-op decorator
-_mock_runner.RequestRunner = object
-_mocked_imports = {
-    'langbot.pkg.provider.runner': _mock_runner,
+# The stubs are restored in a ``finally`` block so this module does NOT pollute
+# sys.modules for other test modules (e.g. ones importing the real
+# LocalAgentRunner, which would otherwise inherit ``object`` and break).
+# Using try/finally also guarantees that a failing import does not leave the
+# global module table stubbed. ``langbot.pkg.utils.httpclient`` is stubbed in
+# addition to the runner and app modules.
+_runner_stub = MagicMock()
+_runner_stub.runner_class = lambda name: (lambda cls: cls)  # no-op decorator
+_runner_stub.RequestRunner = object
+_import_stubs = {
+    'langbot.pkg.provider.runner': _runner_stub,
    'langbot.pkg.core.app': MagicMock(),
+    'langbot.pkg.utils.httpclient': MagicMock(),
 }
-_original_imports = {name: sys.modules.get(name) for name in _mocked_imports}
-sys.modules.update(_mocked_imports)
-
-import pytest  # noqa: E402
-import langbot_plugin.api.entities.builtin.provider.message as provider_message  # noqa: E402
-from langbot.pkg.provider.runners.n8nsvapi import N8nServiceAPIRunner  # noqa: E402
-
-for _name, _original in _original_imports.items():
-    if _original is None:
-        sys.modules.pop(_name, None)
-    else:
-        sys.modules[_name] = _original
+_saved_modules = {name: sys.modules.get(name) for name in _import_stubs}
+for _name, _stub in _import_stubs.items():
+    sys.modules[_name] = _stub
+try:
+    from langbot.pkg.provider.runners.n8nsvapi import N8nServiceAPIRunner
+finally:
+    for _name, _original in _saved_modules.items():
+        if _original is None:
+            sys.modules.pop(_name, None)
+        else:
+            sys.modules[_name] = _original


 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------

+
 def make_runner(output_key: str = 'response') -> N8nServiceAPIRunner:
     ap = Mock()
     ap.logger = Mock()
@@ -83,6 +93,7 @@ async def collect_chunks(runner: N8nServiceAPIRunner, chunks: list[bytes | str])
 # _process_response: stream format (type:item/end)
 # ---------------------------------------------------------------------------

+
 @pytest.mark.asyncio
 async def test_stream_format_single_item():
    """Single item + end in one chunk yields final chunk with full content."""
@@ -165,6 +176,7 @@ async def test_stream_format_no_spurious_empty_yield():
 # _process_response: plain JSON fallback
 # ---------------------------------------------------------------------------

+
 @pytest.mark.asyncio
 async def test_plain_json_with_output_key():
    """Plain JSON with matching output_key extracts value via output_key."""
@@ -235,6 +247,7 @@ async def test_invalid_json_returns_raw_text():
 # _call_webhook: output type depends on is_stream
 # ---------------------------------------------------------------------------

+
 def make_query(is_stream: bool):
     """Build a minimal Query mock."""
     query = Mock()
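The save/stub/restore sequence at the top of the n8n test module can also be packaged as a reusable context manager. The sketch below is a hypothetical helper (`stubbed_modules` is not part of LangBot) that assumes the same semantics as the inline code: an entry that was absent before is popped on exit, an existing one is put back, and the restore runs even if the guarded import raises.

```python
import sys
from collections.abc import Iterator, Mapping
from contextlib import contextmanager
from types import ModuleType


@contextmanager
def stubbed_modules(stubs: Mapping[str, object]) -> Iterator[None]:
    """Temporarily install ``stubs`` into sys.modules (hypothetical helper)."""
    # Remember the current entries; None means "not imported yet".
    saved = {name: sys.modules.get(name) for name in stubs}
    sys.modules.update(stubs)
    try:
        yield
    finally:
        # Runs on normal exit and when the guarded block raises.
        for name, original in saved.items():
            if original is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = original


# Demo with a made-up module name, so no real module is shadowed.
fake = ModuleType('demo_only_fake_module')
fake.answer = 42
with stubbed_modules({'demo_only_fake_module': fake}):
    import demo_only_fake_module  # resolved from sys.modules, i.e. the stub

    assert demo_only_fake_module.answer == 42
assert 'demo_only_fake_module' not in sys.modules
```

With such a helper, the module-level block would reduce to a single `with stubbed_modules(_import_stubs):` wrapping the `N8nServiceAPIRunner` import; the explicit try/finally in the test keeps the behavior visible without an extra helper.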
@@ -0,0 +1,242 @@
+from __future__ import annotations
+
+import json
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock
+
+import pytest
+
+import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
+import langbot_plugin.api.entities.builtin.provider.message as provider_message
+import langbot_plugin.api.entities.builtin.provider.session as provider_session
+
+from langbot.pkg.provider.runners.localagent import LocalAgentRunner
+
+
+class RecordingProvider:
+    def __init__(self):
+        self.requests: list[dict] = []
+
+    async def invoke_llm(self, query, model, messages, funcs, extra_args=None, remove_think=None):
+        self.requests.append(
+            {
+                'messages': list(messages),
+                'funcs': list(funcs),
+                'remove_think': remove_think,
+            }
+        )
+
+        if len(self.requests) == 1:
+            return provider_message.Message(
+                role='assistant',
+                content='Let me calculate that exactly.',
+                tool_calls=[
+                    provider_message.ToolCall(
+                        id='call-1',
+                        type='function',
+                        function=provider_message.FunctionCall(
+                            name='exec',
+                            arguments=json.dumps(
+                                {'command': ("python - <<'PY'\nnums = [1, 2, 3, 4]\nprint(sum(nums) / len(nums))\nPY")}
+                            ),
+                        ),
+                    )
+                ],
+            )
+
+        tool_result = json.loads(messages[-1].content)
+        return provider_message.Message(
+            role='assistant',
+            content=f'The average is {tool_result["stdout"]}.',
+        )
+
+
+class RecordingStreamProvider:
+    def __init__(self):
+        self.stream_requests: list[dict] = []
+
+    def invoke_llm_stream(self, query, model, messages, funcs, extra_args=None, remove_think=None):
+        self.stream_requests.append(
+            {
+                'messages': list(messages),
+                'funcs': list(funcs),
+                'remove_think': remove_think,
+            }
+        )
+
+        async def _stream():
+            if len(self.stream_requests) == 1:
+                yield provider_message.MessageChunk(
+                    role='assistant',
+                    tool_calls=[
+                        provider_message.ToolCall(
+                            id='call-1',
+                            type='function',
+                            function=provider_message.FunctionCall(
+                                name='exec',
+                                arguments=json.dumps({'command': "python -c 'print(1)'"}),
+                            ),
+                        )
+                    ],
+                    is_final=True,
+                )
+                return
+
+            yield provider_message.MessageChunk(
+                role='assistant',
+                content='Tool execution failed.',
+                is_final=True,
+            )
+
+        return _stream()
+
+
+def make_query() -> pipeline_query.Query:
+    adapter = AsyncMock()
+    adapter.is_stream_output_supported = AsyncMock(return_value=False)
+
+    return pipeline_query.Query.model_construct(
+        query_id='avg-query',
+        launcher_type=provider_session.LauncherTypes.PERSON,
+        launcher_id=12345,
+        sender_id=12345,
+        message_chain=[],
+        message_event=None,
+        adapter=adapter,
+        pipeline_uuid='pipeline-uuid',
+        bot_uuid='bot-uuid',
+        pipeline_config={
+            'ai': {
+                'runner': {'runner': 'local-agent'},
+                'local-agent': {'model': {'primary': 'test-model-uuid', 'fallbacks': []}, 'prompt': 'test-prompt'},
+            },
+            'output': {'misc': {'remove-think': False}},
+        },
+        prompt=SimpleNamespace(messages=[]),
+        messages=[],
+        user_message=provider_message.Message(
+            role='user',
+            content='Please calculate the average of 1, 2, 3, and 4.',
+        ),
+        use_funcs=[SimpleNamespace(name='exec')],
+        use_llm_model_uuid='test-model-uuid',
+        variables={},
+    )
+
+
+@pytest.mark.asyncio
+async def test_localagent_uses_exec_for_exact_calculation():
+    provider = RecordingProvider()
+    model = SimpleNamespace(
+        provider=provider,
+        model_entity=SimpleNamespace(
+            uuid='test-model-uuid',
+            name='test-model',
+            abilities=['func_call'],
+            extra_args={},
+        ),
+    )
+
+    tool_manager = SimpleNamespace(
+        execute_func_call=AsyncMock(
+            return_value={
+                'session_id': 'avg-query',
+                'backend': 'podman',
+                'status': 'completed',
+                'ok': True,
+                'exit_code': 0,
+                'stdout': '2.5',
+                'stderr': '',
+                'duration_ms': 18,
+            }
+        )
+    )
+
+    app = SimpleNamespace(
+        logger=Mock(),
+        model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
+        tool_mgr=tool_manager,
+        rag_mgr=SimpleNamespace(),
+        box_service=SimpleNamespace(
+            get_system_guidance=Mock(
+                return_value=(
+                    'When the exec tool is available, use it for exact calculations, statistics, '
+                    'structured data parsing, and code execution instead of estimating mentally. '
+                    'Unless the user explicitly asks for the script, code, or implementation details, '
+                    'do not include the generated script in the final answer. '
+                    'A default workspace is mounted at /workspace for file tasks.'
+                )
+            ),
+        ),
+        skill_mgr=SimpleNamespace(
+            get_skills_for_pipeline=AsyncMock(return_value=[]),
+            detect_skill_activation=AsyncMock(return_value=None),
+            build_activation_prompt=Mock(return_value=None),
+        ),
+    )
+
+    runner = LocalAgentRunner(app, pipeline_config={})
+    query = make_query()
+
+    results = [message async for message in runner.run(query)]
+
+    assert [message.role for message in results] == ['assistant', 'tool', 'assistant']
+    assert results[-1].content == 'The average is 2.5.'
+
+    tool_manager.execute_func_call.assert_awaited_once()
+    tool_name, tool_parameters = tool_manager.execute_func_call.await_args.args[:2]
+    assert tool_name == 'exec'
+    assert 'print(sum(nums) / len(nums))' in tool_parameters['command']
+
+    first_request = provider.requests[0]
+    assert any(
+        message.role == 'system'
+        and 'exec' in str(message.content)
+        and 'exact calculations' in str(message.content)
+        and 'Unless the user explicitly asks for the script' in str(message.content)
+        and '/workspace' in str(message.content)
+        for message in first_request['messages']
+    )
+    assert [tool.name for tool in first_request['funcs']] == ['exec']
+
+
+@pytest.mark.asyncio
+async def test_localagent_streaming_tool_error_yields_message_chunks():
+    provider = RecordingStreamProvider()
+    model = SimpleNamespace(
+        provider=provider,
+        model_entity=SimpleNamespace(
+            uuid='test-model-uuid',
+            name='test-model',
+            abilities=['func_call'],
+            extra_args={},
+        ),
+    )
+
+    adapter = AsyncMock()
+    adapter.is_stream_output_supported = AsyncMock(return_value=True)
+
+    query = make_query()
+    query.adapter = adapter
+
+    app = SimpleNamespace(
+        logger=Mock(),
+        model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
+        tool_mgr=SimpleNamespace(execute_func_call=AsyncMock(side_effect=RuntimeError('boom'))),
+        rag_mgr=SimpleNamespace(),
+        box_service=SimpleNamespace(
+            get_system_guidance=Mock(return_value='sandbox guidance'),
+        ),
+        skill_mgr=SimpleNamespace(
+            get_skills_for_pipeline=AsyncMock(return_value=[]),
+            detect_skill_activation=AsyncMock(return_value=None),
+            build_activation_prompt=Mock(return_value=None),
+        ),
+    )
+
+    runner = LocalAgentRunner(app, pipeline_config={})
+
+    results = [message async for message in runner.run(query)]
+
+    assert all(isinstance(message, provider_message.MessageChunk) for message in results)
+    assert any(message.role == 'tool' and message.content == 'err: boom' for message in results)
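The `TestRewritePath` cases in the MCP Box test module that follows pin down a component-aware prefix rewrite: an exact match maps to `/workspace`, a nested path keeps its suffix, and a sibling such as `/home/user/mcp-other` is left alone. A minimal sketch of a function with that behavior, assuming POSIX-style paths; `rewrite_path` is a hypothetical stand-alone function, not the `_rewrite_path` method in `mcp.py`.

```python
import posixpath
from typing import Optional

CONTAINER_WORKSPACE = '/workspace'  # mount point the tests expect


def rewrite_path(path: str, host_path: Optional[str]) -> str:
    """Map a host path under ``host_path`` into the container workspace (sketch)."""
    if not host_path or not path:
        return path  # nothing mounted, or nothing to rewrite
    root = host_path.rstrip('/')
    if path == root:
        return CONTAINER_WORKSPACE
    # Require a '/' right after the root so '/home/user/mcp-other' does not match.
    if path.startswith(root + '/'):
        return posixpath.join(CONTAINER_WORKSPACE, path[len(root) + 1 :])
    return path
```

The explicit `root + '/'` check is the design point: a plain `startswith(host_path)` would wrongly rewrite sibling directories that share a textual prefix.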
@@ -0,0 +1,712 @@
+"""Tests for MCP Box integration: path rewriting, host_path inference, config model, payloads.
+
+Uses importlib.util.spec_from_file_location to load mcp.py directly without
+triggering the circular import chain through the app module.
+"""
+
+from __future__ import annotations
+
+import importlib.machinery
+import importlib.util
+import os
+import sys
+import tempfile
+import types
+from contextlib import asynccontextmanager
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock
+
+import pytest
+
+
+# ---------------------------------------------------------------------------
+# Load mcp.py directly from file path, with stub dependencies
+# ---------------------------------------------------------------------------
+
+
+def _stub_module(fqn: str, attrs: dict | None = None, is_package: bool = False):
+    """Create or return a stub module and register it in sys.modules."""
+    if fqn in sys.modules:
+        mod = sys.modules[fqn]
+    else:
+        mod = types.ModuleType(fqn)
+        mod.__spec__ = importlib.machinery.ModuleSpec(fqn, None, is_package=is_package)
+        if is_package:
+            mod.__path__ = []
+        sys.modules[fqn] = mod
+    parts = fqn.rsplit('.', 1)
+    if len(parts) == 2 and parts[0] in sys.modules:
+        setattr(sys.modules[parts[0]], parts[1], mod)
+    if attrs:
+        for k, v in attrs.items():
+            setattr(mod, k, v)
+    return mod
+
+
+@pytest.fixture(scope='module', autouse=True)
+def mcp_module():
+    """Load mcp.py with minimal stubs to avoid circular imports."""
+    saved = {}
+
+    def _save_and_stub(name, attrs=None, is_package=False):
+        saved[name] = sys.modules.get(name)
+        # Don't overwrite modules that already exist (from other test modules)
+        if name in sys.modules:
+            return
+        _stub_module(name, attrs, is_package)
+
+    # Stub entire dependency chains as packages / modules
+    _save_and_stub('langbot_plugin', is_package=True)
+    _save_and_stub('langbot_plugin.api', is_package=True)
+    _save_and_stub('langbot_plugin.api.entities', is_package=True)
+    _save_and_stub('langbot_plugin.api.entities.events', is_package=True)
+    _save_and_stub('langbot_plugin.api.entities.events.pipeline_query', {})
+    _save_and_stub('langbot_plugin.api.entities.builtin', is_package=True)
+    _save_and_stub('langbot_plugin.api.entities.builtin.resource', is_package=True)
+    _save_and_stub(
+        'langbot_plugin.api.entities.builtin.resource.tool',
+        {
+            'LLMTool': type('LLMTool', (), {}),
+        },
+    )
+    _save_and_stub('langbot_plugin.api.entities.builtin.provider', is_package=True)
+    _save_and_stub('langbot_plugin.api.entities.builtin.provider.message', {})
+    _save_and_stub('sqlalchemy', {'select': Mock()})
+    _save_and_stub('httpx', {'AsyncClient': Mock()})
+    _save_and_stub('mcp', {'ClientSession': Mock, 'StdioServerParameters': Mock}, is_package=True)
+    _save_and_stub('mcp.client', is_package=True)
+    _save_and_stub('mcp.client.stdio', {'stdio_client': Mock()})
+    _save_and_stub('mcp.client.sse', {'sse_client': Mock()})
+    _save_and_stub('mcp.client.streamable_http', {'streamable_http_client': Mock()})
+    _save_and_stub('mcp.client.websocket', {'websocket_client': Mock()})
+
+    # Stub the provider.tools.loader (source of circular import)
+    _save_and_stub('langbot', is_package=True)
+    _save_and_stub('langbot.pkg', is_package=True)
+    _save_and_stub('langbot.pkg.provider', is_package=True)
+    _save_and_stub('langbot.pkg.provider.tools', is_package=True)
+    _save_and_stub(
+        'langbot.pkg.provider.tools.loader',
+        {
+            'ToolLoader': type('ToolLoader', (), {'__init__': lambda self, ap: None}),
+        },
+    )
+    _save_and_stub('langbot.pkg.provider.tools.loaders', is_package=True)
+    _save_and_stub('langbot.pkg.core', is_package=True)
+    _save_and_stub('langbot.pkg.core.app', {'Application': type('Application', (), {})})
+    _save_and_stub('langbot.pkg.entity', is_package=True)
+    _save_and_stub('langbot.pkg.entity.persistence', is_package=True)
+    _save_and_stub('langbot.pkg.entity.persistence.mcp', {})
+
+    # box models
+    import enum as _enum
+
+    class _BPS(str, _enum.Enum):
+        RUNNING = 'running'
+        EXITED = 'exited'
+
+    _save_and_stub('langbot_plugin.box', is_package=True)
+    _save_and_stub('langbot_plugin.box.models', {'BoxManagedProcessStatus': _BPS})
+
+    # Now load mcp.py via spec_from_file_location
+    mod_fqn = 'langbot.pkg.provider.tools.loaders.mcp'
+    sys.modules.pop(mod_fqn, None)
+    mcp_path = os.path.join(
+        os.path.dirname(__file__),
+        '..',
+        '..',
+        '..',
+        'src',
+        'langbot',
+        'pkg',
+        'provider',
+        'tools',
+        'loaders',
+        'mcp.py',
+    )
+    mcp_path = os.path.normpath(mcp_path)
+    pkg_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(mcp_path))))
+    sys.modules['langbot.pkg'].__path__ = [pkg_root]
+    sys.modules['langbot.pkg.provider.tools.loaders'].__path__ = [os.path.dirname(mcp_path)]
+    spec = importlib.util.spec_from_file_location(mod_fqn, mcp_path)
+    mod = importlib.util.module_from_spec(spec)
+    sys.modules[mod_fqn] = mod
+    spec.loader.exec_module(mod)
+
+    yield mod
+
+    # Cleanup
+    sys.modules.pop(mod_fqn, None)
+    sys.modules.pop('langbot.pkg.provider.tools.loaders.mcp_stdio', None)
+    sys.modules.pop('langbot.pkg.box.workspace', None)
+    for name in reversed(list(saved)):
+        if saved[name] is None:
+            sys.modules.pop(name, None)
+        else:
+            sys.modules[name] = saved[name]
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_ap():
+    ap = Mock()
+    ap.logger = Mock()
+    ap.box_service = Mock()
+    return ap
+
+
+def _make_session(mcp_module, server_config: dict, ap=None):
+    if ap is None:
+        ap = _make_ap()
+    return mcp_module.RuntimeMCPSession(
+        server_name=server_config.get('name', 'test-server'),
+        server_config=server_config,
+        enable=True,
+        ap=ap,
+    )
+
+
+# ── MCPServerBoxConfig ──────────────────────────────────────────────
+
+
+class TestMCPServerBoxConfig:
+    def test_default_values(self, mcp_module):
+        cfg = mcp_module.MCPServerBoxConfig.model_validate({})
+        assert cfg.image is None
+        assert cfg.network == 'on'
+        assert cfg.host_path is None
+        assert cfg.host_path_mode == 'ro'
+        assert cfg.env == {}
+        assert cfg.startup_timeout_sec == 120
+        assert cfg.cpus is None
+        assert cfg.memory_mb is None
+        assert cfg.pids_limit is None
+        assert cfg.read_only_rootfs is None
+
+    def test_custom_values(self, mcp_module):
+        cfg = mcp_module.MCPServerBoxConfig.model_validate(
+            {
+                'image': 'node:20',
+                'network': 'on',
+                'host_path': '/home/user/mcp',
+                'host_path_mode': 'rw',
+                'env': {'FOO': 'bar'},
+                'startup_timeout_sec': 60,
+                'cpus': 2.0,
+                'memory_mb': 1024,
+                'pids_limit': 256,
+                'read_only_rootfs': False,
+            }
+        )
+        assert cfg.image == 'node:20'
+        assert cfg.network == 'on'
+        assert cfg.cpus == 2.0
+        assert cfg.memory_mb == 1024
+
+    def test_extra_fields_ignored(self, mcp_module):
+        cfg = mcp_module.MCPServerBoxConfig.model_validate(
+            {
+                'image': 'node:20',
+                'unknown_field': 'whatever',
+            }
+        )
+        assert cfg.image == 'node:20'
+        assert not hasattr(cfg, 'unknown_field')
+
+
+# ── Path Rewriting ──────────────────────────────────────────────────
+
+
+class TestRewritePath:
+    def test_no_host_path_returns_unchanged(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        assert s._rewrite_path('/some/path', None) == '/some/path'
+
+    def test_empty_path_returns_empty(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        assert s._rewrite_path('', '/home/user/mcp') == ''
+
+    def test_prefix_match_rewrites(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        result = s._rewrite_path('/home/user/mcp/server.py', '/home/user/mcp')
+        assert result == '/workspace/server.py'
+
+    def test_exact_match_rewrites_to_workspace(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        result = s._rewrite_path('/home/user/mcp', '/home/user/mcp')
+        assert result == '/workspace'
+
+    def test_non_matching_path_unchanged(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        result = s._rewrite_path('/opt/other/server.py', '/home/user/mcp')
+        assert result == '/opt/other/server.py'
+
+    def test_similar_prefix_not_rewritten(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        result = s._rewrite_path('/home/user/mcp-other/file.py', '/home/user/mcp')
+        assert result == '/home/user/mcp-other/file.py'
+
+    def test_nested_subpath_rewrites(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        result = s._rewrite_path('/home/user/mcp/src/lib/main.py', '/home/user/mcp')
+        assert result == '/workspace/src/lib/main.py'
+
+
+# ── host_path Inference ─────────────────────────────────────────────
+
+
+class TestInferHostPath:
+    def test_no_absolute_paths_returns_none(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': ['server.py'],
+            },
+        )
+        assert s._infer_host_path() is None
+
+    def test_nonexistent_path_returns_none(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': '/nonexistent/path/to/python',
+                'args': [],
+            },
+        )
+        assert s._infer_host_path() is None
+
+    def test_existing_absolute_path_infers_directory(self, mcp_module):
+        with tempfile.NamedTemporaryFile(suffix='.py') as f:
+            s = _make_session(
+                mcp_module,
+                {
+                    'name': 'test',
+                    'uuid': 'u1',
+                    'mode': 'sse',
+                    'command': 'python',
+                    'args': [f.name],
+                },
+            )
+            result = s._infer_host_path()
+            assert result is not None
+            assert result == os.path.dirname(os.path.realpath(f.name))
+
+
+# ── Build Box Session Payload ───────────────────────────────────────
+
+
+class TestBuildBoxSessionPayload:
+    def test_minimal_config(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        payload = s._build_box_session_payload('session-123')
+        assert payload['session_id'] == 'session-123'
+        assert payload['workdir'] == '/workspace'
+        assert payload['env'] == {}
+        assert 'host_path' not in payload
+
+    def test_with_host_path(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+                'box': {'host_path': '/home/user/mcp', 'host_path_mode': 'ro'},
+            },
+        )
+        payload = s._build_box_session_payload('session-123')
+        assert payload['host_path'] == '/home/user/mcp'
+        assert payload['host_path_mode'] == 'ro'
+
+    def test_optional_fields_included_when_set(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+                'box': {'image': 'node:20', 'cpus': 2.0, 'memory_mb': 1024, 'pids_limit': 256},
+            },
+        )
+        payload = s._build_box_session_payload('session-123')
+        assert payload['image'] == 'node:20'
+        assert payload['cpus'] == 2.0
+        assert payload['memory_mb'] == 1024
+        assert payload['pids_limit'] == 256
+
+    def test_none_fields_excluded(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        payload = s._build_box_session_payload('session-123')
+        assert 'image' not in payload
+        assert 'cpus' not in payload
+
+
+# ── Build Box Process Payload ───────────────────────────────────────
+
+
+class TestBuildBoxProcessPayload:
+    def test_basic_payload(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': ['server.py'],
+                'env': {'KEY': 'val'},
+            },
+        )
+        payload = s._build_box_process_payload()
+        assert payload['command'] == 'python'
+        assert payload['args'] == ['server.py']
+        assert payload['env'] == {'KEY': 'val'}
+        assert payload['cwd'] == '/workspace'
+
+    def test_path_rewriting_applied(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': '/home/user/mcp/venv/bin/python',
+                'args': ['/home/user/mcp/server.py', '--config', '/home/user/mcp/config.json'],
+                'env': {},
+                'box': {'host_path': '/home/user/mcp'},
+            },
+        )
+        payload = s._build_box_process_payload()
+        # venv python is replaced with plain 'python' (deps installed in-container)
+        assert payload['command'] == 'python'
+        assert payload['args'] == ['/workspace/server.py', '--config', '/workspace/config.json']
+
+    def test_non_matching_args_not_rewritten(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': ['/opt/other/server.py', '--flag'],
+                'env': {},
+                'box': {'host_path': '/home/user/mcp'},
+            },
+        )
+        payload = s._build_box_process_payload()
+        assert payload['command'] == 'python'
+        assert payload['args'] == ['/opt/other/server.py', '--flag']
+
+
+# ── get_runtime_info_dict ───────────────────────────────────────────
+
+
+class TestGetRuntimeInfoDict:
+    def test_non_stdio_session(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'test-uuid',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        info = s.get_runtime_info_dict()
+        assert info['status'] == 'connecting'
+        assert 'box_session_id' not in info
+
+    def test_runtime_tools_include_parameters(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'test-uuid',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        s.functions = [
+            SimpleNamespace(
+                name='create-service',
+                description='Create a service',
+                parameters={
+                    'type': 'object',
+                    'properties': {
+                        'project_id': {'type': 'string'},
+                    },
+                    'required': ['project_id'],
+                },
+            )
+        ]
+
+        info = s.get_runtime_info_dict()
+
+        assert info['tools'][0]['parameters']['properties']['project_id']['type'] == 'string'
+        assert info['tools'][0]['parameters']['required'] == ['project_id']
+
+    def test_stdio_session_includes_box_info(self, mcp_module):
+        ap = _make_ap()
+        ap.box_service.available = True
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'test-uuid',
+                'mode': 'stdio',
+                'command': 'python',
+                'args': [],
+            },
+            ap=ap,
+        )
+        info = s.get_runtime_info_dict()
+        assert info['box_session_id'] == 'mcp-shared'
+        assert info['box_enabled'] is True
+
+    def test_stdio_session_refuses_when_box_unavailable(self, mcp_module):
+        """Policy: when Box is configured but unavailable (disabled in config
+        OR the connection failed), stdio MCP servers are NOT treated as
+        box-stdio. ``_init_stdio_python_server`` raises a clear refusal at
+        start time; until then, the runtime info simply omits
+        ``box_session_id`` and ``box_enabled`` so the UI can render the
+        disabled state cleanly."""
+        ap = _make_ap()
+        ap.box_service.available = False
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'test-uuid',
+                'mode': 'stdio',
+                'command': 'python',
+                'args': [],
+            },
+            ap=ap,
+        )
+        info = s.get_runtime_info_dict()
+        assert 'box_session_id' not in info
+        assert 'box_enabled' not in info
+
+    def test_stdio_session_without_box_service_uses_local_stdio(self, mcp_module):
+        ap = _make_ap()
+        del ap.box_service
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'test-uuid',
+                'mode': 'stdio',
+                'command': 'python',
+                'args': [],
+            },
+            ap=ap,
+        )
+        info = s.get_runtime_info_dict()
+        assert 'box_session_id' not in info
+
+
+# ── Box config parsing ──────────────────────────────────────────────
+
+
+class TestBoxConfigParsing:
+    def test_box_config_parsed_from_server_config(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+                'box': {'image': 'node:20', 'host_path': '/home/user/mcp'},
+            },
+        )
+        assert isinstance(s.box_config, mcp_module.MCPServerBoxConfig)
+        assert s.box_config.image == 'node:20'
+        assert s.box_config.host_path == '/home/user/mcp'
+
+    def test_missing_box_key_uses_defaults(self, mcp_module):
+        s = _make_session(
+            mcp_module,
+            {
+                'name': 'test',
+                'uuid': 'u1',
+                'mode': 'sse',
+                'command': 'python',
+                'args': [],
+            },
+        )
+        assert isinstance(s.box_config, mcp_module.MCPServerBoxConfig)
+        assert s.box_config.image is None
+        assert s.box_config.host_path_mode == 'ro'
+
+
+@pytest.mark.asyncio
+async def test_init_box_stdio_server_stages_host_path_in_shared_workspace(mcp_module, tmp_path):
+    mcp_stdio_module = sys.modules['langbot.pkg.provider.tools.loaders.mcp_stdio']
+
+    class FakeClientSession:
+        def __init__(self, *_args):
+            pass
+
+        async def __aenter__(self):
+            return self
+
+        async def __aexit__(self, exc_type, exc, tb):
+            return False
+
+        async def initialize(self):
+            return None
+
+    @asynccontextmanager
+    async def fake_websocket_client(_url: str):
+        yield ('read-stream', 'write-stream')
+
+    mcp_stdio_module.ClientSession = FakeClientSession
+    mcp_stdio_module.websocket_client = fake_websocket_client
+
+    ap = _make_ap()
+    ap.box_service.available = True
+    ap.box_service.default_workspace = str(tmp_path / 'shared-box-workspace')
+    ap.box_service.create_session = AsyncMock(return_value={})
+    ap.box_service.build_spec = Mock(return_value='validated-spec')
+    ap.box_service.client = SimpleNamespace(
+        execute=AsyncMock(return_value=SimpleNamespace(ok=True, stderr='', exit_code=0))
+    )
+    ap.box_service.start_managed_process = AsyncMock(return_value={})
+    ap.box_service.get_managed_process_websocket_url = Mock(return_value='ws://box.example/process')
+
+    host_path = tmp_path / 'mcp-source'
+    host_path.mkdir()
+    server_file = host_path / 'server.py'
+    server_file.write_text('print("hello")\n', encoding='utf-8')
+
+    session = _make_session(
+        mcp_module,
+        {
+            'name': 'test',
+            'uuid': 'u1',
+            'mode': 'stdio',
+            'command': str(host_path / '.venv' / 'bin' / 'python'),
+            'args': [str(server_file)],
+            'box': {'host_path': str(host_path)},
+        },
+        ap=ap,
+    )
+
+    await session._init_box_stdio_server()
+    await session.exit_stack.aclose()
+
+    assert ap.box_service.create_session.await_count == 1
+    session_payload = ap.box_service.create_session.await_args.args[0]
+    assert session_payload['session_id'] == 'mcp-shared'
+    assert 'host_path' not in session_payload
+    assert ap.box_service.build_spec.call_count == 1
+    assert ap.box_service.build_spec.call_args.kwargs.get('skip_host_mount_validation', False) is False
+    assert ap.box_service.build_spec.call_args.args[0]['host_path'] == str(host_path)
+
+    staged_file = tmp_path / 'shared-box-workspace' / '.mcp' / 'u1' / 'workspace' / 'server.py'
+    assert staged_file.read_text(encoding='utf-8') == 'print("hello")\n'
+
+    process_payload = ap.box_service.start_managed_process.await_args.args[1]
+    assert process_payload['process_id'] == 'u1'
+    assert process_payload['command'] == 'python'
+    assert process_payload['args'] == ['/workspace/.mcp/u1/workspace/server.py']
+    assert process_payload['cwd'] == '/workspace/.mcp/u1/workspace'
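The staging test above pins a concrete contract. The host source tree is copied to `<default_workspace>/.mcp/<uuid>/workspace`, a `.venv/bin/python` command becomes plain `python`, and absolute host paths in `args` are mapped under `/workspace/.mcp/<uuid>/workspace`. The following is a minimal sketch of that mapping. It assumes a plain `shutil.copytree` copy and realpath-based prefix matching. `stage_mcp_source` and `rewrite_for_container` are hypothetical helpers, not the internals of LangBot's `_init_box_stdio_server`.

```python
import os
import shutil
from pathlib import PurePath, PurePosixPath

# Assumed container mount point of the shared Box workspace (matches the paths asserted above).
CONTAINER_WORKSPACE = '/workspace'


def stage_mcp_source(host_path: str, default_workspace: str, server_uuid: str) -> str:
    """Copy host_path into <default_workspace>/.mcp/<uuid>/workspace; return the container-side cwd."""
    staged = os.path.join(default_workspace, '.mcp', server_uuid, 'workspace')
    # Drop any earlier staging so files deleted on the host do not linger in the sandbox.
    shutil.rmtree(staged, ignore_errors=True)
    # copytree creates missing parent directories of `staged` as needed.
    shutil.copytree(host_path, staged)
    return str(PurePosixPath(CONTAINER_WORKSPACE, '.mcp', server_uuid, 'workspace'))


def rewrite_for_container(
    command: str, args: list[str], host_path: str, container_cwd: str
) -> tuple[str, list[str]]:
    """Swap a host venv interpreter for 'python' and map absolute host paths under host_path."""
    parts = PurePath(command).parts
    if len(parts) >= 3 and parts[-3:-1] == ('.venv', 'bin') and parts[-1].startswith('python'):
        command = 'python'  # assumption: the container image provides its own interpreter

    root = os.path.realpath(host_path)
    rewritten: list[str] = []
    for arg in args:
        if not os.path.isabs(arg):
            rewritten.append(arg)  # flags and relative paths pass through untouched
            continue
        real = os.path.realpath(arg)
        if real == root:
            rewritten.append(container_cwd)
        elif real.startswith(root + os.sep):
            rel_parts = os.path.relpath(real, root).split(os.sep)
            rewritten.append(str(PurePosixPath(container_cwd, *rel_parts)))
        else:
            rewritten.append(arg)  # outside the staged tree: leave as-is
    return command, rewritten
```

Realpath-based matching keeps the rewrite correct when the host path itself sits behind a symlink, such as a temp directory on macOS.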
@@ -169,6 +169,7 @@ async def test_updated_llm_model_is_immediately_usable_by_local_agent_pipeline()
    ap.logger = Mock()
    ap.persistence_mgr = SimpleNamespace(execute_async=AsyncMock())
    ap.tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
+    ap.skill_mgr = None  # PreProcessor only uses skill_mgr for the local-agent skill-binding branch
    ap.plugin_connector = SimpleNamespace(
        emit_event=AsyncMock(return_value=SimpleNamespace(event=SimpleNamespace(default_prompt=[], prompt=[])))
    )
@@ -0,0 +1,479 @@
+from __future__ import annotations
+
+import os
+import tempfile
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock
+
+import pytest
+
+
+def _make_ap(logger=None):
+    ap = SimpleNamespace()
+    ap.logger = logger or Mock()
+    ap.persistence_mgr = Mock()
+    ap.persistence_mgr.execute_async = AsyncMock(return_value=Mock(all=Mock(return_value=[])))
+    ap.persistence_mgr.serialize_model = Mock(side_effect=lambda cls, row: row)
+    return ap
+
+
+def _make_skill_data(
+    name='test-skill',
+    instructions='Do something',
+    package_root='',
+    entry_file='SKILL.md',
+    **kwargs,
+):
+    return {
+        'name': name,
+        'display_name': kwargs.pop('display_name', name),
+        'description': kwargs.pop('description', f'Description of {name}'),
+        'instructions': instructions,
+        'package_root': package_root,
+        'entry_file': entry_file,
+        **kwargs,
+    }
+
+
+class TestSkillManagerCache:
+    """The Box runtime is the only source of truth — SkillManager just holds
+    an in-memory cache populated by ``reload_skills``. There is no local
+    filesystem reader anymore."""
+
+    def test_refresh_skill_from_disk_reports_cache_presence(self):
+        """Box is the only source of truth for skill content. refresh_skill_from_disk
+        now just reports whether the skill is still in the in-memory cache —
+        the actual content refresh is driven by SkillService awaiting
+        ``reload_skills`` after every Box mutation."""
+        from langbot.pkg.skill.manager import SkillManager
+
+        ap = _make_ap()
+        mgr = SkillManager(ap)
+
+        # Empty cache → returns False
+        assert mgr.refresh_skill_from_disk('test-skill') is False
+
+        # Cache populated → returns True; method does NOT mutate the cache
+        cached = _make_skill_data(name='test-skill', instructions='Cached')
+        mgr.skills['test-skill'] = cached
+        assert mgr.refresh_skill_from_disk('test-skill') is True
+        assert mgr.skills['test-skill'] is cached
+        assert mgr.refresh_skill_from_disk('') is False
+
+    @pytest.mark.asyncio
+    async def test_reload_skills_drops_box_skills_with_missing_package_root(self):
+        """When Box reports a skill whose package_root is gone from the
+        LangBot-visible filesystem, the cache must drop it instead of
+        keeping a stale entry that would later produce a bad mount."""
+        from langbot.pkg.skill.manager import SkillManager
+
+        with tempfile.TemporaryDirectory() as live_dir:
+            ghost_dir = os.path.join(live_dir, '_does_not_exist')
+            box_service = SimpleNamespace(
+                available=True,
+                list_skills=AsyncMock(
+                    return_value=[
+                        _make_skill_data(name='alive', package_root=live_dir),
+                        _make_skill_data(name='ghost', package_root=ghost_dir),
+                    ]
+                ),
+            )
+
+            ap = _make_ap()
+            ap.box_service = box_service
+            mgr = SkillManager(ap)
+
+            await mgr.reload_skills()
+
+        assert list(mgr.skills) == ['alive']
+        # The warning must name the dropped skill so operators can see it.
+        warning_messages = [str(call.args[0]) for call in ap.logger.warning.call_args_list]
+        assert any('ghost' in msg and 'package_root missing' in msg for msg in warning_messages)
+
+
+class TestSkillActivationHelper:
+    """Skill activation is now Tool-Call based.
+
+    The legacy text-marker mechanism (``[ACTIVATE_SKILL: x]`` detection,
+    ``build_activation_prompt_for_skills``, ``remove_activation_marker``,
+    ``prepare_skill_activation``) has been removed. Activation now goes
+    through ``skill.activation.register_activated_skill``, invoked by the
+    ``activate`` Tool Call.
+    """
+
+    def test_register_activated_skill_records_known_skill(self):
+        from langbot.pkg.skill.activation import register_activated_skill
+        from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
+        from langbot.pkg.skill.manager import SkillManager
+
+        ap = _make_ap()
+        mgr = SkillManager(ap)
+        mgr.skills = {
+            'primary': _make_skill_data(name='primary', instructions='Primary instructions'),
+        }
+        ap.skill_mgr = mgr
+
+        query = SimpleNamespace(variables={})
+
+        assert register_activated_skill(ap, query, 'primary') is True
+        assert set(query.variables[ACTIVATED_SKILLS_KEY].keys()) == {'primary'}
+        assert query.variables[ACTIVATED_SKILLS_KEY]['primary']['name'] == 'primary'
+
+    def test_register_activated_skill_rejects_unknown_skill(self):
+        from langbot.pkg.skill.activation import register_activated_skill
+        from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
+        from langbot.pkg.skill.manager import SkillManager
+
+        ap = _make_ap()
+        mgr = SkillManager(ap)
+        mgr.skills = {'primary': _make_skill_data(name='primary')}
+        ap.skill_mgr = mgr
+
+        query = SimpleNamespace(variables={})
+
+        assert register_activated_skill(ap, query, 'missing') is False
+        assert ACTIVATED_SKILLS_KEY not in query.variables
+
+    def test_register_activated_skill_without_skill_manager_returns_false(self):
+        from langbot.pkg.skill.activation import register_activated_skill
+
+        ap = _make_ap()  # no skill_mgr attribute
+        query = SimpleNamespace(variables={})
+
+        assert register_activated_skill(ap, query, 'primary') is False
+
+
+class TestSkillPathHelpers:
+    def test_get_visible_skills_filters_by_bound_names(self):
+        from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY, get_visible_skills
+
+        ap = _make_ap()
+        ap.skill_mgr = SimpleNamespace(
+            skills={
+                'visible': _make_skill_data(name='visible'),
+                'hidden': _make_skill_data(name='hidden'),
+            }
+        )
+        query = SimpleNamespace(variables={PIPELINE_BOUND_SKILLS_KEY: ['visible']})
+
+        result = get_visible_skills(ap, query)
+
+        assert list(result.keys()) == ['visible']
+
+    def test_resolve_virtual_skill_path_allows_visible_skill_reads(self):
+        from langbot.pkg.provider.tools.loaders.skill import (
+            PIPELINE_BOUND_SKILLS_KEY,
+            resolve_virtual_skill_path,
+        )
+
+        ap = _make_ap()
+        ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo')})
+        query = SimpleNamespace(variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']})
+
+        skill, rewritten = resolve_virtual_skill_path(
+            ap,
+            query,
+            '/workspace/.skills/demo/SKILL.md',
+            include_visible=True,
+            include_activated=False,
+        )
+
+        assert skill['name'] == 'demo'
+        assert rewritten == '/workspace/SKILL.md'
+
+    def test_build_skill_session_id_uses_name_based_identifier(self):
+        from langbot.pkg.provider.tools.loaders.skill import build_skill_session_id
+
+        with_launcher = build_skill_session_id(
+            {'name': 'writer'},
+            SimpleNamespace(query_id=42, launcher_type='person', launcher_id='123'),
+        )
+        fallback = build_skill_session_id({'name': 'writer'}, SimpleNamespace(query_id=99))
+
+        assert with_launcher == 'skill-person_123-writer'
+        assert fallback == 'skill-99-writer'
+
+    def test_should_prepare_skill_python_env_detects_manifests_and_venv(self):
+        from langbot.pkg.provider.tools.loaders.skill import should_prepare_skill_python_env
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            assert should_prepare_skill_python_env(tmpdir) is False
+
+            with open(os.path.join(tmpdir, 'requirements.txt'), 'w', encoding='utf-8') as f:
+                f.write('requests==2.32.0\n')
+            assert should_prepare_skill_python_env(tmpdir) is True
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            os.makedirs(os.path.join(tmpdir, '.venv'))
+            assert should_prepare_skill_python_env(tmpdir) is True
+
+    def test_wrap_skill_command_with_python_env_bootstraps_then_runs_command(self):
+        from langbot.pkg.provider.tools.loaders.skill import wrap_skill_command_with_python_env
+
+        command = wrap_skill_command_with_python_env('python scripts/run.py')
+
+        assert 'python -m venv "$_LB_VENV_DIR"' in command
+        assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
+        assert command.rstrip().endswith('python scripts/run.py')
+
+
+class TestSkillToolLoader:
+    """The skill tool surface is now just ``activate`` + ``register_skill``.
+
+    The legacy CRUD authoring tools (create/list/get/update/delete/
+    import_skill_from_directory/reload_skills) were removed; skill CRUD is
+    handled by SkillService via the HTTP API / web UI instead.
+    """
+
+    @pytest.mark.asyncio
+    async def test_activate_returns_instructions_and_registers_skill(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import (
+            ACTIVATE_SKILL_TOOL_NAME,
+            SkillToolLoader,
+        )
+        from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
+
+        skill = _make_skill_data(name='demo', package_root='/data/skills/demo', instructions='Step 1')
+        ap = _make_ap()
+        ap.skill_mgr = SimpleNamespace(
+            skills={'demo': skill},
+            get_skill_by_name=lambda name: skill if name == 'demo' else None,
+        )
+
+        loader = SkillToolLoader(ap)
+        query = SimpleNamespace(variables={})
+
+        result = await loader.invoke_tool(ACTIVATE_SKILL_TOOL_NAME, {'skill_name': 'demo'}, query)
+
+        assert result['activated'] is True
+        assert result['skill_name'] == 'demo'
+        assert result['mount_path'] == '/workspace/.skills/demo'
+        assert 'Step 1' in result['content']
+        assert set(query.variables[ACTIVATED_SKILLS_KEY].keys()) == {'demo'}
+
+    @pytest.mark.asyncio
+    async def test_activate_unknown_skill_raises(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import (
+            ACTIVATE_SKILL_TOOL_NAME,
+            SkillToolLoader,
+        )
+
+        ap = _make_ap()
+        ap.skill_mgr = SimpleNamespace(
+            skills={'demo': _make_skill_data(name='demo')},
+            get_skill_by_name=lambda name: None,
+        )
+
+        loader = SkillToolLoader(ap)
+
+        with pytest.raises(ValueError, match='not found'):
+            await loader.invoke_tool(
+                ACTIVATE_SKILL_TOOL_NAME,
+                {'skill_name': 'ghost'},
+                SimpleNamespace(variables={}),
+            )
+
+    @pytest.mark.asyncio
+    async def test_register_skill_scans_directory_and_creates_skill(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import (
+            REGISTER_SKILL_TOOL_NAME,
+            SkillToolLoader,
+        )
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            repo_dir = os.path.join(tmpdir, 'repo')
+            os.makedirs(repo_dir)
+
+            ap = _make_ap()
+            ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
+            ap.skill_service = SimpleNamespace(
+                scan_directory_async=AsyncMock(
+                    return_value={
+                        'name': 'cloned-skill',
+                        'display_name': 'Cloned Skill',
+                        'description': 'Imported from clone',
+                        'instructions': 'Do work',
+                    }
+                ),
+                create_skill=AsyncMock(
+                    return_value=_make_skill_data(name='cloned-skill', package_root=os.path.realpath(repo_dir))
+                ),
+            )
+
+            loader = SkillToolLoader(ap)
+            result = await loader.invoke_tool(
+                REGISTER_SKILL_TOOL_NAME,
+                {'path': '/workspace/repo'},
+                SimpleNamespace(),
+            )
+
+        ap.skill_service.scan_directory_async.assert_awaited_once_with(os.path.realpath(repo_dir))
+        ap.skill_service.create_skill.assert_awaited_once_with(
+            {
+                'name': 'cloned-skill',
+                'display_name': 'Cloned Skill',
+                'description': 'Imported from clone',
+                'instructions': 'Do work',
+                'package_root': os.path.realpath(repo_dir),
+            }
+        )
+        assert result['registered'] is True
+        assert result['skill_name'] == 'cloned-skill'
+        assert result['source_path'] == '/workspace/repo'
+
+    @pytest.mark.asyncio
+    async def test_register_skill_rejects_workspace_escape(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import (
+            REGISTER_SKILL_TOOL_NAME,
+            SkillToolLoader,
+        )
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            ap = _make_ap()
+            ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
+            ap.skill_service = SimpleNamespace(scan_directory_async=AsyncMock(), create_skill=AsyncMock())
+
+            loader = SkillToolLoader(ap)
+
+            with pytest.raises(ValueError, match='escapes the workspace boundary'):
+                await loader.invoke_tool(
+                    REGISTER_SKILL_TOOL_NAME,
+                    {'path': '/workspace/../../etc'},
+                    SimpleNamespace(),
+                )
+
+    @pytest.mark.asyncio
+    async def test_register_skill_requires_skill_service(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import (
+            REGISTER_SKILL_TOOL_NAME,
+            SkillToolLoader,
+        )
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            ap = _make_ap()  # no skill_service attribute
+            ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
+
+            loader = SkillToolLoader(ap)
+
+            with pytest.raises(ValueError, match='Skill service not available'):
+                await loader.invoke_tool(
+                    REGISTER_SKILL_TOOL_NAME,
+                    {'path': '/workspace/foo'},
+                    SimpleNamespace(),
+                )
+
+    @pytest.mark.asyncio
+    async def test_tools_hidden_when_sandbox_backend_unavailable(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import SkillToolLoader
+
+        ap = _make_ap()
+        ap.skill_mgr = SimpleNamespace(skills={})
+        ap.box_service = SimpleNamespace(
+            available=True,
+            get_status=AsyncMock(return_value={'backend': {'available': False}}),
+        )
+
+        loader = SkillToolLoader(ap)
+        await loader.initialize()
+
+        assert await loader.get_tools() == []
+        assert await loader.has_tool('activate') is False
+        assert await loader.has_tool('register_skill') is False
+
+    @pytest.mark.asyncio
+    async def test_tools_exposed_when_sandbox_backend_available(self):
+        from langbot.pkg.provider.tools.loaders.skill_authoring import SkillToolLoader
+
+        ap = _make_ap()
+        ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo')})
+        ap.box_service = SimpleNamespace(
+            available=True,
+            get_status=AsyncMock(return_value={'backend': {'available': True}}),
+        )
+
+        loader = SkillToolLoader(ap)
+        await loader.initialize()
+
+        tools = await loader.get_tools()
+
+        assert sorted(tool.name for tool in tools) == ['activate', 'register_skill']
+        assert await loader.has_tool('activate') is True
+        assert await loader.has_tool('register_skill') is True
+
+
+class TestNativeToolLoaderSkillPaths:
+    @pytest.mark.asyncio
+    async def test_read_visible_skill_file(self):
+        from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
+        from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            skill_md = os.path.join(tmpdir, 'SKILL.md')
+            with open(skill_md, 'w', encoding='utf-8') as f:
+                f.write('demo instructions')
+
+            ap = _make_ap()
+            ap.box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
+            ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo', package_root=tmpdir)})
+            loader = NativeToolLoader(ap)
+
+            result = await loader.invoke_tool(
+                'read',
+                {'path': '/workspace/.skills/demo/SKILL.md'},
+                SimpleNamespace(query_id='q1', variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']}),
+            )
+
+            assert result == {'ok': True, 'content': 'demo instructions'}
+
+    @pytest.mark.asyncio
+    async def test_exec_in_activated_skill_mount_rewrites_command_and_refreshes(self):
+        from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
+        from langbot.pkg.provider.tools.loaders.skill import register_activated_skill
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            ap = _make_ap()
+            ap.box_service = SimpleNamespace(
+                available=True,
+                default_workspace=tmpdir,
+                execute_tool=AsyncMock(return_value={'ok': True}),
+            )
+            ap.skill_mgr = SimpleNamespace(refresh_skill_from_disk=Mock())
+            loader = NativeToolLoader(ap)
+
+            query = SimpleNamespace(query_id='q1', launcher_type='person', launcher_id='123', variables={})
+            register_activated_skill(query, _make_skill_data(name='demo', package_root=tmpdir))
+
+            result = await loader.invoke_tool(
+                'exec',
+                {
+                    'command': 'python /workspace/.skills/demo/scripts/run.py',
+                    'workdir': '/workspace/.skills/demo',
+                },
+                query,
+            )
+
+            assert result == {'ok': True}
+            tool_parameters = ap.box_service.execute_tool.await_args.args[0]
+            assert tool_parameters['command'] == 'python /workspace/.skills/demo/scripts/run.py'
+            assert tool_parameters['workdir'] == '/workspace/.skills/demo'
+            ap.skill_mgr.refresh_skill_from_disk.assert_called_once_with('demo')
+
+    @pytest.mark.asyncio
+    async def test_write_requires_skill_activation(self):
+        from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
+        from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            ap = _make_ap()
+            ap.box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
+            ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo', package_root=tmpdir)})
+            loader = NativeToolLoader(ap)
+
+            query = SimpleNamespace(query_id='q1', variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']})
+
+            with pytest.raises(ValueError, match='Skill "demo" is not available at this path'):
+                await loader.invoke_tool(
+                    'write',
+                    {'path': '/workspace/.skills/demo/notes.txt', 'content': 'hi'},
+                    query,
+                )
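Two tests above, `test_register_skill_scans_directory_and_creates_skill` and `test_register_skill_rejects_workspace_escape`, imply that `register_skill` maps a sandbox path like `/workspace/repo` onto `realpath(<default_workspace>/repo)` and refuses any path that normalizes outside `/workspace`. The sketch below shows one way to enforce that boundary: lexical normalization first, then a symlink-aware `os.path.commonpath` check. `resolve_workspace_path` is a hypothetical name, and LangBot's actual check may differ.

```python
import os
import posixpath

# Assumed sandbox-visible prefix; the tests above pass paths such as '/workspace/repo'.
WORKSPACE_PREFIX = '/workspace'


def resolve_workspace_path(virtual_path: str, default_workspace: str) -> str:
    """Translate a /workspace/... path into a real host path inside default_workspace."""
    # Collapse '..' lexically first so '/workspace/../../etc' becomes '/etc'.
    normalized = posixpath.normpath(virtual_path)
    if normalized != WORKSPACE_PREFIX and not normalized.startswith(WORKSPACE_PREFIX + '/'):
        raise ValueError(f'Path {virtual_path!r} escapes the workspace boundary')

    relative = normalized[len(WORKSPACE_PREFIX):].lstrip('/')
    root = os.path.realpath(default_workspace)
    host_path = os.path.realpath(os.path.join(root, *relative.split('/'))) if relative else root

    # Re-check after resolving symlinks: a link inside the workspace could point outside it.
    if os.path.commonpath([root, host_path]) != root:
        raise ValueError(f'Path {virtual_path!r} escapes the workspace boundary')
    return host_path
```

The second check matters because lexical normalization alone cannot see a symlink planted inside the workspace.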
@@ -4,6 +4,7 @@ Tests cover:
 - Tool schema generation for OpenAI and Anthropic
 - Tool execution dispatch
 """
+
 from __future__ import annotations

 import pytest
@@ -52,11 +53,12 @@ class TestToolManagerSchemaGeneration:
    @pytest.fixture
    def sample_tools(self):
        """Create sample LLMTool list for testing."""
+
        def dummy_weather_func(**kwargs):
-            return "weather result"
+            return 'weather result'

        def dummy_calc_func(**kwargs):
-            return "calc result"
+            return 'calc result'

        tools = [
            resource_tool.LLMTool(
@@ -65,15 +67,10 @@ class TestToolManagerSchemaGeneration:
                description='Get current weather for a location',
                parameters={
                    'type': 'object',
-                    'properties': {
-                        'location': {
-                            'type': 'string',
-                            'description': 'City name'
-                        }
-                    },
-                    'required': ['location']
+                    'properties': {'location': {'type': 'string', 'description': 'City name'}},
+                    'required': ['location'],
                },
-                func=dummy_weather_func
+                func=dummy_weather_func,
            ),
            resource_tool.LLMTool(
                name='calculate',
@@ -81,15 +78,10 @@ class TestToolManagerSchemaGeneration:
                description='Perform a calculation',
                parameters={
                    'type': 'object',
-                    'properties': {
-                        'expression': {
-                            'type': 'string',
-                            'description': 'Math expression'
-                        }
-                    },
-                    'required': ['expression']
+                    'properties': {'expression': {'type': 'string', 'description': 'Math expression'}},
+                    'required': ['expression'],
                },
-                func=dummy_calc_func
+                func=dummy_calc_func,
            ),
        ]
        return tools
@@ -188,26 +180,48 @@ class TestToolManagerExecuteFuncCall:

    @pytest.fixture
    def mock_app_with_loaders(self):
-        """Create mock app with mock tool loaders."""
+        """Create mock app with mock tool loaders.
+
+        Returns (app, plugin_loader, mcp_loader). The native and skill loaders
+        are attached directly to the app for tests that don't need to assert
+        against them — they all default to ``has_tool == False`` so the
+        execute_func_call probe falls through to the plugin/mcp pair.
+        """
        mock_app = Mock()
        mock_app.logger = Mock()

+        def _make_inert_loader():
+            loader = Mock()
+            loader.has_tool = AsyncMock(return_value=False)
+            loader.invoke_tool = AsyncMock(return_value=None)
+            loader.initialize = AsyncMock()
+            loader.shutdown = AsyncMock()
+            return loader
+
        # Create mock plugin loader
-        mock_plugin_loader = Mock()
-        mock_plugin_loader.has_tool = AsyncMock(return_value=False)
+        mock_plugin_loader = _make_inert_loader()
        mock_plugin_loader.invoke_tool = AsyncMock(return_value='plugin_result')
-        mock_plugin_loader.initialize = AsyncMock()
-        mock_plugin_loader.shutdown = AsyncMock()

        # Create mock MCP loader
-        mock_mcp_loader = Mock()
-        mock_mcp_loader.has_tool = AsyncMock(return_value=False)
+        mock_mcp_loader = _make_inert_loader()
        mock_mcp_loader.invoke_tool = AsyncMock(return_value='mcp_result')
-        mock_mcp_loader.initialize = AsyncMock()
-        mock_mcp_loader.shutdown = AsyncMock()
+
+        # Stash inert native/skill loaders so the ToolManager probe order
+        # (native → plugin → mcp → skill) doesn't raise AttributeError. Tests
+        # that need to override these can replace the attributes on the manager.
+        mock_app._inert_native_loader = _make_inert_loader()
+        mock_app._inert_skill_loader = _make_inert_loader()

        return mock_app, mock_plugin_loader, mock_mcp_loader

+    @staticmethod
+    def _wire_loaders(manager, mock_app, plugin_loader, mcp_loader):
+        """Attach all four loaders (native + plugin + mcp + skill) to manager."""
+        manager.native_tool_loader = mock_app._inert_native_loader
+        manager.plugin_tool_loader = plugin_loader
+        manager.mcp_tool_loader = mcp_loader
+        manager.skill_tool_loader = mock_app._inert_skill_loader
+
    @pytest.fixture
    def sample_query(self):
        """Create sample query for testing."""
@@ -215,9 +229,7 @@ class TestToolManagerExecuteFuncCall:
        return query

    @pytest.mark.asyncio
-    async def test_execute_calls_plugin_loader_when_has_tool(
-        self, mock_app_with_loaders, sample_query
-    ):
+    async def test_execute_calls_plugin_loader_when_has_tool(self, mock_app_with_loaders, sample_query):
        """Test that execute_func_call uses plugin loader when tool exists there."""
        toolmgr = get_toolmgr_module()

@@ -225,26 +237,17 @@ class TestToolManagerExecuteFuncCall:
        mock_plugin_loader.has_tool = AsyncMock(return_value=True)

        manager = toolmgr.ToolManager(mock_app)
-        manager.plugin_tool_loader = mock_plugin_loader
-        manager.mcp_tool_loader = mock_mcp_loader
+        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

-        result = await manager.execute_func_call(
-            'test_tool',
-            {'param': 'value'},
-            sample_query
-        )
+        result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)

        assert result == 'plugin_result'
-        mock_plugin_loader.invoke_tool.assert_called_once_with(
-            'test_tool', {'param': 'value'}, sample_query
-        )
+        mock_plugin_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)
        # MCP loader should not be called
        mock_mcp_loader.invoke_tool.assert_not_called()

    @pytest.mark.asyncio
-    async def test_execute_calls_mcp_loader_when_plugin_not_found(
-        self, mock_app_with_loaders, sample_query
-    ):
+    async def test_execute_calls_mcp_loader_when_plugin_not_found(self, mock_app_with_loaders, sample_query):
        """Test that execute_func_call uses MCP loader when plugin doesn't have tool."""
        toolmgr = get_toolmgr_module()

@@ -253,24 +256,15 @@ class TestToolManagerExecuteFuncCall:
        mock_mcp_loader.has_tool = AsyncMock(return_value=True)

        manager = toolmgr.ToolManager(mock_app)
-        manager.plugin_tool_loader = mock_plugin_loader
-        manager.mcp_tool_loader = mock_mcp_loader
+        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

-        result = await manager.execute_func_call(
-            'test_tool',
-            {'param': 'value'},
-            sample_query
-        )
+        result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)

        assert result == 'mcp_result'
-        mock_mcp_loader.invoke_tool.assert_called_once_with(
-            'test_tool', {'param': 'value'}, sample_query
-        )
+        mock_mcp_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)

    @pytest.mark.asyncio
-    async def test_execute_raises_when_tool_not_found(
-        self, mock_app_with_loaders, sample_query
-    ):
+    async def test_execute_raises_when_tool_not_found(self, mock_app_with_loaders, sample_query):
        """Test that execute_func_call raises ValueError when tool not found."""
        toolmgr = get_toolmgr_module()

@@ -279,20 +273,13 @@ class TestToolManagerExecuteFuncCall:
        mock_mcp_loader.has_tool = AsyncMock(return_value=False)

        manager = toolmgr.ToolManager(mock_app)
-        manager.plugin_tool_loader = mock_plugin_loader
-        manager.mcp_tool_loader = mock_mcp_loader
+        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

        with pytest.raises(ValueError, match='未找到工具'):
-            await manager.execute_func_call(
-                'unknown_tool',
-                {},
-                sample_query
-            )
+            await manager.execute_func_call('unknown_tool', {}, sample_query)

    @pytest.mark.asyncio
-    async def test_plugin_loader_checked_first(
-        self, mock_app_with_loaders, sample_query
-    ):
+    async def test_plugin_loader_checked_first(self, mock_app_with_loaders, sample_query):
        """Test that plugin loader is checked before MCP loader."""
        toolmgr = get_toolmgr_module()

@@ -302,8 +289,7 @@ class TestToolManagerExecuteFuncCall:
        mock_mcp_loader.has_tool = AsyncMock(return_value=True)

        manager = toolmgr.ToolManager(mock_app)
-        manager.plugin_tool_loader = mock_plugin_loader
-        manager.mcp_tool_loader = mock_mcp_loader
+        self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)

        await manager.execute_func_call('test_tool', {}, sample_query)

@@ -317,20 +303,30 @@ class TestToolManagerShutdown:

    @pytest.mark.asyncio
    async def test_shutdown_calls_loader_shutdown(self):
-        """Test that shutdown calls shutdown on both loaders."""
+        """Test that shutdown calls shutdown on every registered loader."""
        toolmgr = get_toolmgr_module()

        mock_app = Mock()
-        mock_plugin_loader = Mock()
-        mock_plugin_loader.shutdown = AsyncMock()
-        mock_mcp_loader = Mock()
-        mock_mcp_loader.shutdown = AsyncMock()
+
+        def _make_loader():
+            loader = Mock()
+            loader.shutdown = AsyncMock()
+            return loader
+
+        mock_native_loader = _make_loader()
+        mock_plugin_loader = _make_loader()
+        mock_mcp_loader = _make_loader()
+        mock_skill_loader = _make_loader()

        manager = toolmgr.ToolManager(mock_app)
+        manager.native_tool_loader = mock_native_loader
        manager.plugin_tool_loader = mock_plugin_loader
        manager.mcp_tool_loader = mock_mcp_loader
+        manager.skill_tool_loader = mock_skill_loader

        await manager.shutdown()

+        mock_native_loader.shutdown.assert_called_once()
        mock_plugin_loader.shutdown.assert_called_once()
-        mock_mcp_loader.shutdown.assert_called_once()
+        mock_mcp_loader.shutdown.assert_called_once()
+        mock_skill_loader.shutdown.assert_called_once()
@@ -0,0 +1,250 @@
+from __future__ import annotations
+
+import os
+import tempfile
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock
+
+import pytest
+
+import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
+
+from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
+from langbot.pkg.provider.tools.toolmgr import ToolManager
+
+
+class StubLoader:
+    def __init__(self, tools: list[resource_tool.LLMTool] | None = None, invoke_result=None):
+        self._tools = tools or []
+        self._invoke_result = invoke_result
+
+    async def get_tools(self, *_args, **_kwargs):
+        return self._tools
+
+    async def has_tool(self, name: str) -> bool:
+        return any(tool.name == name for tool in self._tools)
+
+    async def invoke_tool(self, name: str, parameters: dict, query):
+        return self._invoke_result(name, parameters, query) if callable(self._invoke_result) else self._invoke_result
+
+    async def shutdown(self):
+        return None
+
+
+def make_tool(name: str) -> resource_tool.LLMTool:
+    return resource_tool.LLMTool(
+        name=name,
+        human_desc=name,
+        description=name,
+        parameters={'type': 'object', 'properties': {}},
+        func=lambda parameters: parameters,
+    )
+
+
+@pytest.mark.asyncio
+async def test_tool_manager_omits_skill_authoring_tools_by_default():
+    manager = ToolManager(SimpleNamespace())
+    manager.native_tool_loader = StubLoader([make_tool('exec')])
+    manager.skill_tool_loader = StubLoader([make_tool('activate')])
+    manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
+    manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
+
+    tools = await manager.get_all_tools()
+
+    assert [tool.name for tool in tools] == ['exec', 'plugin_tool', 'mcp_tool']
+
+
+@pytest.mark.asyncio
+async def test_tool_manager_includes_skill_authoring_tools_when_requested():
+    manager = ToolManager(SimpleNamespace())
+    manager.native_tool_loader = StubLoader([make_tool('exec')])
+    manager.skill_tool_loader = StubLoader([make_tool('activate')])
+    manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
+    manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
+
+    tools = await manager.get_all_tools(include_skill_authoring=True)
+
+    assert [tool.name for tool in tools] == ['exec', 'activate', 'plugin_tool', 'mcp_tool']
+
+
+@pytest.mark.asyncio
+async def test_tool_manager_routes_native_tool_calls():
+    app = SimpleNamespace()
+    manager = ToolManager(app)
+    manager.native_tool_loader = StubLoader([make_tool('exec')], invoke_result={'backend': 'fake'})
+    manager.skill_tool_loader = StubLoader([make_tool('activate')])
+    manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
+    manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
+
+    result = await manager.execute_func_call('exec', {'command': 'pwd'}, query=Mock())
+
+    assert result == {'backend': 'fake'}
+
+
+@pytest.mark.asyncio
+async def test_native_tool_loader_hides_tools_when_box_unavailable():
+    loader = NativeToolLoader(SimpleNamespace(box_service=SimpleNamespace(available=False)))
+
+    assert await loader.get_tools() == []
+    for tool_name in ('exec', 'read', 'write', 'edit', 'glob', 'grep'):
+        assert await loader.has_tool(tool_name) is False
+
+
+@pytest.mark.asyncio
+async def test_native_tool_loader_exposes_all_tools_when_box_available():
+    box_service = SimpleNamespace(
+        available=True,
+        get_status=AsyncMock(return_value={'backend': {'available': True}}),
+    )
+    loader = NativeToolLoader(SimpleNamespace(box_service=box_service, logger=Mock()))
+    await loader.initialize()
+
+    tools = await loader.get_tools()
+
+    assert [tool.name for tool in tools] == ['exec', 'read', 'write', 'edit', 'glob', 'grep']
+    for tool_name in ('exec', 'read', 'write', 'edit', 'glob', 'grep'):
+        assert await loader.has_tool(tool_name) is True
+
+
+# ── read/write/edit file tool tests ─────────────────────────────
+
+
+def _make_loader_with_workspace(tmpdir: str) -> tuple[NativeToolLoader, Mock]:
+    logger = Mock()
+    box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
+    ap = SimpleNamespace(box_service=box_service, logger=logger)
+    return NativeToolLoader(ap), logger
+
+
+def _make_query() -> Mock:
+    q = Mock()
+    q.query_id = 'test-query-1'
+    return q
+
+
+@pytest.mark.asyncio
+async def test_read_file():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+        with open(os.path.join(tmpdir, 'hello.txt'), 'w') as f:
+            f.write('hello world')
+
+        result = await loader.invoke_tool('read', {'path': '/workspace/hello.txt'}, _make_query())
+
+        assert result['ok'] is True
+        assert result['content'] == 'hello world'
+
+
+@pytest.mark.asyncio
+async def test_read_nonexistent_file():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+
+        result = await loader.invoke_tool('read', {'path': '/workspace/no_such.txt'}, _make_query())
+
+        assert result['ok'] is False
+        assert 'not found' in result['error'].lower()
+
+
+@pytest.mark.asyncio
+async def test_read_directory():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+        os.makedirs(os.path.join(tmpdir, 'subdir'))
+        with open(os.path.join(tmpdir, 'a.txt'), 'w') as f:
+            f.write('a')
+
+        result = await loader.invoke_tool('read', {'path': '/workspace'}, _make_query())
+
+        assert result['ok'] is True
+        assert result['is_directory'] is True
+        assert 'a.txt' in result['content']
+
+
+@pytest.mark.asyncio
+async def test_write_creates_file():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+
+        result = await loader.invoke_tool(
+            'write', {'path': '/workspace/new.txt', 'content': 'new content'}, _make_query()
+        )
+
+        assert result['ok'] is True
+        with open(os.path.join(tmpdir, 'new.txt')) as f:
+            assert f.read() == 'new content'
+
+
+@pytest.mark.asyncio
+async def test_write_creates_subdirectories():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+
+        result = await loader.invoke_tool(
+            'write', {'path': '/workspace/sub/deep/file.txt', 'content': 'nested'}, _make_query()
+        )
+
+        assert result['ok'] is True
+        with open(os.path.join(tmpdir, 'sub', 'deep', 'file.txt')) as f:
+            assert f.read() == 'nested'
+
+
+@pytest.mark.asyncio
+async def test_edit_replaces_unique_string():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+        with open(os.path.join(tmpdir, 'code.py'), 'w') as f:
+            f.write('def foo():\n    return 1\n')
+
+        result = await loader.invoke_tool(
+            'edit',
+            {'path': '/workspace/code.py', 'old_string': 'return 1', 'new_string': 'return 42'},
+            _make_query(),
+        )
+
+        assert result['ok'] is True
+        with open(os.path.join(tmpdir, 'code.py')) as f:
+            assert f.read() == 'def foo():\n    return 42\n'
+
+
+@pytest.mark.asyncio
+async def test_edit_rejects_ambiguous_match():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+        with open(os.path.join(tmpdir, 'dup.txt'), 'w') as f:
+            f.write('aaa\naaa\n')
+
+        result = await loader.invoke_tool(
+            'edit',
+            {'path': '/workspace/dup.txt', 'old_string': 'aaa', 'new_string': 'bbb'},
+            _make_query(),
+        )
+
+        assert result['ok'] is False
+        assert '2' in result['error']
+
+
+@pytest.mark.asyncio
+async def test_edit_rejects_missing_string():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+        with open(os.path.join(tmpdir, 'x.txt'), 'w') as f:
+            f.write('hello')
+
+        result = await loader.invoke_tool(
+            'edit',
+            {'path': '/workspace/x.txt', 'old_string': 'nope', 'new_string': 'yes'},
+            _make_query(),
+        )
+
+        assert result['ok'] is False
+        assert 'not found' in result['error'].lower()
+
+
+@pytest.mark.asyncio
+async def test_path_escape_blocked():
+    with tempfile.TemporaryDirectory() as tmpdir:
+        loader, _ = _make_loader_with_workspace(tmpdir)
+
+        with pytest.raises(ValueError, match='escapes'):
+            await loader.invoke_tool('read', {'path': '/workspace/../../etc/passwd'}, _make_query())
@@ -0,0 +1,23 @@
+from pathlib import Path
+
+from langbot.pkg.utils import paths
+
+
+def test_get_data_root_uses_source_root_in_repo_checkout():
+    data_root = Path(paths.get_data_root())
+    repo_root = Path(__file__).resolve().parents[2]
+
+    assert data_root == repo_root / 'data'
+
+
+def test_get_data_path_joins_under_data_root():
+    data_path = Path(paths.get_data_path('skills', 'demo-skill'))
+    repo_root = Path(__file__).resolve().parents[2]
+
+    assert data_path == repo_root / 'data' / 'skills' / 'demo-skill'
+
+
+def test_get_data_root_honors_env_override(monkeypatch, tmp_path):
+    monkeypatch.setenv('LANGBOT_DATA_ROOT', str(tmp_path / 'custom-data'))
+
+    assert Path(paths.get_data_root()) == (tmp_path / 'custom-data').resolve()
@@ -0,0 +1,204 @@
+from __future__ import annotations
+
+import importlib
+import sys
+import types
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, Mock
+
+import pytest
+
+from langbot_plugin.api.entities.builtin.pipeline.query import Query
+from langbot_plugin.api.entities.builtin.platform.entities import Friend
+from langbot_plugin.api.entities.builtin.platform.events import FriendMessage
+from langbot_plugin.api.entities.builtin.platform.message import MessageChain, Plain
+from langbot_plugin.api.entities.builtin.provider.message import Message
+from langbot_plugin.api.entities.builtin.provider.prompt import Prompt
+from langbot_plugin.api.entities.builtin.provider.session import Conversation, LauncherTypes, Session
+
+
+def _make_query() -> Query:
+    message_chain = MessageChain([Plain(text='create a skill')])
+    return Query(
+        query_id=1,
+        launcher_type=LauncherTypes.PERSON,
+        launcher_id='launcher-1',
+        sender_id='sender-1',
+        message_event=FriendMessage(
+            message_chain=message_chain,
+            time=0,
+            sender=Friend(id='sender-1', nickname='Tester', remark='Tester'),
+        ),
+        message_chain=message_chain,
+        bot_uuid='bot-1',
+        pipeline_uuid='pipe-1',
+        pipeline_config={
+            'ai': {
+                'runner': {'runner': 'local-agent'},
+                'local-agent': {
+                    'model': {'primary': 'model-1', 'fallbacks': []},
+                    'prompt': 'default',
+                    'knowledge-bases': [],
+                },
+            },
+            'trigger': {'misc': {}},
+        },
+        variables={},
+    )
+
+
+def _make_conversation() -> Conversation:
+    return Conversation(
+        prompt=Prompt(name='default', messages=[Message(role='system', content='system prompt')]),
+        messages=[],
+        pipeline_uuid='pipe-1',
+        bot_uuid='bot-1',
+        uuid='conv-1',
+    )
+
+
+def _make_app(*, skill_service) -> SimpleNamespace:
+    session = Session(launcher_type=LauncherTypes.PERSON, launcher_id='launcher-1', sender_id='sender-1')
+    conversation = _make_conversation()
+    model = SimpleNamespace(model_entity=SimpleNamespace(uuid='model-1', abilities={'func_call'}))
+    tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
+
+    return SimpleNamespace(
+        sess_mgr=SimpleNamespace(
+            get_session=AsyncMock(return_value=session),
+            get_conversation=AsyncMock(return_value=conversation),
+        ),
+        model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
+        tool_mgr=tool_mgr,
+        plugin_connector=SimpleNamespace(
+            emit_event=AsyncMock(
+                return_value=SimpleNamespace(
+                    event=SimpleNamespace(
+                        default_prompt=conversation.prompt.messages.copy(),
+                        prompt=conversation.messages.copy(),
+                    )
+                )
+            )
+        ),
+        pipeline_service=SimpleNamespace(
+            get_pipeline=AsyncMock(return_value={'extensions_preferences': {'enable_all_skills': True}})
+        ),
+        skill_mgr=SimpleNamespace(
+            build_skill_aware_prompt_addition=Mock(return_value=''),
+            skills={},
+        ),
+        skill_service=skill_service,
+        logger=Mock(),
+    )
+
+
+def _import_preproc_modules():
+    fake_app_module = types.ModuleType('langbot.pkg.core.app')
+    fake_app_module.Application = object
+    sys.modules['langbot.pkg.core.app'] = fake_app_module
+
+    for module_name in (
+        'langbot.pkg.pipeline.preproc.preproc',
+        'langbot.pkg.pipeline.stage',
+    ):
+        sys.modules.pop(module_name, None)
+
+    preproc_module = importlib.import_module('langbot.pkg.pipeline.preproc.preproc')
+    entities_module = importlib.import_module('langbot.pkg.pipeline.entities')
+    return preproc_module, entities_module
+
+
+@pytest.mark.asyncio
+async def test_preproc_enables_skill_authoring_tools_when_skill_service_available():
+    preproc_module, entities_module = _import_preproc_modules()
+
+    app = _make_app(skill_service=SimpleNamespace())
+    stage = preproc_module.PreProcessor(app)
+
+    result = await stage.process(_make_query(), 'PreProcessor')
+
+    assert result.result_type == entities_module.ResultType.CONTINUE
+    app.tool_mgr.get_all_tools.assert_awaited_once_with(None, None, include_skill_authoring=True)
+
+
+@pytest.mark.asyncio
+async def test_preproc_disables_skill_authoring_tools_when_skill_service_missing():
+    preproc_module, entities_module = _import_preproc_modules()
+
+    app = _make_app(skill_service=None)
+    stage = preproc_module.PreProcessor(app)
+
+    result = await stage.process(_make_query(), 'PreProcessor')
+
+    assert result.result_type == entities_module.ResultType.CONTINUE
+    app.tool_mgr.get_all_tools.assert_awaited_once_with(None, None, include_skill_authoring=False)
+
+
+@pytest.mark.asyncio
+async def test_preproc_injects_skill_index_into_system_prompt():
+    """With tool-call skill activation the LLM still needs to know which
+    skills exist, so PreProcessor must append the SkillManager's skill-index
+    addendum to the first system message."""
+    preproc_module, entities_module = _import_preproc_modules()
+
+    app = _make_app(skill_service=SimpleNamespace())
+    addendum = '\n\nAvailable Skills:\n- demo (demo): Demo skill.\n\nCall activate ...'
+    app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value=addendum)
+
+    query = _make_query()
+    result = await stage_process_capture(preproc_module, app, query)
+
+    assert result.result_type == entities_module.ResultType.CONTINUE
+    app.skill_mgr.build_skill_aware_prompt_addition.assert_called_once_with(bound_skills=None)
+    head = query.prompt.messages[0]
+    assert head.role == 'system'
+    assert head.content.endswith(addendum)
+
+
+@pytest.mark.asyncio
+async def test_preproc_respects_pipeline_bound_skills_subset():
+    """When ``enable_all_skills`` is false the bound list is passed through
+    so the addendum only mentions skills allowed for this pipeline."""
+    preproc_module, entities_module = _import_preproc_modules()
+
+    app = _make_app(skill_service=SimpleNamespace())
+    app.pipeline_service.get_pipeline = AsyncMock(
+        return_value={
+            'extensions_preferences': {
+                'enable_all_skills': False,
+                'skills': ['only-this'],
+            }
+        }
+    )
+    app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
+
+    query = _make_query()
+    result = await stage_process_capture(preproc_module, app, query)
+
+    assert result.result_type == entities_module.ResultType.CONTINUE
+    app.skill_mgr.build_skill_aware_prompt_addition.assert_called_once_with(bound_skills=['only-this'])
+    assert query.variables.get('_pipeline_bound_skills') == ['only-this']
+
+
+@pytest.mark.asyncio
+async def test_preproc_skips_injection_when_addendum_is_empty():
+    """No visible skills → system prompt is left untouched (no
+    ``Available Skills`` block appended)."""
+    preproc_module, entities_module = _import_preproc_modules()
+
+    app = _make_app(skill_service=SimpleNamespace())
+    app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
+
+    query = _make_query()
+    result = await stage_process_capture(preproc_module, app, query)
+
+    assert result.result_type == entities_module.ResultType.CONTINUE
+    if query.prompt and query.prompt.messages:
+        assert 'Available Skills' not in (query.prompt.messages[0].content or '')
+
+
+async def stage_process_capture(preproc_module, app, query):
+    """Run PreProcessor.process and return the result while keeping ``query``
+    accessible to the assertions (process mutates query in place)."""
+    stage = preproc_module.PreProcessor(app)
+    return await stage.process(query, 'PreProcessor')
@@ -0,0 +1,89 @@
+from types import SimpleNamespace
+from unittest.mock import AsyncMock
+
+import pytest
+
+from langbot.pkg.api.http.service.skill import SkillService
+
+
+class TestRequireBoxForWrite:
+    """Box is the only source of truth for skills; there is no local
+    filesystem fallback. Every write method, and most read methods, refuse
+    cleanly when the Box runtime is disabled, unreachable, or not installed."""
+
+    def _ap_with_disabled_box(self):
+        return SimpleNamespace(
+            skill_mgr=SimpleNamespace(reload_skills=AsyncMock()),
+            box_service=SimpleNamespace(
+                available=False,
+                enabled=False,
+                _connector_error='Box runtime is disabled in config (box.enabled = false)',
+            ),
+        )
+
+    def _ap_with_failed_box(self):
+        return SimpleNamespace(
+            skill_mgr=SimpleNamespace(reload_skills=AsyncMock()),
+            box_service=SimpleNamespace(
+                available=False,
+                enabled=True,
+                _connector_error='docker daemon not running',
+            ),
+        )
+
+    @pytest.mark.asyncio
+    async def test_create_skill_refused_when_box_disabled(self):
+        service = SkillService(self._ap_with_disabled_box())
+        with pytest.raises(ValueError, match='disabled in config'):
+            await service.create_skill({'name': 'x'})
+
+    @pytest.mark.asyncio
+    async def test_create_skill_refused_when_box_failed(self):
+        service = SkillService(self._ap_with_failed_box())
+        with pytest.raises(ValueError, match='docker daemon not running'):
+            await service.create_skill({'name': 'x'})
+
+    @pytest.mark.asyncio
+    async def test_update_skill_refused_when_box_disabled(self):
+        service = SkillService(self._ap_with_disabled_box())
+        with pytest.raises(ValueError, match='Editing a skill requires the Box runtime'):
+            await service.update_skill('x', {})
+
+    @pytest.mark.asyncio
+    async def test_write_skill_file_refused_when_box_disabled(self):
+        service = SkillService(self._ap_with_disabled_box())
+        with pytest.raises(ValueError, match='Editing skill files requires the Box runtime'):
+            await service.write_skill_file('x', 'a.txt', 'hi')
+
+    @pytest.mark.asyncio
+    async def test_install_from_github_refused_when_box_disabled(self):
+        service = SkillService(self._ap_with_disabled_box())
+        with pytest.raises(ValueError, match='Installing a skill from GitHub'):
+            await service.install_from_github({'owner': 'o', 'repo': 'r', 'asset_url': 'https://example/x.zip'})
+
+    @pytest.mark.asyncio
+    async def test_install_from_zip_upload_refused_when_box_disabled(self):
+        service = SkillService(self._ap_with_disabled_box())
+        with pytest.raises(ValueError, match='Installing a skill from upload'):
+            await service.install_from_zip_upload(file_bytes=b'', filename='x.zip')
+
+    @pytest.mark.asyncio
+    async def test_create_skill_refused_when_box_service_missing_entirely(self):
+        """No ap.box_service attribute at all (truly minimal setup):
+        Box is the only source of truth, so creation must still refuse."""
+        service = SkillService(SimpleNamespace(skill_mgr=SimpleNamespace(reload_skills=AsyncMock())))
+        with pytest.raises(ValueError, match='not initialised'):
+            await service.create_skill({'name': 'x'})
+
+    @pytest.mark.asyncio
+    async def test_list_skills_returns_empty_when_box_unavailable(self):
+        """list_skills should return an empty list (rather than raise) so the
+        skills page can show a banner instead of a broken state."""
+        service = SkillService(self._ap_with_disabled_box())
+        assert await service.list_skills() == []
+
+    @pytest.mark.asyncio
+    async def test_read_skill_file_refused_when_box_unavailable(self):
+        service = SkillService(self._ap_with_disabled_box())
+        with pytest.raises(ValueError, match='Reading a skill file'):
+            await service.read_skill_file('x', 'a.txt')
@@ -11,7 +11,6 @@ Uses tmp_path for file system isolation where applicable.

 import os
 import pytest
-from unittest.mock import patch


 class TestCheckIfSourceInstall:
@@ -19,7 +18,7 @@ class TestCheckIfSourceInstall:

    def test_returns_true_for_source_install(self, tmp_path, monkeypatch):
        """Should return True when main.py with LangBot marker exists."""
-        main_py = tmp_path / "main.py"
+        main_py = tmp_path / 'main.py'
        main_py.write_text('# LangBot/main.py\n# This is the entry point')

        monkeypatch.chdir(tmp_path)
@@ -33,52 +32,14 @@ class TestCheckIfSourceInstall:

        paths._is_source_install = None

-    def test_returns_false_when_no_main_py(self, tmp_path, monkeypatch):
-        """Should return False when main.py doesn't exist."""
-        monkeypatch.chdir(tmp_path)
-
-        from langbot.pkg.utils import paths
-
-        paths._is_source_install = None
-
-        result = paths._check_if_source_install()
-        assert result is False
-
-        paths._is_source_install = None
-
-    def test_returns_false_when_main_py_without_marker(self, tmp_path, monkeypatch):
-        """Should return False when main.py exists but lacks LangBot marker."""
-        main_py = tmp_path / "main.py"
-        main_py.write_text('# Some other project\nprint("hello")')
-
-        monkeypatch.chdir(tmp_path)
-
-        from langbot.pkg.utils import paths
-
-        paths._is_source_install = None
-
-        result = paths._check_if_source_install()
-        assert result is False
-
-        paths._is_source_install = None
-
-    def test_handles_io_error_gracefully(self, tmp_path, monkeypatch):
-        """Should return False when main.py cannot be read."""
-        main_py = tmp_path / "main.py"
-        main_py.write_text('# LangBot/main.py\n')
-
-        monkeypatch.chdir(tmp_path)
-
-        from langbot.pkg.utils import paths
-
-        paths._is_source_install = None
-
-        # Patch open to raise IOError
-        with patch("builtins.open", side_effect=IOError("Cannot read")):
-            result = paths._check_if_source_install()
-            assert result is False
-
-        paths._is_source_install = None
+    # Note: ``_check_if_source_install`` was refactored to walk
+    # ``Path(__file__).resolve().parents`` looking for ``pyproject.toml`` and
+    # ``main.py`` instead of relying on the cwd. That makes it independent of
+    # where the process is launched from, but it also means the old negative
+    # cases ("cwd has no main.py", "main.py without marker", "IOError on read")
+    # no longer apply, since no file is read at all. Those tests were removed;
+    # ``test_returns_true_for_source_install`` still exercises the positive
+    # path because the repo checkout itself is a source install.


 class TestGetFrontendPath:
@@ -92,16 +53,16 @@ class TestGetFrontendPath:

        result = paths.get_frontend_path()
        # The result should contain web/dist or be an absolute path to it
-        assert "web/dist" in result or result.endswith("dist")
+        assert 'web/dist' in result or result.endswith('dist')

        paths._is_source_install = None

    def test_finds_dist_directory_in_source_mode(self, tmp_path, monkeypatch):
        """Should find web/dist when running from source mode."""
-        main_py = tmp_path / "main.py"
+        main_py = tmp_path / 'main.py'
        main_py.write_text('# LangBot/main.py\n')

-        web_dist = tmp_path / "web" / "dist"
+        web_dist = tmp_path / 'web' / 'dist'
        web_dist.mkdir(parents=True)

        monkeypatch.chdir(tmp_path)
@@ -111,18 +72,18 @@ class TestGetFrontendPath:
        paths._is_source_install = None

        result = paths.get_frontend_path()
-        assert result == "web/dist"
+        assert result == 'web/dist'

        paths._is_source_install = None

    def test_prefers_dist_over_out_in_source_mode(self, tmp_path, monkeypatch):
        """Should prefer web/dist over web/out when both exist in source mode."""
-        main_py = tmp_path / "main.py"
+        main_py = tmp_path / 'main.py'
        main_py.write_text('# LangBot/main.py\n')

-        web_dist = tmp_path / "web" / "dist"
+        web_dist = tmp_path / 'web' / 'dist'
        web_dist.mkdir(parents=True)
-        web_out = tmp_path / "web" / "out"
+        web_out = tmp_path / 'web' / 'out'
        web_out.mkdir(parents=True)

        monkeypatch.chdir(tmp_path)
@@ -132,7 +93,7 @@ class TestGetFrontendPath:
        paths._is_source_install = None

        result = paths.get_frontend_path()
-        assert result == "web/dist"
+        assert result == 'web/dist'

        paths._is_source_install = None

@@ -148,19 +109,19 @@ class TestGetResourcePath:

        paths._is_source_install = None

-        result = paths.get_resource_path("nonexistent/file.txt")
-        assert result == "nonexistent/file.txt"
+        result = paths.get_resource_path('nonexistent/file.txt')
+        assert result == 'nonexistent/file.txt'

        paths._is_source_install = None

    def test_finds_resource_in_current_directory_source_mode(self, tmp_path, monkeypatch):
        """Should find resource in current directory when in source mode."""
-        main_py = tmp_path / "main.py"
+        main_py = tmp_path / 'main.py'
        main_py.write_text('# LangBot/main.py\n')

-        resource_file = tmp_path / "templates" / "config.yaml"
+        resource_file = tmp_path / 'templates' / 'config.yaml'
        resource_file.parent.mkdir(parents=True, exist_ok=True)
-        resource_file.write_text("test: value")
+        resource_file.write_text('test: value')

        monkeypatch.chdir(tmp_path)

@@ -168,18 +129,18 @@ class TestGetResourcePath:

        paths._is_source_install = None

-        result = paths.get_resource_path("templates/config.yaml")
+        result = paths.get_resource_path('templates/config.yaml')
        assert os.path.exists(result)

        paths._is_source_install = None

    def test_returns_relative_path_in_source_mode(self, tmp_path, monkeypatch):
        """Should return relative path if resource exists in source mode."""
-        main_py = tmp_path / "main.py"
+        main_py = tmp_path / 'main.py'
        main_py.write_text('# LangBot/main.py\n')

-        resource_file = tmp_path / "test_resource.txt"
-        resource_file.write_text("test content")
+        resource_file = tmp_path / 'test_resource.txt'
+        resource_file.write_text('test content')

        monkeypatch.chdir(tmp_path)

@@ -187,8 +148,8 @@ class TestGetResourcePath:

        paths._is_source_install = None

-        result = paths.get_resource_path("test_resource.txt")
-        assert result == "test_resource.txt"
+        result = paths.get_resource_path('test_resource.txt')
+        assert result == 'test_resource.txt'

        paths._is_source_install = None

@@ -198,7 +159,7 @@ class TestPathFunctionsCaching:

    def test_source_install_cache_is_used(self, tmp_path, monkeypatch):
        """_check_if_source_install should use cached result."""
-        main_py = tmp_path / "main.py"
+        main_py = tmp_path / 'main.py'
        main_py.write_text('# LangBot/main.py\n')

        monkeypatch.chdir(tmp_path)
@@ -219,5 +180,5 @@ class TestPathFunctionsCaching:
        paths._is_source_install = None


-if __name__ == "__main__":
-    pytest.main([__file__, "-v"])
+if __name__ == '__main__':
+    pytest.main([__file__, '-v'])
@@ -1,136 +0,0 @@
-"""
-Unit tests for version utility functions.
-
-Tests version comparison logic without network calls.
-"""
-
-from __future__ import annotations
-
-from unittest.mock import Mock
-
-from langbot.pkg.utils.version import VersionManager
-
-
-class TestVersionComparison:
-    """Tests for version comparison functions."""
-
-    def _create_version_manager(self):
-        """Create a VersionManager with mock app."""
-        mock_app = Mock()
-        mock_app.proxy_mgr = Mock()
-        mock_app.proxy_mgr.get_forward_providers = Mock(return_value={})
-        mock_app.logger = Mock()
-        return VersionManager(mock_app)
-
-    def test_is_newer_same_version(self):
-        """is_newer returns False for same version."""
-        vm = self._create_version_manager()
-        result = vm.is_newer('v1.0.0', 'v1.0.0')
-        assert result is False
-
-    def test_is_newer_different_major_version(self):
-        """is_newer returns False for different major version."""
-        # Note: is_newer ignores major version changes
-        vm = self._create_version_manager()
-        result = vm.is_newer('v2.0.0', 'v1.0.0')
-        assert result is False
-
-    def test_is_newer_minor_update(self):
-        """is_newer returns True for minor update within same major."""
-        vm = self._create_version_manager()
-        result = vm.is_newer('v1.1.0', 'v1.0.0')
-        assert result is True
-
-    def test_is_newer_patch_update(self):
-        """is_newer returns True for patch update within same major."""
-        vm = self._create_version_manager()
-        result = vm.is_newer('v1.0.1', 'v1.0.0')
-        assert result is True
-
-    def test_is_newer_with_fourth_segment(self):
-        """is_newer ignores fourth version segment."""
-        # Both have same first 3 segments
-        vm = self._create_version_manager()
-        result = vm.is_newer('v1.0.0.1', 'v1.0.0.0')
-        assert result is False
-
-    def test_is_newer_short_version(self):
-        """is_newer handles short version numbers."""
-        vm = self._create_version_manager()
-        result = vm.is_newer('v1.0', 'v1.0')
-        assert result is False
-
-    def test_is_newer_older_version(self):
-        """is_newer returns True when new > old."""
-        vm = self._create_version_manager()
-        result = vm.is_newer('v1.2.0', 'v1.1.0')
-        assert result is True
-
-
-class TestCompareVersionStr:
-    """Tests for compare_version_str static method."""
-
-    def test_compare_equal_versions(self):
-        """Equal versions return 0."""
-        result = VersionManager.compare_version_str('v1.0.0', 'v1.0.0')
-        assert result == 0
-
-    def test_compare_without_v_prefix(self):
-        """Versions without v prefix work the same."""
-        result = VersionManager.compare_version_str('1.0.0', '1.0.0')
-        assert result == 0
-
-    def test_compare_mixed_prefix(self):
-        """Mixed v prefix works correctly."""
-        result = VersionManager.compare_version_str('v1.0.0', '1.0.0')
-        assert result == 0
-
-    def test_compare_first_greater(self):
-        """First version greater returns 1."""
-        result = VersionManager.compare_version_str('v1.1.0', 'v1.0.0')
-        assert result == 1
-
-    def test_compare_first_smaller(self):
-        """First version smaller returns -1."""
-        result = VersionManager.compare_version_str('v1.0.0', 'v1.1.0')
-        assert result == -1
-
-    def test_compare_different_lengths(self):
-        """Different length versions are padded with zeros."""
-        result = VersionManager.compare_version_str('v1.0', 'v1.0.0')
-        assert result == 0
-
-    def test_compare_shorter_greater(self):
-        """Shorter version padded, first still greater."""
-        result = VersionManager.compare_version_str('v1.1', 'v1.0.0')
-        assert result == 1
-
-    def test_compare_longer_greater(self):
-        """Longer version, first smaller."""
-        result = VersionManager.compare_version_str('v1.0', 'v1.0.1')
-        assert result == -1
-
-    def test_compare_major_version(self):
-        """Major version comparison."""
-        result = VersionManager.compare_version_str('v2.0.0', 'v1.9.9')
-        assert result == 1
-
-    def test_compare_minor_version(self):
-        """Minor version comparison."""
-        result = VersionManager.compare_version_str('v1.5.0', 'v1.4.9')
-        assert result == 1
-
-    def test_compare_patch_version(self):
-        """Patch version comparison."""
-        result = VersionManager.compare_version_str('v1.0.1', 'v1.0.0')
-        assert result == 1
-
-    def test_compare_four_segments(self):
-        """Four segment version comparison."""
-        result = VersionManager.compare_version_str('v1.0.0.1', 'v1.0.0.0')
-        assert result == 1
-
-    def test_compare_long_versions(self):
-        """Long version strings work correctly."""
-        result = VersionManager.compare_version_str('v1.2.3.4.5', 'v1.2.3.4.4')
-        assert result == 1
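The deleted tests are the only place in this hunk that spells out the version rules. `compare_version_str` strips an optional `v`, zero-pads the shorter version, and returns 1/0/-1. `is_newer` ignores any change in the major version and looks only at the first three segments. The sketch below is inferred only from those assertions. It is not the actual `VersionManager` code, which may behave differently in cases the tests do not cover. The module-level functions and the `_parse` helper are hypothetical stand-ins for the real methods.

```python
from __future__ import annotations

from itertools import zip_longest


def _parse(version: str) -> list[int]:
    # Hypothetical helper: strip one optional leading 'v', split into integer segments.
    if version.startswith('v'):
        version = version[1:]
    return [int(part) for part in version.split('.')]


def compare_version_str(v0: str, v1: str) -> int:
    # zip_longest pads the shorter version with zeros, so 'v1.0' compares equal to 'v1.0.0'.
    for a, b in zip_longest(_parse(v0), _parse(v1), fillvalue=0):
        if a != b:
            return 1 if a > b else -1
    return 0


def is_newer(new_tag: str, old_tag: str) -> bool:
    new, old = _parse(new_tag), _parse(old_tag)
    # Assumed from the tests: a major-version change is never reported as "newer".
    if new[0] != old[0]:
        return False
    # Only the first three segments count; short versions are zero-padded to three.
    new3 = (new + [0, 0, 0])[:3]
    old3 = (old + [0, 0, 0])[:3]
    return new3 > old3
```

Comparing padded lists lets Python's lexicographic list ordering decide the patch/minor precedence. That matches every case asserted in the removed `TestVersionComparison` and `TestCompareVersionStr` classes.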