Feat/sandbox (#2072)

* feat: add mcp and skills * feat: add filter * feat: modify frontend * feat(box): add sandbox_exec tool loop for local-agent calculations * feat(box): add host workspace mounting and sandbox_exec guidance * feat(box): add BoxProfile with resource limits and improved output truncation - Implement head+tail output truncation (60/40 split) so LLM sees both beginning and final results; add streaming byte-limited reads in backend to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB) - Define BoxProfile model with locked fields and max_timeout_sec clamping - Add four built-in profiles: default, offline_readonly, network_basic, network_extended with differentiated resource and security constraints - Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit, read_only_rootfs) and pass corresponding container CLI flags (--cpus, --memory, --pids-limit, --read-only, --tmpfs) - Profile loaded from config (box.profile), applied in service layer before BoxSpec validation; locked fields cannot be overridden by tool-call parameters * feat(box): add obs * refactor(box): unify box service lifecycle and local runtime management * refactor(box): remove legacy in-process runtime code and clean up smells After the architecture settled on always using an independent Box Runtime service, several pieces of compatibility code and design shortcuts were left behind. This commit cleans them up: - Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from production code (moved to test-only helper). - Remove unused `_clip_bytes` method from backend. - Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd` default to empty and validating non-empty only in `runtime.execute()`. - Extract `get_box_config()` helper to eliminate 5× duplicated config access boilerplate. - Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing tool schema to enforce request-scoped session isolation. 
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls `box_service.shutdown()` (handled by `Application.dispose()`). - Simplify `_assert_session_compatible` with a loop. - Inline client creation in `BoxRuntimeConnector`. - Remove redundant `BOX__RUNTIME_URL` env var from docker-compose (auto-detected by code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add test * fix: fix box integration test * feat(box/mcp): integrate MCP stdio with Box sandbox: auto-isolation, dep install, security ## Summary When Podman/Docker is available, all stdio-mode MCP servers now automatically run inside Box containers with dependency installation, path rewriting, and lifecycle management. When no container runtime exists, LangBot starts normally and stdio MCP falls back to host-direct execution. ## What changed ### MCP stdio → Box integration (mcp.py) - Add `MCPServerBoxConfig` pydantic model for structured box configuration with validation and defaults (network, host_path_mode, timeouts, resources) - Auto-infer `host_path` from command/args with venv detection: recognizes `.venv/bin/python` patterns and walks up to the project root - Rewrite host paths to container `/workspace` paths transparently - Replace venv python commands with container-native `python` - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run `pip install` inside the container before starting the MCP server - Copy project to `/tmp` before install to handle read-only mounts - Add retry with exponential backoff (3 retries, 2s/4s/8s delays) - Add Box managed process health monitoring (poll every 5s) - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally` block of `_lifecycle_loop`, covering all exit paths - Fix retry logic: `_ready_event` is only set on success or after all retries are exhausted, not on first failure - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled` ### Box security (security.py, new) - 
`validate_sandbox_security()` blocks dangerous host paths: `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`, docker.sock, podman socket - Called at the start of `CLISandboxBackend.start_session()` ### Box models (models.py) - Add `BoxHostMountMode.NONE` — skips volume mount entirely - Adjust `validate_host_mount_consistency` to allow arbitrary workdir when `host_path_mode=NONE` ### Box backend (backend.py) - Add `validate_sandbox_security()` call in `start_session()` - Add `langbot.box.config_hash` label on containers for drift detection - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg - Add `cleanup_orphaned_containers()` to base class (no-op default) and CLI implementation (single batched `rm -f` command) ### Box runtime (runtime.py) - Call `cleanup_orphaned_containers()` during `initialize()` to remove lingering containers from previous runs ### Box service (service.py) - Graceful degradation: `initialize()` catches runtime errors and sets `available=False` instead of crashing LangBot startup - Add `available` property and guard on `execute_sandbox_tool()` - Add `skip_host_mount_validation` parameter to `build_spec()` and `create_session()` — MCP paths are admin-configured and trusted, bypassing `allowed_host_mount_roots` restrictions meant for LLM-generated sandbox_exec commands ### Default behavior - stdio MCP servers automatically use Box when `box_service.available` is True (Podman/Docker detected); no explicit `box` config needed - When no container runtime exists, falls back to host-direct stdio - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false` (for site-packages), `host_path_mode=ro`, `startup_timeout=120s` ### Tests - `test_box_security.py`: blocked paths, safe paths, subpath rejection - `test_mcp_box_integration.py`: config model, path rewriting, venv unwrap, host_path inference, payload building, runtime info, box availability check - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests * 
feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests ## Changes ### Precise orphan container cleanup - Runtime generates a unique instance_id on startup - Every container gets a `langbot.box.instance_id` label - `cleanup_orphaned_containers()` only removes containers from previous instances, preserving containers owned by the current one - Containers from older versions (no label) are also cleaned up - `cleanup_orphaned_containers` added to `BaseSandboxBackend` as a no-op default method, removing hasattr duck-typing ### Fine-grained MCP error classification - New `MCPSessionErrorPhase` enum with 7 phases: session_create, dep_install, process_start, relay_connect, mcp_init, runtime, tool_call - Each phase in `_init_box_stdio_server()` sets the error phase before re-raising, enabling precise failure diagnosis - `retry_count` tracked across retry attempts - `get_runtime_info_dict()` exposes `error_phase` and `retry_count` ### GET /v1/sessions/{id} API - `BoxRuntime.get_session()` returns session details including managed process info when present - `handle_get_session` HTTP handler + route in server.py - `BoxRuntimeClient.get_session()` abstract method + remote impl ### stdio defaults to Box when runtime is available - `_uses_box_stdio()` checks `box_service.available` instead of requiring explicit `box` key in server_config - `BoxService.initialize()` catches runtime errors gracefully, sets `available=False` instead of crashing LangBot startup - When no container runtime exists, stdio MCP falls back to host-direct execution ### Code quality (from /simplify review) - Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants - Removed dead `_box_network_mode()` method and unused `bc` variable - Fixed broken import `from ....box.models` → `from ...box.models` - Cached `_resolve_host_path()` result — computed once, passed through - Config hash now includes `host_path` field - Batched orphan cleanup into single `rm 
-f` command ### Session leak fix - `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s finally block, covering all exit paths (normal shutdown, error, retry, final failure) ### Integration tests - 6 end-to-end tests covering managed process lifecycle, WebSocket stdio bidirectional IO, session cleanup verification, single session query, process exit detection, and orphan cleanup safety * refactor: use rpc * fix: import * refactor(box): clean up sandbox subsystem code quality and efficiency - Fix O(n²) stderr trimming in runtime.py with running length tracker - Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task, unused config_hash computation, unused imports - Deduplicate connection callback in BoxRuntimeConnector, parse URL once - Use enum comparison instead of stringly-typed spec.network.value check - Replace manual _result_to_dict/_session_to_dict with model_dump() - Cache NativeToolLoader tool definition and sandbox system guidance - Extract _is_path_under() helper to eliminate duplicated path checks - Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining - Add JSON startswith guard in logging_utils to skip futile json.loads - Fix ruff lint errors (F401 unused imports, F841 unused variables) * fix: ruff * refactor(sandbox): keep box logic out of pipeline and localagent - Move sandbox system-prompt guidance from LocalAgentRunner into BoxService.get_system_guidance() so all box domain knowledge stays in the box module. - Remove standalone logging_utils.py; merge format_result_log() into MessageHandler base class alongside cut_str(). - Strip sandbox-specific JSON parsing from log formatting; tool results now use generic truncation. - Revert TYPE_CHECKING changes in stage.py and runner.py that were unrelated to this feature. - Skip two test files affected by a pre-existing circular import (runner ↔ app) until the import cycle is resolved in a separate PR. 
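The "running length tracker" fix for the O(n²) stderr trimming noted above can be illustrated with a bounded buffer that keeps a running byte count instead of re-summing every chunk on each append. This is only a sketch under assumptions: `BoundedTail` and `max_bytes` are hypothetical names, and the actual code in runtime.py may be structured differently.

```python
from collections import deque


class BoundedTail:
    """Keep roughly the last `max_bytes` of a byte stream (hypothetical helper)."""

    def __init__(self, max_bytes: int) -> None:
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self.max_bytes = max_bytes
        self._chunks: deque[bytes] = deque()
        self._size = 0  # running total, so appends never re-sum all chunks

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Drop whole chunks from the front while the remainder still fills the budget.
        while self._chunks and self._size - len(self._chunks[0]) >= self.max_bytes:
            self._size -= len(self._chunks.popleft())

    def value(self) -> bytes:
        # The exact trim to max_bytes happens once, when the result is read.
        return b"".join(self._chunks)[-self.max_bytes:]
```

Each append does amortized constant work, because a chunk is added once and dropped at most once.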
* fix: ruff * refactor(box): move box runtime to langbot-plugin-sdk Extract self-contained box runtime modules (actions, backend, client, errors, models, runtime, security, server) to langbot-plugin-sdk and update all imports to use `langbot_plugin.box.*`. Keep only service and connector in LangBot core as they depend on the Application context. - Update docker-compose to use `langbot_plugin.box.server` entry point - Update pyproject.toml to use local SDK via `tool.uv.sources` - Remove migrated source files and their unit/integration tests - Update remaining test imports to match new module paths * fix: ruff * feat: enhance sandbox api * refactor(box): derive paths from shared host root * fix(box): tighten sandbox exposure and restore box integration coverage * refactor(types): remove quoted annotations under postponed evaluation * feat(box): unify native agent tools around exec/read/write/edit * chore(sandbox): move MCP loader changes to follow-up branch * feat(box): add session workspace quota enforcement and SDK quota metadata * feat(skills): add Agent Skills management system (#1917) * feat(skills): add Agent Skills management system Implement comprehensive skills management feature inspired by agentskills spec: Backend: - Add Skill and SkillPipelineBinding database entities - Add database migration (dbm018) for skills tables - Implement SkillManager for skill loading, matching, and resolution - Implement SkillService for CRUD operations - Add skills API endpoints for skill and pipeline binding management - Integrate skill index injection into pipeline preprocessor - Add skill activation detection in LocalAgentRunner Frontend: - Add Skills page with listing, search, and type filter - Add SkillDetailDialog for create/edit with preview - Add SkillCard and SkillForm components - Add skills API methods to BackendClient - Add skills entry to sidebar navigation - Add i18n translations (en-US, zh-Hans) Features: - Support skill and workflow types - Sub-skill 
composition via {{INVOKE_SKILL: name}} syntax - Progressive disclosure (index in prompt, full instructions on activation) - Pipeline-specific skill bindings with priority * fix: resolve cherry-pick conflicts for agentskills onto sandbox - Remove non-existent external_kb service import - Add skill_mgr mock to localagent sandbox_exec tests - Keep database version at 24 (sandbox branch's latest) * feat(skills): upgrade to package-backed skills with sandbox execution Evolve the skills system from pure prompt-based to package-backed with sandbox tool execution support: - Add source_type/package_root/entry_file/skill_tools fields to Skill entity - SkillManager loads SKILL.md from local package directories - SkillToolLoader as 4th dispatch layer in ToolManager (query-scoped) - LocalAgent injects skill tools into use_funcs on skill activation - BoxService.execute_skill_tool() runs scripts in sandbox (ro mount, env params) - Skill tool names auto-namespaced as skill__{skill}__{tool} - API validation for package_root allowlist and entry path traversal - Frontend source_type toggle, package_root input, skill_tools editor - Migration renumbered to 025 with ALTER TABLE fallback for existing DBs - Fix unclosed limitation section in i18n files - Fix skills API methods misplaced outside BackendClient class * fix: test info * feat(skills): switch skills to package-backed storage and add import tooling - Converge skills from the dual inline/package tracks to package-first - Write instructions to SKILL.md and read them back from it - Add local directory scanning and skill installation from GitHub - Frontend: merge skills into the plugins page; add SkillsComponent and a GitHub import dialog - Remove the source_type / type filters from the skill form; drive it by directory scanning instead - Change the Box skill tool mount mode from ro to rw - Update tests and the Chinese and English UI copy accordingly * feat: simplify langbot skill create and import * refactor(skills): clean up legacy skill API and harden activation flow * refactor(skills): remove skill dependency expansion and add skill_get * fix: lint * fix: delete * fix(skills): align tool manager loader initialization * refactor: remove sandbox execute skill * fix(skills): 
hide activation markers and isolate skill activation flow * refactor(skills): switch skill model to filesystem-backed packages * refactor(skills): switch skill model to filesystem-backed packages * refactor(skills): unify runtime skill access around filesystem paths * refactor(skills): unify runtime skill access around filesystem paths * feat(skills): align rw package design and fix skill activation, visibility, and lint issues * refactor(skills): replace rich authoring API with import/reload flow and update Box design doc * refactor(plugins): simplify GitHub install flow to default master archive * revert(api): restore plugin GitHub import flow in plugins controller * Improve data-root handling and skill install previews * Add managed skill authoring tools for local agents * Refactor the skills UI around sidebar detail pages * Document why managed skill authoring tools bypass box * fix: lint * feat(web): refactor plugin/skill install flows and fix skills page - Fix sidebar skill icon - Add skills route and error page component - Refactor plugin GitHub install from dialog modal to inline card - Add skill install dropdown menu (create/upload/github) in sidebar - Wire sidebar → skills page communication via pendingSkillInstallAction context - Add i18n keys for error page and skill install actions * fix(web): persist sidebar collapsible section open state on navigation Sections opened via sub-item navigation now retain their expanded state when the user switches to a different section, instead of collapsing because the isActive fallback becomes false. 
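The package_root allowlist and entry path traversal validation listed in the package-backed skills commit above might look roughly like the following. This is a hedged sketch, not the API code in this PR: `resolve_entry` and `allowed_roots` are hypothetical names, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_entry(package_root: str, entry_file: str, allowed_roots: list[str]) -> Path:
    """Return the resolved entry path or raise ValueError (hypothetical helper)."""
    root = Path(package_root).resolve()
    # The package root itself must sit under one of the allowlisted directories.
    if not any(root.is_relative_to(Path(a).resolve()) for a in allowed_roots):
        raise ValueError(f"package_root not allowed: {root}")
    # Resolving collapses ".." segments and symlinks before the containment check.
    entry = (root / entry_file).resolve()
    if not entry.is_relative_to(root):
        raise ValueError(f"entry escapes package_root: {entry_file}")
    return entry
```

Resolving before comparing is the important part: a plain string prefix check would accept `demo/../../etc/passwd`.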
--------- Co-authored-by: youhuanghe <1051233107@qq.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Junyan Qin <rockchinq@gmail.com> * feat(sandbox): add MCP box integration on top of sandbox base (#2083) * refactor(mcp): extract box stdio runtime helper * refactor(box): introduce reusable workspace session helper * refactor(box): run Box Runtime as subprocess inside LangBot container Remove the separate langbot_box_runtime Docker service. Box Runtime now always launches as a local stdio subprocess, regardless of whether LangBot runs in Docker or not. The WebSocket transport path is kept only for explicit runtime_url configuration (remote deployment). This simplifies deployment by eliminating cross-container path mapping and network hops. Box Runtime is a pure scheduling process (talks to Docker socket / nsjail), it does not execute user code or touch the filesystem, so container isolation is unnecessary — unlike Plugin Runtime. * fix(web): prevent first-emission snapshot from swallowing unsaved changes in pipeline editor When switching runner (e.g. local-agent → n8n), the newly mounted stage's first emit would re-capture the saved snapshot, erasing the dirty state caused by the runner change. The save button would incorrectly go dim. 
- Skip snapshot re-capture in handleDynamicFormEmit when form is already dirty - Add mount-time emit to N8nAuthFormComponent (matching DynamicFormComponent) - Use stable onSubmitRef to prevent useEffect subscription churn - Add previousInitialValues guard to prevent initialValues echo loops * style(web): align plugin list header button heights * docs(review): update Box architecture review documents Replace old review docs with 5 focused documents: - box-architecture.md: deep architecture analysis (LangBot + SDK) - box-issues.md: 22 issues rated P0/P1/P2 - box-test-coverage.md: test coverage analysis - box-tob-analysis.md: toB commercialization analysis - box-vs-plugin-runtime.md: Box vs Plugin runtime comparison * feat(web): improve login error layout and add Terms of Service link - Improve backend connection error display with bordered container, inline icon, and better visual hierarchy - Extract actual error message from axios response object - Add Terms of Service link (https://langbot.app/terms) to login footer - Add termsOfService i18n key for all 7 locales * refactor(web): replace all hardcoded SVG icons with lucide-react Unify icon usage across the entire frontend by replacing 67 hardcoded SVG icons with lucide-react components across ~25 files. This improves consistency, maintainability, and reduces bundle duplication. Key replacements: - Sidebar nav: Zap, LayoutDashboard, Bot, Workflow, BookMarked, etc. - MCP forms: Loader2, XCircle, Trash2 - Monitoring: Sparkles, MessageSquare, CheckCircle2, RefreshCw, etc. - Cards: Clock, Star, Workflow, Hexagon, Puzzle, Github, etc. - Misc: Paperclip, AudioLines, CloudUpload, Layers, Heart, Smile Zero hardcoded <svg> tags remain in .tsx files. * fix(web): stop polling plugin tasks when no active installs The PluginInstallTaskProvider was unconditionally polling getAsyncTasks every 3s on all /home/* routes. 
Now it only syncs once on mount and starts periodic polling only when there are active (non-terminal) install tasks. * fix(deps): update langbot-plugin version and add new dependencies * refactor: use Space API for release checks and stop idle polling - version.py: switch release list API from GitHub to space.langbot.app, remove unused in-place update logic (update_all, compare_version_str), translate all comments/logs to English - PluginInstallTaskContext: only poll when active install tasks exist * feat(box): add --standalone-box flag and 3-way transport decision for Box runtime Align Box runtime connection logic with Plugin runtime's pattern: - Docker: WebSocket to langbot_box container (ws://langbot_box:5411) - --standalone-box: WebSocket to external Box process (ws://localhost:5411) - Windows: subprocess + WebSocket (workaround for async stdio limitation) - Unix/macOS: subprocess + stdio pipe (unchanged) BoxRuntimeConnector now inherits ManagedRuntimeConnector for subprocess lifecycle reuse. Add langbot_box service to docker-compose.yaml. * refactor(box): use single port with path-based routing for Box WS Update connector to use ws://host:5410/rpc/ws instead of ws://host:5411. Update review docs to reflect the single-port architecture. * feat(web): show Box runtime status in plugin debug info popover Add Box status section to the debug info popover on the plugin list page, displaying connection status, backend info, profile, active sessions, and recent error count. Fetched from GET /api/v1/box/status in parallel with plugin debug info. Includes i18n for all 8 supported languages. * fix(web): remove ephemeral sandbox count from Box status display The active_sessions count reflects transient sandbox containers that expire after 5 minutes of inactivity, making it misleading in the UI. Keep only connection status, backend, profile, and error count. 
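The transport selection described in the `--standalone-box` commit above, combined with the later single-port `/rpc/ws` URL, can be sketched as a small decision function. Everything here is illustrative: `Transport` and `choose_transport` are hypothetical names, the precedence of Docker over the flag is an assumption, and the URL used for the Windows subprocess case is assumed rather than stated in the commit messages.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transport:
    spawn_subprocess: bool     # does LangBot launch the Box runtime itself?
    kind: str                  # "websocket" or "stdio"
    url: Optional[str] = None  # only set for WebSocket transports


def choose_transport(in_docker: bool, standalone_box: bool,
                     platform: str = sys.platform) -> Transport:
    """Hypothetical sketch of the decision order listed in the commit message."""
    if in_docker:
        # Connect to the separate langbot_box container.
        return Transport(False, "websocket", "ws://langbot_box:5410/rpc/ws")
    if standalone_box:
        # Connect to an externally started Box process.
        return Transport(False, "websocket", "ws://localhost:5410/rpc/ws")
    if platform == "win32":
        # Subprocess plus WebSocket: workaround for the async stdio limitation.
        # The URL is an assumption; the commits do not spell it out for this case.
        return Transport(True, "websocket", "ws://localhost:5410/rpc/ws")
    # Unix/macOS: subprocess with a stdio pipe.
    return Transport(True, "stdio")
```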
* feat(box): configurable sandbox scope and unified skill containers Replace the per-message session_id with a template-based system configurable per pipeline via 'Sandbox Scope' in the local-agent panel. Default scope is per-chat ({launcher_type}_{launcher_id}). Unify skill exec into the same container as default exec — skills are mounted at /workspace/.skills/{name}/ via extra_mounts instead of getting separate containers. All pipeline-bound skills are injected at container creation time. - Add box-session-id-template to pipeline metadata (select, 4 options, 8 languages) - Add resolve_box_session_id() and build_skill_extra_mounts() to BoxService - Rewrite native.py skill exec path to use execute_tool with shared session - Update tests for new session_id format - Add design doc: docs/review/box-session-scope.md * feat(web): show active sandbox details in Box status popover Display sandbox count and a detailed list of active sessions including session ID, image, backend, resources (CPU/memory), network mode, and last used time. Fetched from GET /api/v1/box/sessions in parallel. Includes i18n for all 8 supported languages. * feat(box): add startup and availability logging for sandbox tools Log Box runtime initialization result (success with profile info, or failure warning). Log native tool availability status at ToolManager startup so it's immediately clear whether exec/read/write/edit tools are registered for the LLM. * feat(box): support custom sandbox container image via config.yaml Add 'image' field to box config section. When set, it overrides the profile default image (python:3.11-slim) for all sandbox containers. Priority: caller-specified > config.yaml image > profile default. * feat(box): add heartbeat and reconnection for Box runtime connector Add 20-second heartbeat ping loop to detect silent Box runtime disconnections. On disconnect, set available=false and attempt reconnection after 3 seconds via the disconnect callback chain. 
- BoxRuntimeConnector: heartbeat loop, disconnect callback parameter, disconnect detection in connection callback and WS failure handler - BoxService: wire disconnect callback to toggle available state and re-initialize the connector on reconnection * feat(web): move runtime status to dashboard, clean up plugin debug popover Add SystemStatusCards component to the monitoring dashboard showing Plugin Runtime and Box Runtime connection status with details (backend, profile, sandbox count). Remove all Box/session status from the plugin page debug popover — it now only shows debug URL and key. Includes i18n for all 8 supported languages. * refactor(web): compact system status into a single card alongside metrics Replace the separate two-card row with a single compact 'System Status' card placed as the 5th column in the metrics grid. Shows green/red dots for Plugin Runtime and Box Runtime. Click to expand a popover with connection details (backend, profile, sandbox count). * feat: show connector error details for Plugin and Box runtime status Record Box connector error in BoxService and expose it as 'connector_error' in GET /api/v1/box/status when unavailable. Display error messages in the dashboard System Status popover for both Plugin Runtime (plugin_connector_error) and Box Runtime (connector_error) when they are disconnected. * fix(web): auto-refresh system status and show disconnect errors in real time Poll Plugin Runtime and Box Runtime status every 30 seconds so the dashboard reflects disconnections without a manual page refresh. Also re-fetch when the popover is opened for immediate feedback. * fix(box): handle RPC failure in get_status/get_sessions gracefully When the Box runtime disconnects, there is a race between the heartbeat flipping _available=false and the frontend polling get_status(). 
If the poll arrives first, client.get_status() throws a ConnectionClosedError which propagated as a 500, causing the frontend to show a grey dot (null status) instead of a red dot with error details. Now get_status() catches RPC errors and returns available=false with the exception message as connector_error. get_sessions() returns an empty list when unavailable or on RPC failure. * fix(box): add persistent reconnection loop with exponential backoff The previous disconnect handler only retried once and then gave up. Now spawns a background task that retries with exponential backoff (3s, 6s, 12s, ... up to 60s) until the Box runtime is reachable again. Uses a _reconnecting guard to prevent duplicate loops. Calls connector.dispose() before each retry to clean up stale tasks. * fix(box): detect disconnect when handler.run() returns normally The generic Handler.run() catches ConnectionClosedError and breaks out of its loop (normal return) instead of raising, because it has no disconnect_callback. The old code only triggered reconnection in the except branch, so a clean WebSocket close was never detected. Now treat handler.run() returning normally (after successful handshake) as a disconnect event, triggering the reconnection callback. * fix(web): refresh system status card when clicking Refresh Data button Pass a refreshKey prop through OverviewCards to SystemStatusCard that increments on each Refresh Data click, triggering a re-fetch of Plugin and Box runtime status alongside the monitoring data refresh. * fix(web): fix system status card stuck in loading state fetchStatus(showLoading=false) never called setLoading(false), so the initial loading=true was never cleared. Simplify to always setLoading in the finally block — the spinner only shows on the very first load since subsequent fetches complete near-instantly. 
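The persistent reconnection loop from the `fix(box)` commit above (3s, 6s, 12s, ... capped at 60s) could look roughly like this. It is a sketch under stated assumptions: `backoff_delays`, `first_delays`, and `reconnect_until_available` are hypothetical names, `try_connect` stands in for whatever re-initializes the connector, and the `_reconnecting` guard and `connector.dispose()` call mentioned in the commit are left out for brevity.

```python
import asyncio
from itertools import islice
from typing import Awaitable, Callable, Iterator, List


def backoff_delays(base: float = 3.0, cap: float = 60.0) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... and never exceed cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def first_delays(n: int) -> List[float]:
    """Return the first n delays, handy for inspecting the schedule."""
    return list(islice(backoff_delays(), n))


async def reconnect_until_available(
    try_connect: Callable[[], Awaitable[bool]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Retry try_connect with exponential backoff until it returns True.

    Returns the number of attempts that were made.
    """
    attempts = 0
    for delay in backoff_delays():
        await sleep(delay)
        attempts += 1
        if await try_connect():
            break
    return attempts
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; production code would simply use the default.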
* feat(web): show active sandbox details in dashboard Box status popover Fetch box sessions alongside status and display each active sandbox in the popover with session ID, image, resources (CPU/memory), and last used time. * feat(box): add global sandbox scope option Add a 'Global (shared by all)' option to the sandbox scope selector. Uses a constant '{global}' template variable that always resolves to 'global', so all users and chats share one sandbox container. * refactor(web): replace popover with dialog for system status details Replace the dropdown popover with a proper Dialog for runtime status details. Add a small info button on the System Status card that opens the dialog. Session details now show in a spacious 2-column grid layout with full image name, backend, CPU/memory, network, mount path, and created/last-used timestamps. * fix(web): widen system status dialog and fix scroll border issue Use max-w-2xl (matching other dialogs) instead of max-w-lg. Move overflow-y-auto to an inner container with overflow-hidden on DialogContent to prevent padding bleed at scroll edges. * feat(web): add tooltips for truncated fields in system status dialog Wrap session_id, image, and mount path fields with Tooltip components so hovering over truncated text shows the full value. * feat: add download button * feat: successfully install * feat: delete old filter * feat: optimize frontend * fix: align box runtime launch args * feat: translate * feat: refactor market * feat: optimize frontend * chore: rename extension zh translation * feat(extensions): unify extensions endpoint and refresh extensions page UX - Rename /home/plugins route to /home/extensions and update all sidebar links. - Add unified GET /api/v1/extensions returning plugins, MCP servers and skills, sorted by name; replace the three separate frontend fetches with this single call.
- Migrate the extensions page to shadcn primitives (Tabs/Card/Alert/Badge/Skeleton/ Switch/Label) and clean up hardcoded color tokens on the extension card. - Add a localStorage-persisted "Group by type" switch that, when enabled in the All Types tab, renders extensions grouped by type with a compact section header. - Show a spinner while loading and rename the empty-state copy from "No plugins installed" to "No extensions installed". - Rename the "格式 / Formats" filter label to "类型 / Types" across all 8 locales. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extensions): fallback lucide icon when extension icon is missing Render a tinted lucide icon (Puzzle / Server / Sparkles) on the extension card when the icon URL is empty or the image fails to load. Picked icons distinct from EventListener (AudioWaveform) and KnowledgeEngine (Book) to avoid visual collision with plugin component badges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sidebar): unify installed-extensions list with plugins, MCP and skills - Render plugins, MCP servers and skills together under the "Installed Extensions" sidebar entry, alphabetically sorted to match the list page. - Resolve per-item routes by extension type (plugin -> /home/extensions, mcp -> /home/mcp, skill -> /home/skills) and gate the plugin-only hover context menu on extensionType === 'plugin'. - Lift the "group by type" toggle into SidebarDataContext (still persisted in localStorage) so the sidebar groups items with section headers whenever the list page has the toggle enabled. - Show lucide fallback icons (Server / Sparkles / Puzzle) tinted in the LangBot blue for MCP, skill, and missing-icon plugin items, overriding the SidebarMenuSubButton svg color rule. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extensions): mobile-friendly layout for extensions and add-extension pages - Stack the extensions page header vertically on small screens, let the filter Tabs scroll horizontally if they overflow, hide the debug button label below sm and let the install/debug controls wrap. - Constrain the debug popover and its inputs to the viewport width so they no longer overflow on phone-sized screens. - Drop the card grid from a fixed 30rem column to a min(100%, 22rem) column at base / 28rem at sm, and reduce the gap, so cards render cleanly at 360px+ widths in both flat and grouped views. - Make the add-extension header actions wrap on lg- viewports and the install dialog responsive instead of a hard 500px box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: change ui * feat: delete version for mcp and skills * fix: constrain home page content width * fix: preserve monitoring card borders under sticky filters * fix(box): restore sandbox config and shared mcp runtime * fix(box): harden sandbox session isolation * fix(skill): remove auto activation setting * feat(skill): align skill system with Claude Code's Tool Call design - Replace text marker activation with `activate` tool (Tool Call mechanism) - Replace 7 authoring tools with 2: `activate` + `register_skill` - Add builtin skills loading from templates/skills/ - Add create-skill as first builtin skill - Remove SKILL_ACTIVATION_MARKER and text detection methods - Tool Result returns SKILL.md content (protects KV Cache) This aligns with Claude Code's progressive disclosure pattern: - Metadata (name+description) always visible in tool description - SKILL.md body loaded on activate via Tool Call - Bundled resources accessible through virtual path mapping Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(tools): add glob and grep native sandbox tools Add file discovery and content search capabilities to the 
sandbox: - glob: Find files by pattern (supports ** recursive matching) - grep: Search file contents with regex patterns Both tools respect skill package paths and include safety limits (max 100 files for glob, max 200 matches for grep). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * feat(skill): add skill file browsing capability - Add API endpoints for listing/reading/writing skill files - Add FileTree component in SkillForm for directory browsing - Users can now view scripts/, references/, assets/ directories - Files can be selected and edited in the instructions textarea - Add translations for new file browsing features Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(skill): copy builtin skills to data/skills on startup - Builtin skills (templates/skills/) are now copied to data/skills/ - Users can view and manage builtin skills in the UI - Rename SkillAuthoringToolLoader to SkillToolLoader Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(skill): improve file browsing and fix path handling - Fix nested directory display in skill file tree (preserve root entries) - Fix file content display when clicking files in skill browser - Add skill manager and tool manager as proper package modules - Separate fileContent state to allow editing non-SKILL.md files Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(toolmgr): correct skill_tool_loader attribute name Rename skill_authoring_tool_loader to skill_tool_loader in execute_func_call and shutdown methods to match the attribute defined in initialize(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(native): update tool descriptions to use register_skill Replace references to removed import_skill_from_directory with register_skill in exec/write/edit tool descriptions. 
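The safety limit on the `glob` tool described earlier in this log (at most 100 files, with `**` recursive matching) might be enforced along these lines. This is an illustrative sketch only: `capped_glob` and `MAX_GLOB_RESULTS` are hypothetical names, and the real tool runs inside the sandbox and also handles skill package paths, which this example ignores.

```python
from pathlib import Path
from typing import List, Tuple

MAX_GLOB_RESULTS = 100  # limit quoted in the commit message


def capped_glob(root: str, pattern: str,
                limit: int = MAX_GLOB_RESULTS) -> Tuple[List[str], bool]:
    """Return up to `limit` matching files (relative POSIX paths) and a
    flag saying whether more matches existed than were returned."""
    root_path = Path(root)
    matches: List[str] = []
    truncated = False
    # Sorting makes the output deterministic across filesystems.
    for path in sorted(root_path.glob(pattern)):
        if not path.is_file():
            continue
        if len(matches) >= limit:
            truncated = True
            break
        matches.append(path.relative_to(root_path).as_posix())
    return matches, truncated
```

Returning the `truncated` flag lets the caller tell the LLM that the listing is incomplete instead of silently dropping results.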
* feat(toolmgr): enhance tool initialization with backend availability checks * refactor: remove unused imports and clean up code in various files * feat: polish extension detail pages * feat: persist sidebar list expansion * fix: refine extension ui and backend errors * fix: align add extension marketplace ui * feat: manage skills through box runtime * feat: support github skill installation * fix: import github skill directories * feat: install market extensions from card click * feat(web): improve skill import flow * feat: polish extension import flow * fix(mcp): stabilize shared box managed processes * fix(web): improve backend retry and sidebar scrolling * docs(review): refresh box architecture review for feat/sandbox Sync the docs/review/ suite to the current state of the feat/sandbox branch (both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review. - box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b} config schema, add E2B backend, 6 native tools (incl. 
glob/grep), Skill Tool Call activation, shared multi-process MCP container, SkillManager, BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect - box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail image conflict, frontend monitoring card) into a Resolved section; add new P0 (INIT/backend ordering), P1 (extra_mounts immutability after container creation), P2 (skill_store test gap, integration tests not in CI) - box-session-scope.md: add §0 Implementation Status: Phase 1 shipped, MCP unification landed earlier than originally scoped - box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC), add 7 new test files including SDK backend_selection/e2b/skill_store - box-tob-analysis.md: connection recovery now meets basic requirements; add E2B and backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat - box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned with Plugin Runtime; revise remaining gaps (WS auth, shared base class) * refactor(box): use unified env-override mechanism for box.local config The box module hand-rolled its own LANGBOT_BOX_LOCAL_* env parsing in two places (connector._get_box_config and service._local_config), duplicating logic that LoadConfigStage._apply_env_overrides_to_config already provides generically via the SECTION__SUBSECTION__KEY convention.
- Drop the bespoke LANGBOT_BOX_LOCAL_* parsing; read box.local straight from instance_config (the unified BOX__LOCAL__* overrides are already applied before BoxService initializes) - Harden _load_allowed_mount_roots to accept a comma-separated string, since the generic mechanism stores a freshly-created key as a raw string when config.yaml has no box.local.allowed_mount_roots entry - docker-compose: rename the langbot container env vars to BOX__LOCAL__* (the canonical convention); remove them entirely from the langbot_box container — the Box runtime never reads box.local from env/config.yaml, it is configured via the INIT RPC action Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: repair stale skill/sandbox tests for feat/sandbox The skill subsystem moved to Tool-Call activation and a Box-managed skill store; several tests still asserted removed APIs and a sys.modules stub leaked across the suite. Full unit suite now green (was 23 failing). - test_skill_tools: drop TestSkillManagerActivation (text-marker API removed); rewrite TestSkillActivationHelper around the current skill.activation.register_activated_skill; replace the CRUD TestSkillAuthoringToolLoader with TestSkillToolLoader covering the current activate/register_skill tools and sandbox-availability gating - test_tool_manager_native: ToolManager attr is skill_tool_loader (not skill_authoring_tool_loader); native loader now exposes 6 tools (exec/read/write/edit/glob/grep) and requires initialize() with a backend-available get_status() - test_localagent_sandbox_exec: remove obsolete activation-marker leakage tests and their helper providers - test_model_service / pipeline conftest: give the mocks skill_mgr=None so PreProcessor's local-agent skill-binding guard short-circuits - test_n8nsvapi: stop permanently overwriting sys.modules ('langbot.pkg.provider.runner' etc.); save and restore around the import so other modules get the real LocalAgentRunner base class Co-Authored-By: Claude 
Opus 4.7 (1M context) <noreply@anthropic.com> * ci(tests): run unit tests on every push to feat/** branches - Add feat/** to push branches so long-lived feature branches are tested on every push (they accumulate large changes before a PR) - Drop the push path filter entirely: every push to master/develop/ feat/** now runs the full unit suite (the old 'pkg/**' filter never matched the real source path 'src/langbot/pkg/**', so backend-only pushes silently skipped tests) - Fix the same broken path glob on the pull_request trigger ('pkg/**' -> 'src/langbot/pkg/**') Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skill): harden mount/reload paths and HTTP errors against stale skill cache The Box backends behave inconsistently when extra_mounts reference a missing host directory (nsjail aborts the entire sandbox start, Docker silently creates a root-owned empty dir on the host, E2B silently skips the upload). The cache in skill_mgr.skills is only refreshed on in-process mutations, so out-of-band changes — container rebuilds, manual rm in the box volume, anything the LangBot API didn't drive — leave a stale skill that later produces one of those bad mount paths. 
- box/service.py: build_skill_extra_mounts now filters skills whose package_root is not isdir on the LangBot-visible filesystem and logs a warning, instead of passing the bad mount through to the backend - skill/manager.py: reload_skills (Box path) drops skills whose package_root is missing on the LangBot-side filesystem before they reach the in-memory cache, with a summary warning - api/http/controller/groups/skills.py: file/CRUD handlers now also catch BoxError (RuntimeError subclass, previously slipping past ``except ValueError`` and surfacing as 500); list/get handlers gain a try/except so a transient Box RPC failure becomes a clean 400 instead of a stack trace Tests added for build_skill_extra_mounts (skip missing, skip empty, no skill manager) and SkillManager.reload_skills (drop missing on Box path). Full unit suite: 279 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(box): add box.enabled toggle and gate consumers on availability Make the Box sandbox runtime optional. When ``box.enabled`` is false in config (or when an enabled Box fails to connect), every dependent feature degrades to the same disabled-state UX rather than crashing or silently falling back to less safe code paths. 
Backend: - config.yaml: new top-level ``box.enabled: true`` flag (default true) - BoxService: - Read box.enabled on construction - initialize() short-circuits when disabled — no remote WS connect, no stdio subprocess fork - _on_runtime_disconnect is a no-op when disabled (no reconnect loop on a deliberately-off service) - get_status() now exposes ``enabled`` so the frontend can tell "disabled in config" from "configured but failed" - MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to be available, not just installed - MCP _init_stdio_python_server: when ap.box_service exists but is unavailable, refuse the stdio server with an actionable error instead of silently falling through to host-stdio (which bypasses the sandbox the operator asked for). Setups without ap.box_service installed at all keep the legacy host-stdio fallback for pre-Box dev mode - SkillService._require_box_for_write: refuses create/update/install/ write_skill_file when ap.box_service is installed but unavailable. Distinguishes disabled vs failed in the error message so the UI can surface the right hint. 
Legacy setups (no ap.box_service) keep the local fallback path — that distinction is what keeps the existing local-skills tests valid Tests: - Box disabled-state behavior (4 cases) - Skill write refusal in disabled & failed states (7 cases) - MCP stdio runtime info policy updated to match new refuse-when-down behavior Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(web): surface Box disabled/unavailable state across consumers When Box is disabled in config (``box.enabled = false``) or fails to connect, every dependent UI surface now degrades visibly: - ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``, ``disabled`` (config-off) and a single ``hint`` key so callers don't have to re-derive the three states - ``BoxUnavailableNotice`` reusable Alert banner driven by that hint - Dashboard SystemStatusCards: three-state dot + label (connected / disabled-gray / disconnected-red); disabled state shows the ``boxDisabled`` hint, failed state continues to show the connector error. 
Plugin block kept untouched - Skills page (create view) and SkillDetailContent (edit view): Save button disabled and banner inserted above the form when Box is unavailable — matches the backend gate added in the previous commit - PipelineExtension skill section: ``enable_all_skills`` switch, Add Skill button and Remove buttons all gate on Box availability; banner inline under the section header - PipelineFormComponent: banner above the ``local-agent`` stage card when Box is unavailable, since that stage carries the sandbox-bound ``box-session-id-template`` field - Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale files updated with ``boxDisabled`` / ``boxUnavailable`` / ``boxRequiredHint`` strings Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(box): document the box.enabled toggle and gate behavior matrix - docker-compose: move ``langbot_box`` under compose profiles (``box`` and ``all``) so ``docker compose up`` no longer requires the sandbox container. Inline comment explains how to pair the profile choice with ``box.enabled`` so the langbot service does not thrash trying to reach a runtime that was never started - docs/review/box-architecture.md: - Annotate ``box.enabled`` in the config.yaml example, listing the exact side effects (no remote/stdio connect; tools/skills/MCP stdio off; reads still work) - Replace the bare compose snippet with the actual profile-driven invocation and the BOX__ENABLED pairing - New "关闭/连接失败时的行为矩阵" section: a single table mapping every consumer (native tools, activate/register_skill, stdio MCP, skill list/CRUD, pipeline AI config, extensions page, dashboard) to its disabled-state behavior, plus the legacy ``ap.box_service`` distinguisher note Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(pipeline-form): swap Box banner for field-level disable_if + tooltip The previous commit hard-coded a BoxUnavailableNotice banner above the ``local-agent`` stage card. 
That works, but it shouts at the user about every field in that stage when in reality only one field — ``box-session-id-template`` — depends on the sandbox. Use the dynamic-form schema's existing variable-injection mechanism (``__system.*`` references via ``systemContext``) and add a sibling to ``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays visible, becomes inert, and an info icon next to its label exposes the reason on hover. The rest of the AI tab is left untouched. - entities/form/dynamic.ts: extend IDynamicFormItemSchema with ``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject`` - DynamicFormComponent: evaluate disable_if with the same resolver as show_if; OR the result into isFieldDisabled; render an Info tooltip trigger next to the label when the condition matches - ai.yaml metadata: attach disable_if (__system.box_available eq false) and a localized disabled_tooltip to box-session-id-template - PipelineFormComponent: drop the BoxUnavailableNotice import and the per-stage banner; pass ``systemContext={ box_available: boxAvailable }`` only for the local-agent stage so other stages aren't paying the re-render cost Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(mcp): friendly UI message when stdio MCP refused by Box state Previously the MCP detail dialog dumped the raw RuntimeError text from ``_init_stdio_python_server`` — English-only, prefixed with "Failed after 4 attempts", and exposing internal config names. The retry wrapper also kept retrying a refusal that is deterministically going to fail again, polluting logs. Replace the raw text with a structured signal: - New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. 
The stdio refusal path sets it before raising and uses a short opaque discriminator (``box_disabled_in_config`` / ``box_unavailable``) as the message body — never user-facing - ``_lifecycle_loop_with_retry`` short-circuits on ``BOX_UNAVAILABLE``: surfaces the error immediately, no retries, no "Failed after N attempts" prefix. Silences the warning storm seen during smoke-testing - ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``, ``retry_count``, ``box_session_id``, ``box_enabled`` to match what the backend already returns in get_runtime_info_dict() - Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and ``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect ``error_phase === 'box_unavailable'`` and render a two-line localized notice: state line ("Box disabled / unreachable") plus remediation line ("enable Box or switch to http/sse") - 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get ``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``, ``mcp.boxStdioRefusedSuggestion`` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(mcp-web): block stdio MCP creation at the form when Box is unavailable When Box is disabled in config (``box.enabled = false``) or unreachable, saving a new MCP server in stdio mode produced one that could never start — the user would only learn that from the runtime error on the detail page. Stop the user before they save instead. Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog ``MCPFormDialog.tsx``) now: - Disable the ``stdio`` option in the mode select when Box is unavailable, with a small "(requires Box)" suffix so the reason is obvious. 
Existing stdio configs still display their current value - Show ``BoxUnavailableNotice`` inline under the mode select when the currently-selected mode is stdio and Box is unavailable, so editing a stale stdio config makes the cause visible - Disable the Save / Submit button while stdio is selected under that condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop so the parent ``MCPDetailContent`` can disable both its Submit and Save buttons. ``MCPFormDialog`` disables its Save button locally - Refuse the submit handler too (Enter-key path) with a toast carrying the same i18n message i18n: ``mcp.boxRequired`` (short tag in the disabled option) and ``mcp.stdioBlockedByBoxToast`` added to all 8 locales. Backend runtime gate (``_init_stdio_python_server`` refusal + ``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place as the last line of defence for API bypass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): prevent plugin config form overflow * refactor(skill): remove all local-filesystem fallbacks; Box is the sole source Skills now flow exclusively through the Box runtime. Every read and write method funnels through ``_box_service()``; when Box is unavailable (disabled in config, connection failed, or simply not installed) the operation either returns an empty surface (``list_skills`` → []) or raises with a clear ``Box runtime ... not initialised / disabled / unavailable: ...`` message via the new ``_require_box(action)`` helper. Why: the legacy local-fallback path scanned ``data/skills/``, but Box manages its own ``box.local.skills_root`` (default ``data/box/skills/``). The two diverging directories caused stale / phantom skill lists when Box flapped, and the local-fallback writes silently bypassed all the sandboxing the operator had configured. SkillService (``api/http/service/skill.py``): - New ``_require_box(action)`` returns the box service or raises a structured ValueError. 
``_require_box_for_write`` kept as alias - ``list_skills`` → returns [] when Box is down so the UI can render the disabled banner cleanly - ``get_skill`` / ``get_skill_by_name`` → return None - All read-file / write-file / scan-dir / create / update / delete / install / preview methods → ``_require_box`` then box delegate. Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview pipelines) removed entirely SkillManager (``pkg/skill/manager.py``): - ``reload_skills`` returns early with empty cache when Box is down. data/skills/ discovery loop removed - ``refresh_skill_from_disk`` now just reports cache presence; the on-disk re-parse is gone since Box is the only writer Tests: - Drop 11 obsolete test_skill_service.py tests that exercised the removed local-fallback paths (create/install/file/delete/update) - Add list-empty + read-refused tests; flip the legacy-allow test to legacy-refuses-too - Rewrite refresh_skill_from_disk test to match the new behaviour Several helper methods (_managed_skill_path, _resolve_skill_path, _preview_skill_candidates, _install_preview_candidates, etc.) are now unreachable; a follow-up commit will prune them so this diff stays reviewable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(skill): prune dead local-filesystem helpers left over from Box migration Follow-up to the Box-only refactor. The previous commit removed the local-fallback BRANCHES from every public method; this one removes the HELPERS those branches called, which are now unreachable. 
SkillService (service/skill.py): 787 → 449 lines Removed: scan_directory (sync), _read_skill_package, _write_skill_md, _resolve_create_field, _managed_skill_path, _managed_install_root_for_package, _normalize_package_root, _resolve_skill_path, _find_skill_entry, _discover_skill_directories, _safe_extract_zip, _extract_uploaded_skill_to_temp, _download_github_skill_to_temp, _resolve_github_source_root, _build_preview_target_dir, _preview_skill_candidates, _select_preview_candidates, _install_preview_candidates, _preview_source_root, _resolve_installed_skills, plus the module-level _FRONTMATTER_FIELDS and _build_skill_md. Kept (still needed by the surviving GitHub-import path): _download_github_asset, _download_github_skill_directory_as_zip, _find_github_skill_archive_entry, _copy_github_skill_directory_to_zip, _is_github_skill_md_url, _parse_github_skill_md_url, _resolve_github_skill_md_package_name, _validate_github_asset_url, _uploaded_skill_target_stem, _validate_skill_name. Imports dropped: shutil, tempfile, yaml, ....utils.paths. SkillManager (skill/manager.py): 187 → 88 lines Removed: get_managed_skills_root, _discover_skill_directories, _find_skill_entry, _load_skill_file, _normalize_package_root. Imports dropped: datetime, parse_frontmatter, paths. Tests: - test_skill_service.py: drop the 3 sync scan_directory tests + skill_service fixture + _create_skill_file helper - test_skill_tools.py: drop test_load_skill_file_success; rename TestSkillManagerPackageLoading → TestSkillManagerCache Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skill): re-inject skill index into local-agent system prompt The contributor's original PR (#1917) appended an ``Available Skills`` index to the system prompt before the LLM saw the user message, so the LLM could decide whether to activate a skill. 
``7145447b`` removed the text-marker activation flow and, together with it, the entire system prompt injection — but the Tool Call replacement only put the available skills inside the ``activate`` tool's description. In practice the LLM ignores tool descriptions for selection and goes straight to native tools, so user-visible skill activation silently broke. Restore the injection, adapted for the Tool Call era: - SkillManager regains ``get_skill_index(bound_skills)`` and ``build_skill_aware_prompt_addition(bound_skills)``. The addendum carries only ``name (display_name): description`` for each pipeline-visible skill plus one instruction line pointing at the ``activate`` tool. No SKILL.md contents — KV cache stays clean - PreProcessor appends the addendum to the first system message (or inserts a new one) of ``query.prompt.messages`` for the local-agent runner. Handles plain-string and ContentElement[] bodies. Skips cleanly when no skills are visible - 3 new test_preproc cases: injection happens, bound-skills subset honoured, empty addendum touches nothing. 280 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(box): downgrade get_status.available when backend probed unavailable Until now ``BoxService.get_status`` returned ``available: true`` whenever the runtime connector was healthy, even if the runtime itself reported ``backend: { available: false }`` (operator selected nsjail without the binary, Docker daemon crashed mid-session, E2B credentials wrong, ...). The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the top-level flag and showed "connected" while every actual call to native exec or skill management would fail. The native-tool loader already polled ``status.backend.available`` independently and hid its tools correctly, but every other consumer (dashboard banner, the disabled-state hint, the LLM-facing message) disagreed with it. 
Combine the two in the payload: ``available = self._available AND status.backend.available``. When ``backend.available`` is false we now also surface a ``connector_error`` that names the backend ("Configured sandbox backend \"nsjail\" is unavailable") so the dialog shows the actionable reason instead of an empty error pane. The detailed ``backend`` object is preserved unchanged for the dialog. Internal ``box_service.available`` (used by ``skill_service`` writes, ``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally NOT changed — it still tracks connector health only, so a backend blip does not trigger spurious reconnect loops. Tests: - ``test_get_status_downgrades_available_when_backend_dead`` — exercise the new branch (connector OK, backend.available=false → top-level available=false, connector_error mentions the backend name) - ``test_get_status_keeps_available_true_when_backend_ok`` — guard against regressing the happy path Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary): ``GET /api/v1/box/status`` now returns ``available: false`` with the named connector_error, instead of the previous misleading ``available: true``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(web): surface the specific Box failure reason in unavailable banner When Box is configured but the runtime reports its backend is dead (e.g. ``box.backend = nsjail`` but the binary is missing, or Docker daemon crashed), the backend now returns a structured ``connector_error`` like ``Configured sandbox backend "nsjail" is unavailable``. The previous notice only said "Box sandbox is unavailable" + a generic "enable Box" hint, hiding the actionable detail. - ``useBoxStatus``: derive ``reason`` from ``status.connector_error``. 
Only exposed for the failed-state (``hint === 'boxUnavailable'``), since the disabled-by-config message already carries its reason - ``BoxUnavailableNotice``: insert the reason as a small monospaced line between the state message and the action hint. The disabled variant is unchanged (operator chose the state) - Wire ``reason`` through every existing call site (Skills page + detail, PipelineExtension, both MCP forms). Old unused ``context`` prop dropped Net layout (3 lines, still compact): ⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ... Configured sandbox backend "nsjail" is unavailable This feature requires the Box runtime. Enable it in config ... Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: reconcile master's unit tests with feat/sandbox refactors The merge from master brought in new unit tests that target pre-refactor APIs on feat/sandbox. Reconcile each: - factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills dict + inert prompt-addition builder) and a Mock pipeline_service so the PreProcessor skill-index injection branch can run end-to-end in tests. - pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so pipeline.stage is fully initialised before any individual stage test (preproc, longtext, ...) tries to lazy-load it. Without this preload, running test_preproc.py in isolation hit a circular-import error via the stage -> app -> pipelinemgr -> stage chain. - provider/test_tool_manager.py: ToolManager now probes four loaders (native -> plugin -> mcp -> skill). Inject inert native + skill mocks in the execute_func_call fixture and assert all four shutdowns fire. - utils/test_paths.py: drop the three cwd-dependent _check_if_source_install cases. The refactor walks Path(__file__).resolve().parents looking for pyproject.toml + main.py, so cwd no longer factors in and there's no file read to mock-fail. The positive case and caching test still apply. 
- utils/test_version.py: delete entirely. is_newer and compare_version_str were removed when VersionManager was refactored to use the Space API for release checks (1b4107a9); the tests targeted a surface that no longer exists. * refactor(box): launch box runtime via the lbp CLI subcommand Mirror the plugin runtime: box is now started through the same CLI entry point (langbot_plugin.cli) instead of the box module directly. - docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box` (WebSocket is the default transport, no flag needed — matches `rt`). - box/connector.py: both subprocess launch sites (_start_local_stdio and the Windows _start_subprocess_then_ws path) invoke `langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport. - docs/review: update stale `-m langbot_plugin.box[.server]` references. Pairs with the SDK change that removes box's direct-launch entry points (python -m langbot_plugin.box / .box.server) and the legacy --mode flag. * chore: bump langbot-plugin beta 1 * fix(ci): resolve langbot-plugin from PyPI and clear lint failures CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image. Root causes and fixes: - pyproject.toml had a [tool.uv.sources] editable override pinning langbot-plugin to ../langbot-plugin-sdk. That path only exists in a paired local checkout, so `uv sync` failed on every CI runner ("Distribution not found"). Remove the override and regenerate uv.lock so langbot-plugin==0.4.0b1 resolves from PyPI, matching master. - tests/integration/api/test_pipelines.py: the pipeline extensions endpoint now calls ap.skill_service.list_skills(); add the missing skill_service mock to the fake_pipeline_app fixture (the test came from master, the endpoint change from feat/sandbox). - Apply ruff format to three src files and prettier to three web files that had committed formatting drift, failing `ruff format --check` and `pnpm lint`. 
* chore: bump beta version * docs: remove BOX_BACKEND override reference * fix(pipelines): stop attributing dashboard debug WS to bound web_page_bot The dashboard pipeline-debug WebSocket (/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket (/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths, but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when the same pipeline happened to be bound to a web_page_bot, passed that bot as `owner_bot` into `handle_websocket_message`. The adapter then used the page bot's listeners + adapter for the request, so debug sessions were logged as "page bot" activity in the dashboard. Debug sessions must always run under the built-in websocket_proxy_bot. Remove `_find_owner_bot`, drop the `owner_bot` parameter from the debug-path `_handle_receive`, and call `handle_websocket_message` without it so the adapter takes its default proxy-bot branch. The embed handler still resolves and passes its `runtime_bot` for the page-bot path, so attribution there is unchanged. * fix(plugin): install marketplace MCP from canonical mode + extra_args _install_mcp_from_marketplace read the dropped `mcp_data.config` field and reconstructed mode/extra_args by guessing from the URL — which lost stdio's command/args/env/box entirely, so stdio MCP installs from the marketplace always failed. Use the Space record's canonical `mode` and `extra_args` directly (the same shape stored in mcp_servers), and gate the install on `mode` instead of the removed `config`. After a successful install, best-effort POST to the marketplace install endpoint to bump install_count. * feat(web): show recommendation lists in plugin market; mixed-type icons The marketplace recommendation lists (curated rows from Space) were never mounted in the plugin market page. Wire them in: - fetch recommendation lists on mount and render them above the extension grid, only when no search/filter is active. 
Recommendation lists now mix plugins, MCPs and skills, so resolve each card's icon by type (plugin / mcp / skill marketplace icon URL) instead of always using the plugin icon endpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): auto-open install dialog from one-click deep link Accept a deep link from LangBot Space's one-click install: /home/add-extension?install=1&extension_type=<plugin|mcp|skill>&author=&name=&version= On mount, populate the install info, open the confirm dialog directly, and strip the params from the URL. Reuses the existing marketplace install flow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat: push marketplace URL to runtime; fix market client base race - On connecting to the plugin runtime, push the configured space.url via the new SET_RUNTIME_CONFIG action so the runtime downloads plugins from the same Space, instead of relying on its own CLOUD_SERVICE_URL env/default. Wrapped in try/except so an older SDK without the action degrades gracefully. - web: the plugin market fetched recommendation lists (and listings) via the sync cloud client before its baseURL was resolved from system info, so it hit the default space.langbot.app. Await getCloudServiceClient() before the initial fetches and for the recommendation list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(web): don't show MCP "connection failed" while still connecting The MCP status UI rendered "连接失败" for any non-connected state, so during a normal connection attempt the subtitle showed "连接失败" while the status pill below it showed "连接中..." — contradictory. Only treat an explicit ERROR (or box-unavailable) status as failed; a CONNECTING or initial/unresolved status now shows "连接中". Applied to the MCP detail form (subtitle + StatusDisplay) and the MCP server card. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): type-aware install dialog + refresh sidebar after install The marketplace install confirm dialog was hardcoded to "安装插件 / 确定要安装插件 X 吗" for every type. Make it type-aware (plugin / MCP / skill) and show more info: type chip, author/name id, and version when present. Also refresh all sidebar extension lists (plugins, MCP servers, skills) when an install task completes, so the newly-installed extension appears immediately regardless of type (previously only refreshPlugins ran). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): richer install dialog (icon + name + description), drop redundant type row The install dialog already states the type in its title, so the "类型" row was redundant. Replace the info box with the extension's icon (avatar), display name, author/name id + version, and description — built from the PluginV4 for in-app installs and from the icon endpoint by type for the one-click deep link. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(web): TDZ crash in add-extension (installIconURL before installInfo) installIconURL was computed above the useState declaration of installInfo, causing "Cannot access 'installInfo' before initialization" (500) on the add-extension page. Move the computation below the state declarations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): redesign install-progress dialog for MCP/skill The progress dialog showed plugin-only stages (download + dependency install) for every type. MCP/skill have no such steps, so show a single "installing → done/failed" row for them (MCP: adding & connecting the server; skill: installing the package) while keeping the detailed download/deps stages for plugins. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(web): add missing market.componentName i18n keys The marketplace component filter (and component badges) used market.componentName.{Tool,Command,EventListener,KnowledgeEngine,Parser,Page} but those keys only existed under plugins.componentName, so the market UI showed raw keys. Add a componentName block to the market namespace (zh-Hans + en-US; other locales fall back to zh-Hans). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): sidebar extensions refresh button + full-name tooltip - Add a refresh button to the installed-extensions category header in the sidebar; it re-fetches plugins + MCP servers + skills and spins while loading. - The sidebar item tooltip now shows the extension's full name (with the description below when present), so truncated MCP/extension names are readable on hover. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(plugin-market): rename component filter to "插件组件" with hint tooltip + persist filters - Rename the in-app plugin market component filter label to "插件组件" / "Plugin Component" - Add an Info icon tooltip explaining what plugin components are (Tool / Command / EventListener, etc.) - Persist filter selections (type / component / tags / sort) in localStorage so they survive reloads; restored on mount (URL type param still wins) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(plugin-market): restore missing "页面"(Page) component filter option The market component-filter list on this branch was a diverged rewrite that dropped the Page component kind master had added. The i18n key (market.componentName.Page) already existed; re-add the Page entry to the componentOptions list so plugins providing Page components can be filtered. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(i18n): reword plugin component filter hint Drop the redundant "插件组件是" lead-in and mention that components extend LangBot's capabilities; mirror the wording in en-US. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(i18n): backfill missing market/addExtension keys in 6 locales check-i18n surfaced that market.componentName.*, market.filterByComponentHint and the addExtension.install* keys existed only in en-US/zh-Hans. Backfill them for es-ES, ja-JP, ru-RU, th-TH, vi-VN and zh-Hant (reusing each locale's existing component-name translations) and align the filterByComponent label with the new "Plugin Component" wording. check-i18n now passes for all locales. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * i18n(plugins): relabel "group by type" as "group by format" The installed-extensions grouping is by extension format (plugin / MCP / skill), so rename the toggle label accordingly across all 8 locales (key unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(plugin-market): cursor-pointer on tag filter trigger The TagsFilter Select trigger used the default cursor; add cursor-pointer so the tag filter is clearly clickable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(sidebar): show edition badge (Community / Cloud) in logo area Add a small badge next to the LangBot name in the sidebar header that reflects systemInfo.edition: a neutral "Community" badge for the community edition and a blue "Cloud" badge for the cloud edition. Adds sidebar.editionCommunity / sidebar.editionCloud across all 8 locales. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * i18n(sidebar): unify zh-Hans cloud edition label to 云端版 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sidebar): edition badge - drop hover, use "Cloud" in all locales The edition badge is not interactive, so remove the hover background on the cloud badge. Also use the literal "Cloud" label uniformly across all locales instead of localized variants (云端版/クラウド版/...). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(box): cap tool-call loop and run workspace-quota walk off the event loop Two robustness fixes that bite under normal sandbox usage (not just attack), hardening the self-hosted community edition before release: - localagent: cap the tool-call loop at MAX_TOOL_CALL_ROUNDS (128). A looping or adversarial model could otherwise emit tool calls indefinitely (each potentially a sandbox exec), producing a non-terminating request and runaway cost. The cap is generous enough not to interrupt legitimate multi-step agentic workflows. - box.service: make _enforce_workspace_quota async and run the recursive workspace scan via asyncio.to_thread. It ran on every quota-enforced exec and a large workspace would block the whole asyncio runtime (all bots/pipelines). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(review): refresh box docs; trim issue list to SaaS blockers only Community self-hosted edition is release-ready, so the box review docs are updated to current state (date 2026-06-02 + status note) and box-issues.md is rewritten to keep only the SaaS / multi-tenant / network-exposed release blockers (S1-S8): unauthenticated control plane, no per-pipeline exec authorization, unbounded sessions + no reaper, no kernel-level quota, mount validation gaps (/ + extra_mounts), missing container hardening, lock-around- cold-start, and the lower-severity follow-ups. 
Resolved items (tool-call loop cap, async quota scan, host_path mount allowlist, _is_path_under dedup) moved to a short "resolved before community release" record; community-only and pure-cleanup items dropped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(deps): pin langbot-plugin to 0.4.0 Track the stable SDK release (0.4.0b1 -> 0.4.0); regenerate uv.lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: WangCham <651122857@qq.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: fdc310 <82008029+fdc310@users.noreply.github.com> Co-authored-by: Junyan Qin <rockchinq@gmail.com>
2026-07-20 03:16:14 +00:00 · 2026-06-03 11:12:39 +08:00
parent 4054ba2a76
commit 96b041846d
161 changed files with 22518 additions and 4029 deletions
@@ -0,0 +1,595 @@
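+# Box System Architecture: An In-Depth Analysis
+
+> Last updated: 2026-06-02
+> Status update: the self-hosted community edition is ready for release (Box is optional, degradation is handled, no migration debt). The tool-call loop cap, the async quota traversal, and the `host_path` mount allowlist have all landed. Remaining multi-tenant and security-hardening items are tracked in the [SaaS blocker list](./box-issues.md).
+> Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)
+> Related docs: [SaaS blockers](./box-issues.md) | [Session scope](./box-session-scope.md) | [Runtime comparison](./box-vs-plugin-runtime.md) | [Test coverage](./box-test-coverage.md) | [toB analysis](./box-tob-analysis.md)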
+
+---
+
+## 1. Overall Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────┐
+│                       LangBot main process                       │
+│                                                                  │
+│  LocalAgentRunner ──> ToolManager ──> NativeToolLoader           │
+│       │                    │              │                      │
+│       │                    │      exec / read / write / edit     │
+│       │                    │              glob / grep            │
+│       │                    │                                     │
+│       │                    ├──> MCPLoader ──> BoxStdioSession    │
+│       │                    │   (shared container, multi-process) │
+│       │                    │                                     │
+│       │                    ├──> SkillToolLoader (activate tool)  │
+│       │                    │                                     │
+│       │                    ├──> SkillAuthoringToolLoader         │
+│       │                    │                                     │
+│       │                    └──> PluginToolLoader                 │
+│       │                                                          │
+│  BoxService (facade)                                             │
+│    ├─ Profile management (locked fields)                         │
+│    ├─ Host mount validation (allowed_mount_roots)                │
+│    ├─ Workspace quota check                                      │
+│    ├─ Output truncation (head+tail)                              │
+│    ├─ Session ID template resolution (resolve_box_session_id)    │
+│    ├─ Skill mount assembly (build_skill_extra_mounts)            │
+│    ├─ Reconnect loop (_reconnect_loop, exponential backoff)      │
+│    └─ BoxRuntimeConnector                                        │
+│         ├─ Heartbeat loop (20s ping)                             │
+│         └─ ActionRPCBoxClient                                    │
+│              │  Action RPC (stdio or WebSocket)                  │
+│                                                                  │
+│  SkillManager (skill_mgr)                                        │
+│    └─ Skills from Box runtime; data/skills when unavailable      │
+└──────────────────────────────────────────────────────────────────┘
+               │
+               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│              Box Runtime process (SDK side)                      │
+│                                                                  │
+│  BoxServerHandler (Action RPC handling, INIT config injection)   │
+│       │                                                          │
+│  BoxRuntime (session mgmt / process lifecycle / TTL reaper)      │
+│       │       └─ session.managed_processes: dict[pid, _ManagedProcess]
+│       │                                                          │
+│  Backend (selected at startup from box.backend config):          │
+│    DockerBackend ──┐                                             │
+│    PodmanBackend ──┤── CLISandboxBackend                         │
+│    NsjailBackend ──┘  (local CLI or fallback to in-container CLI)│
+│    E2BBackend         (cloud sandbox, requires E2B_API_KEY)      │
+│                                                                  │
+│  BoxSkillStore                                                   │
+│    ├─ list / get / create / update / delete                      │
+│    ├─ scan_skill_directory / read_skill_file / write_skill_file  │
+│    └─ preview_skill_zip / install_skill_zip (zip or GitHub)      │
+│                                                                  │
+│  aiohttp single-port server (default :5410):                     │
+│    /rpc/ws                                     - Action RPC      │
+│    /v1/sessions/{id}/managed-process/ws        - default process │
+│    /v1/sessions/{id}/managed-process/{pid}/ws  - specific process│
+└──────────────────────────────────────────────────────────────────┘
+               │
+               ▼
+┌──────────────────────────────────────────────────────────────────┐
+│  Container / sandbox (Docker/Podman, nsjail, or remote E2B)      │
+│  - Isolated filesystem / network / PID namespaces                │
+│  - Limits: CPU, memory, PID count, optional workspace quota      │
+│  - Primary mount (host_path → mount_path) + any extra_mounts     │
+│      └─ Skills via extra_mounts at /workspace/.skills/<name>     │
+│  - exec: user commands run here                                  │
+│  - managed processes: long-lived, coexisting (MCP Server, custom)│
+└──────────────────────────────────────────────────────────────────┘
+```
+
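+**Core design principles**:
+- Box Runtime runs as a separate process and talks to the LangBot main process over Action RPC; both sides reuse the SDK's IO layer (Handler → Connection → Controller)
+- One session_id maps to one container/sandbox instance. Multiple mounts and multiple managed processes can coexist within the same session
+- Skills, default exec, and MCP Servers share the same session container (see [box-session-scope.md](./box-session-scope.md))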
+
+---
+
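+## 2. LangBot-Side Modules
+
+### 2.1 BoxService (`pkg/box/service.py`, 722 lines)
+
+Application-layer facade that coordinates Profiles, security validation, quotas, the runtime connection, Skill mounts, and Session templates.
+
+Main public methods (in definition order):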
+
+```
+BoxService
+  ├─ initialize()                              Connect to Box Runtime + prepare the default workspace
+  ├─ _on_runtime_disconnect(connector)         Trigger a reconnect
+  ├─ _reconnect_loop(connector)                Reconnect with exponential backoff
+  ├─ available (property)                      Connection state
+  │
+  ├─ resolve_box_session_id(query)             Resolve session_id from the pipeline template
+  ├─ build_skill_extra_mounts(query)           Assemble the mount list for pipeline-bound skills
+  │
+  ├─ execute_tool(parameters, query)           Entry point when the Agent calls exec
+  │    ├─ _apply_profile / build_spec
+  │    ├─ _validate_host_mount
+  │    ├─ _enforce_workspace_quota (phase=pre)
+  │    ├─ client.execute(spec)
+  │    ├─ _enforce_workspace_quota (phase=post)
+  │    └─ _truncate (stdout/stderr)
+  │
+  ├─ execute_spec_payload(spec_payload, ...)   Internal entry point (called by other loaders)
+  ├─ create_session(spec_payload, ...)         Create a session explicitly
+  ├─ start_managed_process(session_id, ...)    Start a managed process
+  ├─ get_managed_process(session_id, pid)      Query process status (pid defaults to 'default')
+  ├─ stop_managed_process(session_id, pid)     Stop a single managed process
+  ├─ get_managed_process_websocket_url(...)    Return the WS attach URL
+  │
+  ├─ list_skills() / get_skill(name)           Skill metadata
+  ├─ create_skill / update_skill / delete_skill  Skill CRUD
+  ├─ scan_skill_directory(path)                Scan a directory
+  ├─ list_skill_files / read_skill_file / write_skill_file
+  ├─ preview_skill_zip / install_skill_zip     zip / GitHub install
+  │
+  ├─ shutdown() / dispose()                    Cleanup: RPC SHUTDOWN + process termination
+  ├─ get_status() / get_sessions() / get_recent_errors()
+  └─ get_system_guidance()                     LLM system prompt
+```
+
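+**Profile system**: Four built-in Profiles (`default` / `offline_readonly` / `network_basic` / `network_extended`). Fields listed in the `locked` frozenset cannot be overridden by the LLM. Parameter merge order: Profile defaults → LLM-requested parameters → locked values enforced.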
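
The merge order described above can be sketched as follows. This is a minimal illustration, not the actual `BoxProfile` code: `ProfileSketch`, `apply_profile`, and the `defaults` field are hypothetical names, and the handling of a locked field that has no default (it is dropped) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical stand-in for BoxProfile; field names are illustrative."""

    defaults: dict[str, Any]
    locked: frozenset[str] = field(default_factory=frozenset)
    max_timeout_sec: int = 120


def apply_profile(profile: ProfileSketch, requested: dict[str, Any]) -> dict[str, Any]:
    # Step 1: start from the Profile defaults.
    merged = dict(profile.defaults)
    # Step 2: overlay whatever the LLM asked for.
    merged.update(requested)
    # Step 3: force locked fields back to the Profile value.
    for name in profile.locked:
        if name in profile.defaults:
            merged[name] = profile.defaults[name]
        else:
            # Assumption: a locked field without a default is simply dropped.
            merged.pop(name, None)
    # Clamp the timeout to the Profile ceiling.
    if "timeout_sec" in merged:
        merged["timeout_sec"] = min(merged["timeout_sec"], profile.max_timeout_sec)
    return merged
```

Applying the locked values last means a tool-call parameter can never win over a locked field, whatever the LLM sends.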
+
+**Output truncation**: Output is capped at 4000 characters by default, keeping the first 60% and the last 40%, with `[...truncated...]` inserted in between.
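
A head+tail truncation of this kind might look like the sketch below. The function name and signature are hypothetical, and whether the real implementation counts the marker toward the limit is not stated in the source; this sketch does not count it.

```python
def truncate_head_tail(
    text: str,
    limit: int = 4000,
    head_percent: int = 60,
    marker: str = "\n[...truncated...]\n",
) -> str:
    """Keep the first head_percent% and the remaining tail of `limit` characters."""
    if len(text) <= limit:
        return text
    # Integer arithmetic avoids float rounding surprises in the split.
    head_len = limit * head_percent // 100
    tail_len = limit - head_len
    # Slice from an explicit index so tail_len == 0 yields an empty tail.
    return text[:head_len] + marker + text[len(text) - tail_len:]
```

Keeping the tail matters because the final lines of a command's output usually carry the result or the error the LLM needs.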
+
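+**Skill mount merging**: When `execute_tool()` is called, `build_skill_extra_mounts(query)` adds the `package_root` of every skill bound to the current pipeline to the BoxSpec as `extra_mounts`, mounted at `/workspace/.skills/<name>`. Tool calls may reference a skill's virtual path only after the LLM has explicitly activated that skill through the `activate` tool.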
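
As a rough sketch of the mount assembly, assuming each skill is a dict with `name` and `package_root` keys: the real function takes a `query`, so `skill_mounts_sketch` is a hypothetical simplification, and the read-only mode is an assumption rather than something the source states.

```python
from typing import Iterable

SKILLS_MOUNT_ROOT = "/workspace/.skills"


def skill_mounts_sketch(skills: Iterable[dict]) -> list[dict]:
    """Turn pipeline-bound skills into extra_mounts entries."""
    mounts = []
    for skill in skills:
        mounts.append(
            {
                "host_path": skill["package_root"],
                "mount_path": f"{SKILLS_MOUNT_ROOT}/{skill['name']}",
                "mode": "ro",  # assumption: skills are mounted read-only
            }
        )
    return mounts
```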
+
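+### 2.2 BoxRuntimeConnector (`pkg/box/connector.py`, 357 lines)
+
+Manages the communication channel to the Box Runtime:
+
+- **Local stdio**: the default path on Unix/macOS; spawns a `python -m langbot_plugin.cli.__init__ box -s --ws-control-port {port}` subprocess (the same `lbp` CLI entry point that the plugin runtime uses)
+- **Local subprocess + WS**: used locally on Windows (asyncio's ProactorEventLoop does not support stdio pipes)
+- **Remote WebSocket**: used for Docker deployments or when `box.runtime.endpoint` is set explicitly; connects to `ws://{host}:{port}/rpc/ws`
+- **Synchronous wait**: confirms the connection with an `asyncio.Event` + `wait_for(timeout=30s)` pattern
+- **Heartbeat**: `_heartbeat_loop()` calls `ping()` every 20s; a failure is only logged at DEBUG level (disconnect detection relies on connection close)
+- **Reconnect**: `runtime_disconnect_callback` is supplied by BoxService and triggers `_reconnect_loop`
+- **INIT injection**: immediately after connecting, the current `box.*` config subtree is sent down (with the private `runtime` field stripped), and the Runtime initializes its backend from it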
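
The reconnect behaviour can be illustrated with a generic exponential-backoff loop. This is a sketch under assumptions: the delays, the attempt cap, and the `connect` / `sleep` parameters are illustrative and are not taken from `_reconnect_loop`.

```python
import asyncio
from typing import Awaitable, Callable


async def reconnect_with_backoff(
    connect: Callable[[], Awaitable[None]],
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    max_attempts: int = 5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Retry `connect` with a doubling delay; return True once it succeeds."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            await connect()
            return True
        except ConnectionError:
            if attempt == max_attempts:
                break
            await sleep(delay)
            # Double the wait each round, but never exceed max_delay.
            delay = min(delay * 2, max_delay)
    return False
```

Injecting `sleep` keeps the loop testable without real waiting.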
+
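+> **Historical improvement**: the 2026-04-16 version of this document listed "Box has no heartbeat / no reconnect" as a P0 issue; it has since been fixed (commits `2dfd9d5d`, `c6882cf`, `5029d9c`, and others).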
+
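+### 2.3 BoxWorkspaceSession utilities (`pkg/box/workspace.py`, 413 lines)
+
+This file currently provides two kinds of functionality:
+
+1. **Path and command rewriting helpers**: `normalize_host_path` / `rewrite_mounted_path` / `unwrap_venv_path` / `rewrite_venv_command` / `infer_workspace_host_path`, shared by the MCP loader and Skill path resolution.
+2. **`BoxWorkspaceSession`**: a thin wrapper around BoxService dedicated to the MCP-in-Box scenario (it tracks the session_id of a shared session, builds the mount payload, and stages host files into the shared workspace).
+
+**What changed**: Skill exec used to create a dedicated BoxWorkspaceSession (an exclusive session) for each skill. The current implementation uses the `extra_mounts` model instead, so a Skill no longer owns a container and only adds mounts. This wrapping logic has been removed from the native loader.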
+
+### 2.4 policy.py (`pkg/box/policy.py`, 98 lines): still dead code
+
+A three-tier security policy design (`SandboxPolicy` / `ToolPolicy` / `ElevatedPolicy`) that nothing in the project imports or calls. See [SaaS blocker S2](./box-issues.md).
+
+### 2.5 SkillManager (`pkg/skill/manager.py`, 186 lines)
+
+```
+SkillManager
+  ├─ initialize()                  Calls reload_skills()
+  ├─ reload_skills()               Tries list_skills() on the Box runtime first;
+  │                                 if unavailable, falls back to scanning data/skills/
+  ├─ refresh_skill_from_disk()     Reload a single skill
+  ├─ get_skill_by_name(name)
+  └─ get_managed_skills_root()     Returns the skills_root path as seen by Box
+```
+
+Skill metadata is parsed from the `SKILL.md` front matter (`name` / `description` / `instructions`) via `parse_frontmatter`, so the cost of a full scan is no longer incurred (a typical installation has fewer than 50 skills).
+
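+### 2.6 Skill activation (`pkg/skill/activation.py`, 33 lines) + Skill loader helpers
+
+Historically, a skill was activated when the LLM emitted an `[ACTIVATE_SKILL:name]` marker in its text output. This has been replaced by a **Tool Call mechanism**:
+
+- `SkillToolLoader` (`pkg/provider/tools/loaders/skill.py`, 157 lines) exposes an `activate` tool whose parameter is the skill name
+- The tool implementation calls `register_activated_skill(query, skill_data)`, which records the activation state in `query.variables['_activated_skills']`
+- This KV-cache-friendly approach follows the Claude Code design; see the Tool Call description in [box-session-scope.md §4.3](./box-session-scope.md)
+
+`activation.py` now keeps only the externally facing helper functions (the pipeline layer calls the loader's `register_activated_skill`).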
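
The activation bookkeeping can be sketched as below. Only the function name `register_activated_skill` and the `_activated_skills` key come from the text above; storing a dict keyed by skill name, and the `is_skill_path_allowed` gate, are assumptions made for illustration.

```python
from types import SimpleNamespace

ACTIVATED_KEY = "_activated_skills"
SKILLS_PREFIX = "/workspace/.skills/"


def register_activated_skill(query, skill_data: dict) -> None:
    """Record an activated skill on the per-request query object."""
    activated = query.variables.setdefault(ACTIVATED_KEY, {})
    activated[skill_data["name"]] = skill_data


def is_skill_path_allowed(query, path: str) -> bool:
    """Hypothetical gate: skill paths are usable only after activation."""
    if not path.startswith(SKILLS_PREFIX):
        return True  # not a skill path, so this gate does not apply
    skill_name = path[len(SKILLS_PREFIX):].split("/", 1)[0]
    return skill_name in query.variables.get(ACTIVATED_KEY, {})


# Usage: a bare namespace stands in for the real query object.
demo_query = SimpleNamespace(variables={})
register_activated_skill(demo_query, {"name": "pdf"})
```

Keeping the state in `query.variables` scopes activation to a single request, so it does not leak across conversations.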
+
+---
+
+## 3. SDK-Side Modules
+
+### 3.1 BoxRuntime (`box/runtime.py`, 599 lines)
+
+Core orchestrator that manages the session lifecycle and backend dispatch:
+
+```
+Session lifecycle:
+
+  Client EXEC / CREATE_SESSION
+       │
+       ▼
+  _get_or_create_session(spec)
+    ├─ _reap_expired_sessions_locked()   Reap sessions past their TTL
+    ├─ Already exists? → _assert_session_compatible() → reuse
+    ├─ Backend session missing? → recreate (commit c6882cf)
+    └─ New? → backend.start_session(spec) → create the container
+       │       └─ apply spec.extra_mounts (multiple mounts)
+       ▼
+  execute(spec)
+    ├─ Acquire the session lock (one per session)
+    ├─ backend.exec(session, spec)       Run the command in the container
+    ├─ Update last_used_at
+    └─ Timed out? → destroy the session
+       │
+       ▼
+  Session stays alive until:
+    ├─ TTL expires (300s by default; reaped on the next operation)
+    ├─ An execution times out (destroyed automatically)
+    ├─ The client sends DELETE_SESSION
+    └─ SHUTDOWN
+```
+
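+**Key design points**:
+- Each session has its own `asyncio.Lock`, so commands within one session run serially
+- Each session maintains `managed_processes: dict[process_id, _ManagedProcess]`, which lets several long-lived processes coexist (MCP / custom)
+- A global `_lock` guards reads and writes of the `_sessions` dict
+- Compatibility check: compares the core spec fields; the `image` field is skipped for backends that do not support custom images (nsjail/E2B)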
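
The locking scheme above (a global lock for the session table, one lock per session for execution, lazy TTL reaping) can be sketched as follows. Class and method names are hypothetical, and the sketch leaves out backends, compatibility checks, and managed processes.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class SessionSketch:
    session_id: str
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    last_used_at: float = field(default_factory=time.monotonic)


class SessionRegistrySketch:
    def __init__(self, ttl_sec: float = 300.0) -> None:
        self._ttl_sec = ttl_sec
        self._sessions: dict[str, SessionSketch] = {}
        self._lock = asyncio.Lock()  # guards the _sessions dict only

    def _reap_expired_locked(self) -> None:
        # Caller must hold self._lock; expired sessions are dropped lazily.
        now = time.monotonic()
        expired = [
            sid for sid, s in self._sessions.items()
            if now - s.last_used_at > self._ttl_sec
        ]
        for sid in expired:
            del self._sessions[sid]

    async def get_or_create(self, session_id: str) -> SessionSketch:
        async with self._lock:
            self._reap_expired_locked()
            session = self._sessions.get(session_id)
            if session is None:
                session = SessionSketch(session_id)
                self._sessions[session_id] = session
            return session

    async def execute(self, session_id: str, command: Callable[[], Awaitable[str]]) -> str:
        session = await self.get_or_create(session_id)
        # The per-session lock serializes commands inside one session,
        # while other sessions keep running concurrently.
        async with session.lock:
            result = await command()
            session.last_used_at = time.monotonic()
            return result
```

The global lock is held only for the dictionary lookup, so a slow command in one session does not block session creation for others.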
+
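+**Backend selection (`_select_backend`)**, in priority order:
+1. An explicit `box.backend` setting (`docker` / `nsjail` / `e2b`)
+2. `local` (default) → probe the Docker / Podman / nsjail CLIs in that order
+3. If the current backend is unavailable when `get_status` is called, selection is attempted again (commit `e5617c7`)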
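
A simplified version of this priority order is sketched below. It assumes the probe is a plain `PATH` lookup via `shutil.which`; the actual `_select_backend` may probe differently (for example by checking that the daemon responds), and the function name here is hypothetical.

```python
import shutil
from typing import Callable, Optional


def select_backend_sketch(
    configured: str = "local",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the backend name to use, or None if nothing was found."""
    # 1. An explicit setting wins; its availability is checked elsewhere.
    if configured != "local":
        return configured
    # 2. "local": probe the CLIs in a fixed order.
    for name in ("docker", "podman", "nsjail"):
        if which(name) is not None:
            return name
    return None
```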
+
+### 3.2 Backend System
+
+#### CLISandboxBackend (`box/backend.py`, 411 lines)
+
+Shared base class for Docker / Podman:
+
+```
+start_session(spec):
+  1. validate_sandbox_security(spec)
+  2. docker/podman run -d --rm --name <name>
+     --network none (optional)
+     --cpus/--memory/--pids-limit
+     --read-only + --tmpfs /tmp
+     -v <host>:<mount>:<mode>          primary mount
+     -v <extra.host>:<extra.mount>:..  extra mounts (extra_mounts)
+     <image> sh -lc 'while true; do sleep 3600; done'
+  3. Return BoxSessionInfo
+
+exec(session, spec):
+  docker/podman exec -e KEY=VAL <container>
+    sh -lc 'mkdir -p <workdir> && cd <workdir> && <cmd>'
+
+start_managed_process(session, spec):
+  docker/podman exec -i <container>
+    sh -lc 'mkdir -p <cwd> && cd <cwd> && exec <command> <args>'
+  Returns asyncio.subprocess.Process (stdin/stdout PIPE)
+```
+
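+The container starts with an idle process, and actual commands run through `docker exec`. `--rm` ensures the container is cleaned up automatically when it exits.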
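
To make the flag mapping concrete, the sketch below builds the `run` argument list from a spec. `SpecSketch`, `MountSketch`, and `build_run_argv` are hypothetical names and this is not the backend's actual code; the resource field names follow the ones mentioned in the commit message (`cpus`, `memory_mb`, `pids_limit`, `read_only_rootfs`).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MountSketch:
    host_path: str
    mount_path: str
    mode: str = "ro"  # "ro" or "rw"


@dataclass
class SpecSketch:
    image: str
    network: bool = False
    cpus: Optional[float] = None
    memory_mb: Optional[int] = None
    pids_limit: Optional[int] = None
    read_only_rootfs: bool = False
    mounts: list[MountSketch] = field(default_factory=list)


def build_run_argv(engine: str, name: str, spec: SpecSketch) -> list[str]:
    """Build the `docker run` / `podman run` argv for an idle session container."""
    argv = [engine, "run", "-d", "--rm", "--name", name]
    if not spec.network:
        argv += ["--network", "none"]
    if spec.cpus is not None:
        argv += ["--cpus", str(spec.cpus)]
    if spec.memory_mb is not None:
        argv += ["--memory", f"{spec.memory_mb}m"]
    if spec.pids_limit is not None:
        argv += ["--pids-limit", str(spec.pids_limit)]
    if spec.read_only_rootfs:
        # A read-only root filesystem still needs a writable /tmp.
        argv += ["--read-only", "--tmpfs", "/tmp"]
    for m in spec.mounts:
        argv += ["-v", f"{m.host_path}:{m.mount_path}:{m.mode}"]
    # Idle loop keeps the container alive; real commands arrive via `exec`.
    argv += [spec.image, "sh", "-lc", "while true; do sleep 3600; done"]
    return argv
```

Passing the arguments as a list rather than a shell string avoids quoting problems with host paths.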
+
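+**Windows support**: the backend adapts its path handling and subprocess invocation for Windows (commit `120817a`).
+
+**Orphan cleanup**: on startup, containers labeled `langbot.box=true` are enumerated, and any whose instance_id does not match are force-removed.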
+
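+#### NsjailBackend (`box/nsjail_backend.py`, 552 lines)
+
+Lightweight Linux sandbox (no container engine required):
+
+- Uses namespace isolation (user/mount/pid/ipc/uts/cgroup/net)
+- Mounts the host's `/usr`/`/lib`/`/bin`/`/sbin` read-only, plus selected `/etc` entries
+- Creates separate directories for each session (workspace/tmp/home)
+- Resource limits: cgroup v2 is preferred, with a fallback to rlimit
+- **CLI compatibility**: detects a system-installed nsjail via `shutil.which(self._nsjail_bin)`; if it is absent, tries nsjail inside a container (commits `686fcc0`, `feed530`)
+- **No custom images**: uses the host OS; the `image` field is fixed to `'host'`, and the compatibility check skips image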
+
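+#### E2BBackend (`box/e2b_backend.py`, 429 lines)
+
+Cloud sandbox backend (introduced in commit `75b547f`):
+
+- Talks to the E2B platform through the `e2b` SDK
+- Configuration: `box.e2b.api_key` / `api_url` / `template`
+- Supports `extra_mounts` (files are synced by upload, commit `0fea9b1`)
+- Needs no local container engine, which suits deployments without Docker and SaaS multi-tenant scenarios
+- Does not support a custom image field; the template controls the environment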
+
+### 3.3 Server (`box/server.py`, 508 lines)
+
+Single-port aiohttp server (default 5410) that routes by path (the ports were merged in commit `8c71ec5`):
+
+1. **Action RPC** (`/rpc/ws`): `BoxServerHandler` handles every action, including `INIT` config injection and skill store operations
+2. **WS Relay** (`/v1/sessions/{id}/managed-process/ws` and `/v1/sessions/{id}/managed-process/{pid}/ws`): bridges a WebSocket bidirectionally to the stdin/stdout of the specified managed process
+
+In stdio mode, aiohttp is still started on 5410, solely to serve managed process attach; Action RPC itself goes over stdin/stdout.
+
+### 3.4 Client (`box/client.py`, 377 lines)
+
+`ActionRPCBoxClient` wraps `Handler.call_action()` calls:
+
+- 25+ methods map to 25+ RPC actions (exec / session / managed-process / skill / status / shutdown)
+- Error restoration: `_translate_action_error()` restores the SDK-side exception type by matching string prefixes
+- `execute()` uses a 300s timeout; all other calls default to 15s
+- `BoxRuntimeClient` is an ABC, available for possible future non-RPC implementations
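
Prefix-based error restoration could look roughly like this. The exception classes and prefix strings below are invented for illustration; the source does not list the SDK's real exception names or the exact wire format.

```python
class BoxErrorSketch(Exception):
    """Hypothetical base class for errors restored on the client side."""


class BoxValidationErrorSketch(BoxErrorSketch):
    pass


class BoxSessionNotFoundSketch(BoxErrorSketch):
    pass


# Assumed wire format: "<ExceptionName>: <message>".
_PREFIX_TO_ERROR = (
    ("BoxValidationError:", BoxValidationErrorSketch),
    ("BoxSessionNotFound:", BoxSessionNotFoundSketch),
)


def translate_action_error(message: str) -> BoxErrorSketch:
    """Map an RPC error string back to a typed exception."""
    for prefix, exc_type in _PREFIX_TO_ERROR:
        if message.startswith(prefix):
            return exc_type(message[len(prefix):].strip())
    # Unknown prefixes degrade to the generic base error.
    return BoxErrorSketch(message)
```

Prefix matching is simple but fragile: if the runtime rewords a message, the client silently falls back to the generic error, so the prefixes are effectively part of the protocol.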
+
+The package-level `__init__.py` explicitly exports `BoxRuntimeClient` and `ActionRPCBoxClient` (commit `df9c722`).
+
+### 3.5 Actions (`box/actions.py`, 34 lines)
+
+The `LangBotToBoxAction` enum defines **25** actions in total:
+
+| Category | Actions |
+|------|---------|
+| Control | `INIT`, `HEALTH`, `STATUS`, `GET_BACKEND_INFO`, `SHUTDOWN` |
+| Execution | `EXEC` |
+| Session | `CREATE_SESSION` / `GET_SESSION` / `GET_SESSIONS` / `DELETE_SESSION` |
+| Managed Process | `START_MANAGED_PROCESS` / `GET_MANAGED_PROCESS` / `STOP_MANAGED_PROCESS` |
+| Skill | `LIST_SKILLS` / `GET_SKILL` / `CREATE_SKILL` / `UPDATE_SKILL` / `DELETE_SKILL` / `SCAN_SKILL_DIRECTORY` / `LIST_SKILL_FILES` / `READ_SKILL_FILE` / `WRITE_SKILL_FILE` / `PREVIEW_SKILL_ZIP` / `INSTALL_SKILL_ZIP` |
+
+### 3.6 Models (`box/models.py`, 331 lines)
+
+Core data models:
+
+| Model | Purpose |
+|------|------|
+| `BoxNetworkMode` | `OFF` / `ON` |
+| `BoxExecutionStatus` | `COMPLETED` / `TIMED_OUT` |
+| `BoxHostMountMode` | `NONE` / `READ_ONLY` / `READ_WRITE` |
+| `BoxManagedProcessStatus` | `RUNNING` / `EXITED` |
+| `BoxMountSpec` | A single mount (host_path/mount_path/mode); **new** |
+| `BoxSpec` | Execution request; adds `extra_mounts: list[BoxMountSpec]`, `persistent`, `workspace_quota_mb` |
+| `BoxProfile` | 4 built-in profiles + `locked` frozenset |
+| `BoxSessionInfo` | Session state (including backend_name/created_at/last_used_at) |
+| `BoxManagedProcessSpec` | Long-running process parameters (process_id/command/args/env/cwd) |
+| `BoxManagedProcessInfo` | Process state (status/exit_code/stderr_preview/attached) |
+| `BoxExecutionResult` | Execution result (status/exit_code/stdout/stderr/duration_ms) |
+
+`BoxSpec` validators: `workdir` defaults to `mount_path`; `host_path` accepts POSIX and Windows paths; when `host_path` is set, `workdir` must be under `mount_path`.
+
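The `workdir` rules above can be sketched as a small standalone function. This is an illustration of the stated rules only, not the actual pydantic validator: the helper name `resolve_workdir` is hypothetical, it handles POSIX container paths only, and it normalizes paths lexically.

```python
import posixpath
from typing import Optional


def resolve_workdir(mount_path: str, workdir: Optional[str], host_path: Optional[str]) -> str:
    """Hypothetical helper mirroring the validator rules described above."""
    mount = posixpath.normpath(mount_path)
    if workdir is None:
        # Rule 1: workdir inherits mount_path when it is not given.
        return mount
    # normpath collapses "." and ".." so "/workspace/../etc" becomes "/etc".
    candidate = posixpath.normpath(workdir)
    inside = candidate == mount or candidate.startswith(mount.rstrip("/") + "/")
    if host_path is not None and not inside:
        # Rule 2: with a host mount, workdir must stay under mount_path.
        raise ValueError(f"workdir {workdir!r} must be under {mount_path!r}")
    return candidate
```

Comparing with a trailing separator matters: a bare `startswith("/workspace")` would also accept `/workspace2`.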
+### 3.7 BoxSkillStore (`box/skill_store.py`, 647 lines)
+
+A new module (commit `4ab3502`) that moves skill persistence into the Box runtime:
+
+```
+BoxSkillStore
+  ├─ list_skills() / get_skill(name)
+  ├─ create_skill(data) / update_skill(name, data) / delete_skill(name)
+  ├─ scan_skill_directory(path)            scan a directory, return candidate skill packages
+  ├─ list_skill_files(name, path)          browse the file tree inside a skill
+  ├─ read_skill_file(name, path) / write_skill_file(name, path, content)
+  ├─ preview_skill_zip(zip_bytes, ...)     preview zip contents without writing to disk
+  └─ install_skill_zip(zip_bytes, ...)     extract, validate, copy into skills_root
+     └─ supports source_subdir / target_suffix (commit 1aa043f)
+```
+
+GitHub install path: the HTTP layer (`api/http/service/skill.py`) first fetches with `git clone`, then goes through `install_skill_zip` or the directory path. Skill files live under `box.local.skills_root` (default `skills`, relative to `host_root`), which corresponds to `/workspace/.skills/` inside the container.
+
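+### 3.8 Security (`box/security.py`, 52 lines)
+
+`validate_sandbox_security()`: validates host_path against a denylist, blocking mounts of `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, and the Docker/Podman sockets.
+
+**Known gaps**: the root path `/` is not blocked, user home directories are not blocked, and the policy is a denylist rather than an allowlist. See [SaaS blocker S5](./box-issues.md).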
+
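To make the gap concrete, here is a minimal sketch contrasting a prefix-matching denylist with an allowlist. The path list is taken from the description above (sockets omitted); the function names and the matching logic are assumptions for illustration and are not the code in `box/security.py`. Both checks are purely lexical, whereas a production check would presumably also resolve symlinks.

```python
import posixpath
from typing import Iterable

# Paths named in the section above; the real denylist may differ.
BLOCKED = ("/etc", "/proc", "/sys", "/dev", "/root", "/boot")


def is_blocked_by_denylist(host_path: str) -> bool:
    """Prefix-style denylist: blocks a listed path and anything below it."""
    path = posixpath.normpath(host_path)
    return any(path == b or path.startswith(b + "/") for b in BLOCKED)


def is_allowed_by_allowlist(host_path: str, allowed_roots: Iterable[str]) -> bool:
    """Allowlist alternative: only paths at or below an approved root pass."""
    path = posixpath.normpath(host_path)
    for root in allowed_roots:
        root = posixpath.normpath(root)
        if path == root or path.startswith(root.rstrip("/") + "/"):
            return True
    return False
```

With the denylist, `"/"` is not a child of any blocked entry, so it passes even though it contains all of them. The allowlist rejects it by default because nothing is permitted unless listed.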
+### 3.9 Errors (`box/errors.py`, 33 lines)
+
+| Exception type | Meaning |
+|----------|------|
+| `BoxError` | Base class |
+| `BoxValidationError` | Spec/parameter validation failed |
+| `BoxBackendUnavailableError` | No backend available |
+| `BoxRuntimeUnavailableError` | Runtime service unavailable |
+| `BoxSessionConflictError` | Session already exists but the spec is incompatible |
+| `BoxSessionNotFoundError` | Session does not exist |
+| `BoxManagedProcessConflictError` | Session already has a process with the same name |
+| `BoxManagedProcessNotFoundError` | Process does not exist |
+
+---
+
+## 4. Tool System Integration
+
+### 4.1 ToolManager Orchestration (`toolmgr.py`)
+
+```
+ToolManager.initialize()
+  ├─ NativeToolLoader      (exec / read / write / edit / glob / grep)
+  ├─ PluginToolLoader      (plugin tools)
+  ├─ MCPLoader             (MCP server tools)
+  ├─ SkillToolLoader       (activate tool, activated via Tool Call)
+  └─ SkillAuthoringToolLoader  (Skill CRUD)
+
+Tool call priority: native → plugin → mcp → skill → skill_authoring
+```
+
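+### 4.2 Native Tools (`native.py`, 846 lines)
+
+| Tool | Runs inside Box | Accesses host filesystem |
+|------|:---:|:---:|
+| `exec`  | Yes | No |
+| `read`  | **No** | **Yes**: calls `open()` on host files directly |
+| `write` | **No** | **Yes**: calls `open()` on host files directly |
+| `edit`  | **No** | **Yes**: calls `open()` on host files directly |
+| `glob`  | **No** | **Yes**: walks host directories directly |
+| `grep`  | **No** | **Yes**: reads host files directly |
+
+**Asymmetric sandbox boundary**: this is a deliberate design trade-off. `read`/`write`/`edit`/`glob`/`grep` bypass the sandbox for performance (avoiding container I/O overhead and cross-process copies), which means the LLM can directly read and write any file under `allowed_mount_roots`. Skill paths are rewritten by `_resolve_host_path()`, which forbids escaping `package_root`.
+
+**Skill branch of exec**: when a command references a skill at `/workspace/.skills/<name>`:
+1. Verify that the skill is activated
+2. A single exec may reference only one skill package
+3. If the skill is a Python project (has `requirements.txt` or `pyproject.toml`), the command is wrapped in a venv bootstrap (creating `.venv` inside the skill mount point)
+4. Call `box_service.execute_tool()`, which uses the default session_id and the already assembled `extra_mounts`; **a separate session per skill is no longer started**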
+
+### 4.3 MCP-in-Box (`mcp_stdio.py`, 354 lines)
+
+`BoxStdioSessionRuntime` runs MCP stdio servers inside a Box container in a **shared-session, multi-process** mode (commit `529088e`):
+
+```
+initialize()
+  1. Reuse/create the shared session (session_id = _build_box_session_id())
+     - persistent=True, kept alive long-term
+  2. workspace.execute_raw(install_cmd) installs dependencies (optional)
+  3. Stage each MCP server's files into /workspace/.mcp/<process_id>/
+  4. workspace.start_managed_process(process_id=<server>)
+  5. websocket_client(ws_url) connects through the WS relay
+  6. ClientSession.initialize() performs the MCP protocol handshake
+```
+
+Configuration (`MCPServerBoxConfig`): `network='on'` (MCP servers usually need network access), `host_path_mode='ro'` (read-only by default), `startup_timeout_sec=120` (leaves time for pip install).
+
+Each MCP server is one managed process in the same session, with its own `process_id` and its own attach URL, so servers do not block one another.
+
+---
+
+## 5. Startup and Lifecycle
+
+### 5.1 Startup Order (`build_app.py`)
+
+```
+BuildAppStage.run(ap)
+  ├─ ... (persistence, models, sessions) ...
+  │
+  ├─ BoxService(ap)
+  ├─ box_service.initialize()
+  │    └─ connector.initialize()
+  │         ├─ [stdio] fork box subprocess
+  │         ├─ [subprocess+WS] local Windows
+  │         └─ [remote WS] connect URL
+  │    └─ start heartbeat _heartbeat_task
+  ├─ ap.box_service = box_service
+  │
+  ├─ ToolManager(ap)
+  ├─ tool_mgr.initialize()
+  │    ├─ NativeToolLoader   (checks box_service.available)
+  │    ├─ PluginToolLoader
+  │    ├─ MCPLoader          (stdio MCP runs in the sandbox when Box is available)
+  │    └─ SkillAuthoringToolLoader
+  ├─ ap.tool_mgr = tool_mgr
+  │
+  ├─ ... (platform, pipeline) ...
+  ├─ SkillManager.initialize()    (loads the skill list from the Box runtime)
+  └─ ... (RAG, HTTP, plugins) ...
+```
+
+BoxService is initialized **before** ToolManager. ToolManager checks `box_service.available` when creating loaders.
+
+### 5.2 Initialization Failure Handling
+
+```python
+try:
+    await self._runtime_connector.initialize()
+    self._available = True
+except Exception as e:
+    self._available = False
+    logger.warning(f"Box runtime unavailable: {e}")
+```
+
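+**Silent degradation**: a Box initialization failure does not block application startup; it only means the 6 native tools, all Skill tools, and the MCP-in-Box tools are not exposed to the LLM. This differs from Plugin behavior (a Plugin failure raises an exception).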
+
+### 5.3 Disposal Flow
+
+```
+app.dispose()
+  └─ box_service.dispose()
+       ├─ connector.dispose()
+       │    ├─ cancel _heartbeat_task
+       │    ├─ cancel _handler_task / _ctrl_task
+       │    └─ terminate subprocess (SIGTERM)
+       └─ loop.create_task(client.shutdown())
+            └─ RPC SHUTDOWN → Box Runtime cleans up all containers
+```
+
+Box additionally sends an RPC SHUTDOWN so the Runtime actively cleans up containers, which is safer than Plugin's approach of killing the process directly.
+
+---
+
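+## 6. Configuration
+
+### config.yaml (after the refactor)
+
+```yaml
+box:
+    enabled: true         # Master switch for the whole Box subsystem. When false:
+                          #  - no connection to a remote Box runtime, no local stdio subprocess fork
+                          #  - sandbox tools (exec/read/write/edit/glob/grep) are not exposed to the LLM
+                          #  - skill create/edit, GitHub install, and file writes are all rejected
+                          #  - stdio-mode MCP servers fail at startup (http/sse modes are unaffected)
+                          #  - skill list/read stays available read-only
+                          # The BOX__ENABLED environment variable can override this (shared convention)
+    backend: 'local'      # 'local' (auto-detect) / 'docker' / 'nsjail' / 'e2b'
+                          # The backend is selected via box.backend / BOX__BACKEND
+    runtime:
+        endpoint: ''      # WS base address of an external Runtime, e.g. 'ws://host:5410'
+                          # Empty = locally self-managed Runtime
+    local:
+        profile: 'default'
+        image: ''                       # Overrides the profile's default image
+        host_root: './data/box'         # Workspace mount root; Docker deployments need an absolute path
+        default_workspace: ''           # Defaults to '<host_root>/default'
+        skills_root: 'skills'           # Box-managed skill package directory (relative to host_root)
+        allowed_mount_roots:            # Defaults to ['<host_root>']
+            - './data/box'
+            - '/tmp'
+        workspace_quota_mb: null        # Quota override; null = use the profile value
+    e2b:
+        api_key: ''                     # Can also come from the E2B_API_KEY environment variable
+        api_url: ''                     # Set when self-hosting E2B
+        template: ''                    # Default template ID
+```
+
+> **Breaking change**: compared with the 2026-04-16 document, the configuration structure has been completely reorganized (commit `eefdea4`). The former fields `box.profile` / `box.runtime_url` / `box.shared_host_root` / `box.allowed_host_mount_roots` all moved into the `box.local.*` subtable, and `box.backend` plus the `box.e2b.*` group were added.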
+
+### docker-compose.yaml
+
+The `langbot_box` service is gated by a compose profile, so a plain `docker compose up` does **not** start it. When the sandbox is needed:
+
+```bash
+docker compose --profile box up        # start langbot + langbot_box + plugin runtime
+docker compose --profile all up        # same as above
+docker compose up                       # start only langbot + plugin runtime (box off)
+```
+
+If `langbot_box` is not started, also set `box.enabled: false` in `data/config.yaml` (or add `BOX__ENABLED=false` to the langbot container env); otherwise LangBot keeps trying to connect to a nonexistent Box runtime and reports errors.
+
+```yaml
+# Key volumes for langbot_box
+volumes:
+  - ${LANGBOT_BOX_ROOT}:${LANGBOT_BOX_ROOT}         # workspace mount (same path for source and target)
+  - /var/run/docker.sock:/var/run/docker.sock       # Docker backend reuses the host's docker
+```
+
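+### Behavior Matrix When Disabled or Connection Fails
+
+`box.enabled = false` and "enabled but connection failed" are **identical** in user-observable behavior: both are expressed through `BoxService.available = False`, except that `get_status` additionally returns an `enabled` field so the frontend can choose different wording.
+
+| Consumer | Box available | Box unavailable (disabled or failed) |
+|---|---|---|
+| native exec/read/write/edit/glob/grep tools | Exposed to the LLM | **Not exposed** |
+| `activate` / `register_skill` tools | Exposed to the LLM | **Not exposed** |
+| stdio MCP server | Started inside Box | **`_init_stdio_python_server` raises RuntimeError** and refuses; no fallback to host stdio |
+| http/sse MCP server | Normal | Normal (does not depend on Box) |
+| Skill list/read (`list_skills`/`get_skill`/`read_skill_file`) | Via the Box runtime | Read-only fallback to LangBot's local `data/skills/` |
+| Skill create/edit/install/file write | Via the Box runtime | **HTTP 400** with an explicit error message (`_require_box_for_write`) |
+| `box-session-id-template` in Pipeline AI config | Takes effect normally | **Frontend banner** warns that the field has no effect |
+| Pipeline extensions page `enable_all_skills` / bound skills | Editable | **Disabled in the frontend** + banner |
+| Dashboard Box status card | Green dot / "Connected" | Gray dot / "Disabled" (disabled) or red dot / "Disconnected" (failed) |
+
+> Boundary condition for backend write rejection: if `ap.box_service` is **not installed at all** (legacy dev mode that never went through BuildAppStage), `_require_box_for_write` is treated as a no-op and the local `data/skills/` path is kept, for compatibility with historical tests and minimal setups. Production always installs `ap.box_service`, so this fallback is never triggered.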
+
+### Pipeline Configuration (templates/metadata/pipeline/ai.yaml)
+
+`local-agent.config.box-session-id-template` controls the session scope. Presets:
+
+- `{launcher_type}_{launcher_id}`: per chat (recommended, default)
+- `{launcher_type}_{launcher_id}_{sender_id}`: per user in a group chat
+- `{launcher_type}_{launcher_id}_{conversation_id}`: per conversation context
+- `{query_id}`: per message (fully isolated)
+
+See [box-session-scope.md](./box-session-scope.md) for details.
+
+### REST API
+
+| Endpoint | Method | Description | Frontend |
+|------|------|------|:---:|
+| `/api/v1/box/status` | GET | Availability, profile, backend info | ✅ Monitoring page |
+| `/api/v1/box/sessions` | GET | Active session list | ❌ |
+| `/api/v1/box/errors` | GET | Most recent 50 errors | ❌ |
+| `/api/v1/skills` etc. | GET/POST/PUT/DELETE | Skill CRUD, file browsing, zip/GitHub install, preview | ✅ Skill management page |
+
+The frontend component `web/src/app/home/monitoring/components/overview-cards/SystemStatusCards.tsx` already consumes `/api/v1/box/status` and shows the backend name, profile, and active session count. The sessions and errors APIs are not yet wired in.
@@ -0,0 +1,76 @@
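+# Box System: SaaS Pre-Release Blockers
+
+> Updated: 2026-06-02
+> Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)
+> Related docs: [Architecture analysis](./box-architecture.md) | [Session scope](./box-session-scope.md) | [Runtime comparison](./box-vs-plugin-runtime.md) | [Test coverage](./box-test-coverage.md) | [toB analysis](./box-tob-analysis.md)
+
+## Scope
+
+**The self-hosted community edition is release-ready**: stdio mode is the default and box is optional; when box is disabled or unavailable, the backend, frontend, tools, skills, and stdio-MCP all degrade cleanly (clear errors, no crashes); configuration is backward compatible (an old `data/config.yaml` starts as-is); there are no new ORM models and no migration debt; a failed marketplace install does not break the instance. CI is fully green.
+
+This list **keeps only the blockers that must be handled before a SaaS / multi-tenant / public-internet release**. The community edition (trusted, single operator, internal network) is not blocked by them: their risk only materializes when untrusted callers can reach the Box control plane directly, or when multiple tenants share resources.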
+
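+## Resolved (before the community release)
+
+| Item | Resolution |
+|----|------|
+| Unbounded tool-call loop (formerly #13) | `localagent.py` adds `MAX_TOOL_CALL_ROUNDS=128` and terminates gracefully when exceeded (`cafef1a3`) |
+| Quota check walks the workspace synchronously and blocks the event loop (formerly #10) | `_enforce_workspace_quota` is now async, and the workspace walk runs via `asyncio.to_thread` (`cafef1a3`) |
+| `host_path` mount allowlist (LangBot side of former #3) | `allowed_mount_roots` allowlist in `pkg/box/service.py`; an empty list rejects every host mount |
+| Duplicate `_is_path_under` (formerly #12) | Deduplicated; only one definition remains |
+| Reconnect / heartbeat / Windows compatibility / nsjail image field / frontend Box status integration | See the previous review round; all merged |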
+
+---
+
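+## SaaS Blockers
+
+### S1. Box control plane has no authentication: Critical
+
+- **Location**: SDK `box/server.py`, the Action RPC WS (`/rpc/ws`) and the managed-process relay (`/v1/sessions/{id}/managed-process/{pid}/ws`)
+- **Current state**: both WS handlers start serving right after `ws.prepare` with no token or authentication of any kind, and box binds to `0.0.0.0:5410` by default. Anyone who can reach that port can issue `EXEC`, create sessions, attach to the managed-process stdin/stdout of any session, and even send `SHUTDOWN`. The LangBot→box INIT does not deliver any credential either.
+- **Current mitigation**: the default `docker-compose.yaml` does not publish port 5410 of `langbot_box` to the host (the blast radius is limited to the internal bridge network); however, box mounts `/var/run/docker.sock`, so any service on the same network (including a compromised plugin) can escalate to host root. If an operator publishes 5410 to the host or runs box standalone on `0.0.0.0`, it is fully exposed.
+- **Required**: deliver a token at INIT, and have both WS routes validate it per connection (query/header). This is the **number one** SaaS blocker.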
+
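A per-connection token check of the kind S1 asks for could look roughly like the sketch below. It is deliberately framework-free: the `Authorization: Bearer` header, the `token` query parameter, and all function names are assumptions for illustration, and nothing here reflects an existing SDK API.

```python
import hmac
import secrets
from typing import Mapping, Optional


def issue_token() -> str:
    """Generate a random token that LangBot would hand to box at INIT (assumed flow)."""
    return secrets.token_urlsafe(32)


def extract_token(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    """Read the token from a Bearer header first, then from a query parameter."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return query.get("token")


def is_authorized(expected: str, headers: Mapping[str, str], query: Mapping[str, str]) -> bool:
    """Validate one incoming WS connection before serving any action."""
    presented = extract_token(headers, query)
    if presented is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

In a WS handler this check would run before the upgrade is accepted, so an unauthenticated peer never reaches `EXEC` or the relay. Tokens in query strings tend to end up in access logs, so the header form is usually preferable where the client allows it.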
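+### S2. No exec authorization model (policy.py is dead code): High
+
+- **Location**: LangBot `pkg/box/policy.py` (`SandboxPolicy` / `ToolPolicy` / `ElevatedPolicy` are referenced nowhere in the project); `pkg/provider/tools/loaders/native.py`; `pkg/provider/tools/toolmgr.py`
+- **Current state**: native tools (`exec/read/write/edit/glob/grep`) are exposed all-or-nothing based on whether box is available, with **no per-pipeline exec gate, tool allowlist, sandbox mode, or privilege-escalation control**. Once box is available, any pipeline using local-agent with a function-calling model can run arbitrary shell commands.
+- **Required**: wire in policy.py (or an equivalent mechanism) to control, per pipeline, whether `exec` is exposed, which tools are allowlisted, and the sandbox network/read-only mode.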
+
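+### S3. Unbounded session resources (DoS): High
+
+- **#5 No cap on session count**: in SDK `box/runtime.py`, the `_sessions` dict used by `_get_or_create_session` has no capacity limit. A malicious caller varying `session_id` can create containers without bound and exhaust host CPU, memory, PIDs, and disk.
+- **#8 No scheduled reaping**: expired sessions are cleaned up only opportunistically inside `_get_or_create_session`, with no independent periodic task; a burst of creations followed by silence leaks containers permanently.
+- **Required**: a `max_sessions` cap (reject or LRU) plus an independent periodic reaper (for example every 60s).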
+
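The two requirements in S3 can be combined in one small bookkeeping structure. The sketch below is an assumption-laden illustration: `SessionRegistry` and its methods are hypothetical names, it tracks only session ids and timestamps, and actually stopping the evicted or expired containers is left to the caller.

```python
import time
from collections import OrderedDict
from typing import Callable, List


class SessionRegistry:
    """Hypothetical LRU-capped registry with TTL reaping for session ids."""

    def __init__(self, max_sessions: int, ttl_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_sessions
        self._ttl = ttl_sec
        self._clock = clock
        self._sessions = OrderedDict()  # session_id -> last_used timestamp

    def touch(self, session_id: str) -> List[str]:
        """Register or refresh a session; return ids evicted to stay under the cap."""
        evicted = []
        if session_id in self._sessions:
            self._sessions.move_to_end(session_id)
        else:
            while len(self._sessions) >= self._max:
                oldest, _ = self._sessions.popitem(last=False)  # least recently used
                evicted.append(oldest)
        self._sessions[session_id] = self._clock()
        return evicted

    def reap(self) -> List[str]:
        """Drop sessions idle for at least ttl_sec; meant to run on a periodic timer."""
        now = self._clock()
        expired = [sid for sid, last in self._sessions.items() if now - last >= self._ttl]
        for sid in expired:
            del self._sessions[sid]
        return expired
```

Calling `reap()` from an independent task (for example once every 60 seconds) removes the dependency on new requests arriving, which is the leak described in #8. Rejecting new sessions at the cap is the stricter alternative to LRU eviction and avoids killing a victim's active container.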
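+### S4. Workspace quota has no kernel-level limit (TOCTOU): Med-High
+
+- **Location**: LangBot `pkg/box/service.py` `_enforce_workspace_quota` (application-level read-then-check); on the SDK side `workspace_quota_mb` is only recorded and passed through, with no kernel/FS quota such as `--storage-opt size=`
+- **Current state**: there is a race window between the pre-execution and post-execution checks; a single command (`dd`/`fallocate`) can fill the disk between checks, and the post-check can only remediate after the fact.
+- **Required**: enforce a kernel-level limit with Docker `--storage-opt size=`, or use reservation-style quota with an atomic Redis counter.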
+
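+### S5. Mount validation gaps: Med-High
+
+- **Location**: SDK `box/security.py` `_BLOCKED_HOST_PATHS_POSIX`; `extra_mounts` handling in `box/backend.py`
+- **Current state**: (1) the SDK denylist still omits `/` (matching is by prefix, so `host_path="/"` passes and mounts the entire host fs); user home directories, `/usr`, `/opt`, and `/tmp` are not blocked either. (2) `validate_sandbox_security` validates only `spec.host_path` and **never iterates `spec.extra_mounts`**; LangBot's `allowed_mount_roots` likewise validates only `host_path`. Today `extra_mounts` is populated only internally by `build_skill_extra_mounts` (unreachable by the agent), but defense in depth is missing: once the unauthenticated RPC from S1 is reached, extra_mounts can mount any host path and neither layer stops it.
+- **Required**: add `/` to the SDK denylist (or switch to an allowlist); include `extra_mounts` in mount validation on both the SDK and LangBot sides.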
+
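+### S6. Missing container hardening: Med
+
+- **Location**: `docker run` assembly in SDK `box/backend.py`
+- **Current state**: `--cap-drop=ALL`, `--security-opt=no-new-privileges`, and a non-root `--user` are not set; combined with the mounted docker.sock, the escape surface is large.
+- **Required**: add the hardening flags above by default (with regression testing so common skills are not broken).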
+
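+### S7. Slow operations run under the global lock (scalability): Med
+
+- **Location**: SDK `box/runtime.py` `_get_or_create_session`: `backend.start_session()` (`docker run` / nsjail startup / E2B `Sandbox.create`) is called while `self._lock` is held
+- **Impact**: during a cold start (image pull taking seconds, E2B >1s), all concurrent requests are serialized and blocked, so under multi-tenant load the whole Box runtime stalls. The degradation shows up as latency rather than failure.
+- **Required**: do only state checks and registration inside the lock, and move container creation outside it.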
+
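One way to keep only the bookkeeping under the lock is to register a placeholder future per session and run the slow start outside the critical section. The sketch below assumes an asyncio runtime; `SessionManager`, `get_or_create`, and the injected `start_session` coroutine are hypothetical names, and cancellation handling is omitted for brevity.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SessionManager:
    """Hypothetical manager: lock guards the dict only, not container startup."""

    def __init__(self, start_session: Callable[[str], Awaitable[str]]) -> None:
        self._start_session = start_session  # slow call, e.g. starting a container
        self._lock = asyncio.Lock()
        self._sessions: Dict[str, "asyncio.Future[str]"] = {}

    async def get_or_create(self, session_id: str) -> str:
        async with self._lock:
            # Fast path under the lock: look up or register a placeholder.
            future = self._sessions.get(session_id)
            owner = future is None
            if owner:
                future = asyncio.get_running_loop().create_future()
                self._sessions[session_id] = future
        if owner:
            try:
                # Slow path outside the lock: other session ids are not blocked.
                result = await self._start_session(session_id)
            except Exception as exc:
                async with self._lock:
                    self._sessions.pop(session_id, None)  # allow a later retry
                future.set_exception(exc)
            else:
                future.set_result(result)
        # Concurrent callers for the same id wait on the same future.
        return await future
```

Requests for the same `session_id` still wait for a single creation, so no duplicate containers are started, while requests for other ids proceed in parallel.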
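+### S8. Other hardening / follow-ups: Low
+
+- **#9** SDK `box/server.py` reads the private field `runtime._sessions` directly and bypasses the lock, so it can observe inconsistent state under concurrency. A public accessor should be added.
+- **#16** `execute_func_call` in `pkg/provider/tools/toolmgr.py` dispatches by priority, so a plugin/MCP tool with the same name as `exec/read/write/...` is silently shadowed. Namespacing or a conflict warning should be added.
+- **#4** Residual race between INIT/handshake and backend instantiation in SDK `box/runtime.py` (only in the "pure remote WS box starts first, LangBot connects later" scenario; on the stdio/compose paths the config is already in place via env at spawn time, so there is no race). Business actions should be rejected until INIT completes.
+- **#11** `extra_mounts` is fixed at container creation (the compatibility check in SDK `runtime.py` does not cover extra_mounts); skills activated later in a long-lived shared session are not mounted (current mitigation: mount every pipeline-bound skill at creation). Dynamic binding scenarios need destroy-and-recreate or a documented caveat.
+- **#21** Integration tests are not in CI: real container execution, live E2B, and managed-process WS attach run only locally. Security-critical paths lack automated coverage; before SaaS, a Docker-in-Docker CI stage or a manual pre-merge checklist is recommended.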
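For item #16, a conflict warning could be derived from the loader priority order native → plugin → mcp → skill → skill_authoring. The following is a minimal sketch under that assumption; `find_shadowed_tools` and the input shape (loader name mapped to tool names) are hypothetical and not part of `toolmgr.py`.

```python
from typing import Dict, List, Tuple

# Dispatch order assumed from the ToolManager priority described in the architecture doc.
PRIORITY = ["native", "plugin", "mcp", "skill", "skill_authoring"]


def find_shadowed_tools(tools_by_loader: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (tool_name, winning_loader, shadowed_loader) for every name clash."""
    owner: Dict[str, str] = {}
    shadowed: List[Tuple[str, str, str]] = []
    for loader in PRIORITY:
        for name in tools_by_loader.get(loader, []):
            if name in owner:
                # A higher-priority loader already claimed this name.
                shadowed.append((name, owner[name], loader))
            else:
                owner[name] = loader
    return shadowed
```

Logging each returned triple at startup would turn silent shadowing into a visible warning without changing dispatch behavior.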
@@ -0,0 +1,402 @@
+# Box Session Scope Design
+
+> Date: 2026-04-18 (last reviewed 2026-06-02)
+> Status (2026-06-02): the self-hosted community edition is release-ready (box optional, clean degradation, no migration debt). Tool-call loop cap, async quota scan, and the host_path mount allowlist have landed. Remaining multi-tenant / security hardening is tracked in [box-issues.md](./box-issues.md).
+> Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)
+> Related: [Box Architecture](./box-architecture.md) | [Box vs Plugin Runtime](./box-vs-plugin-runtime.md)
+
+---
+
+## 0. Implementation Status (2026-05-19)
+
+This document was authored as a design proposal. The current `feat/sandbox` branch
+has shipped the design largely as written:
+
+| Item | Status | Notes |
+|------|--------|-------|
+| `BoxMountSpec` + `BoxSpec.extra_mounts` | ✅ Shipped | SDK `box/models.py` |
+| Docker / nsjail / E2B backends apply extra mounts | ✅ Shipped | Last gap closed by SDK commit `0fea9b1` (E2B) |
+| `box-session-id-template` in `local-agent` pipeline config | ✅ Shipped | `templates/metadata/pipeline/ai.yaml`, default `{launcher_type}_{launcher_id}` |
+| `BoxService.resolve_box_session_id(query)` | ✅ Shipped | `pkg/box/service.py:166` |
+| `BoxService.build_skill_extra_mounts(query)` | ✅ Shipped | `pkg/box/service.py:189` |
+| Skill exec uses unified container + extra mounts | ✅ Shipped | `pkg/provider/tools/loaders/native.py` skill branch |
+| MCP-in-Box uses shared persistent session, multi-process | ✅ Shipped (earlier than originally scoped) | SDK commit `529088e`, LangBot `mcp_stdio.py:_build_box_session_id` |
+| `BoxManagedProcessSpec.process_id` + multi-process per session | ✅ Shipped | `BoxRuntime` keeps `managed_processes: dict[pid, _ManagedProcess]` |
+| Per-tenant / quota integration with templates | ❌ Not started | See [box-tob-analysis.md](./box-tob-analysis.md) |
+
+The "Phase 2 deferred" note in §10 is **out of date**: MCP unification landed on the
+same branch. Pipeline-scoped (not user-scoped) MCP containers are the realized
+behavior: each pipeline's MCP servers share one `mcp-<pipeline>` session, and
+user exec sessions use the template-derived id.
+
+The remaining open work is multi-tenant overlays (tenant_id in session_id,
+quota counters keyed by tenant), tracked in the toB analysis doc rather than here.
+
+---
+
+## 1. Problems
+
+### 1.1 Default exec: per-message containers
+
+Currently, `BoxService.execute_tool()` sets `session_id = str(query.query_id)` — an
+auto-incrementing integer per incoming message. Every user message creates a new sandbox
+container. Dependencies installed and in-container state are lost between messages.
+
+### 1.2 Three isolated container pools
+
+Default exec, skills, and MCP servers each manage their own containers with
+independent session IDs:
+
+| Path         | Session ID                                    | Container   |
+|--------------|-----------------------------------------------|-------------|
+| Default exec | `str(query_id)` (per message)                 | Ephemeral   |
+| Skill exec   | `skill-{launcher}_{id}-{skill_name}`          | Per skill   |
+| MCP stdio    | `mcp-{server_uuid}`                           | Per server  |
+
+This means a single logical user interaction can spawn 3+ containers that cannot
+share state, see each other's files, or reuse installed dependencies.
+
+### 1.3 Single bind mount limitation
+
+`BoxSpec` currently supports only **one** `host_path` → `mount_path` bind mount.
+This prevents mounting both a default workspace and skill directories into the
+same container.
+
+---
+
+## 2. Concept Model
+
+```
+Platform Message
+  → Query (query_id: int, auto-increment, per message)
+    → Session (launcher_type + launcher_id, per chat window)
+      → Conversation (uuid, per dialogue context within a Session)
+```
+
+| Concept       | Key                                 | Example                    | Scope                        |
+|---------------|-------------------------------------|----------------------------|------------------------------|
+| Query         | `query_id`                          | `42`                       | Single message               |
+| Session       | `launcher_type` + `launcher_id`     | `group_123456`             | Chat window (group or PM)    |
+| Conversation  | `conversation_id` (UUID)            | `a1b2c3d4-...`             | Dialogue context within a Session |
+| Sender        | `sender_id`                         | `789`                      | Individual user              |
+
+Note: in a **group chat**, all users share the same Session (keyed by `group_id`). The
+individual sender is tracked as `sender_id` but does not affect Session/Conversation routing.
+
+---
+
+## 3. Target Scenarios
+
+| #  | Scenario                       | Box Granularity                          | Desired `session_id`                                   |
+|----|--------------------------------|------------------------------------------|---------------------------------------------------------|
+| 1  | Personal assistant             | 1 Box per user, long-lived               | `{launcher_type}_{launcher_id}`                          |
+| 2  | Customer service               | 1 Box per customer, cross-pipeline       | `{launcher_type}_{launcher_id}`                          |
+| 3  | Internal employee tool         | 1 Box per employee                       | `{launcher_type}_{launcher_id}`                          |
+| 4  | Group chat shared assistant    | 1 Box per group                          | `{launcher_type}_{launcher_id}`                          |
+| 5  | Group chat isolated per user   | 1 Box per user within a group            | `{launcher_type}_{launcher_id}_{sender_id}`              |
+| 6  | Teaching (cross-channel)       | 1 Box per student across groups/PMs      | `{sender_id}`                                           |
+| 7  | One-off execution              | 1 Box per message (current behavior)     | `{query_id}`                                            |
+| 8  | Multi-project development      | 1 Box per conversation context           | `{launcher_type}_{launcher_id}_{conversation_id}`        |
+
+No single fixed granularity covers all scenarios. A template-based approach is needed.
+
+---
+
+## 4. Design Overview
+
+Two key changes:
+
+1. **Unified container**: exec, skills, and MCP all share the same container per
+   session scope. No more separate container pools.
+2. **Configurable session scope**: `session_id` is generated from a template with
+   pipeline variables, configurable per pipeline.
+
+### 4.1 Unified Container with Multiple Mounts
+
+A single container per session scope is created on first use. It has:
+
+- **Primary mount**: default workspace at `/workspace` (from `default_host_workspace`)
+- **Skill mounts**: each pipeline-bound skill's `package_root` mounted at
+  `/workspace/.skills/{skill_name}/`
+- **MCP servers**: run as managed processes inside the same container
+
+```
+Container (session_id = "group_123456")
+  /workspace/                          ← default workspace (bind mount, rw)
+  /workspace/.skills/web-search/       ← skill package (bind mount, rw)
+  /workspace/.skills/data-analysis/    ← skill package (bind mount, rw)
+  [managed process: mcp-server-a]      ← MCP server running inside
+  [managed process: mcp-server-b]      ← MCP server running inside
+```
+
+This requires extending `BoxSpec` to support multiple mounts (see §5).
+
+### 4.2 Session ID Template
+
+A new field `box-session-id-template` in the `local-agent` pipeline runner config
+controls the session scope:
+
+```yaml
+# templates/metadata/pipeline/ai.yaml (under local-agent.config)
+- name: box-session-id-template
+  label:
+    en_US: Sandbox Scope
+    zh_Hans: 沙箱作用域
+  description:
+    en_US: >-
+      Determines how sandbox environments are shared. Use variables to
+      control isolation granularity.
+    zh_Hans: >-
+      决定沙箱环境的共享方式。使用变量控制隔离粒度。
+  type: select
+  required: false
+  default: "{launcher_type}_{launcher_id}"
+  options:
+    - value: "{launcher_type}_{launcher_id}"
+      label:
+        en_US: Per chat (Recommended)
+        zh_Hans: 每个会话（推荐）
+    - value: "{launcher_type}_{launcher_id}_{sender_id}"
+      label:
+        en_US: Per user in chat
+        zh_Hans: 会话中每个用户
+    - value: "{launcher_type}_{launcher_id}_{conversation_id}"
+      label:
+        en_US: Per conversation context
+        zh_Hans: 每个对话上下文
+    - value: "{query_id}"
+      label:
+        en_US: Per message (isolated)
+        zh_Hans: 每条消息（完全隔离）
+```
+
+Available template variables (populated by PreProcessor in `query.variables`):
+
+| Variable            | Source                          | Example              |
+|---------------------|---------------------------------|----------------------|
+| `{launcher_type}`   | `query.session.launcher_type`   | `person` / `group`   |
+| `{launcher_id}`     | `query.session.launcher_id`     | `123456`             |
+| `{sender_id}`       | `query.sender_id`               | `789`                |
+| `{conversation_id}` | `conversation.uuid`             | `a1b2c3d4-...`       |
+| `{query_id}`        | `query.query_id`                | `42`                 |
+
+Default `{launcher_type}_{launcher_id}` covers scenarios 1–4 out of the box.
+
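As a worked example of how a template and the variables table combine, the sketch below renders a few of the presets. It follows the `format_map` plus `defaultdict` approach shown in §6.1; the helper name `render_session_id` and the sample values are illustrative assumptions.

```python
import collections
from typing import Any, Mapping


def render_session_id(template: str, variables: Mapping[str, Any]) -> str:
    """Fill a session-id template; variables that are missing render as 'unknown'."""
    return template.format_map(collections.defaultdict(lambda: "unknown", variables))


# Sample values matching the examples in the variables table.
group_message = {
    "launcher_type": "group",
    "launcher_id": "123456",
    "sender_id": "789",
    "query_id": 42,
}

per_chat = render_session_id("{launcher_type}_{launcher_id}", group_message)
per_user = render_session_id("{launcher_type}_{launcher_id}_{sender_id}", group_message)
per_message = render_session_id("{query_id}", group_message)
```

One consequence of the `'unknown'` fallback is worth keeping in mind: if a variable such as `sender_id` is ever absent, every affected user in that chat maps to the same `..._unknown` session, so a stricter variant might raise instead of substituting.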
+---
+
+## 5. SDK Changes: Multi-Mount BoxSpec
+
+### 5.1 Model Extension
+
+```python
+# box/models.py
+
+class BoxMountSpec(pydantic.BaseModel):
+    """A single bind mount specification."""
+    host_path: str
+    mount_path: str
+    mode: BoxHostMountMode = BoxHostMountMode.READ_WRITE
+
+class BoxSpec(pydantic.BaseModel):
+    # ... existing fields ...
+    host_path: str | None = None              # Primary mount (backward compat)
+    host_path_mode: BoxHostMountMode = BoxHostMountMode.READ_WRITE
+    mount_path: str = DEFAULT_BOX_MOUNT_PATH
+    extra_mounts: list[BoxMountSpec] = []     # NEW: additional mounts
+```
+
+`extra_mounts` is additive — the existing `host_path` / `mount_path` pair remains
+the primary mount for backward compatibility.
+
+### 5.2 Backend: Apply Extra Mounts
+
+```python
+# box/backend.py — CLISandboxBackend.start_session()
+
+# Primary mount (unchanged)
+if spec.host_path is not None and spec.host_path_mode != BoxHostMountMode.NONE:
+    args.extend(['-v', f'{spec.host_path}:{spec.mount_path}:{spec.host_path_mode.value}'])
+
+# Extra mounts (NEW)
+for mount in spec.extra_mounts:
+    if mount.mode != BoxHostMountMode.NONE:
+        args.extend(['-v', f'{mount.host_path}:{mount.mount_path}:{mount.mode.value}'])
+```
+
+Same pattern for nsjail backend.
+
+---
+
+## 6. LangBot Changes
+
+### 6.1 Session ID Resolution
+
+In `BoxService.execute_tool()`:
+
+```python
+import collections
+
+# Before:
+spec_payload.setdefault('session_id', str(query.query_id))
+
+# After: render the pipeline's template; missing variables become 'unknown'
+local_agent_config = (query.pipeline_config or {}).get('ai', {}).get('local-agent', {})
+template = local_agent_config.get('box-session-id-template', '{launcher_type}_{launcher_id}')
+variables = query.variables or {}
+session_id = template.format_map(collections.defaultdict(lambda: 'unknown', variables))
+spec_payload.setdefault('session_id', session_id)
+
+### 6.2 Skill Exec: Use Same Container
+
+Currently `native.py:_invoke_exec` creates a separate `BoxWorkspaceSession` per
+skill with `host_path=package_root`. Instead:
+
+1. Use the **same session_id** as default exec (from the template).
+2. Pass the skill's `package_root` as an **extra mount** at
+   `/workspace/.skills/{skill_name}/` instead of replacing `/workspace`.
+3. The container already has the default workspace at `/workspace`.
+
+```python
+# native.py — _invoke_exec, skill branch (REVISED)
+
+# Same session_id as default exec
+session_id = resolve_box_session_id(query)
+
+spec_payload = {
+    'cmd': rewritten_command,
+    'workdir': rewritten_workdir,
+    'session_id': session_id,
+    'extra_mounts': [{
+        'host_path': package_root,
+        'mount_path': f'/workspace/.skills/{selected_skill_name}',
+        'mode': 'rw',
+    }],
+}
+result = await self.ap.box_service.execute_spec_payload(spec_payload, query)
+```
+
+The virtual path `/workspace/.skills/{name}` no longer needs rewriting at the
+command level — it maps directly to the bind mount path inside the container.
+
+### 6.3 MCP: Use Same Container
+
+MCP servers should run inside the same container as exec and skills. Changes:
+
+1. `BoxStdioSessionRuntime` uses the pipeline's session_id template instead of
+   `mcp-{server_uuid}`.
+2. MCP server's working directory is a subdirectory (e.g. `/workspace/.mcp/{name}/`).
+3. MCP server's dependencies are mounted or installed into that subdirectory.
+4. The MCP server runs as a managed process inside the shared container.
+
+Since MCP servers start at LangBot boot (not per-query), the session must be
+created eagerly. The container will be kept alive by the managed process
+exemption in TTL reaping (`runtime.py:259`).
+
+**Note**: MCP sessions are pipeline-scoped (not per-launcher), so their session_id
+should be a **fixed identifier per pipeline** rather than the user-facing template.
+This means one shared MCP container per pipeline, with user exec sessions separate.
+
+Alternatively, in a future iteration, MCP managed processes could be launched
+lazily into the user's container on first MCP tool call. This is more complex
+but maximizes sharing. For V1, keeping MCP containers at pipeline scope is
+simpler and more predictable.
+
+---
+
+## 7. Mount Layout Summary
+
+### Default exec (no skills activated)
+
+```
+Container (session_id from template)
+  /workspace/          ← default_host_workspace (rw)
+```
+
+### Exec with activated skills
+
+```
+Container (same session_id)
+  /workspace/                          ← default_host_workspace (rw)
+  /workspace/.skills/web-search/       ← skill package_root (rw)
+  /workspace/.skills/data-analysis/    ← skill package_root (rw)
+```
+
+Extra mounts are **additive** — they are added when the container is first
+created (or on the first exec that references a skill). Since Docker bind
+mounts are specified at container creation time, skills must be known at
+creation time.
+
+**Resolution**: When creating a container, inject `extra_mounts` for **all
+pipeline-bound skills** (from `extensions_preferences`), not just the
+currently activated one. This way any skill can be activated later without
+recreating the container.
+
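The resolution above, mounting every pipeline-bound skill at creation time, can be sketched as follows. The function names and the input shape (skill name mapped to its host `package_root`) are assumptions for illustration; the shipped `BoxService.build_skill_extra_mounts(query)` may be structured differently. The `-v` rendering mirrors the backend pattern in §5.2.

```python
from typing import Dict, List


def sketch_skill_extra_mounts(skill_roots: Dict[str, str], mode: str = "rw") -> List[Dict[str, str]]:
    """Build one extra mount per pipeline-bound skill (hypothetical helper)."""
    return [
        {
            "host_path": package_root,
            "mount_path": f"/workspace/.skills/{name}",
            "mode": mode,
        }
        # Sorting keeps the mount list stable across calls for the same pipeline.
        for name, package_root in sorted(skill_roots.items())
    ]


def to_volume_args(mounts: List[Dict[str, str]]) -> List[str]:
    """Render mounts as container CLI `-v host:container:mode` arguments."""
    args: List[str] = []
    for mount in mounts:
        args.extend(["-v", f"{mount['host_path']}:{mount['mount_path']}:{mount['mode']}"])
    return args
```

A stable ordering can matter if the runtime compares specs to decide whether an existing session is compatible, since an otherwise identical mount list in a different order could look like a conflict.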
+### MCP servers (V1: pipeline-scoped)
+
+```
+Container (session_id = "mcp-pipeline-{pipeline_uuid}")
+  /workspace/                    ← MCP shared workspace
+  /workspace/.mcp/server-a/      ← MCP server A files
+  /workspace/.mcp/server-b/      ← MCP server B files
+  [managed process: server-a]
+  [managed process: server-b]
+```
+
+---
+
+## 8. Data Migration
+
+Existing pipelines do not have `box-session-id-template`. The backend uses
+`.get(..., default)` so missing keys fall back to `{launcher_type}_{launcher_id}`.
+This changes behavior from per-message to per-launcher for existing pipelines.
+
+Recommendation: **accept the behavior change** — per-launcher is the more
+intuitive default, and the old per-message behavior was rarely desired.
+
+---
+
+## 9. Cloud Quota Implications
+
+| Scope                                         | Typical concurrent containers |
+|-----------------------------------------------|-------------------------------|
+| `{query_id}` (per message)                    | Many, short-lived             |
+| `{launcher_type}_{launcher_id}` (per chat)    | = active chat count           |
+| `{sender_id}` (per user)                      | = active user count           |
+| `{conversation_id}` (per conversation)        | Between per-chat and per-msg  |
+
+With the unified container model, each scope value maps to exactly **one**
+container (instead of potentially 3+ per-message). This significantly reduces
+resource usage.
+
+Quota enforcement point: `BoxRuntime._get_or_create_session()` in the SDK.
+
+---
+
+## 10. Implementation Phases
+
+### Phase 1: Session scope + skill unification (this PR)
+
+1. **SDK**: Extend `BoxSpec` with `extra_mounts: list[BoxMountSpec]`.
+2. **SDK**: Update Docker/nsjail backends to apply extra mounts.
+3. **LangBot**: Add `box-session-id-template` to `local-agent` YAML metadata
+   and default pipeline config JSON.
+4. **LangBot**: Update `BoxService.execute_tool()` to use template interpolation.
+5. **LangBot**: Update `native.py:_invoke_exec` skill branch to use same
+   session_id + extra mounts instead of separate `BoxWorkspaceSession`.
+6. **LangBot**: On container creation, inject extra mounts for all
+   pipeline-bound skills.
+7. **Frontend**: No code change — `DynamicFormComponent` renders `select` fields.
+8. **Tests**: Unit tests for template interpolation and multi-mount specs.
+
+### Phase 2: MCP unification (future)
+
+1. Refactor `BoxStdioSessionRuntime` to use pipeline-scoped shared container.
+2. MCP servers become managed processes in the shared container.
+3. Support multiple concurrent managed processes per container.
+
+MCP unification is deferred because it requires changes to the managed process
+model (currently 1 managed process per session) and has startup ordering
+concerns (MCP servers start at boot, before any user query determines
+a session_id).
@@ -0,0 +1,122 @@
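+# Box System Test Coverage Analysis
+
+> Last updated: 2026-06-02
+> Status update: The self-hosted community edition is ready for release (Box is optional, degradation is fully handled, no outstanding migration debt); the tool-call loop limit, asynchronous quota traversal, and the `host_path` mount allowlist have all landed. Remaining multi-tenancy and security-hardening items are tracked in the [SaaS blocker list](./box-issues.md).
+> Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)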
+
+---
+
+## 1. Test File Inventory
+
+### LangBot repository
+
+| File | Lines | Runs in CI | Coverage |
+|------|-------|------------|----------|
+| `tests/unit_tests/box/test_box_connector.py` | 106 | Yes | Connector transport decision, WS relay URL, dispose, heartbeat/reconnect |
+| `tests/unit_tests/box/test_box_service.py` | 1224 | Yes | Core service logic (most comprehensive) |
+| `tests/unit_tests/box/test_workspace.py` | 147 | Yes | WorkspaceSession path rewriting, payload construction |
+| `tests/unit_tests/provider/test_mcp_box_integration.py` | 707 | Yes | MCP Box config, path rewriting, payload, shared-session/multi-process, runtime info |
+| `tests/unit_tests/provider/test_localagent_sandbox_exec.py` | 444 | Yes | LocalAgent exec flow, streaming, skill activation (Tool Call) |
+| `tests/unit_tests/provider/test_tool_manager_native.py` | 249 | Yes | ToolManager routing, native tool CRUD, path traversal, exposure of the 6 tools |
+| `tests/unit_tests/provider/test_skill_tools.py` | 582 | Yes | Skill management, Tool Call activation, paths, authoring CRUD |
+| `tests/unit_tests/test_skill_service.py` | 396 | Yes | HTTP service: skill CRUD, zip/GitHub install, file browsing |
+| `tests/unit_tests/test_paths.py` | 23 | Yes | paths utilities |
+| `tests/unit_tests/test_preproc.py` | 134 | Yes | PreProcessor injection of session variables, bound-skill resolution |
+| `tests/unit_tests/pipeline/test_chat_handler_logging.py` | 78 | Yes | Chat handler logging regressions |
+| `tests/integration_tests/box/test_box_integration.py` | 329 | **No** | Real container execution, timeouts, network isolation |
+| `tests/integration_tests/box/test_box_mcp_integration.py` | 368 | **No** | Managed process, WS attach, shared-session cleanup |
+
+### SDK repository
+
+| File | Lines | Runs in CI | Coverage |
+|------|-------|------------|----------|
+| `tests/box/test_backend_selection.py` | 255 | Yes | Explicit backend / probe order in local mode / reselect on config change |
+| `tests/box/test_nsjail_backend.py` | 452 | Yes | nsjail availability, installed CLI vs in-container CLI, sessions, arg construction, resource limits |
+| `tests/box/test_e2b_backend.py` | 482 | Yes | E2B SDK mock, session lifecycle, extra_mounts sync |
+| `tests/box/test_skill_store.py` | 88 | Yes | zip preview/install, basic file CRUD |
+
+**Total**: 17 test files, ~6,100 lines of test code; 2 of them are integration tests (about 700 lines) that do not run in CI.
+
+> Added since the 2026-04-16 revision: `test_skill_service.py`, `test_paths.py`, `test_preproc.py`, `test_chat_handler_logging.py` (LangBot); `test_backend_selection.py`, `test_e2b_backend.py`, `test_skill_store.py` (SDK). `test_nsjail_backend.py` gained CLI compatibility cases (commit `feed530`).
+
+---
+
+## 2. Well-Covered Areas
+
+| Area | Quality | Notes |
+|------|---------|-------|
+| BoxRuntime session management | Excellent | Session reuse, conflict detection, TTL configuration, re-creation of vanished sessions |
+| BoxService Profile system | Excellent | 4 built-in profiles, locked/unlocked fields, timeout clamping |
+| BoxService host mount security | Excellent | allowed_mount_roots, disallowed_roots, shared host root |
+| BoxService workspace quota | Excellent | Pre- and post-execution quota checks, over-quota cleanup |
+| BoxService output truncation | Excellent | Short / exact-boundary / long output, separate stderr |
+| BoxService observability | Excellent | Status reporting, error ring buffer, buffer cap |
+| BoxService session template | Good | `resolve_box_session_id` + `build_skill_extra_mounts` are covered in all three of service / native / mcp |
+| RPC client/server protocol | Excellent | execute/get_sessions/delete/create/conflict error |
+| BoxRuntimeConnector | Good | local/remote modes, Docker platform, relay URL, heartbeat and reconnect callbacks |
+| BoxWorkspaceSession | Good | Payload construction, managed-process path rewriting, stage host file |
+| BoxHostMountMode.NONE | Good | Enum validation, workdir constraints |
+| NsjailBackend | Good | Availability, installed vs in-container CLI, session lifecycle, arg construction, resource limits |
+| E2BBackend | Good | Mocked SDK, session/extra_mounts sync |
+| Backend selection | Good | Explicit backend precedence, local probe order, reselect on config change |
+| MCP Box integration | Good | Config model, path rewriting, payload, shared-session multi-process |
+| Native tool loader | Good | 6 tools (exec/read/write/edit/glob/grep), path-traversal blocking |
+| LocalAgent exec flow | Good | Full tool-call loop, streaming, system prompt injection, Tool Call activation |
+| Skill system | Good | Loading, Tool Call activation, markers, path resolution, authoring CRUD, HTTP service |
+
+---
+
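+## 3. Coverage Gaps
+
+### 3.1 No tests / severely under-tested
+
+| Area | Source file | Impact |
+|------|-------------|--------|
+| **`security.py`** | SDK `box/security.py` (52 lines) | `validate_sandbox_security()` has no tests at all. The security function that blocks dangerous mounts such as `/etc`, `/proc`, and the Docker socket has never been verified |
+| **`policy.py`** | `pkg/box/policy.py` (98 lines) | The three-layer security policy is untested (and is also dead code) |
+| **`skill_store.py` edge cases** | SDK `box/skill_store.py` (647 lines) vs 88 lines of tests | The GitHub install path, `source_subdir` / `target_suffix` combinations, corrupted zips, file conflicts, and similar scenarios are not covered |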
+
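+### 3.2 Untested critical paths
+
+| Area | Notes |
+|------|-------|
+| **Session TTL expiry** | Tests configure `session_ttl_sec` but never advance time to verify expiry cleanup |
+| **Concurrent session access** | No tests for concurrent exec, concurrent creation, or race conditions |
+| **Container backend (Docker)** | Covered only by integration tests (not run in CI); unit tests all use FakeBackend |
+| **Real E2B sandbox** | Unit tests are entirely mocked and never talk to the real E2B API |
+| **BoxRuntime shutdown()** | Called in test cleanup, but its behavior is never asserted |
+| **BoxServerHandler error paths** | Malformed requests, unknown action types |
+| **WS relay** | Covered only by integration tests (not run in CI) |
+| **NsjailBackend managed process** | Completely untested |
+| **Full MCP stdio lifecycle** | Dependency install → process start → health check → concurrent multi-process → retry |
+| **BoxService start/stop_managed_process** | The single-process flow has unit tests; non-blocking behavior across multiple processes relies mainly on integration tests |
+| **Reconnect exponential backoff** | Connector unit tests cover callback wiring but never run a full reconnect cycle |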
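
One way to close the session TTL gap is to inject a clock so a unit test can advance time without sleeping. The sketch below shows the pattern only: `FakeClock` and `SessionTable` are hypothetical stand-ins, and the real runtime's bookkeeping and expiry rule may differ.

```python
class FakeClock:
    """Controllable time source for tests (hypothetical helper)."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class SessionTable:
    """Hypothetical stand-in for the runtime's session bookkeeping."""

    def __init__(self, ttl_sec: float, clock) -> None:
        self._ttl = ttl_sec
        self._clock = clock
        self._last_used: dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        # Record the last time the session was used.
        self._last_used[session_id] = self._clock()

    def reap_expired(self) -> list[str]:
        # Remove and return every session idle for at least ttl_sec.
        now = self._clock()
        expired = [sid for sid, ts in self._last_used.items() if now - ts >= self._ttl]
        for sid in expired:
            del self._last_used[sid]
        return expired

    def __contains__(self, session_id: str) -> bool:
        return session_id in self._last_used


def test_session_expires_after_ttl() -> None:
    clock = FakeClock()
    table = SessionTable(ttl_sec=300, clock=clock)
    table.touch("s1")
    clock.advance(299)
    assert table.reap_expired() == []      # one second short of the TTL
    clock.advance(1)
    assert table.reap_expired() == ["s1"]  # exactly at the TTL
    assert "s1" not in table
```

The same injected-clock idea applies to the reconnect backoff row: pass the sleep function in so a test can record delays instead of waiting.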
+
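+### 3.3 Missing edge cases
+
+| Area | Notes |
+|------|-------|
+| BoxSpec validation | Invalid session_id formats, overlong commands, special characters in env |
+| BoxSpec.extra_mounts | Duplicate mount_path, conflicts with host_path, absolute vs relative paths |
+| BoxExecutionResult | Only COMPLETED and TIMED_OUT are tested; no test for the ERROR status |
+| Multi-backend fallback | The local-mode probe order is exercised only through mocks; no real-machine test of Docker unavailable → nsjail fallback |
+| Profile YAML loading | Tests use hard-coded strings rather than loading from a real config.yaml |
+| Backend rebuild triggered by an INIT config change | Unit tests verify this only in the initialization scenario |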
+
+---
+
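+## 4. Integration Tests vs CI Gap
+
+CI runs only `tests/unit_tests/`, so the following scenarios are **never verified automatically**:
+
+- Creation / execution / teardown of real containers
+- Container network isolation (`--network none`)
+- Container resource limits actually taking effect (cpus/memory/pids_limit)
+- Bidirectional WS I/O for managed processes
+- Concurrent I/O from multiple processes in the same session
+- Orphaned container cleanup
+- Container cleanup on session deletion
+- Process exit detection
+- Real E2B sandbox behavior
+
+**Recommendation**: Add an optional Docker-in-Docker integration test stage to CI that covers at least the core execution paths (exec / MCP attach / session teardown).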
@@ -0,0 +1,167 @@
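+# Box System B2B Commercialization Analysis
+
+> Last updated: 2026-06-02
+> Status update: The self-hosted community edition is ready for release (Box is optional, degradation is fully handled, no outstanding migration debt); the tool-call loop limit, asynchronous quota traversal, and the `host_path` mount allowlist have all landed. Remaining multi-tenancy and security-hardening items are tracked in the [SaaS blocker list](./box-issues.md).
+> Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)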
+
+---
+
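+## 1. Existing Strengths
+
+| Capability | B2B value | Code location |
+|------------|-----------|---------------|
+| **Sandboxed execution** | Foundational capability for enterprises to run untrusted code safely | SDK `box/backend.py` |
+| **Multiple backends** | Fits different enterprise container infrastructure (Podman/Docker/nsjail/E2B) | SDK `box/runtime.py` `_select_backend()` |
+| **E2B cloud sandbox** | Fallback execution environment for SaaS / Docker-less deployments | SDK `box/e2b_backend.py` |
+| **Self-healing connection** | Heartbeat + automatic reconnect; recovers from a failure of the single Box runtime | `pkg/box/connector.py` `_heartbeat_loop`, `pkg/box/service.py` `_reconnect_loop` |
+| **Profile + locked fields** | Operators lock down the security boundary; neither the LLM nor users can bypass it | `pkg/box/service.py`, SDK `box/models.py` |
+| **Resource limits** | CPU / memory / PID-count limits prevent resource abuse | SDK `backend.py` `--cpus/--memory/--pids-limit` |
+| **Workspace quota** | Disk usage control | `pkg/box/service.py` `_enforce_workspace_quota` |
+| **Silent degradation** | An unavailable Box does not affect other features, lowering the deployment barrier | `pkg/box/service.py:78` `_available=False` |
+| **Orphaned container cleanup** | Prevents leaked containers from holding resources indefinitely | SDK `backend.py` `cleanup_orphaned_containers` |
+| **Network isolation** | `--network none` prevents data exfiltration | SDK `backend.py` start_session |
+| **Read-only root filesystem** | `--read-only` prevents persistent tampering with the container | SDK `backend.py` start_session |
+| **Host path allowlist** | `allowed_host_mount_roots` restricts which directories can be mounted | `pkg/box/service.py` `_validate_host_mount` |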
+
+---
+
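+## 2. B2B Gap Analysis
+
+### 2.1 Security and compliance
+
+| Dimension | Current state | B2B requirement | Priority |
+|-----------|---------------|-----------------|----------|
+| **WS relay authentication** | None; anyone can attach | Token authentication at minimum | **P0** |
+| **Security policy** | policy.py is dead code; no fine-grained control in practice | Per-tool allow/deny, sandbox mode control | **P0** |
+| **Audit log** | Only 50 in-memory `_recent_errors` entries | Persistent audit trail: who ran what, when, and with what result | **P0** |
+| **Host path validation** | Denylist policy; `/` is not blocked | Allowlist policy, deny by default | **P1** |
+| **Data residency** | No controls | Data isolation as required by GDPR / MLPS (等保) | **P2** |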
+
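+### 2.2 Multi-tenancy
+
+| Dimension | Current state | B2B requirement | Priority |
+|-----------|---------------|-----------------|----------|
+| **Tenant isolation** | No tenant concept | BoxSpec/Profile bound to a tenant_id | **P0** |
+| **RBAC** | Token authentication only | admin/operator/viewer role permissions | **P0** |
+| **Resource quotas** | A single workspace quota | Per-tenant quotas for CPU time / memory / concurrency / execution count | **P1** |
+| **Session isolation** | All sessions share one dict | Partitioned by tenant, mutually invisible | **P1** |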
+
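+### 2.3 Reliability
+
+| Dimension | Current state | B2B requirement | Priority |
+|-----------|---------------|-----------------|----------|
+| **Connection recovery** | Implemented: 20s heartbeat + `_reconnect_loop` with exponential backoff | Basic requirement met | Done |
+| **Session cleanup** | Opportunistic (triggered only when a new session is created) | Scheduled cleanup + a dedicated reaper | **P1** |
+| **Horizontal scaling** | Single Box Runtime instance | Multiple load-balanced instances (routed by tenant) | **P1** |
+| **Graceful degradation** | In place (`_available=False`) | Basic requirement met | Done |
+| **Backend self-healing** | Implemented: `get_status` reselects a backend if the current one is unavailable | Basic requirement met | Done |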
+
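+### 2.4 Observability
+
+| Dimension | Current state | B2B requirement | Priority |
+|-----------|---------------|-----------------|----------|
+| **Metrics** | No Prometheus metrics | Session count / execution latency / resource usage / error rate | **P1** |
+| **Structured logging** | Python logging, unstructured | JSON-formatted logs including trace_id/tenant_id | **P1** |
+| **Frontend dashboard** | The monitoring page is wired to `/api/v1/box/status` (backend name + active session count); `sessions` / `errors` are not yet wired in | Full status dashboard + historical error/audit lists | **P2** |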
+
+---
+
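+## 3. SaaS Deployment Architecture Recommendations
+
+### 3.1 Option A: Shared Box Runtime pool (fastest to ship)
+
+```
+LangBot Instance ──> Box Runtime (shared)
+                       ├─ isolation via tenant_id labels
+                       ├─ Redis quota counters
+                       └─ Container labels: langbot.tenant_id=xxx
+```
+
+- **Pros**: Smallest change; just add tenant_id to BoxSpec/labels
+- **Cons**: The container engine is shared, so security isolation is weak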
+
+### 3.2 Option B: Per-tenant K8s namespace + gVisor (recommended mid-term)
+
+```
+LangBot ──> K8s API
+              ├─ namespace: tenant-xxx
+              │    ├─ RuntimeClass: gVisor (runsc)
+              │    ├─ ResourceQuota
+              │    └─ NetworkPolicy
+              └─ namespace: tenant-yyy
+                   └─ ...
+```
+
+- **Pros**: Strong isolation (namespace + gVisor), native K8s quotas
+- **Cons**: Requires rewriting the backend as K8s Jobs; high deployment complexity
+
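+### 3.3 Option C: Direct K8s Job orchestration (long-term)
+
+```
+LangBot ──> K8s Job per execution
+              ├─ one Job created per execution
+              ├─ Pod Security Standards
+              ├─ automatic scheduling and resource allocation
+              └─ automatic cleanup by the Job TTL controller
+```
+
+- **Pros**: Strongest isolation, horizontally scalable by design
+- **Cons**: Cold-start latency, architectural rewrite
+
+**Recommended evolution path**: A → B → C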
+
+---
+
+## 4. Quota System Recommendations
+
+### Three quota layers
+
+| Layer | Implementation | Purpose |
+|-------|----------------|---------|
+| **Kernel layer** | Docker `--cpus`/`--memory`/`--storage-opt` | Hard resource ceilings that cannot be bypassed |
+| **Application layer** | Redis atomic counters | Concurrent session count / execution count / CPU time budget |
+| **Billing layer** | Monthly aggregation | Per-tenant billing (session-hours/execution-count) |
+
+### Mapping profiles to plans
+
+| Plan | Profile | Locked fields | Quota |
+|------|---------|---------------|-------|
+| Free | `offline_readonly` | network, host_path_mode, rootfs | 10 exec/day, 0.5 CPU, 256MB |
+| Pro | `default` | (none) | 100 exec/day, 1 CPU, 512MB |
+| Enterprise | `network_extended` | (as needed) | Unlimited, 2 CPU, 1GB, custom images |
+
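+### Fixing the TOCTOU quota issue
+
+The TOCTOU (time-of-check to time-of-use) issue in the current `_enforce_workspace_quota` can be addressed in two ways:
+
+1. **Reservation-based quota** (application layer): Redis `INCRBY` reserves the amount up front → execute → commit the deduction on success, roll it back on failure
+2. **Kernel-level limit** (Docker): `--storage-opt size=500m` directly caps the size of the container's writable layer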
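
The reservation approach can be sketched in-process as below. This is an illustration under stated assumptions, not the proposed implementation: a `threading.Lock` stands in for the atomicity that Redis `INCRBY` would provide, and `ReservationQuota`, `QuotaExceeded`, and `run_with_reservation` are hypothetical names.

```python
import threading


class QuotaExceeded(Exception):
    """Raised when a reservation would push usage past the limit."""


class ReservationQuota:
    """In-memory stand-in for an atomic counter such as Redis INCRBY."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._used = 0
        self._lock = threading.Lock()

    @property
    def used(self) -> int:
        return self._used

    def reserve(self, amount: int) -> None:
        # Check and increment under one lock, so there is no window between
        # "check" and "use" for another caller to slip through.
        with self._lock:
            if self._used + amount > self._limit:
                raise QuotaExceeded(f"{self._used} + {amount} > {self._limit}")
            self._used += amount

    def release(self, amount: int) -> None:
        with self._lock:
            self._used -= amount


def run_with_reservation(quota: ReservationQuota, estimate: int, action) -> int:
    """Reserve `estimate`, run `action`, then settle to the actual usage."""
    quota.reserve(estimate)
    try:
        actual = action()  # action returns the amount it really consumed
    except Exception:
        quota.release(estimate)  # failure: roll the whole reservation back
        raise
    quota.release(estimate - actual)  # success: keep only what was used
    return actual
```

The sketch does not handle an action that consumes more than its estimate; that case is why a kernel-level cap remains useful as a backstop.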
+
+---
+
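+## 5. Prioritized Roadmap
+
+### Phase 1 (2-4 weeks): Security baseline
+
+- [ ] Add token authentication to the WS relay
+- [ ] Wire in or delete policy.py
+- [x] ~~Add reconnect and heartbeat to Box~~ (done; see [box-issues.md, Resolved section](./box-issues.md))
+- [ ] Persist the audit log (at minimum to a file or database)
+- [ ] Block `/` in `security.py`; consider an allowlist
+- [ ] Sort out the ordering of INIT and backend initialization (avoid instantiating the backend before configuration arrives)
+
+### Phase 2 (4-8 weeks): Multi-tenancy foundations
+
+- [ ] Add a `tenant_id` field to BoxSpec
+- [ ] Add a tenant identifier to container labels
+- [ ] Redis quota counters (concurrency / execution count / time)
+- [ ] Basic RBAC framework
+- [ ] Scheduled session reaper
+
+### Phase 3 (8-16 weeks): Production readiness
+
+- [ ] Prometheus metrics exporter
+- [ ] Frontend Box status dashboard
+- [ ] K8s backend support (Option B)
+- [ ] Structured logging (JSON, trace_id)
+- [ ] Horizontal scaling support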
@@ -0,0 +1,222 @@
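+# Box Runtime vs Plugin Runtime: Connection Architecture Comparison
+
+> Last updated: 2026-06-02
+> Status update: The self-hosted community edition is ready for release (Box is optional, degradation is fully handled, no outstanding migration debt); the tool-call loop limit, asynchronous quota traversal, and the `host_path` mount allowlist have all landed. Remaining multi-tenancy and security-hardening items are tracked in the [SaaS blocker list](./box-issues.md).
+> Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)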
+
+---
+
+## 1. Overall Differences
+
+| Dimension | Plugin Runtime | Box Runtime |
+|-----------|----------------|-------------|
+| **Inheritance** | `PluginRuntimeConnector(ManagedRuntimeConnector)` | `BoxRuntimeConnector` (standalone class) |
+| **Transport branches** | 3 (Docker/WS, Win32/subprocess+WS, Unix/stdio) | 3 (local stdio, Win32/subprocess+WS, remote WS) |
+| **Heartbeat** | 20s ping loop | 20s ping loop (`_heartbeat_loop`) |
+| **Reconnect** | WS mode: sleep 3s → re-initialize | Handled by BoxService `_reconnect_loop`, exponential backoff |
+| **Handler type** | `RuntimeConnectionHandler` (1132 lines, 25+ actions) | Base `Handler` + `BoxServerHandler` (25 actions on the SDK side) |
+| **Client abstraction** | The Handler is the API | A separate `ActionRPCBoxClient` wraps the Handler |
+| **Enable/disable** | `is_enable_plugin` switch | No switch (availability is determined by the initialization result) |
+| **Initialization failure** | Exception propagates | Silent degradation to `_available=False` |
+| **Shutdown** | Kills the process directly | RPC SHUTDOWN → clean up containers → then kill the process |
+
+---
+
+## 2. Transport Decision
+
+### Plugin: 3-way decision
+
+```python
+# pkg/plugin/connector.py:106-165 (branch bodies elided)
+if get_platform() == 'docker' or use_websocket_to_connect_plugin_runtime():
+    ...  # Docker/WS → ws://langbot_plugin_runtime:5400/control/ws
+elif get_platform() == 'win32':
+    ...  # Windows → spawn a subprocess (no pipe) + ws://localhost:5400/control/ws
+else:
+    ...  # Unix/Mac → StdioClientController(python -m langbot_plugin.cli rt -s)
+```
+
+### Box: 3-way decision
+
+```python
+# pkg/box/connector.py
+if self._uses_websocket():
+    if platform.get_platform() == 'win32' and not self.configured_runtime_url:
+        await self._start_subprocess_then_ws()  # subprocess + ws://localhost:5410/rpc/ws
+    else:
+        await self._connect_remote_ws()         # ws://{host}:5410/rpc/ws
+else:
+    await self._start_local_stdio()             # StdioClientController
+```
+
+> History: the 2026-04-16 revision of this document described Box as a 2-way decision (missing the Windows branch). It now matches Plugin's 3-way design.
+
+### Decision matrix
+
+| Environment | Plugin | Box |
+|-------------|--------|-----|
+| Docker | WS → `:5400` | WS → `:5410/rpc/ws` |
+| `--standalone-box` | N/A | WS → `localhost:5410/rpc/ws` |
+| Windows, non-Docker | subprocess + WS (`:5400`) | subprocess + WS (`localhost:5410/rpc/ws`) |
+| Unix/Mac, non-Docker | stdio | stdio |
+| Manually configured URL | Via a config option | WS → user-configured URL |
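
For unit testing, the Box column of the matrix can be expressed as a pure function. This is a hypothetical sketch: `Transport` and `choose_box_transport` are illustrative names, and because the logic of `_uses_websocket()` is not shown in this document, its result is taken as an input rather than re-derived.

```python
from enum import Enum
from typing import Optional


class Transport(Enum):
    LOCAL_STDIO = "local_stdio"
    SUBPROCESS_WS = "subprocess_ws"
    REMOTE_WS = "remote_ws"


def choose_box_transport(
    platform_name: str,
    uses_websocket: bool,
    configured_runtime_url: Optional[str],
) -> Transport:
    """Mirror the three branches of the Box connector snippet above."""
    if uses_websocket:
        # Windows without an explicit URL spawns a local runtime first,
        # then connects to it over WebSocket.
        if platform_name == "win32" and not configured_runtime_url:
            return Transport.SUBPROCESS_WS
        return Transport.REMOTE_WS
    return Transport.LOCAL_STDIO
```

Keeping the decision free of I/O means each matrix row becomes a one-line assertion.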
+
+---
+
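+## 3. Connection Establishment
+
+### Differences in synchronization model
+
+**Plugin**: `new_connection_callback` pings directly and then awaits handler_task; `initialize()` starts it asynchronously via `create_task()` and does not block waiting for the connection.
+
+**Box**: Uses an `asyncio.Event` + `wait_for(timeout=30s)` pattern; `initialize()` waits until the connection succeeds or times out.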
+
+### Box stdio path
+
+```
+connector._start_local_stdio()
+  ├─ connected = asyncio.Event()
+  ├─ ctrl = StdioClientController(python, ['-m', 'langbot_plugin.cli.__init__', 'box', '-s', '--ws-control-port', N])
+  ├─ _ctrl_task = create_task(ctrl.run(callback))
+  │    callback:
+  │      handler = Handler(connection)          ← base Handler, no disconnect_callback
+  │      client.set_handler(handler)
+  │      _handler_task = create_task(handler.run())
+  │      call_action(PING, {})                  ← handshake, timeout=15s
+  │      connected.set()                        ← notify the outer scope
+  │      await _handler_task                    ← blocks until disconnect
+  └─ await wait_for(connected.wait(), 30s)      ← wait synchronously
+```
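
The Event-based handshake in the flow above reduces to a small asyncio pattern. The sketch below is illustrative only: `connect_with_handshake`, `fake_connection`, and `never_connects` are hypothetical names, and the connection body simulates the PING handshake with a short sleep.

```python
import asyncio


async def connect_with_handshake(run_connection, timeout: float = 30.0) -> asyncio.Task:
    """Start a long-lived connection task and wait until it reports ready.

    run_connection(connected) must call connected.set() once its handshake
    succeeds and then keep running until the connection closes.
    """
    connected = asyncio.Event()
    task = asyncio.create_task(run_connection(connected))
    try:
        await asyncio.wait_for(connected.wait(), timeout)
    except asyncio.TimeoutError:
        task.cancel()  # do not leave a half-open connection task behind
        raise
    return task


async def fake_connection(connected: asyncio.Event) -> None:
    await asyncio.sleep(0.01)  # stands in for the PING handshake
    connected.set()            # notify the caller waiting in wait_for
    await asyncio.sleep(3600)  # stay alive until cancelled


async def never_connects(connected: asyncio.Event) -> None:
    await asyncio.sleep(3600)  # handshake never completes
```

One limitation of this shape is that a connection task which fails before setting the event is only noticed when the timeout elapses.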
+
+### Plugin stdio path
+
+```
+connector.initialize()
+  ├─ ctrl = StdioClientController(python, ['-m', 'langbot_plugin.cli', 'rt', '-s'])
+  ├─ task = ctrl.run(callback)
+  │    callback:
+  │      disconnect_callback:
+  │        [WS] → runtime_disconnect_callback → reconnect
+  │        [stdio] → log only, no reconnect
+  │      handler = RuntimeConnectionHandler(conn, disconnect_cb, ap)
+  │      create_task(handler.run())
+  │      handler.ping()                         ← handshake, timeout=10s
+  │      await handler_task                     ← blocks until disconnect
+  ├─ create_task(heartbeat_loop())              ← 20s ping loop
+  └─ create_task(task)                          ← does not wait for the connection
+```
+
+---
+
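+## 4. Heartbeat and Reconnect
+
+### Heartbeat
+
+| Dimension | Plugin | Box |
+|-----------|--------|-----|
+| Has heartbeat? | Yes | Yes (`connector.py` `_heartbeat_loop`) |
+| Interval | 20s | 20s |
+| On failure | DEBUG log only; does not trigger reconnect | DEBUG log only; relies on connection close to trigger reconnect |
+| Lifetime | Entire application lifetime | Started once the connection is established; cancelled in `dispose()` |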
+
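+### Reconnect
+
+| Dimension | Plugin | Box |
+|-----------|--------|-----|
+| Docker/WS disconnect | `runtime_disconnect_callback` → sleep 3s → re-initialize | `runtime_disconnect_callback` → `BoxService._reconnect_loop()` (exponential backoff) |
+| WS connection failure | Same as above | Same as above; `_available=False` on initial failure, restored after a successful reconnect |
+| stdio disconnect | Log only, no reconnect | Wired to the same callback; an stdio reconnect has to fork the subprocess again |
+| Reconnect backoff | Fixed 3s, no backoff | Exponential backoff |
+
+> History: the 2026-04-16 revision of this document marked heartbeat and reconnect as missing from Box. Both were fixed in commits such as `2dfd9d5d` / `c6882cf` / `5029d9c` (see [box-issues.md, Resolved section](./box-issues.md)).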
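
An exponential-backoff reconnect loop generally has the shape sketched below. The base delay, factor, cap, and attempt limit are assumptions chosen for illustration, since this document does not state the values used by `_reconnect_loop`; `backoff_delays` and `reconnect_loop` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... clamped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


async def reconnect_loop(
    connect: Callable[[], Awaitable[None]],
    max_attempts: int,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    base: float = 1.0,
    cap: float = 60.0,
) -> int:
    """Retry connect() with growing delays; return the attempt that succeeded."""
    delays = backoff_delays(base=base, cap=cap)
    for attempt in range(1, max_attempts + 1):
        try:
            await connect()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the last error
            await sleep(next(delays))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter lets a unit test record the delays and run a full reconnect cycle without real waiting.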
+
+---
+
+## 5. Shared IO Layer
+
+Both reuse the same SDK IO infrastructure:
+
+```
+Handler ← ABC                              (runtime/io/handler.py)
+  ├── RuntimeConnectionHandler              (used by Plugin, LangBot side)
+  ├── ControlConnectionHandler              (used by Plugin, SDK side)
+  ├── BoxServerHandler                      (used by Box, SDK side)
+  └── anonymous Handler instance            (used by Box, LangBot side)
+
+Connection ← ABC
+  ├── StdioConnection    (stdio: 16KB chunks, application-level framing protocol)
+  └── WebSocketConnection (WS: 64KB chunks, native WS framing)
+
+Controller ← ABC
+  ├── StdioClientController    (forks a subprocess, pipes stdin/stdout)
+  ├── StdioServerController    (takes over the current process's stdin/stdout)
+  ├── WebSocketClientController (connects to a WS server)
+  └── WebSocketServerController (listens on a WS port)
+```
+
+Shared core mechanisms:
+- `call_action()` / `call_action_generator()`: RPC call / streaming call
+- `ActionRequest` / `ActionResponse`: request/response protocol
+- `seq_id` correlation: concurrent requests multiplexed over a single connection
+- `CommonAction.PING`: used by both for the initial handshake
+- File transfer (`send_file`): used by Plugin, not by Box
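
The `seq_id` correlation listed above can be illustrated with a small multiplexer that maps each outgoing request to a pending future. This is a sketch of the general technique, not the SDK's `Handler`: the class name `RpcMultiplexer` and the message fields other than `seq_id` are assumptions.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Dict


class RpcMultiplexer:
    """Correlate responses with requests over one connection via seq_id."""

    def __init__(self, send: Callable[[dict], Awaitable[None]]) -> None:
        self._send = send
        self._seq = itertools.count(1)
        self._pending: Dict[int, asyncio.Future] = {}

    async def call(self, action: str, data: Any, timeout: float = 15.0) -> Any:
        seq_id = next(self._seq)
        future = asyncio.get_running_loop().create_future()
        # Register before sending so a fast response cannot be missed.
        self._pending[seq_id] = future
        try:
            await self._send({"seq_id": seq_id, "action": action, "data": data})
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(seq_id, None)

    def on_response(self, message: dict) -> None:
        # Called by the receive loop; responses may arrive in any order.
        future = self._pending.get(message["seq_id"])
        if future is not None and not future.done():
            future.set_result(message["data"])
```

Because each caller waits only on its own future, a slow action does not block responses to other in-flight requests on the same connection.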
+
+---
+
+## 6. Port Scheme
+
+| Service | Plugin | Box |
+|---------|--------|-----|
+| Action RPC (stdio) | stdin/stdout | stdin/stdout |
+| Action RPC (WS) | `:5400` | `:5410/rpc/ws` |
+| Auxiliary service | debug WS `:5401` | managed process WS relay `:5410/v1/sessions/{id}/managed-process/ws` |
+
+**Box specifics**: A single-port aiohttp service (default 5410) that distinguishes Action RPC from the managed process relay by path. Even in stdio mode, aiohttp is started on `:5410` for managed process attach. Plugin opens no extra port in stdio mode.
+
+---
+
+## 7. Teardown Comparison
+
+### Plugin
+
+```python
+dispose():
+  if stdio: ctrl.process.terminate()
+  _dispose_subprocess()         # Windows subprocess
+  heartbeat_task.cancel()
+```
+
+### Box
+
+```python
+connector.dispose():
+  _handler_task.cancel()
+  _ctrl_task.cancel()
+  _subprocess.terminate()
+
+service.dispose():
+  connector.dispose()
+  loop.create_task(client.shutdown())   # RPC SHUTDOWN → clean up all containers
+```
+
+Box's RPC SHUTDOWN ensures containers are stopped properly and do not become orphans. Plugin kills the process directly.
+
+---
+
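+## 8. Improvement Suggestions
+
+### P0
+
+1. **Add WS authentication to both**: token authentication at minimum (issued at INIT, verified on connect)
+
+### P1
+
+2. **Consider having Box inherit from ManagedRuntimeConnector**: reuse `_start_runtime_subprocess` / `_wait_until_ready` / `_dispose_subprocess` to reduce duplicated code
+3. **Add backoff to Plugin reconnect**: a fixed 3s with no backoff can flood the logs; align with Box's exponential backoff
+4. **Unify the connection management model**: Event-based (Box) vs direct-await (Plugin); consider converging on one
+
+### Completed (since the previous round)
+
+- ~~Add reconnect to Box~~ (commit `2dfd9d5d`)
+- ~~Add heartbeat to Box~~ (20s loop, consistent with Plugin)
+- ~~Add Windows support to Box~~ (commit `120817a` / `fafb7a4`)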