LangBot

mirror of https://github.com/langbot-app/LangBot.git synced 2026-07-19 02:46:09 +00:00

Author	SHA1	Message	Date
huanghuoguoguo	96b041846d	Feat/sandbox (#2072 ) * feat: add mcp and skills * feat: add filter * feat: modify frontend * feat(box): add sandbox_exec tool loop for local-agent calculations * feat(box): add host workspace mounting and sandbox_exec guidance * feat(box): add BoxProfile with resource limits and improved output truncation - Implement head+tail output truncation (60/40 split) so LLM sees both beginning and final results; add streaming byte-limited reads in backend to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB) - Define BoxProfile model with locked fields and max_timeout_sec clamping - Add four built-in profiles: default, offline_readonly, network_basic, network_extended with differentiated resource and security constraints - Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit, read_only_rootfs) and pass corresponding container CLI flags (--cpus, --memory, --pids-limit, --read-only, --tmpfs) - Profile loaded from config (box.profile), applied in service layer before BoxSpec validation; locked fields cannot be overridden by tool-call parameters * feat(box): add obs * refactor(box): unify box service lifecycle and local runtime management * refactor(box): remove legacy in-process runtime code and clean up smells After the architecture settled on always using an independent Box Runtime service, several pieces of compatibility code and design shortcuts were left behind. This commit cleans them up: - Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from production code (moved to test-only helper). - Remove unused `_clip_bytes` method from backend. - Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd` default to empty and validating non-empty only in `runtime.execute()`. - Extract `get_box_config()` helper to eliminate 5× duplicated config access boilerplate. - Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing tool schema to enforce request-scoped session isolation. 
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls `box_service.shutdown()` (handled by `Application.dispose()`). - Simplify `_assert_session_compatible` with a loop. - Inline client creation in `BoxRuntimeConnector`. - Remove redundant `BOX__RUNTIME_URL` env var from docker-compose (auto-detected by code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add test * fix: fix box intergration test * feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security ## Summary When Podman/Docker is available, all stdio-mode MCP servers now automatically run inside Box containers with dependency installation, path rewriting, and lifecycle management. When no container runtime exists, LangBot starts normally and stdio MCP falls back to host-direct execution. ## What changed ### MCP stdio → Box integration (mcp.py) - Add `MCPServerBoxConfig` pydantic model for structured box configuration with validation and defaults (network, host_path_mode, timeouts, resources) - Auto-infer `host_path` from command/args with venv detection: recognizes `.venv/bin/python` patterns and walks up to the project root - Rewrite host paths to container `/workspace` paths transparently - Replace venv python commands with container-native `python` - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run `pip install` inside the container before starting the MCP server - Copy project to `/tmp` before install to handle read-only mounts - Add retry with exponential backoff (3 retries, 2s/4s/8s delays) - Add Box managed process health monitoring (poll every 5s) - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally` block of `_lifecycle_loop`, covering all exit paths - Fix retry logic: `_ready_event` is only set after all retries exhaust or on success, not on first failure - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled` ### Box security (security.py — new) - 
`validate_sandbox_security()` blocks dangerous host paths: `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`, docker.sock, podman socket - Called at the start of `CLISandboxBackend.start_session()` ### Box models (models.py) - Add `BoxHostMountMode.NONE` — skips volume mount entirely - Adjust `validate_host_mount_consistency` to allow arbitrary workdir when `host_path_mode=NONE` ### Box backend (backend.py) - Add `validate_sandbox_security()` call in `start_session()` - Add `langbot.box.config_hash` label on containers for drift detection - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg - Add `cleanup_orphaned_containers()` to base class (no-op default) and CLI implementation (single batched `rm -f` command) ### Box runtime (runtime.py) - Call `cleanup_orphaned_containers()` during `initialize()` to remove lingering containers from previous runs ### Box service (service.py) - Graceful degradation: `initialize()` catches runtime errors and sets `available=False` instead of crashing LangBot startup - Add `available` property and guard on `execute_sandbox_tool()` - Add `skip_host_mount_validation` parameter to `build_spec()` and `create_session()` — MCP paths are admin-configured and trusted, bypassing `allowed_host_mount_roots` restrictions meant for LLM-generated sandbox_exec commands ### Default behavior - stdio MCP servers automatically use Box when `box_service.available` is True (Podman/Docker detected); no explicit `box` config needed - When no container runtime exists, falls back to host-direct stdio - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false` (for site-packages), `host_path_mode=ro`, `startup_timeout=120s` ### Tests - `test_box_security.py`: blocked paths, safe paths, subpath rejection - `test_mcp_box_integration.py`: config model, path rewriting, venv unwrap, host_path inference, payload building, runtime info, box availability check - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests * 
feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests ## Changes ### Precise orphan container cleanup - Runtime generates a unique instance_id on startup - Every container gets a `langbot.box.instance_id` label - `cleanup_orphaned_containers()` only removes containers from previous instances, preserving containers owned by the current one - Containers from older versions (no label) are also cleaned up - `cleanup_orphaned_containers` added to `BaseSandboxBackend` as a no-op default method, removing hasattr duck-typing ### Fine-grained MCP error classification - New `MCPSessionErrorPhase` enum with 7 phases: session_create, dep_install, process_start, relay_connect, mcp_init, runtime, tool_call - Each phase in `_init_box_stdio_server()` sets the error phase before re-raising, enabling precise failure diagnosis - `retry_count` tracked across retry attempts - `get_runtime_info_dict()` exposes `error_phase` and `retry_count` ### GET /v1/sessions/{id} API - `BoxRuntime.get_session()` returns session details including managed process info when present - `handle_get_session` HTTP handler + route in server.py - `BoxRuntimeClient.get_session()` abstract method + remote impl ### stdio defaults to Box when runtime is available - `_uses_box_stdio()` checks `box_service.available` instead of requiring explicit `box` key in server_config - `BoxService.initialize()` catches runtime errors gracefully, sets `available=False` instead of crashing LangBot startup - When no container runtime exists, stdio MCP falls back to host-direct execution ### Code quality (from /simplify review) - Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants - Removed dead `_box_network_mode()` method and unused `bc` variable - Fixed broken import `from ....box.models` → `from ...box.models` - Cached `_resolve_host_path()` result — computed once, passed through - Config hash now includes `host_path` field - Batched orphan cleanup into single `rm 
-f` command ### Session leak fix - `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s finally block, covering all exit paths (normal shutdown, error, retry, final failure) ### Integration tests - 6 end-to-end tests covering managed process lifecycle, WebSocket stdio bidirectional IO, session cleanup verification, single session query, process exit detection, and orphan cleanup safety * refactor: use rpc * fix: import * refactor(box): clean up sandbox subsystem code quality and efficiency - Fix O(n²) stderr trimming in runtime.py with running length tracker - Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task, unused config_hash computation, unused imports - Deduplicate connection callback in BoxRuntimeConnector, parse URL once - Use enum comparison instead of stringly-typed spec.network.value check - Replace manual _result_to_dict/_session_to_dict with model_dump() - Cache NativeToolLoader tool definition and sandbox system guidance - Extract _is_path_under() helper to eliminate duplicated path checks - Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining - Add JSON startswith guard in logging_utils to skip futile json.loads - Fix ruff lint errors (F401 unused imports, F841 unused variables) * fix: ruff * refactor(sandbox): keep box logic out of pipeline and localagent - Move sandbox system-prompt guidance from LocalAgentRunner into BoxService.get_system_guidance() so all box domain knowledge stays in the box module. - Remove standalone logging_utils.py; merge format_result_log() into MessageHandler base class alongside cut_str(). - Strip sandbox-specific JSON parsing from log formatting; tool results now use generic truncation. - Revert TYPE_CHECKING changes in stage.py and runner.py that were unrelated to this feature. - Skip two test files affected by a pre-existing circular import (runner ↔ app) until the import cycle is resolved in a separate PR. 
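The host-path checks named in the commits above (`validate_sandbox_security()` blocking dangerous host paths, and the extracted `_is_path_under()` helper) could be sketched roughly as follows. This is an illustration only: the function bodies, the `BLOCKED_HOST_PATHS` constant, the `check_host_path` name, and the choice of `ValueError` are assumptions, not the project's code. Only the list of blocked directories is taken from the commit message.

```python
from pathlib import PurePosixPath

# Blocked roots as listed in the commit message (socket paths omitted here).
BLOCKED_HOST_PATHS = ("/etc", "/proc", "/sys", "/dev", "/root", "/boot", "/run")


def is_path_under(path: str, root: str) -> bool:
    """Return True if `path` equals `root` or lies beneath it.

    The comparison is purely lexical; a real check would first resolve
    symlinks and `..` segments (assumption: callers pass normalized paths).
    """
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def check_host_path(path: str) -> None:
    """Hypothetical validator: raise ValueError for paths under a blocked root."""
    for root in BLOCKED_HOST_PATHS:
        if is_path_under(path, root):
            raise ValueError(f"host path {path!r} is under blocked root {root!r}")
```

Comparing path components rather than string prefixes means `/etcetera` is not mistaken for a child of `/etc`, which is the kind of duplicated, easy-to-get-wrong check a shared helper avoids.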
* fix: ruff * refactor(box): move box runtime to langbot-plugin-sdk Extract self-contained box runtime modules (actions, backend, client, errors, models, runtime, security, server) to langbot-plugin-sdk and update all imports to use `langbot_plugin.box.`. Keep only service and connector in LangBot core as they depend on the Application context. - Update docker-compose to use `langbot_plugin.box.server` entry point - Update pyproject.toml to use local SDK via `tool.uv.sources` - Remove migrated source files and their unit/integration tests - Update remaining test imports to match new module paths fix: ruff * feat: enhance sandbox api * refactor(box): derive paths from shared host root * fix(box): tighten sandbox exposure and restore box integration coverage * refactor(types): remove quoted annotations under postponed evaluation * feat(box): unify native agent tools around exec/read/write/edit * chore(sandbox): move MCP loader changes to follow-up branch * feat(box): add session workspace quota enforcement and SDK quota metadata * feat(skills): add Agent Skills management system (#1917) * feat(skills): add Agent Skills management system Implement comprehensive skills management feature inspired by agentskills spec: Backend: - Add Skill and SkillPipelineBinding database entities - Add database migration (dbm018) for skills tables - Implement SkillManager for skill loading, matching, and resolution - Implement SkillService for CRUD operations - Add skills API endpoints for skill and pipeline binding management - Integrate skill index injection into pipeline preprocessor - Add skill activation detection in LocalAgentRunner Frontend: - Add Skills page with listing, search, and type filter - Add SkillDetailDialog for create/edit with preview - Add SkillCard and SkillForm components - Add skills API methods to BackendClient - Add skills entry to sidebar navigation - Add i18n translations (en-US, zh-Hans) Features: - Support skill and workflow types - Sub-skill composition 
via {{INVOKE_SKILL: name}} syntax - Progressive disclosure (index in prompt, full instructions on activation) - Pipeline-specific skill bindings with priority * fix: resolve cherry-pick conflicts for agentskills onto sandbox - Remove non-existent external_kb service import - Add skill_mgr mock to localagent sandbox_exec tests - Keep database version at 24 (sandbox branch's latest) * feat(skills): upgrade to package-backed skills with sandbox execution Evolve the skills system from pure prompt-based to package-backed with sandbox tool execution support: - Add source_type/package_root/entry_file/skill_tools fields to Skill entity - SkillManager loads SKILL.md from local package directories - SkillToolLoader as 4th dispatch layer in ToolManager (query-scoped) - LocalAgent injects skill tools into use_funcs on skill activation - BoxService.execute_skill_tool() runs scripts in sandbox (ro mount, env params) - Skill tool names auto-namespaced as skill__{skill}__{tool} - API validation for package_root allowlist and entry path traversal - Frontend source_type toggle, package_root input, skill_tools editor - Migration renumbered to 025 with ALTER TABLE fallback for existing DBs - Fix unclosed limitation section in i18n files - Fix skills API methods misplaced outside BackendClient class * fix: test info * feat(skills): switch skills to package-backed storage and add import tooling - Consolidate skills from the dual inline/package tracks into package-first - Write instructions to, and read them from, SKILL.md - Add local directory scanning and skill installation from GitHub - Frontend: merge skills into the plugins page; add SkillsComponent and a GitHub import dialog - Skill form: remove the source_type / type filters and drive it by directory scanning instead - Change the Box skill tool mount mode from ro to rw - Update tests and the Chinese and English UI copy accordingly * feat: simplify langbot skill create and import * refactor(skills): clean up legacy skill API and harden activation flow * refactor(skills): remove skill dependency expansion and add skill_get * fix: lint * fix: delete * fix(skills): align tool manager loader initialization * refactor: remove sandbox execute skill * fix(skills): hide 
activation markers and isolate skill activation flow * refactor(skills): switch skill model to filesystem-backed packages * refactor(skills): switch skill model to filesystem-backed packages * refactor(skills): unify runtime skill access around filesystem paths * refactor(skills): unify runtime skill access around filesystem paths * feat(skills): align rw package design and fix skill activation, visibility, and lint issues * refactor(skills): replace rich authoring API with import/reload flow and update Box design doc * feat(box): add sandbox_exec tool loop for local-agent calculations * feat(box): add host workspace mounting and sandbox_exec guidance * feat(box): add BoxProfile with resource limits and improved output truncation - Implement head+tail output truncation (60/40 split) so LLM sees both beginning and final results; add streaming byte-limited reads in backend to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB) - Define BoxProfile model with locked fields and max_timeout_sec clamping - Add four built-in profiles: default, offline_readonly, network_basic, network_extended with differentiated resource and security constraints - Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit, read_only_rootfs) and pass corresponding container CLI flags (--cpus, --memory, --pids-limit, --read-only, --tmpfs) - Profile loaded from config (box.profile), applied in service layer before BoxSpec validation; locked fields cannot be overridden by tool-call parameters * feat(box): add obs * refactor(box): unify box service lifecycle and local runtime management * refactor(box): remove legacy in-process runtime code and clean up smells After the architecture settled on always using an independent Box Runtime service, several pieces of compatibility code and design shortcuts were left behind. This commit cleans them up: - Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from production code (moved to test-only helper). 
- Remove unused `_clip_bytes` method from backend. - Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd` default to empty and validating non-empty only in `runtime.execute()`. - Extract `get_box_config()` helper to eliminate 5× duplicated config access boilerplate. - Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing tool schema to enforce request-scoped session isolation. - Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls `box_service.shutdown()` (handled by `Application.dispose()`). - Simplify `_assert_session_compatible` with a loop. - Inline client creation in `BoxRuntimeConnector`. - Remove redundant `BOX__RUNTIME_URL` env var from docker-compose (auto-detected by code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security ## Summary When Podman/Docker is available, all stdio-mode MCP servers now automatically run inside Box containers with dependency installation, path rewriting, and lifecycle management. When no container runtime exists, LangBot starts normally and stdio MCP falls back to host-direct execution. 
## What changed ### MCP stdio → Box integration (mcp.py) - Add `MCPServerBoxConfig` pydantic model for structured box configuration with validation and defaults (network, host_path_mode, timeouts, resources) - Auto-infer `host_path` from command/args with venv detection: recognizes `.venv/bin/python` patterns and walks up to the project root - Rewrite host paths to container `/workspace` paths transparently - Replace venv python commands with container-native `python` - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run `pip install` inside the container before starting the MCP server - Copy project to `/tmp` before install to handle read-only mounts - Add retry with exponential backoff (3 retries, 2s/4s/8s delays) - Add Box managed process health monitoring (poll every 5s) - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally` block of `_lifecycle_loop`, covering all exit paths - Fix retry logic: `_ready_event` is only set after all retries exhaust or on success, not on first failure - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled` ### Box security (security.py — new) - `validate_sandbox_security()` blocks dangerous host paths: `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`, docker.sock, podman socket - Called at the start of `CLISandboxBackend.start_session()` ### Box models (models.py) - Add `BoxHostMountMode.NONE` — skips volume mount entirely - Adjust `validate_host_mount_consistency` to allow arbitrary workdir when `host_path_mode=NONE` ### Box backend (backend.py) - Add `validate_sandbox_security()` call in `start_session()` - Add `langbot.box.config_hash` label on containers for drift detection - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg - Add `cleanup_orphaned_containers()` to base class (no-op default) and CLI implementation (single batched `rm -f` command) ### Box runtime (runtime.py) - Call `cleanup_orphaned_containers()` during `initialize()` to remove lingering 
containers from previous runs ### Box service (service.py) - Graceful degradation: `initialize()` catches runtime errors and sets `available=False` instead of crashing LangBot startup - Add `available` property and guard on `execute_sandbox_tool()` - Add `skip_host_mount_validation` parameter to `build_spec()` and `create_session()` — MCP paths are admin-configured and trusted, bypassing `allowed_host_mount_roots` restrictions meant for LLM-generated sandbox_exec commands ### Default behavior - stdio MCP servers automatically use Box when `box_service.available` is True (Podman/Docker detected); no explicit `box` config needed - When no container runtime exists, falls back to host-direct stdio - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false` (for site-packages), `host_path_mode=ro`, `startup_timeout=120s` ### Tests - `test_box_security.py`: blocked paths, safe paths, subpath rejection - `test_mcp_box_integration.py`: config model, path rewriting, venv unwrap, host_path inference, payload building, runtime info, box availability check - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests * feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests ## Changes ### Precise orphan container cleanup - Runtime generates a unique instance_id on startup - Every container gets a `langbot.box.instance_id` label - `cleanup_orphaned_containers()` only removes containers from previous instances, preserving containers owned by the current one - Containers from older versions (no label) are also cleaned up - `cleanup_orphaned_containers` added to `BaseSandboxBackend` as a no-op default method, removing hasattr duck-typing ### Fine-grained MCP error classification - New `MCPSessionErrorPhase` enum with 7 phases: session_create, dep_install, process_start, relay_connect, mcp_init, runtime, tool_call - Each phase in `_init_box_stdio_server()` sets the error phase before re-raising, enabling precise 
failure diagnosis - `retry_count` tracked across retry attempts - `get_runtime_info_dict()` exposes `error_phase` and `retry_count` ### GET /v1/sessions/{id} API - `BoxRuntime.get_session()` returns session details including managed process info when present - `handle_get_session` HTTP handler + route in server.py - `BoxRuntimeClient.get_session()` abstract method + remote impl ### stdio defaults to Box when runtime is available - `_uses_box_stdio()` checks `box_service.available` instead of requiring explicit `box` key in server_config - `BoxService.initialize()` catches runtime errors gracefully, sets `available=False` instead of crashing LangBot startup - When no container runtime exists, stdio MCP falls back to host-direct execution ### Code quality (from /simplify review) - Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants - Removed dead `_box_network_mode()` method and unused `bc` variable - Fixed broken import `from ....box.models` → `from ...box.models` - Cached `_resolve_host_path()` result — computed once, passed through - Config hash now includes `host_path` field - Batched orphan cleanup into single `rm -f` command ### Session leak fix - `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s finally block, covering all exit paths (normal shutdown, error, retry, final failure) ### Integration tests - 6 end-to-end tests covering managed process lifecycle, WebSocket stdio bidirectional IO, session cleanup verification, single session query, process exit detection, and orphan cleanup safety * refactor: use rpc * fix: import * refactor(box): clean up sandbox subsystem code quality and efficiency - Fix O(n²) stderr trimming in runtime.py with running length tracker - Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task, unused config_hash computation, unused imports - Deduplicate connection callback in BoxRuntimeConnector, parse URL once - Use enum comparison instead of stringly-typed spec.network.value check - Replace 
manual _result_to_dict/_session_to_dict with model_dump() - Cache NativeToolLoader tool definition and sandbox system guidance - Extract _is_path_under() helper to eliminate duplicated path checks - Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining - Add JSON startswith guard in logging_utils to skip futile json.loads - Fix ruff lint errors (F401 unused imports, F841 unused variables) * fix: ruff * refactor(sandbox): keep box logic out of pipeline and localagent - Move sandbox system-prompt guidance from LocalAgentRunner into BoxService.get_system_guidance() so all box domain knowledge stays in the box module. - Remove standalone logging_utils.py; merge format_result_log() into MessageHandler base class alongside cut_str(). - Strip sandbox-specific JSON parsing from log formatting; tool results now use generic truncation. - Revert TYPE_CHECKING changes in stage.py and runner.py that were unrelated to this feature. - Skip two test files affected by a pre-existing circular import (runner ↔ app) until the import cycle is resolved in a separate PR. * refactor(box): move box runtime to langbot-plugin-sdk Extract self-contained box runtime modules (actions, backend, client, errors, models, runtime, security, server) to langbot-plugin-sdk and update all imports to use `langbot_plugin.box.`. Keep only service and connector in LangBot core as they depend on the Application context. 
- Update docker-compose to use `langbot_plugin.box.server` entry point - Update pyproject.toml to use local SDK via `tool.uv.sources` - Remove migrated source files and their unit/integration tests - Update remaining test imports to match new module paths fix: ruff * fix(box): tighten sandbox exposure and restore box integration coverage * refactor(types): remove quoted annotations under postponed evaluation * chore(sandbox): move MCP loader changes to follow-up branch * refactor(plugins): simplify GitHub install flow to default master archive * revert(api): restore plugin GitHub import flow in plugins controller * Improve data-root handling and skill install previews * Add managed skill authoring tools for local agents * Refactor the skills UI around sidebar detail pages * Document why managed skill authoring tools bypass box * fix: lint * feat(web): refactor plugin/skill install flows and fix skills page - Fix sidebar skill icon - Add skills route and error page component - Refactor plugin GitHub install from dialog modal to inline card - Add skill install dropdown menu (create/upload/github) in sidebar - Wire sidebar → skills page communication via pendingSkillInstallAction context - Add i18n keys for error page and skill install actions * fix(web): persist sidebar collapsible section open state on navigation Sections opened via sub-item navigation now retain their expanded state when the user switches to a different section, instead of collapsing because the isActive fallback becomes false. 
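The head+tail output truncation (60/40 split) described in the BoxProfile commit above can be illustrated with a short sketch. The function name, the marker text, and the character-based budget are assumptions made for this example; the commit itself only states the 60/40 split and a separate 1MB raw byte limit in the backend.

```python
def truncate_head_tail(text: str, limit: int, head_percent: int = 60,
                       marker: str = "\n...[truncated]...\n") -> str:
    """Keep the start and the end of `text` so both remain visible.

    Hypothetical helper: roughly `head_percent` of the budget goes to the
    head and the rest to the tail, with a marker in between.
    """
    if len(text) <= limit:
        return text
    budget = max(limit - len(marker), 0)   # characters left for real content
    head = budget * head_percent // 100    # integer math avoids float rounding
    tail = budget - head
    return text[:head] + marker + (text[-tail:] if tail else "")
```

Keeping the tail matters for command output because the final result or the error message usually appears at the end, while the head shows how the run started.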
--------- Co-authored-by: youhuanghe <1051233107@qq.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Junyan Qin <rockchinq@gmail.com> * feat(sandbox): add MCP box integration on top of sandbox base (#2083) * refactor(mcp): extract box stdio runtime helper * refactor(box): introduce reusable workspace session helper * refactor(box): run Box Runtime as subprocess inside LangBot container Remove the separate langbot_box_runtime Docker service. Box Runtime now always launches as a local stdio subprocess, regardless of whether LangBot runs in Docker or not. The WebSocket transport path is kept only for explicit runtime_url configuration (remote deployment). This simplifies deployment by eliminating cross-container path mapping and network hops. Box Runtime is a pure scheduling process (talks to Docker socket / nsjail), it does not execute user code or touch the filesystem, so container isolation is unnecessary — unlike Plugin Runtime. * fix(web): prevent first-emission snapshot from swallowing unsaved changes in pipeline editor When switching runner (e.g. local-agent → n8n), the newly mounted stage's first emit would re-capture the saved snapshot, erasing the dirty state caused by the runner change. The save button would incorrectly go dim. 
- Skip snapshot re-capture in handleDynamicFormEmit when form is already dirty - Add mount-time emit to N8nAuthFormComponent (matching DynamicFormComponent) - Use stable onSubmitRef to prevent useEffect subscription churn - Add previousInitialValues guard to prevent initialValues echo loops * style(web): align plugin list header button heights * docs(review): update Box architecture review documents Replace old review docs with 5 focused documents: - box-architecture.md: deep architecture analysis (LangBot + SDK) - box-issues.md: 22 issues rated P0/P1/P2 - box-test-coverage.md: test coverage analysis - box-tob-analysis.md: toB commercialization analysis - box-vs-plugin-runtime.md: Box vs Plugin runtime comparison * feat(web): improve login error layout and add Terms of Service link - Improve backend connection error display with bordered container, inline icon, and better visual hierarchy - Extract actual error message from axios response object - Add Terms of Service link (https://langbot.app/terms) to login footer - Add termsOfService i18n key for all 7 locales * refactor(web): replace all hardcoded SVG icons with lucide-react Unify icon usage across the entire frontend by replacing 67 hardcoded SVG icons with lucide-react components across ~25 files. This improves consistency, maintainability, and reduces bundle duplication. Key replacements: - Sidebar nav: Zap, LayoutDashboard, Bot, Workflow, BookMarked, etc. - MCP forms: Loader2, XCircle, Trash2 - Monitoring: Sparkles, MessageSquare, CheckCircle2, RefreshCw, etc. - Cards: Clock, Star, Workflow, Hexagon, Puzzle, Github, etc. - Misc: Paperclip, AudioLines, CloudUpload, Layers, Heart, Smile Zero hardcoded <svg> tags remain in .tsx files. * fix(web): stop polling plugin tasks when no active installs The PluginInstallTaskProvider was unconditionally polling getAsyncTasks every 3s on all /home/* routes. 
Now it only syncs once on mount and starts periodic polling only when there are active (non-terminal) install tasks. * fix(deps): update langbot-plugin version and add new dependencies * refactor: use Space API for release checks and stop idle polling - version.py: switch release list API from GitHub to space.langbot.app, remove unused in-place update logic (update_all, compare_version_str), translate all comments/logs to English - PluginInstallTaskContext: only poll when active install tasks exist * feat(box): add --standalone-box flag and 3-way transport decision for Box runtime Align Box runtime connection logic with Plugin runtime's pattern: - Docker: WebSocket to langbot_box container (ws://langbot_box:5411) - --standalone-box: WebSocket to external Box process (ws://localhost:5411) - Windows: subprocess + WebSocket (workaround for async stdio limitation) - Unix/macOS: subprocess + stdio pipe (unchanged) BoxRuntimeConnector now inherits ManagedRuntimeConnector for subprocess lifecycle reuse. Add langbot_box service to docker-compose.yaml. * refactor(box): use single port with path-based routing for Box WS Update connector to use ws://host:5410/rpc/ws instead of ws://host:5411. Update review docs to reflect the single-port architecture. * feat(web): show Box runtime status in plugin debug info popover Add Box status section to the debug info popover on the plugin list page, displaying connection status, backend info, profile, active sessions, and recent error count. Fetched from GET /api/v1/box/status in parallel with plugin debug info. Includes i18n for all 8 supported languages. * fix(web): remove ephemeral sandbox count from Box status display The active_sessions count reflects transient sandbox containers that expire after 5 minutes of inactivity, making it misleading in the UI. Keep only connection status, backend, profile, and error count. 
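The transport selection described in the `--standalone-box` commit above, combined with the later move to a single port with path-based routing, might be sketched as below. The `Transport` type, the function name, the precedence order of the checks, and the exact host used on Windows are assumptions for illustration; the commit messages only list the cases and the `ws://host:5410/rpc/ws` URL shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transport:
    spawn_subprocess: bool   # does LangBot launch the Box runtime itself?
    ws_url: Optional[str]    # None means a stdio pipe is used instead


def choose_box_transport(in_docker: bool, standalone_box: bool,
                         platform: str) -> Transport:
    """Pick a transport, checking the cases in the order the commit lists them."""
    if in_docker:
        # WebSocket to the langbot_box container.
        return Transport(False, "ws://langbot_box:5410/rpc/ws")
    if standalone_box:
        # WebSocket to an externally started Box process.
        return Transport(False, "ws://localhost:5410/rpc/ws")
    if platform == "win32":
        # Subprocess plus WebSocket: workaround for the async stdio limitation.
        return Transport(True, "ws://localhost:5410/rpc/ws")
    # Unix/macOS: subprocess with a stdio pipe.
    return Transport(True, None)
```

For example, `choose_box_transport(False, False, "darwin")` yields a subprocess with no WebSocket URL, matching the "subprocess + stdio pipe" case.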
* feat(box): configurable sandbox scope and unified skill containers Replace the per-message session_id with a template-based system configurable per pipeline via 'Sandbox Scope' in the local-agent panel. Default scope is per-chat ({launcher_type}_{launcher_id}). Unify skill exec into the same container as default exec — skills are mounted at /workspace/.skills/{name}/ via extra_mounts instead of getting separate containers. All pipeline-bound skills are injected at container creation time. - Add box-session-id-template to pipeline metadata (select, 4 options, 8 languages) - Add resolve_box_session_id() and build_skill_extra_mounts() to BoxService - Rewrite native.py skill exec path to use execute_tool with shared session - Update tests for new session_id format - Add design doc: docs/review/box-session-scope.md * feat(web): show active sandbox details in Box status popover Display sandbox count and a detailed list of active sessions including session ID, image, backend, resources (CPU/memory), network mode, and last used time. Fetched from GET /api/v1/box/sessions in parallel. Includes i18n for all 8 supported languages. * feat(box): add startup and availability logging for sandbox tools Log Box runtime initialization result (success with profile info, or failure warning). Log native tool availability status at ToolManager startup so it's immediately clear whether exec/read/write/edit tools are registered for the LLM. * feat(box): support custom sandbox container image via config.yaml Add 'image' field to box config section. When set, it overrides the profile default image (python:3.11-slim) for all sandbox containers. Priority: caller-specified > config.yaml image > profile default. * feat(box): add heartbeat and reconnection for Box runtime connector Add 20-second heartbeat ping loop to detect silent Box runtime disconnections. On disconnect, set available=false and attempt reconnection after 3 seconds via the disconnect callback chain. 
- BoxRuntimeConnector: heartbeat loop, disconnect callback parameter, disconnect detection in connection callback and WS failure handler - BoxService: wire disconnect callback to toggle available state and re-initialize the connector on reconnection * feat(web): move runtime status to dashboard, clean up plugin debug popover Add SystemStatusCards component to the monitoring dashboard showing Plugin Runtime and Box Runtime connection status with details (backend, profile, sandbox count). Remove all Box/session status from the plugin page debug popover — it now only shows debug URL and key. Includes i18n for all 8 supported languages. * refactor(web): compact system status into a single card alongside metrics Replace the separate two-card row with a single compact 'System Status' card placed as the 5th column in the metrics grid. Shows green/red dots for Plugin Runtime and Box Runtime. Click to expand a popover with connection details (backend, profile, sandbox count). * feat: show connector error details for Plugin and Box runtime status Record Box connector error in BoxService and expose it as 'connector_error' in GET /api/v1/box/status when unavailable. Display error messages in the dashboard System Status popover for both Plugin Runtime (plugin_connector_error) and Box Runtime (connector_error) when they are disconnected. * fix(web): auto-refresh system status and show disconnect errors in real time Poll Plugin Runtime and Box Runtime status every 30 seconds so the dashboard reflects disconnections without a manual page refresh. Also re-fetch when the popover is opened for immediate feedback. * fix(box): handle RPC failure in get_status/get_sessions gracefully When the Box runtime disconnects, there is a race between the heartbeat flipping _available=false and the frontend polling get_status(). 
If the poll arrives first, client.get_status() throws a ConnectionClosedError which propagated as a 500, causing the frontend to show a grey dot (null status) instead of a red dot with error details. Now get_status() catches RPC errors and returns available=false with the exception message as connector_error. get_sessions() returns an empty list when unavailable or on RPC failure. * fix(box): add persistent reconnection loop with exponential backoff The previous disconnect handler only retried once and then gave up. Now spawns a background task that retries with exponential backoff (3s, 6s, 12s, ... up to 60s) until the Box runtime is reachable again. Uses a _reconnecting guard to prevent duplicate loops. Calls connector.dispose() before each retry to clean up stale tasks. * fix(box): detect disconnect when handler.run() returns normally The generic Handler.run() catches ConnectionClosedError and breaks out of its loop (normal return) instead of raising, because it has no disconnect_callback. The old code only triggered reconnection in the except branch, so a clean WebSocket close was never detected. Now treat handler.run() returning normally (after successful handshake) as a disconnect event, triggering the reconnection callback. * fix(web): refresh system status card when clicking Refresh Data button Pass a refreshKey prop through OverviewCards to SystemStatusCard that increments on each Refresh Data click, triggering a re-fetch of Plugin and Box runtime status alongside the monitoring data refresh. * fix(web): fix system status card stuck in loading state fetchStatus(showLoading=false) never called setLoading(false), so the initial loading=true was never cleared. Simplify to always setLoading in the finally block — the spinner only shows on the very first load since subsequent fetches complete near-instantly. 
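The persistent reconnection loop described earlier in this log (3s, 6s, 12s, ... capped at 60s, guarded by a `_reconnecting` flag, with a dispose step before each retry) could look roughly like the sketch below. The `Reconnector` class, its constructor parameters and `backoff_delays` are hypothetical names chosen for illustration; the real BoxService wiring is not shown in the commit message.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Optional


def backoff_delays(initial: float = 3.0, cap: float = 60.0) -> Iterator[float]:
    """Yield 3, 6, 12, ... doubling each time and never exceeding the cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, cap)


class Reconnector:
    """Hypothetical helper: retry `connect` until it succeeds."""

    def __init__(
        self,
        connect: Callable[[], Awaitable[None]],
        dispose: Optional[Callable[[], None]] = None,
        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    ) -> None:
        self._connect = connect
        self._dispose = dispose
        self._sleep = sleep
        self._reconnecting = False  # guard against duplicate loops
        self.available = False

    async def run(self) -> None:
        if self._reconnecting:
            return  # another loop is already retrying
        self._reconnecting = True
        try:
            for delay in backoff_delays():
                await self._sleep(delay)
                if self._dispose is not None:
                    self._dispose()  # clean up stale tasks before retrying
                try:
                    await self._connect()
                except Exception:
                    continue  # still unreachable; wait for the next delay
                self.available = True
                return
        finally:
            self._reconnecting = False
```

Injecting `sleep` is a design choice made here so the loop can be tested without real waiting.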
* feat(web): show active sandbox details in dashboard Box status popover Fetch box sessions alongside status and display each active sandbox in the popover with session ID, image, resources (CPU/memory), and last used time. * feat(box): add global sandbox scope option Add a 'Global (shared by all)' option to the sandbox scope selector. Uses a constant '{global}' template variable that always resolves to 'global', so all users and chats share one sandbox container. * refactor(web): replace popover with dialog for system status details Replace the dropdown popover with a proper Dialog for runtime status details. Add a small info button on the System Status card that opens the dialog. Session details now show in a spacious 2-column grid layout with full image name, backend, CPU/memory, network, mount path, and created/last-used timestamps. * fix(web): widen system status dialog and fix scroll border issue Use max-w-2xl (matching other dialogs) instead of max-w-lg. Move overflow-y-auto to an inner container with overflow-hidden on DialogContent to prevent padding bleed at scroll edges. * feat(web): add tooltips for truncated fields in system status dialog Wrap session_id, image, and mount path fields with Tooltip components so hovering over truncated text shows the full value. * feat: add download button * feat: successfully install * feat: delete old filter * feat: optimize frontend * fix: align box runtime launch args * feat: translate * feat: refactor market * feat: optimize frontend * chore: rename extension zh translation * feat(extensions): unify extensions endpoint and refresh extensions page UX - Rename /home/plugins route to /home/extensions and update all sidebar links. - Add unified GET /api/v1/extensions returning plugins, MCP servers and skills, sorted by name; replace the three separate frontend fetches with this single call. 
- Migrate the extensions page to shadcn primitives (Tabs/Card/Alert/Badge/Skeleton/ Switch/Label) and clean up hardcoded color tokens on the extension card. - Add a localStorage-persisted "Group by type" switch that, when enabled in the All Types tab, renders extensions grouped by type with a compact section header. - Show a spinner while loading and rename the empty-state copy from "No plugins installed" to "No extensions installed". - Rename the "格式 / Formats" filter label to "类型 / Types" across all 8 locales. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extensions): fallback lucide icon when extension icon is missing Render a tinted lucide icon (Puzzle / Server / Sparkles) on the extension card when the icon URL is empty or the image fails to load. Picked icons distinct from EventListener (AudioWaveform) and KnowledgeEngine (Book) to avoid visual collision with plugin component badges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sidebar): unify installed-extensions list with plugins, MCP and skills - Render plugins, MCP servers and skills together under the "Installed Extensions" sidebar entry, alphabetically sorted to match the list page. - Resolve per-item routes by extension type (plugin -> /home/extensions, mcp -> /home/mcp, skill -> /home/skills) and gate the plugin-only hover context menu on extensionType === 'plugin'. - Lift the "group by type" toggle into SidebarDataContext (still persisted in localStorage) so the sidebar groups items with section headers whenever the list page has the toggle enabled. - Show lucide fallback icons (Server / Sparkles / Puzzle) tinted in the LangBot blue for MCP, skill, and missing-icon plugin items, overriding the SidebarMenuSubButton svg color rule. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extensions): mobile-friendly layout for extensions and add-extension pages - Stack the extensions page header vertically on small screens, let the filter Tabs scroll horizontally if they overflow, hide the debug button label below sm and let the install/debug controls wrap. - Constrain the debug popover and its inputs to the viewport width so they no longer overflow on phone-sized screens. - Drop the card grid from a fixed 30rem column to a min(100%, 22rem) column at base / 28rem at sm, and reduce the gap, so cards render cleanly at 360px+ widths in both flat and grouped views. - Make the add-extension header actions wrap on lg- viewports and the install dialog responsive instead of a hard 500px box. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: change ui * feat: delete version for mcp and skills * fix: constrain home page content width * fix: preserve monitoring card borders under sticky filters * fix(box): restore sandbox config and shared mcp runtime * fix(box): harden sandbox session isolation * fix(skill): remove auto activation setting * feat(skill): align skill system with Claude Code's Tool Call design - Replace text marker activation with `activate` tool (Tool Call mechanism) - Replace 7 authoring tools with 2: `activate` + `register_skill` - Add builtin skills loading from templates/skills/ - Add create-skill as first builtin skill - Remove SKILL_ACTIVATION_MARKER and text detection methods - Tool Result returns SKILL.md content (protects KV Cache) This aligns with Claude Code's progressive disclosure pattern: - Metadata (name+description) always visible in tool description - SKILL.md body loaded on activate via Tool Call - Bundled resources accessible through virtual path mapping Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(tools): add glob and grep native sandbox tools Add file discovery and content search capabilities to the 
sandbox: - glob: Find files by pattern (supports ** recursive matching) - grep: Search file contents with regex patterns Both tools respect skill package paths and include safety limits (max 100 files for glob, max 200 matches for grep). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * feat(skill): add skill file browsing capability - Add API endpoints for listing/reading/writing skill files - Add FileTree component in SkillForm for directory browsing - Users can now view scripts/, references/, assets/ directories - Files can be selected and edited in the instructions textarea - Add translations for new file browsing features Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(skill): copy builtin skills to data/skills on startup - Builtin skills (templates/skills/) are now copied to data/skills/ - Users can view and manage builtin skills in the UI - Rename SkillAuthoringToolLoader to SkillToolLoader Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(skill): improve file browsing and fix path handling - Fix nested directory display in skill file tree (preserve root entries) - Fix file content display when clicking files in skill browser - Add skill manager and tool manager as proper package modules - Separate fileContent state to allow editing non-SKILL.md files Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(toolmgr): correct skill_tool_loader attribute name Rename skill_authoring_tool_loader to skill_tool_loader in execute_func_call and shutdown methods to match the attribute defined in initialize(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(native): update tool descriptions to use register_skill Replace references to removed import_skill_from_directory with register_skill in exec/write/edit tool descriptions. 
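The `glob` tool's behaviour described earlier in this log (`**` recursive matching, capped at 100 files) can be approximated with the standard library. This is a sketch under assumptions: `sandbox_glob` and `MAX_GLOB_RESULTS` are hypothetical names, and the real tool additionally handles skill package paths, which is omitted here.

```python
from pathlib import Path
from typing import List

MAX_GLOB_RESULTS = 100  # safety limit stated in the commit message


def sandbox_glob(root: str, pattern: str, limit: int = MAX_GLOB_RESULTS) -> List[str]:
    """Return up to `limit` files under `root` matching a relative glob pattern."""
    base = Path(root).resolve()
    matches: List[str] = []
    # Path.glob understands "**" as "this directory and all subdirectories".
    for path in sorted(base.glob(pattern)):
        if not path.is_file():
            continue  # skip directories that happen to match
        matches.append(path.relative_to(base).as_posix())
        if len(matches) >= limit:
            break  # stop once the cap is reached
    return matches
```

Returning paths relative to the workspace root keeps the output short for the LLM; whether the real tool does the same is not stated in the log.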
* feat(toolmgr): enhance tool initialization with backend availability checks * refactor: remove unused imports and clean up code in various files * feat: polish extension detail pages * feat: persist sidebar list expansion * fix: refine extension ui and backend errors * fix: align add extension marketplace ui * feat: manage skills through box runtime * feat: support github skill installation * fix: import github skill directories * feat: install market extensions from card click * feat(web): improve skill import flow * feat: polish extension import flow * fix(mcp): stabilize shared box managed processes * fix(web): improve backend retry and sidebar scrolling * docs(review): refresh box architecture review for feat/sandbox Sync the docs/review/ suite to the current state of the feat/sandbox branch (both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review. - box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b} config schema, add E2B backend, 6 native tools (incl. 
glob/grep), Skill Tool Call activation, shared multi-process MCP container, SkillManager, BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect - box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail image conflict, frontend monitoring card) into a Resolved section; add new P0 (INIT/backend ordering), P1 (extra_mounts immutability after container creation), P2 (skill_store test gap, integration tests not in CI) - box-session-scope.md: add §0 Implementation Status: Phase 1 shipped, MCP unification landed earlier than originally scoped - box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC), add 7 new test files including SDK backend_selection/e2b/skill_store - box-tob-analysis.md: connection recovery now meets the basic requirements; add E2B and backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat - box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned with Plugin Runtime; revise remaining gaps (WS auth, shared base class) * refactor(box): use unified env-override mechanism for box.local config The box module hand-rolled its own LANGBOT_BOX_LOCAL_* env parsing in two places (connector._get_box_config and service._local_config), duplicating logic that LoadConfigStage._apply_env_overrides_to_config already provides generically via the SECTION__SUBSECTION__KEY convention. 
- Drop the bespoke LANGBOT_BOX_LOCAL_* parsing; read box.local straight from instance_config (the unified BOX__LOCAL__* overrides are already applied before BoxService initializes) - Harden _load_allowed_mount_roots to accept a comma-separated string, since the generic mechanism stores a freshly-created key as a raw string when config.yaml has no box.local.allowed_mount_roots entry - docker-compose: rename the langbot container env vars to BOX__LOCAL__* (the canonical convention); remove them entirely from the langbot_box container — the Box runtime never reads box.local from env/config.yaml, it is configured via the INIT RPC action Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: repair stale skill/sandbox tests for feat/sandbox The skill subsystem moved to Tool-Call activation and a Box-managed skill store; several tests still asserted removed APIs and a sys.modules stub leaked across the suite. Full unit suite now green (was 23 failing). - test_skill_tools: drop TestSkillManagerActivation (text-marker API removed); rewrite TestSkillActivationHelper around the current skill.activation.register_activated_skill; replace the CRUD TestSkillAuthoringToolLoader with TestSkillToolLoader covering the current activate/register_skill tools and sandbox-availability gating - test_tool_manager_native: ToolManager attr is skill_tool_loader (not skill_authoring_tool_loader); native loader now exposes 6 tools (exec/read/write/edit/glob/grep) and requires initialize() with a backend-available get_status() - test_localagent_sandbox_exec: remove obsolete activation-marker leakage tests and their helper providers - test_model_service / pipeline conftest: give the mocks skill_mgr=None so PreProcessor's local-agent skill-binding guard short-circuits - test_n8nsvapi: stop permanently overwriting sys.modules ('langbot.pkg.provider.runner' etc.); save and restore around the import so other modules get the real LocalAgentRunner base class Co-Authored-By: Claude 
Opus 4.7 (1M context) <noreply@anthropic.com> * ci(tests): run unit tests on every push to feat/ branches - Add feat/ to push branches so long-lived feature branches are tested on every push (they accumulate large changes before a PR) - Drop the push path filter entirely: every push to master/develop/ feat/ now runs the full unit suite (the old 'pkg/' filter never matched the real source path 'src/langbot/pkg/', so backend-only pushes silently skipped tests) - Fix the same broken path glob on the pull_request trigger ('pkg/' -> 'src/langbot/pkg/*') Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(skill): harden mount/reload paths and HTTP errors against stale skill cache The Box backends behave inconsistently when extra_mounts reference a missing host directory (nsjail aborts the entire sandbox start, Docker silently creates a root-owned empty dir on the host, E2B silently skips the upload). The cache in skill_mgr.skills is only refreshed on in-process mutations, so out-of-band changes — container rebuilds, manual rm in the box volume, anything the LangBot API didn't drive — leave a stale skill that later produces one of those bad mount paths. 
- box/service.py: build_skill_extra_mounts now filters skills whose package_root is not isdir on the LangBot-visible filesystem and logs a warning, instead of passing the bad mount through to the backend - skill/manager.py: reload_skills (Box path) drops skills whose package_root is missing on the LangBot-side filesystem before they reach the in-memory cache, with a summary warning - api/http/controller/groups/skills.py: file/CRUD handlers now also catch BoxError (RuntimeError subclass, previously slipping past ``except ValueError`` and surfacing as 500); list/get handlers gain a try/except so a transient Box RPC failure becomes a clean 400 instead of a stack trace Tests added for build_skill_extra_mounts (skip missing, skip empty, no skill manager) and SkillManager.reload_skills (drop missing on Box path). Full unit suite: 279 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(box): add box.enabled toggle and gate consumers on availability Make the Box sandbox runtime optional. When ``box.enabled`` is false in config (or when an enabled Box fails to connect), every dependent feature degrades to the same disabled-state UX rather than crashing or silently falling back to less safe code paths. 
Backend: - config.yaml: new top-level ``box.enabled: true`` flag (default true) - BoxService: - Read box.enabled on construction - initialize() short-circuits when disabled — no remote WS connect, no stdio subprocess fork - _on_runtime_disconnect is a no-op when disabled (no reconnect loop on a deliberately-off service) - get_status() now exposes ``enabled`` so the frontend can tell "disabled in config" from "configured but failed" - MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to be available, not just installed - MCP _init_stdio_python_server: when ap.box_service exists but is unavailable, refuse the stdio server with an actionable error instead of silently falling through to host-stdio (which bypasses the sandbox the operator asked for). Setups without ap.box_service installed at all keep the legacy host-stdio fallback for pre-Box dev mode - SkillService._require_box_for_write: refuses create/update/install/ write_skill_file when ap.box_service is installed but unavailable. Distinguishes disabled vs failed in the error message so the UI can surface the right hint. 
Legacy setups (no ap.box_service) keep the local fallback path — that distinction is what keeps the existing local-skills tests valid Tests: - Box disabled-state behavior (4 cases) - Skill write refusal in disabled & failed states (7 cases) - MCP stdio runtime info policy updated to match new refuse-when-down behavior Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(web): surface Box disabled/unavailable state across consumers When Box is disabled in config (``box.enabled = false``) or fails to connect, every dependent UI surface now degrades visibly: - ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``, ``disabled`` (config-off) and a single ``hint`` key so callers don't have to re-derive the three states - ``BoxUnavailableNotice`` reusable Alert banner driven by that hint - Dashboard SystemStatusCards: three-state dot + label (connected / disabled-gray / disconnected-red); disabled state shows the ``boxDisabled`` hint, failed state continues to show the connector error. 
Plugin block kept untouched - Skills page (create view) and SkillDetailContent (edit view): Save button disabled and banner inserted above the form when Box is unavailable, matching the backend gate added in the previous commit - PipelineExtension skill section: ``enable_all_skills`` switch, Add Skill button and Remove buttons all gate on Box availability; banner inline under the section header - PipelineFormComponent: banner above the ``local-agent`` stage card when Box is unavailable, since that stage carries the sandbox-bound ``box-session-id-template`` field - Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale files updated with ``boxDisabled`` / ``boxUnavailable`` / ``boxRequiredHint`` strings Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(box): document the box.enabled toggle and gate behavior matrix - docker-compose: move ``langbot_box`` under compose profiles (``box`` and ``all``) so ``docker compose up`` no longer requires the sandbox container. Inline comment explains how to pair the profile choice with ``box.enabled`` so the langbot service does not thrash trying to reach a runtime that was never started - docs/review/box-architecture.md: - Annotate ``box.enabled`` in the config.yaml example, listing the exact side effects (no remote/stdio connect; tools/skills/MCP stdio off; reads still work) - Replace the bare compose snippet with the actual profile-driven invocation and the BOX__ENABLED pairing - New "Behavior matrix when disabled / connection failed" section: a single table mapping every consumer (native tools, activate/register_skill, stdio MCP, skill list/CRUD, pipeline AI config, extensions page, dashboard) to its disabled-state behavior, plus the legacy ``ap.box_service`` distinguisher note Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(pipeline-form): swap Box banner for field-level disable_if + tooltip The previous commit hard-coded a BoxUnavailableNotice banner above the ``local-agent`` stage card. 
That works, but it shouts at the user about every field in that stage when in reality only one field — ``box-session-id-template`` — depends on the sandbox. Use the dynamic-form schema's existing variable-injection mechanism (``__system.`` references via ``systemContext``) and add a sibling to ``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays visible, becomes inert, and an info icon next to its label exposes the reason on hover. The rest of the AI tab is left untouched. - entities/form/dynamic.ts: extend IDynamicFormItemSchema with ``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject`` - DynamicFormComponent: evaluate disable_if with the same resolver as show_if; OR the result into isFieldDisabled; render an Info tooltip trigger next to the label when the condition matches - ai.yaml metadata: attach disable_if (__system.box_available eq false) and a localized disabled_tooltip to box-session-id-template - PipelineFormComponent: drop the BoxUnavailableNotice import and the per-stage banner; pass ``systemContext={ box_available: boxAvailable }`` only for the local-agent stage so other stages aren't paying the re-render cost Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> feat(mcp): friendly UI message when stdio MCP refused by Box state Previously the MCP detail dialog dumped the raw RuntimeError text from ``_init_stdio_python_server`` — English-only, prefixed with "Failed after 4 attempts", and exposing internal config names. The retry wrapper also kept retrying a refusal that is deterministically going to fail again, polluting logs. Replace the raw text with a structured signal: - New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. 
The stdio refusal path sets it before raising and uses a short opaque discriminator (``box_disabled_in_config`` / ``box_unavailable``) as the message body — never user-facing - ``_lifecycle_loop_with_retry`` short-circuits on ``BOX_UNAVAILABLE``: surfaces the error immediately, no retries, no "Failed after N attempts" prefix. Silences the warning storm seen during smoke-testing - ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``, ``retry_count``, ``box_session_id``, ``box_enabled`` to match what the backend already returns in get_runtime_info_dict() - Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and ``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect ``error_phase === 'box_unavailable'`` and render a two-line localized notice: state line ("Box disabled / unreachable") plus remediation line ("enable Box or switch to http/sse") - 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get ``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``, ``mcp.boxStdioRefusedSuggestion`` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(mcp-web): block stdio MCP creation at the form when Box is unavailable When Box is disabled in config (``box.enabled = false``) or unreachable, saving a new MCP server in stdio mode produced one that could never start — the user would only learn that from the runtime error on the detail page. Stop the user before they save instead. Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog ``MCPFormDialog.tsx``) now: - Disable the ``stdio`` option in the mode select when Box is unavailable, with a small "(requires Box)" suffix so the reason is obvious. 
Existing stdio configs still display their current value - Show ``BoxUnavailableNotice`` inline under the mode select when the currently-selected mode is stdio and Box is unavailable, so editing a stale stdio config makes the cause visible - Disable the Save / Submit button while stdio is selected under that condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop so the parent ``MCPDetailContent`` can disable both its Submit and Save buttons. ``MCPFormDialog`` disables its Save button locally - Refuse the submit handler too (Enter-key path) with a toast carrying the same i18n message i18n: ``mcp.boxRequired`` (short tag in the disabled option) and ``mcp.stdioBlockedByBoxToast`` added to all 8 locales. Backend runtime gate (``_init_stdio_python_server`` refusal + ``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place as the last line of defence for API bypass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(web): prevent plugin config form overflow * refactor(skill): remove all local-filesystem fallbacks; Box is the sole source Skills now flow exclusively through the Box runtime. Every read and write method funnels through ``_box_service()``; when Box is unavailable (disabled in config, connection failed, or simply not installed) the operation either returns an empty surface (``list_skills`` → []) or raises with a clear ``Box runtime ... not initialised / disabled / unavailable: ...`` message via the new ``_require_box(action)`` helper. Why: the legacy local-fallback path scanned ``data/skills/``, but Box manages its own ``box.local.skills_root`` (default ``data/box/skills/``). The two diverging directories caused stale / phantom skill lists when Box flapped, and the local-fallback writes silently bypassed all the sandboxing the operator had configured. SkillService (``api/http/service/skill.py``): - New ``_require_box(action)`` returns the box service or raises a structured ValueError. 
``_require_box_for_write`` kept as alias - ``list_skills`` → returns [] when Box is down so the UI can render the disabled banner cleanly - ``get_skill`` / ``get_skill_by_name`` → return None - All read-file / write-file / scan-dir / create / update / delete / install / preview methods → ``_require_box`` then box delegate. Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview pipelines) removed entirely SkillManager (``pkg/skill/manager.py``): - ``reload_skills`` returns early with empty cache when Box is down. data/skills/ discovery loop removed - ``refresh_skill_from_disk`` now just reports cache presence; the on-disk re-parse is gone since Box is the only writer Tests: - Drop 11 obsolete test_skill_service.py tests that exercised the removed local-fallback paths (create/install/file/delete/update) - Add list-empty + read-refused tests; flip the legacy-allow test to legacy-refuses-too - Rewrite refresh_skill_from_disk test to match the new behaviour Several helper methods (_managed_skill_path, _resolve_skill_path, _preview_skill_candidates, _install_preview_candidates, etc.) are now unreachable; a follow-up commit will prune them so this diff stays reviewable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(skill): prune dead local-filesystem helpers left over from Box migration Follow-up to the Box-only refactor. The previous commit removed the local-fallback BRANCHES from every public method; this one removes the HELPERS those branches called, which are now unreachable. 
SkillService (service/skill.py): 787 → 449 lines Removed: scan_directory (sync), _read_skill_package, _write_skill_md, _resolve_create_field, _managed_skill_path, _managed_install_root_for_package, _normalize_package_root, _resolve_skill_path, _find_skill_entry, _discover_skill_directories, _safe_extract_zip, _extract_uploaded_skill_to_temp, _download_github_skill_to_temp, _resolve_github_source_root, _build_preview_target_dir, _preview_skill_candidates, _select_preview_candidates, _install_preview_candidates, _preview_source_root, _resolve_installed_skills, plus the module-level _FRONTMATTER_FIELDS and _build_skill_md. Kept (still needed by the surviving GitHub-import path): _download_github_asset, _download_github_skill_directory_as_zip, _find_github_skill_archive_entry, _copy_github_skill_directory_to_zip, _is_github_skill_md_url, _parse_github_skill_md_url, _resolve_github_skill_md_package_name, _validate_github_asset_url, _uploaded_skill_target_stem, _validate_skill_name. Imports dropped: shutil, tempfile, yaml, ....utils.paths. SkillManager (skill/manager.py): 187 → 88 lines Removed: get_managed_skills_root, _discover_skill_directories, _find_skill_entry, _load_skill_file, _normalize_package_root. Imports dropped: datetime, parse_frontmatter, paths. Tests: - test_skill_service.py: drop the 3 sync scan_directory tests + skill_service fixture + _create_skill_file helper - test_skill_tools.py: drop test_load_skill_file_success; rename TestSkillManagerPackageLoading → TestSkillManagerCache Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skill): re-inject skill index into local-agent system prompt The contributor's original PR (#1917) appended an ``Available Skills`` index to the system prompt before the LLM saw the user message, so the LLM could decide whether to activate a skill. 
``7145447b`` removed the text-marker activation flow and, together with it, the entire system prompt injection — but the Tool Call replacement only put the available skills inside the ``activate`` tool's description. In practice the LLM ignores tool descriptions for selection and goes straight to native tools, so user-visible skill activation silently broke. Restore the injection, adapted for the Tool Call era: - SkillManager regains ``get_skill_index(bound_skills)`` and ``build_skill_aware_prompt_addition(bound_skills)``. The addendum carries only ``name (display_name): description`` for each pipeline-visible skill plus one instruction line pointing at the ``activate`` tool. No SKILL.md contents — KV cache stays clean - PreProcessor appends the addendum to the first system message (or inserts a new one) of ``query.prompt.messages`` for the local-agent runner. Handles plain-string and ContentElement[] bodies. Skips cleanly when no skills are visible - 3 new test_preproc cases: injection happens, bound-skills subset honoured, empty addendum touches nothing. 280 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(box): downgrade get_status.available when backend probed unavailable Until now ``BoxService.get_status`` returned ``available: true`` whenever the runtime connector was healthy, even if the runtime itself reported ``backend: { available: false }`` (operator selected nsjail without the binary, Docker daemon crashed mid-session, E2B credentials wrong, ...). The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the top-level flag and showed "connected" while every actual call to native exec or skill management would fail. The native-tool loader already polled ``status.backend.available`` independently and hid its tools correctly, but every other consumer (dashboard banner, the disabled-state hint, the LLM-facing message) disagreed with it. 
Combine the two in the payload: ``available = self._available AND status.backend.available``. When ``backend.available`` is false we now also surface a ``connector_error`` that names the backend ("Configured sandbox backend \"nsjail\" is unavailable") so the dialog shows the actionable reason instead of an empty error pane. The detailed ``backend`` object is preserved unchanged for the dialog. Internal ``box_service.available`` (used by ``skill_service`` writes, ``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally NOT changed — it still tracks connector health only, so a backend blip does not trigger spurious reconnect loops. Tests: - ``test_get_status_downgrades_available_when_backend_dead`` — exercise the new branch (connector OK, backend.available=false → top-level available=false, connector_error mentions the backend name) - ``test_get_status_keeps_available_true_when_backend_ok`` — guard against regressing the happy path Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary): ``GET /api/v1/box/status`` now returns ``available: false`` with the named connector_error, instead of the previous misleading ``available: true``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(web): surface the specific Box failure reason in unavailable banner When Box is configured but the runtime reports its backend is dead (e.g. ``box.backend = nsjail`` but the binary is missing, or Docker daemon crashed), the backend now returns a structured ``connector_error`` like ``Configured sandbox backend "nsjail" is unavailable``. The previous notice only said "Box sandbox is unavailable" + a generic "enable Box" hint, hiding the actionable detail. - ``useBoxStatus``: derive ``reason`` from ``status.connector_error``. 
Only exposed for the failed-state (``hint === 'boxUnavailable'``), since the disabled-by-config message already carries its reason - ``BoxUnavailableNotice``: insert the reason as a small monospaced line between the state message and the action hint. The disabled variant is unchanged (operator chose the state) - Wire ``reason`` through every existing call site (Skills page + detail, PipelineExtension, both MCP forms). Old unused ``context`` prop dropped Net layout (3 lines, still compact): ⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ... Configured sandbox backend "nsjail" is unavailable This feature requires the Box runtime. Enable it in config ... Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: reconcile master's unit tests with feat/sandbox refactors The merge from master brought in new unit tests that target pre-refactor APIs on feat/sandbox. Reconcile each: - factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills dict + inert prompt-addition builder) and a Mock pipeline_service so the PreProcessor skill-index injection branch can run end-to-end in tests. - pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so pipeline.stage is fully initialised before any individual stage test (preproc, longtext, ...) tries to lazy-load it. Without this preload, running test_preproc.py in isolation hit a circular-import error via the stage -> app -> pipelinemgr -> stage chain. - provider/test_tool_manager.py: ToolManager now probes four loaders (native -> plugin -> mcp -> skill). Inject inert native + skill mocks in the execute_func_call fixture and assert all four shutdowns fire. - utils/test_paths.py: drop the three cwd-dependent _check_if_source_install cases. The refactor walks Path(__file__).resolve().parents looking for pyproject.toml + main.py, so cwd no longer factors in and there's no file read to mock-fail. The positive case and caching test still apply. 
- utils/test_version.py: delete entirely. is_newer and compare_version_str were removed when VersionManager was refactored to use the Space API for release checks (`1b4107a9`); the tests targeted a surface that no longer exists. * refactor(box): launch box runtime via the lbp CLI subcommand Mirror the plugin runtime: box is now started through the same CLI entry point (langbot_plugin.cli) instead of the box module directly. - docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box` (WebSocket is the default transport, no flag needed — matches `rt`). - box/connector.py: both subprocess launch sites (_start_local_stdio and the Windows _start_subprocess_then_ws path) invoke `langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport. - docs/review: update stale `-m langbot_plugin.box[.server]` references. Pairs with the SDK change that removes box's direct-launch entry points (python -m langbot_plugin.box / .box.server) and the legacy --mode flag. * chore: bump langbot-plugin beta 1 * fix(ci): resolve langbot-plugin from PyPI and clear lint failures CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image. Root causes and fixes: - pyproject.toml had a [tool.uv.sources] editable override pinning langbot-plugin to ../langbot-plugin-sdk. That path only exists in a paired local checkout, so `uv sync` failed on every CI runner ("Distribution not found"). Remove the override and regenerate uv.lock so langbot-plugin==0.4.0b1 resolves from PyPI, matching master. - tests/integration/api/test_pipelines.py: the pipeline extensions endpoint now calls ap.skill_service.list_skills(); add the missing skill_service mock to the fake_pipeline_app fixture (the test came from master, the endpoint change from feat/sandbox). - Apply ruff format to three src files and prettier to three web files that had committed formatting drift, failing `ruff format --check` and `pnpm lint`. 
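A minimal sketch of the ``get_status`` downgrade described in the ``fix(box)`` entry above, where the top-level flag becomes ``connector AND backend.available``. The function name ``build_status_payload`` and the ``name`` key on the backend object are illustrative assumptions, not the actual ``BoxService`` code; only the payload fields ``available``, ``connector_error`` and ``backend`` and the error wording come from the commit message.

```python
from typing import Any, Dict, Optional


def build_status_payload(
    connector_ok: bool,
    backend: Optional[Dict[str, Any]],
    connector_error: Optional[str] = None,
) -> Dict[str, Any]:
    """Combine connector health and backend health into one status payload.

    Hypothetical helper: the real service may structure this differently.
    """
    backend = backend or {}
    backend_ok = bool(backend.get("available", False))

    error = connector_error
    if connector_ok and not backend_ok:
        # Name the backend so the UI can show an actionable reason.
        # The "name" key is an assumption about the backend object's shape.
        name = backend.get("name", "unknown")
        error = f'Configured sandbox backend "{name}" is unavailable'

    return {
        # Top-level flag is true only when both layers are healthy.
        "available": connector_ok and backend_ok,
        "connector_error": error,
        # The detailed backend object is passed through unchanged.
        "backend": backend,
    }
```

Keeping the connector-only flag separate from this payload, as the commit message describes, means a backend blip changes what the dashboard shows without triggering reconnect logic.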
* chore: bump beta version * docs: remove BOX_BACKEND override reference * fix(pipelines): stop attributing dashboard debug WS to bound web_page_bot The dashboard pipeline-debug WebSocket (/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket (/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths, but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when the same pipeline happened to be bound to a web_page_bot, passed that bot as `owner_bot` into `handle_websocket_message`. The adapter then used the page bot's listeners + adapter for the request, so debug sessions were logged as "page bot" activity in the dashboard. Debug sessions must always run under the built-in websocket_proxy_bot. Remove `_find_owner_bot`, drop the `owner_bot` parameter from the debug-path `_handle_receive`, and call `handle_websocket_message` without it so the adapter takes its default proxy-bot branch. The embed handler still resolves and passes its `runtime_bot` for the page-bot path, so attribution there is unchanged. * fix(plugin): install marketplace MCP from canonical mode + extra_args _install_mcp_from_marketplace read the dropped `mcp_data.config` field and reconstructed mode/extra_args by guessing from the URL — which lost stdio's command/args/env/box entirely, so stdio MCP installs from the marketplace always failed. Use the Space record's canonical `mode` and `extra_args` directly (the same shape stored in mcp_servers), and gate the install on `mode` instead of the removed `config`. After a successful install, best-effort POST to the marketplace install endpoint to bump install_count. * feat(web): show recommendation lists in plugin market; mixed-type icons The marketplace recommendation lists (curated rows from Space) were never mounted in the plugin market page. Wire them in: - fetch recommendation lists on mount and render them above the extension grid, only when no search/filter is active. 
Recommendation lists now mix plugins, MCPs and skills, so resolve each card's icon by type (plugin / mcp / skill marketplace icon URL) instead of always using the plugin icon endpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): auto-open install dialog from one-click deep link Accept a deep link from LangBot Space's one-click install: /home/add-extension?install=1&extension_type=<plugin|mcp|skill>&author=&name=&version= On mount, populate the install info, open the confirm dialog directly, and strip the params from the URL. Reuses the existing marketplace install flow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat: push marketplace URL to runtime; fix market client base race - On connecting to the plugin runtime, push the configured space.url via the new SET_RUNTIME_CONFIG action so the runtime downloads plugins from the same Space, instead of relying on its own CLOUD_SERVICE_URL env/default. Wrapped in try/except so an older SDK without the action degrades gracefully. - web: the plugin market fetched recommendation lists (and listings) via the sync cloud client before its baseURL was resolved from system info, so it hit the default space.langbot.app. Await getCloudServiceClient() before the initial fetches and for the recommendation list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(web): don't show MCP "connection failed" while still connecting The MCP status UI rendered "连接失败" ("connection failed") for any non-connected state, so during a normal connection attempt the subtitle showed "连接失败" while the status pill below it showed "连接中..." ("connecting..."), which is contradictory. Only treat an explicit ERROR (or box-unavailable) status as failed; a CONNECTING or initial/unresolved status now shows "连接中" ("connecting"). Applied to the MCP detail form (subtitle + StatusDisplay) and the MCP server card. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): type-aware install dialog + refresh sidebar after install The marketplace install confirm dialog was hardcoded to "安装插件 / 确定要安装插件 X 吗" for every type. Make it type-aware (plugin / MCP / skill) and show more info: type chip, author/name id, and version when present. Also refresh all sidebar extension lists (plugins, MCP servers, skills) when an install task completes, so the newly-installed extension appears immediately regardless of type (previously only refreshPlugins ran). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): richer install dialog (icon + name + description), drop redundant type row The install dialog already states the type in its title, so the "类型" row was redundant. Replace the info box with the extension's icon (avatar), display name, author/name id + version, and description — built from the PluginV4 for in-app installs and from the icon endpoint by type for the one-click deep link. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(web): TDZ crash in add-extension (installIconURL before installInfo) installIconURL was computed above the useState declaration of installInfo, causing "Cannot access 'installInfo' before initialization" (500) on the add-extension page. Move the computation below the state declarations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): redesign install-progress dialog for MCP/skill The progress dialog showed plugin-only stages (download + dependency install) for every type. MCP/skill have no such steps, so show a single "installing → done/failed" row for them (MCP: adding & connecting the server; skill: installing the package) while keeping the detailed download/deps stages for plugins. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(web): add missing market.componentName i18n keys The marketplace component filter (and component badges) used market.componentName.{Tool,Command,EventListener,KnowledgeEngine,Parser,Page} but those keys only existed under plugins.componentName, so the market UI showed raw keys. Add a componentName block to the market namespace (zh-Hans + en-US; other locales fall back to zh-Hans). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(web): sidebar extensions refresh button + full-name tooltip - Add a refresh button to the installed-extensions category header in the sidebar; it re-fetches plugins + MCP servers + skills and spins while loading. - The sidebar item tooltip now shows the extension's full name (with the description below when present), so truncated MCP/extension names are readable on hover. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(plugin-market): rename component filter to "插件组件" with hint tooltip + persist filters - Rename the in-app plugin market component filter label to "插件组件" / "Plugin Component" - Add an Info icon tooltip explaining what plugin components are (Tool / Command / EventListener, etc.) - Persist filter selections (type / component / tags / sort) in localStorage so they survive reloads; restored on mount (URL type param still wins) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(plugin-market): restore missing "页面"(Page) component filter option The market component-filter list on this branch was a diverged rewrite that dropped the Page component kind master had added. The i18n key (market.componentName.Page) already existed; re-add the Page entry to the componentOptions list so plugins providing Page components can be filtered. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(i18n): reword plugin component filter hint Drop the redundant "插件组件是" lead-in and mention that components extend LangBot's capabilities; mirror the wording in en-US. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(i18n): backfill missing market/addExtension keys in 6 locales check-i18n surfaced that market.componentName., market.filterByComponentHint and the addExtension.install keys existed only in en-US/zh-Hans. Backfill them for es-ES, ja-JP, ru-RU, th-TH, vi-VN and zh-Hant (reusing each locale's existing component-name translations) and align the filterByComponent label with the new "Plugin Component" wording. check-i18n now passes for all locales. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * i18n(plugins): relabel "group by type" as "group by format" The installed-extensions grouping is by extension format (plugin / MCP / skill), so rename the toggle label accordingly across all 8 locales (key unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(plugin-market): cursor-pointer on tag filter trigger The TagsFilter Select trigger used the default cursor; add cursor-pointer so the tag filter is clearly clickable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(sidebar): show edition badge (Community / Cloud) in logo area Add a small badge next to the LangBot name in the sidebar header that reflects systemInfo.edition: a neutral "Community" badge for the community edition and a blue "Cloud" badge for the cloud edition. Adds sidebar.editionCommunity / sidebar.editionCloud across all 8 locales. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * i18n(sidebar): unify zh-Hans cloud edition label to 云端版 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sidebar): edition badge - drop hover, use "Cloud" in all locales The edition badge is not interactive, so remove the hover background on the cloud badge. Also use the literal "Cloud" label uniformly across all locales instead of localized variants (云端版/クラウド版/...). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(box): cap tool-call loop and run workspace-quota walk off the event loop Two robustness fixes that bite under normal sandbox usage (not just attack), hardening the self-hosted community edition before release: - localagent: cap the tool-call loop at MAX_TOOL_CALL_ROUNDS (128). A looping or adversarial model could otherwise emit tool calls indefinitely (each potentially a sandbox exec), producing a non-terminating request and runaway cost. The cap is generous enough not to interrupt legitimate multi-step agentic workflows. - box.service: make _enforce_workspace_quota async and run the recursive workspace scan via asyncio.to_thread. It ran on every quota-enforced exec and a large workspace would block the whole asyncio runtime (all bots/pipelines). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(review): refresh box docs; trim issue list to SaaS blockers only Community self-hosted edition is release-ready, so the box review docs are updated to current state (date 2026-06-02 + status note) and box-issues.md is rewritten to keep only the SaaS / multi-tenant / network-exposed release blockers (S1-S8): unauthenticated control plane, no per-pipeline exec authorization, unbounded sessions + no reaper, no kernel-level quota, mount validation gaps (/ + extra_mounts), missing container hardening, lock-around-cold-start, and the lower-severity follow-ups. 
Resolved items (tool-call loop cap, async quota scan, host_path mount allowlist, _is_path_under dedup) moved to a short "resolved before community release" record; community-only and pure-cleanup items dropped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(deps): pin langbot-plugin to 0.4.0 Track the stable SDK release (0.4.0b1 -> 0.4.0); regenerate uv.lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: WangCham <651122857@qq.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: fdc310 <82008029+fdc310@users.noreply.github.com> Co-authored-by: Junyan Qin <rockchinq@gmail.com>	2026-06-03 11:12:39 +08:00
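The two resolved robustness items named above (tool-call loop cap and async quota scan) can be sketched roughly as follows. This is an illustration under stated assumptions, not the LangBot implementation: ``run_tool_loop``, ``scan_workspace_bytes``, ``enforce_workspace_quota`` and the ``step`` callback are hypothetical names; only the constant ``MAX_TOOL_CALL_ROUNDS = 128`` and the use of ``asyncio.to_thread`` come from the commit message.

```python
import asyncio
import os
from typing import Awaitable, Callable

MAX_TOOL_CALL_ROUNDS = 128  # value quoted in the commit message


def scan_workspace_bytes(root: str) -> int:
    """Blocking recursive walk that sums file sizes under ``root``."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


async def enforce_workspace_quota(root: str, quota_bytes: int) -> None:
    """Run the blocking scan in a worker thread so the event loop stays free."""
    used = await asyncio.to_thread(scan_workspace_bytes, root)
    if used > quota_bytes:
        raise RuntimeError(f"workspace uses {used} bytes, quota is {quota_bytes}")


async def run_tool_loop(
    step: Callable[[], Awaitable[bool]],
    max_rounds: int = MAX_TOOL_CALL_ROUNDS,
) -> int:
    """Call ``step`` until it returns False or the round cap is reached.

    ``step`` is a hypothetical callback that performs one model turn and
    returns True when the model requested another tool call.
    Returns the number of tool-call rounds that were executed.
    """
    for completed in range(max_rounds):
        if not await step():
            return completed
    return max_rounds  # cap hit: stop instead of looping forever
```

With this shape, a model that never stops requesting tools ends after ``max_rounds`` turns, and a large workspace only occupies a worker thread while it is being scanned.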
huanghuoguoguo	17bbc8bf10	Feat/test build (#2174 ) * fix(ci): update unit-test workflow paths to match current source layout Replace stale pkg/ filter with src/langbot/ and add uv.lock. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(tests): update README to reflect current test layout - Fix stale paths: tests/pipeline → tests/unit_tests/pipeline - Update CI Python versions: 3.11, 3.12, 3.13 - Add test directory structure for box, config, platform, plugin, provider, storage - Document pytest markers and uv commands - Mention planned E2E tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add shared test factories package Create tests/factories/ with reusable test factories: - FakeApp: mock application with all dependencies - Message chains: text_chain, mention_chain, image_chain - Query factories: text_query, group_text_query, command_query, etc. No test changes - maintains backward compatibility. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add fake provider factory Add tests/factories/provider.py with: - FakeProvider: deterministic fake LLM provider - Error simulation: timeout, auth, rate-limit, malformed - Request capture for assertions - fake_model: mock model with attached provider Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add fake platform factory Add tests/factories/platform.py with: - FakePlatform: simulated platform adapter - Inbound message construction: friend/group/image - Mention-bot flag simulation - Outbound message capture for assertions - Streaming output support simulation - Send failure simulation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add comprehensive message/query factories Extend tests/factories/message.py with: - file_query: file attachment query - unsupported_query: unknown message segment - voice_query: audio/voice query - at_all_query: group @All mention - query_with_session: query with session object - 
query_with_config: query with custom pipeline config Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add fake message flow smoke test Create tests/smoke/test_fake_message_flow.py: - TestFakeMessageFlow: factory verification tests - TestMessageFlowIntegration: minimal flow smoke test - Tests FakeApp, FakeProvider, FakePlatform, query factories - Verifies LANGBOT_FAKE_PONG marker response - Captures outbound messages for assertions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add developer test-quick command Add scripts/test-quick.sh and Makefile with: - test-quick: runs ruff check + unit tests + smoke tests - No real provider keys or platform accounts required - Suitable for local branch self-test Update tests/README.md: - Document test-quick command - Document test factories package - Add smoke tests and factories directory structure Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): make test-quick reliable as developer gate Fixes for D-001 acceptance issues: 1. test-quick.sh: use set -euo pipefail, uv run ruff, no tail pipe 2. Remove unused imports in factories (app.py, platform.py, provider.py) 3. Fix unused variable in smoke test 4. Add noqa: E402 to test_n8nsvapi.py lazy imports 5. Update smoke test docs: "minimal fake flow" not full pipeline Now test-quick is a reliable gate: lint failures exit 1, test failures propagate. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): add preproc and taskmgr unit tests U-001: Pipeline Preprocessor tests - Normal text message processing - Empty message handling - Image segment with/without vision model - Model selection and fallback - Variable extraction U-004: Core Task Manager tests (pattern-based) - Task creation and tracking patterns - Task cancellation patterns - Scope-based cancellation - Task type filtering - Pruning completed tasks - Wait all tasks Taskmgr tests use pattern-based approach to avoid circular import in source code (taskmgr → app → http_controller → migration → taskmgr). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): add config loader unit tests U-005: Config Loader tests - Valid YAML config loading - Valid JSON config loading - Invalid YAML/JSON error behavior - Missing config file creation from template - Template completion for missing keys - ConfigManager load/dump operations - Exists check for both YAML and JSON All tests use tmp_path fixture, no real project config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): add chat and command handler pattern tests U-002: Chat Handler tests (pattern-based) - Normal message event emission pattern - prevent_default handling - User message alteration pattern - Runner selection pattern - Streaming/non-streaming response patterns - Exception handling modes (show-error, show-hint, hide) - Message history update pattern - Telemetry payload pattern U-003: Command Handler tests (pattern-based) - Command parsing and text extraction - Event creation pattern - Privilege/admin check pattern - Command result handling (text, error, image) - prevent_default handling - String truncation helper Uses pattern-based testing to avoid circular import issues in source code. Direct imports of handler modules trigger circular import chain. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style: fix unused imports after ruff auto-fix Remove unused imports in test files: - test_config_loader.py: remove unused os - test_taskmgr.py: remove unused Mock - test_preproc.py: remove unused unsupported_query, image_chain Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): improve taskmgr tests to test real classes U-004 improved: Tests now import and test actual classes: - TaskContext: new(), trace(), to_dict(), placeholder() - TaskWrapper: task creation, context, exception/result capture, cancel, to_dict - AsyncTaskManager: create_task, create_user_task, cancel_task, cancel_by_scope - Task pruning behavior Uses pre-mocking technique: - Mock langbot.pkg.core.app before import (breaks circular chain) - Mock langbot.pkg.core.entities with proper Enum All 24 tests now test real class behavior, not patterns. taskmgr.py coverage should improve significantly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(test): consolidate FakeApp and add sys.modules isolation utility - Extract tests/utils/import_isolation.py with isolated_sys_modules context manager - Extend tests/factories/app.py FakeApp with handler-specific attributes - Refactor test_chat_handler.py to use centralized FakeApp and cached imports - Refactor test_command_handler.py with mock_execute_factory fixture - Refactor test_smoke.py to move import-time sys.modules manipulation into fixture - Add SQLite migration integration tests (G-002) - Add HTTP API smoke integration tests (G-005) - Update CI workflow to call pytest for SQLite migrations (G-004) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add developer quality gate consolidation (G-007) - Add scripts/test-integration-fast.sh for fast integration tests - Add scripts/test-coverage.sh with 12% baseline threshold - Update Makefile with test-integration-fast, test-coverage, test-all-local - Update CI workflow with integration and coverage 
jobs - Add smoke marker to pytest.ini - Update tests/README.md with quality gate layers documentation - Add tests/integration/pipeline/ for pipeline stage-chain tests Quality gate layers: - Quick: ruff + unit + smoke (~2 min) - Fast Integration: SQLite/API/Pipeline (~3 min) - Coverage: 12% threshold gate (~8 min) - Full Local: all three combined Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add PostgreSQL migration slow integration tests (G-003) - Add tests/integration/persistence/test_migrations_postgres.py - All tests marked with @pytest.mark.slow - Tests skip when TEST_POSTGRES_URL is not set (no local PostgreSQL) - Database isolation via clean_tables and clean_alembic_version fixtures - Update CI workflow to use pytest instead of inline Python script - Remove TODO(G-003) comment - Update tests/README.md with PostgreSQL test documentation Covered scenarios: - Baseline stamp sets revision - Upgrade from baseline to head - Upgrade idempotent - Get current on unstamped DB returns None Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): Phase 1.5 coverage expansion - COV-001 to COV-013 Coverage baseline raised from 13.65% to 26% (+12.35%) Gate raised from 12% to 18% Tasks completed: - COV-001: Command system unit tests (100% coverage) - COV-002: API service unit tests batch 1 (user/apikey/model/provider) - COV-003: Provider model manager unit tests - COV-004: Pipeline remaining stage tests (aggregator/cntfilter/longtext/msgtrun) - COV-005: Storage and utils coverage pass - COV-006: Gate ratchet 12%→15% - COV-007: Gate ratchet 15%→18% - COV-008: API service batch 2 (bot/pipeline/webhook/space/maintenance/mcp) - COV-009: Blocked - API controller circular import issue documented - COV-010: Plugin runtime unit tests (+0.08%) - COV-011: RAG and vector unit tests (+0.68%) - COV-012: Core boot and migration unit tests - COV-013: Provider requester logic unit tests (+0.62%) Key additions: - tests/utils/import_isolation.py: 
sys.modules isolation for circular imports - Provider requester mock tests: proved HTTP-dependent code can be tested locally - Vector filter utilities: 100% coverage on pure functions - API services: fake persistence pattern for unit testing Blocked issue COV-009 documented in langbot-test-plan/1.5/issues/ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(phase1): add unit tests for telemetry, plugin, rag, persistence Add initial unit tests for Phase 1 of test coverage improvement: - telemetry: test initialization, payload sanitization, early returns (14.3% → 62.9%) - plugin: test _parse_plugin_id static method - rag: test _to_i18n_name static method - persistence: test serialize_model with datetime handling Overall core coverage: 41.9% → 42.2% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(phase2): add unit tests for core, persistence, plugin, utils - Add test_handler_helpers.py for plugin handler helpers (7 tests) - Add test_mgr_methods.py for persistence manager (5 tests) - Add test_app_config_validation.py for core app config (12 tests) - Add test_knowledge_service.py for API knowledge service (22 tests) - Add test_kbmgr.py for RAG knowledge base manager (39 tests) - Add test_survey_manager.py for survey manager (22 tests) - Add test_connector_methods.py for plugin connector (24 tests) - Add test_funcschema.py for utils function schema (9 tests) - Add test_platform.py for utils platform detection (7 tests) - Add test_extract_deps.py for plugin deps extraction (7 tests) - Add test_database_decorator.py for persistence decorator (7 tests) - Add test_load_config.py for core config loading (19 tests) - Add COVERAGE_EXCLUSIONS.md documenting external adapter exclusions - Fix test_chat_session_limit.py path for portability Coverage: core 28% → 30%, persistence 24% → 24.4%, plugin 27% → 28% Total: 1082 tests passed, core module coverage 45.5% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(integration): add API controller 
integration tests - Add test_pipelines.py (10 tests) covering pipelines CRUD operations - GET/POST/PUT/DELETE on /api/v1/pipelines - Extensions endpoint - Metadata endpoint - Coverage: pipelines controller 27% → 80% - Add test_providers.py (10 tests) covering provider/model management - Provider CRUD with model counts - LLM model CRUD - Coverage: providers controller 23% → 81%, models 29% → 45% Tests use Quart TestClient with mocked services for real HTTP behavior without external dependencies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(integration): add knowledge, bots, and model endpoints tests - Add test_knowledge.py (10 tests) covering knowledge base management - CRUD operations on /api/v1/knowledge/bases - Files management endpoints - Retrieve endpoint with validation - Coverage: knowledge/base.py 26% → 91% - Add test_bots.py (9 tests) covering bot management - CRUD operations on /api/v1/platform/bots - Logs endpoint - Send message endpoint with validation - Coverage: platform/bots.py 24% → 87% - Extend test_providers.py (+4 tests) for embedding/rerank models - Embedding models CRUD - Rerank models CRUD - Coverage: provider/models.py 29% → 60% Total integration tests: 53 (smoke 12 + pipelines 10 + providers 14 + knowledge 10 + bots 9) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(integration): add embed and monitoring endpoint tests Add integration tests for embed widget and monitoring API endpoints: - test_embed.py: 15 tests for widget.js, logo, turnstile, messages, reset, feedback - test_monitoring.py: 15 tests for overview, messages, llm-calls, sessions, errors, export Coverage improvements: - embed.py: 17% → 56% - monitoring.py: 17% → 93% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(e2e): add minimal startup E2E tests Add E2E tests for LangBot startup flow: - tests/e2e/utils/config_factory.py: minimal config generation - tests/e2e/utils/process_manager.py: LangBot subprocess management - 
tests/e2e/conftest.py: E2E fixtures (session-scoped process) - tests/e2e/test_startup.py: 12 tests for startup verification Tests verify: - boot.py + stages execution - database initialization (SQLite) - API availability - migrations applied Uses embedded databases (SQLite, Chroma) - no external dependencies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(quality): fix fake tests and add missing coverage P0 fixes: - telemetry: rewrite fake tests with real behavior verification (25 tests) - config: delete copied-source tests, use proper imports (2 deleted) - persistence: fix try-except pass to verify specific errors P1 fixes: - pipeline: add real FixedWindowAlgo tests instead of mocks (12 tests) - provider: add SessionManager and ToolManager tests (25 tests) - storage: add S3StorageProvider tests with moto mock (16 tests) - plugin: add handler action tests for setting inheritance (15 tests) - rag: add file storage and ZIP processing tests (21 tests) - vector: add VDB filter conversion tests (30 tests) P2 fixes: - pipeline/msgtrun: strengthen assertions for exact message count - api: add response structure validation in integration tests New test files: - provider/test_session_manager.py - provider/test_tool_manager.py - storage/test_s3storage.py - plugin/test_handler_actions.py - rag/test_file_storage.py - vector/test_vdb_filter_conversion.py Source code bugs documented: - provider: TokenManager.next_token() ZeroDivisionError - telemetry: send_tasks class variable shared state - command: empty command IndexError, unused parameters - utils: funcschema KeyError - entity: vector.py independent declarative_base Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(test): update coverage stats and test structure - Update coverage from 22% to 30% - Add new test files to structure: - provider: session_manager, tool_manager - storage: s3storage - plugin: handler_actions - rag: file_storage - vector: vdb_filter_conversion - telemetry: rewritten tests 
- Update module coverage percentages Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: add 105 new unit tests for untested core functionality Add comprehensive tests for B-class issues (core functionality untested): Pipeline: - test_pool.py: QueryPool ID generation, caching, async context (12 tests) - test_ratelimit.py: Fixed timing-sensitive test tolerance - test_pipelinemgr.py: Use real Pydantic StageProcessResult instead of Mock Utils: - test_version.py: Version comparison functions (20 tests) - test_logcache.py: Log page management and retrieval (18 tests) - test_httpclient.py: HTTP session pool management (10 tests) - test_proxy.py: Proxy configuration from env and config (10 tests) - test_image.py: URL parsing and base64 extraction (12 tests) - test_pkgmgr.py: Pip command generation (8 tests) Discover: - test_engine.py: I18nString, Metadata, Component manifest (15 tests) Test count: 1193 → 1298 (+105 tests) Note: Some B-class issues cannot be tested due to circular import bugs filed as GitHub issues #2175 (pipeline) and #2176 (persistence). * test: tighten phase 1 coverage contracts * test: align ci integration isolation --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-16 12:05:54 +08:00
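The pre-mocking approach this commit uses against circular imports (stub a module in ``sys.modules`` before importing the code under test, then restore) can be sketched as below. This is an assumption-laden illustration, not the contents of ``tests/utils/import_isolation.py``: the name ``stubbed_sys_modules``, its signature, and the module name ``heavy_app`` are all hypothetical.

```python
import contextlib
import sys
import types
from typing import Iterator, Mapping, Optional


@contextlib.contextmanager
def stubbed_sys_modules(
    stubs: Optional[Mapping[str, types.ModuleType]] = None,
) -> Iterator[None]:
    """Temporarily install stub modules, then restore ``sys.modules``."""
    saved = dict(sys.modules)  # snapshot of the current import table
    sys.modules.update(stubs or {})
    try:
        yield
    finally:
        # Drop anything imported (or stubbed) inside the block...
        for name in list(sys.modules):
            if name not in saved:
                del sys.modules[name]
        # ...and put back any entries the stubs replaced.
        sys.modules.update(saved)


# Usage: a stand-in for a module that would otherwise start a circular chain.
stub = types.ModuleType("heavy_app")  # hypothetical module name
stub.Application = object

with stubbed_sys_modules({"heavy_app": stub}):
    import heavy_app  # resolves to the stub, no real import happens

    inside_ok = heavy_app.Application is object

leaked = "heavy_app" in sys.modules
```

Restoring the snapshot on exit keeps one test's stubs from leaking into the next, which matters when several test modules pre-mock the same package.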
huanghuoguoguo	885320e9ae	fix(utils): preserve QQ image URL scheme (#2188)	2026-05-16 11:29:31 +08:00
huanghuoguoguo	ed02ac4710	fix(utils): classify runner URLs safely (#2191) * fix(utils): classify runner URLs safely * fix(utils): keep runner parse failures unknown	2026-05-16 11:28:34 +08:00
huanghuoguoguo	e4841edbaf	fix pkgmgr install requirements default (#2190)	2026-05-16 11:26:49 +08:00
huanghuoguoguo	0a669c7016	fix(utils): handle missing funcschema parameter docs (#2186)	2026-05-16 11:20:32 +08:00

6 Commits