Feat/sandbox (#2072)

* feat: add mcp and skills

* feat: add filter

* feat: modify frontend

* feat(box): add sandbox_exec tool loop for local-agent calculations

* feat(box): add host workspace mounting and sandbox_exec guidance

* feat(box): add BoxProfile with resource limits and improved output truncation

  - Implement head+tail output truncation (60/40 split) so LLM sees both
    beginning and final results; add streaming byte-limited reads in backend
    to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
  - Define BoxProfile model with locked fields and max_timeout_sec clamping
  - Add four built-in profiles: default, offline_readonly, network_basic,
    network_extended with differentiated resource and security constraints
  - Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
    read_only_rootfs) and pass corresponding container CLI flags
    (--cpus, --memory, --pids-limit, --read-only, --tmpfs)
  - Profile loaded from config (box.profile), applied in service layer
    before BoxSpec validation; locked fields cannot be overridden by
    tool-call parameters

* feat(box): add obs

* refactor(box): unify box service lifecycle and local runtime
  management

* refactor(box): remove legacy in-process runtime code and clean up smells

After the architecture settled on always using an independent Box Runtime
service, several pieces of compatibility code and design shortcuts were
left behind. This commit cleans them up:

- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from
  production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd`
  default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config
  access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing
  tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls
  `box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose
  (auto-detected by code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add test

* fix: fix box intergration test

* feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security

  ## Summary

  When Podman/Docker is available, all stdio-mode MCP servers now automatically
  run inside Box containers with dependency installation, path rewriting, and
  lifecycle management. When no container runtime exists, LangBot starts normally
  and stdio MCP falls back to host-direct execution.

  ## What changed

  ### MCP stdio → Box integration (mcp.py)
  - Add `MCPServerBoxConfig` pydantic model for structured box configuration
    with validation and defaults (network, host_path_mode, timeouts, resources)
  - Auto-infer `host_path` from command/args with venv detection: recognizes
    `.venv/bin/python` patterns and walks up to the project root
  - Rewrite host paths to container `/workspace` paths transparently
  - Replace venv python commands with container-native `python`
  - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
    `pip install` inside the container before starting the MCP server
  - Copy project to `/tmp` before install to handle read-only mounts
  - Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
  - Add Box managed process health monitoring (poll every 5s)
  - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
    block of `_lifecycle_loop`, covering all exit paths
  - Fix retry logic: `_ready_event` is only set after all retries exhaust
    or on success, not on first failure
  - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`

  ### Box security (security.py — new)
  - `validate_sandbox_security()` blocks dangerous host paths:
    `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
    docker.sock, podman socket
  - Called at the start of `CLISandboxBackend.start_session()`

  ### Box models (models.py)
  - Add `BoxHostMountMode.NONE` — skips volume mount entirely
  - Adjust `validate_host_mount_consistency` to allow arbitrary workdir
    when `host_path_mode=NONE`

  ### Box backend (backend.py)
  - Add `validate_sandbox_security()` call in `start_session()`
  - Add `langbot.box.config_hash` label on containers for drift detection
  - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
  - Add `cleanup_orphaned_containers()` to base class (no-op default) and
    CLI implementation (single batched `rm -f` command)

  ### Box runtime (runtime.py)
  - Call `cleanup_orphaned_containers()` during `initialize()` to remove
    lingering containers from previous runs

  ### Box service (service.py)
  - Graceful degradation: `initialize()` catches runtime errors and sets
    `available=False` instead of crashing LangBot startup
  - Add `available` property and guard on `execute_sandbox_tool()`
  - Add `skip_host_mount_validation` parameter to `build_spec()` and
    `create_session()` — MCP paths are admin-configured and trusted,
    bypassing `allowed_host_mount_roots` restrictions meant for
    LLM-generated sandbox_exec commands

  ### Default behavior
  - stdio MCP servers automatically use Box when `box_service.available`
    is True (Podman/Docker detected); no explicit `box` config needed
  - When no container runtime exists, falls back to host-direct stdio
  - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
    (for site-packages), `host_path_mode=ro`, `startup_timeout=120s`

  ### Tests
  - `test_box_security.py`: blocked paths, safe paths, subpath rejection
  - `test_mcp_box_integration.py`: config model, path rewriting, venv
    unwrap, host_path inference, payload building, runtime info, box
    availability check
  - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests

* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests

  ## Changes

  ### Precise orphan container cleanup
  - Runtime generates a unique instance_id on startup
  - Every container gets a `langbot.box.instance_id` label
  - `cleanup_orphaned_containers()` only removes containers from
    previous instances, preserving containers owned by the current one
  - Containers from older versions (no label) are also cleaned up
  - `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
    a no-op default method, removing hasattr duck-typing

  ### Fine-grained MCP error classification
  - New `MCPSessionErrorPhase` enum with 7 phases: session_create,
    dep_install, process_start, relay_connect, mcp_init, runtime,
    tool_call
  - Each phase in `_init_box_stdio_server()` sets the error phase
    before re-raising, enabling precise failure diagnosis
  - `retry_count` tracked across retry attempts
  - `get_runtime_info_dict()` exposes `error_phase` and `retry_count`

  ### GET /v1/sessions/{id} API
  - `BoxRuntime.get_session()` returns session details including
    managed process info when present
  - `handle_get_session` HTTP handler + route in server.py
  - `BoxRuntimeClient.get_session()` abstract method + remote impl

  ### stdio defaults to Box when runtime is available
  - `_uses_box_stdio()` checks `box_service.available` instead of
    requiring explicit `box` key in server_config
  - `BoxService.initialize()` catches runtime errors gracefully,
    sets `available=False` instead of crashing LangBot startup
  - When no container runtime exists, stdio MCP falls back to
    host-direct execution

  ### Code quality (from /simplify review)
  - Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
  - Removed dead `_box_network_mode()` method and unused `bc` variable
  - Fixed broken import `from ....box.models` → `from ...box.models`
  - Cached `_resolve_host_path()` result — computed once, passed through
  - Config hash now includes `host_path` field
  - Batched orphan cleanup into single `rm -f` command

  ### Session leak fix
  - `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
    finally block, covering all exit paths (normal shutdown, error,
    retry, final failure)

  ### Integration tests
  - 6 end-to-end tests covering managed process lifecycle, WebSocket
    stdio bidirectional IO, session cleanup verification, single
    session query, process exit detection, and orphan cleanup safety

* refactor: use rpc

* fix: import

* refactor(box): clean up sandbox subsystem code quality and efficiency

  - Fix O(n²) stderr trimming in runtime.py with running length tracker
  - Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
    unused config_hash computation, unused imports
  - Deduplicate connection callback in BoxRuntimeConnector, parse URL once
  - Use enum comparison instead of stringly-typed spec.network.value check
  - Replace manual _result_to_dict/_session_to_dict with model_dump()
  - Cache NativeToolLoader tool definition and sandbox system guidance
  - Extract _is_path_under() helper to eliminate duplicated path checks
  - Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
  - Add JSON startswith guard in logging_utils to skip futile json.loads
  - Fix ruff lint errors (F401 unused imports, F841 unused variables)

* fix: ruff

* refactor(sandbox): keep box logic out of pipeline and localagent

  - Move sandbox system-prompt guidance from LocalAgentRunner into
    BoxService.get_system_guidance() so all box domain knowledge stays
    in the box module.
  - Remove standalone logging_utils.py; merge format_result_log() into
    MessageHandler base class alongside cut_str().
  - Strip sandbox-specific JSON parsing from log formatting; tool
    results now use generic truncation.
  - Revert TYPE_CHECKING changes in stage.py and runner.py that were
    unrelated to this feature.
  - Skip two test files affected by a pre-existing circular import
    (runner ↔ app) until the import cycle is resolved in a separate PR.

* fix: ruff

* refactor(box): move box runtime to langbot-plugin-sdk

  Extract self-contained box runtime modules (actions, backend, client,
  errors, models, runtime, security, server) to langbot-plugin-sdk and
  update all imports to use `langbot_plugin.box.*`. Keep only service
  and
  connector in LangBot core as they depend on the Application context.

  - Update docker-compose to use `langbot_plugin.box.server` entry
  point
  - Update pyproject.toml to use local SDK via `tool.uv.sources`
  - Remove migrated source files and their unit/integration tests
  - Update remaining test imports to match new module paths

* fix: ruff

* feat: enhance sandbox api

* refactor(box): derive paths from shared host root

* fix(box): tighten sandbox exposure and restore box integration coverage

* refactor(types): remove quoted annotations under postponed evaluation

* feat(box): unify native agent tools around exec/read/write/edit

* chore(sandbox): move MCP loader changes to follow-up branch

* feat(box): add session workspace quota enforcement and SDK quota metadata

* feat(skills): add Agent Skills management system (#1917)

* feat(skills): add Agent Skills management system

Implement comprehensive skills management feature inspired by agentskills spec:

Backend:
- Add Skill and SkillPipelineBinding database entities
- Add database migration (dbm018) for skills tables
- Implement SkillManager for skill loading, matching, and resolution
- Implement SkillService for CRUD operations
- Add skills API endpoints for skill and pipeline binding management
- Integrate skill index injection into pipeline preprocessor
- Add skill activation detection in LocalAgentRunner

Frontend:
- Add Skills page with listing, search, and type filter
- Add SkillDetailDialog for create/edit with preview
- Add SkillCard and SkillForm components
- Add skills API methods to BackendClient
- Add skills entry to sidebar navigation
- Add i18n translations (en-US, zh-Hans)

Features:
- Support skill and workflow types
- Sub-skill composition via {{INVOKE_SKILL: name}} syntax
- Progressive disclosure (index in prompt, full instructions on activation)
- Pipeline-specific skill bindings with priority

* fix: resolve cherry-pick conflicts for agentskills onto sandbox

- Remove non-existent external_kb service import
- Add skill_mgr mock to localagent sandbox_exec tests
- Keep database version at 24 (sandbox branch's latest)

* feat(skills): upgrade to package-backed skills with sandbox execution

  Evolve the skills system from pure prompt-based to package-backed with
  sandbox tool execution support:

  - Add source_type/package_root/entry_file/skill_tools fields to Skill entity
  - SkillManager loads SKILL.md from local package directories
  - SkillToolLoader as 4th dispatch layer in ToolManager (query-scoped)
  - LocalAgent injects skill tools into use_funcs on skill activation
  - BoxService.execute_skill_tool() runs scripts in sandbox (ro mount, env params)
  - Skill tool names auto-namespaced as skill__{skill}__{tool}
  - API validation for package_root allowlist and entry path traversal
  - Frontend source_type toggle, package_root input, skill_tools editor
  - Migration renumbered to 025 with ALTER TABLE fallback for existing DBs
  - Fix unclosed limitation section in i18n files
  - Fix skills API methods misplaced outside BackendClient class

* fix: test info

* feat(skills): switch skills to package-backed storage and add import tooling
  - skills 从 inline/package 双轨收敛成 package-first
  - instructions 改为写入并读取 SKILL.md
  - 新增本地目录扫描和 GitHub 安装 skill
  - 前端把 skills 整合进 plugins 页,新增 SkillsComponent 和 GitHub 导入弹窗
  - skill form 去掉 source_type / type 筛选,改成目录扫描驱动
  - Box skill tool 挂载模式从 ro 改成 rw
  - 测试和中英文文案同步更新

* feat: simplify langbot skill create and import

* refactor(skills): clean up legacy skill API and harden activation flow

* refactor(skills): remove skill dependency expansion and add skill_get

* fix: lint

* fix: delete

* fix(skills): align tool manager loader initialization

* refactor: remove sandbox execute skill

* fix(skills): hide activation markers and isolate skill activation flow

* refactor(skills): switch skill model to filesystem-backed packages

* refactor(skills): switch skill model to filesystem-backed packages

* refactor(skills): unify runtime skill access around filesystem paths

* refactor(skills): unify runtime skill access around filesystem paths

* feat(skills): align rw package design and fix skill activation, visibility, and lint issues

* refactor(skills): replace rich authoring API with import/reload flow and update
  Box design doc

* feat(box): add sandbox_exec tool loop for local-agent calculations

* feat(box): add host workspace mounting and sandbox_exec guidance

* feat(box): add BoxProfile with resource limits and improved output truncation

  - Implement head+tail output truncation (60/40 split) so LLM sees both
    beginning and final results; add streaming byte-limited reads in backend
    to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
  - Define BoxProfile model with locked fields and max_timeout_sec clamping
  - Add four built-in profiles: default, offline_readonly, network_basic,
    network_extended with differentiated resource and security constraints
  - Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
    read_only_rootfs) and pass corresponding container CLI flags
    (--cpus, --memory, --pids-limit, --read-only, --tmpfs)
  - Profile loaded from config (box.profile), applied in service layer
    before BoxSpec validation; locked fields cannot be overridden by
    tool-call parameters

* feat(box): add obs

* refactor(box): unify box service lifecycle and local runtime
  management

* refactor(box): remove legacy in-process runtime code and clean up smells

After the architecture settled on always using an independent Box Runtime
service, several pieces of compatibility code and design shortcuts were
left behind. This commit cleans them up:

- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from
  production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd`
  default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config
  access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing
  tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls
  `box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose
  (auto-detected by code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security

  ## Summary

  When Podman/Docker is available, all stdio-mode MCP servers now automatically
  run inside Box containers with dependency installation, path rewriting, and
  lifecycle management. When no container runtime exists, LangBot starts normally
  and stdio MCP falls back to host-direct execution.

  ## What changed

  ### MCP stdio → Box integration (mcp.py)
  - Add `MCPServerBoxConfig` pydantic model for structured box configuration
    with validation and defaults (network, host_path_mode, timeouts, resources)
  - Auto-infer `host_path` from command/args with venv detection: recognizes
    `.venv/bin/python` patterns and walks up to the project root
  - Rewrite host paths to container `/workspace` paths transparently
  - Replace venv python commands with container-native `python`
  - Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
    `pip install` inside the container before starting the MCP server
  - Copy project to `/tmp` before install to handle read-only mounts
  - Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
  - Add Box managed process health monitoring (poll every 5s)
  - Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
    block of `_lifecycle_loop`, covering all exit paths
  - Fix retry logic: `_ready_event` is only set after all retries exhaust
    or on success, not on first failure
  - Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`

  ### Box security (security.py — new)
  - `validate_sandbox_security()` blocks dangerous host paths:
    `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
    docker.sock, podman socket
  - Called at the start of `CLISandboxBackend.start_session()`

  ### Box models (models.py)
  - Add `BoxHostMountMode.NONE` — skips volume mount entirely
  - Adjust `validate_host_mount_consistency` to allow arbitrary workdir
    when `host_path_mode=NONE`

  ### Box backend (backend.py)
  - Add `validate_sandbox_security()` call in `start_session()`
  - Add `langbot.box.config_hash` label on containers for drift detection
  - Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
  - Add `cleanup_orphaned_containers()` to base class (no-op default) and
    CLI implementation (single batched `rm -f` command)

  ### Box runtime (runtime.py)
  - Call `cleanup_orphaned_containers()` during `initialize()` to remove
    lingering containers from previous runs

  ### Box service (service.py)
  - Graceful degradation: `initialize()` catches runtime errors and sets
    `available=False` instead of crashing LangBot startup
  - Add `available` property and guard on `execute_sandbox_tool()`
  - Add `skip_host_mount_validation` parameter to `build_spec()` and
    `create_session()` — MCP paths are admin-configured and trusted,
    bypassing `allowed_host_mount_roots` restrictions meant for
    LLM-generated sandbox_exec commands

  ### Default behavior
  - stdio MCP servers automatically use Box when `box_service.available`
    is True (Podman/Docker detected); no explicit `box` config needed
  - When no container runtime exists, falls back to host-direct stdio
  - MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
    (for site-packages), `host_path_mode=ro`, `startup_timeout=120s`

  ### Tests
  - `test_box_security.py`: blocked paths, safe paths, subpath rejection
  - `test_mcp_box_integration.py`: config model, path rewriting, venv
    unwrap, host_path inference, payload building, runtime info, box
    availability check
  - `test_box_service.py`: `BoxHostMountMode.NONE` validation tests

* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests

  ## Changes

  ### Precise orphan container cleanup
  - Runtime generates a unique instance_id on startup
  - Every container gets a `langbot.box.instance_id` label
  - `cleanup_orphaned_containers()` only removes containers from
    previous instances, preserving containers owned by the current one
  - Containers from older versions (no label) are also cleaned up
  - `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
    a no-op default method, removing hasattr duck-typing

  ### Fine-grained MCP error classification
  - New `MCPSessionErrorPhase` enum with 7 phases: session_create,
    dep_install, process_start, relay_connect, mcp_init, runtime,
    tool_call
  - Each phase in `_init_box_stdio_server()` sets the error phase
    before re-raising, enabling precise failure diagnosis
  - `retry_count` tracked across retry attempts
  - `get_runtime_info_dict()` exposes `error_phase` and `retry_count`

  ### GET /v1/sessions/{id} API
  - `BoxRuntime.get_session()` returns session details including
    managed process info when present
  - `handle_get_session` HTTP handler + route in server.py
  - `BoxRuntimeClient.get_session()` abstract method + remote impl

  ### stdio defaults to Box when runtime is available
  - `_uses_box_stdio()` checks `box_service.available` instead of
    requiring explicit `box` key in server_config
  - `BoxService.initialize()` catches runtime errors gracefully,
    sets `available=False` instead of crashing LangBot startup
  - When no container runtime exists, stdio MCP falls back to
    host-direct execution

  ### Code quality (from /simplify review)
  - Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
  - Removed dead `_box_network_mode()` method and unused `bc` variable
  - Fixed broken import `from ....box.models` → `from ...box.models`
  - Cached `_resolve_host_path()` result — computed once, passed through
  - Config hash now includes `host_path` field
  - Batched orphan cleanup into single `rm -f` command

  ### Session leak fix
  - `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
    finally block, covering all exit paths (normal shutdown, error,
    retry, final failure)

  ### Integration tests
  - 6 end-to-end tests covering managed process lifecycle, WebSocket
    stdio bidirectional IO, session cleanup verification, single
    session query, process exit detection, and orphan cleanup safety

* refactor: use rpc

* fix: import

* refactor(box): clean up sandbox subsystem code quality and efficiency

  - Fix O(n²) stderr trimming in runtime.py with running length tracker
  - Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
    unused config_hash computation, unused imports
  - Deduplicate connection callback in BoxRuntimeConnector, parse URL once
  - Use enum comparison instead of stringly-typed spec.network.value check
  - Replace manual _result_to_dict/_session_to_dict with model_dump()
  - Cache NativeToolLoader tool definition and sandbox system guidance
  - Extract _is_path_under() helper to eliminate duplicated path checks
  - Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
  - Add JSON startswith guard in logging_utils to skip futile json.loads
  - Fix ruff lint errors (F401 unused imports, F841 unused variables)

* fix: ruff

* refactor(sandbox): keep box logic out of pipeline and localagent

  - Move sandbox system-prompt guidance from LocalAgentRunner into
    BoxService.get_system_guidance() so all box domain knowledge stays
    in the box module.
  - Remove standalone logging_utils.py; merge format_result_log() into
    MessageHandler base class alongside cut_str().
  - Strip sandbox-specific JSON parsing from log formatting; tool
    results now use generic truncation.
  - Revert TYPE_CHECKING changes in stage.py and runner.py that were
    unrelated to this feature.
  - Skip two test files affected by a pre-existing circular import
    (runner ↔ app) until the import cycle is resolved in a separate PR.

* refactor(box): move box runtime to langbot-plugin-sdk

  Extract self-contained box runtime modules (actions, backend, client,
  errors, models, runtime, security, server) to langbot-plugin-sdk and
  update all imports to use `langbot_plugin.box.*`. Keep only service
  and
  connector in LangBot core as they depend on the Application context.

  - Update docker-compose to use `langbot_plugin.box.server` entry
  point
  - Update pyproject.toml to use local SDK via `tool.uv.sources`
  - Remove migrated source files and their unit/integration tests
  - Update remaining test imports to match new module paths

* fix: ruff

* fix(box): tighten sandbox exposure and restore box integration coverage

* refactor(types): remove quoted annotations under postponed evaluation

* chore(sandbox): move MCP loader changes to follow-up branch

* refactor(plugins): simplify GitHub install flow to default master archive

* revert(api): restore plugin GitHub import flow in plugins controller

* Improve data-root handling and skill install previews

* Add managed skill authoring tools for local agents

* Refactor the skills UI around sidebar detail pages

* Document why managed skill authoring tools bypass box

* fix: lint

* feat(web): refactor plugin/skill install flows and fix skills page

- Fix sidebar skill icon
- Add skills route and error page component
- Refactor plugin GitHub install from dialog modal to inline card
- Add skill install dropdown menu (create/upload/github) in sidebar
- Wire sidebar → skills page communication via pendingSkillInstallAction context
- Add i18n keys for error page and skill install actions

* fix(web): persist sidebar collapsible section open state on navigation

Sections opened via sub-item navigation now retain their expanded state
when the user switches to a different section, instead of collapsing
because the isActive fallback becomes false.

---------

Co-authored-by: youhuanghe <1051233107@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>

* feat(sandbox): add MCP box integration on top of sandbox base (#2083)

* refactor(mcp): extract box stdio runtime helper

* refactor(box): introduce reusable workspace session helper

* refactor(box): run Box Runtime as subprocess inside LangBot container

  Remove the separate langbot_box_runtime Docker service. Box Runtime
  now always launches as a local stdio subprocess, regardless of whether
  LangBot runs in Docker or not. The WebSocket transport path is kept
  only for explicit runtime_url configuration (remote deployment).

  This simplifies deployment by eliminating cross-container path mapping
  and network hops. Box Runtime is a pure scheduling process (talks to
  Docker socket / nsjail), it does not execute user code or touch the
  filesystem, so container isolation is unnecessary — unlike Plugin
  Runtime.

* fix(web): prevent first-emission snapshot from swallowing unsaved changes in pipeline editor

When switching runner (e.g. local-agent → n8n), the newly mounted stage's
first emit would re-capture the saved snapshot, erasing the dirty state
caused by the runner change. The save button would incorrectly go dim.

- Skip snapshot re-capture in handleDynamicFormEmit when form is already dirty
- Add mount-time emit to N8nAuthFormComponent (matching DynamicFormComponent)
- Use stable onSubmitRef to prevent useEffect subscription churn
- Add previousInitialValues guard to prevent initialValues echo loops

* style(web): align plugin list header button heights

* docs(review): update Box architecture review documents

Replace old review docs with 5 focused documents:
- box-architecture.md: deep architecture analysis (LangBot + SDK)
- box-issues.md: 22 issues rated P0/P1/P2
- box-test-coverage.md: test coverage analysis
- box-tob-analysis.md: toB commercialization analysis
- box-vs-plugin-runtime.md: Box vs Plugin runtime comparison

* feat(web): improve login error layout and add Terms of Service link

- Improve backend connection error display with bordered container,
  inline icon, and better visual hierarchy
- Extract actual error message from axios response object
- Add Terms of Service link (https://langbot.app/terms) to login footer
- Add termsOfService i18n key for all 7 locales

* refactor(web): replace all hardcoded SVG icons with lucide-react

Unify icon usage across the entire frontend by replacing 67 hardcoded
SVG icons with lucide-react components across ~25 files. This improves
consistency, maintainability, and reduces bundle duplication.

Key replacements:
- Sidebar nav: Zap, LayoutDashboard, Bot, Workflow, BookMarked, etc.
- MCP forms: Loader2, XCircle, Trash2
- Monitoring: Sparkles, MessageSquare, CheckCircle2, RefreshCw, etc.
- Cards: Clock, Star, Workflow, Hexagon, Puzzle, Github, etc.
- Misc: Paperclip, AudioLines, CloudUpload, Layers, Heart, Smile

Zero hardcoded <svg> tags remain in .tsx files.

* fix(web): stop polling plugin tasks when no active installs

The PluginInstallTaskProvider was unconditionally polling
getAsyncTasks every 3s on all /home/* routes. Now it only
syncs once on mount and starts periodic polling only when
there are active (non-terminal) install tasks.

* fix(deps): update langbot-plugin version and add new dependencies

* refactor: use Space API for release checks and stop idle polling

- version.py: switch release list API from GitHub to space.langbot.app,
  remove unused in-place update logic (update_all, compare_version_str),
  translate all comments/logs to English
- PluginInstallTaskContext: only poll when active install tasks exist

* feat(box): add --standalone-box flag and 3-way transport decision for Box runtime

Align Box runtime connection logic with Plugin runtime's pattern:
- Docker: WebSocket to langbot_box container (ws://langbot_box:5411)
- --standalone-box: WebSocket to external Box process (ws://localhost:5411)
- Windows: subprocess + WebSocket (workaround for async stdio limitation)
- Unix/macOS: subprocess + stdio pipe (unchanged)

BoxRuntimeConnector now inherits ManagedRuntimeConnector for subprocess
lifecycle reuse. Add langbot_box service to docker-compose.yaml.

* refactor(box): use single port with path-based routing for Box WS

Update connector to use ws://host:5410/rpc/ws instead of ws://host:5411.
Update review docs to reflect the single-port architecture.

* feat(web): show Box runtime status in plugin debug info popover

Add Box status section to the debug info popover on the plugin list page,
displaying connection status, backend info, profile, active sessions,
and recent error count. Fetched from GET /api/v1/box/status in parallel
with plugin debug info. Includes i18n for all 8 supported languages.

* fix(web): remove ephemeral sandbox count from Box status display

The active_sessions count reflects transient sandbox containers that
expire after 5 minutes of inactivity, making it misleading in the UI.
Keep only connection status, backend, profile, and error count.

* feat(box): configurable sandbox scope and unified skill containers

Replace the per-message session_id with a template-based system
configurable per pipeline via 'Sandbox Scope' in the local-agent panel.
Default scope is per-chat ({launcher_type}_{launcher_id}).

Unify skill exec into the same container as default exec — skills are
mounted at /workspace/.skills/{name}/ via extra_mounts instead of
getting separate containers. All pipeline-bound skills are injected
at container creation time.

- Add box-session-id-template to pipeline metadata (select, 4 options, 8 languages)
- Add resolve_box_session_id() and build_skill_extra_mounts() to BoxService
- Rewrite native.py skill exec path to use execute_tool with shared session
- Update tests for new session_id format
- Add design doc: docs/review/box-session-scope.md

* feat(web): show active sandbox details in Box status popover

Display sandbox count and a detailed list of active sessions including
session ID, image, backend, resources (CPU/memory), network mode, and
last used time. Fetched from GET /api/v1/box/sessions in parallel.
Includes i18n for all 8 supported languages.

* feat(box): add startup and availability logging for sandbox tools

Log Box runtime initialization result (success with profile info, or
failure warning). Log native tool availability status at ToolManager
startup so it's immediately clear whether exec/read/write/edit tools
are registered for the LLM.

* feat(box): support custom sandbox container image via config.yaml

Add 'image' field to box config section. When set, it overrides the
profile default image (python:3.11-slim) for all sandbox containers.
Priority: caller-specified > config.yaml image > profile default.

* feat(box): add heartbeat and reconnection for Box runtime connector

Add 20-second heartbeat ping loop to detect silent Box runtime
disconnections. On disconnect, set available=false and attempt
reconnection after 3 seconds via the disconnect callback chain.

- BoxRuntimeConnector: heartbeat loop, disconnect callback parameter,
  disconnect detection in connection callback and WS failure handler
- BoxService: wire disconnect callback to toggle available state and
  re-initialize the connector on reconnection

* feat(web): move runtime status to dashboard, clean up plugin debug popover

Add SystemStatusCards component to the monitoring dashboard showing
Plugin Runtime and Box Runtime connection status with details (backend,
profile, sandbox count). Remove all Box/session status from the plugin
page debug popover — it now only shows debug URL and key.

Includes i18n for all 8 supported languages.

* refactor(web): compact system status into a single card alongside metrics

Replace the separate two-card row with a single compact 'System Status'
card placed as the 5th column in the metrics grid. Shows green/red dots
for Plugin Runtime and Box Runtime. Click to expand a popover with
connection details (backend, profile, sandbox count).

* feat: show connector error details for Plugin and Box runtime status

Record Box connector error in BoxService and expose it as
'connector_error' in GET /api/v1/box/status when unavailable.
Display error messages in the dashboard System Status popover
for both Plugin Runtime (plugin_connector_error) and Box Runtime
(connector_error) when they are disconnected.

* fix(web): auto-refresh system status and show disconnect errors in real time

Poll Plugin Runtime and Box Runtime status every 30 seconds so the
dashboard reflects disconnections without a manual page refresh.
Also re-fetch when the popover is opened for immediate feedback.

* fix(box): handle RPC failure in get_status/get_sessions gracefully

When the Box runtime disconnects, there is a race between the heartbeat
flipping _available=false and the frontend polling get_status(). If the
poll arrives first, client.get_status() throws a ConnectionClosedError
which propagated as a 500, causing the frontend to show a grey dot
(null status) instead of a red dot with error details.

Now get_status() catches RPC errors and returns available=false with
the exception message as connector_error. get_sessions() returns an
empty list when unavailable or on RPC failure.

* fix(box): add persistent reconnection loop with exponential backoff

The previous disconnect handler only retried once and then gave up.
Now spawns a background task that retries with exponential backoff
(3s, 6s, 12s, ... up to 60s) until the Box runtime is reachable again.
Uses a _reconnecting guard to prevent duplicate loops. Calls
connector.dispose() before each retry to clean up stale tasks.

* fix(box): detect disconnect when handler.run() returns normally

The generic Handler.run() catches ConnectionClosedError and breaks out
of its loop (normal return) instead of raising, because it has no
disconnect_callback. The old code only triggered reconnection in the
except branch, so a clean WebSocket close was never detected.

Now treat handler.run() returning normally (after successful handshake)
as a disconnect event, triggering the reconnection callback.

* fix(web): refresh system status card when clicking Refresh Data button

Pass a refreshKey prop through OverviewCards to SystemStatusCard that
increments on each Refresh Data click, triggering a re-fetch of Plugin
and Box runtime status alongside the monitoring data refresh.

* fix(web): fix system status card stuck in loading state

fetchStatus(showLoading=false) never called setLoading(false), so the
initial loading=true was never cleared. Simplify to always setLoading
in the finally block — the spinner only shows on the very first load
since subsequent fetches complete near-instantly.

* feat(web): show active sandbox details in dashboard Box status popover

Fetch box sessions alongside status and display each active sandbox
in the popover with session ID, image, resources (CPU/memory), and
last used time.

* feat(box): add global sandbox scope option

Add a 'Global (shared by all)' option to the sandbox scope selector.
Uses a constant '{global}' template variable that always resolves to
'global', so all users and chats share one sandbox container.

* refactor(web): replace popover with dialog for system status details

Replace the dropdown popover with a proper Dialog for runtime status
details. Add a small info button on the System Status card that opens
the dialog. Session details now show in a spacious 2-column grid layout
with full image name, backend, CPU/memory, network, mount path, and
created/last-used timestamps.

* fix(web): widen system status dialog and fix scroll border issue

Use max-w-2xl (matching other dialogs) instead of max-w-lg. Move
overflow-y-auto to an inner container with overflow-hidden on
DialogContent to prevent padding bleed at scroll edges.

* feat(web): add tooltips for truncated fields in system status dialog

Wrap session_id, image, and mount path fields with Tooltip components
so hovering over truncated text shows the full value.

* feat: add download button

* feat: successfully install

* feat: delete old filter

* feat: youhua frontend

* fix: align box runtime launch args

* feat: translate

* feat: refactor market

* feat: youhua qianduan

* chore: rename extension zh translation

* feat(extensions): unify extensions endpoint and refresh extensions page UX

- Rename /home/plugins route to /home/extensions and update all sidebar links.
- Add unified GET /api/v1/extensions returning plugins, MCP servers and skills,
  sorted by name; replace the three separate frontend fetches with this single call.
- Migrate the extensions page to shadcn primitives (Tabs/Card/Alert/Badge/Skeleton/
  Switch/Label) and clean up hardcoded color tokens on the extension card.
- Add a localStorage-persisted "Group by type" switch that, when enabled in the
  All Types tab, renders extensions grouped by type with a compact section header.
- Show a spinner while loading and rename the empty-state copy from
  "No plugins installed" to "No extensions installed".
- Rename the "格式 / Formats" filter label to "类型 / Types" across all 8 locales.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extensions): fallback lucide icon when extension icon is missing

Render a tinted lucide icon (Puzzle / Server / Sparkles) on the extension
card when the icon URL is empty or the image fails to load. Picked icons
distinct from EventListener (AudioWaveform) and KnowledgeEngine (Book) to
avoid visual collision with plugin component badges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(sidebar): unify installed-extensions list with plugins, MCP and skills

- Render plugins, MCP servers and skills together under the "Installed
  Extensions" sidebar entry, alphabetically sorted to match the list page.
- Resolve per-item routes by extension type (plugin -> /home/extensions,
  mcp -> /home/mcp, skill -> /home/skills) and gate the plugin-only hover
  context menu on extensionType === 'plugin'.
- Lift the "group by type" toggle into SidebarDataContext (still persisted
  in localStorage) so the sidebar groups items with section headers
  whenever the list page has the toggle enabled.
- Show lucide fallback icons (Server / Sparkles / Puzzle) tinted in the
  LangBot blue for MCP, skill, and missing-icon plugin items, overriding
  the SidebarMenuSubButton svg color rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extensions): mobile-friendly layout for extensions and add-extension pages

- Stack the extensions page header vertically on small screens, let the
  filter Tabs scroll horizontally if they overflow, hide the debug
  button label below sm and let the install/debug controls wrap.
- Constrain the debug popover and its inputs to the viewport width so
  they no longer overflow on phone-sized screens.
- Drop the card grid from a fixed 30rem column to a min(100%, 22rem)
  column at base / 28rem at sm, and reduce the gap, so cards render
  cleanly at 360px+ widths in both flat and grouped views.
- Make the add-extension header actions wrap on lg- viewports and the
  install dialog responsive instead of a hard 500px box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: change ui

* feat: delete version for mcp and skills

* fix: constrain home page content width

* fix: preserve monitoring card borders under sticky filters

* fix(box): restore sandbox config and shared mcp runtime

* fix(box): harden sandbox session isolation

* fix(skill): remove auto activation setting

* feat(skill): align skill system with Claude Code's Tool Call design

- Replace text marker activation with `activate` tool (Tool Call mechanism)
- Replace 7 authoring tools with 2: `activate` + `register_skill`
- Add builtin skills loading from templates/skills/
- Add create-skill as first builtin skill
- Remove SKILL_ACTIVATION_MARKER and text detection methods
- Tool Result returns SKILL.md content (protects KV Cache)

This aligns with Claude Code's progressive disclosure pattern:
- Metadata (name+description) always visible in tool description
- SKILL.md body loaded on activate via Tool Call
- Bundled resources accessible through virtual path mapping

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(tools): add glob and grep native sandbox tools

Add file discovery and content search capabilities to the sandbox:
- glob: Find files by pattern (supports ** recursive matching)
- grep: Search file contents with regex patterns

Both tools respect skill package paths and include safety limits
(max 100 files for glob, max 200 matches for grep).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(skill): add skill file browsing capability

- Add API endpoints for listing/reading/writing skill files
- Add FileTree component in SkillForm for directory browsing
- Users can now view scripts/, references/, assets/ directories
- Files can be selected and edited in the instructions textarea
- Add translations for new file browsing features

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(skill): copy builtin skills to data/skills on startup

- Builtin skills (templates/skills/) are now copied to data/skills/
- Users can view and manage builtin skills in the UI
- Rename SkillAuthoringToolLoader to SkillToolLoader

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(skill): improve file browsing and fix path handling

- Fix nested directory display in skill file tree (preserve root entries)
- Fix file content display when clicking files in skill browser
- Add skill manager and tool manager as proper package modules
- Separate fileContent state to allow editing non-SKILL.md files

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(toolmgr): correct skill_tool_loader attribute name

Rename skill_authoring_tool_loader to skill_tool_loader in execute_func_call
and shutdown methods to match the attribute defined in initialize().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(native): update tool descriptions to use register_skill

Replace references to removed import_skill_from_directory with
register_skill in exec/write/edit tool descriptions.

* feat(toolmgr): enhance tool initialization with backend availability checks

* refactor: remove unused imports and clean up code in various files

* feat: polish extension detail pages

* feat: persist sidebar list expansion

* fix: refine extension ui and backend errors

* fix: align add extension marketplace ui

* feat: manage skills through box runtime

* feat: support github skill installation

* fix: import github skill directories

* feat: install market extensions from card click

* feat(web): improve skill import flow

* feat: polish extension import flow

* fix(mcp): stabilize shared box managed processes

* fix(web): improve backend retry and sidebar scrolling

* docs(review): refresh box architecture review for feat/sandbox

Sync the docs/review/ suite to the current state of the feat/sandbox branch
(both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review.

- box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b}
  config schema, add E2B backend, 6 native tools (incl. glob/grep), Skill
  Tool Call activation, shared multi-process MCP container, SkillManager,
  BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect
- box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail
  image conflict, frontend monitoring card) into a Resolved section; add
  new P0 (INIT/backend ordering), P1 (extra_mounts immutability after
  container creation), P2 (skill_store test gap, integration tests not in CI)
- box-session-scope.md: add §0 Implementation Status — Phase 1 shipped,
  MCP unification landed earlier than originally scoped
- box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC),
  add 7 new test files including SDK backend_selection/e2b/skill_store
- box-tob-analysis.md: connection recovery now满足基本要求; add E2B and
  backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat
- box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned
  with Plugin Runtime; revise remaining gaps (WS auth, shared base class)

* refactor(box): use unified env-override mechanism for box.local config

The box module hand-rolled its own LANGBOT_BOX_LOCAL_* env parsing in two
places (connector._get_box_config and service._local_config), duplicating
logic that LoadConfigStage._apply_env_overrides_to_config already provides
generically via the SECTION__SUBSECTION__KEY convention.

- Drop the bespoke LANGBOT_BOX_LOCAL_* parsing; read box.local straight
  from instance_config (the unified BOX__LOCAL__* overrides are already
  applied before BoxService initializes)
- Harden _load_allowed_mount_roots to accept a comma-separated string,
  since the generic mechanism stores a freshly-created key as a raw
  string when config.yaml has no box.local.allowed_mount_roots entry
- docker-compose: rename the langbot container env vars to
  BOX__LOCAL__* (the canonical convention); remove them entirely from
  the langbot_box container — the Box runtime never reads box.local from
  env/config.yaml, it is configured via the INIT RPC action

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: repair stale skill/sandbox tests for feat/sandbox

The skill subsystem moved to Tool-Call activation and a Box-managed
skill store; several tests still asserted removed APIs and a sys.modules
stub leaked across the suite. Full unit suite now green (was 23 failing).

- test_skill_tools: drop TestSkillManagerActivation (text-marker API
  removed); rewrite TestSkillActivationHelper around the current
  skill.activation.register_activated_skill; replace the CRUD
  TestSkillAuthoringToolLoader with TestSkillToolLoader covering the
  current activate/register_skill tools and sandbox-availability gating
- test_tool_manager_native: ToolManager attr is skill_tool_loader (not
  skill_authoring_tool_loader); native loader now exposes 6 tools
  (exec/read/write/edit/glob/grep) and requires initialize() with a
  backend-available get_status()
- test_localagent_sandbox_exec: remove obsolete activation-marker
  leakage tests and their helper providers
- test_model_service / pipeline conftest: give the mocks skill_mgr=None
  so PreProcessor's local-agent skill-binding guard short-circuits
- test_n8nsvapi: stop permanently overwriting sys.modules
  ('langbot.pkg.provider.runner' etc.); save and restore around the
  import so other modules get the real LocalAgentRunner base class

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(tests): run unit tests on every push to feat/** branches

- Add feat/** to push branches so long-lived feature branches are
  tested on every push (they accumulate large changes before a PR)
- Drop the push path filter entirely: every push to master/develop/
  feat/** now runs the full unit suite (the old 'pkg/**' filter never
  matched the real source path 'src/langbot/pkg/**', so backend-only
  pushes silently skipped tests)
- Fix the same broken path glob on the pull_request trigger
  ('pkg/**' -> 'src/langbot/pkg/**')

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skill): harden mount/reload paths and HTTP errors against stale skill cache

The Box backends behave inconsistently when extra_mounts reference a
missing host directory (nsjail aborts the entire sandbox start, Docker
silently creates a root-owned empty dir on the host, E2B silently skips
the upload). The cache in skill_mgr.skills is only refreshed on
in-process mutations, so out-of-band changes — container rebuilds,
manual rm in the box volume, anything the LangBot API didn't drive —
leave a stale skill that later produces one of those bad mount paths.

- box/service.py: build_skill_extra_mounts now filters skills whose
  package_root is not isdir on the LangBot-visible filesystem and logs
  a warning, instead of passing the bad mount through to the backend
- skill/manager.py: reload_skills (Box path) drops skills whose
  package_root is missing on the LangBot-side filesystem before they
  reach the in-memory cache, with a summary warning
- api/http/controller/groups/skills.py: file/CRUD handlers now also
  catch BoxError (RuntimeError subclass, previously slipping past
  ``except ValueError`` and surfacing as 500); list/get handlers gain
  a try/except so a transient Box RPC failure becomes a clean 400
  instead of a stack trace

Tests added for build_skill_extra_mounts (skip missing, skip empty,
no skill manager) and SkillManager.reload_skills (drop missing on Box
path). Full unit suite: 279 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(box): add box.enabled toggle and gate consumers on availability

Make the Box sandbox runtime optional. When ``box.enabled`` is false in
config (or when an enabled Box fails to connect), every dependent feature
degrades to the same disabled-state UX rather than crashing or silently
falling back to less safe code paths.

Backend:

- config.yaml: new top-level ``box.enabled: true`` flag (default true)
- BoxService:
  - Read box.enabled on construction
  - initialize() short-circuits when disabled — no remote WS connect, no
    stdio subprocess fork
  - _on_runtime_disconnect is a no-op when disabled (no reconnect loop
    on a deliberately-off service)
  - get_status() now exposes ``enabled`` so the frontend can tell
    "disabled in config" from "configured but failed"
- MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to
  be available, not just installed
- MCP _init_stdio_python_server: when ap.box_service exists but is
  unavailable, refuse the stdio server with an actionable error instead
  of silently falling through to host-stdio (which bypasses the sandbox
  the operator asked for). Setups without ap.box_service installed at
  all keep the legacy host-stdio fallback for pre-Box dev mode
- SkillService._require_box_for_write: refuses create/update/install/
  write_skill_file when ap.box_service is installed but unavailable.
  Distinguishes disabled vs failed in the error message so the UI can
  surface the right hint. Legacy setups (no ap.box_service) keep the
  local fallback path — that distinction is what keeps the existing
  local-skills tests valid

Tests:
- Box disabled-state behavior (4 cases)
- Skill write refusal in disabled & failed states (7 cases)
- MCP stdio runtime info policy updated to match new refuse-when-down
  behavior

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): surface Box disabled/unavailable state across consumers

When Box is disabled in config (``box.enabled = false``) or fails to
connect, every dependent UI surface now degrades visibly:

- ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``,
  ``disabled`` (config-off) and a single ``hint`` key so callers don't
  have to re-derive the three states
- ``BoxUnavailableNotice`` reusable Alert banner driven by that hint
- Dashboard SystemStatusCards: three-state dot + label
  (connected / disabled-gray / disconnected-red); disabled state shows
  the ``boxDisabled`` hint, failed state continues to show the connector
  error. Plugin block kept untouched
- Skills page (create view) and SkillDetailContent (edit view):
  Save button disabled and banner inserted above the form when Box is
  unavailable — matches the backend gate added in the previous commit
- PipelineExtension skill section: ``enable_all_skills`` switch, Add
  Skill button and Remove buttons all gate on Box availability;
  banner inline under the section header
- PipelineFormComponent: banner above the ``local-agent`` stage card
  when Box is unavailable, since that stage carries the sandbox-bound
  ``box-session-id-template`` field
- Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale
  files updated with ``boxDisabled`` / ``boxUnavailable`` /
  ``boxRequiredHint`` strings

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(box): document the box.enabled toggle and gate behavior matrix

- docker-compose: move ``langbot_box`` under compose profiles
  (``box`` and ``all``) so ``docker compose up`` no longer requires
  the sandbox container. Inline comment explains how to pair the
  profile choice with ``box.enabled`` so the langbot service does not
  thrash trying to reach a runtime that was never started
- docs/review/box-architecture.md:
  - Annotate ``box.enabled`` in the config.yaml example, listing the
    exact side effects (no remote/stdio connect; tools/skills/MCP
    stdio off; reads still work)
  - Replace the bare compose snippet with the actual profile-driven
    invocation and the BOX__ENABLED pairing
  - New "关闭/连接失败时的行为矩阵" section: a single table mapping
    every consumer (native tools, activate/register_skill, stdio MCP,
    skill list/CRUD, pipeline AI config, extensions page, dashboard)
    to its disabled-state behavior, plus the legacy ``ap.box_service``
    distinguisher note

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(pipeline-form): swap Box banner for field-level disable_if + tooltip

The previous commit hard-coded a BoxUnavailableNotice banner above the
``local-agent`` stage card. That works, but it shouts at the user about
every field in that stage when in reality only one field —
``box-session-id-template`` — depends on the sandbox.

Use the dynamic-form schema's existing variable-injection mechanism
(``__system.*`` references via ``systemContext``) and add a sibling to
``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays
visible, becomes inert, and an info icon next to its label exposes the
reason on hover. The rest of the AI tab is left untouched.

- entities/form/dynamic.ts: extend IDynamicFormItemSchema with
  ``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject``
- DynamicFormComponent: evaluate disable_if with the same resolver as
  show_if; OR the result into isFieldDisabled; render an Info tooltip
  trigger next to the label when the condition matches
- ai.yaml metadata: attach disable_if (__system.box_available eq false)
  and a localized disabled_tooltip to box-session-id-template
- PipelineFormComponent: drop the BoxUnavailableNotice import and the
  per-stage banner; pass ``systemContext={ box_available: boxAvailable }``
  only for the local-agent stage so other stages aren't paying the
  re-render cost

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): friendly UI message when stdio MCP refused by Box state

Previously the MCP detail dialog dumped the raw RuntimeError text from
``_init_stdio_python_server`` — English-only, prefixed with "Failed
after 4 attempts", and exposing internal config names. The retry
wrapper also kept retrying a refusal that is deterministically going
to fail again, polluting logs.

Replace the raw text with a structured signal:

- New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. The stdio
  refusal path sets it before raising and uses a short opaque
  discriminator (``box_disabled_in_config`` / ``box_unavailable``) as
  the message body — never user-facing
- ``_lifecycle_loop_with_retry`` short-circuits on
  ``BOX_UNAVAILABLE``: surfaces the error immediately, no retries,
  no "Failed after N attempts" prefix. Silences the warning storm
  seen during smoke-testing
- ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``,
  ``retry_count``, ``box_session_id``, ``box_enabled`` to match what
  the backend already returns in get_runtime_info_dict()
- Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and
  ``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect
  ``error_phase === 'box_unavailable'`` and render a two-line
  localized notice: state line ("Box disabled / unreachable") plus
  remediation line ("enable Box or switch to http/sse")
- 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get
  ``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``,
  ``mcp.boxStdioRefusedSuggestion``

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp-web): block stdio MCP creation at the form when Box is unavailable

When Box is disabled in config (``box.enabled = false``) or unreachable,
saving a new MCP server in stdio mode produced one that could never
start — the user would only learn that from the runtime error on the
detail page. Stop the user before they save instead.

Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog
``MCPFormDialog.tsx``) now:

- Disable the ``stdio`` option in the mode select when Box is
  unavailable, with a small "(requires Box)" suffix so the reason is
  obvious. Existing stdio configs still display their current value
- Show ``BoxUnavailableNotice`` inline under the mode select when the
  currently-selected mode is stdio and Box is unavailable, so editing
  a stale stdio config makes the cause visible
- Disable the Save / Submit button while stdio is selected under that
  condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop
  so the parent ``MCPDetailContent`` can disable both its Submit and
  Save buttons. ``MCPFormDialog`` disables its Save button locally
- Refuse the submit handler too (Enter-key path) with a toast carrying
  the same i18n message

i18n: ``mcp.boxRequired`` (short tag in the disabled option) and
``mcp.stdioBlockedByBoxToast`` added to all 8 locales.

Backend runtime gate (``_init_stdio_python_server`` refusal +
``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place
as the last line of defence for API bypass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(web): prevent plugin config form overflow

* refactor(skill): remove all local-filesystem fallbacks; Box is the sole source

Skills now flow exclusively through the Box runtime. Every read and write
method funnels through ``_box_service()``; when Box is unavailable
(disabled in config, connection failed, or simply not installed) the
operation either returns an empty surface (``list_skills`` → []) or
raises with a clear ``Box runtime ... not initialised / disabled /
unavailable: ...`` message via the new ``_require_box(action)`` helper.

Why: the legacy local-fallback path scanned ``data/skills/``, but Box
manages its own ``box.local.skills_root`` (default ``data/box/skills/``).
The two diverging directories caused stale / phantom skill lists when
Box flapped, and the local-fallback writes silently bypassed all the
sandboxing the operator had configured.

SkillService (``api/http/service/skill.py``):
- New ``_require_box(action)`` returns the box service or raises a
  structured ValueError. ``_require_box_for_write`` kept as alias
- ``list_skills`` → returns [] when Box is down so the UI can render
  the disabled banner cleanly
- ``get_skill`` / ``get_skill_by_name`` → return None
- All read-file / write-file / scan-dir / create / update / delete /
  install / preview methods → ``_require_box`` then box delegate.
  Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview
  pipelines) removed entirely

SkillManager (``pkg/skill/manager.py``):
- ``reload_skills`` returns early with empty cache when Box is down.
  data/skills/ discovery loop removed
- ``refresh_skill_from_disk`` now just reports cache presence; the
  on-disk re-parse is gone since Box is the only writer

Tests:
- Drop 11 obsolete test_skill_service.py tests that exercised the
  removed local-fallback paths (create/install/file/delete/update)
- Add list-empty + read-refused tests; flip the legacy-allow test to
  legacy-refuses-too
- Rewrite refresh_skill_from_disk test to match the new behaviour

Several helper methods (_managed_skill_path, _resolve_skill_path,
_preview_skill_candidates, _install_preview_candidates, etc.) are now
unreachable; a follow-up commit will prune them so this diff stays
reviewable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(skill): prune dead local-filesystem helpers left over from Box migration

Follow-up to the Box-only refactor. The previous commit removed the
local-fallback BRANCHES from every public method; this one removes the
HELPERS those branches called, which are now unreachable.

SkillService (service/skill.py): 787 → 449 lines
  Removed: scan_directory (sync), _read_skill_package, _write_skill_md,
  _resolve_create_field, _managed_skill_path,
  _managed_install_root_for_package, _normalize_package_root,
  _resolve_skill_path, _find_skill_entry, _discover_skill_directories,
  _safe_extract_zip, _extract_uploaded_skill_to_temp,
  _download_github_skill_to_temp, _resolve_github_source_root,
  _build_preview_target_dir, _preview_skill_candidates,
  _select_preview_candidates, _install_preview_candidates,
  _preview_source_root, _resolve_installed_skills, plus the
  module-level _FRONTMATTER_FIELDS and _build_skill_md.
  Kept (still needed by the surviving GitHub-import path):
  _download_github_asset, _download_github_skill_directory_as_zip,
  _find_github_skill_archive_entry, _copy_github_skill_directory_to_zip,
  _is_github_skill_md_url, _parse_github_skill_md_url,
  _resolve_github_skill_md_package_name, _validate_github_asset_url,
  _uploaded_skill_target_stem, _validate_skill_name.
  Imports dropped: shutil, tempfile, yaml, ....utils.paths.

SkillManager (skill/manager.py): 187 → 88 lines
  Removed: get_managed_skills_root, _discover_skill_directories,
  _find_skill_entry, _load_skill_file, _normalize_package_root.
  Imports dropped: datetime, parse_frontmatter, paths.

Tests:
  - test_skill_service.py: drop the 3 sync scan_directory tests +
    skill_service fixture + _create_skill_file helper
  - test_skill_tools.py: drop test_load_skill_file_success; rename
    TestSkillManagerPackageLoading → TestSkillManagerCache

Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skill): re-inject skill index into local-agent system prompt

The contributor's original PR (#1917) appended an ``Available Skills``
index to the system prompt before the LLM saw the user message, so the
LLM could decide whether to activate a skill. ``7145447b`` removed the
text-marker activation flow and, together with it, the entire system
prompt injection — but the Tool Call replacement only put the available
skills inside the ``activate`` tool's description. In practice the LLM
ignores tool descriptions for selection and goes straight to native
tools, so user-visible skill activation silently broke.

Restore the injection, adapted for the Tool Call era:

- SkillManager regains ``get_skill_index(bound_skills)`` and
  ``build_skill_aware_prompt_addition(bound_skills)``. The addendum
  carries only ``name (display_name): description`` for each
  pipeline-visible skill plus one instruction line pointing at the
  ``activate`` tool. No SKILL.md contents — KV cache stays clean
- PreProcessor appends the addendum to the first system message (or
  inserts a new one) of ``query.prompt.messages`` for the local-agent
  runner. Handles plain-string and ContentElement[] bodies. Skips
  cleanly when no skills are visible
- 3 new test_preproc cases: injection happens, bound-skills subset
  honoured, empty addendum touches nothing. 280 passed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(box): downgrade get_status.available when backend probed unavailable

Until now ``BoxService.get_status`` returned ``available: true`` whenever
the runtime connector was healthy, even if the runtime itself reported
``backend: { available: false }`` (operator selected nsjail without the
binary, Docker daemon crashed mid-session, E2B credentials wrong, ...).
The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the
top-level flag and showed "connected" while every actual call to native
exec or skill management would fail.

The native-tool loader already polled ``status.backend.available``
independently and hid its tools correctly, but every other consumer
(dashboard banner, the disabled-state hint, the LLM-facing message)
disagreed with it.

Combine the two in the payload: ``available = self._available AND
status.backend.available``. When ``backend.available`` is false we now
also surface a ``connector_error`` that names the backend
("Configured sandbox backend \"nsjail\" is unavailable") so the dialog
shows the actionable reason instead of an empty error pane. The
detailed ``backend`` object is preserved unchanged for the dialog.

Internal ``box_service.available`` (used by ``skill_service`` writes,
``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally
NOT changed — it still tracks connector health only, so a backend blip
does not trigger spurious reconnect loops.

Tests:
- ``test_get_status_downgrades_available_when_backend_dead`` — exercise
  the new branch (connector OK, backend.available=false → top-level
  available=false, connector_error mentions the backend name)
- ``test_get_status_keeps_available_true_when_backend_ok`` — guard
  against regressing the happy path

Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary):
``GET /api/v1/box/status`` now returns ``available: false`` with the
named connector_error, instead of the previous misleading
``available: true``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): surface the specific Box failure reason in unavailable banner

When Box is configured but the runtime reports its backend is dead
(e.g. ``box.backend = nsjail`` but the binary is missing, or Docker
daemon crashed), the backend now returns a structured
``connector_error`` like ``Configured sandbox backend "nsjail" is
unavailable``. The previous notice only said "Box sandbox is
unavailable" + a generic "enable Box" hint, hiding the actionable
detail.

- ``useBoxStatus``: derive ``reason`` from ``status.connector_error``.
  Only exposed for the failed-state (``hint === 'boxUnavailable'``),
  since the disabled-by-config message already carries its reason
- ``BoxUnavailableNotice``: insert the reason as a small monospaced
  line between the state message and the action hint. The disabled
  variant is unchanged (operator chose the state)
- Wire ``reason`` through every existing call site (Skills page +
  detail, PipelineExtension, both MCP forms). Old unused ``context``
  prop dropped

Net layout (3 lines, still compact):

  ⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ...
    Configured sandbox backend "nsjail" is unavailable
    This feature requires the Box runtime. Enable it in config ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: reconcile master's unit tests with feat/sandbox refactors

The merge from master brought in new unit tests that target pre-refactor
APIs on feat/sandbox. Reconcile each:

- factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills
  dict + inert prompt-addition builder) and a Mock pipeline_service so the
  PreProcessor skill-index injection branch can run end-to-end in tests.

- pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so
  pipeline.stage is fully initialised before any individual stage test
  (preproc, longtext, ...) tries to lazy-load it. Without this preload,
  running test_preproc.py in isolation hit a circular-import error via the
  stage -> app -> pipelinemgr -> stage chain.

- provider/test_tool_manager.py: ToolManager now probes four loaders
  (native -> plugin -> mcp -> skill). Inject inert native + skill mocks in
  the execute_func_call fixture and assert all four shutdowns fire.

- utils/test_paths.py: drop the three cwd-dependent _check_if_source_install
  cases. The refactor walks Path(__file__).resolve().parents looking for
  pyproject.toml + main.py, so cwd no longer factors in and there's no
  file read to mock-fail. The positive case and caching test still apply.

- utils/test_version.py: delete entirely. is_newer and compare_version_str
  were removed when VersionManager was refactored to use the Space API for
  release checks (1b4107a9); the tests targeted a surface that no longer
  exists.

* refactor(box): launch box runtime via the lbp CLI subcommand

Mirror the plugin runtime: box is now started through the same CLI entry
point (langbot_plugin.cli) instead of the box module directly.

- docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box`
  (WebSocket is the default transport, no flag needed — matches `rt`).
- box/connector.py: both subprocess launch sites (_start_local_stdio and
  the Windows _start_subprocess_then_ws path) invoke
  `langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport.
- docs/review: update stale `-m langbot_plugin.box[.server]` references.

Pairs with the SDK change that removes box's direct-launch entry points
(python -m langbot_plugin.box / .box.server) and the legacy --mode flag.

* chore: bump langbot-plugin beta 1

* fix(ci): resolve langbot-plugin from PyPI and clear lint failures

CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image.
Root causes and fixes:

- pyproject.toml had a [tool.uv.sources] editable override pinning
  langbot-plugin to ../langbot-plugin-sdk. That path only exists in a
  paired local checkout, so `uv sync` failed on every CI runner
  ("Distribution not found"). Remove the override and regenerate uv.lock
  so langbot-plugin==0.4.0b1 resolves from PyPI, matching master.

- tests/integration/api/test_pipelines.py: the pipeline extensions
  endpoint now calls ap.skill_service.list_skills(); add the missing
  skill_service mock to the fake_pipeline_app fixture (the test came
  from master, the endpoint change from feat/sandbox).

- Apply ruff format to three src files and prettier to three web files
  that had committed formatting drift, failing `ruff format --check`
  and `pnpm lint`.

* chore: bump beta version

* docs: remove BOX_BACKEND override reference

* fix(pipelines): stop attributing dashboard debug WS to bound web_page_bot

The dashboard pipeline-debug WebSocket
(/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket
(/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths,
but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when
the same pipeline happened to be bound to a web_page_bot, passed that
bot as `owner_bot` into `handle_websocket_message`. The adapter then
used the page bot's listeners + adapter for the request, so debug
sessions were logged as "page bot" activity in the dashboard.

Debug sessions must always run under the built-in websocket_proxy_bot.
Remove `_find_owner_bot`, drop the `owner_bot` parameter from the
debug-path `_handle_receive`, and call `handle_websocket_message`
without it so the adapter takes its default proxy-bot branch. The
embed handler still resolves and passes its `runtime_bot` for the
page-bot path, so attribution there is unchanged.

* fix(plugin): install marketplace MCP from canonical mode + extra_args

_install_mcp_from_marketplace read the dropped `mcp_data.config` field
and reconstructed mode/extra_args by guessing from the URL — which lost
stdio's command/args/env/box entirely, so stdio MCP installs from the
marketplace always failed.

Use the Space record's canonical `mode` and `extra_args` directly (the
same shape stored in mcp_servers), and gate the install on `mode`
instead of the removed `config`. After a successful install, best-effort
POST to the marketplace install endpoint to bump install_count.

* feat(web): show recommendation lists in plugin market; mixed-type icons

The marketplace recommendation lists (curated rows from Space) were never
mounted in the plugin market page. Wire them in:
- fetch recommendation lists on mount and render them above the extension
  grid, only when no search/filter is active.

Recommendation lists now mix plugins, MCPs and skills, so resolve each
card's icon by type (plugin / mcp / skill marketplace icon URL) instead of
always using the plugin icon endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): auto-open install dialog from one-click deep link

Accept a deep link from LangBot Space's one-click install:
/home/add-extension?install=1&extension_type=<plugin|mcp|skill>&author=&name=&version=
On mount, populate the install info, open the confirm dialog directly, and
strip the params from the URL. Reuses the existing marketplace install flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: push marketplace URL to runtime; fix market client base race

- On connecting to the plugin runtime, push the configured space.url via the
  new SET_RUNTIME_CONFIG action so the runtime downloads plugins from the same
  Space, instead of relying on its own CLOUD_SERVICE_URL env/default. Wrapped
  in try/except so an older SDK without the action degrades gracefully.
- web: the plugin market fetched recommendation lists (and listings) via the
  sync cloud client before its baseURL was resolved from system info, so it
  hit the default space.langbot.app. Await getCloudServiceClient() before the
  initial fetches and for the recommendation list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(web): don't show MCP "connection failed" while still connecting

The MCP status UI rendered "连接失败" for any non-connected state, so during a
normal connection attempt the subtitle showed "连接失败" while the status pill
below it showed "连接中..." — contradictory.

Only treat an explicit ERROR (or box-unavailable) status as failed; a
CONNECTING or initial/unresolved status now shows "连接中". Applied to the MCP
detail form (subtitle + StatusDisplay) and the MCP server card.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): type-aware install dialog + refresh sidebar after install

The marketplace install confirm dialog was hardcoded to "安装插件 / 确定要安装
插件 X 吗" for every type. Make it type-aware (plugin / MCP / skill) and show
more info: type chip, author/name id, and version when present.

Also refresh all sidebar extension lists (plugins, MCP servers, skills) when
an install task completes, so the newly-installed extension appears
immediately regardless of type (previously only refreshPlugins ran).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): richer install dialog (icon + name + description), drop redundant type row

The install dialog already states the type in its title, so the "类型" row was
redundant. Replace the info box with the extension's icon (avatar), display
name, author/name id + version, and description — built from the PluginV4 for
in-app installs and from the icon endpoint by type for the one-click deep link.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(web): TDZ crash in add-extension (installIconURL before installInfo)

installIconURL was computed above the useState declaration of installInfo,
causing "Cannot access 'installInfo' before initialization" (500) on the
add-extension page. Move the computation below the state declarations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): redesign install-progress dialog for MCP/skill

The progress dialog showed plugin-only stages (download + dependency install)
for every type. MCP/skill have no such steps, so show a single
"installing → done/failed" row for them (MCP: adding & connecting the server;
skill: installing the package) while keeping the detailed download/deps
stages for plugins.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(web): add missing market.componentName i18n keys

The marketplace component filter (and component badges) used
market.componentName.{Tool,Command,EventListener,KnowledgeEngine,Parser,Page}
but those keys only existed under plugins.componentName, so the market UI
showed raw keys. Add a componentName block to the market namespace (zh-Hans +
en-US; other locales fall back to zh-Hans).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(web): sidebar extensions refresh button + full-name tooltip

- Add a refresh button to the installed-extensions category header in the
  sidebar; it re-fetches plugins + MCP servers + skills and spins while
  loading.
- The sidebar item tooltip now shows the extension's full name (with the
  description below when present), so truncated MCP/extension names are
  readable on hover.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(plugin-market): rename component filter to "插件组件" with hint tooltip + persist filters

- Rename the in-app plugin market component filter label to "插件组件" /
  "Plugin Component"
- Add an Info icon tooltip explaining what plugin components are (Tool /
  Command / EventListener, etc.)
- Persist filter selections (type / component / tags / sort) in localStorage
  so they survive reloads; restored on mount (URL type param still wins)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(plugin-market): restore missing "页面"(Page) component filter option

The market component-filter list on this branch was a diverged rewrite that
dropped the Page component kind master had added. The i18n key
(market.componentName.Page) already existed; re-add the Page entry to the
componentOptions list so plugins providing Page components can be filtered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(i18n): reword plugin component filter hint

Drop the redundant "插件组件是" lead-in and mention that components extend
LangBot's capabilities; mirror the wording in en-US.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(i18n): backfill missing market/addExtension keys in 6 locales

check-i18n surfaced that market.componentName.*, market.filterByComponentHint
and the addExtension.install* keys existed only in en-US/zh-Hans. Backfill
them for es-ES, ja-JP, ru-RU, th-TH, vi-VN and zh-Hant (reusing each locale's
existing component-name translations) and align the filterByComponent label
with the new "Plugin Component" wording. check-i18n now passes for all locales.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* i18n(plugins): relabel "group by type" as "group by format"

The installed-extensions grouping is by extension format (plugin / MCP / skill),
so rename the toggle label accordingly across all 8 locales (key unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(plugin-market): cursor-pointer on tag filter trigger

The TagsFilter Select trigger used the default cursor; add cursor-pointer so the
tag filter is clearly clickable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(sidebar): show edition badge (Community / Cloud) in logo area

Add a small badge next to the LangBot name in the sidebar header that reflects
systemInfo.edition: a neutral "Community" badge for the community edition and a
blue "Cloud" badge for the cloud edition. Adds sidebar.editionCommunity /
sidebar.editionCloud across all 8 locales.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* i18n(sidebar): unify zh-Hans cloud edition label to 云端版

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sidebar): edition badge - drop hover, use "Cloud" in all locales

The edition badge is not interactive, so remove the hover background on the
cloud badge. Also use the literal "Cloud" label uniformly across all locales
instead of localized variants (云端版/クラウド版/...).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(box): cap tool-call loop and run workspace-quota walk off the event loop

Two robustness fixes that bite under normal sandbox usage (not just attack),
hardening the self-hosted community edition before release:

- localagent: cap the tool-call loop at MAX_TOOL_CALL_ROUNDS (128). A looping
  or adversarial model could otherwise emit tool calls indefinitely (each
  potentially a sandbox exec), producing a non-terminating request and runaway
  cost. The cap is generous enough not to interrupt legitimate multi-step
  agentic workflows.
- box.service: make _enforce_workspace_quota async and run the recursive
  workspace scan via asyncio.to_thread. It ran on every quota-enforced exec and
  a large workspace would block the whole asyncio runtime (all bots/pipelines).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(review): refresh box docs; trim issue list to SaaS blockers only

Community self-hosted edition is release-ready, so the box review docs are
updated to current state (date 2026-06-02 + status note) and box-issues.md is
rewritten to keep only the SaaS / multi-tenant / network-exposed release
blockers (S1-S8): unauthenticated control plane, no per-pipeline exec
authorization, unbounded sessions + no reaper, no kernel-level quota, mount
validation gaps (/ + extra_mounts), missing container hardening, lock-around-
cold-start, and the lower-severity follow-ups. Resolved items (tool-call loop
cap, async quota scan, host_path mount allowlist, _is_path_under dedup) moved to
a short "resolved before community release" record; community-only and
pure-cleanup items dropped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(deps): pin langbot-plugin to 0.4.0

Track the stable SDK release (0.4.0b1 -> 0.4.0); regenerate uv.lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: WangCham <651122857@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: fdc310 <82008029+fdc310@users.noreply.github.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
This commit is contained in:
huanghuoguoguo
2026-06-03 11:12:39 +08:00
committed by GitHub
parent 4054ba2a76
commit 96b041846d
161 changed files with 22518 additions and 4029 deletions

View File

@@ -15,7 +15,7 @@ class FakeApp:
def __init__(
self,
*,
command_prefix: list[str] = ["/", "!"],
command_prefix: list[str] = ['/', '!'],
command_enable: bool = True,
pipeline_concurrency: int = 10,
admins: list[str] | None = None,
@@ -40,6 +40,8 @@ class FakeApp:
self.telemetry = self._create_mock_telemetry()
self.survey = None
self.cmd_mgr = self._create_mock_cmd_mgr()
self.skill_mgr = self._create_mock_skill_mgr()
self.pipeline_service = self._create_mock_pipeline_service()
# Apply any extra attributes for specific test scenarios
for name, value in extra_attrs.items():
@@ -98,9 +100,9 @@ class FakeApp:
):
instance_config = Mock()
instance_config.data = {
"command": {"prefix": command_prefix, "enable": command_enable},
"concurrency": {"pipeline": pipeline_concurrency},
"admins": admins,
'command': {'prefix': command_prefix, 'enable': command_enable},
'concurrency': {'pipeline': pipeline_concurrency},
'admins': admins,
}
return instance_config
@@ -119,6 +121,20 @@ class FakeApp:
cmd_mgr.execute = AsyncMock()
return cmd_mgr
def _create_mock_skill_mgr(self):
"""Mock SkillManager that returns no skill index addition by default."""
skill_mgr = Mock()
skill_mgr.skills = {}
skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
skill_mgr.get_skill_index = Mock(return_value=[])
return skill_mgr
def _create_mock_pipeline_service(self):
"""Mock PipelineService.get_pipeline returning empty extensions prefs."""
pipeline_service = AsyncMock()
pipeline_service.get_pipeline = AsyncMock(return_value={'extensions_preferences': {}})
return pipeline_service
def capture_message(self, message):
"""Capture an outbound message for test assertions."""
self._outbound_messages.append(message)
@@ -134,4 +150,4 @@ class FakeApp:
def fake_app(**kwargs) -> FakeApp:
"""Create a FakeApp instance with optional overrides."""
return FakeApp(**kwargs)
return FakeApp(**kwargs)

View File

@@ -20,6 +20,7 @@ pytestmark = pytest.mark.integration
# ============== FIXTURE FOR SYS.MODULES ISOLATION ==============
@pytest.fixture(scope='module')
def mock_circular_import_chain():
"""Break circular import chain for API controller."""
@@ -53,21 +54,25 @@ def mock_circular_import_chain():
):
# Import groups after mocking to populate preregistered_groups
import langbot.pkg.api.http.controller.groups.pipelines.pipelines as _pipelines # noqa: E402, F401
yield
# ============== FAKE APPLICATION WITH PIPELINE SERVICES ==============
@pytest.fixture(scope='module')
def fake_pipeline_app():
"""Create FakeApp with pipeline-specific services (module scope for reuse)."""
app = FakeApp()
# Pipeline config
app.instance_config.data.update({
'api': {'port': 5300},
'system': {'allow_modify_login_info': True, 'limitation': {}},
})
app.instance_config.data.update(
{
'api': {'port': 5300},
'system': {'allow_modify_login_info': True, 'limitation': {}},
}
)
# Auth services
app.user_service = Mock()
@@ -79,25 +84,31 @@ def fake_pipeline_app():
# Pipeline service
app.pipeline_service = Mock()
app.pipeline_service.get_pipeline_metadata = AsyncMock(return_value=[
{'name': 'trigger', 'stages': []},
{'name': 'ai', 'stages': []},
])
app.pipeline_service.get_pipelines = AsyncMock(return_value=[
{
app.pipeline_service.get_pipeline_metadata = AsyncMock(
return_value=[
{'name': 'trigger', 'stages': []},
{'name': 'ai', 'stages': []},
]
)
app.pipeline_service.get_pipelines = AsyncMock(
return_value=[
{
'uuid': 'test-pipeline-uuid',
'name': 'Test Pipeline',
'description': 'Test description',
'created_at': '2024-01-01T00:00:00',
'updated_at': '2024-01-01T00:00:00',
'is_default': False,
}
]
)
app.pipeline_service.get_pipeline = AsyncMock(
return_value={
'uuid': 'test-pipeline-uuid',
'name': 'Test Pipeline',
'description': 'Test description',
'created_at': '2024-01-01T00:00:00',
'updated_at': '2024-01-01T00:00:00',
'is_default': False,
'config': {},
}
])
app.pipeline_service.get_pipeline = AsyncMock(return_value={
'uuid': 'test-pipeline-uuid',
'name': 'Test Pipeline',
'config': {},
})
)
app.pipeline_service.create_pipeline = AsyncMock(return_value={'uuid': 'new-pipeline-uuid'})
app.pipeline_service.update_pipeline = AsyncMock(return_value={})
app.pipeline_service.delete_pipeline = AsyncMock()
@@ -112,6 +123,10 @@ def fake_pipeline_app():
app.mcp_service = Mock()
app.mcp_service.get_mcp_servers = AsyncMock(return_value=[])
# Skill service (for extensions endpoint)
app.skill_service = Mock()
app.skill_service.list_skills = AsyncMock(return_value=[])
# Plugin connector (for extensions endpoint)
app.plugin_connector.list_plugins = AsyncMock(return_value=[])
@@ -130,6 +145,7 @@ async def quart_test_client(fake_pipeline_app, http_controller_cls):
# ============== PIPELINE ENDPOINT TESTS ==============
@pytest.mark.usefixtures('mock_circular_import_chain')
class TestPipelineMetadataEndpoint:
"""Tests for /api/v1/pipelines/_/metadata endpoint."""
@@ -138,8 +154,7 @@ class TestPipelineMetadataEndpoint:
async def test_get_pipeline_metadata_success(self, quart_test_client):
"""GET /api/v1/pipelines/_/metadata returns metadata list."""
response = await quart_test_client.get(
'/api/v1/pipelines/_/metadata',
headers={'Authorization': 'Bearer test_token'}
'/api/v1/pipelines/_/metadata', headers={'Authorization': 'Bearer test_token'}
)
assert response.status_code == 200
@@ -162,10 +177,7 @@ class TestPipelinesListEndpoint:
@pytest.mark.asyncio
async def test_get_pipelines_success(self, quart_test_client):
"""GET /api/v1/pipelines returns pipeline list."""
response = await quart_test_client.get(
'/api/v1/pipelines',
headers={'Authorization': 'Bearer test_token'}
)
response = await quart_test_client.get('/api/v1/pipelines', headers={'Authorization': 'Bearer test_token'})
assert response.status_code == 200
data = await response.get_json()
@@ -176,8 +188,7 @@ class TestPipelinesListEndpoint:
async def test_get_pipelines_with_sort_param(self, quart_test_client):
"""GET pipelines with sort parameter."""
response = await quart_test_client.get(
'/api/v1/pipelines?sort_by=created_at&sort_order=DESC',
headers={'Authorization': 'Bearer test_token'}
'/api/v1/pipelines?sort_by=created_at&sort_order=DESC', headers={'Authorization': 'Bearer test_token'}
)
assert response.status_code == 200
@@ -193,8 +204,7 @@ class TestPipelinesCRUDEndpoints:
async def test_get_single_pipeline_success(self, quart_test_client):
"""GET /api/v1/pipelines/{uuid} returns pipeline."""
response = await quart_test_client.get(
'/api/v1/pipelines/test-pipeline-uuid',
headers={'Authorization': 'Bearer test_token'}
'/api/v1/pipelines/test-pipeline-uuid', headers={'Authorization': 'Bearer test_token'}
)
assert response.status_code == 200
@@ -208,7 +218,7 @@ class TestPipelinesCRUDEndpoints:
response = await quart_test_client.post(
'/api/v1/pipelines',
headers={'Authorization': 'Bearer test_token'},
json={'name': 'New Pipeline', 'config': {}}
json={'name': 'New Pipeline', 'config': {}},
)
assert response.status_code == 200
@@ -222,7 +232,7 @@ class TestPipelinesCRUDEndpoints:
response = await quart_test_client.put(
'/api/v1/pipelines/test-pipeline-uuid',
headers={'Authorization': 'Bearer test_token'},
json={'name': 'Updated Pipeline'}
json={'name': 'Updated Pipeline'},
)
assert response.status_code == 200
@@ -233,8 +243,7 @@ class TestPipelinesCRUDEndpoints:
async def test_delete_pipeline_success(self, quart_test_client):
"""DELETE /api/v1/pipelines/{uuid} deletes pipeline."""
response = await quart_test_client.delete(
'/api/v1/pipelines/test-pipeline-uuid',
headers={'Authorization': 'Bearer test_token'}
'/api/v1/pipelines/test-pipeline-uuid', headers={'Authorization': 'Bearer test_token'}
)
assert response.status_code == 200
@@ -245,8 +254,7 @@ class TestPipelinesCRUDEndpoints:
async def test_copy_pipeline_success(self, quart_test_client):
"""POST /api/v1/pipelines/{uuid}/copy copies pipeline."""
response = await quart_test_client.post(
'/api/v1/pipelines/test-pipeline-uuid/copy',
headers={'Authorization': 'Bearer test_token'}
'/api/v1/pipelines/test-pipeline-uuid/copy', headers={'Authorization': 'Bearer test_token'}
)
assert response.status_code == 200
@@ -263,8 +271,7 @@ class TestPipelineExtensionsEndpoint:
async def test_get_extensions(self, quart_test_client):
"""GET /api/v1/pipelines/{uuid}/extensions."""
response = await quart_test_client.get(
'/api/v1/pipelines/test-pipeline-uuid/extensions',
headers={'Authorization': 'Bearer test_token'}
'/api/v1/pipelines/test-pipeline-uuid/extensions', headers={'Authorization': 'Bearer test_token'}
)
# Should return 200 if pipeline found

View File

View File

View File

@@ -0,0 +1,329 @@
"""Integration tests for LangBot Box.
These tests verify the end-to-end behavior of the Box sandbox execution
system. Tests decorated with ``requires_container`` need a real container
runtime (Podman or Docker) and are skipped otherwise.
CI only runs ``tests/unit_tests/``, so these tests never execute in the
CI pipeline. Run them locally with::
pytest tests/integration_tests/ -v
"""
from __future__ import annotations
import asyncio
import logging
import shutil
import socket
import subprocess
from types import SimpleNamespace
import pytest
from langbot.pkg.box.service import BoxService
from langbot_plugin.box.backend import BaseSandboxBackend
from langbot_plugin.box.client import ActionRPCBoxClient
from langbot_plugin.box.errors import BoxBackendUnavailableError
from langbot_plugin.box.models import BoxExecutionStatus, BoxNetworkMode, BoxSpec
from langbot_plugin.box.runtime import BoxRuntime
from langbot_plugin.box.server import BoxServerHandler
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
_logger = logging.getLogger('test.box.integration')
# Default image for integration tests — small and fast to pull.
_TEST_IMAGE = 'alpine:latest'
# ── Skip helpers ──────────────────────────────────────────────────────
def _has_container_runtime() -> bool:
for cmd in ('podman', 'docker'):
if shutil.which(cmd) is None:
continue
try:
result = subprocess.run(
[cmd, 'info'],
capture_output=True,
timeout=10,
)
if result.returncode == 0:
return True
except Exception:
continue
return False
def _can_open_test_socket() -> bool:
try:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except OSError:
return False
sock.close()
return True
requires_container = pytest.mark.skipif(
not _has_container_runtime(),
reason='no container runtime (podman/docker) available',
)
requires_socket = pytest.mark.skipif(
not _can_open_test_socket(),
reason='local test environment does not permit opening TCP sockets',
)
# ── Helpers ──────────────────────────────────────────────────────────
class _QueueConnection:
"""In-process Connection backed by asyncio Queues — no real IO."""
def __init__(self, rx: asyncio.Queue[str], tx: asyncio.Queue[str]):
self._rx = rx
self._tx = tx
async def send(self, message: str) -> None:
await self._tx.put(message)
async def receive(self) -> str:
return await self._rx.get()
async def close(self) -> None:
pass
async def _make_rpc_pair(runtime: BoxRuntime):
"""Create an in-process (ActionRPCBoxClient, server_task, client_task) connected via queues."""
from langbot_plugin.runtime.io.handler import Handler
c2s: asyncio.Queue[str] = asyncio.Queue()
s2c: asyncio.Queue[str] = asyncio.Queue()
client_conn = _QueueConnection(rx=s2c, tx=c2s)
server_conn = _QueueConnection(rx=c2s, tx=s2c)
server_handler = BoxServerHandler(server_conn, runtime)
server_task = asyncio.create_task(server_handler.run())
client_handler = Handler.__new__(Handler)
Handler.__init__(client_handler, client_conn)
client_task = asyncio.create_task(client_handler.run())
client = ActionRPCBoxClient(logger=_logger)
client.set_handler(client_handler)
return client, server_task, client_task
# ── Fixtures ──────────────────────────────────────────────────────────
@pytest.fixture
async def box_client():
"""Yield an ActionRPCBoxClient backed by a real BoxRuntime via in-process RPC."""
runtime = BoxRuntime(logger=_logger)
await runtime.initialize()
client, server_task, client_task = await _make_rpc_pair(runtime)
yield client
server_task.cancel()
client_task.cancel()
await runtime.shutdown()
# ── 1. Simple command execution ───────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_exec_simple_command(box_client: ActionRPCBoxClient):
"""Box starts a simple command and returns stdout."""
spec = BoxSpec(
cmd='echo hello-box',
session_id='int-simple',
workdir='/tmp',
image=_TEST_IMAGE,
)
result = await box_client.execute(spec)
assert result.status == BoxExecutionStatus.COMPLETED
assert result.exit_code == 0
assert 'hello-box' in result.stdout
# ── 2. Session file persistence ───────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_session_persists_files(box_client: ActionRPCBoxClient):
"""Write a file in one exec, read it back in a second exec on the same session."""
sid = 'int-persist'
write_result = await box_client.execute(
BoxSpec(
cmd='echo "hello from file" > /tmp/testfile.txt',
session_id=sid,
workdir='/tmp',
image=_TEST_IMAGE,
)
)
assert write_result.exit_code == 0
read_result = await box_client.execute(
BoxSpec(
cmd='cat /tmp/testfile.txt',
session_id=sid,
workdir='/tmp',
image=_TEST_IMAGE,
)
)
assert read_result.exit_code == 0
assert 'hello from file' in read_result.stdout
# ── 3. Timeout handling ───────────────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_timeout_kills_command(box_client: ActionRPCBoxClient):
"""A long-running command is killed after timeout_sec."""
session_id = 'int-timeout'
spec = BoxSpec(
cmd='sleep 120',
session_id=session_id,
workdir='/tmp',
timeout_sec=3,
image=_TEST_IMAGE,
)
result = await box_client.execute(spec)
assert result.status == BoxExecutionStatus.TIMED_OUT
assert result.exit_code is None
sessions = await box_client.get_sessions()
assert all(session['session_id'] != session_id for session in sessions)
# ── 4. Network isolation ─────────────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_offline_cannot_reach_network(box_client: ActionRPCBoxClient):
"""With network=OFF the sandbox cannot reach the internet."""
spec = BoxSpec(
cmd='wget -q -O /dev/null --timeout=3 http://1.1.1.1 2>&1; exit $?',
session_id='int-offline',
workdir='/tmp',
network=BoxNetworkMode.OFF,
image=_TEST_IMAGE,
)
result = await box_client.execute(spec)
assert result.exit_code != 0
# ── 5. Backend unavailable ───────────────────────────────────────────
class _UnavailableBackend(BaseSandboxBackend):
"""A backend that always reports itself as unavailable."""
name = 'unavailable'
def __init__(self):
super().__init__(logging.getLogger('test'))
async def is_available(self) -> bool:
return False
async def start_session(self, spec):
raise NotImplementedError
async def exec(self, session, spec):
raise NotImplementedError
async def stop_session(self, session):
pass
@requires_socket
@pytest.mark.asyncio
async def test_backend_unavailable_returns_error():
"""When no backend is available the full RPC path returns BoxBackendUnavailableError."""
runtime = BoxRuntime(logger=_logger, backends=[_UnavailableBackend()])
await runtime.initialize()
client, server_task, client_task = await _make_rpc_pair(runtime)
try:
spec = BoxSpec(
cmd='echo hello',
session_id='int-no-backend',
workdir='/tmp',
)
with pytest.raises(BoxBackendUnavailableError):
await client.execute(spec)
finally:
server_task.cancel()
client_task.cancel()
await runtime.shutdown()
# ── 6. Full service-to-runtime path ──────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_full_service_to_remote_runtime(tmp_path):
"""BoxService -> ActionRPCBoxClient -> RPC -> BoxRuntime -> real backend."""
runtime = BoxRuntime(logger=_logger)
await runtime.initialize()
client, server_task, client_task = await _make_rpc_pair(runtime)
try:
host_dir = tmp_path / 'workspace'
host_dir.mkdir()
mock_ap = SimpleNamespace(
logger=_logger,
instance_config=SimpleNamespace(
data={
'box': {
'backend': 'local',
'runtime': {'endpoint': ''},
'local': {
'profile': 'default',
'allowed_mount_roots': [str(tmp_path)],
'default_workspace': str(host_dir),
},
'e2b': {'api_key': '', 'api_url': '', 'template': ''},
}
}
),
)
service = BoxService(mock_ap, client=client)
await service.initialize()
query = pipeline_query.Query.model_construct(query_id=42)
result = await service.execute_tool(
{'command': 'echo service-path'},
query,
)
assert result['ok'] is True
assert result['status'] == 'completed'
assert 'service-path' in result['stdout']
assert result['session_id'] == 'query_42'
finally:
server_task.cancel()
client_task.cancel()
await runtime.shutdown()

View File

@@ -0,0 +1,368 @@
"""Integration tests for Box MCP-related features.
These tests verify managed process lifecycle, WebSocket stdio attach,
session cleanup, and the single-session query API using a real container
runtime.
CI only runs ``tests/unit_tests/``, so these tests never execute in the
CI pipeline. Run them locally with::
pytest tests/integration_tests/box/test_box_mcp_integration.py -v
"""
from __future__ import annotations
import asyncio
import logging
import shutil
import socket
import subprocess
import aiohttp
import pytest
from aiohttp.test_utils import TestServer
from langbot_plugin.box.client import ActionRPCBoxClient
from langbot_plugin.box.errors import BoxManagedProcessNotFoundError, BoxSessionNotFoundError
from langbot_plugin.box.models import BoxManagedProcessSpec, BoxManagedProcessStatus, BoxSpec
from langbot_plugin.box.runtime import BoxRuntime
from langbot_plugin.box.server import BoxServerHandler, create_ws_relay_app
_logger = logging.getLogger('test.box.mcp_integration')
_TEST_IMAGE = 'alpine:latest'
# ── Skip helpers ──────────────────────────────────────────────────────
def _has_container_runtime() -> bool:
for cmd in ('podman', 'docker'):
if shutil.which(cmd) is None:
continue
try:
result = subprocess.run([cmd, 'info'], capture_output=True, timeout=10)
if result.returncode == 0:
return True
except Exception:
continue
return False
def _can_open_test_socket() -> bool:
try:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except OSError:
return False
sock.close()
return True
requires_container = pytest.mark.skipif(
not _has_container_runtime(),
reason='no container runtime (podman/docker) available',
)
requires_socket = pytest.mark.skipif(
not _can_open_test_socket(),
reason='local test environment does not permit opening TCP sockets',
)
# ── Helpers ──────────────────────────────────────────────────────────
class _QueueConnection:
"""In-process Connection backed by asyncio Queues — no real IO."""
def __init__(self, rx: asyncio.Queue[str], tx: asyncio.Queue[str]):
self._rx = rx
self._tx = tx
async def send(self, message: str) -> None:
await self._tx.put(message)
async def receive(self) -> str:
return await self._rx.get()
async def close(self) -> None:
pass
async def _make_rpc_pair(runtime: BoxRuntime):
"""Create an in-process RPC pair connected via queues."""
from langbot_plugin.runtime.io.handler import Handler
c2s: asyncio.Queue[str] = asyncio.Queue()
s2c: asyncio.Queue[str] = asyncio.Queue()
client_conn = _QueueConnection(rx=s2c, tx=c2s)
server_conn = _QueueConnection(rx=c2s, tx=s2c)
server_handler = BoxServerHandler(server_conn, runtime)
server_task = asyncio.create_task(server_handler.run())
client_handler = Handler.__new__(Handler)
Handler.__init__(client_handler, client_conn)
client_task = asyncio.create_task(client_handler.run())
client = ActionRPCBoxClient(logger=_logger)
client.set_handler(client_handler)
return client, server_task, client_task
# ── Fixtures ──────────────────────────────────────────────────────────
@pytest.fixture
async def box_server():
"""Yield a (ws_relay_url, ActionRPCBoxClient) backed by a real BoxRuntime."""
runtime = BoxRuntime(logger=_logger)
await runtime.initialize()
# Start ws relay for managed process attach
ws_app = create_ws_relay_app(runtime)
ws_server = TestServer(ws_app)
await ws_server.start_server()
client, server_task, client_task = await _make_rpc_pair(runtime)
ws_relay_url = str(ws_server.make_url(''))
yield ws_relay_url, client
server_task.cancel()
client_task.cancel()
await runtime.shutdown()
await ws_server.close()
# ── 1. Managed process lifecycle ─────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_managed_process_start_and_query(box_server):
"""Start a managed process and query its status."""
ws_relay_url, client = box_server
# Create session
spec = BoxSpec(
cmd='',
session_id='mcp-int-lifecycle',
workdir='/tmp',
image=_TEST_IMAGE,
)
await client.create_session(spec)
# Start a managed process that stays alive
proc_spec = BoxManagedProcessSpec(
command='sh',
args=['-c', 'while true; do sleep 1; done'],
cwd='/tmp',
)
info = await client.start_managed_process('mcp-int-lifecycle', proc_spec)
assert info.status == BoxManagedProcessStatus.RUNNING
# Query it
info2 = await client.get_managed_process('mcp-int-lifecycle')
assert info2.status == BoxManagedProcessStatus.RUNNING
assert info2.command == 'sh'
# Stop only the managed process while keeping the session available
await client.stop_managed_process('mcp-int-lifecycle')
with pytest.raises(BoxManagedProcessNotFoundError):
await client.get_managed_process('mcp-int-lifecycle')
session_info = await client.get_session('mcp-int-lifecycle')
assert session_info['session_id'] == 'mcp-int-lifecycle'
# Cleanup
await client.delete_session('mcp-int-lifecycle')
# ── 2. WebSocket stdio attach ────────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_ws_stdio_attach_echo(box_server):
"""Attach to a managed process via WebSocket and verify bidirectional IO."""
ws_relay_url, client = box_server
spec = BoxSpec(
cmd='',
session_id='mcp-int-ws',
workdir='/tmp',
image=_TEST_IMAGE,
)
await client.create_session(spec)
# Start a cat process (echoes stdin to stdout)
proc_spec = BoxManagedProcessSpec(
command='cat',
args=[],
cwd='/tmp',
)
await client.start_managed_process('mcp-int-ws', proc_spec)
# Connect via WebSocket (ws relay)
ws_url = client.get_managed_process_websocket_url('mcp-int-ws', ws_relay_url)
session = aiohttp.ClientSession()
try:
async with session.ws_connect(ws_url) as ws:
# Send a line
await ws.send_str('hello from test')
# Expect to receive it back (cat echoes)
msg = await asyncio.wait_for(ws.receive(), timeout=5)
assert msg.type == aiohttp.WSMsgType.TEXT
assert 'hello from test' in msg.data
finally:
await session.close()
await client.delete_session('mcp-int-ws')
# ── 3. Session cleanup removes container ─────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_delete_session_cleans_up(box_server):
"""After deleting a session, it should no longer exist."""
ws_relay_url, client = box_server
spec = BoxSpec(
cmd='',
session_id='mcp-int-cleanup',
workdir='/tmp',
image=_TEST_IMAGE,
)
await client.create_session(spec)
# Start a process
proc_spec = BoxManagedProcessSpec(
command='sleep',
args=['3600'],
cwd='/tmp',
)
await client.start_managed_process('mcp-int-cleanup', proc_spec)
# Delete
await client.delete_session('mcp-int-cleanup')
# Session should be gone
with pytest.raises(BoxSessionNotFoundError):
await client.get_session('mcp-int-cleanup')
# ── 4. GET session details ────────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_get_session_returns_details(box_server):
"""Get single session returns session details and managed process info."""
ws_relay_url, client = box_server
spec = BoxSpec(
cmd='',
session_id='mcp-int-get',
workdir='/tmp',
image=_TEST_IMAGE,
)
await client.create_session(spec)
# Query without managed process
info = await client.get_session('mcp-int-get')
assert info['session_id'] == 'mcp-int-get'
assert info['image'] == _TEST_IMAGE
assert 'managed_process' not in info
# Start a process and query again
proc_spec = BoxManagedProcessSpec(
command='sleep',
args=['3600'],
cwd='/tmp',
)
await client.start_managed_process('mcp-int-get', proc_spec)
info2 = await client.get_session('mcp-int-get')
assert info2['session_id'] == 'mcp-int-get'
assert 'managed_process' in info2
assert info2['managed_process']['status'] == BoxManagedProcessStatus.RUNNING.value
await client.delete_session('mcp-int-get')
# ── 5. Process exit detected ────────────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_process_exit_detected(box_server):
"""When a managed process exits, its status should reflect EXITED."""
ws_relay_url, client = box_server
spec = BoxSpec(
cmd='',
session_id='mcp-int-exit',
workdir='/tmp',
image=_TEST_IMAGE,
)
await client.create_session(spec)
# Start a process that exits immediately
proc_spec = BoxManagedProcessSpec(
command='sh',
args=['-c', 'echo done && exit 0'],
cwd='/tmp',
)
await client.start_managed_process('mcp-int-exit', proc_spec)
# Wait a bit for process to exit
await asyncio.sleep(2)
info = await client.get_managed_process('mcp-int-exit')
assert info.status == BoxManagedProcessStatus.EXITED
assert info.exit_code == 0
await client.delete_session('mcp-int-exit')
# ── 6. Instance ID orphan cleanup ───────────────────────────────────
@requires_container
@requires_socket
@pytest.mark.asyncio
async def test_orphan_cleanup_preserves_own_containers(box_server):
"""Orphan cleanup should not remove containers belonging to the current instance."""
ws_relay_url, client = box_server
# Create a session (container gets current instance ID label)
spec = BoxSpec(
cmd='',
session_id='mcp-int-orphan',
workdir='/tmp',
image=_TEST_IMAGE,
)
await client.create_session(spec)
# Verify session exists
sessions = await client.get_sessions()
assert any(s['session_id'] == 'mcp-int-orphan' for s in sessions)
# Trigger status check (which doesn't clean up own containers)
status = await client.get_status()
assert status['active_sessions'] >= 1
# Our session should still exist
sessions = await client.get_sessions()
assert any(s['session_id'] == 'mcp-int-orphan' for s in sessions)
await client.delete_session('mcp-int-orphan')

View File

@@ -0,0 +1,106 @@
from __future__ import annotations
from types import SimpleNamespace
from unittest.mock import Mock
import pytest
from langbot_plugin.box.client import ActionRPCBoxClient
from langbot.pkg.box.connector import BoxRuntimeConnector
def make_app(logger: Mock, runtime_endpoint: str = ''):
return SimpleNamespace(
logger=logger,
instance_config=SimpleNamespace(
data={
'box': {
'backend': 'local',
'runtime': {'endpoint': runtime_endpoint},
'local': {
'profile': 'default',
'allowed_mount_roots': [],
'default_workspace': '',
},
'e2b': {'api_key': '', 'api_url': '', 'template': ''},
}
}
),
)
def test_box_runtime_connector_stdio_when_no_url(monkeypatch: pytest.MonkeyPatch):
"""Without runtime.endpoint, on a non-Docker Unix platform, use stdio."""
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
connector = BoxRuntimeConnector(make_app(Mock()))
assert connector._uses_websocket() is False
assert isinstance(connector.client, ActionRPCBoxClient)
def test_box_runtime_connector_ws_when_url_configured(monkeypatch: pytest.MonkeyPatch):
"""With an explicit runtime.endpoint, always use WebSocket."""
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
logger = Mock()
connector = BoxRuntimeConnector(make_app(logger, runtime_endpoint='http://box-runtime:5410'))
assert connector._uses_websocket() is True
assert isinstance(connector.client, ActionRPCBoxClient)
def test_box_runtime_connector_ws_in_docker(monkeypatch: pytest.MonkeyPatch):
"""Inside Docker (no explicit URL), use WebSocket to reach a sibling container."""
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'docker')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
connector = BoxRuntimeConnector(make_app(Mock()))
assert connector._uses_websocket() is True
assert connector.ws_relay_base_url == 'http://langbot_box:5410'
def test_box_runtime_connector_ws_with_standalone_flag(monkeypatch: pytest.MonkeyPatch):
"""With --standalone-box flag, use WebSocket even on a local Unix platform."""
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', True)
connector = BoxRuntimeConnector(make_app(Mock()))
assert connector._uses_websocket() is True
def test_box_runtime_connector_ws_relay_url_default(monkeypatch: pytest.MonkeyPatch):
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
connector = BoxRuntimeConnector(make_app(Mock()))
assert connector.ws_relay_base_url == 'http://127.0.0.1:5410'
def test_box_runtime_connector_ws_relay_url_explicit(monkeypatch: pytest.MonkeyPatch):
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
connector = BoxRuntimeConnector(make_app(Mock(), runtime_endpoint='http://box-runtime:5410'))
assert connector.ws_relay_base_url == 'http://box-runtime:5410'
def test_box_runtime_connector_dispose_terminates_subprocess(monkeypatch: pytest.MonkeyPatch):
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
logger = Mock()
connector = BoxRuntimeConnector(make_app(logger))
subprocess = Mock()
subprocess.returncode = None
handler_task = Mock()
ctrl_task = Mock()
connector._subprocess = subprocess
connector._handler_task = handler_task
connector._ctrl_task = ctrl_task
connector.dispose()
subprocess.terminate.assert_called_once()
handler_task.cancel.assert_called_once()
ctrl_task.cancel.assert_called_once()
assert connector._handler_task is None
assert connector._ctrl_task is None

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,147 @@
from __future__ import annotations
import os
import tempfile
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
from langbot.pkg.box.workspace import (
BoxWorkspaceSession,
classify_python_workspace,
infer_workspace_host_path,
rewrite_mounted_path,
wrap_python_command_with_env,
)
def test_rewrite_mounted_path_translates_host_prefix():
result = rewrite_mounted_path('/tmp/demo/project/app.py', '/tmp/demo/project')
assert result == '/workspace/app.py'
def test_infer_workspace_host_path_unwraps_virtualenv_bin_dir():
with tempfile.TemporaryDirectory() as tmpdir:
project_root = os.path.join(tmpdir, 'project')
os.makedirs(os.path.join(project_root, '.venv', 'bin'))
python_bin = os.path.join(project_root, '.venv', 'bin', 'python')
script = os.path.join(project_root, 'server.py')
with open(python_bin, 'w', encoding='utf-8') as handle:
handle.write('')
with open(script, 'w', encoding='utf-8') as handle:
handle.write('print("ok")\n')
result = infer_workspace_host_path(python_bin, [script])
assert result == os.path.realpath(project_root)
def test_classify_python_workspace_detects_package_and_requirements():
with tempfile.TemporaryDirectory() as tmpdir:
assert classify_python_workspace(tmpdir) is None
with open(os.path.join(tmpdir, 'requirements.txt'), 'w', encoding='utf-8') as handle:
handle.write('requests\n')
assert classify_python_workspace(tmpdir) == 'requirements'
with open(os.path.join(tmpdir, 'pyproject.toml'), 'w', encoding='utf-8') as handle:
handle.write('[project]\nname = "demo"\n')
assert classify_python_workspace(tmpdir) == 'package'
def test_wrap_python_command_with_env_contains_bootstrap_and_command():
command = wrap_python_command_with_env('python script.py')
assert 'python -m venv "$_LB_VENV_DIR"' in command
assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
assert command.rstrip().endswith('python script.py')
@pytest.mark.asyncio
async def test_workspace_session_execute_for_query_uses_session_payload():
box_service = SimpleNamespace(execute_spec_payload=AsyncMock(return_value={'ok': True}))
workspace = BoxWorkspaceSession(
box_service,
'skill-person_123-demo',
host_path='/tmp/project',
host_path_mode='rw',
env={'FOO': 'bar'},
)
query = SimpleNamespace(query_id='q1')
result = await workspace.execute_for_query(query, 'python run.py', workdir='/workspace', timeout_sec=30)
assert result == {'ok': True}
payload = box_service.execute_spec_payload.await_args.args[0]
assert payload == {
'session_id': 'skill-person_123-demo',
'workdir': '/workspace',
'env': {'FOO': 'bar'},
'persistent': False,
'host_path': '/tmp/project',
'host_path_mode': 'rw',
'cmd': 'python run.py',
'timeout_sec': 30,
}
@pytest.mark.asyncio
async def test_workspace_session_start_managed_process_rewrites_command_and_args():
box_service = SimpleNamespace(start_managed_process=AsyncMock(return_value={'status': 'running'}))
workspace = BoxWorkspaceSession(
box_service,
'mcp-u1',
host_path='/tmp/project',
host_path_mode='ro',
)
result = await workspace.start_managed_process(
'/tmp/project/.venv/bin/python',
['/tmp/project/server.py', '--config', '/tmp/project/config.json'],
env={'TOKEN': '1'},
)
assert result == {'status': 'running'}
session_id = box_service.start_managed_process.await_args.args[0]
payload = box_service.start_managed_process.await_args.args[1]
assert session_id == 'mcp-u1'
assert payload == {
'command': 'python',
'args': ['/workspace/server.py', '--config', '/workspace/config.json'],
'env': {'TOKEN': '1'},
'cwd': '/workspace',
'process_id': 'default',
}
def test_workspace_session_build_session_payload_keeps_generic_workspace_shape():
workspace = BoxWorkspaceSession(
Mock(),
'workspace-1',
host_path='/tmp/project',
host_path_mode='rw',
env={'FOO': 'bar'},
network='on',
read_only_rootfs=False,
image='python:3.11',
cpus=1.0,
memory_mb=512,
pids_limit=128,
)
assert workspace.build_session_payload() == {
'session_id': 'workspace-1',
'workdir': '/workspace',
'env': {'FOO': 'bar'},
'persistent': False,
'network': 'on',
'read_only_rootfs': False,
'host_path': '/tmp/project',
'host_path_mode': 'rw',
'image': 'python:3.11',
'cpus': 1.0,
'memory_mb': 512,
'pids_limit': 128,
}

View File

@@ -12,6 +12,12 @@ from __future__ import annotations
import pytest
from unittest.mock import AsyncMock, Mock
# Preload pipelinemgr so the pipeline.stage module is fully initialised before
# any individual stage test (e.g. preproc, longtext) tries to import it. Without
# this, running a stage test in isolation triggers a circular-import error:
# stage.py → core.app → pipelinemgr → stage.stage_class (not yet bound).
import langbot.pkg.pipeline.pipelinemgr # noqa: F401
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
import langbot_plugin.api.entities.builtin.platform.message as platform_message
import langbot_plugin.api.entities.builtin.platform.events as platform_events
@@ -34,6 +40,9 @@ class MockApplication:
self.query_pool = self._create_mock_query_pool()
self.instance_config = self._create_mock_instance_config()
self.task_mgr = self._create_mock_task_manager()
# Skill manager is optional; PreProcessor only touches it for the
# local-agent runner. None keeps the skill-binding branch inert.
self.skill_mgr = None
def _create_mock_logger(self):
logger = Mock()

View File

@@ -0,0 +1,78 @@
from __future__ import annotations
from unittest.mock import Mock
import pytest
import langbot_plugin.api.entities.builtin.provider.message as provider_message
# TODO: unskip once the handler ↔ app circular import is resolved
pytest.skip(
'circular import in handler ↔ app; will be unblocked once resolved',
allow_module_level=True,
)
from langbot.pkg.pipeline.process.handler import MessageHandler # noqa: E402
class _StubHandler(MessageHandler):
async def handle(self, query):
raise NotImplementedError
handler = _StubHandler(ap=Mock())
def test_chat_handler_formats_tool_call_request_log():
result = provider_message.Message(
role='assistant',
content='',
tool_calls=[
provider_message.ToolCall(
id='call-1',
type='function',
function=provider_message.FunctionCall(name='exec', arguments='{}'),
)
],
)
summary = handler.format_result_log(result)
assert summary == 'assistant: requested tools: exec'
def test_chat_handler_formats_tool_result_log():
result = provider_message.Message(
role='tool',
content='{"status":"completed","exit_code":0,"backend":"podman","stdout":"42\\n"}',
tool_call_id='call-1',
)
summary = handler.format_result_log(result)
# Tool results use generic cut_str truncation
assert summary is not None
assert summary.startswith('tool: {"status":"com')
assert summary.endswith('...')
def test_chat_handler_formats_tool_error_log():
result = provider_message.MessageChunk(
role='tool',
content='err: host_path must point to an existing directory on the host',
tool_call_id='call-1',
is_final=True,
)
summary = handler.format_result_log(result)
assert summary is not None
assert summary.startswith('tool error: err: host_path must')
assert summary.endswith('...')
def test_chat_handler_skips_empty_assistant_log():
result = provider_message.Message(role='assistant', content='')
summary = handler.format_result_log(result)
assert summary is None

View File

@@ -14,33 +14,43 @@ import json
import sys
from unittest.mock import AsyncMock, MagicMock, Mock, patch
# Break the circular import chain before importing n8nsvapi:
import pytest
import langbot_plugin.api.entities.builtin.provider.message as provider_message
# Break the circular import chain while importing n8nsvapi:
# n8nsvapi → runner → app → pipelinemgr → all runners → runner (partially init)
_mock_runner = MagicMock()
_mock_runner.runner_class = lambda name: (lambda cls: cls) # no-op decorator
_mock_runner.RequestRunner = object
_mocked_imports = {
'langbot.pkg.provider.runner': _mock_runner,
# The stubs are restored in a ``finally`` block so this module does NOT pollute
# sys.modules for other test modules (e.g. ones importing the real
# LocalAgentRunner, which would otherwise inherit ``object`` and break).
# Mirrors master's intent but uses try/finally so a raised import doesn't
# leave the global namespace in a stubbed state, and includes
# ``langbot.pkg.utils.httpclient`` which master didn't stub.
_runner_stub = MagicMock()
_runner_stub.runner_class = lambda name: (lambda cls: cls) # no-op decorator
_runner_stub.RequestRunner = object
_import_stubs = {
'langbot.pkg.provider.runner': _runner_stub,
'langbot.pkg.core.app': MagicMock(),
'langbot.pkg.utils.httpclient': MagicMock(),
}
_original_imports = {name: sys.modules.get(name) for name in _mocked_imports}
sys.modules.update(_mocked_imports)
import pytest # noqa: E402
import langbot_plugin.api.entities.builtin.provider.message as provider_message # noqa: E402
from langbot.pkg.provider.runners.n8nsvapi import N8nServiceAPIRunner # noqa: E402
for _name, _original in _original_imports.items():
if _original is None:
sys.modules.pop(_name, None)
else:
sys.modules[_name] = _original
_saved_modules = {name: sys.modules.get(name) for name in _import_stubs}
for _name, _stub in _import_stubs.items():
sys.modules[_name] = _stub
try:
from langbot.pkg.provider.runners.n8nsvapi import N8nServiceAPIRunner
finally:
for _name, _original in _saved_modules.items():
if _original is None:
sys.modules.pop(_name, None)
else:
sys.modules[_name] = _original
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def make_runner(output_key: str = 'response') -> N8nServiceAPIRunner:
ap = Mock()
ap.logger = Mock()
@@ -83,6 +93,7 @@ async def collect_chunks(runner: N8nServiceAPIRunner, chunks: list[bytes | str])
# _process_response: stream format (type:item/end)
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_stream_format_single_item():
"""Single item + end in one chunk yields final chunk with full content."""
@@ -165,6 +176,7 @@ async def test_stream_format_no_spurious_empty_yield():
# _process_response: plain JSON fallback
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_plain_json_with_output_key():
"""Plain JSON with matching output_key extracts value via output_key."""
@@ -235,6 +247,7 @@ async def test_invalid_json_returns_raw_text():
# _call_webhook: output type depends on is_stream
# ---------------------------------------------------------------------------
def make_query(is_stream: bool):
"""Build a minimal Query mock."""
query = Mock()

View File

@@ -0,0 +1,242 @@
from __future__ import annotations
import json
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
import langbot_plugin.api.entities.builtin.provider.message as provider_message
import langbot_plugin.api.entities.builtin.provider.session as provider_session
from langbot.pkg.provider.runners.localagent import LocalAgentRunner
class RecordingProvider:
def __init__(self):
self.requests: list[dict] = []
async def invoke_llm(self, query, model, messages, funcs, extra_args=None, remove_think=None):
self.requests.append(
{
'messages': list(messages),
'funcs': list(funcs),
'remove_think': remove_think,
}
)
if len(self.requests) == 1:
return provider_message.Message(
role='assistant',
content='Let me calculate that exactly.',
tool_calls=[
provider_message.ToolCall(
id='call-1',
type='function',
function=provider_message.FunctionCall(
name='exec',
arguments=json.dumps(
{'command': ("python - <<'PY'\nnums = [1, 2, 3, 4]\nprint(sum(nums) / len(nums))\nPY")}
),
),
)
],
)
tool_result = json.loads(messages[-1].content)
return provider_message.Message(
role='assistant',
content=f'The average is {tool_result["stdout"]}.',
)
class RecordingStreamProvider:
def __init__(self):
self.stream_requests: list[dict] = []
def invoke_llm_stream(self, query, model, messages, funcs, extra_args=None, remove_think=None):
self.stream_requests.append(
{
'messages': list(messages),
'funcs': list(funcs),
'remove_think': remove_think,
}
)
async def _stream():
if len(self.stream_requests) == 1:
yield provider_message.MessageChunk(
role='assistant',
tool_calls=[
provider_message.ToolCall(
id='call-1',
type='function',
function=provider_message.FunctionCall(
name='exec',
arguments=json.dumps({'command': "python -c 'print(1)'"}),
),
)
],
is_final=True,
)
return
yield provider_message.MessageChunk(
role='assistant',
content='Tool execution failed.',
is_final=True,
)
return _stream()
def make_query() -> pipeline_query.Query:
adapter = AsyncMock()
adapter.is_stream_output_supported = AsyncMock(return_value=False)
return pipeline_query.Query.model_construct(
query_id='avg-query',
launcher_type=provider_session.LauncherTypes.PERSON,
launcher_id=12345,
sender_id=12345,
message_chain=[],
message_event=None,
adapter=adapter,
pipeline_uuid='pipeline-uuid',
bot_uuid='bot-uuid',
pipeline_config={
'ai': {
'runner': {'runner': 'local-agent'},
'local-agent': {'model': {'primary': 'test-model-uuid', 'fallbacks': []}, 'prompt': 'test-prompt'},
},
'output': {'misc': {'remove-think': False}},
},
prompt=SimpleNamespace(messages=[]),
messages=[],
user_message=provider_message.Message(
role='user',
content='Please calculate the average of 1, 2, 3, and 4.',
),
use_funcs=[SimpleNamespace(name='exec')],
use_llm_model_uuid='test-model-uuid',
variables={},
)
@pytest.mark.asyncio
async def test_localagent_uses_exec_for_exact_calculation():
provider = RecordingProvider()
model = SimpleNamespace(
provider=provider,
model_entity=SimpleNamespace(
uuid='test-model-uuid',
name='test-model',
abilities=['func_call'],
extra_args={},
),
)
tool_manager = SimpleNamespace(
execute_func_call=AsyncMock(
return_value={
'session_id': 'avg-query',
'backend': 'podman',
'status': 'completed',
'ok': True,
'exit_code': 0,
'stdout': '2.5',
'stderr': '',
'duration_ms': 18,
}
)
)
app = SimpleNamespace(
logger=Mock(),
model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
tool_mgr=tool_manager,
rag_mgr=SimpleNamespace(),
box_service=SimpleNamespace(
get_system_guidance=Mock(
return_value=(
'When the exec tool is available, use it for exact calculations, statistics, '
'structured data parsing, and code execution instead of estimating mentally. '
'Unless the user explicitly asks for the script, code, or implementation details, '
'do not include the generated script in the final answer. '
'A default workspace is mounted at /workspace for file tasks.'
)
),
),
skill_mgr=SimpleNamespace(
get_skills_for_pipeline=AsyncMock(return_value=[]),
detect_skill_activation=AsyncMock(return_value=None),
build_activation_prompt=Mock(return_value=None),
),
)
runner = LocalAgentRunner(app, pipeline_config={})
query = make_query()
results = [message async for message in runner.run(query)]
assert [message.role for message in results] == ['assistant', 'tool', 'assistant']
assert results[-1].content == 'The average is 2.5.'
tool_manager.execute_func_call.assert_awaited_once()
tool_name, tool_parameters = tool_manager.execute_func_call.await_args.args[:2]
assert tool_name == 'exec'
assert 'print(sum(nums) / len(nums))' in tool_parameters['command']
first_request = provider.requests[0]
assert any(
message.role == 'system'
and 'exec' in str(message.content)
and 'exact calculations' in str(message.content)
and 'Unless the user explicitly asks for the script' in str(message.content)
and '/workspace' in str(message.content)
for message in first_request['messages']
)
assert [tool.name for tool in first_request['funcs']] == ['exec']
@pytest.mark.asyncio
async def test_localagent_streaming_tool_error_yields_message_chunks():
provider = RecordingStreamProvider()
model = SimpleNamespace(
provider=provider,
model_entity=SimpleNamespace(
uuid='test-model-uuid',
name='test-model',
abilities=['func_call'],
extra_args={},
),
)
adapter = AsyncMock()
adapter.is_stream_output_supported = AsyncMock(return_value=True)
query = make_query()
query.adapter = adapter
app = SimpleNamespace(
logger=Mock(),
model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
tool_mgr=SimpleNamespace(execute_func_call=AsyncMock(side_effect=RuntimeError('boom'))),
rag_mgr=SimpleNamespace(),
box_service=SimpleNamespace(
get_system_guidance=Mock(return_value='sandbox guidance'),
),
skill_mgr=SimpleNamespace(
get_skills_for_pipeline=AsyncMock(return_value=[]),
detect_skill_activation=AsyncMock(return_value=None),
build_activation_prompt=Mock(return_value=None),
),
)
runner = LocalAgentRunner(app, pipeline_config={})
results = [message async for message in runner.run(query)]
assert all(isinstance(message, provider_message.MessageChunk) for message in results)
assert any(message.role == 'tool' and message.content == 'err: boom' for message in results)

View File

@@ -0,0 +1,712 @@
"""Tests for MCP Box integration: path rewriting, host_path inference, config model, payloads.
Uses importlib.util.spec_from_file_location to load mcp.py directly without
triggering the circular import chain through the app module.
"""
from __future__ import annotations
import importlib
import importlib.util
import os
import sys
import tempfile
import types
from contextlib import asynccontextmanager
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
# ---------------------------------------------------------------------------
# Load mcp.py directly from file path, with stub dependencies
# ---------------------------------------------------------------------------
def _stub_module(fqn: str, attrs: dict | None = None, is_package: bool = False):
"""Create or return a stub module and register it in sys.modules."""
if fqn in sys.modules:
mod = sys.modules[fqn]
else:
mod = types.ModuleType(fqn)
mod.__spec__ = importlib.machinery.ModuleSpec(fqn, None, is_package=is_package)
if is_package:
mod.__path__ = []
sys.modules[fqn] = mod
parts = fqn.rsplit('.', 1)
if len(parts) == 2 and parts[0] in sys.modules:
setattr(sys.modules[parts[0]], parts[1], mod)
if attrs:
for k, v in attrs.items():
setattr(mod, k, v)
return mod
@pytest.fixture(scope='module', autouse=True)
def mcp_module():
"""Load mcp.py with minimal stubs to avoid circular imports."""
saved = {}
def _save_and_stub(name, attrs=None, is_package=False):
saved[name] = sys.modules.get(name)
# Don't overwrite modules that already exist (from other test modules)
if name in sys.modules:
return
_stub_module(name, attrs, is_package)
# Stub entire dependency chains as packages / modules
_save_and_stub('langbot_plugin', is_package=True)
_save_and_stub('langbot_plugin.api', is_package=True)
_save_and_stub('langbot_plugin.api.entities', is_package=True)
_save_and_stub('langbot_plugin.api.entities.events', is_package=True)
_save_and_stub('langbot_plugin.api.entities.events.pipeline_query', {})
_save_and_stub('langbot_plugin.api.entities.builtin', is_package=True)
_save_and_stub('langbot_plugin.api.entities.builtin.resource', is_package=True)
_save_and_stub(
'langbot_plugin.api.entities.builtin.resource.tool',
{
'LLMTool': type('LLMTool', (), {}),
},
)
_save_and_stub('langbot_plugin.api.entities.builtin.provider', is_package=True)
_save_and_stub('langbot_plugin.api.entities.builtin.provider.message', {})
_save_and_stub('sqlalchemy', {'select': Mock()})
_save_and_stub('httpx', {'AsyncClient': Mock()})
_save_and_stub('mcp', {'ClientSession': Mock, 'StdioServerParameters': Mock}, is_package=True)
_save_and_stub('mcp.client', is_package=True)
_save_and_stub('mcp.client.stdio', {'stdio_client': Mock()})
_save_and_stub('mcp.client.sse', {'sse_client': Mock()})
_save_and_stub('mcp.client.streamable_http', {'streamable_http_client': Mock()})
_save_and_stub('mcp.client.websocket', {'websocket_client': Mock()})
# Stub the provider.tools.loader (source of circular import)
_save_and_stub('langbot', is_package=True)
_save_and_stub('langbot.pkg', is_package=True)
_save_and_stub('langbot.pkg.provider', is_package=True)
_save_and_stub('langbot.pkg.provider.tools', is_package=True)
_save_and_stub(
'langbot.pkg.provider.tools.loader',
{
'ToolLoader': type('ToolLoader', (), {'__init__': lambda self, ap: None}),
},
)
_save_and_stub('langbot.pkg.provider.tools.loaders', is_package=True)
_save_and_stub('langbot.pkg.core', is_package=True)
_save_and_stub('langbot.pkg.core.app', {'Application': type('Application', (), {})})
_save_and_stub('langbot.pkg.entity', is_package=True)
_save_and_stub('langbot.pkg.entity.persistence', is_package=True)
_save_and_stub('langbot.pkg.entity.persistence.mcp', {})
# box models
import enum as _enum
class _BPS(str, _enum.Enum):
RUNNING = 'running'
EXITED = 'exited'
_save_and_stub('langbot_plugin.box', is_package=True)
_save_and_stub('langbot_plugin.box.models', {'BoxManagedProcessStatus': _BPS})
# Now load mcp.py via spec_from_file_location
mod_fqn = 'langbot.pkg.provider.tools.loaders.mcp'
sys.modules.pop(mod_fqn, None)
mcp_path = os.path.join(
os.path.dirname(__file__),
'..',
'..',
'..',
'src',
'langbot',
'pkg',
'provider',
'tools',
'loaders',
'mcp.py',
)
mcp_path = os.path.normpath(mcp_path)
pkg_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(mcp_path))))
sys.modules['langbot.pkg'].__path__ = [pkg_root]
sys.modules['langbot.pkg.provider.tools.loaders'].__path__ = [os.path.dirname(mcp_path)]
spec = importlib.util.spec_from_file_location(mod_fqn, mcp_path)
mod = importlib.util.module_from_spec(spec)
sys.modules[mod_fqn] = mod
spec.loader.exec_module(mod)
yield mod
# Cleanup
sys.modules.pop(mod_fqn, None)
sys.modules.pop('langbot.pkg.provider.tools.loaders.mcp_stdio', None)
sys.modules.pop('langbot.pkg.box.workspace', None)
for name in reversed(list(saved)):
if saved[name] is None:
sys.modules.pop(name, None)
else:
sys.modules[name] = saved[name]
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_ap():
ap = Mock()
ap.logger = Mock()
ap.box_service = Mock()
return ap
def _make_session(mcp_module, server_config: dict, ap=None):
if ap is None:
ap = _make_ap()
return mcp_module.RuntimeMCPSession(
server_name=server_config.get('name', 'test-server'),
server_config=server_config,
enable=True,
ap=ap,
)
# ── MCPServerBoxConfig ──────────────────────────────────────────────
class TestMCPServerBoxConfig:
def test_default_values(self, mcp_module):
cfg = mcp_module.MCPServerBoxConfig.model_validate({})
assert cfg.image is None
assert cfg.network == 'on'
assert cfg.host_path is None
assert cfg.host_path_mode == 'ro'
assert cfg.env == {}
assert cfg.startup_timeout_sec == 120
assert cfg.cpus is None
assert cfg.memory_mb is None
assert cfg.pids_limit is None
assert cfg.read_only_rootfs is None
def test_custom_values(self, mcp_module):
cfg = mcp_module.MCPServerBoxConfig.model_validate(
{
'image': 'node:20',
'network': 'on',
'host_path': '/home/user/mcp',
'host_path_mode': 'rw',
'env': {'FOO': 'bar'},
'startup_timeout_sec': 60,
'cpus': 2.0,
'memory_mb': 1024,
'pids_limit': 256,
'read_only_rootfs': False,
}
)
assert cfg.image == 'node:20'
assert cfg.network == 'on'
assert cfg.cpus == 2.0
assert cfg.memory_mb == 1024
def test_extra_fields_ignored(self, mcp_module):
cfg = mcp_module.MCPServerBoxConfig.model_validate(
{
'image': 'node:20',
'unknown_field': 'whatever',
}
)
assert cfg.image == 'node:20'
assert not hasattr(cfg, 'unknown_field')
# ── Path Rewriting ──────────────────────────────────────────────────
class TestRewritePath:
def test_no_host_path_returns_unchanged(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
assert s._rewrite_path('/some/path', None) == '/some/path'
def test_empty_path_returns_empty(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
assert s._rewrite_path('', '/home/user/mcp') == ''
def test_prefix_match_rewrites(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
result = s._rewrite_path('/home/user/mcp/server.py', '/home/user/mcp')
assert result == '/workspace/server.py'
def test_exact_match_rewrites_to_workspace(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
result = s._rewrite_path('/home/user/mcp', '/home/user/mcp')
assert result == '/workspace'
def test_non_matching_path_unchanged(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
result = s._rewrite_path('/opt/other/server.py', '/home/user/mcp')
assert result == '/opt/other/server.py'
def test_similar_prefix_not_rewritten(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
result = s._rewrite_path('/home/user/mcp-other/file.py', '/home/user/mcp')
assert result == '/home/user/mcp-other/file.py'
def test_nested_subpath_rewrites(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
result = s._rewrite_path('/home/user/mcp/src/lib/main.py', '/home/user/mcp')
assert result == '/workspace/src/lib/main.py'
# ── host_path Inference ─────────────────────────────────────────────
class TestInferHostPath:
def test_no_absolute_paths_returns_none(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': ['server.py'],
},
)
assert s._infer_host_path() is None
def test_nonexistent_path_returns_none(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': '/nonexistent/path/to/python',
'args': [],
},
)
assert s._infer_host_path() is None
def test_existing_absolute_path_infers_directory(self, mcp_module):
with tempfile.NamedTemporaryFile(suffix='.py') as f:
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [f.name],
},
)
result = s._infer_host_path()
assert result is not None
assert result == os.path.dirname(os.path.realpath(f.name))
# ── Build Box Session Payload ───────────────────────────────────────
class TestBuildBoxSessionPayload:
def test_minimal_config(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
payload = s._build_box_session_payload('session-123')
assert payload['session_id'] == 'session-123'
assert payload['workdir'] == '/workspace'
assert payload['env'] == {}
assert 'host_path' not in payload
def test_with_host_path(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
'box': {'host_path': '/home/user/mcp', 'host_path_mode': 'ro'},
},
)
payload = s._build_box_session_payload('session-123')
assert payload['host_path'] == '/home/user/mcp'
assert payload['host_path_mode'] == 'ro'
def test_optional_fields_included_when_set(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
'box': {'image': 'node:20', 'cpus': 2.0, 'memory_mb': 1024, 'pids_limit': 256},
},
)
payload = s._build_box_session_payload('session-123')
assert payload['image'] == 'node:20'
assert payload['cpus'] == 2.0
assert payload['memory_mb'] == 1024
assert payload['pids_limit'] == 256
def test_none_fields_excluded(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
payload = s._build_box_session_payload('session-123')
assert 'image' not in payload
assert 'cpus' not in payload
# ── Build Box Process Payload ───────────────────────────────────────
class TestBuildBoxProcessPayload:
def test_basic_payload(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': ['server.py'],
'env': {'KEY': 'val'},
},
)
payload = s._build_box_process_payload()
assert payload['command'] == 'python'
assert payload['args'] == ['server.py']
assert payload['env'] == {'KEY': 'val'}
assert payload['cwd'] == '/workspace'
def test_path_rewriting_applied(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': '/home/user/mcp/venv/bin/python',
'args': ['/home/user/mcp/server.py', '--config', '/home/user/mcp/config.json'],
'env': {},
'box': {'host_path': '/home/user/mcp'},
},
)
payload = s._build_box_process_payload()
# venv python is replaced with plain 'python' (deps installed in-container)
assert payload['command'] == 'python'
assert payload['args'] == ['/workspace/server.py', '--config', '/workspace/config.json']
def test_non_matching_args_not_rewritten(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': ['/opt/other/server.py', '--flag'],
'env': {},
'box': {'host_path': '/home/user/mcp'},
},
)
payload = s._build_box_process_payload()
assert payload['command'] == 'python'
assert payload['args'] == ['/opt/other/server.py', '--flag']
# ── get_runtime_info_dict ───────────────────────────────────────────
class TestGetRuntimeInfoDict:
def test_non_stdio_session(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'test-uuid',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
info = s.get_runtime_info_dict()
assert info['status'] == 'connecting'
assert 'box_session_id' not in info
def test_runtime_tools_include_parameters(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'test-uuid',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
s.functions = [
SimpleNamespace(
name='create-service',
description='Create a service',
parameters={
'type': 'object',
'properties': {
'project_id': {'type': 'string'},
},
'required': ['project_id'],
},
)
]
info = s.get_runtime_info_dict()
assert info['tools'][0]['parameters']['properties']['project_id']['type'] == 'string'
assert info['tools'][0]['parameters']['required'] == ['project_id']
def test_stdio_session_includes_box_info(self, mcp_module):
ap = _make_ap()
ap.box_service.available = True
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'test-uuid',
'mode': 'stdio',
'command': 'python',
'args': [],
},
ap=ap,
)
info = s.get_runtime_info_dict()
assert info['box_session_id'] == 'mcp-shared'
assert info['box_enabled'] is True
def test_stdio_session_refuses_when_box_unavailable(self, mcp_module):
"""Policy: when Box is configured but unavailable (disabled in config
OR connection failed), stdio MCP servers are NOT treated as box-stdio.
``_init_stdio_python_server`` will raise a clear refusal at start
time; until then, the runtime info simply omits box_session_id so the
UI can render the disabled state cleanly."""
ap = _make_ap()
ap.box_service.available = False
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'test-uuid',
'mode': 'stdio',
'command': 'python',
'args': [],
},
ap=ap,
)
info = s.get_runtime_info_dict()
assert 'box_session_id' not in info
assert 'box_enabled' not in info
def test_stdio_session_without_box_service_uses_local_stdio(self, mcp_module):
ap = _make_ap()
del ap.box_service
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'test-uuid',
'mode': 'stdio',
'command': 'python',
'args': [],
},
ap=ap,
)
info = s.get_runtime_info_dict()
assert 'box_session_id' not in info
# ── Box config parsing ──────────────────────────────────────────────
class TestBoxConfigParsing:
def test_box_config_parsed_from_server_config(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
'box': {'image': 'node:20', 'host_path': '/home/user/mcp'},
},
)
assert isinstance(s.box_config, mcp_module.MCPServerBoxConfig)
assert s.box_config.image == 'node:20'
assert s.box_config.host_path == '/home/user/mcp'
def test_missing_box_key_uses_defaults(self, mcp_module):
s = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'sse',
'command': 'python',
'args': [],
},
)
assert isinstance(s.box_config, mcp_module.MCPServerBoxConfig)
assert s.box_config.image is None
assert s.box_config.host_path_mode == 'ro'
@pytest.mark.asyncio
async def test_init_box_stdio_server_stages_host_path_in_shared_workspace(mcp_module, tmp_path):
mcp_stdio_module = sys.modules['langbot.pkg.provider.tools.loaders.mcp_stdio']
class FakeClientSession:
def __init__(self, *_args):
pass
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc, tb):
return False
async def initialize(self):
return None
@asynccontextmanager
async def fake_websocket_client(_url: str):
yield ('read-stream', 'write-stream')
mcp_stdio_module.ClientSession = FakeClientSession
mcp_stdio_module.websocket_client = fake_websocket_client
ap = _make_ap()
ap.box_service.available = True
ap.box_service.default_workspace = str(tmp_path / 'shared-box-workspace')
ap.box_service.create_session = AsyncMock(return_value={})
ap.box_service.build_spec = Mock(return_value='validated-spec')
ap.box_service.client = SimpleNamespace(
execute=AsyncMock(return_value=SimpleNamespace(ok=True, stderr='', exit_code=0))
)
ap.box_service.start_managed_process = AsyncMock(return_value={})
ap.box_service.get_managed_process_websocket_url = Mock(return_value='ws://box.example/process')
host_path = tmp_path / 'mcp-source'
host_path.mkdir()
server_file = host_path / 'server.py'
server_file.write_text('print("hello")\n', encoding='utf-8')
session = _make_session(
mcp_module,
{
'name': 'test',
'uuid': 'u1',
'mode': 'stdio',
'command': str(host_path / '.venv' / 'bin' / 'python'),
'args': [str(server_file)],
'box': {'host_path': str(host_path)},
},
ap=ap,
)
await session._init_box_stdio_server()
await session.exit_stack.aclose()
assert ap.box_service.create_session.await_count == 1
session_payload = ap.box_service.create_session.await_args.args[0]
assert session_payload['session_id'] == 'mcp-shared'
assert 'host_path' not in session_payload
assert ap.box_service.build_spec.call_count == 1
assert ap.box_service.build_spec.call_args.kwargs.get('skip_host_mount_validation', False) is False
assert ap.box_service.build_spec.call_args.args[0]['host_path'] == str(host_path)
staged_file = tmp_path / 'shared-box-workspace' / '.mcp' / 'u1' / 'workspace' / 'server.py'
assert staged_file.read_text(encoding='utf-8') == 'print("hello")\n'
process_payload = ap.box_service.start_managed_process.await_args.args[1]
assert process_payload['process_id'] == 'u1'
assert process_payload['command'] == 'python'
assert process_payload['args'] == ['/workspace/.mcp/u1/workspace/server.py']
assert process_payload['cwd'] == '/workspace/.mcp/u1/workspace'

View File

@@ -169,6 +169,7 @@ async def test_updated_llm_model_is_immediately_usable_by_local_agent_pipeline()
ap.logger = Mock()
ap.persistence_mgr = SimpleNamespace(execute_async=AsyncMock())
ap.tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
ap.skill_mgr = None # PreProcessor only uses skill_mgr for the local-agent skill-binding branch
ap.plugin_connector = SimpleNamespace(
emit_event=AsyncMock(return_value=SimpleNamespace(event=SimpleNamespace(default_prompt=[], prompt=[])))
)

View File

@@ -0,0 +1,479 @@
from __future__ import annotations
import os
import tempfile
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
def _make_ap(logger=None):
ap = SimpleNamespace()
ap.logger = logger or Mock()
ap.persistence_mgr = Mock()
ap.persistence_mgr.execute_async = AsyncMock(return_value=Mock(all=Mock(return_value=[])))
ap.persistence_mgr.serialize_model = Mock(side_effect=lambda cls, row: row)
return ap
def _make_skill_data(
name='test-skill',
instructions='Do something',
package_root='',
entry_file='SKILL.md',
**kwargs,
):
return {
'name': name,
'display_name': kwargs.pop('display_name', name),
'description': kwargs.pop('description', f'Description of {name}'),
'instructions': instructions,
'package_root': package_root,
'entry_file': entry_file,
**kwargs,
}
class TestSkillManagerCache:
"""The Box runtime is the only source of truth — SkillManager just holds
an in-memory cache populated by ``reload_skills``. There is no local
filesystem reader anymore."""
def test_refresh_skill_from_disk_reports_cache_presence(self):
"""Box is the only source of truth for skill content. refresh_skill_from_disk
now just reports whether the skill is still in the in-memory cache —
the actual content refresh is driven by SkillService awaiting
``reload_skills`` after every Box mutation."""
from langbot.pkg.skill.manager import SkillManager
ap = _make_ap()
mgr = SkillManager(ap)
# Empty cache → returns False
assert mgr.refresh_skill_from_disk('test-skill') is False
# Cache populated → returns True; method does NOT mutate the cache
cached = _make_skill_data(name='test-skill', instructions='Cached')
mgr.skills['test-skill'] = cached
assert mgr.refresh_skill_from_disk('test-skill') is True
assert mgr.skills['test-skill'] is cached
assert mgr.refresh_skill_from_disk('') is False
@pytest.mark.asyncio
async def test_reload_skills_drops_box_skills_with_missing_package_root(self):
"""When Box reports a skill whose package_root is gone from the
LangBot-visible filesystem, the cache must drop it instead of
keeping a stale entry that would later produce a bad mount."""
from langbot.pkg.skill.manager import SkillManager
with tempfile.TemporaryDirectory() as live_dir:
ghost_dir = os.path.join(live_dir, '_does_not_exist')
box_service = SimpleNamespace(
available=True,
list_skills=AsyncMock(
return_value=[
_make_skill_data(name='alive', package_root=live_dir),
_make_skill_data(name='ghost', package_root=ghost_dir),
]
),
)
ap = _make_ap()
ap.box_service = box_service
mgr = SkillManager(ap)
await mgr.reload_skills()
assert list(mgr.skills) == ['alive']
# Warning fired with the dropped skill name so operators can see it.
warning_messages = [str(call.args[0]) for call in ap.logger.warning.call_args_list]
assert any('ghost' in msg and 'package_root missing' in msg for msg in warning_messages)
class TestSkillActivationHelper:
"""Skill activation is now Tool-Call based.
The legacy text-marker mechanism (``[ACTIVATE_SKILL: x]`` detection,
``build_activation_prompt_for_skills``, ``remove_activation_marker``,
``prepare_skill_activation``) has been removed. Activation now goes
through ``skill.activation.register_activated_skill``, invoked by the
``activate`` Tool Call.
"""
def test_register_activated_skill_records_known_skill(self):
from langbot.pkg.skill.activation import register_activated_skill
from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
from langbot.pkg.skill.manager import SkillManager
ap = _make_ap()
mgr = SkillManager(ap)
mgr.skills = {
'primary': _make_skill_data(name='primary', instructions='Primary instructions'),
}
ap.skill_mgr = mgr
query = SimpleNamespace(variables={})
assert register_activated_skill(ap, query, 'primary') is True
assert set(query.variables[ACTIVATED_SKILLS_KEY].keys()) == {'primary'}
assert query.variables[ACTIVATED_SKILLS_KEY]['primary']['name'] == 'primary'
def test_register_activated_skill_rejects_unknown_skill(self):
from langbot.pkg.skill.activation import register_activated_skill
from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
from langbot.pkg.skill.manager import SkillManager
ap = _make_ap()
mgr = SkillManager(ap)
mgr.skills = {'primary': _make_skill_data(name='primary')}
ap.skill_mgr = mgr
query = SimpleNamespace(variables={})
assert register_activated_skill(ap, query, 'missing') is False
assert ACTIVATED_SKILLS_KEY not in query.variables
def test_register_activated_skill_without_skill_manager_returns_false(self):
from langbot.pkg.skill.activation import register_activated_skill
ap = _make_ap() # no skill_mgr attribute
query = SimpleNamespace(variables={})
assert register_activated_skill(ap, query, 'primary') is False
class TestSkillPathHelpers:
def test_get_visible_skills_filters_by_bound_names(self):
from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY, get_visible_skills
ap = _make_ap()
ap.skill_mgr = SimpleNamespace(
skills={
'visible': _make_skill_data(name='visible'),
'hidden': _make_skill_data(name='hidden'),
}
)
query = SimpleNamespace(variables={PIPELINE_BOUND_SKILLS_KEY: ['visible']})
result = get_visible_skills(ap, query)
assert list(result.keys()) == ['visible']
def test_resolve_virtual_skill_path_allows_visible_skill_reads(self):
from langbot.pkg.provider.tools.loaders.skill import (
PIPELINE_BOUND_SKILLS_KEY,
resolve_virtual_skill_path,
)
ap = _make_ap()
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo')})
query = SimpleNamespace(variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']})
skill, rewritten = resolve_virtual_skill_path(
ap,
query,
'/workspace/.skills/demo/SKILL.md',
include_visible=True,
include_activated=False,
)
assert skill['name'] == 'demo'
assert rewritten == '/workspace/SKILL.md'
def test_build_skill_session_id_uses_name_based_identifier(self):
from langbot.pkg.provider.tools.loaders.skill import build_skill_session_id
with_launcher = build_skill_session_id(
{'name': 'writer'},
SimpleNamespace(query_id=42, launcher_type='person', launcher_id='123'),
)
fallback = build_skill_session_id({'name': 'writer'}, SimpleNamespace(query_id=99))
assert with_launcher == 'skill-person_123-writer'
assert fallback == 'skill-99-writer'
def test_should_prepare_skill_python_env_detects_manifests_and_venv(self):
from langbot.pkg.provider.tools.loaders.skill import should_prepare_skill_python_env
with tempfile.TemporaryDirectory() as tmpdir:
assert should_prepare_skill_python_env(tmpdir) is False
with open(os.path.join(tmpdir, 'requirements.txt'), 'w', encoding='utf-8') as f:
f.write('requests==2.32.0\n')
assert should_prepare_skill_python_env(tmpdir) is True
with tempfile.TemporaryDirectory() as tmpdir:
os.makedirs(os.path.join(tmpdir, '.venv'))
assert should_prepare_skill_python_env(tmpdir) is True
def test_wrap_skill_command_with_python_env_bootstraps_then_runs_command(self):
from langbot.pkg.provider.tools.loaders.skill import wrap_skill_command_with_python_env
command = wrap_skill_command_with_python_env('python scripts/run.py')
assert 'python -m venv "$_LB_VENV_DIR"' in command
assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
assert command.rstrip().endswith('python scripts/run.py')
class TestSkillToolLoader:
"""The skill tool surface is now just ``activate`` + ``register_skill``.
The legacy CRUD authoring tools (create/list/get/update/delete/
import_skill_from_directory/reload_skills) were removed; skill CRUD is
handled by SkillService via the HTTP API / web UI instead.
"""
@pytest.mark.asyncio
async def test_activate_returns_instructions_and_registers_skill(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import (
ACTIVATE_SKILL_TOOL_NAME,
SkillToolLoader,
)
from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
skill = _make_skill_data(name='demo', package_root='/data/skills/demo', instructions='Step 1')
ap = _make_ap()
ap.skill_mgr = SimpleNamespace(
skills={'demo': skill},
get_skill_by_name=lambda name: skill if name == 'demo' else None,
)
loader = SkillToolLoader(ap)
query = SimpleNamespace(variables={})
result = await loader.invoke_tool(ACTIVATE_SKILL_TOOL_NAME, {'skill_name': 'demo'}, query)
assert result['activated'] is True
assert result['skill_name'] == 'demo'
assert result['mount_path'] == '/workspace/.skills/demo'
assert 'Step 1' in result['content']
assert set(query.variables[ACTIVATED_SKILLS_KEY].keys()) == {'demo'}
@pytest.mark.asyncio
async def test_activate_unknown_skill_raises(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import (
ACTIVATE_SKILL_TOOL_NAME,
SkillToolLoader,
)
ap = _make_ap()
ap.skill_mgr = SimpleNamespace(
skills={'demo': _make_skill_data(name='demo')},
get_skill_by_name=lambda name: None,
)
loader = SkillToolLoader(ap)
with pytest.raises(ValueError, match='not found'):
await loader.invoke_tool(
ACTIVATE_SKILL_TOOL_NAME,
{'skill_name': 'ghost'},
SimpleNamespace(variables={}),
)
@pytest.mark.asyncio
async def test_register_skill_scans_directory_and_creates_skill(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import (
REGISTER_SKILL_TOOL_NAME,
SkillToolLoader,
)
with tempfile.TemporaryDirectory() as tmpdir:
repo_dir = os.path.join(tmpdir, 'repo')
os.makedirs(repo_dir)
ap = _make_ap()
ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
ap.skill_service = SimpleNamespace(
scan_directory_async=AsyncMock(
return_value={
'name': 'cloned-skill',
'display_name': 'Cloned Skill',
'description': 'Imported from clone',
'instructions': 'Do work',
}
),
create_skill=AsyncMock(
return_value=_make_skill_data(name='cloned-skill', package_root=os.path.realpath(repo_dir))
),
)
loader = SkillToolLoader(ap)
result = await loader.invoke_tool(
REGISTER_SKILL_TOOL_NAME,
{'path': '/workspace/repo'},
SimpleNamespace(),
)
ap.skill_service.scan_directory_async.assert_awaited_once_with(os.path.realpath(repo_dir))
ap.skill_service.create_skill.assert_awaited_once_with(
{
'name': 'cloned-skill',
'display_name': 'Cloned Skill',
'description': 'Imported from clone',
'instructions': 'Do work',
'package_root': os.path.realpath(repo_dir),
}
)
assert result['registered'] is True
assert result['skill_name'] == 'cloned-skill'
assert result['source_path'] == '/workspace/repo'
@pytest.mark.asyncio
async def test_register_skill_rejects_workspace_escape(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import (
REGISTER_SKILL_TOOL_NAME,
SkillToolLoader,
)
with tempfile.TemporaryDirectory() as tmpdir:
ap = _make_ap()
ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
ap.skill_service = SimpleNamespace(scan_directory_async=AsyncMock(), create_skill=AsyncMock())
loader = SkillToolLoader(ap)
with pytest.raises(ValueError, match='escapes the workspace boundary'):
await loader.invoke_tool(
REGISTER_SKILL_TOOL_NAME,
{'path': '/workspace/../../etc'},
SimpleNamespace(),
)
@pytest.mark.asyncio
async def test_register_skill_requires_skill_service(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import (
REGISTER_SKILL_TOOL_NAME,
SkillToolLoader,
)
with tempfile.TemporaryDirectory() as tmpdir:
ap = _make_ap() # no skill_service attribute
ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
loader = SkillToolLoader(ap)
with pytest.raises(ValueError, match='Skill service not available'):
await loader.invoke_tool(
REGISTER_SKILL_TOOL_NAME,
{'path': '/workspace/foo'},
SimpleNamespace(),
)
@pytest.mark.asyncio
async def test_tools_hidden_when_sandbox_backend_unavailable(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import SkillToolLoader
ap = _make_ap()
ap.skill_mgr = SimpleNamespace(skills={})
ap.box_service = SimpleNamespace(
available=True,
get_status=AsyncMock(return_value={'backend': {'available': False}}),
)
loader = SkillToolLoader(ap)
await loader.initialize()
assert await loader.get_tools() == []
assert await loader.has_tool('activate') is False
assert await loader.has_tool('register_skill') is False
@pytest.mark.asyncio
async def test_tools_exposed_when_sandbox_backend_available(self):
from langbot.pkg.provider.tools.loaders.skill_authoring import SkillToolLoader
ap = _make_ap()
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo')})
ap.box_service = SimpleNamespace(
available=True,
get_status=AsyncMock(return_value={'backend': {'available': True}}),
)
loader = SkillToolLoader(ap)
await loader.initialize()
tools = await loader.get_tools()
assert sorted(tool.name for tool in tools) == ['activate', 'register_skill']
assert await loader.has_tool('activate') is True
assert await loader.has_tool('register_skill') is True
class TestNativeToolLoaderSkillPaths:
@pytest.mark.asyncio
async def test_read_visible_skill_file(self):
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY
with tempfile.TemporaryDirectory() as tmpdir:
skill_md = os.path.join(tmpdir, 'SKILL.md')
with open(skill_md, 'w', encoding='utf-8') as f:
f.write('demo instructions')
ap = _make_ap()
ap.box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo', package_root=tmpdir)})
loader = NativeToolLoader(ap)
result = await loader.invoke_tool(
'read',
{'path': '/workspace/.skills/demo/SKILL.md'},
SimpleNamespace(query_id='q1', variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']}),
)
assert result == {'ok': True, 'content': 'demo instructions'}
@pytest.mark.asyncio
async def test_exec_in_activated_skill_mount_rewrites_command_and_refreshes(self):
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
from langbot.pkg.provider.tools.loaders.skill import register_activated_skill
with tempfile.TemporaryDirectory() as tmpdir:
ap = _make_ap()
ap.box_service = SimpleNamespace(
available=True,
default_workspace=tmpdir,
execute_tool=AsyncMock(return_value={'ok': True}),
)
ap.skill_mgr = SimpleNamespace(refresh_skill_from_disk=Mock())
loader = NativeToolLoader(ap)
query = SimpleNamespace(query_id='q1', launcher_type='person', launcher_id='123', variables={})
register_activated_skill(query, _make_skill_data(name='demo', package_root=tmpdir))
result = await loader.invoke_tool(
'exec',
{
'command': 'python /workspace/.skills/demo/scripts/run.py',
'workdir': '/workspace/.skills/demo',
},
query,
)
assert result == {'ok': True}
tool_parameters = ap.box_service.execute_tool.await_args.args[0]
assert tool_parameters['command'] == 'python /workspace/.skills/demo/scripts/run.py'
assert tool_parameters['workdir'] == '/workspace/.skills/demo'
ap.skill_mgr.refresh_skill_from_disk.assert_called_once_with('demo')
@pytest.mark.asyncio
async def test_write_requires_skill_activation(self):
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY
with tempfile.TemporaryDirectory() as tmpdir:
ap = _make_ap()
ap.box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo', package_root=tmpdir)})
loader = NativeToolLoader(ap)
query = SimpleNamespace(query_id='q1', variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']})
with pytest.raises(ValueError, match='Skill "demo" is not available at this path'):
await loader.invoke_tool(
'write',
{'path': '/workspace/.skills/demo/notes.txt', 'content': 'hi'},
query,
)

View File

@@ -4,6 +4,7 @@ Tests cover:
- Tool schema generation for OpenAI and Anthropic
- Tool execution dispatch
"""
from __future__ import annotations
import pytest
@@ -52,11 +53,12 @@ class TestToolManagerSchemaGeneration:
@pytest.fixture
def sample_tools(self):
"""Create sample LLMTool list for testing."""
def dummy_weather_func(**kwargs):
return "weather result"
return 'weather result'
def dummy_calc_func(**kwargs):
return "calc result"
return 'calc result'
tools = [
resource_tool.LLMTool(
@@ -65,15 +67,10 @@ class TestToolManagerSchemaGeneration:
description='Get current weather for a location',
parameters={
'type': 'object',
'properties': {
'location': {
'type': 'string',
'description': 'City name'
}
},
'required': ['location']
'properties': {'location': {'type': 'string', 'description': 'City name'}},
'required': ['location'],
},
func=dummy_weather_func
func=dummy_weather_func,
),
resource_tool.LLMTool(
name='calculate',
@@ -81,15 +78,10 @@ class TestToolManagerSchemaGeneration:
description='Perform a calculation',
parameters={
'type': 'object',
'properties': {
'expression': {
'type': 'string',
'description': 'Math expression'
}
},
'required': ['expression']
'properties': {'expression': {'type': 'string', 'description': 'Math expression'}},
'required': ['expression'],
},
func=dummy_calc_func
func=dummy_calc_func,
),
]
return tools
@@ -188,26 +180,48 @@ class TestToolManagerExecuteFuncCall:
@pytest.fixture
def mock_app_with_loaders(self):
"""Create mock app with mock tool loaders."""
"""Create mock app with mock tool loaders.
Returns (app, plugin_loader, mcp_loader). The native and skill loaders
are attached directly to the app for tests that don't need to assert
against them — they all default to ``has_tool == False`` so the
execute_func_call probe falls through to the plugin/mcp pair.
"""
mock_app = Mock()
mock_app.logger = Mock()
def _make_inert_loader():
loader = Mock()
loader.has_tool = AsyncMock(return_value=False)
loader.invoke_tool = AsyncMock(return_value=None)
loader.initialize = AsyncMock()
loader.shutdown = AsyncMock()
return loader
# Create mock plugin loader
mock_plugin_loader = Mock()
mock_plugin_loader.has_tool = AsyncMock(return_value=False)
mock_plugin_loader = _make_inert_loader()
mock_plugin_loader.invoke_tool = AsyncMock(return_value='plugin_result')
mock_plugin_loader.initialize = AsyncMock()
mock_plugin_loader.shutdown = AsyncMock()
# Create mock MCP loader
mock_mcp_loader = Mock()
mock_mcp_loader.has_tool = AsyncMock(return_value=False)
mock_mcp_loader = _make_inert_loader()
mock_mcp_loader.invoke_tool = AsyncMock(return_value='mcp_result')
mock_mcp_loader.initialize = AsyncMock()
mock_mcp_loader.shutdown = AsyncMock()
# Stash inert native/skill loaders so the ToolManager probe order
# (native → plugin → mcp → skill) doesn't AttributeError. Tests that
# need to override these can replace the attributes on the manager.
mock_app._inert_native_loader = _make_inert_loader()
mock_app._inert_skill_loader = _make_inert_loader()
return mock_app, mock_plugin_loader, mock_mcp_loader
@staticmethod
def _wire_loaders(manager, mock_app, plugin_loader, mcp_loader):
"""Attach all four loaders (native + plugin + mcp + skill) to manager."""
manager.native_tool_loader = mock_app._inert_native_loader
manager.plugin_tool_loader = plugin_loader
manager.mcp_tool_loader = mcp_loader
manager.skill_tool_loader = mock_app._inert_skill_loader
@pytest.fixture
def sample_query(self):
"""Create sample query for testing."""
@@ -215,9 +229,7 @@ class TestToolManagerExecuteFuncCall:
return query
@pytest.mark.asyncio
async def test_execute_calls_plugin_loader_when_has_tool(
self, mock_app_with_loaders, sample_query
):
async def test_execute_calls_plugin_loader_when_has_tool(self, mock_app_with_loaders, sample_query):
"""Test that execute_func_call uses plugin loader when tool exists there."""
toolmgr = get_toolmgr_module()
@@ -225,26 +237,17 @@ class TestToolManagerExecuteFuncCall:
mock_plugin_loader.has_tool = AsyncMock(return_value=True)
manager = toolmgr.ToolManager(mock_app)
manager.plugin_tool_loader = mock_plugin_loader
manager.mcp_tool_loader = mock_mcp_loader
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
result = await manager.execute_func_call(
'test_tool',
{'param': 'value'},
sample_query
)
result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)
assert result == 'plugin_result'
mock_plugin_loader.invoke_tool.assert_called_once_with(
'test_tool', {'param': 'value'}, sample_query
)
mock_plugin_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)
# MCP loader should not be called
mock_mcp_loader.invoke_tool.assert_not_called()
@pytest.mark.asyncio
async def test_execute_calls_mcp_loader_when_plugin_not_found(
self, mock_app_with_loaders, sample_query
):
async def test_execute_calls_mcp_loader_when_plugin_not_found(self, mock_app_with_loaders, sample_query):
"""Test that execute_func_call uses MCP loader when plugin doesn't have tool."""
toolmgr = get_toolmgr_module()
@@ -253,24 +256,15 @@ class TestToolManagerExecuteFuncCall:
mock_mcp_loader.has_tool = AsyncMock(return_value=True)
manager = toolmgr.ToolManager(mock_app)
manager.plugin_tool_loader = mock_plugin_loader
manager.mcp_tool_loader = mock_mcp_loader
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
result = await manager.execute_func_call(
'test_tool',
{'param': 'value'},
sample_query
)
result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)
assert result == 'mcp_result'
mock_mcp_loader.invoke_tool.assert_called_once_with(
'test_tool', {'param': 'value'}, sample_query
)
mock_mcp_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)
@pytest.mark.asyncio
async def test_execute_raises_when_tool_not_found(
self, mock_app_with_loaders, sample_query
):
async def test_execute_raises_when_tool_not_found(self, mock_app_with_loaders, sample_query):
"""Test that execute_func_call raises ValueError when tool not found."""
toolmgr = get_toolmgr_module()
@@ -279,20 +273,13 @@ class TestToolManagerExecuteFuncCall:
mock_mcp_loader.has_tool = AsyncMock(return_value=False)
manager = toolmgr.ToolManager(mock_app)
manager.plugin_tool_loader = mock_plugin_loader
manager.mcp_tool_loader = mock_mcp_loader
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
with pytest.raises(ValueError, match='未找到工具'):
await manager.execute_func_call(
'unknown_tool',
{},
sample_query
)
await manager.execute_func_call('unknown_tool', {}, sample_query)
@pytest.mark.asyncio
async def test_plugin_loader_checked_first(
self, mock_app_with_loaders, sample_query
):
async def test_plugin_loader_checked_first(self, mock_app_with_loaders, sample_query):
"""Test that plugin loader is checked before MCP loader."""
toolmgr = get_toolmgr_module()
@@ -302,8 +289,7 @@ class TestToolManagerExecuteFuncCall:
mock_mcp_loader.has_tool = AsyncMock(return_value=True)
manager = toolmgr.ToolManager(mock_app)
manager.plugin_tool_loader = mock_plugin_loader
manager.mcp_tool_loader = mock_mcp_loader
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
await manager.execute_func_call('test_tool', {}, sample_query)
@@ -317,20 +303,30 @@ class TestToolManagerShutdown:
@pytest.mark.asyncio
async def test_shutdown_calls_loader_shutdown(self):
"""Test that shutdown calls shutdown on both loaders."""
"""Test that shutdown calls shutdown on every registered loader."""
toolmgr = get_toolmgr_module()
mock_app = Mock()
mock_plugin_loader = Mock()
mock_plugin_loader.shutdown = AsyncMock()
mock_mcp_loader = Mock()
mock_mcp_loader.shutdown = AsyncMock()
def _make_loader():
loader = Mock()
loader.shutdown = AsyncMock()
return loader
mock_native_loader = _make_loader()
mock_plugin_loader = _make_loader()
mock_mcp_loader = _make_loader()
mock_skill_loader = _make_loader()
manager = toolmgr.ToolManager(mock_app)
manager.native_tool_loader = mock_native_loader
manager.plugin_tool_loader = mock_plugin_loader
manager.mcp_tool_loader = mock_mcp_loader
manager.skill_tool_loader = mock_skill_loader
await manager.shutdown()
mock_native_loader.shutdown.assert_called_once()
mock_plugin_loader.shutdown.assert_called_once()
mock_mcp_loader.shutdown.assert_called_once()
mock_mcp_loader.shutdown.assert_called_once()
mock_skill_loader.shutdown.assert_called_once()

View File

@@ -0,0 +1,250 @@
from __future__ import annotations
import os
import tempfile
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
from langbot.pkg.provider.tools.toolmgr import ToolManager
class StubLoader:
def __init__(self, tools: list[resource_tool.LLMTool] | None = None, invoke_result=None):
self._tools = tools or []
self._invoke_result = invoke_result
async def get_tools(self, *_args, **_kwargs):
return self._tools
async def has_tool(self, name: str) -> bool:
return any(tool.name == name for tool in self._tools)
async def invoke_tool(self, name: str, parameters: dict, query):
return self._invoke_result(name, parameters, query) if callable(self._invoke_result) else self._invoke_result
async def shutdown(self):
return None
def make_tool(name: str) -> resource_tool.LLMTool:
return resource_tool.LLMTool(
name=name,
human_desc=name,
description=name,
parameters={'type': 'object', 'properties': {}},
func=lambda parameters: parameters,
)
@pytest.mark.asyncio
async def test_tool_manager_omits_skill_authoring_tools_by_default():
manager = ToolManager(SimpleNamespace())
manager.native_tool_loader = StubLoader([make_tool('exec')])
manager.skill_tool_loader = StubLoader([make_tool('activate')])
manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
tools = await manager.get_all_tools()
assert [tool.name for tool in tools] == ['exec', 'plugin_tool', 'mcp_tool']
@pytest.mark.asyncio
async def test_tool_manager_includes_skill_authoring_tools_when_requested():
manager = ToolManager(SimpleNamespace())
manager.native_tool_loader = StubLoader([make_tool('exec')])
manager.skill_tool_loader = StubLoader([make_tool('activate')])
manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
tools = await manager.get_all_tools(include_skill_authoring=True)
assert [tool.name for tool in tools] == ['exec', 'activate', 'plugin_tool', 'mcp_tool']
@pytest.mark.asyncio
async def test_tool_manager_routes_native_tool_calls():
app = SimpleNamespace()
manager = ToolManager(app)
manager.native_tool_loader = StubLoader([make_tool('exec')], invoke_result={'backend': 'fake'})
manager.skill_tool_loader = StubLoader([make_tool('activate')])
manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
result = await manager.execute_func_call('exec', {'command': 'pwd'}, query=Mock())
assert result == {'backend': 'fake'}
@pytest.mark.asyncio
async def test_native_tool_loader_hides_tools_when_box_unavailable():
loader = NativeToolLoader(SimpleNamespace(box_service=SimpleNamespace(available=False)))
assert await loader.get_tools() == []
for tool_name in ('exec', 'read', 'write', 'edit', 'glob', 'grep'):
assert await loader.has_tool(tool_name) is False
@pytest.mark.asyncio
async def test_native_tool_loader_exposes_all_tools_when_box_available():
box_service = SimpleNamespace(
available=True,
get_status=AsyncMock(return_value={'backend': {'available': True}}),
)
loader = NativeToolLoader(SimpleNamespace(box_service=box_service, logger=Mock()))
await loader.initialize()
tools = await loader.get_tools()
assert [tool.name for tool in tools] == ['exec', 'read', 'write', 'edit', 'glob', 'grep']
for tool_name in ('exec', 'read', 'write', 'edit', 'glob', 'grep'):
assert await loader.has_tool(tool_name) is True
# ── read/write/edit file tool tests ─────────────────────────────
def _make_loader_with_workspace(tmpdir: str) -> tuple[NativeToolLoader, Mock]:
logger = Mock()
box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
ap = SimpleNamespace(box_service=box_service, logger=logger)
return NativeToolLoader(ap), logger
def _make_query() -> Mock:
q = Mock()
q.query_id = 'test-query-1'
return q
@pytest.mark.asyncio
async def test_read_file():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
with open(os.path.join(tmpdir, 'hello.txt'), 'w') as f:
f.write('hello world')
result = await loader.invoke_tool('read', {'path': '/workspace/hello.txt'}, _make_query())
assert result['ok'] is True
assert result['content'] == 'hello world'
@pytest.mark.asyncio
async def test_read_nonexistent_file():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
result = await loader.invoke_tool('read', {'path': '/workspace/no_such.txt'}, _make_query())
assert result['ok'] is False
assert 'not found' in result['error'].lower()
@pytest.mark.asyncio
async def test_read_directory():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
os.makedirs(os.path.join(tmpdir, 'subdir'))
with open(os.path.join(tmpdir, 'a.txt'), 'w') as f:
f.write('a')
result = await loader.invoke_tool('read', {'path': '/workspace'}, _make_query())
assert result['ok'] is True
assert result['is_directory'] is True
assert 'a.txt' in result['content']
@pytest.mark.asyncio
async def test_write_creates_file():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
result = await loader.invoke_tool(
'write', {'path': '/workspace/new.txt', 'content': 'new content'}, _make_query()
)
assert result['ok'] is True
with open(os.path.join(tmpdir, 'new.txt')) as f:
assert f.read() == 'new content'
@pytest.mark.asyncio
async def test_write_creates_subdirectories():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
result = await loader.invoke_tool(
'write', {'path': '/workspace/sub/deep/file.txt', 'content': 'nested'}, _make_query()
)
assert result['ok'] is True
with open(os.path.join(tmpdir, 'sub', 'deep', 'file.txt')) as f:
assert f.read() == 'nested'
@pytest.mark.asyncio
async def test_edit_replaces_unique_string():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
with open(os.path.join(tmpdir, 'code.py'), 'w') as f:
f.write('def foo():\n return 1\n')
result = await loader.invoke_tool(
'edit',
{'path': '/workspace/code.py', 'old_string': 'return 1', 'new_string': 'return 42'},
_make_query(),
)
assert result['ok'] is True
with open(os.path.join(tmpdir, 'code.py')) as f:
assert f.read() == 'def foo():\n return 42\n'
@pytest.mark.asyncio
async def test_edit_rejects_ambiguous_match():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
with open(os.path.join(tmpdir, 'dup.txt'), 'w') as f:
f.write('aaa\naaa\n')
result = await loader.invoke_tool(
'edit',
{'path': '/workspace/dup.txt', 'old_string': 'aaa', 'new_string': 'bbb'},
_make_query(),
)
assert result['ok'] is False
assert '2' in result['error']
@pytest.mark.asyncio
async def test_edit_rejects_missing_string():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
with open(os.path.join(tmpdir, 'x.txt'), 'w') as f:
f.write('hello')
result = await loader.invoke_tool(
'edit',
{'path': '/workspace/x.txt', 'old_string': 'nope', 'new_string': 'yes'},
_make_query(),
)
assert result['ok'] is False
assert 'not found' in result['error'].lower()
@pytest.mark.asyncio
async def test_path_escape_blocked():
with tempfile.TemporaryDirectory() as tmpdir:
loader, _ = _make_loader_with_workspace(tmpdir)
with pytest.raises(ValueError, match='escapes'):
await loader.invoke_tool('read', {'path': '/workspace/../../etc/passwd'}, _make_query())

View File

@@ -0,0 +1,23 @@
from pathlib import Path
from src.langbot.pkg.utils import paths
def test_get_data_root_uses_source_root_in_repo_checkout():
data_root = Path(paths.get_data_root())
repo_root = Path(__file__).resolve().parents[2]
assert data_root == repo_root / 'data'
def test_get_data_path_joins_under_data_root():
data_path = Path(paths.get_data_path('skills', 'demo-skill'))
repo_root = Path(__file__).resolve().parents[2]
assert data_path == repo_root / 'data' / 'skills' / 'demo-skill'
def test_get_data_root_honors_env_override(monkeypatch, tmp_path):
monkeypatch.setenv('LANGBOT_DATA_ROOT', str(tmp_path / 'custom-data'))
assert Path(paths.get_data_root()) == (tmp_path / 'custom-data').resolve()

View File

@@ -0,0 +1,204 @@
from __future__ import annotations
import importlib
import sys
import types
from types import SimpleNamespace
from unittest.mock import AsyncMock, Mock
import pytest
from langbot_plugin.api.entities.builtin.pipeline.query import Query
from langbot_plugin.api.entities.builtin.platform.entities import Friend
from langbot_plugin.api.entities.builtin.platform.events import FriendMessage
from langbot_plugin.api.entities.builtin.platform.message import MessageChain, Plain
from langbot_plugin.api.entities.builtin.provider.message import Message
from langbot_plugin.api.entities.builtin.provider.prompt import Prompt
from langbot_plugin.api.entities.builtin.provider.session import Conversation, LauncherTypes, Session
def _make_query() -> Query:
message_chain = MessageChain([Plain(text='create a skill')])
return Query(
query_id=1,
launcher_type=LauncherTypes.PERSON,
launcher_id='launcher-1',
sender_id='sender-1',
message_event=FriendMessage(
message_chain=message_chain,
time=0,
sender=Friend(id='sender-1', nickname='Tester', remark='Tester'),
),
message_chain=message_chain,
bot_uuid='bot-1',
pipeline_uuid='pipe-1',
pipeline_config={
'ai': {
'runner': {'runner': 'local-agent'},
'local-agent': {
'model': {'primary': 'model-1', 'fallbacks': []},
'prompt': 'default',
'knowledge-bases': [],
},
},
'trigger': {'misc': {}},
},
variables={},
)
def _make_conversation() -> Conversation:
return Conversation(
prompt=Prompt(name='default', messages=[Message(role='system', content='system prompt')]),
messages=[],
pipeline_uuid='pipe-1',
bot_uuid='bot-1',
uuid='conv-1',
)
def _make_app(*, skill_service) -> SimpleNamespace:
session = Session(launcher_type=LauncherTypes.PERSON, launcher_id='launcher-1', sender_id='sender-1')
conversation = _make_conversation()
model = SimpleNamespace(model_entity=SimpleNamespace(uuid='model-1', abilities={'func_call'}))
tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
return SimpleNamespace(
sess_mgr=SimpleNamespace(
get_session=AsyncMock(return_value=session),
get_conversation=AsyncMock(return_value=conversation),
),
model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
tool_mgr=tool_mgr,
plugin_connector=SimpleNamespace(
emit_event=AsyncMock(
return_value=SimpleNamespace(
event=SimpleNamespace(
default_prompt=conversation.prompt.messages.copy(),
prompt=conversation.messages.copy(),
)
)
)
),
pipeline_service=SimpleNamespace(
get_pipeline=AsyncMock(return_value={'extensions_preferences': {'enable_all_skills': True}})
),
skill_mgr=SimpleNamespace(
build_skill_aware_prompt_addition=Mock(return_value=''),
skills={},
),
skill_service=skill_service,
logger=Mock(),
)
def _import_preproc_modules():
fake_app_module = types.ModuleType('langbot.pkg.core.app')
fake_app_module.Application = object
sys.modules['langbot.pkg.core.app'] = fake_app_module
for module_name in (
'langbot.pkg.pipeline.preproc.preproc',
'langbot.pkg.pipeline.stage',
):
sys.modules.pop(module_name, None)
preproc_module = importlib.import_module('langbot.pkg.pipeline.preproc.preproc')
entities_module = importlib.import_module('langbot.pkg.pipeline.entities')
return preproc_module, entities_module
@pytest.mark.asyncio
async def test_preproc_enables_skill_authoring_tools_when_skill_service_available():
preproc_module, entities_module = _import_preproc_modules()
app = _make_app(skill_service=SimpleNamespace())
stage = preproc_module.PreProcessor(app)
result = await stage.process(_make_query(), 'PreProcessor')
assert result.result_type == entities_module.ResultType.CONTINUE
app.tool_mgr.get_all_tools.assert_awaited_once_with(None, None, include_skill_authoring=True)
@pytest.mark.asyncio
async def test_preproc_disables_skill_authoring_tools_when_skill_service_missing():
preproc_module, entities_module = _import_preproc_modules()
app = _make_app(skill_service=None)
stage = preproc_module.PreProcessor(app)
result = await stage.process(_make_query(), 'PreProcessor')
assert result.result_type == entities_module.ResultType.CONTINUE
app.tool_mgr.get_all_tools.assert_awaited_once_with(None, None, include_skill_authoring=False)
@pytest.mark.asyncio
async def test_preproc_injects_skill_index_into_system_prompt():
"""The Tool Call activation pattern still needs the LLM to know which
skills exist. PreProcessor must append the SkillManager's index
addendum to the first system message."""
preproc_module, entities_module = _import_preproc_modules()
app = _make_app(skill_service=SimpleNamespace())
addendum = '\n\nAvailable Skills:\n- demo (demo): Demo skill.\n\nCall activate ...'
app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value=addendum)
query = _make_query()
result = await stage_process_capture(preproc_module, app, query)
assert result.result_type == entities_module.ResultType.CONTINUE
app.skill_mgr.build_skill_aware_prompt_addition.assert_called_once_with(bound_skills=None)
head = query.prompt.messages[0]
assert head.role == 'system'
assert head.content.endswith(addendum)
@pytest.mark.asyncio
async def test_preproc_respects_pipeline_bound_skills_subset():
"""When ``enable_all_skills`` is false the bound list is passed through
so the addendum only mentions skills allowed for this pipeline."""
preproc_module, entities_module = _import_preproc_modules()
app = _make_app(skill_service=SimpleNamespace())
app.pipeline_service.get_pipeline = AsyncMock(
return_value={
'extensions_preferences': {
'enable_all_skills': False,
'skills': ['only-this'],
}
}
)
app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
query = _make_query()
result = await stage_process_capture(preproc_module, app, query)
assert result.result_type == entities_module.ResultType.CONTINUE
app.skill_mgr.build_skill_aware_prompt_addition.assert_called_once_with(bound_skills=['only-this'])
assert query.variables.get('_pipeline_bound_skills') == ['only-this']
@pytest.mark.asyncio
async def test_preproc_skips_injection_when_addendum_is_empty():
"""No visible skills → system prompt is left untouched (no
``Available Skills`` block appended)."""
preproc_module, entities_module = _import_preproc_modules()
app = _make_app(skill_service=SimpleNamespace())
app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
query = _make_query()
result = await stage_process_capture(preproc_module, app, query)
assert result.result_type == entities_module.ResultType.CONTINUE
if query.prompt and query.prompt.messages:
assert 'Available Skills' not in (query.prompt.messages[0].content or '')
async def stage_process_capture(preproc_module, app, query):
"""Run PreProcessor.process and return the result while keeping ``query``
accessible to the assertions (process mutates query in place)."""
stage = preproc_module.PreProcessor(app)
return await stage.process(query, 'PreProcessor')

View File

@@ -0,0 +1,89 @@
from types import SimpleNamespace
from unittest.mock import AsyncMock
import pytest
from langbot.pkg.api.http.service.skill import SkillService
class TestRequireBoxForWrite:
"""Box is the only source of truth for skills — there is no local
filesystem fallback. Every write and (most) read methods refuse cleanly
when the Box runtime is disabled, unreachable, or simply not installed."""
def _ap_with_disabled_box(self):
return SimpleNamespace(
skill_mgr=SimpleNamespace(reload_skills=AsyncMock()),
box_service=SimpleNamespace(
available=False,
enabled=False,
_connector_error='Box runtime is disabled in config (box.enabled = false)',
),
)
def _ap_with_failed_box(self):
return SimpleNamespace(
skill_mgr=SimpleNamespace(reload_skills=AsyncMock()),
box_service=SimpleNamespace(
available=False,
enabled=True,
_connector_error='docker daemon not running',
),
)
@pytest.mark.asyncio
async def test_create_skill_refused_when_box_disabled(self):
service = SkillService(self._ap_with_disabled_box())
with pytest.raises(ValueError, match='disabled in config'):
await service.create_skill({'name': 'x'})
@pytest.mark.asyncio
async def test_create_skill_refused_when_box_failed(self):
service = SkillService(self._ap_with_failed_box())
with pytest.raises(ValueError, match='docker daemon not running'):
await service.create_skill({'name': 'x'})
@pytest.mark.asyncio
async def test_update_skill_refused_when_box_disabled(self):
service = SkillService(self._ap_with_disabled_box())
with pytest.raises(ValueError, match='Editing a skill requires the Box runtime'):
await service.update_skill('x', {})
@pytest.mark.asyncio
async def test_write_skill_file_refused_when_box_disabled(self):
service = SkillService(self._ap_with_disabled_box())
with pytest.raises(ValueError, match='Editing skill files requires the Box runtime'):
await service.write_skill_file('x', 'a.txt', 'hi')
@pytest.mark.asyncio
async def test_install_from_github_refused_when_box_disabled(self):
service = SkillService(self._ap_with_disabled_box())
with pytest.raises(ValueError, match='Installing a skill from GitHub'):
await service.install_from_github({'owner': 'o', 'repo': 'r', 'asset_url': 'https://example/x.zip'})
@pytest.mark.asyncio
async def test_install_from_zip_upload_refused_when_box_disabled(self):
service = SkillService(self._ap_with_disabled_box())
with pytest.raises(ValueError, match='Installing a skill from upload'):
await service.install_from_zip_upload(file_bytes=b'', filename='x.zip')
@pytest.mark.asyncio
async def test_create_skill_refused_when_box_service_missing_entirely(self):
"""No ap.box_service attribute at all (truly minimal setup):
Box is the only source of truth, so creation must still refuse."""
service = SkillService(SimpleNamespace(skill_mgr=SimpleNamespace(reload_skills=AsyncMock())))
with pytest.raises(ValueError, match='not initialised'):
await service.create_skill({'name': 'x'})
@pytest.mark.asyncio
async def test_list_skills_returns_empty_when_box_unavailable(self):
"""list_skills should render an empty surface (not crash) so the
skills page can show a banner instead of a broken state."""
service = SkillService(self._ap_with_disabled_box())
assert await service.list_skills() == []
@pytest.mark.asyncio
async def test_read_skill_file_refused_when_box_unavailable(self):
service = SkillService(self._ap_with_disabled_box())
with pytest.raises(ValueError, match='Reading a skill file'):
await service.read_skill_file('x', 'a.txt')

View File

@@ -11,7 +11,6 @@ Uses tmp_path for file system isolation where applicable.
import os
import pytest
from unittest.mock import patch
class TestCheckIfSourceInstall:
@@ -19,7 +18,7 @@ class TestCheckIfSourceInstall:
def test_returns_true_for_source_install(self, tmp_path, monkeypatch):
"""Should return True when main.py with LangBot marker exists."""
main_py = tmp_path / "main.py"
main_py = tmp_path / 'main.py'
main_py.write_text('# LangBot/main.py\n# This is the entry point')
monkeypatch.chdir(tmp_path)
@@ -33,52 +32,14 @@ class TestCheckIfSourceInstall:
paths._is_source_install = None
def test_returns_false_when_no_main_py(self, tmp_path, monkeypatch):
"""Should return False when main.py doesn't exist."""
monkeypatch.chdir(tmp_path)
from langbot.pkg.utils import paths
paths._is_source_install = None
result = paths._check_if_source_install()
assert result is False
paths._is_source_install = None
def test_returns_false_when_main_py_without_marker(self, tmp_path, monkeypatch):
"""Should return False when main.py exists but lacks LangBot marker."""
main_py = tmp_path / "main.py"
main_py.write_text('# Some other project\nprint("hello")')
monkeypatch.chdir(tmp_path)
from langbot.pkg.utils import paths
paths._is_source_install = None
result = paths._check_if_source_install()
assert result is False
paths._is_source_install = None
def test_handles_io_error_gracefully(self, tmp_path, monkeypatch):
"""Should return False when main.py cannot be read."""
main_py = tmp_path / "main.py"
main_py.write_text('# LangBot/main.py\n')
monkeypatch.chdir(tmp_path)
from langbot.pkg.utils import paths
paths._is_source_install = None
# Patch open to raise IOError
with patch("builtins.open", side_effect=IOError("Cannot read")):
result = paths._check_if_source_install()
assert result is False
paths._is_source_install = None
# Note: ``_check_if_source_install`` was refactored to walk
# ``Path(__file__).resolve().parents`` looking for ``pyproject.toml`` +
# ``main.py`` instead of relying on the cwd. That makes it robust to where
# the process is launched from but also means the old "cwd doesn't have
# main.py" / "main.py without marker" / "IOError on read" cases no longer
# apply — there's no file read at all. The corresponding negative tests
# were removed; ``test_returns_true_for_source_install`` still exercises
# the positive path because the repo checkout itself is a source install.
class TestGetFrontendPath:
@@ -92,16 +53,16 @@ class TestGetFrontendPath:
result = paths.get_frontend_path()
# The result should contain web/dist or be an absolute path to it
assert "web/dist" in result or result.endswith("dist")
assert 'web/dist' in result or result.endswith('dist')
paths._is_source_install = None
def test_finds_dist_directory_in_source_mode(self, tmp_path, monkeypatch):
"""Should find web/dist when running from source mode."""
main_py = tmp_path / "main.py"
main_py = tmp_path / 'main.py'
main_py.write_text('# LangBot/main.py\n')
web_dist = tmp_path / "web" / "dist"
web_dist = tmp_path / 'web' / 'dist'
web_dist.mkdir(parents=True)
monkeypatch.chdir(tmp_path)
@@ -111,18 +72,18 @@ class TestGetFrontendPath:
paths._is_source_install = None
result = paths.get_frontend_path()
assert result == "web/dist"
assert result == 'web/dist'
paths._is_source_install = None
def test_prefers_dist_over_out_in_source_mode(self, tmp_path, monkeypatch):
"""Should prefer web/dist over web/out when both exist in source mode."""
main_py = tmp_path / "main.py"
main_py = tmp_path / 'main.py'
main_py.write_text('# LangBot/main.py\n')
web_dist = tmp_path / "web" / "dist"
web_dist = tmp_path / 'web' / 'dist'
web_dist.mkdir(parents=True)
web_out = tmp_path / "web" / "out"
web_out = tmp_path / 'web' / 'out'
web_out.mkdir(parents=True)
monkeypatch.chdir(tmp_path)
@@ -132,7 +93,7 @@ class TestGetFrontendPath:
paths._is_source_install = None
result = paths.get_frontend_path()
assert result == "web/dist"
assert result == 'web/dist'
paths._is_source_install = None
@@ -148,19 +109,19 @@ class TestGetResourcePath:
paths._is_source_install = None
result = paths.get_resource_path("nonexistent/file.txt")
assert result == "nonexistent/file.txt"
result = paths.get_resource_path('nonexistent/file.txt')
assert result == 'nonexistent/file.txt'
paths._is_source_install = None
def test_finds_resource_in_current_directory_source_mode(self, tmp_path, monkeypatch):
"""Should find resource in current directory when in source mode."""
main_py = tmp_path / "main.py"
main_py = tmp_path / 'main.py'
main_py.write_text('# LangBot/main.py\n')
resource_file = tmp_path / "templates" / "config.yaml"
resource_file = tmp_path / 'templates' / 'config.yaml'
resource_file.parent.mkdir(parents=True, exist_ok=True)
resource_file.write_text("test: value")
resource_file.write_text('test: value')
monkeypatch.chdir(tmp_path)
@@ -168,18 +129,18 @@ class TestGetResourcePath:
paths._is_source_install = None
result = paths.get_resource_path("templates/config.yaml")
result = paths.get_resource_path('templates/config.yaml')
assert os.path.exists(result)
paths._is_source_install = None
def test_returns_relative_path_in_source_mode(self, tmp_path, monkeypatch):
"""Should return relative path if resource exists in source mode."""
main_py = tmp_path / "main.py"
main_py = tmp_path / 'main.py'
main_py.write_text('# LangBot/main.py\n')
resource_file = tmp_path / "test_resource.txt"
resource_file.write_text("test content")
resource_file = tmp_path / 'test_resource.txt'
resource_file.write_text('test content')
monkeypatch.chdir(tmp_path)
@@ -187,8 +148,8 @@ class TestGetResourcePath:
paths._is_source_install = None
result = paths.get_resource_path("test_resource.txt")
assert result == "test_resource.txt"
result = paths.get_resource_path('test_resource.txt')
assert result == 'test_resource.txt'
paths._is_source_install = None
@@ -198,7 +159,7 @@ class TestPathFunctionsCaching:
def test_source_install_cache_is_used(self, tmp_path, monkeypatch):
"""_check_if_source_install should use cached result."""
main_py = tmp_path / "main.py"
main_py = tmp_path / 'main.py'
main_py.write_text('# LangBot/main.py\n')
monkeypatch.chdir(tmp_path)
@@ -219,5 +180,5 @@ class TestPathFunctionsCaching:
paths._is_source_install = None
if __name__ == "__main__":
pytest.main([__file__, "-v"])
if __name__ == '__main__':
pytest.main([__file__, '-v'])

View File

@@ -1,136 +0,0 @@
"""
Unit tests for version utility functions.
Tests version comparison logic without network calls.
"""
from __future__ import annotations
from unittest.mock import Mock
from langbot.pkg.utils.version import VersionManager
class TestVersionComparison:
"""Tests for version comparison functions."""
def _create_version_manager(self):
"""Create a VersionManager with mock app."""
mock_app = Mock()
mock_app.proxy_mgr = Mock()
mock_app.proxy_mgr.get_forward_providers = Mock(return_value={})
mock_app.logger = Mock()
return VersionManager(mock_app)
def test_is_newer_same_version(self):
"""is_newer returns False for same version."""
vm = self._create_version_manager()
result = vm.is_newer('v1.0.0', 'v1.0.0')
assert result is False
def test_is_newer_different_major_version(self):
"""is_newer returns False for different major version."""
# Note: is_newer ignores major version changes
vm = self._create_version_manager()
result = vm.is_newer('v2.0.0', 'v1.0.0')
assert result is False
def test_is_newer_minor_update(self):
"""is_newer returns True for minor update within same major."""
vm = self._create_version_manager()
result = vm.is_newer('v1.1.0', 'v1.0.0')
assert result is True
def test_is_newer_patch_update(self):
"""is_newer returns True for patch update within same major."""
vm = self._create_version_manager()
result = vm.is_newer('v1.0.1', 'v1.0.0')
assert result is True
def test_is_newer_with_fourth_segment(self):
"""is_newer ignores fourth version segment."""
# Both have same first 3 segments
vm = self._create_version_manager()
result = vm.is_newer('v1.0.0.1', 'v1.0.0.0')
assert result is False
def test_is_newer_short_version(self):
"""is_newer handles short version numbers."""
vm = self._create_version_manager()
result = vm.is_newer('v1.0', 'v1.0')
assert result is False
def test_is_newer_older_version(self):
"""is_newer returns True when new > old."""
vm = self._create_version_manager()
result = vm.is_newer('v1.2.0', 'v1.1.0')
assert result is True
class TestCompareVersionStr:
"""Tests for compare_version_str static method."""
def test_compare_equal_versions(self):
"""Equal versions return 0."""
result = VersionManager.compare_version_str('v1.0.0', 'v1.0.0')
assert result == 0
def test_compare_without_v_prefix(self):
"""Versions without v prefix work the same."""
result = VersionManager.compare_version_str('1.0.0', '1.0.0')
assert result == 0
def test_compare_mixed_prefix(self):
"""Mixed v prefix works correctly."""
result = VersionManager.compare_version_str('v1.0.0', '1.0.0')
assert result == 0
def test_compare_first_greater(self):
"""First version greater returns 1."""
result = VersionManager.compare_version_str('v1.1.0', 'v1.0.0')
assert result == 1
def test_compare_first_smaller(self):
"""First version smaller returns -1."""
result = VersionManager.compare_version_str('v1.0.0', 'v1.1.0')
assert result == -1
def test_compare_different_lengths(self):
"""Different length versions are padded with zeros."""
result = VersionManager.compare_version_str('v1.0', 'v1.0.0')
assert result == 0
def test_compare_shorter_greater(self):
"""Shorter version padded, first still greater."""
result = VersionManager.compare_version_str('v1.1', 'v1.0.0')
assert result == 1
def test_compare_longer_greater(self):
"""Longer version, first smaller."""
result = VersionManager.compare_version_str('v1.0', 'v1.0.1')
assert result == -1
def test_compare_major_version(self):
"""Major version comparison."""
result = VersionManager.compare_version_str('v2.0.0', 'v1.9.9')
assert result == 1
def test_compare_minor_version(self):
"""Minor version comparison."""
result = VersionManager.compare_version_str('v1.5.0', 'v1.4.9')
assert result == 1
def test_compare_patch_version(self):
"""Patch version comparison."""
result = VersionManager.compare_version_str('v1.0.1', 'v1.0.0')
assert result == 1
def test_compare_four_segments(self):
"""Four segment version comparison."""
result = VersionManager.compare_version_str('v1.0.0.1', 'v1.0.0.0')
assert result == 1
def test_compare_long_versions(self):
"""Long version strings work correctly."""
result = VersionManager.compare_version_str('v1.2.3.4.5', 'v1.2.3.4.4')
assert result == 1