mirror of
https://github.com/langbot-app/LangBot.git
synced 2026-06-04 21:06:03 +00:00
Feat/sandbox (#2072)
* feat: add mcp and skills
* feat: add filter
* feat: modify frontend
* feat(box): add sandbox_exec tool loop for local-agent calculations
* feat(box): add host workspace mounting and sandbox_exec guidance
* feat(box): add BoxProfile with resource limits and improved output truncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
* feat(box): add obs
* refactor(box): unify box service lifecycle and local runtime
management
* refactor(box): remove legacy in-process runtime code and clean up smells
After the architecture settled on always using an independent Box Runtime
service, several pieces of compatibility code and design shortcuts were
left behind. This commit cleans them up:
- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from
production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd`
default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config
access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing
tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls
`box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose
(auto-detected by code).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add test
* fix: fix box intergration test
* feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security
## Summary
When Podman/Docker is available, all stdio-mode MCP servers now automatically
run inside Box containers with dependency installation, path rewriting, and
lifecycle management. When no container runtime exists, LangBot starts normally
and stdio MCP falls back to host-direct execution.
## What changed
### MCP stdio → Box integration (mcp.py)
- Add `MCPServerBoxConfig` pydantic model for structured box configuration
with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes
`.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
`pip install` inside the container before starting the MCP server
- Copy project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is only set after all retries exhaust
or on success, not on first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
### Box security (security.py — new)
- `validate_sandbox_security()` blocks dangerous host paths:
`/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`
### Box models (models.py)
- Add `BoxHostMountMode.NONE` — skips volume mount entirely
- Adjust `validate_host_mount_consistency` to allow arbitrary workdir
when `host_path_mode=NONE`
### Box backend (backend.py)
- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
- Add `cleanup_orphaned_containers()` to base class (no-op default) and
CLI implementation (single batched `rm -f` command)
### Box runtime (runtime.py)
- Call `cleanup_orphaned_containers()` during `initialize()` to remove
lingering containers from previous runs
### Box service (service.py)
- Graceful degradation: `initialize()` catches runtime errors and sets
`available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and
`create_session()` — MCP paths are admin-configured and trusted,
bypassing `allowed_host_mount_roots` restrictions meant for
LLM-generated sandbox_exec commands
### Default behavior
- stdio MCP servers automatically use Box when `box_service.available`
is True (Podman/Docker detected); no explicit `box` config needed
- When no container runtime exists, falls back to host-direct stdio
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
(for site-packages), `host_path_mode=ro`, `startup_timeout=120s`
### Tests
- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv
unwrap, host_path inference, payload building, runtime info, box
availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result — computed once, passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
* refactor: use rpc
* fix: import
* refactor(box): clean up sandbox subsystem code quality and efficiency
- Fix O(n²) stderr trimming in runtime.py with running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
unused config_hash computation, unused imports
- Deduplicate connection callback in BoxRuntimeConnector, parse URL once
- Use enum comparison instead of stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache NativeToolLoader tool definition and sandbox system guidance
- Extract _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
- Add JSON startswith guard in logging_utils to skip futile json.loads
- Fix ruff lint errors (F401 unused imports, F841 unused variables)
* fix: ruff
* refactor(sandbox): keep box logic out of pipeline and localagent
- Move sandbox system-prompt guidance from LocalAgentRunner into
BoxService.get_system_guidance() so all box domain knowledge stays
in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into
MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool
results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were
unrelated to this feature.
- Skip two test files affected by a pre-existing circular import
(runner ↔ app) until the import cycle is resolved in a separate PR.
* fix: ruff
* refactor(box): move box runtime to langbot-plugin-sdk
Extract self-contained box runtime modules (actions, backend, client,
errors, models, runtime, security, server) to langbot-plugin-sdk and
update all imports to use `langbot_plugin.box.*`. Keep only service
and
connector in LangBot core as they depend on the Application context.
- Update docker-compose to use `langbot_plugin.box.server` entry
point
- Update pyproject.toml to use local SDK via `tool.uv.sources`
- Remove migrated source files and their unit/integration tests
- Update remaining test imports to match new module paths
* fix: ruff
* feat: enhance sandbox api
* refactor(box): derive paths from shared host root
* fix(box): tighten sandbox exposure and restore box integration coverage
* refactor(types): remove quoted annotations under postponed evaluation
* feat(box): unify native agent tools around exec/read/write/edit
* chore(sandbox): move MCP loader changes to follow-up branch
* feat(box): add session workspace quota enforcement and SDK quota metadata
* feat(skills): add Agent Skills management system (#1917)
* feat(skills): add Agent Skills management system
Implement comprehensive skills management feature inspired by agentskills spec:
Backend:
- Add Skill and SkillPipelineBinding database entities
- Add database migration (dbm018) for skills tables
- Implement SkillManager for skill loading, matching, and resolution
- Implement SkillService for CRUD operations
- Add skills API endpoints for skill and pipeline binding management
- Integrate skill index injection into pipeline preprocessor
- Add skill activation detection in LocalAgentRunner
Frontend:
- Add Skills page with listing, search, and type filter
- Add SkillDetailDialog for create/edit with preview
- Add SkillCard and SkillForm components
- Add skills API methods to BackendClient
- Add skills entry to sidebar navigation
- Add i18n translations (en-US, zh-Hans)
Features:
- Support skill and workflow types
- Sub-skill composition via {{INVOKE_SKILL: name}} syntax
- Progressive disclosure (index in prompt, full instructions on activation)
- Pipeline-specific skill bindings with priority
* fix: resolve cherry-pick conflicts for agentskills onto sandbox
- Remove non-existent external_kb service import
- Add skill_mgr mock to localagent sandbox_exec tests
- Keep database version at 24 (sandbox branch's latest)
* feat(skills): upgrade to package-backed skills with sandbox execution
Evolve the skills system from pure prompt-based to package-backed with
sandbox tool execution support:
- Add source_type/package_root/entry_file/skill_tools fields to Skill entity
- SkillManager loads SKILL.md from local package directories
- SkillToolLoader as 4th dispatch layer in ToolManager (query-scoped)
- LocalAgent injects skill tools into use_funcs on skill activation
- BoxService.execute_skill_tool() runs scripts in sandbox (ro mount, env params)
- Skill tool names auto-namespaced as skill__{skill}__{tool}
- API validation for package_root allowlist and entry path traversal
- Frontend source_type toggle, package_root input, skill_tools editor
- Migration renumbered to 025 with ALTER TABLE fallback for existing DBs
- Fix unclosed limitation section in i18n files
- Fix skills API methods misplaced outside BackendClient class
* fix: test info
* feat(skills): switch skills to package-backed storage and add import tooling
- skills 从 inline/package 双轨收敛成 package-first
- instructions 改为写入并读取 SKILL.md
- 新增本地目录扫描和 GitHub 安装 skill
- 前端把 skills 整合进 plugins 页,新增 SkillsComponent 和 GitHub 导入弹窗
- skill form 去掉 source_type / type 筛选,改成目录扫描驱动
- Box skill tool 挂载模式从 ro 改成 rw
- 测试和中英文文案同步更新
* feat: simplify langbot skill create and import
* refactor(skills): clean up legacy skill API and harden activation flow
* refactor(skills): remove skill dependency expansion and add skill_get
* fix: lint
* fix: delete
* fix(skills): align tool manager loader initialization
* refactor: remove sandbox execute skill
* fix(skills): hide activation markers and isolate skill activation flow
* refactor(skills): switch skill model to filesystem-backed packages
* refactor(skills): switch skill model to filesystem-backed packages
* refactor(skills): unify runtime skill access around filesystem paths
* refactor(skills): unify runtime skill access around filesystem paths
* feat(skills): align rw package design and fix skill activation, visibility, and lint issues
* refactor(skills): replace rich authoring API with import/reload flow and update
Box design doc
* feat(box): add sandbox_exec tool loop for local-agent calculations
* feat(box): add host workspace mounting and sandbox_exec guidance
* feat(box): add BoxProfile with resource limits and improved output truncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
* feat(box): add obs
* refactor(box): unify box service lifecycle and local runtime
management
* refactor(box): remove legacy in-process runtime code and clean up smells
After the architecture settled on always using an independent Box Runtime
service, several pieces of compatibility code and design shortcuts were
left behind. This commit cleans them up:
- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from
production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd`
default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config
access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing
tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls
`box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose
(auto-detected by code).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security
## Summary
When Podman/Docker is available, all stdio-mode MCP servers now automatically
run inside Box containers with dependency installation, path rewriting, and
lifecycle management. When no container runtime exists, LangBot starts normally
and stdio MCP falls back to host-direct execution.
## What changed
### MCP stdio → Box integration (mcp.py)
- Add `MCPServerBoxConfig` pydantic model for structured box configuration
with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes
`.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
`pip install` inside the container before starting the MCP server
- Copy project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is only set after all retries exhaust
or on success, not on first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
### Box security (security.py — new)
- `validate_sandbox_security()` blocks dangerous host paths:
`/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`
### Box models (models.py)
- Add `BoxHostMountMode.NONE` — skips volume mount entirely
- Adjust `validate_host_mount_consistency` to allow arbitrary workdir
when `host_path_mode=NONE`
### Box backend (backend.py)
- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
- Add `cleanup_orphaned_containers()` to base class (no-op default) and
CLI implementation (single batched `rm -f` command)
### Box runtime (runtime.py)
- Call `cleanup_orphaned_containers()` during `initialize()` to remove
lingering containers from previous runs
### Box service (service.py)
- Graceful degradation: `initialize()` catches runtime errors and sets
`available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and
`create_session()` — MCP paths are admin-configured and trusted,
bypassing `allowed_host_mount_roots` restrictions meant for
LLM-generated sandbox_exec commands
### Default behavior
- stdio MCP servers automatically use Box when `box_service.available`
is True (Podman/Docker detected); no explicit `box` config needed
- When no container runtime exists, falls back to host-direct stdio
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
(for site-packages), `host_path_mode=ro`, `startup_timeout=120s`
### Tests
- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv
unwrap, host_path inference, payload building, runtime info, box
availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result — computed once, passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
* refactor: use rpc
* fix: import
* refactor(box): clean up sandbox subsystem code quality and efficiency
- Fix O(n²) stderr trimming in runtime.py with running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
unused config_hash computation, unused imports
- Deduplicate connection callback in BoxRuntimeConnector, parse URL once
- Use enum comparison instead of stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache NativeToolLoader tool definition and sandbox system guidance
- Extract _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
- Add JSON startswith guard in logging_utils to skip futile json.loads
- Fix ruff lint errors (F401 unused imports, F841 unused variables)
* fix: ruff
* refactor(sandbox): keep box logic out of pipeline and localagent
- Move sandbox system-prompt guidance from LocalAgentRunner into
BoxService.get_system_guidance() so all box domain knowledge stays
in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into
MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool
results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were
unrelated to this feature.
- Skip two test files affected by a pre-existing circular import
(runner ↔ app) until the import cycle is resolved in a separate PR.
* refactor(box): move box runtime to langbot-plugin-sdk
Extract self-contained box runtime modules (actions, backend, client,
errors, models, runtime, security, server) to langbot-plugin-sdk and
update all imports to use `langbot_plugin.box.*`. Keep only service
and
connector in LangBot core as they depend on the Application context.
- Update docker-compose to use `langbot_plugin.box.server` entry
point
- Update pyproject.toml to use local SDK via `tool.uv.sources`
- Remove migrated source files and their unit/integration tests
- Update remaining test imports to match new module paths
* fix: ruff
* fix(box): tighten sandbox exposure and restore box integration coverage
* refactor(types): remove quoted annotations under postponed evaluation
* chore(sandbox): move MCP loader changes to follow-up branch
* refactor(plugins): simplify GitHub install flow to default master archive
* revert(api): restore plugin GitHub import flow in plugins controller
* Improve data-root handling and skill install previews
* Add managed skill authoring tools for local agents
* Refactor the skills UI around sidebar detail pages
* Document why managed skill authoring tools bypass box
* fix: lint
* feat(web): refactor plugin/skill install flows and fix skills page
- Fix sidebar skill icon
- Add skills route and error page component
- Refactor plugin GitHub install from dialog modal to inline card
- Add skill install dropdown menu (create/upload/github) in sidebar
- Wire sidebar → skills page communication via pendingSkillInstallAction context
- Add i18n keys for error page and skill install actions
* fix(web): persist sidebar collapsible section open state on navigation
Sections opened via sub-item navigation now retain their expanded state
when the user switches to a different section, instead of collapsing
because the isActive fallback becomes false.
---------
Co-authored-by: youhuanghe <1051233107@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
* feat(sandbox): add MCP box integration on top of sandbox base (#2083)
* refactor(mcp): extract box stdio runtime helper
* refactor(box): introduce reusable workspace session helper
* refactor(box): run Box Runtime as subprocess inside LangBot container
Remove the separate langbot_box_runtime Docker service. Box Runtime
now always launches as a local stdio subprocess, regardless of whether
LangBot runs in Docker or not. The WebSocket transport path is kept
only for explicit runtime_url configuration (remote deployment).
This simplifies deployment by eliminating cross-container path mapping
and network hops. Box Runtime is a pure scheduling process (talks to
Docker socket / nsjail), it does not execute user code or touch the
filesystem, so container isolation is unnecessary — unlike Plugin
Runtime.
* fix(web): prevent first-emission snapshot from swallowing unsaved changes in pipeline editor
When switching runner (e.g. local-agent → n8n), the newly mounted stage's
first emit would re-capture the saved snapshot, erasing the dirty state
caused by the runner change. The save button would incorrectly go dim.
- Skip snapshot re-capture in handleDynamicFormEmit when form is already dirty
- Add mount-time emit to N8nAuthFormComponent (matching DynamicFormComponent)
- Use stable onSubmitRef to prevent useEffect subscription churn
- Add previousInitialValues guard to prevent initialValues echo loops
* style(web): align plugin list header button heights
* docs(review): update Box architecture review documents
Replace old review docs with 5 focused documents:
- box-architecture.md: deep architecture analysis (LangBot + SDK)
- box-issues.md: 22 issues rated P0/P1/P2
- box-test-coverage.md: test coverage analysis
- box-tob-analysis.md: toB commercialization analysis
- box-vs-plugin-runtime.md: Box vs Plugin runtime comparison
* feat(web): improve login error layout and add Terms of Service link
- Improve backend connection error display with bordered container,
inline icon, and better visual hierarchy
- Extract actual error message from axios response object
- Add Terms of Service link (https://langbot.app/terms) to login footer
- Add termsOfService i18n key for all 7 locales
* refactor(web): replace all hardcoded SVG icons with lucide-react
Unify icon usage across the entire frontend by replacing 67 hardcoded
SVG icons with lucide-react components across ~25 files. This improves
consistency, maintainability, and reduces bundle duplication.
Key replacements:
- Sidebar nav: Zap, LayoutDashboard, Bot, Workflow, BookMarked, etc.
- MCP forms: Loader2, XCircle, Trash2
- Monitoring: Sparkles, MessageSquare, CheckCircle2, RefreshCw, etc.
- Cards: Clock, Star, Workflow, Hexagon, Puzzle, Github, etc.
- Misc: Paperclip, AudioLines, CloudUpload, Layers, Heart, Smile
Zero hardcoded <svg> tags remain in .tsx files.
* fix(web): stop polling plugin tasks when no active installs
The PluginInstallTaskProvider was unconditionally polling
getAsyncTasks every 3s on all /home/* routes. Now it only
syncs once on mount and starts periodic polling only when
there are active (non-terminal) install tasks.
* fix(deps): update langbot-plugin version and add new dependencies
* refactor: use Space API for release checks and stop idle polling
- version.py: switch release list API from GitHub to space.langbot.app,
remove unused in-place update logic (update_all, compare_version_str),
translate all comments/logs to English
- PluginInstallTaskContext: only poll when active install tasks exist
* feat(box): add --standalone-box flag and 3-way transport decision for Box runtime
Align Box runtime connection logic with Plugin runtime's pattern:
- Docker: WebSocket to langbot_box container (ws://langbot_box:5411)
- --standalone-box: WebSocket to external Box process (ws://localhost:5411)
- Windows: subprocess + WebSocket (workaround for async stdio limitation)
- Unix/macOS: subprocess + stdio pipe (unchanged)
BoxRuntimeConnector now inherits ManagedRuntimeConnector for subprocess
lifecycle reuse. Add langbot_box service to docker-compose.yaml.
* refactor(box): use single port with path-based routing for Box WS
Update connector to use ws://host:5410/rpc/ws instead of ws://host:5411.
Update review docs to reflect the single-port architecture.
* feat(web): show Box runtime status in plugin debug info popover
Add Box status section to the debug info popover on the plugin list page,
displaying connection status, backend info, profile, active sessions,
and recent error count. Fetched from GET /api/v1/box/status in parallel
with plugin debug info. Includes i18n for all 8 supported languages.
* fix(web): remove ephemeral sandbox count from Box status display
The active_sessions count reflects transient sandbox containers that
expire after 5 minutes of inactivity, making it misleading in the UI.
Keep only connection status, backend, profile, and error count.
* feat(box): configurable sandbox scope and unified skill containers
Replace the per-message session_id with a template-based system
configurable per pipeline via 'Sandbox Scope' in the local-agent panel.
Default scope is per-chat ({launcher_type}_{launcher_id}).
Unify skill exec into the same container as default exec — skills are
mounted at /workspace/.skills/{name}/ via extra_mounts instead of
getting separate containers. All pipeline-bound skills are injected
at container creation time.
- Add box-session-id-template to pipeline metadata (select, 4 options, 8 languages)
- Add resolve_box_session_id() and build_skill_extra_mounts() to BoxService
- Rewrite native.py skill exec path to use execute_tool with shared session
- Update tests for new session_id format
- Add design doc: docs/review/box-session-scope.md
* feat(web): show active sandbox details in Box status popover
Display sandbox count and a detailed list of active sessions including
session ID, image, backend, resources (CPU/memory), network mode, and
last used time. Fetched from GET /api/v1/box/sessions in parallel.
Includes i18n for all 8 supported languages.
* feat(box): add startup and availability logging for sandbox tools
Log Box runtime initialization result (success with profile info, or
failure warning). Log native tool availability status at ToolManager
startup so it's immediately clear whether exec/read/write/edit tools
are registered for the LLM.
* feat(box): support custom sandbox container image via config.yaml
Add 'image' field to box config section. When set, it overrides the
profile default image (python:3.11-slim) for all sandbox containers.
Priority: caller-specified > config.yaml image > profile default.
* feat(box): add heartbeat and reconnection for Box runtime connector
Add 20-second heartbeat ping loop to detect silent Box runtime
disconnections. On disconnect, set available=false and attempt
reconnection after 3 seconds via the disconnect callback chain.
- BoxRuntimeConnector: heartbeat loop, disconnect callback parameter,
disconnect detection in connection callback and WS failure handler
- BoxService: wire disconnect callback to toggle available state and
re-initialize the connector on reconnection
* feat(web): move runtime status to dashboard, clean up plugin debug popover
Add SystemStatusCards component to the monitoring dashboard showing
Plugin Runtime and Box Runtime connection status with details (backend,
profile, sandbox count). Remove all Box/session status from the plugin
page debug popover — it now only shows debug URL and key.
Includes i18n for all 8 supported languages.
* refactor(web): compact system status into a single card alongside metrics
Replace the separate two-card row with a single compact 'System Status'
card placed as the 5th column in the metrics grid. Shows green/red dots
for Plugin Runtime and Box Runtime. Click to expand a popover with
connection details (backend, profile, sandbox count).
* feat: show connector error details for Plugin and Box runtime status
Record Box connector error in BoxService and expose it as
'connector_error' in GET /api/v1/box/status when unavailable.
Display error messages in the dashboard System Status popover
for both Plugin Runtime (plugin_connector_error) and Box Runtime
(connector_error) when they are disconnected.
* fix(web): auto-refresh system status and show disconnect errors in real time
Poll Plugin Runtime and Box Runtime status every 30 seconds so the
dashboard reflects disconnections without a manual page refresh.
Also re-fetch when the popover is opened for immediate feedback.
* fix(box): handle RPC failure in get_status/get_sessions gracefully
When the Box runtime disconnects, there is a race between the heartbeat
flipping _available=false and the frontend polling get_status(). If the
poll arrives first, client.get_status() throws a ConnectionClosedError
which propagated as a 500, causing the frontend to show a grey dot
(null status) instead of a red dot with error details.
Now get_status() catches RPC errors and returns available=false with
the exception message as connector_error. get_sessions() returns an
empty list when unavailable or on RPC failure.
* fix(box): add persistent reconnection loop with exponential backoff
The previous disconnect handler only retried once and then gave up.
Now spawns a background task that retries with exponential backoff
(3s, 6s, 12s, ... up to 60s) until the Box runtime is reachable again.
Uses a _reconnecting guard to prevent duplicate loops. Calls
connector.dispose() before each retry to clean up stale tasks.
* fix(box): detect disconnect when handler.run() returns normally
The generic Handler.run() catches ConnectionClosedError and breaks out
of its loop (normal return) instead of raising, because it has no
disconnect_callback. The old code only triggered reconnection in the
except branch, so a clean WebSocket close was never detected.
Now treat handler.run() returning normally (after successful handshake)
as a disconnect event, triggering the reconnection callback.
* fix(web): refresh system status card when clicking Refresh Data button
Pass a refreshKey prop through OverviewCards to SystemStatusCard that
increments on each Refresh Data click, triggering a re-fetch of Plugin
and Box runtime status alongside the monitoring data refresh.
* fix(web): fix system status card stuck in loading state
fetchStatus(showLoading=false) never called setLoading(false), so the
initial loading=true was never cleared. Simplify to always setLoading
in the finally block — the spinner only shows on the very first load
since subsequent fetches complete near-instantly.
* feat(web): show active sandbox details in dashboard Box status popover
Fetch box sessions alongside status and display each active sandbox
in the popover with session ID, image, resources (CPU/memory), and
last used time.
* feat(box): add global sandbox scope option
Add a 'Global (shared by all)' option to the sandbox scope selector.
Uses a constant '{global}' template variable that always resolves to
'global', so all users and chats share one sandbox container.
* refactor(web): replace popover with dialog for system status details
Replace the dropdown popover with a proper Dialog for runtime status
details. Add a small info button on the System Status card that opens
the dialog. Session details now show in a spacious 2-column grid layout
with full image name, backend, CPU/memory, network, mount path, and
created/last-used timestamps.
* fix(web): widen system status dialog and fix scroll border issue
Use max-w-2xl (matching other dialogs) instead of max-w-lg. Move
overflow-y-auto to an inner container with overflow-hidden on
DialogContent to prevent padding bleed at scroll edges.
* feat(web): add tooltips for truncated fields in system status dialog
Wrap session_id, image, and mount path fields with Tooltip components
so hovering over truncated text shows the full value.
* feat: add download button
* feat: successfully install
* feat: delete old filter
* feat: youhua frontend
* fix: align box runtime launch args
* feat: translate
* feat: refactor market
* feat: youhua qianduan
* chore: rename extension zh translation
* feat(extensions): unify extensions endpoint and refresh extensions page UX
- Rename /home/plugins route to /home/extensions and update all sidebar links.
- Add unified GET /api/v1/extensions returning plugins, MCP servers and skills,
sorted by name; replace the three separate frontend fetches with this single call.
- Migrate the extensions page to shadcn primitives (Tabs/Card/Alert/Badge/Skeleton/
Switch/Label) and clean up hardcoded color tokens on the extension card.
- Add a localStorage-persisted "Group by type" switch that, when enabled in the
All Types tab, renders extensions grouped by type with a compact section header.
- Show a spinner while loading and rename the empty-state copy from
"No plugins installed" to "No extensions installed".
- Rename the "格式 / Formats" filter label to "类型 / Types" across all 8 locales.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(extensions): fallback lucide icon when extension icon is missing
Render a tinted lucide icon (Puzzle / Server / Sparkles) on the extension
card when the icon URL is empty or the image fails to load. Picked icons
distinct from EventListener (AudioWaveform) and KnowledgeEngine (Book) to
avoid visual collision with plugin component badges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(sidebar): unify installed-extensions list with plugins, MCP and skills
- Render plugins, MCP servers and skills together under the "Installed
Extensions" sidebar entry, alphabetically sorted to match the list page.
- Resolve per-item routes by extension type (plugin -> /home/extensions,
mcp -> /home/mcp, skill -> /home/skills) and gate the plugin-only hover
context menu on extensionType === 'plugin'.
- Lift the "group by type" toggle into SidebarDataContext (still persisted
in localStorage) so the sidebar groups items with section headers
whenever the list page has the toggle enabled.
- Show lucide fallback icons (Server / Sparkles / Puzzle) tinted in the
LangBot blue for MCP, skill, and missing-icon plugin items, overriding
the SidebarMenuSubButton svg color rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(extensions): mobile-friendly layout for extensions and add-extension pages
- Stack the extensions page header vertically on small screens, let the
filter Tabs scroll horizontally if they overflow, hide the debug
button label below sm and let the install/debug controls wrap.
- Constrain the debug popover and its inputs to the viewport width so
they no longer overflow on phone-sized screens.
- Drop the card grid from a fixed 30rem column to a min(100%, 22rem)
column at base / 28rem at sm, and reduce the gap, so cards render
cleanly at 360px+ widths in both flat and grouped views.
- Make the add-extension header actions wrap on lg- viewports and the
install dialog responsive instead of a hard 500px box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: change ui
* feat: delete version for mcp and skills
* fix: constrain home page content width
* fix: preserve monitoring card borders under sticky filters
* fix(box): restore sandbox config and shared mcp runtime
* fix(box): harden sandbox session isolation
* fix(skill): remove auto activation setting
* feat(skill): align skill system with Claude Code's Tool Call design
- Replace text marker activation with `activate` tool (Tool Call mechanism)
- Replace 7 authoring tools with 2: `activate` + `register_skill`
- Add builtin skills loading from templates/skills/
- Add create-skill as first builtin skill
- Remove SKILL_ACTIVATION_MARKER and text detection methods
- Tool Result returns SKILL.md content (protects KV Cache)
This aligns with Claude Code's progressive disclosure pattern:
- Metadata (name+description) always visible in tool description
- SKILL.md body loaded on activate via Tool Call
- Bundled resources accessible through virtual path mapping
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(tools): add glob and grep native sandbox tools
Add file discovery and content search capabilities to the sandbox:
- glob: Find files by pattern (supports ** recursive matching)
- grep: Search file contents with regex patterns
Both tools respect skill package paths and include safety limits
(max 100 files for glob, max 200 matches for grep).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* feat(skill): add skill file browsing capability
- Add API endpoints for listing/reading/writing skill files
- Add FileTree component in SkillForm for directory browsing
- Users can now view scripts/, references/, assets/ directories
- Files can be selected and edited in the instructions textarea
- Add translations for new file browsing features
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(skill): copy builtin skills to data/skills on startup
- Builtin skills (templates/skills/) are now copied to data/skills/
- Users can view and manage builtin skills in the UI
- Rename SkillAuthoringToolLoader to SkillToolLoader
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(skill): improve file browsing and fix path handling
- Fix nested directory display in skill file tree (preserve root entries)
- Fix file content display when clicking files in skill browser
- Add skill manager and tool manager as proper package modules
- Separate fileContent state to allow editing non-SKILL.md files
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(toolmgr): correct skill_tool_loader attribute name
Rename skill_authoring_tool_loader to skill_tool_loader in execute_func_call
and shutdown methods to match the attribute defined in initialize().
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(native): update tool descriptions to use register_skill
Replace references to removed import_skill_from_directory with
register_skill in exec/write/edit tool descriptions.
* feat(toolmgr): enhance tool initialization with backend availability checks
* refactor: remove unused imports and clean up code in various files
* feat: polish extension detail pages
* feat: persist sidebar list expansion
* fix: refine extension ui and backend errors
* fix: align add extension marketplace ui
* feat: manage skills through box runtime
* feat: support github skill installation
* fix: import github skill directories
* feat: install market extensions from card click
* feat(web): improve skill import flow
* feat: polish extension import flow
* fix(mcp): stabilize shared box managed processes
* fix(web): improve backend retry and sidebar scrolling
* docs(review): refresh box architecture review for feat/sandbox
Sync the docs/review/ suite to the current state of the feat/sandbox branch
(both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review.
- box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b}
config schema, add E2B backend, 6 native tools (incl. glob/grep), Skill
Tool Call activation, shared multi-process MCP container, SkillManager,
BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect
- box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail
image conflict, frontend monitoring card) into a Resolved section; add
new P0 (INIT/backend ordering), P1 (extra_mounts immutability after
container creation), P2 (skill_store test gap, integration tests not in CI)
- box-session-scope.md: add §0 Implementation Status — Phase 1 shipped,
MCP unification landed earlier than originally scoped
- box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC),
add 7 new test files including SDK backend_selection/e2b/skill_store
- box-tob-analysis.md: connection recovery now满足基本要求; add E2B and
backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat
- box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned
with Plugin Runtime; revise remaining gaps (WS auth, shared base class)
* refactor(box): use unified env-override mechanism for box.local config
The box module hand-rolled its own LANGBOT_BOX_LOCAL_* env parsing in two
places (connector._get_box_config and service._local_config), duplicating
logic that LoadConfigStage._apply_env_overrides_to_config already provides
generically via the SECTION__SUBSECTION__KEY convention.
- Drop the bespoke LANGBOT_BOX_LOCAL_* parsing; read box.local straight
from instance_config (the unified BOX__LOCAL__* overrides are already
applied before BoxService initializes)
- Harden _load_allowed_mount_roots to accept a comma-separated string,
since the generic mechanism stores a freshly-created key as a raw
string when config.yaml has no box.local.allowed_mount_roots entry
- docker-compose: rename the langbot container env vars to
BOX__LOCAL__* (the canonical convention); remove them entirely from
the langbot_box container — the Box runtime never reads box.local from
env/config.yaml, it is configured via the INIT RPC action
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: repair stale skill/sandbox tests for feat/sandbox
The skill subsystem moved to Tool-Call activation and a Box-managed
skill store; several tests still asserted removed APIs and a sys.modules
stub leaked across the suite. Full unit suite now green (was 23 failing).
- test_skill_tools: drop TestSkillManagerActivation (text-marker API
removed); rewrite TestSkillActivationHelper around the current
skill.activation.register_activated_skill; replace the CRUD
TestSkillAuthoringToolLoader with TestSkillToolLoader covering the
current activate/register_skill tools and sandbox-availability gating
- test_tool_manager_native: ToolManager attr is skill_tool_loader (not
skill_authoring_tool_loader); native loader now exposes 6 tools
(exec/read/write/edit/glob/grep) and requires initialize() with a
backend-available get_status()
- test_localagent_sandbox_exec: remove obsolete activation-marker
leakage tests and their helper providers
- test_model_service / pipeline conftest: give the mocks skill_mgr=None
so PreProcessor's local-agent skill-binding guard short-circuits
- test_n8nsvapi: stop permanently overwriting sys.modules
('langbot.pkg.provider.runner' etc.); save and restore around the
import so other modules get the real LocalAgentRunner base class
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci(tests): run unit tests on every push to feat/** branches
- Add feat/** to push branches so long-lived feature branches are
tested on every push (they accumulate large changes before a PR)
- Drop the push path filter entirely: every push to master/develop/
feat/** now runs the full unit suite (the old 'pkg/**' filter never
matched the real source path 'src/langbot/pkg/**', so backend-only
pushes silently skipped tests)
- Fix the same broken path glob on the pull_request trigger
('pkg/**' -> 'src/langbot/pkg/**')
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skill): harden mount/reload paths and HTTP errors against stale skill cache
The Box backends behave inconsistently when extra_mounts reference a
missing host directory (nsjail aborts the entire sandbox start, Docker
silently creates a root-owned empty dir on the host, E2B silently skips
the upload). The cache in skill_mgr.skills is only refreshed on
in-process mutations, so out-of-band changes — container rebuilds,
manual rm in the box volume, anything the LangBot API didn't drive —
leave a stale skill that later produces one of those bad mount paths.
- box/service.py: build_skill_extra_mounts now filters skills whose
package_root is not isdir on the LangBot-visible filesystem and logs
a warning, instead of passing the bad mount through to the backend
- skill/manager.py: reload_skills (Box path) drops skills whose
package_root is missing on the LangBot-side filesystem before they
reach the in-memory cache, with a summary warning
- api/http/controller/groups/skills.py: file/CRUD handlers now also
catch BoxError (RuntimeError subclass, previously slipping past
``except ValueError`` and surfacing as 500); list/get handlers gain
a try/except so a transient Box RPC failure becomes a clean 400
instead of a stack trace
Tests added for build_skill_extra_mounts (skip missing, skip empty,
no skill manager) and SkillManager.reload_skills (drop missing on Box
path). Full unit suite: 279 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(box): add box.enabled toggle and gate consumers on availability
Make the Box sandbox runtime optional. When ``box.enabled`` is false in
config (or when an enabled Box fails to connect), every dependent feature
degrades to the same disabled-state UX rather than crashing or silently
falling back to less safe code paths.
Backend:
- config.yaml: new top-level ``box.enabled: true`` flag (default true)
- BoxService:
- Read box.enabled on construction
- initialize() short-circuits when disabled — no remote WS connect, no
stdio subprocess fork
- _on_runtime_disconnect is a no-op when disabled (no reconnect loop
on a deliberately-off service)
- get_status() now exposes ``enabled`` so the frontend can tell
"disabled in config" from "configured but failed"
- MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to
be available, not just installed
- MCP _init_stdio_python_server: when ap.box_service exists but is
unavailable, refuse the stdio server with an actionable error instead
of silently falling through to host-stdio (which bypasses the sandbox
the operator asked for). Setups without ap.box_service installed at
all keep the legacy host-stdio fallback for pre-Box dev mode
- SkillService._require_box_for_write: refuses create/update/install/
write_skill_file when ap.box_service is installed but unavailable.
Distinguishes disabled vs failed in the error message so the UI can
surface the right hint. Legacy setups (no ap.box_service) keep the
local fallback path — that distinction is what keeps the existing
local-skills tests valid
Tests:
- Box disabled-state behavior (4 cases)
- Skill write refusal in disabled & failed states (7 cases)
- MCP stdio runtime info policy updated to match new refuse-when-down
behavior
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): surface Box disabled/unavailable state across consumers
When Box is disabled in config (``box.enabled = false``) or fails to
connect, every dependent UI surface now degrades visibly:
- ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``,
``disabled`` (config-off) and a single ``hint`` key so callers don't
have to re-derive the three states
- ``BoxUnavailableNotice`` reusable Alert banner driven by that hint
- Dashboard SystemStatusCards: three-state dot + label
(connected / disabled-gray / disconnected-red); disabled state shows
the ``boxDisabled`` hint, failed state continues to show the connector
error. Plugin block kept untouched
- Skills page (create view) and SkillDetailContent (edit view):
Save button disabled and banner inserted above the form when Box is
unavailable — matches the backend gate added in the previous commit
- PipelineExtension skill section: ``enable_all_skills`` switch, Add
Skill button and Remove buttons all gate on Box availability;
banner inline under the section header
- PipelineFormComponent: banner above the ``local-agent`` stage card
when Box is unavailable, since that stage carries the sandbox-bound
``box-session-id-template`` field
- Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale
files updated with ``boxDisabled`` / ``boxUnavailable`` /
``boxRequiredHint`` strings
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(box): document the box.enabled toggle and gate behavior matrix
- docker-compose: move ``langbot_box`` under compose profiles
(``box`` and ``all``) so ``docker compose up`` no longer requires
the sandbox container. Inline comment explains how to pair the
profile choice with ``box.enabled`` so the langbot service does not
thrash trying to reach a runtime that was never started
- docs/review/box-architecture.md:
- Annotate ``box.enabled`` in the config.yaml example, listing the
exact side effects (no remote/stdio connect; tools/skills/MCP
stdio off; reads still work)
- Replace the bare compose snippet with the actual profile-driven
invocation and the BOX__ENABLED pairing
- New "关闭/连接失败时的行为矩阵" section: a single table mapping
every consumer (native tools, activate/register_skill, stdio MCP,
skill list/CRUD, pipeline AI config, extensions page, dashboard)
to its disabled-state behavior, plus the legacy ``ap.box_service``
distinguisher note
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(pipeline-form): swap Box banner for field-level disable_if + tooltip
The previous commit hard-coded a BoxUnavailableNotice banner above the
``local-agent`` stage card. That works, but it shouts at the user about
every field in that stage when in reality only one field —
``box-session-id-template`` — depends on the sandbox.
Use the dynamic-form schema's existing variable-injection mechanism
(``__system.*`` references via ``systemContext``) and add a sibling to
``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays
visible, becomes inert, and an info icon next to its label exposes the
reason on hover. The rest of the AI tab is left untouched.
- entities/form/dynamic.ts: extend IDynamicFormItemSchema with
``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject``
- DynamicFormComponent: evaluate disable_if with the same resolver as
show_if; OR the result into isFieldDisabled; render an Info tooltip
trigger next to the label when the condition matches
- ai.yaml metadata: attach disable_if (__system.box_available eq false)
and a localized disabled_tooltip to box-session-id-template
- PipelineFormComponent: drop the BoxUnavailableNotice import and the
per-stage banner; pass ``systemContext={ box_available: boxAvailable }``
only for the local-agent stage so other stages aren't paying the
re-render cost
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(mcp): friendly UI message when stdio MCP refused by Box state
Previously the MCP detail dialog dumped the raw RuntimeError text from
``_init_stdio_python_server`` — English-only, prefixed with "Failed
after 4 attempts", and exposing internal config names. The retry
wrapper also kept retrying a refusal that is deterministically going
to fail again, polluting logs.
Replace the raw text with a structured signal:
- New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. The stdio
refusal path sets it before raising and uses a short opaque
discriminator (``box_disabled_in_config`` / ``box_unavailable``) as
the message body — never user-facing
- ``_lifecycle_loop_with_retry`` short-circuits on
``BOX_UNAVAILABLE``: surfaces the error immediately, no retries,
no "Failed after N attempts" prefix. Silences the warning storm
seen during smoke-testing
- ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``,
``retry_count``, ``box_session_id``, ``box_enabled`` to match what
the backend already returns in get_runtime_info_dict()
- Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and
``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect
``error_phase === 'box_unavailable'`` and render a two-line
localized notice: state line ("Box disabled / unreachable") plus
remediation line ("enable Box or switch to http/sse")
- 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get
``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``,
``mcp.boxStdioRefusedSuggestion``
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(mcp-web): block stdio MCP creation at the form when Box is unavailable
When Box is disabled in config (``box.enabled = false``) or unreachable,
saving a new MCP server in stdio mode produced one that could never
start — the user would only learn that from the runtime error on the
detail page. Stop the user before they save instead.
Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog
``MCPFormDialog.tsx``) now:
- Disable the ``stdio`` option in the mode select when Box is
unavailable, with a small "(requires Box)" suffix so the reason is
obvious. Existing stdio configs still display their current value
- Show ``BoxUnavailableNotice`` inline under the mode select when the
currently-selected mode is stdio and Box is unavailable, so editing
a stale stdio config makes the cause visible
- Disable the Save / Submit button while stdio is selected under that
condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop
so the parent ``MCPDetailContent`` can disable both its Submit and
Save buttons. ``MCPFormDialog`` disables its Save button locally
- Refuse the submit handler too (Enter-key path) with a toast carrying
the same i18n message
i18n: ``mcp.boxRequired`` (short tag in the disabled option) and
``mcp.stdioBlockedByBoxToast`` added to all 8 locales.
Backend runtime gate (``_init_stdio_python_server`` refusal +
``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place
as the last line of defence for API bypass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(web): prevent plugin config form overflow
* refactor(skill): remove all local-filesystem fallbacks; Box is the sole source
Skills now flow exclusively through the Box runtime. Every read and write
method funnels through ``_box_service()``; when Box is unavailable
(disabled in config, connection failed, or simply not installed) the
operation either returns an empty surface (``list_skills`` → []) or
raises with a clear ``Box runtime ... not initialised / disabled /
unavailable: ...`` message via the new ``_require_box(action)`` helper.
Why: the legacy local-fallback path scanned ``data/skills/``, but Box
manages its own ``box.local.skills_root`` (default ``data/box/skills/``).
The two diverging directories caused stale / phantom skill lists when
Box flapped, and the local-fallback writes silently bypassed all the
sandboxing the operator had configured.
SkillService (``api/http/service/skill.py``):
- New ``_require_box(action)`` returns the box service or raises a
structured ValueError. ``_require_box_for_write`` kept as alias
- ``list_skills`` → returns [] when Box is down so the UI can render
the disabled banner cleanly
- ``get_skill`` / ``get_skill_by_name`` → return None
- All read-file / write-file / scan-dir / create / update / delete /
install / preview methods → ``_require_box`` then box delegate.
Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview
pipelines) removed entirely
SkillManager (``pkg/skill/manager.py``):
- ``reload_skills`` returns early with empty cache when Box is down.
data/skills/ discovery loop removed
- ``refresh_skill_from_disk`` now just reports cache presence; the
on-disk re-parse is gone since Box is the only writer
Tests:
- Drop 11 obsolete test_skill_service.py tests that exercised the
removed local-fallback paths (create/install/file/delete/update)
- Add list-empty + read-refused tests; flip the legacy-allow test to
legacy-refuses-too
- Rewrite refresh_skill_from_disk test to match the new behaviour
Several helper methods (_managed_skill_path, _resolve_skill_path,
_preview_skill_candidates, _install_preview_candidates, etc.) are now
unreachable; a follow-up commit will prune them so this diff stays
reviewable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(skill): prune dead local-filesystem helpers left over from Box migration
Follow-up to the Box-only refactor. The previous commit removed the
local-fallback BRANCHES from every public method; this one removes the
HELPERS those branches called, which are now unreachable.
SkillService (service/skill.py): 787 → 449 lines
Removed: scan_directory (sync), _read_skill_package, _write_skill_md,
_resolve_create_field, _managed_skill_path,
_managed_install_root_for_package, _normalize_package_root,
_resolve_skill_path, _find_skill_entry, _discover_skill_directories,
_safe_extract_zip, _extract_uploaded_skill_to_temp,
_download_github_skill_to_temp, _resolve_github_source_root,
_build_preview_target_dir, _preview_skill_candidates,
_select_preview_candidates, _install_preview_candidates,
_preview_source_root, _resolve_installed_skills, plus the
module-level _FRONTMATTER_FIELDS and _build_skill_md.
Kept (still needed by the surviving GitHub-import path):
_download_github_asset, _download_github_skill_directory_as_zip,
_find_github_skill_archive_entry, _copy_github_skill_directory_to_zip,
_is_github_skill_md_url, _parse_github_skill_md_url,
_resolve_github_skill_md_package_name, _validate_github_asset_url,
_uploaded_skill_target_stem, _validate_skill_name.
Imports dropped: shutil, tempfile, yaml, ....utils.paths.
SkillManager (skill/manager.py): 187 → 88 lines
Removed: get_managed_skills_root, _discover_skill_directories,
_find_skill_entry, _load_skill_file, _normalize_package_root.
Imports dropped: datetime, parse_frontmatter, paths.
Tests:
- test_skill_service.py: drop the 3 sync scan_directory tests +
skill_service fixture + _create_skill_file helper
- test_skill_tools.py: drop test_load_skill_file_success; rename
TestSkillManagerPackageLoading → TestSkillManagerCache
Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skill): re-inject skill index into local-agent system prompt
The contributor's original PR (#1917) appended an ``Available Skills``
index to the system prompt before the LLM saw the user message, so the
LLM could decide whether to activate a skill. ``7145447b`` removed the
text-marker activation flow and, together with it, the entire system
prompt injection — but the Tool Call replacement only put the available
skills inside the ``activate`` tool's description. In practice the LLM
ignores tool descriptions for selection and goes straight to native
tools, so user-visible skill activation silently broke.
Restore the injection, adapted for the Tool Call era:
- SkillManager regains ``get_skill_index(bound_skills)`` and
``build_skill_aware_prompt_addition(bound_skills)``. The addendum
carries only ``name (display_name): description`` for each
pipeline-visible skill plus one instruction line pointing at the
``activate`` tool. No SKILL.md contents — KV cache stays clean
- PreProcessor appends the addendum to the first system message (or
inserts a new one) of ``query.prompt.messages`` for the local-agent
runner. Handles plain-string and ContentElement[] bodies. Skips
cleanly when no skills are visible
- 3 new test_preproc cases: injection happens, bound-skills subset
honoured, empty addendum touches nothing. 280 passed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(box): downgrade get_status.available when backend probed unavailable
Until now ``BoxService.get_status`` returned ``available: true`` whenever
the runtime connector was healthy, even if the runtime itself reported
``backend: { available: false }`` (operator selected nsjail without the
binary, Docker daemon crashed mid-session, E2B credentials wrong, ...).
The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the
top-level flag and showed "connected" while every actual call to native
exec or skill management would fail.
The native-tool loader already polled ``status.backend.available``
independently and hid its tools correctly, but every other consumer
(dashboard banner, the disabled-state hint, the LLM-facing message)
disagreed with it.
Combine the two in the payload: ``available = self._available AND
status.backend.available``. When ``backend.available`` is false we now
also surface a ``connector_error`` that names the backend
("Configured sandbox backend \"nsjail\" is unavailable") so the dialog
shows the actionable reason instead of an empty error pane. The
detailed ``backend`` object is preserved unchanged for the dialog.
Internal ``box_service.available`` (used by ``skill_service`` writes,
``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally
NOT changed — it still tracks connector health only, so a backend blip
does not trigger spurious reconnect loops.
Tests:
- ``test_get_status_downgrades_available_when_backend_dead`` — exercise
the new branch (connector OK, backend.available=false → top-level
available=false, connector_error mentions the backend name)
- ``test_get_status_keeps_available_true_when_backend_ok`` — guard
against regressing the happy path
Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary):
``GET /api/v1/box/status`` now returns ``available: false`` with the
named connector_error, instead of the previous misleading
``available: true``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): surface the specific Box failure reason in unavailable banner
When Box is configured but the runtime reports its backend is dead
(e.g. ``box.backend = nsjail`` but the binary is missing, or Docker
daemon crashed), the backend now returns a structured
``connector_error`` like ``Configured sandbox backend "nsjail" is
unavailable``. The previous notice only said "Box sandbox is
unavailable" + a generic "enable Box" hint, hiding the actionable
detail.
- ``useBoxStatus``: derive ``reason`` from ``status.connector_error``.
Only exposed for the failed-state (``hint === 'boxUnavailable'``),
since the disabled-by-config message already carries its reason
- ``BoxUnavailableNotice``: insert the reason as a small monospaced
line between the state message and the action hint. The disabled
variant is unchanged (operator chose the state)
- Wire ``reason`` through every existing call site (Skills page +
detail, PipelineExtension, both MCP forms). Old unused ``context``
prop dropped
Net layout (3 lines, still compact):
⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ...
Configured sandbox backend "nsjail" is unavailable
This feature requires the Box runtime. Enable it in config ...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: reconcile master's unit tests with feat/sandbox refactors
The merge from master brought in new unit tests that target pre-refactor
APIs on feat/sandbox. Reconcile each:
- factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills
dict + inert prompt-addition builder) and a Mock pipeline_service so the
PreProcessor skill-index injection branch can run end-to-end in tests.
- pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so
pipeline.stage is fully initialised before any individual stage test
(preproc, longtext, ...) tries to lazy-load it. Without this preload,
running test_preproc.py in isolation hit a circular-import error via the
stage -> app -> pipelinemgr -> stage chain.
- provider/test_tool_manager.py: ToolManager now probes four loaders
(native -> plugin -> mcp -> skill). Inject inert native + skill mocks in
the execute_func_call fixture and assert all four shutdowns fire.
- utils/test_paths.py: drop the three cwd-dependent _check_if_source_install
cases. The refactor walks Path(__file__).resolve().parents looking for
pyproject.toml + main.py, so cwd no longer factors in and there's no
file read to mock-fail. The positive case and caching test still apply.
- utils/test_version.py: delete entirely. is_newer and compare_version_str
were removed when VersionManager was refactored to use the Space API for
release checks (1b4107a9); the tests targeted a surface that no longer
exists.
* refactor(box): launch box runtime via the lbp CLI subcommand
Mirror the plugin runtime: box is now started through the same CLI entry
point (langbot_plugin.cli) instead of the box module directly.
- docker-compose.yaml: langbot_box command runs `langbot_plugin.cli ... box`
(WebSocket is the default transport, no flag needed — matches `rt`).
- box/connector.py: both subprocess launch sites (_start_local_stdio and
the Windows _start_subprocess_then_ws path) invoke
`langbot_plugin.cli.__init__ box`, using `-s` for the stdio transport.
- docs/review: update stale `-m langbot_plugin.box[.server]` references.
Pairs with the SDK change that removes box's direct-launch entry points
(python -m langbot_plugin.box / .box.server) and the legacy --mode flag.
* chore: bump langbot-plugin beta 1
* fix(ci): resolve langbot-plugin from PyPI and clear lint failures
CI on feat/sandbox failed across Unit Tests, Lint and Build Dev Image.
Root causes and fixes:
- pyproject.toml had a [tool.uv.sources] editable override pinning
langbot-plugin to ../langbot-plugin-sdk. That path only exists in a
paired local checkout, so `uv sync` failed on every CI runner
("Distribution not found"). Remove the override and regenerate uv.lock
so langbot-plugin==0.4.0b1 resolves from PyPI, matching master.
- tests/integration/api/test_pipelines.py: the pipeline extensions
endpoint now calls ap.skill_service.list_skills(); add the missing
skill_service mock to the fake_pipeline_app fixture (the test came
from master, the endpoint change from feat/sandbox).
- Apply ruff format to three src files and prettier to three web files
that had committed formatting drift, failing `ruff format --check`
and `pnpm lint`.
* chore: bump beta version
* docs: remove BOX_BACKEND override reference
* fix(pipelines): stop attributing dashboard debug WS to bound web_page_bot
The dashboard pipeline-debug WebSocket
(/api/v1/pipelines/<uuid>/ws/connect) and the embed widget WebSocket
(/api/v1/embed/<bot_uuid>/ws/connect) already live on separate paths,
but the debug handler ran `_find_owner_bot(pipeline_uuid)` and, when
the same pipeline happened to be bound to a web_page_bot, passed that
bot as `owner_bot` into `handle_websocket_message`. The adapter then
used the page bot's listeners + adapter for the request, so debug
sessions were logged as "page bot" activity in the dashboard.
Debug sessions must always run under the built-in websocket_proxy_bot.
Remove `_find_owner_bot`, drop the `owner_bot` parameter from the
debug-path `_handle_receive`, and call `handle_websocket_message`
without it so the adapter takes its default proxy-bot branch. The
embed handler still resolves and passes its `runtime_bot` for the
page-bot path, so attribution there is unchanged.
* fix(plugin): install marketplace MCP from canonical mode + extra_args
_install_mcp_from_marketplace read the dropped `mcp_data.config` field
and reconstructed mode/extra_args by guessing from the URL — which lost
stdio's command/args/env/box entirely, so stdio MCP installs from the
marketplace always failed.
Use the Space record's canonical `mode` and `extra_args` directly (the
same shape stored in mcp_servers), and gate the install on `mode`
instead of the removed `config`. After a successful install, best-effort
POST to the marketplace install endpoint to bump install_count.
* feat(web): show recommendation lists in plugin market; mixed-type icons
The marketplace recommendation lists (curated rows from Space) were never
mounted in the plugin market page. Wire them in:
- fetch recommendation lists on mount and render them above the extension
grid, only when no search/filter is active.
Recommendation lists now mix plugins, MCPs and skills, so resolve each
card's icon by type (plugin / mcp / skill marketplace icon URL) instead of
always using the plugin icon endpoint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(web): auto-open install dialog from one-click deep link
Accept a deep link from LangBot Space's one-click install:
/home/add-extension?install=1&extension_type=<plugin|mcp|skill>&author=&name=&version=
On mount, populate the install info, open the confirm dialog directly, and
strip the params from the URL. Reuses the existing marketplace install flow.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat: push marketplace URL to runtime; fix market client base race
- On connecting to the plugin runtime, push the configured space.url via the
new SET_RUNTIME_CONFIG action so the runtime downloads plugins from the same
Space, instead of relying on its own CLOUD_SERVICE_URL env/default. Wrapped
in try/except so an older SDK without the action degrades gracefully.
- web: the plugin market fetched recommendation lists (and listings) via the
sync cloud client before its baseURL was resolved from system info, so it
hit the default space.langbot.app. Await getCloudServiceClient() before the
initial fetches and for the recommendation list.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(web): don't show MCP "connection failed" while still connecting
The MCP status UI rendered "连接失败" for any non-connected state, so during a
normal connection attempt the subtitle showed "连接失败" while the status pill
below it showed "连接中..." — contradictory.
Only treat an explicit ERROR (or box-unavailable) status as failed; a
CONNECTING or initial/unresolved status now shows "连接中". Applied to the MCP
detail form (subtitle + StatusDisplay) and the MCP server card.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(web): type-aware install dialog + refresh sidebar after install
The marketplace install confirm dialog was hardcoded to "安装插件 / 确定要安装
插件 X 吗" for every type. Make it type-aware (plugin / MCP / skill) and show
more info: type chip, author/name id, and version when present.
Also refresh all sidebar extension lists (plugins, MCP servers, skills) when
an install task completes, so the newly-installed extension appears
immediately regardless of type (previously only refreshPlugins ran).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(web): richer install dialog (icon + name + description), drop redundant type row
The install dialog already states the type in its title, so the "类型" row was
redundant. Replace the info box with the extension's icon (avatar), display
name, author/name id + version, and description — built from the PluginV4 for
in-app installs and from the icon endpoint by type for the one-click deep link.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(web): TDZ crash in add-extension (installIconURL before installInfo)
installIconURL was computed above the useState declaration of installInfo,
causing "Cannot access 'installInfo' before initialization" (500) on the
add-extension page. Move the computation below the state declarations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(web): redesign install-progress dialog for MCP/skill
The progress dialog showed plugin-only stages (download + dependency install)
for every type. MCP/skill have no such steps, so show a single
"installing → done/failed" row for them (MCP: adding & connecting the server;
skill: installing the package) while keeping the detailed download/deps
stages for plugins.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(web): add missing market.componentName i18n keys
The marketplace component filter (and component badges) used
market.componentName.{Tool,Command,EventListener,KnowledgeEngine,Parser,Page}
but those keys only existed under plugins.componentName, so the market UI
showed raw keys. Add a componentName block to the market namespace (zh-Hans +
en-US; other locales fall back to zh-Hans).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(web): sidebar extensions refresh button + full-name tooltip
- Add a refresh button to the installed-extensions category header in the
sidebar; it re-fetches plugins + MCP servers + skills and spins while
loading.
- The sidebar item tooltip now shows the extension's full name (with the
description below when present), so truncated MCP/extension names are
readable on hover.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(plugin-market): rename component filter to "插件组件" with hint tooltip + persist filters
- Rename the in-app plugin market component filter label to "插件组件" /
"Plugin Component"
- Add an Info icon tooltip explaining what plugin components are (Tool /
Command / EventListener, etc.)
- Persist filter selections (type / component / tags / sort) in localStorage
so they survive reloads; restored on mount (URL type param still wins)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(plugin-market): restore missing "页面"(Page) component filter option
The market component-filter list on this branch was a diverged rewrite that
dropped the Page component kind master had added. The i18n key
(market.componentName.Page) already existed; re-add the Page entry to the
componentOptions list so plugins providing Page components can be filtered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(i18n): reword plugin component filter hint
Drop the redundant "插件组件是" lead-in and mention that components extend
LangBot's capabilities; mirror the wording in en-US.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(i18n): backfill missing market/addExtension keys in 6 locales
check-i18n surfaced that market.componentName.*, market.filterByComponentHint
and the addExtension.install* keys existed only in en-US/zh-Hans. Backfill
them for es-ES, ja-JP, ru-RU, th-TH, vi-VN and zh-Hant (reusing each locale's
existing component-name translations) and align the filterByComponent label
with the new "Plugin Component" wording. check-i18n now passes for all locales.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* i18n(plugins): relabel "group by type" as "group by format"
The installed-extensions grouping is by extension format (plugin / MCP / skill),
so rename the toggle label accordingly across all 8 locales (key unchanged).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(plugin-market): cursor-pointer on tag filter trigger
The TagsFilter Select trigger used the default cursor; add cursor-pointer so the
tag filter is clearly clickable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(sidebar): show edition badge (Community / Cloud) in logo area
Add a small badge next to the LangBot name in the sidebar header that reflects
systemInfo.edition: a neutral "Community" badge for the community edition and a
blue "Cloud" badge for the cloud edition. Adds sidebar.editionCommunity /
sidebar.editionCloud across all 8 locales.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* i18n(sidebar): unify zh-Hans cloud edition label to 云端版
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(sidebar): edition badge - drop hover, use "Cloud" in all locales
The edition badge is not interactive, so remove the hover background on the
cloud badge. Also use the literal "Cloud" label uniformly across all locales
instead of localized variants (云端版/クラウド版/...).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(box): cap tool-call loop and run workspace-quota walk off the event loop
Two robustness fixes that bite under normal sandbox usage (not just attack),
hardening the self-hosted community edition before release:
- localagent: cap the tool-call loop at MAX_TOOL_CALL_ROUNDS (128). A looping
or adversarial model could otherwise emit tool calls indefinitely (each
potentially a sandbox exec), producing a non-terminating request and runaway
cost. The cap is generous enough not to interrupt legitimate multi-step
agentic workflows.
- box.service: make _enforce_workspace_quota async and run the recursive
workspace scan via asyncio.to_thread. It ran on every quota-enforced exec and
a large workspace would block the whole asyncio runtime (all bots/pipelines).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(review): refresh box docs; trim issue list to SaaS blockers only
Community self-hosted edition is release-ready, so the box review docs are
updated to current state (date 2026-06-02 + status note) and box-issues.md is
rewritten to keep only the SaaS / multi-tenant / network-exposed release
blockers (S1-S8): unauthenticated control plane, no per-pipeline exec
authorization, unbounded sessions + no reaper, no kernel-level quota, mount
validation gaps (/ + extra_mounts), missing container hardening, lock-around-
cold-start, and the lower-severity follow-ups. Resolved items (tool-call loop
cap, async quota scan, host_path mount allowlist, _is_path_under dedup) moved to
a short "resolved before community release" record; community-only and
pure-cleanup items dropped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore(deps): pin langbot-plugin to 0.4.0
Track the stable SDK release (0.4.0b1 -> 0.4.0); regenerate uv.lock.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: WangCham <651122857@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: fdc310 <82008029+fdc310@users.noreply.github.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
This commit is contained in:
@@ -15,7 +15,7 @@ class FakeApp:
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
command_prefix: list[str] = ["/", "!"],
|
||||
command_prefix: list[str] = ['/', '!'],
|
||||
command_enable: bool = True,
|
||||
pipeline_concurrency: int = 10,
|
||||
admins: list[str] | None = None,
|
||||
@@ -40,6 +40,8 @@ class FakeApp:
|
||||
self.telemetry = self._create_mock_telemetry()
|
||||
self.survey = None
|
||||
self.cmd_mgr = self._create_mock_cmd_mgr()
|
||||
self.skill_mgr = self._create_mock_skill_mgr()
|
||||
self.pipeline_service = self._create_mock_pipeline_service()
|
||||
|
||||
# Apply any extra attributes for specific test scenarios
|
||||
for name, value in extra_attrs.items():
|
||||
@@ -98,9 +100,9 @@ class FakeApp:
|
||||
):
|
||||
instance_config = Mock()
|
||||
instance_config.data = {
|
||||
"command": {"prefix": command_prefix, "enable": command_enable},
|
||||
"concurrency": {"pipeline": pipeline_concurrency},
|
||||
"admins": admins,
|
||||
'command': {'prefix': command_prefix, 'enable': command_enable},
|
||||
'concurrency': {'pipeline': pipeline_concurrency},
|
||||
'admins': admins,
|
||||
}
|
||||
return instance_config
|
||||
|
||||
@@ -119,6 +121,20 @@ class FakeApp:
|
||||
cmd_mgr.execute = AsyncMock()
|
||||
return cmd_mgr
|
||||
|
||||
def _create_mock_skill_mgr(self):
|
||||
"""Mock SkillManager that returns no skill index addition by default."""
|
||||
skill_mgr = Mock()
|
||||
skill_mgr.skills = {}
|
||||
skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
|
||||
skill_mgr.get_skill_index = Mock(return_value=[])
|
||||
return skill_mgr
|
||||
|
||||
def _create_mock_pipeline_service(self):
|
||||
"""Mock PipelineService.get_pipeline returning empty extensions prefs."""
|
||||
pipeline_service = AsyncMock()
|
||||
pipeline_service.get_pipeline = AsyncMock(return_value={'extensions_preferences': {}})
|
||||
return pipeline_service
|
||||
|
||||
def capture_message(self, message):
|
||||
"""Capture an outbound message for test assertions."""
|
||||
self._outbound_messages.append(message)
|
||||
@@ -134,4 +150,4 @@ class FakeApp:
|
||||
|
||||
def fake_app(**kwargs) -> FakeApp:
|
||||
"""Create a FakeApp instance with optional overrides."""
|
||||
return FakeApp(**kwargs)
|
||||
return FakeApp(**kwargs)
|
||||
|
||||
@@ -20,6 +20,7 @@ pytestmark = pytest.mark.integration
|
||||
|
||||
# ============== FIXTURE FOR SYS.MODULES ISOLATION ==============
|
||||
|
||||
|
||||
@pytest.fixture(scope='module')
|
||||
def mock_circular_import_chain():
|
||||
"""Break circular import chain for API controller."""
|
||||
@@ -53,21 +54,25 @@ def mock_circular_import_chain():
|
||||
):
|
||||
# Import groups after mocking to populate preregistered_groups
|
||||
import langbot.pkg.api.http.controller.groups.pipelines.pipelines as _pipelines # noqa: E402, F401
|
||||
|
||||
yield
|
||||
|
||||
|
||||
# ============== FAKE APPLICATION WITH PIPELINE SERVICES ==============
|
||||
|
||||
|
||||
@pytest.fixture(scope='module')
|
||||
def fake_pipeline_app():
|
||||
"""Create FakeApp with pipeline-specific services (module scope for reuse)."""
|
||||
app = FakeApp()
|
||||
|
||||
# Pipeline config
|
||||
app.instance_config.data.update({
|
||||
'api': {'port': 5300},
|
||||
'system': {'allow_modify_login_info': True, 'limitation': {}},
|
||||
})
|
||||
app.instance_config.data.update(
|
||||
{
|
||||
'api': {'port': 5300},
|
||||
'system': {'allow_modify_login_info': True, 'limitation': {}},
|
||||
}
|
||||
)
|
||||
|
||||
# Auth services
|
||||
app.user_service = Mock()
|
||||
@@ -79,25 +84,31 @@ def fake_pipeline_app():
|
||||
|
||||
# Pipeline service
|
||||
app.pipeline_service = Mock()
|
||||
app.pipeline_service.get_pipeline_metadata = AsyncMock(return_value=[
|
||||
{'name': 'trigger', 'stages': []},
|
||||
{'name': 'ai', 'stages': []},
|
||||
])
|
||||
app.pipeline_service.get_pipelines = AsyncMock(return_value=[
|
||||
{
|
||||
app.pipeline_service.get_pipeline_metadata = AsyncMock(
|
||||
return_value=[
|
||||
{'name': 'trigger', 'stages': []},
|
||||
{'name': 'ai', 'stages': []},
|
||||
]
|
||||
)
|
||||
app.pipeline_service.get_pipelines = AsyncMock(
|
||||
return_value=[
|
||||
{
|
||||
'uuid': 'test-pipeline-uuid',
|
||||
'name': 'Test Pipeline',
|
||||
'description': 'Test description',
|
||||
'created_at': '2024-01-01T00:00:00',
|
||||
'updated_at': '2024-01-01T00:00:00',
|
||||
'is_default': False,
|
||||
}
|
||||
]
|
||||
)
|
||||
app.pipeline_service.get_pipeline = AsyncMock(
|
||||
return_value={
|
||||
'uuid': 'test-pipeline-uuid',
|
||||
'name': 'Test Pipeline',
|
||||
'description': 'Test description',
|
||||
'created_at': '2024-01-01T00:00:00',
|
||||
'updated_at': '2024-01-01T00:00:00',
|
||||
'is_default': False,
|
||||
'config': {},
|
||||
}
|
||||
])
|
||||
app.pipeline_service.get_pipeline = AsyncMock(return_value={
|
||||
'uuid': 'test-pipeline-uuid',
|
||||
'name': 'Test Pipeline',
|
||||
'config': {},
|
||||
})
|
||||
)
|
||||
app.pipeline_service.create_pipeline = AsyncMock(return_value={'uuid': 'new-pipeline-uuid'})
|
||||
app.pipeline_service.update_pipeline = AsyncMock(return_value={})
|
||||
app.pipeline_service.delete_pipeline = AsyncMock()
|
||||
@@ -112,6 +123,10 @@ def fake_pipeline_app():
|
||||
app.mcp_service = Mock()
|
||||
app.mcp_service.get_mcp_servers = AsyncMock(return_value=[])
|
||||
|
||||
# Skill service (for extensions endpoint)
|
||||
app.skill_service = Mock()
|
||||
app.skill_service.list_skills = AsyncMock(return_value=[])
|
||||
|
||||
# Plugin connector (for extensions endpoint)
|
||||
app.plugin_connector.list_plugins = AsyncMock(return_value=[])
|
||||
|
||||
@@ -130,6 +145,7 @@ async def quart_test_client(fake_pipeline_app, http_controller_cls):
|
||||
|
||||
# ============== PIPELINE ENDPOINT TESTS ==============
|
||||
|
||||
|
||||
@pytest.mark.usefixtures('mock_circular_import_chain')
|
||||
class TestPipelineMetadataEndpoint:
|
||||
"""Tests for /api/v1/pipelines/_/metadata endpoint."""
|
||||
@@ -138,8 +154,7 @@ class TestPipelineMetadataEndpoint:
|
||||
async def test_get_pipeline_metadata_success(self, quart_test_client):
|
||||
"""GET /api/v1/pipelines/_/metadata returns metadata list."""
|
||||
response = await quart_test_client.get(
|
||||
'/api/v1/pipelines/_/metadata',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
'/api/v1/pipelines/_/metadata', headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -162,10 +177,7 @@ class TestPipelinesListEndpoint:
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_pipelines_success(self, quart_test_client):
|
||||
"""GET /api/v1/pipelines returns pipeline list."""
|
||||
response = await quart_test_client.get(
|
||||
'/api/v1/pipelines',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
response = await quart_test_client.get('/api/v1/pipelines', headers={'Authorization': 'Bearer test_token'})
|
||||
|
||||
assert response.status_code == 200
|
||||
data = await response.get_json()
|
||||
@@ -176,8 +188,7 @@ class TestPipelinesListEndpoint:
|
||||
async def test_get_pipelines_with_sort_param(self, quart_test_client):
|
||||
"""GET pipelines with sort parameter."""
|
||||
response = await quart_test_client.get(
|
||||
'/api/v1/pipelines?sort_by=created_at&sort_order=DESC',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
'/api/v1/pipelines?sort_by=created_at&sort_order=DESC', headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -193,8 +204,7 @@ class TestPipelinesCRUDEndpoints:
|
||||
async def test_get_single_pipeline_success(self, quart_test_client):
|
||||
"""GET /api/v1/pipelines/{uuid} returns pipeline."""
|
||||
response = await quart_test_client.get(
|
||||
'/api/v1/pipelines/test-pipeline-uuid',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
'/api/v1/pipelines/test-pipeline-uuid', headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -208,7 +218,7 @@ class TestPipelinesCRUDEndpoints:
|
||||
response = await quart_test_client.post(
|
||||
'/api/v1/pipelines',
|
||||
headers={'Authorization': 'Bearer test_token'},
|
||||
json={'name': 'New Pipeline', 'config': {}}
|
||||
json={'name': 'New Pipeline', 'config': {}},
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -222,7 +232,7 @@ class TestPipelinesCRUDEndpoints:
|
||||
response = await quart_test_client.put(
|
||||
'/api/v1/pipelines/test-pipeline-uuid',
|
||||
headers={'Authorization': 'Bearer test_token'},
|
||||
json={'name': 'Updated Pipeline'}
|
||||
json={'name': 'Updated Pipeline'},
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -233,8 +243,7 @@ class TestPipelinesCRUDEndpoints:
|
||||
async def test_delete_pipeline_success(self, quart_test_client):
|
||||
"""DELETE /api/v1/pipelines/{uuid} deletes pipeline."""
|
||||
response = await quart_test_client.delete(
|
||||
'/api/v1/pipelines/test-pipeline-uuid',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
'/api/v1/pipelines/test-pipeline-uuid', headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -245,8 +254,7 @@ class TestPipelinesCRUDEndpoints:
|
||||
async def test_copy_pipeline_success(self, quart_test_client):
|
||||
"""POST /api/v1/pipelines/{uuid}/copy copies pipeline."""
|
||||
response = await quart_test_client.post(
|
||||
'/api/v1/pipelines/test-pipeline-uuid/copy',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
'/api/v1/pipelines/test-pipeline-uuid/copy', headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
@@ -263,8 +271,7 @@ class TestPipelineExtensionsEndpoint:
|
||||
async def test_get_extensions(self, quart_test_client):
|
||||
"""GET /api/v1/pipelines/{uuid}/extensions."""
|
||||
response = await quart_test_client.get(
|
||||
'/api/v1/pipelines/test-pipeline-uuid/extensions',
|
||||
headers={'Authorization': 'Bearer test_token'}
|
||||
'/api/v1/pipelines/test-pipeline-uuid/extensions', headers={'Authorization': 'Bearer test_token'}
|
||||
)
|
||||
|
||||
# Should return 200 if pipeline found
|
||||
|
||||
0
tests/integration_tests/__init__.py
Normal file
0
tests/integration_tests/__init__.py
Normal file
0
tests/integration_tests/box/__init__.py
Normal file
0
tests/integration_tests/box/__init__.py
Normal file
329
tests/integration_tests/box/test_box_integration.py
Normal file
329
tests/integration_tests/box/test_box_integration.py
Normal file
@@ -0,0 +1,329 @@
|
||||
"""Integration tests for LangBot Box.
|
||||
|
||||
These tests verify the end-to-end behavior of the Box sandbox execution
|
||||
system. Tests decorated with ``requires_container`` need a real container
|
||||
runtime (Podman or Docker) and are skipped otherwise.
|
||||
|
||||
CI only runs ``tests/unit_tests/``, so these tests never execute in the
|
||||
CI pipeline. Run them locally with::
|
||||
|
||||
pytest tests/integration_tests/ -v
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import shutil
|
||||
import socket
|
||||
import subprocess
|
||||
from types import SimpleNamespace
|
||||
|
||||
import pytest
|
||||
|
||||
from langbot.pkg.box.service import BoxService
|
||||
from langbot_plugin.box.backend import BaseSandboxBackend
|
||||
from langbot_plugin.box.client import ActionRPCBoxClient
|
||||
from langbot_plugin.box.errors import BoxBackendUnavailableError
|
||||
from langbot_plugin.box.models import BoxExecutionStatus, BoxNetworkMode, BoxSpec
|
||||
from langbot_plugin.box.runtime import BoxRuntime
|
||||
from langbot_plugin.box.server import BoxServerHandler
|
||||
|
||||
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
|
||||
|
||||
_logger = logging.getLogger('test.box.integration')
|
||||
|
||||
# Default image for integration tests — small and fast to pull.
|
||||
_TEST_IMAGE = 'alpine:latest'
|
||||
|
||||
|
||||
# ── Skip helpers ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _has_container_runtime() -> bool:
|
||||
for cmd in ('podman', 'docker'):
|
||||
if shutil.which(cmd) is None:
|
||||
continue
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[cmd, 'info'],
|
||||
capture_output=True,
|
||||
timeout=10,
|
||||
)
|
||||
if result.returncode == 0:
|
||||
return True
|
||||
except Exception:
|
||||
continue
|
||||
return False
|
||||
|
||||
|
||||
def _can_open_test_socket() -> bool:
|
||||
try:
|
||||
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||
except OSError:
|
||||
return False
|
||||
sock.close()
|
||||
return True
|
||||
|
||||
|
||||
requires_container = pytest.mark.skipif(
|
||||
not _has_container_runtime(),
|
||||
reason='no container runtime (podman/docker) available',
|
||||
)
|
||||
|
||||
requires_socket = pytest.mark.skipif(
|
||||
not _can_open_test_socket(),
|
||||
reason='local test environment does not permit opening TCP sockets',
|
||||
)
|
||||
|
||||
|
||||
# ── Helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class _QueueConnection:
|
||||
"""In-process Connection backed by asyncio Queues — no real IO."""
|
||||
|
||||
def __init__(self, rx: asyncio.Queue[str], tx: asyncio.Queue[str]):
|
||||
self._rx = rx
|
||||
self._tx = tx
|
||||
|
||||
async def send(self, message: str) -> None:
|
||||
await self._tx.put(message)
|
||||
|
||||
async def receive(self) -> str:
|
||||
return await self._rx.get()
|
||||
|
||||
async def close(self) -> None:
|
||||
pass
|
||||
|
||||
|
||||
async def _make_rpc_pair(runtime: BoxRuntime):
|
||||
"""Create an in-process (ActionRPCBoxClient, server_task, client_task) connected via queues."""
|
||||
from langbot_plugin.runtime.io.handler import Handler
|
||||
|
||||
c2s: asyncio.Queue[str] = asyncio.Queue()
|
||||
s2c: asyncio.Queue[str] = asyncio.Queue()
|
||||
client_conn = _QueueConnection(rx=s2c, tx=c2s)
|
||||
server_conn = _QueueConnection(rx=c2s, tx=s2c)
|
||||
|
||||
server_handler = BoxServerHandler(server_conn, runtime)
|
||||
server_task = asyncio.create_task(server_handler.run())
|
||||
|
||||
client_handler = Handler.__new__(Handler)
|
||||
Handler.__init__(client_handler, client_conn)
|
||||
client_task = asyncio.create_task(client_handler.run())
|
||||
|
||||
client = ActionRPCBoxClient(logger=_logger)
|
||||
client.set_handler(client_handler)
|
||||
|
||||
return client, server_task, client_task
|
||||
|
||||
|
||||
# ── Fixtures ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def box_client():
|
||||
"""Yield an ActionRPCBoxClient backed by a real BoxRuntime via in-process RPC."""
|
||||
runtime = BoxRuntime(logger=_logger)
|
||||
await runtime.initialize()
|
||||
client, server_task, client_task = await _make_rpc_pair(runtime)
|
||||
yield client
|
||||
server_task.cancel()
|
||||
client_task.cancel()
|
||||
await runtime.shutdown()
|
||||
|
||||
|
||||
# ── 1. Simple command execution ───────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_exec_simple_command(box_client: ActionRPCBoxClient):
|
||||
"""Box starts a simple command and returns stdout."""
|
||||
spec = BoxSpec(
|
||||
cmd='echo hello-box',
|
||||
session_id='int-simple',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
result = await box_client.execute(spec)
|
||||
|
||||
assert result.status == BoxExecutionStatus.COMPLETED
|
||||
assert result.exit_code == 0
|
||||
assert 'hello-box' in result.stdout
|
||||
|
||||
|
||||
# ── 2. Session file persistence ───────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_session_persists_files(box_client: ActionRPCBoxClient):
|
||||
"""Write a file in one exec, read it back in a second exec on the same session."""
|
||||
sid = 'int-persist'
|
||||
|
||||
write_result = await box_client.execute(
|
||||
BoxSpec(
|
||||
cmd='echo "hello from file" > /tmp/testfile.txt',
|
||||
session_id=sid,
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
)
|
||||
assert write_result.exit_code == 0
|
||||
|
||||
read_result = await box_client.execute(
|
||||
BoxSpec(
|
||||
cmd='cat /tmp/testfile.txt',
|
||||
session_id=sid,
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
)
|
||||
assert read_result.exit_code == 0
|
||||
assert 'hello from file' in read_result.stdout
|
||||
|
||||
|
||||
# ── 3. Timeout handling ───────────────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_timeout_kills_command(box_client: ActionRPCBoxClient):
|
||||
"""A long-running command is killed after timeout_sec."""
|
||||
session_id = 'int-timeout'
|
||||
spec = BoxSpec(
|
||||
cmd='sleep 120',
|
||||
session_id=session_id,
|
||||
workdir='/tmp',
|
||||
timeout_sec=3,
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
result = await box_client.execute(spec)
|
||||
|
||||
assert result.status == BoxExecutionStatus.TIMED_OUT
|
||||
assert result.exit_code is None
|
||||
|
||||
sessions = await box_client.get_sessions()
|
||||
assert all(session['session_id'] != session_id for session in sessions)
|
||||
|
||||
|
||||
# ── 4. Network isolation ─────────────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_offline_cannot_reach_network(box_client: ActionRPCBoxClient):
|
||||
"""With network=OFF the sandbox cannot reach the internet."""
|
||||
spec = BoxSpec(
|
||||
cmd='wget -q -O /dev/null --timeout=3 http://1.1.1.1 2>&1; exit $?',
|
||||
session_id='int-offline',
|
||||
workdir='/tmp',
|
||||
network=BoxNetworkMode.OFF,
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
result = await box_client.execute(spec)
|
||||
|
||||
assert result.exit_code != 0
|
||||
|
||||
|
||||
# ── 5. Backend unavailable ───────────────────────────────────────────
|
||||
|
||||
|
||||
class _UnavailableBackend(BaseSandboxBackend):
|
||||
"""A backend that always reports itself as unavailable."""
|
||||
|
||||
name = 'unavailable'
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(logging.getLogger('test'))
|
||||
|
||||
async def is_available(self) -> bool:
|
||||
return False
|
||||
|
||||
async def start_session(self, spec):
|
||||
raise NotImplementedError
|
||||
|
||||
async def exec(self, session, spec):
|
||||
raise NotImplementedError
|
||||
|
||||
async def stop_session(self, session):
|
||||
pass
|
||||
|
||||
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_backend_unavailable_returns_error():
|
||||
"""When no backend is available the full RPC path returns BoxBackendUnavailableError."""
|
||||
runtime = BoxRuntime(logger=_logger, backends=[_UnavailableBackend()])
|
||||
await runtime.initialize()
|
||||
client, server_task, client_task = await _make_rpc_pair(runtime)
|
||||
try:
|
||||
spec = BoxSpec(
|
||||
cmd='echo hello',
|
||||
session_id='int-no-backend',
|
||||
workdir='/tmp',
|
||||
)
|
||||
with pytest.raises(BoxBackendUnavailableError):
|
||||
await client.execute(spec)
|
||||
finally:
|
||||
server_task.cancel()
|
||||
client_task.cancel()
|
||||
await runtime.shutdown()
|
||||
|
||||
|
||||
# ── 6. Full service-to-runtime path ──────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_full_service_to_remote_runtime(tmp_path):
|
||||
"""BoxService -> ActionRPCBoxClient -> RPC -> BoxRuntime -> real backend."""
|
||||
runtime = BoxRuntime(logger=_logger)
|
||||
await runtime.initialize()
|
||||
client, server_task, client_task = await _make_rpc_pair(runtime)
|
||||
try:
|
||||
host_dir = tmp_path / 'workspace'
|
||||
host_dir.mkdir()
|
||||
|
||||
mock_ap = SimpleNamespace(
|
||||
logger=_logger,
|
||||
instance_config=SimpleNamespace(
|
||||
data={
|
||||
'box': {
|
||||
'backend': 'local',
|
||||
'runtime': {'endpoint': ''},
|
||||
'local': {
|
||||
'profile': 'default',
|
||||
'allowed_mount_roots': [str(tmp_path)],
|
||||
'default_workspace': str(host_dir),
|
||||
},
|
||||
'e2b': {'api_key': '', 'api_url': '', 'template': ''},
|
||||
}
|
||||
}
|
||||
),
|
||||
)
|
||||
|
||||
service = BoxService(mock_ap, client=client)
|
||||
await service.initialize()
|
||||
|
||||
query = pipeline_query.Query.model_construct(query_id=42)
|
||||
result = await service.execute_tool(
|
||||
{'command': 'echo service-path'},
|
||||
query,
|
||||
)
|
||||
|
||||
assert result['ok'] is True
|
||||
assert result['status'] == 'completed'
|
||||
assert 'service-path' in result['stdout']
|
||||
assert result['session_id'] == 'query_42'
|
||||
finally:
|
||||
server_task.cancel()
|
||||
client_task.cancel()
|
||||
await runtime.shutdown()
|
||||
368
tests/integration_tests/box/test_box_mcp_integration.py
Normal file
368
tests/integration_tests/box/test_box_mcp_integration.py
Normal file
@@ -0,0 +1,368 @@
|
||||
"""Integration tests for Box MCP-related features.
|
||||
|
||||
These tests verify managed process lifecycle, WebSocket stdio attach,
|
||||
session cleanup, and the single-session query API using a real container
|
||||
runtime.
|
||||
|
||||
CI only runs ``tests/unit_tests/``, so these tests never execute in the
|
||||
CI pipeline. Run them locally with::
|
||||
|
||||
pytest tests/integration_tests/box/test_box_mcp_integration.py -v
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import shutil
|
||||
import socket
|
||||
import subprocess
|
||||
|
||||
import aiohttp
|
||||
import pytest
|
||||
from aiohttp.test_utils import TestServer
|
||||
|
||||
from langbot_plugin.box.client import ActionRPCBoxClient
|
||||
from langbot_plugin.box.errors import BoxManagedProcessNotFoundError, BoxSessionNotFoundError
|
||||
from langbot_plugin.box.models import BoxManagedProcessSpec, BoxManagedProcessStatus, BoxSpec
|
||||
from langbot_plugin.box.runtime import BoxRuntime
|
||||
from langbot_plugin.box.server import BoxServerHandler, create_ws_relay_app
|
||||
|
||||
_logger = logging.getLogger('test.box.mcp_integration')
|
||||
|
||||
_TEST_IMAGE = 'alpine:latest'
|
||||
|
||||
|
||||
# ── Skip helpers ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _has_container_runtime() -> bool:
|
||||
for cmd in ('podman', 'docker'):
|
||||
if shutil.which(cmd) is None:
|
||||
continue
|
||||
try:
|
||||
result = subprocess.run([cmd, 'info'], capture_output=True, timeout=10)
|
||||
if result.returncode == 0:
|
||||
return True
|
||||
except Exception:
|
||||
continue
|
||||
return False
|
||||
|
||||
|
||||
def _can_open_test_socket() -> bool:
|
||||
try:
|
||||
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
|
||||
except OSError:
|
||||
return False
|
||||
sock.close()
|
||||
return True
|
||||
|
||||
|
||||
requires_container = pytest.mark.skipif(
|
||||
not _has_container_runtime(),
|
||||
reason='no container runtime (podman/docker) available',
|
||||
)
|
||||
|
||||
requires_socket = pytest.mark.skipif(
|
||||
not _can_open_test_socket(),
|
||||
reason='local test environment does not permit opening TCP sockets',
|
||||
)
|
||||
|
||||
|
||||
# ── Helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class _QueueConnection:
|
||||
"""In-process Connection backed by asyncio Queues — no real IO."""
|
||||
|
||||
def __init__(self, rx: asyncio.Queue[str], tx: asyncio.Queue[str]):
|
||||
self._rx = rx
|
||||
self._tx = tx
|
||||
|
||||
async def send(self, message: str) -> None:
|
||||
await self._tx.put(message)
|
||||
|
||||
async def receive(self) -> str:
|
||||
return await self._rx.get()
|
||||
|
||||
async def close(self) -> None:
|
||||
pass
|
||||
|
||||
|
||||
async def _make_rpc_pair(runtime: BoxRuntime):
|
||||
"""Create an in-process RPC pair connected via queues."""
|
||||
from langbot_plugin.runtime.io.handler import Handler
|
||||
|
||||
c2s: asyncio.Queue[str] = asyncio.Queue()
|
||||
s2c: asyncio.Queue[str] = asyncio.Queue()
|
||||
client_conn = _QueueConnection(rx=s2c, tx=c2s)
|
||||
server_conn = _QueueConnection(rx=c2s, tx=s2c)
|
||||
|
||||
server_handler = BoxServerHandler(server_conn, runtime)
|
||||
server_task = asyncio.create_task(server_handler.run())
|
||||
|
||||
client_handler = Handler.__new__(Handler)
|
||||
Handler.__init__(client_handler, client_conn)
|
||||
client_task = asyncio.create_task(client_handler.run())
|
||||
|
||||
client = ActionRPCBoxClient(logger=_logger)
|
||||
client.set_handler(client_handler)
|
||||
|
||||
return client, server_task, client_task
|
||||
|
||||
|
||||
# ── Fixtures ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def box_server():
|
||||
"""Yield a (ws_relay_url, ActionRPCBoxClient) backed by a real BoxRuntime."""
|
||||
runtime = BoxRuntime(logger=_logger)
|
||||
await runtime.initialize()
|
||||
|
||||
# Start ws relay for managed process attach
|
||||
ws_app = create_ws_relay_app(runtime)
|
||||
ws_server = TestServer(ws_app)
|
||||
await ws_server.start_server()
|
||||
|
||||
client, server_task, client_task = await _make_rpc_pair(runtime)
|
||||
|
||||
ws_relay_url = str(ws_server.make_url(''))
|
||||
yield ws_relay_url, client
|
||||
|
||||
server_task.cancel()
|
||||
client_task.cancel()
|
||||
await runtime.shutdown()
|
||||
await ws_server.close()
|
||||
|
||||
|
||||
# ── 1. Managed process lifecycle ─────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_managed_process_start_and_query(box_server):
|
||||
"""Start a managed process and query its status."""
|
||||
ws_relay_url, client = box_server
|
||||
|
||||
# Create session
|
||||
spec = BoxSpec(
|
||||
cmd='',
|
||||
session_id='mcp-int-lifecycle',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
await client.create_session(spec)
|
||||
|
||||
# Start a managed process that stays alive
|
||||
proc_spec = BoxManagedProcessSpec(
|
||||
command='sh',
|
||||
args=['-c', 'while true; do sleep 1; done'],
|
||||
cwd='/tmp',
|
||||
)
|
||||
info = await client.start_managed_process('mcp-int-lifecycle', proc_spec)
|
||||
assert info.status == BoxManagedProcessStatus.RUNNING
|
||||
|
||||
# Query it
|
||||
info2 = await client.get_managed_process('mcp-int-lifecycle')
|
||||
assert info2.status == BoxManagedProcessStatus.RUNNING
|
||||
assert info2.command == 'sh'
|
||||
|
||||
# Stop only the managed process while keeping the session available
|
||||
await client.stop_managed_process('mcp-int-lifecycle')
|
||||
with pytest.raises(BoxManagedProcessNotFoundError):
|
||||
await client.get_managed_process('mcp-int-lifecycle')
|
||||
session_info = await client.get_session('mcp-int-lifecycle')
|
||||
assert session_info['session_id'] == 'mcp-int-lifecycle'
|
||||
|
||||
# Cleanup
|
||||
await client.delete_session('mcp-int-lifecycle')
|
||||
|
||||
|
||||
# ── 2. WebSocket stdio attach ────────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_ws_stdio_attach_echo(box_server):
|
||||
"""Attach to a managed process via WebSocket and verify bidirectional IO."""
|
||||
ws_relay_url, client = box_server
|
||||
|
||||
spec = BoxSpec(
|
||||
cmd='',
|
||||
session_id='mcp-int-ws',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
await client.create_session(spec)
|
||||
|
||||
# Start a cat process (echoes stdin to stdout)
|
||||
proc_spec = BoxManagedProcessSpec(
|
||||
command='cat',
|
||||
args=[],
|
||||
cwd='/tmp',
|
||||
)
|
||||
await client.start_managed_process('mcp-int-ws', proc_spec)
|
||||
|
||||
# Connect via WebSocket (ws relay)
|
||||
ws_url = client.get_managed_process_websocket_url('mcp-int-ws', ws_relay_url)
|
||||
session = aiohttp.ClientSession()
|
||||
try:
|
||||
async with session.ws_connect(ws_url) as ws:
|
||||
# Send a line
|
||||
await ws.send_str('hello from test')
|
||||
|
||||
# Expect to receive it back (cat echoes)
|
||||
msg = await asyncio.wait_for(ws.receive(), timeout=5)
|
||||
assert msg.type == aiohttp.WSMsgType.TEXT
|
||||
assert 'hello from test' in msg.data
|
||||
finally:
|
||||
await session.close()
|
||||
|
||||
await client.delete_session('mcp-int-ws')
|
||||
|
||||
|
||||
# ── 3. Session cleanup removes container ─────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_delete_session_cleans_up(box_server):
|
||||
"""After deleting a session, it should no longer exist."""
|
||||
ws_relay_url, client = box_server
|
||||
|
||||
spec = BoxSpec(
|
||||
cmd='',
|
||||
session_id='mcp-int-cleanup',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
await client.create_session(spec)
|
||||
|
||||
# Start a process
|
||||
proc_spec = BoxManagedProcessSpec(
|
||||
command='sleep',
|
||||
args=['3600'],
|
||||
cwd='/tmp',
|
||||
)
|
||||
await client.start_managed_process('mcp-int-cleanup', proc_spec)
|
||||
|
||||
# Delete
|
||||
await client.delete_session('mcp-int-cleanup')
|
||||
|
||||
# Session should be gone
|
||||
with pytest.raises(BoxSessionNotFoundError):
|
||||
await client.get_session('mcp-int-cleanup')
|
||||
|
||||
|
||||
# ── 4. GET session details ────────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_session_returns_details(box_server):
|
||||
"""Get single session returns session details and managed process info."""
|
||||
ws_relay_url, client = box_server
|
||||
|
||||
spec = BoxSpec(
|
||||
cmd='',
|
||||
session_id='mcp-int-get',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
await client.create_session(spec)
|
||||
|
||||
# Query without managed process
|
||||
info = await client.get_session('mcp-int-get')
|
||||
assert info['session_id'] == 'mcp-int-get'
|
||||
assert info['image'] == _TEST_IMAGE
|
||||
assert 'managed_process' not in info
|
||||
|
||||
# Start a process and query again
|
||||
proc_spec = BoxManagedProcessSpec(
|
||||
command='sleep',
|
||||
args=['3600'],
|
||||
cwd='/tmp',
|
||||
)
|
||||
await client.start_managed_process('mcp-int-get', proc_spec)
|
||||
|
||||
info2 = await client.get_session('mcp-int-get')
|
||||
assert info2['session_id'] == 'mcp-int-get'
|
||||
assert 'managed_process' in info2
|
||||
assert info2['managed_process']['status'] == BoxManagedProcessStatus.RUNNING.value
|
||||
|
||||
await client.delete_session('mcp-int-get')
|
||||
|
||||
|
||||
# ── 5. Process exit detected ────────────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_process_exit_detected(box_server):
|
||||
"""When a managed process exits, its status should reflect EXITED."""
|
||||
ws_relay_url, client = box_server
|
||||
|
||||
spec = BoxSpec(
|
||||
cmd='',
|
||||
session_id='mcp-int-exit',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
await client.create_session(spec)
|
||||
|
||||
# Start a process that exits immediately
|
||||
proc_spec = BoxManagedProcessSpec(
|
||||
command='sh',
|
||||
args=['-c', 'echo done && exit 0'],
|
||||
cwd='/tmp',
|
||||
)
|
||||
await client.start_managed_process('mcp-int-exit', proc_spec)
|
||||
|
||||
# Wait a bit for process to exit
|
||||
await asyncio.sleep(2)
|
||||
|
||||
info = await client.get_managed_process('mcp-int-exit')
|
||||
assert info.status == BoxManagedProcessStatus.EXITED
|
||||
assert info.exit_code == 0
|
||||
|
||||
await client.delete_session('mcp-int-exit')
|
||||
|
||||
|
||||
# ── 6. Instance ID orphan cleanup ───────────────────────────────────
|
||||
|
||||
|
||||
@requires_container
|
||||
@requires_socket
|
||||
@pytest.mark.asyncio
|
||||
async def test_orphan_cleanup_preserves_own_containers(box_server):
|
||||
"""Orphan cleanup should not remove containers belonging to the current instance."""
|
||||
ws_relay_url, client = box_server
|
||||
|
||||
# Create a session (container gets current instance ID label)
|
||||
spec = BoxSpec(
|
||||
cmd='',
|
||||
session_id='mcp-int-orphan',
|
||||
workdir='/tmp',
|
||||
image=_TEST_IMAGE,
|
||||
)
|
||||
await client.create_session(spec)
|
||||
|
||||
# Verify session exists
|
||||
sessions = await client.get_sessions()
|
||||
assert any(s['session_id'] == 'mcp-int-orphan' for s in sessions)
|
||||
|
||||
# Trigger status check (which doesn't clean up own containers)
|
||||
status = await client.get_status()
|
||||
assert status['active_sessions'] >= 1
|
||||
|
||||
# Our session should still exist
|
||||
sessions = await client.get_sessions()
|
||||
assert any(s['session_id'] == 'mcp-int-orphan' for s in sessions)
|
||||
|
||||
await client.delete_session('mcp-int-orphan')
|
||||
106
tests/unit_tests/box/test_box_connector.py
Normal file
106
tests/unit_tests/box/test_box_connector.py
Normal file
@@ -0,0 +1,106 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import Mock
|
||||
|
||||
import pytest
|
||||
|
||||
from langbot_plugin.box.client import ActionRPCBoxClient
|
||||
from langbot.pkg.box.connector import BoxRuntimeConnector
|
||||
|
||||
|
||||
def make_app(logger: Mock, runtime_endpoint: str = ''):
|
||||
return SimpleNamespace(
|
||||
logger=logger,
|
||||
instance_config=SimpleNamespace(
|
||||
data={
|
||||
'box': {
|
||||
'backend': 'local',
|
||||
'runtime': {'endpoint': runtime_endpoint},
|
||||
'local': {
|
||||
'profile': 'default',
|
||||
'allowed_mount_roots': [],
|
||||
'default_workspace': '',
|
||||
},
|
||||
'e2b': {'api_key': '', 'api_url': '', 'template': ''},
|
||||
}
|
||||
}
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
def test_box_runtime_connector_stdio_when_no_url(monkeypatch: pytest.MonkeyPatch):
|
||||
"""Without runtime.endpoint, on a non-Docker Unix platform, use stdio."""
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
|
||||
connector = BoxRuntimeConnector(make_app(Mock()))
|
||||
|
||||
assert connector._uses_websocket() is False
|
||||
assert isinstance(connector.client, ActionRPCBoxClient)
|
||||
|
||||
|
||||
def test_box_runtime_connector_ws_when_url_configured(monkeypatch: pytest.MonkeyPatch):
|
||||
"""With an explicit runtime.endpoint, always use WebSocket."""
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
|
||||
logger = Mock()
|
||||
connector = BoxRuntimeConnector(make_app(logger, runtime_endpoint='http://box-runtime:5410'))
|
||||
|
||||
assert connector._uses_websocket() is True
|
||||
assert isinstance(connector.client, ActionRPCBoxClient)
|
||||
|
||||
|
||||
def test_box_runtime_connector_ws_in_docker(monkeypatch: pytest.MonkeyPatch):
|
||||
"""Inside Docker (no explicit URL), use WebSocket to reach a sibling container."""
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'docker')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
|
||||
connector = BoxRuntimeConnector(make_app(Mock()))
|
||||
|
||||
assert connector._uses_websocket() is True
|
||||
assert connector.ws_relay_base_url == 'http://langbot_box:5410'
|
||||
|
||||
|
||||
def test_box_runtime_connector_ws_with_standalone_flag(monkeypatch: pytest.MonkeyPatch):
|
||||
"""With --standalone-box flag, use WebSocket even on a local Unix platform."""
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', True)
|
||||
connector = BoxRuntimeConnector(make_app(Mock()))
|
||||
|
||||
assert connector._uses_websocket() is True
|
||||
|
||||
|
||||
def test_box_runtime_connector_ws_relay_url_default(monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
|
||||
connector = BoxRuntimeConnector(make_app(Mock()))
|
||||
|
||||
assert connector.ws_relay_base_url == 'http://127.0.0.1:5410'
|
||||
|
||||
|
||||
def test_box_runtime_connector_ws_relay_url_explicit(monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
|
||||
connector = BoxRuntimeConnector(make_app(Mock(), runtime_endpoint='http://box-runtime:5410'))
|
||||
assert connector.ws_relay_base_url == 'http://box-runtime:5410'
|
||||
|
||||
|
||||
def test_box_runtime_connector_dispose_terminates_subprocess(monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.get_platform', lambda: 'linux')
|
||||
monkeypatch.setattr('langbot.pkg.utils.platform.standalone_box', False)
|
||||
logger = Mock()
|
||||
connector = BoxRuntimeConnector(make_app(logger))
|
||||
subprocess = Mock()
|
||||
subprocess.returncode = None
|
||||
handler_task = Mock()
|
||||
ctrl_task = Mock()
|
||||
connector._subprocess = subprocess
|
||||
connector._handler_task = handler_task
|
||||
connector._ctrl_task = ctrl_task
|
||||
|
||||
connector.dispose()
|
||||
|
||||
subprocess.terminate.assert_called_once()
|
||||
handler_task.cancel.assert_called_once()
|
||||
ctrl_task.cancel.assert_called_once()
|
||||
assert connector._handler_task is None
|
||||
assert connector._ctrl_task is None
|
||||
1392
tests/unit_tests/box/test_box_service.py
Normal file
1392
tests/unit_tests/box/test_box_service.py
Normal file
File diff suppressed because it is too large
Load Diff
147
tests/unit_tests/box/test_workspace.py
Normal file
147
tests/unit_tests/box/test_workspace.py
Normal file
@@ -0,0 +1,147 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import tempfile
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
import pytest
|
||||
|
||||
from langbot.pkg.box.workspace import (
|
||||
BoxWorkspaceSession,
|
||||
classify_python_workspace,
|
||||
infer_workspace_host_path,
|
||||
rewrite_mounted_path,
|
||||
wrap_python_command_with_env,
|
||||
)
|
||||
|
||||
|
||||
def test_rewrite_mounted_path_translates_host_prefix():
|
||||
result = rewrite_mounted_path('/tmp/demo/project/app.py', '/tmp/demo/project')
|
||||
assert result == '/workspace/app.py'
|
||||
|
||||
|
||||
def test_infer_workspace_host_path_unwraps_virtualenv_bin_dir():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
project_root = os.path.join(tmpdir, 'project')
|
||||
os.makedirs(os.path.join(project_root, '.venv', 'bin'))
|
||||
python_bin = os.path.join(project_root, '.venv', 'bin', 'python')
|
||||
script = os.path.join(project_root, 'server.py')
|
||||
|
||||
with open(python_bin, 'w', encoding='utf-8') as handle:
|
||||
handle.write('')
|
||||
with open(script, 'w', encoding='utf-8') as handle:
|
||||
handle.write('print("ok")\n')
|
||||
|
||||
result = infer_workspace_host_path(python_bin, [script])
|
||||
|
||||
assert result == os.path.realpath(project_root)
|
||||
|
||||
|
||||
def test_classify_python_workspace_detects_package_and_requirements():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
assert classify_python_workspace(tmpdir) is None
|
||||
|
||||
with open(os.path.join(tmpdir, 'requirements.txt'), 'w', encoding='utf-8') as handle:
|
||||
handle.write('requests\n')
|
||||
assert classify_python_workspace(tmpdir) == 'requirements'
|
||||
|
||||
with open(os.path.join(tmpdir, 'pyproject.toml'), 'w', encoding='utf-8') as handle:
|
||||
handle.write('[project]\nname = "demo"\n')
|
||||
assert classify_python_workspace(tmpdir) == 'package'
|
||||
|
||||
|
||||
def test_wrap_python_command_with_env_contains_bootstrap_and_command():
|
||||
command = wrap_python_command_with_env('python script.py')
|
||||
|
||||
assert 'python -m venv "$_LB_VENV_DIR"' in command
|
||||
assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
|
||||
assert command.rstrip().endswith('python script.py')
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_workspace_session_execute_for_query_uses_session_payload():
|
||||
box_service = SimpleNamespace(execute_spec_payload=AsyncMock(return_value={'ok': True}))
|
||||
workspace = BoxWorkspaceSession(
|
||||
box_service,
|
||||
'skill-person_123-demo',
|
||||
host_path='/tmp/project',
|
||||
host_path_mode='rw',
|
||||
env={'FOO': 'bar'},
|
||||
)
|
||||
|
||||
query = SimpleNamespace(query_id='q1')
|
||||
result = await workspace.execute_for_query(query, 'python run.py', workdir='/workspace', timeout_sec=30)
|
||||
|
||||
assert result == {'ok': True}
|
||||
payload = box_service.execute_spec_payload.await_args.args[0]
|
||||
assert payload == {
|
||||
'session_id': 'skill-person_123-demo',
|
||||
'workdir': '/workspace',
|
||||
'env': {'FOO': 'bar'},
|
||||
'persistent': False,
|
||||
'host_path': '/tmp/project',
|
||||
'host_path_mode': 'rw',
|
||||
'cmd': 'python run.py',
|
||||
'timeout_sec': 30,
|
||||
}
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_workspace_session_start_managed_process_rewrites_command_and_args():
|
||||
box_service = SimpleNamespace(start_managed_process=AsyncMock(return_value={'status': 'running'}))
|
||||
workspace = BoxWorkspaceSession(
|
||||
box_service,
|
||||
'mcp-u1',
|
||||
host_path='/tmp/project',
|
||||
host_path_mode='ro',
|
||||
)
|
||||
|
||||
result = await workspace.start_managed_process(
|
||||
'/tmp/project/.venv/bin/python',
|
||||
['/tmp/project/server.py', '--config', '/tmp/project/config.json'],
|
||||
env={'TOKEN': '1'},
|
||||
)
|
||||
|
||||
assert result == {'status': 'running'}
|
||||
session_id = box_service.start_managed_process.await_args.args[0]
|
||||
payload = box_service.start_managed_process.await_args.args[1]
|
||||
assert session_id == 'mcp-u1'
|
||||
assert payload == {
|
||||
'command': 'python',
|
||||
'args': ['/workspace/server.py', '--config', '/workspace/config.json'],
|
||||
'env': {'TOKEN': '1'},
|
||||
'cwd': '/workspace',
|
||||
'process_id': 'default',
|
||||
}
|
||||
|
||||
|
||||
def test_workspace_session_build_session_payload_keeps_generic_workspace_shape():
|
||||
workspace = BoxWorkspaceSession(
|
||||
Mock(),
|
||||
'workspace-1',
|
||||
host_path='/tmp/project',
|
||||
host_path_mode='rw',
|
||||
env={'FOO': 'bar'},
|
||||
network='on',
|
||||
read_only_rootfs=False,
|
||||
image='python:3.11',
|
||||
cpus=1.0,
|
||||
memory_mb=512,
|
||||
pids_limit=128,
|
||||
)
|
||||
|
||||
assert workspace.build_session_payload() == {
|
||||
'session_id': 'workspace-1',
|
||||
'workdir': '/workspace',
|
||||
'env': {'FOO': 'bar'},
|
||||
'persistent': False,
|
||||
'network': 'on',
|
||||
'read_only_rootfs': False,
|
||||
'host_path': '/tmp/project',
|
||||
'host_path_mode': 'rw',
|
||||
'image': 'python:3.11',
|
||||
'cpus': 1.0,
|
||||
'memory_mb': 512,
|
||||
'pids_limit': 128,
|
||||
}
|
||||
@@ -12,6 +12,12 @@ from __future__ import annotations
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
# Preload pipelinemgr so the pipeline.stage module is fully initialised before
|
||||
# any individual stage test (e.g. preproc, longtext) tries to import it. Without
|
||||
# this, running a stage test in isolation triggers a circular-import error:
|
||||
# stage.py → core.app → pipelinemgr → stage.stage_class (not yet bound).
|
||||
import langbot.pkg.pipeline.pipelinemgr # noqa: F401
|
||||
|
||||
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
|
||||
import langbot_plugin.api.entities.builtin.platform.message as platform_message
|
||||
import langbot_plugin.api.entities.builtin.platform.events as platform_events
|
||||
@@ -34,6 +40,9 @@ class MockApplication:
|
||||
self.query_pool = self._create_mock_query_pool()
|
||||
self.instance_config = self._create_mock_instance_config()
|
||||
self.task_mgr = self._create_mock_task_manager()
|
||||
# Skill manager is optional; PreProcessor only touches it for the
|
||||
# local-agent runner. None keeps the skill-binding branch inert.
|
||||
self.skill_mgr = None
|
||||
|
||||
def _create_mock_logger(self):
|
||||
logger = Mock()
|
||||
|
||||
78
tests/unit_tests/pipeline/test_chat_handler_logging.py
Normal file
78
tests/unit_tests/pipeline/test_chat_handler_logging.py
Normal file
@@ -0,0 +1,78 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from unittest.mock import Mock
|
||||
|
||||
import pytest
|
||||
import langbot_plugin.api.entities.builtin.provider.message as provider_message
|
||||
|
||||
# TODO: unskip once the handler ↔ app circular import is resolved
|
||||
pytest.skip(
|
||||
'circular import in handler ↔ app; will be unblocked once resolved',
|
||||
allow_module_level=True,
|
||||
)
|
||||
|
||||
from langbot.pkg.pipeline.process.handler import MessageHandler # noqa: E402
|
||||
|
||||
|
||||
class _StubHandler(MessageHandler):
|
||||
async def handle(self, query):
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
handler = _StubHandler(ap=Mock())
|
||||
|
||||
|
||||
def test_chat_handler_formats_tool_call_request_log():
|
||||
result = provider_message.Message(
|
||||
role='assistant',
|
||||
content='',
|
||||
tool_calls=[
|
||||
provider_message.ToolCall(
|
||||
id='call-1',
|
||||
type='function',
|
||||
function=provider_message.FunctionCall(name='exec', arguments='{}'),
|
||||
)
|
||||
],
|
||||
)
|
||||
|
||||
summary = handler.format_result_log(result)
|
||||
|
||||
assert summary == 'assistant: requested tools: exec'
|
||||
|
||||
|
||||
def test_chat_handler_formats_tool_result_log():
|
||||
result = provider_message.Message(
|
||||
role='tool',
|
||||
content='{"status":"completed","exit_code":0,"backend":"podman","stdout":"42\\n"}',
|
||||
tool_call_id='call-1',
|
||||
)
|
||||
|
||||
summary = handler.format_result_log(result)
|
||||
|
||||
# Tool results use generic cut_str truncation
|
||||
assert summary is not None
|
||||
assert summary.startswith('tool: {"status":"com')
|
||||
assert summary.endswith('...')
|
||||
|
||||
|
||||
def test_chat_handler_formats_tool_error_log():
|
||||
result = provider_message.MessageChunk(
|
||||
role='tool',
|
||||
content='err: host_path must point to an existing directory on the host',
|
||||
tool_call_id='call-1',
|
||||
is_final=True,
|
||||
)
|
||||
|
||||
summary = handler.format_result_log(result)
|
||||
|
||||
assert summary is not None
|
||||
assert summary.startswith('tool error: err: host_path must')
|
||||
assert summary.endswith('...')
|
||||
|
||||
|
||||
def test_chat_handler_skips_empty_assistant_log():
|
||||
result = provider_message.Message(role='assistant', content='')
|
||||
|
||||
summary = handler.format_result_log(result)
|
||||
|
||||
assert summary is None
|
||||
@@ -14,33 +14,43 @@ import json
|
||||
import sys
|
||||
from unittest.mock import AsyncMock, MagicMock, Mock, patch
|
||||
|
||||
# Break the circular import chain before importing n8nsvapi:
|
||||
import pytest
|
||||
import langbot_plugin.api.entities.builtin.provider.message as provider_message
|
||||
|
||||
# Break the circular import chain while importing n8nsvapi:
|
||||
# n8nsvapi → runner → app → pipelinemgr → all runners → runner (partially init)
|
||||
_mock_runner = MagicMock()
|
||||
_mock_runner.runner_class = lambda name: (lambda cls: cls) # no-op decorator
|
||||
_mock_runner.RequestRunner = object
|
||||
_mocked_imports = {
|
||||
'langbot.pkg.provider.runner': _mock_runner,
|
||||
# The stubs are restored in a ``finally`` block so this module does NOT pollute
|
||||
# sys.modules for other test modules (e.g. ones importing the real
|
||||
# LocalAgentRunner, which would otherwise inherit ``object`` and break).
|
||||
# Mirrors master's intent but uses try/finally so a raised import doesn't
|
||||
# leave the global namespace in a stubbed state, and includes
|
||||
# ``langbot.pkg.utils.httpclient`` which master didn't stub.
|
||||
_runner_stub = MagicMock()
|
||||
_runner_stub.runner_class = lambda name: (lambda cls: cls) # no-op decorator
|
||||
_runner_stub.RequestRunner = object
|
||||
_import_stubs = {
|
||||
'langbot.pkg.provider.runner': _runner_stub,
|
||||
'langbot.pkg.core.app': MagicMock(),
|
||||
'langbot.pkg.utils.httpclient': MagicMock(),
|
||||
}
|
||||
_original_imports = {name: sys.modules.get(name) for name in _mocked_imports}
|
||||
sys.modules.update(_mocked_imports)
|
||||
|
||||
import pytest # noqa: E402
|
||||
import langbot_plugin.api.entities.builtin.provider.message as provider_message # noqa: E402
|
||||
from langbot.pkg.provider.runners.n8nsvapi import N8nServiceAPIRunner # noqa: E402
|
||||
|
||||
for _name, _original in _original_imports.items():
|
||||
if _original is None:
|
||||
sys.modules.pop(_name, None)
|
||||
else:
|
||||
sys.modules[_name] = _original
|
||||
_saved_modules = {name: sys.modules.get(name) for name in _import_stubs}
|
||||
for _name, _stub in _import_stubs.items():
|
||||
sys.modules[_name] = _stub
|
||||
try:
|
||||
from langbot.pkg.provider.runners.n8nsvapi import N8nServiceAPIRunner
|
||||
finally:
|
||||
for _name, _original in _saved_modules.items():
|
||||
if _original is None:
|
||||
sys.modules.pop(_name, None)
|
||||
else:
|
||||
sys.modules[_name] = _original
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def make_runner(output_key: str = 'response') -> N8nServiceAPIRunner:
|
||||
ap = Mock()
|
||||
ap.logger = Mock()
|
||||
@@ -83,6 +93,7 @@ async def collect_chunks(runner: N8nServiceAPIRunner, chunks: list[bytes | str])
|
||||
# _process_response: stream format (type:item/end)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_stream_format_single_item():
|
||||
"""Single item + end in one chunk yields final chunk with full content."""
|
||||
@@ -165,6 +176,7 @@ async def test_stream_format_no_spurious_empty_yield():
|
||||
# _process_response: plain JSON fallback
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_plain_json_with_output_key():
|
||||
"""Plain JSON with matching output_key extracts value via output_key."""
|
||||
@@ -235,6 +247,7 @@ async def test_invalid_json_returns_raw_text():
|
||||
# _call_webhook: output type depends on is_stream
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def make_query(is_stream: bool):
|
||||
"""Build a minimal Query mock."""
|
||||
query = Mock()
|
||||
|
||||
242
tests/unit_tests/provider/test_localagent_sandbox_exec.py
Normal file
242
tests/unit_tests/provider/test_localagent_sandbox_exec.py
Normal file
@@ -0,0 +1,242 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
import pytest
|
||||
|
||||
import langbot_plugin.api.entities.builtin.pipeline.query as pipeline_query
|
||||
import langbot_plugin.api.entities.builtin.provider.message as provider_message
|
||||
import langbot_plugin.api.entities.builtin.provider.session as provider_session
|
||||
|
||||
from langbot.pkg.provider.runners.localagent import LocalAgentRunner
|
||||
|
||||
|
||||
class RecordingProvider:
|
||||
def __init__(self):
|
||||
self.requests: list[dict] = []
|
||||
|
||||
async def invoke_llm(self, query, model, messages, funcs, extra_args=None, remove_think=None):
|
||||
self.requests.append(
|
||||
{
|
||||
'messages': list(messages),
|
||||
'funcs': list(funcs),
|
||||
'remove_think': remove_think,
|
||||
}
|
||||
)
|
||||
|
||||
if len(self.requests) == 1:
|
||||
return provider_message.Message(
|
||||
role='assistant',
|
||||
content='Let me calculate that exactly.',
|
||||
tool_calls=[
|
||||
provider_message.ToolCall(
|
||||
id='call-1',
|
||||
type='function',
|
||||
function=provider_message.FunctionCall(
|
||||
name='exec',
|
||||
arguments=json.dumps(
|
||||
{'command': ("python - <<'PY'\nnums = [1, 2, 3, 4]\nprint(sum(nums) / len(nums))\nPY")}
|
||||
),
|
||||
),
|
||||
)
|
||||
],
|
||||
)
|
||||
|
||||
tool_result = json.loads(messages[-1].content)
|
||||
return provider_message.Message(
|
||||
role='assistant',
|
||||
content=f'The average is {tool_result["stdout"]}.',
|
||||
)
|
||||
|
||||
|
||||
class RecordingStreamProvider:
|
||||
def __init__(self):
|
||||
self.stream_requests: list[dict] = []
|
||||
|
||||
def invoke_llm_stream(self, query, model, messages, funcs, extra_args=None, remove_think=None):
|
||||
self.stream_requests.append(
|
||||
{
|
||||
'messages': list(messages),
|
||||
'funcs': list(funcs),
|
||||
'remove_think': remove_think,
|
||||
}
|
||||
)
|
||||
|
||||
async def _stream():
|
||||
if len(self.stream_requests) == 1:
|
||||
yield provider_message.MessageChunk(
|
||||
role='assistant',
|
||||
tool_calls=[
|
||||
provider_message.ToolCall(
|
||||
id='call-1',
|
||||
type='function',
|
||||
function=provider_message.FunctionCall(
|
||||
name='exec',
|
||||
arguments=json.dumps({'command': "python -c 'print(1)'"}),
|
||||
),
|
||||
)
|
||||
],
|
||||
is_final=True,
|
||||
)
|
||||
return
|
||||
|
||||
yield provider_message.MessageChunk(
|
||||
role='assistant',
|
||||
content='Tool execution failed.',
|
||||
is_final=True,
|
||||
)
|
||||
|
||||
return _stream()
|
||||
|
||||
|
||||
def make_query() -> pipeline_query.Query:
|
||||
adapter = AsyncMock()
|
||||
adapter.is_stream_output_supported = AsyncMock(return_value=False)
|
||||
|
||||
return pipeline_query.Query.model_construct(
|
||||
query_id='avg-query',
|
||||
launcher_type=provider_session.LauncherTypes.PERSON,
|
||||
launcher_id=12345,
|
||||
sender_id=12345,
|
||||
message_chain=[],
|
||||
message_event=None,
|
||||
adapter=adapter,
|
||||
pipeline_uuid='pipeline-uuid',
|
||||
bot_uuid='bot-uuid',
|
||||
pipeline_config={
|
||||
'ai': {
|
||||
'runner': {'runner': 'local-agent'},
|
||||
'local-agent': {'model': {'primary': 'test-model-uuid', 'fallbacks': []}, 'prompt': 'test-prompt'},
|
||||
},
|
||||
'output': {'misc': {'remove-think': False}},
|
||||
},
|
||||
prompt=SimpleNamespace(messages=[]),
|
||||
messages=[],
|
||||
user_message=provider_message.Message(
|
||||
role='user',
|
||||
content='Please calculate the average of 1, 2, 3, and 4.',
|
||||
),
|
||||
use_funcs=[SimpleNamespace(name='exec')],
|
||||
use_llm_model_uuid='test-model-uuid',
|
||||
variables={},
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_localagent_uses_exec_for_exact_calculation():
|
||||
provider = RecordingProvider()
|
||||
model = SimpleNamespace(
|
||||
provider=provider,
|
||||
model_entity=SimpleNamespace(
|
||||
uuid='test-model-uuid',
|
||||
name='test-model',
|
||||
abilities=['func_call'],
|
||||
extra_args={},
|
||||
),
|
||||
)
|
||||
|
||||
tool_manager = SimpleNamespace(
|
||||
execute_func_call=AsyncMock(
|
||||
return_value={
|
||||
'session_id': 'avg-query',
|
||||
'backend': 'podman',
|
||||
'status': 'completed',
|
||||
'ok': True,
|
||||
'exit_code': 0,
|
||||
'stdout': '2.5',
|
||||
'stderr': '',
|
||||
'duration_ms': 18,
|
||||
}
|
||||
)
|
||||
)
|
||||
|
||||
app = SimpleNamespace(
|
||||
logger=Mock(),
|
||||
model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
|
||||
tool_mgr=tool_manager,
|
||||
rag_mgr=SimpleNamespace(),
|
||||
box_service=SimpleNamespace(
|
||||
get_system_guidance=Mock(
|
||||
return_value=(
|
||||
'When the exec tool is available, use it for exact calculations, statistics, '
|
||||
'structured data parsing, and code execution instead of estimating mentally. '
|
||||
'Unless the user explicitly asks for the script, code, or implementation details, '
|
||||
'do not include the generated script in the final answer. '
|
||||
'A default workspace is mounted at /workspace for file tasks.'
|
||||
)
|
||||
),
|
||||
),
|
||||
skill_mgr=SimpleNamespace(
|
||||
get_skills_for_pipeline=AsyncMock(return_value=[]),
|
||||
detect_skill_activation=AsyncMock(return_value=None),
|
||||
build_activation_prompt=Mock(return_value=None),
|
||||
),
|
||||
)
|
||||
|
||||
runner = LocalAgentRunner(app, pipeline_config={})
|
||||
query = make_query()
|
||||
|
||||
results = [message async for message in runner.run(query)]
|
||||
|
||||
assert [message.role for message in results] == ['assistant', 'tool', 'assistant']
|
||||
assert results[-1].content == 'The average is 2.5.'
|
||||
|
||||
tool_manager.execute_func_call.assert_awaited_once()
|
||||
tool_name, tool_parameters = tool_manager.execute_func_call.await_args.args[:2]
|
||||
assert tool_name == 'exec'
|
||||
assert 'print(sum(nums) / len(nums))' in tool_parameters['command']
|
||||
|
||||
first_request = provider.requests[0]
|
||||
assert any(
|
||||
message.role == 'system'
|
||||
and 'exec' in str(message.content)
|
||||
and 'exact calculations' in str(message.content)
|
||||
and 'Unless the user explicitly asks for the script' in str(message.content)
|
||||
and '/workspace' in str(message.content)
|
||||
for message in first_request['messages']
|
||||
)
|
||||
assert [tool.name for tool in first_request['funcs']] == ['exec']
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_localagent_streaming_tool_error_yields_message_chunks():
|
||||
provider = RecordingStreamProvider()
|
||||
model = SimpleNamespace(
|
||||
provider=provider,
|
||||
model_entity=SimpleNamespace(
|
||||
uuid='test-model-uuid',
|
||||
name='test-model',
|
||||
abilities=['func_call'],
|
||||
extra_args={},
|
||||
),
|
||||
)
|
||||
|
||||
adapter = AsyncMock()
|
||||
adapter.is_stream_output_supported = AsyncMock(return_value=True)
|
||||
|
||||
query = make_query()
|
||||
query.adapter = adapter
|
||||
|
||||
app = SimpleNamespace(
|
||||
logger=Mock(),
|
||||
model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
|
||||
tool_mgr=SimpleNamespace(execute_func_call=AsyncMock(side_effect=RuntimeError('boom'))),
|
||||
rag_mgr=SimpleNamespace(),
|
||||
box_service=SimpleNamespace(
|
||||
get_system_guidance=Mock(return_value='sandbox guidance'),
|
||||
),
|
||||
skill_mgr=SimpleNamespace(
|
||||
get_skills_for_pipeline=AsyncMock(return_value=[]),
|
||||
detect_skill_activation=AsyncMock(return_value=None),
|
||||
build_activation_prompt=Mock(return_value=None),
|
||||
),
|
||||
)
|
||||
|
||||
runner = LocalAgentRunner(app, pipeline_config={})
|
||||
|
||||
results = [message async for message in runner.run(query)]
|
||||
|
||||
assert all(isinstance(message, provider_message.MessageChunk) for message in results)
|
||||
assert any(message.role == 'tool' and message.content == 'err: boom' for message in results)
|
||||
712
tests/unit_tests/provider/test_mcp_box_integration.py
Normal file
712
tests/unit_tests/provider/test_mcp_box_integration.py
Normal file
@@ -0,0 +1,712 @@
|
||||
"""Tests for MCP Box integration: path rewriting, host_path inference, config model, payloads.
|
||||
|
||||
Uses importlib.util.spec_from_file_location to load mcp.py directly without
|
||||
triggering the circular import chain through the app module.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import importlib
|
||||
import importlib.util
|
||||
import os
|
||||
import sys
|
||||
import tempfile
|
||||
import types
|
||||
from contextlib import asynccontextmanager
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Load mcp.py directly from file path, with stub dependencies
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _stub_module(fqn: str, attrs: dict | None = None, is_package: bool = False):
|
||||
"""Create or return a stub module and register it in sys.modules."""
|
||||
if fqn in sys.modules:
|
||||
mod = sys.modules[fqn]
|
||||
else:
|
||||
mod = types.ModuleType(fqn)
|
||||
mod.__spec__ = importlib.machinery.ModuleSpec(fqn, None, is_package=is_package)
|
||||
if is_package:
|
||||
mod.__path__ = []
|
||||
sys.modules[fqn] = mod
|
||||
parts = fqn.rsplit('.', 1)
|
||||
if len(parts) == 2 and parts[0] in sys.modules:
|
||||
setattr(sys.modules[parts[0]], parts[1], mod)
|
||||
if attrs:
|
||||
for k, v in attrs.items():
|
||||
setattr(mod, k, v)
|
||||
return mod
|
||||
|
||||
|
||||
@pytest.fixture(scope='module', autouse=True)
|
||||
def mcp_module():
|
||||
"""Load mcp.py with minimal stubs to avoid circular imports."""
|
||||
saved = {}
|
||||
|
||||
def _save_and_stub(name, attrs=None, is_package=False):
|
||||
saved[name] = sys.modules.get(name)
|
||||
# Don't overwrite modules that already exist (from other test modules)
|
||||
if name in sys.modules:
|
||||
return
|
||||
_stub_module(name, attrs, is_package)
|
||||
|
||||
# Stub entire dependency chains as packages / modules
|
||||
_save_and_stub('langbot_plugin', is_package=True)
|
||||
_save_and_stub('langbot_plugin.api', is_package=True)
|
||||
_save_and_stub('langbot_plugin.api.entities', is_package=True)
|
||||
_save_and_stub('langbot_plugin.api.entities.events', is_package=True)
|
||||
_save_and_stub('langbot_plugin.api.entities.events.pipeline_query', {})
|
||||
_save_and_stub('langbot_plugin.api.entities.builtin', is_package=True)
|
||||
_save_and_stub('langbot_plugin.api.entities.builtin.resource', is_package=True)
|
||||
_save_and_stub(
|
||||
'langbot_plugin.api.entities.builtin.resource.tool',
|
||||
{
|
||||
'LLMTool': type('LLMTool', (), {}),
|
||||
},
|
||||
)
|
||||
_save_and_stub('langbot_plugin.api.entities.builtin.provider', is_package=True)
|
||||
_save_and_stub('langbot_plugin.api.entities.builtin.provider.message', {})
|
||||
_save_and_stub('sqlalchemy', {'select': Mock()})
|
||||
_save_and_stub('httpx', {'AsyncClient': Mock()})
|
||||
_save_and_stub('mcp', {'ClientSession': Mock, 'StdioServerParameters': Mock}, is_package=True)
|
||||
_save_and_stub('mcp.client', is_package=True)
|
||||
_save_and_stub('mcp.client.stdio', {'stdio_client': Mock()})
|
||||
_save_and_stub('mcp.client.sse', {'sse_client': Mock()})
|
||||
_save_and_stub('mcp.client.streamable_http', {'streamable_http_client': Mock()})
|
||||
_save_and_stub('mcp.client.websocket', {'websocket_client': Mock()})
|
||||
|
||||
# Stub the provider.tools.loader (source of circular import)
|
||||
_save_and_stub('langbot', is_package=True)
|
||||
_save_and_stub('langbot.pkg', is_package=True)
|
||||
_save_and_stub('langbot.pkg.provider', is_package=True)
|
||||
_save_and_stub('langbot.pkg.provider.tools', is_package=True)
|
||||
_save_and_stub(
|
||||
'langbot.pkg.provider.tools.loader',
|
||||
{
|
||||
'ToolLoader': type('ToolLoader', (), {'__init__': lambda self, ap: None}),
|
||||
},
|
||||
)
|
||||
_save_and_stub('langbot.pkg.provider.tools.loaders', is_package=True)
|
||||
_save_and_stub('langbot.pkg.core', is_package=True)
|
||||
_save_and_stub('langbot.pkg.core.app', {'Application': type('Application', (), {})})
|
||||
_save_and_stub('langbot.pkg.entity', is_package=True)
|
||||
_save_and_stub('langbot.pkg.entity.persistence', is_package=True)
|
||||
_save_and_stub('langbot.pkg.entity.persistence.mcp', {})
|
||||
|
||||
# box models
|
||||
import enum as _enum
|
||||
|
||||
class _BPS(str, _enum.Enum):
|
||||
RUNNING = 'running'
|
||||
EXITED = 'exited'
|
||||
|
||||
_save_and_stub('langbot_plugin.box', is_package=True)
|
||||
_save_and_stub('langbot_plugin.box.models', {'BoxManagedProcessStatus': _BPS})
|
||||
|
||||
# Now load mcp.py via spec_from_file_location
|
||||
mod_fqn = 'langbot.pkg.provider.tools.loaders.mcp'
|
||||
sys.modules.pop(mod_fqn, None)
|
||||
mcp_path = os.path.join(
|
||||
os.path.dirname(__file__),
|
||||
'..',
|
||||
'..',
|
||||
'..',
|
||||
'src',
|
||||
'langbot',
|
||||
'pkg',
|
||||
'provider',
|
||||
'tools',
|
||||
'loaders',
|
||||
'mcp.py',
|
||||
)
|
||||
mcp_path = os.path.normpath(mcp_path)
|
||||
pkg_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(mcp_path))))
|
||||
sys.modules['langbot.pkg'].__path__ = [pkg_root]
|
||||
sys.modules['langbot.pkg.provider.tools.loaders'].__path__ = [os.path.dirname(mcp_path)]
|
||||
spec = importlib.util.spec_from_file_location(mod_fqn, mcp_path)
|
||||
mod = importlib.util.module_from_spec(spec)
|
||||
sys.modules[mod_fqn] = mod
|
||||
spec.loader.exec_module(mod)
|
||||
|
||||
yield mod
|
||||
|
||||
# Cleanup
|
||||
sys.modules.pop(mod_fqn, None)
|
||||
sys.modules.pop('langbot.pkg.provider.tools.loaders.mcp_stdio', None)
|
||||
sys.modules.pop('langbot.pkg.box.workspace', None)
|
||||
for name in reversed(list(saved)):
|
||||
if saved[name] is None:
|
||||
sys.modules.pop(name, None)
|
||||
else:
|
||||
sys.modules[name] = saved[name]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _make_ap():
|
||||
ap = Mock()
|
||||
ap.logger = Mock()
|
||||
ap.box_service = Mock()
|
||||
return ap
|
||||
|
||||
|
||||
def _make_session(mcp_module, server_config: dict, ap=None):
|
||||
if ap is None:
|
||||
ap = _make_ap()
|
||||
return mcp_module.RuntimeMCPSession(
|
||||
server_name=server_config.get('name', 'test-server'),
|
||||
server_config=server_config,
|
||||
enable=True,
|
||||
ap=ap,
|
||||
)
|
||||
|
||||
|
||||
# ── MCPServerBoxConfig ──────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestMCPServerBoxConfig:
|
||||
def test_default_values(self, mcp_module):
|
||||
cfg = mcp_module.MCPServerBoxConfig.model_validate({})
|
||||
assert cfg.image is None
|
||||
assert cfg.network == 'on'
|
||||
assert cfg.host_path is None
|
||||
assert cfg.host_path_mode == 'ro'
|
||||
assert cfg.env == {}
|
||||
assert cfg.startup_timeout_sec == 120
|
||||
assert cfg.cpus is None
|
||||
assert cfg.memory_mb is None
|
||||
assert cfg.pids_limit is None
|
||||
assert cfg.read_only_rootfs is None
|
||||
|
||||
def test_custom_values(self, mcp_module):
|
||||
cfg = mcp_module.MCPServerBoxConfig.model_validate(
|
||||
{
|
||||
'image': 'node:20',
|
||||
'network': 'on',
|
||||
'host_path': '/home/user/mcp',
|
||||
'host_path_mode': 'rw',
|
||||
'env': {'FOO': 'bar'},
|
||||
'startup_timeout_sec': 60,
|
||||
'cpus': 2.0,
|
||||
'memory_mb': 1024,
|
||||
'pids_limit': 256,
|
||||
'read_only_rootfs': False,
|
||||
}
|
||||
)
|
||||
assert cfg.image == 'node:20'
|
||||
assert cfg.network == 'on'
|
||||
assert cfg.cpus == 2.0
|
||||
assert cfg.memory_mb == 1024
|
||||
|
||||
def test_extra_fields_ignored(self, mcp_module):
|
||||
cfg = mcp_module.MCPServerBoxConfig.model_validate(
|
||||
{
|
||||
'image': 'node:20',
|
||||
'unknown_field': 'whatever',
|
||||
}
|
||||
)
|
||||
assert cfg.image == 'node:20'
|
||||
assert not hasattr(cfg, 'unknown_field')
|
||||
|
||||
|
||||
# ── Path Rewriting ──────────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestRewritePath:
|
||||
def test_no_host_path_returns_unchanged(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
assert s._rewrite_path('/some/path', None) == '/some/path'
|
||||
|
||||
def test_empty_path_returns_empty(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
assert s._rewrite_path('', '/home/user/mcp') == ''
|
||||
|
||||
def test_prefix_match_rewrites(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
result = s._rewrite_path('/home/user/mcp/server.py', '/home/user/mcp')
|
||||
assert result == '/workspace/server.py'
|
||||
|
||||
def test_exact_match_rewrites_to_workspace(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
result = s._rewrite_path('/home/user/mcp', '/home/user/mcp')
|
||||
assert result == '/workspace'
|
||||
|
||||
def test_non_matching_path_unchanged(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
result = s._rewrite_path('/opt/other/server.py', '/home/user/mcp')
|
||||
assert result == '/opt/other/server.py'
|
||||
|
||||
def test_similar_prefix_not_rewritten(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
result = s._rewrite_path('/home/user/mcp-other/file.py', '/home/user/mcp')
|
||||
assert result == '/home/user/mcp-other/file.py'
|
||||
|
||||
def test_nested_subpath_rewrites(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
result = s._rewrite_path('/home/user/mcp/src/lib/main.py', '/home/user/mcp')
|
||||
assert result == '/workspace/src/lib/main.py'
|
||||
|
||||
|
||||
# ── host_path Inference ─────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestInferHostPath:
|
||||
def test_no_absolute_paths_returns_none(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': ['server.py'],
|
||||
},
|
||||
)
|
||||
assert s._infer_host_path() is None
|
||||
|
||||
def test_nonexistent_path_returns_none(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': '/nonexistent/path/to/python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
assert s._infer_host_path() is None
|
||||
|
||||
def test_existing_absolute_path_infers_directory(self, mcp_module):
|
||||
with tempfile.NamedTemporaryFile(suffix='.py') as f:
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [f.name],
|
||||
},
|
||||
)
|
||||
result = s._infer_host_path()
|
||||
assert result is not None
|
||||
assert result == os.path.dirname(os.path.realpath(f.name))
|
||||
|
||||
|
||||
# ── Build Box Session Payload ───────────────────────────────────────
|
||||
|
||||
|
||||
class TestBuildBoxSessionPayload:
|
||||
def test_minimal_config(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
payload = s._build_box_session_payload('session-123')
|
||||
assert payload['session_id'] == 'session-123'
|
||||
assert payload['workdir'] == '/workspace'
|
||||
assert payload['env'] == {}
|
||||
assert 'host_path' not in payload
|
||||
|
||||
def test_with_host_path(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
'box': {'host_path': '/home/user/mcp', 'host_path_mode': 'ro'},
|
||||
},
|
||||
)
|
||||
payload = s._build_box_session_payload('session-123')
|
||||
assert payload['host_path'] == '/home/user/mcp'
|
||||
assert payload['host_path_mode'] == 'ro'
|
||||
|
||||
def test_optional_fields_included_when_set(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
'box': {'image': 'node:20', 'cpus': 2.0, 'memory_mb': 1024, 'pids_limit': 256},
|
||||
},
|
||||
)
|
||||
payload = s._build_box_session_payload('session-123')
|
||||
assert payload['image'] == 'node:20'
|
||||
assert payload['cpus'] == 2.0
|
||||
assert payload['memory_mb'] == 1024
|
||||
assert payload['pids_limit'] == 256
|
||||
|
||||
def test_none_fields_excluded(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
payload = s._build_box_session_payload('session-123')
|
||||
assert 'image' not in payload
|
||||
assert 'cpus' not in payload
|
||||
|
||||
|
||||
# ── Build Box Process Payload ───────────────────────────────────────
|
||||
|
||||
|
||||
class TestBuildBoxProcessPayload:
|
||||
def test_basic_payload(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': ['server.py'],
|
||||
'env': {'KEY': 'val'},
|
||||
},
|
||||
)
|
||||
payload = s._build_box_process_payload()
|
||||
assert payload['command'] == 'python'
|
||||
assert payload['args'] == ['server.py']
|
||||
assert payload['env'] == {'KEY': 'val'}
|
||||
assert payload['cwd'] == '/workspace'
|
||||
|
||||
def test_path_rewriting_applied(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': '/home/user/mcp/venv/bin/python',
|
||||
'args': ['/home/user/mcp/server.py', '--config', '/home/user/mcp/config.json'],
|
||||
'env': {},
|
||||
'box': {'host_path': '/home/user/mcp'},
|
||||
},
|
||||
)
|
||||
payload = s._build_box_process_payload()
|
||||
# venv python is replaced with plain 'python' (deps installed in-container)
|
||||
assert payload['command'] == 'python'
|
||||
assert payload['args'] == ['/workspace/server.py', '--config', '/workspace/config.json']
|
||||
|
||||
def test_non_matching_args_not_rewritten(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': ['/opt/other/server.py', '--flag'],
|
||||
'env': {},
|
||||
'box': {'host_path': '/home/user/mcp'},
|
||||
},
|
||||
)
|
||||
payload = s._build_box_process_payload()
|
||||
assert payload['command'] == 'python'
|
||||
assert payload['args'] == ['/opt/other/server.py', '--flag']
|
||||
|
||||
|
||||
# ── get_runtime_info_dict ───────────────────────────────────────────
|
||||
|
||||
|
||||
class TestGetRuntimeInfoDict:
|
||||
def test_non_stdio_session(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'test-uuid',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
info = s.get_runtime_info_dict()
|
||||
assert info['status'] == 'connecting'
|
||||
assert 'box_session_id' not in info
|
||||
|
||||
def test_runtime_tools_include_parameters(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'test-uuid',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
s.functions = [
|
||||
SimpleNamespace(
|
||||
name='create-service',
|
||||
description='Create a service',
|
||||
parameters={
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'project_id': {'type': 'string'},
|
||||
},
|
||||
'required': ['project_id'],
|
||||
},
|
||||
)
|
||||
]
|
||||
|
||||
info = s.get_runtime_info_dict()
|
||||
|
||||
assert info['tools'][0]['parameters']['properties']['project_id']['type'] == 'string'
|
||||
assert info['tools'][0]['parameters']['required'] == ['project_id']
|
||||
|
||||
def test_stdio_session_includes_box_info(self, mcp_module):
|
||||
ap = _make_ap()
|
||||
ap.box_service.available = True
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'test-uuid',
|
||||
'mode': 'stdio',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
ap=ap,
|
||||
)
|
||||
info = s.get_runtime_info_dict()
|
||||
assert info['box_session_id'] == 'mcp-shared'
|
||||
assert info['box_enabled'] is True
|
||||
|
||||
def test_stdio_session_refuses_when_box_unavailable(self, mcp_module):
|
||||
"""Policy: when Box is configured but unavailable (disabled in config
|
||||
OR connection failed), stdio MCP servers are NOT treated as box-stdio.
|
||||
``_init_stdio_python_server`` will raise a clear refusal at start
|
||||
time; until then, the runtime info simply omits box_session_id so the
|
||||
UI can render the disabled state cleanly."""
|
||||
ap = _make_ap()
|
||||
ap.box_service.available = False
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'test-uuid',
|
||||
'mode': 'stdio',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
ap=ap,
|
||||
)
|
||||
info = s.get_runtime_info_dict()
|
||||
assert 'box_session_id' not in info
|
||||
assert 'box_enabled' not in info
|
||||
|
||||
def test_stdio_session_without_box_service_uses_local_stdio(self, mcp_module):
|
||||
ap = _make_ap()
|
||||
del ap.box_service
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'test-uuid',
|
||||
'mode': 'stdio',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
ap=ap,
|
||||
)
|
||||
info = s.get_runtime_info_dict()
|
||||
assert 'box_session_id' not in info
|
||||
|
||||
|
||||
# ── Box config parsing ──────────────────────────────────────────────
|
||||
|
||||
|
||||
class TestBoxConfigParsing:
|
||||
def test_box_config_parsed_from_server_config(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
'box': {'image': 'node:20', 'host_path': '/home/user/mcp'},
|
||||
},
|
||||
)
|
||||
assert isinstance(s.box_config, mcp_module.MCPServerBoxConfig)
|
||||
assert s.box_config.image == 'node:20'
|
||||
assert s.box_config.host_path == '/home/user/mcp'
|
||||
|
||||
def test_missing_box_key_uses_defaults(self, mcp_module):
|
||||
s = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'sse',
|
||||
'command': 'python',
|
||||
'args': [],
|
||||
},
|
||||
)
|
||||
assert isinstance(s.box_config, mcp_module.MCPServerBoxConfig)
|
||||
assert s.box_config.image is None
|
||||
assert s.box_config.host_path_mode == 'ro'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_init_box_stdio_server_stages_host_path_in_shared_workspace(mcp_module, tmp_path):
|
||||
mcp_stdio_module = sys.modules['langbot.pkg.provider.tools.loaders.mcp_stdio']
|
||||
|
||||
class FakeClientSession:
|
||||
def __init__(self, *_args):
|
||||
pass
|
||||
|
||||
async def __aenter__(self):
|
||||
return self
|
||||
|
||||
async def __aexit__(self, exc_type, exc, tb):
|
||||
return False
|
||||
|
||||
async def initialize(self):
|
||||
return None
|
||||
|
||||
@asynccontextmanager
|
||||
async def fake_websocket_client(_url: str):
|
||||
yield ('read-stream', 'write-stream')
|
||||
|
||||
mcp_stdio_module.ClientSession = FakeClientSession
|
||||
mcp_stdio_module.websocket_client = fake_websocket_client
|
||||
|
||||
ap = _make_ap()
|
||||
ap.box_service.available = True
|
||||
ap.box_service.default_workspace = str(tmp_path / 'shared-box-workspace')
|
||||
ap.box_service.create_session = AsyncMock(return_value={})
|
||||
ap.box_service.build_spec = Mock(return_value='validated-spec')
|
||||
ap.box_service.client = SimpleNamespace(
|
||||
execute=AsyncMock(return_value=SimpleNamespace(ok=True, stderr='', exit_code=0))
|
||||
)
|
||||
ap.box_service.start_managed_process = AsyncMock(return_value={})
|
||||
ap.box_service.get_managed_process_websocket_url = Mock(return_value='ws://box.example/process')
|
||||
|
||||
host_path = tmp_path / 'mcp-source'
|
||||
host_path.mkdir()
|
||||
server_file = host_path / 'server.py'
|
||||
server_file.write_text('print("hello")\n', encoding='utf-8')
|
||||
|
||||
session = _make_session(
|
||||
mcp_module,
|
||||
{
|
||||
'name': 'test',
|
||||
'uuid': 'u1',
|
||||
'mode': 'stdio',
|
||||
'command': str(host_path / '.venv' / 'bin' / 'python'),
|
||||
'args': [str(server_file)],
|
||||
'box': {'host_path': str(host_path)},
|
||||
},
|
||||
ap=ap,
|
||||
)
|
||||
|
||||
await session._init_box_stdio_server()
|
||||
await session.exit_stack.aclose()
|
||||
|
||||
assert ap.box_service.create_session.await_count == 1
|
||||
session_payload = ap.box_service.create_session.await_args.args[0]
|
||||
assert session_payload['session_id'] == 'mcp-shared'
|
||||
assert 'host_path' not in session_payload
|
||||
assert ap.box_service.build_spec.call_count == 1
|
||||
assert ap.box_service.build_spec.call_args.kwargs.get('skip_host_mount_validation', False) is False
|
||||
assert ap.box_service.build_spec.call_args.args[0]['host_path'] == str(host_path)
|
||||
|
||||
staged_file = tmp_path / 'shared-box-workspace' / '.mcp' / 'u1' / 'workspace' / 'server.py'
|
||||
assert staged_file.read_text(encoding='utf-8') == 'print("hello")\n'
|
||||
|
||||
process_payload = ap.box_service.start_managed_process.await_args.args[1]
|
||||
assert process_payload['process_id'] == 'u1'
|
||||
assert process_payload['command'] == 'python'
|
||||
assert process_payload['args'] == ['/workspace/.mcp/u1/workspace/server.py']
|
||||
assert process_payload['cwd'] == '/workspace/.mcp/u1/workspace'
|
||||
@@ -169,6 +169,7 @@ async def test_updated_llm_model_is_immediately_usable_by_local_agent_pipeline()
|
||||
ap.logger = Mock()
|
||||
ap.persistence_mgr = SimpleNamespace(execute_async=AsyncMock())
|
||||
ap.tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
|
||||
ap.skill_mgr = None # PreProcessor only uses skill_mgr for the local-agent skill-binding branch
|
||||
ap.plugin_connector = SimpleNamespace(
|
||||
emit_event=AsyncMock(return_value=SimpleNamespace(event=SimpleNamespace(default_prompt=[], prompt=[])))
|
||||
)
|
||||
|
||||
479
tests/unit_tests/provider/test_skill_tools.py
Normal file
479
tests/unit_tests/provider/test_skill_tools.py
Normal file
@@ -0,0 +1,479 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import tempfile
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
def _make_ap(logger=None):
|
||||
ap = SimpleNamespace()
|
||||
ap.logger = logger or Mock()
|
||||
ap.persistence_mgr = Mock()
|
||||
ap.persistence_mgr.execute_async = AsyncMock(return_value=Mock(all=Mock(return_value=[])))
|
||||
ap.persistence_mgr.serialize_model = Mock(side_effect=lambda cls, row: row)
|
||||
return ap
|
||||
|
||||
|
||||
def _make_skill_data(
|
||||
name='test-skill',
|
||||
instructions='Do something',
|
||||
package_root='',
|
||||
entry_file='SKILL.md',
|
||||
**kwargs,
|
||||
):
|
||||
return {
|
||||
'name': name,
|
||||
'display_name': kwargs.pop('display_name', name),
|
||||
'description': kwargs.pop('description', f'Description of {name}'),
|
||||
'instructions': instructions,
|
||||
'package_root': package_root,
|
||||
'entry_file': entry_file,
|
||||
**kwargs,
|
||||
}
|
||||
|
||||
|
||||
class TestSkillManagerCache:
|
||||
"""The Box runtime is the only source of truth — SkillManager just holds
|
||||
an in-memory cache populated by ``reload_skills``. There is no local
|
||||
filesystem reader anymore."""
|
||||
|
||||
def test_refresh_skill_from_disk_reports_cache_presence(self):
|
||||
"""Box is the only source of truth for skill content. refresh_skill_from_disk
|
||||
now just reports whether the skill is still in the in-memory cache —
|
||||
the actual content refresh is driven by SkillService awaiting
|
||||
``reload_skills`` after every Box mutation."""
|
||||
from langbot.pkg.skill.manager import SkillManager
|
||||
|
||||
ap = _make_ap()
|
||||
mgr = SkillManager(ap)
|
||||
|
||||
# Empty cache → returns False
|
||||
assert mgr.refresh_skill_from_disk('test-skill') is False
|
||||
|
||||
# Cache populated → returns True; method does NOT mutate the cache
|
||||
cached = _make_skill_data(name='test-skill', instructions='Cached')
|
||||
mgr.skills['test-skill'] = cached
|
||||
assert mgr.refresh_skill_from_disk('test-skill') is True
|
||||
assert mgr.skills['test-skill'] is cached
|
||||
assert mgr.refresh_skill_from_disk('') is False
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_reload_skills_drops_box_skills_with_missing_package_root(self):
|
||||
"""When Box reports a skill whose package_root is gone from the
|
||||
LangBot-visible filesystem, the cache must drop it instead of
|
||||
keeping a stale entry that would later produce a bad mount."""
|
||||
from langbot.pkg.skill.manager import SkillManager
|
||||
|
||||
with tempfile.TemporaryDirectory() as live_dir:
|
||||
ghost_dir = os.path.join(live_dir, '_does_not_exist')
|
||||
box_service = SimpleNamespace(
|
||||
available=True,
|
||||
list_skills=AsyncMock(
|
||||
return_value=[
|
||||
_make_skill_data(name='alive', package_root=live_dir),
|
||||
_make_skill_data(name='ghost', package_root=ghost_dir),
|
||||
]
|
||||
),
|
||||
)
|
||||
|
||||
ap = _make_ap()
|
||||
ap.box_service = box_service
|
||||
mgr = SkillManager(ap)
|
||||
|
||||
await mgr.reload_skills()
|
||||
|
||||
assert list(mgr.skills) == ['alive']
|
||||
# Warning fired with the dropped skill name so operators can see it.
|
||||
warning_messages = [str(call.args[0]) for call in ap.logger.warning.call_args_list]
|
||||
assert any('ghost' in msg and 'package_root missing' in msg for msg in warning_messages)
|
||||
|
||||
|
||||
class TestSkillActivationHelper:
|
||||
"""Skill activation is now Tool-Call based.
|
||||
|
||||
The legacy text-marker mechanism (``[ACTIVATE_SKILL: x]`` detection,
|
||||
``build_activation_prompt_for_skills``, ``remove_activation_marker``,
|
||||
``prepare_skill_activation``) has been removed. Activation now goes
|
||||
through ``skill.activation.register_activated_skill``, invoked by the
|
||||
``activate`` Tool Call.
|
||||
"""
|
||||
|
||||
def test_register_activated_skill_records_known_skill(self):
|
||||
from langbot.pkg.skill.activation import register_activated_skill
|
||||
from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
|
||||
from langbot.pkg.skill.manager import SkillManager
|
||||
|
||||
ap = _make_ap()
|
||||
mgr = SkillManager(ap)
|
||||
mgr.skills = {
|
||||
'primary': _make_skill_data(name='primary', instructions='Primary instructions'),
|
||||
}
|
||||
ap.skill_mgr = mgr
|
||||
|
||||
query = SimpleNamespace(variables={})
|
||||
|
||||
assert register_activated_skill(ap, query, 'primary') is True
|
||||
assert set(query.variables[ACTIVATED_SKILLS_KEY].keys()) == {'primary'}
|
||||
assert query.variables[ACTIVATED_SKILLS_KEY]['primary']['name'] == 'primary'
|
||||
|
||||
def test_register_activated_skill_rejects_unknown_skill(self):
|
||||
from langbot.pkg.skill.activation import register_activated_skill
|
||||
from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
|
||||
from langbot.pkg.skill.manager import SkillManager
|
||||
|
||||
ap = _make_ap()
|
||||
mgr = SkillManager(ap)
|
||||
mgr.skills = {'primary': _make_skill_data(name='primary')}
|
||||
ap.skill_mgr = mgr
|
||||
|
||||
query = SimpleNamespace(variables={})
|
||||
|
||||
assert register_activated_skill(ap, query, 'missing') is False
|
||||
assert ACTIVATED_SKILLS_KEY not in query.variables
|
||||
|
||||
def test_register_activated_skill_without_skill_manager_returns_false(self):
|
||||
from langbot.pkg.skill.activation import register_activated_skill
|
||||
|
||||
ap = _make_ap() # no skill_mgr attribute
|
||||
query = SimpleNamespace(variables={})
|
||||
|
||||
assert register_activated_skill(ap, query, 'primary') is False
|
||||
|
||||
|
||||
class TestSkillPathHelpers:
|
||||
def test_get_visible_skills_filters_by_bound_names(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY, get_visible_skills
|
||||
|
||||
ap = _make_ap()
|
||||
ap.skill_mgr = SimpleNamespace(
|
||||
skills={
|
||||
'visible': _make_skill_data(name='visible'),
|
||||
'hidden': _make_skill_data(name='hidden'),
|
||||
}
|
||||
)
|
||||
query = SimpleNamespace(variables={PIPELINE_BOUND_SKILLS_KEY: ['visible']})
|
||||
|
||||
result = get_visible_skills(ap, query)
|
||||
|
||||
assert list(result.keys()) == ['visible']
|
||||
|
||||
def test_resolve_virtual_skill_path_allows_visible_skill_reads(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill import (
|
||||
PIPELINE_BOUND_SKILLS_KEY,
|
||||
resolve_virtual_skill_path,
|
||||
)
|
||||
|
||||
ap = _make_ap()
|
||||
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo')})
|
||||
query = SimpleNamespace(variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']})
|
||||
|
||||
skill, rewritten = resolve_virtual_skill_path(
|
||||
ap,
|
||||
query,
|
||||
'/workspace/.skills/demo/SKILL.md',
|
||||
include_visible=True,
|
||||
include_activated=False,
|
||||
)
|
||||
|
||||
assert skill['name'] == 'demo'
|
||||
assert rewritten == '/workspace/SKILL.md'
|
||||
|
||||
def test_build_skill_session_id_uses_name_based_identifier(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill import build_skill_session_id
|
||||
|
||||
with_launcher = build_skill_session_id(
|
||||
{'name': 'writer'},
|
||||
SimpleNamespace(query_id=42, launcher_type='person', launcher_id='123'),
|
||||
)
|
||||
fallback = build_skill_session_id({'name': 'writer'}, SimpleNamespace(query_id=99))
|
||||
|
||||
assert with_launcher == 'skill-person_123-writer'
|
||||
assert fallback == 'skill-99-writer'
|
||||
|
||||
def test_should_prepare_skill_python_env_detects_manifests_and_venv(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill import should_prepare_skill_python_env
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
assert should_prepare_skill_python_env(tmpdir) is False
|
||||
|
||||
with open(os.path.join(tmpdir, 'requirements.txt'), 'w', encoding='utf-8') as f:
|
||||
f.write('requests==2.32.0\n')
|
||||
assert should_prepare_skill_python_env(tmpdir) is True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
os.makedirs(os.path.join(tmpdir, '.venv'))
|
||||
assert should_prepare_skill_python_env(tmpdir) is True
|
||||
|
||||
def test_wrap_skill_command_with_python_env_bootstraps_then_runs_command(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill import wrap_skill_command_with_python_env
|
||||
|
||||
command = wrap_skill_command_with_python_env('python scripts/run.py')
|
||||
|
||||
assert 'python -m venv "$_LB_VENV_DIR"' in command
|
||||
assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
|
||||
assert command.rstrip().endswith('python scripts/run.py')
|
||||
|
||||
|
||||
class TestSkillToolLoader:
|
||||
"""The skill tool surface is now just ``activate`` + ``register_skill``.
|
||||
|
||||
The legacy CRUD authoring tools (create/list/get/update/delete/
|
||||
import_skill_from_directory/reload_skills) were removed; skill CRUD is
|
||||
handled by SkillService via the HTTP API / web UI instead.
|
||||
"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_activate_returns_instructions_and_registers_skill(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import (
|
||||
ACTIVATE_SKILL_TOOL_NAME,
|
||||
SkillToolLoader,
|
||||
)
|
||||
from langbot.pkg.provider.tools.loaders.skill import ACTIVATED_SKILLS_KEY
|
||||
|
||||
skill = _make_skill_data(name='demo', package_root='/data/skills/demo', instructions='Step 1')
|
||||
ap = _make_ap()
|
||||
ap.skill_mgr = SimpleNamespace(
|
||||
skills={'demo': skill},
|
||||
get_skill_by_name=lambda name: skill if name == 'demo' else None,
|
||||
)
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
query = SimpleNamespace(variables={})
|
||||
|
||||
result = await loader.invoke_tool(ACTIVATE_SKILL_TOOL_NAME, {'skill_name': 'demo'}, query)
|
||||
|
||||
assert result['activated'] is True
|
||||
assert result['skill_name'] == 'demo'
|
||||
assert result['mount_path'] == '/workspace/.skills/demo'
|
||||
assert 'Step 1' in result['content']
|
||||
assert set(query.variables[ACTIVATED_SKILLS_KEY].keys()) == {'demo'}
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_activate_unknown_skill_raises(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import (
|
||||
ACTIVATE_SKILL_TOOL_NAME,
|
||||
SkillToolLoader,
|
||||
)
|
||||
|
||||
ap = _make_ap()
|
||||
ap.skill_mgr = SimpleNamespace(
|
||||
skills={'demo': _make_skill_data(name='demo')},
|
||||
get_skill_by_name=lambda name: None,
|
||||
)
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
|
||||
with pytest.raises(ValueError, match='not found'):
|
||||
await loader.invoke_tool(
|
||||
ACTIVATE_SKILL_TOOL_NAME,
|
||||
{'skill_name': 'ghost'},
|
||||
SimpleNamespace(variables={}),
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_register_skill_scans_directory_and_creates_skill(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import (
|
||||
REGISTER_SKILL_TOOL_NAME,
|
||||
SkillToolLoader,
|
||||
)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
repo_dir = os.path.join(tmpdir, 'repo')
|
||||
os.makedirs(repo_dir)
|
||||
|
||||
ap = _make_ap()
|
||||
ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
|
||||
ap.skill_service = SimpleNamespace(
|
||||
scan_directory_async=AsyncMock(
|
||||
return_value={
|
||||
'name': 'cloned-skill',
|
||||
'display_name': 'Cloned Skill',
|
||||
'description': 'Imported from clone',
|
||||
'instructions': 'Do work',
|
||||
}
|
||||
),
|
||||
create_skill=AsyncMock(
|
||||
return_value=_make_skill_data(name='cloned-skill', package_root=os.path.realpath(repo_dir))
|
||||
),
|
||||
)
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
result = await loader.invoke_tool(
|
||||
REGISTER_SKILL_TOOL_NAME,
|
||||
{'path': '/workspace/repo'},
|
||||
SimpleNamespace(),
|
||||
)
|
||||
|
||||
ap.skill_service.scan_directory_async.assert_awaited_once_with(os.path.realpath(repo_dir))
|
||||
ap.skill_service.create_skill.assert_awaited_once_with(
|
||||
{
|
||||
'name': 'cloned-skill',
|
||||
'display_name': 'Cloned Skill',
|
||||
'description': 'Imported from clone',
|
||||
'instructions': 'Do work',
|
||||
'package_root': os.path.realpath(repo_dir),
|
||||
}
|
||||
)
|
||||
assert result['registered'] is True
|
||||
assert result['skill_name'] == 'cloned-skill'
|
||||
assert result['source_path'] == '/workspace/repo'
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_register_skill_rejects_workspace_escape(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import (
|
||||
REGISTER_SKILL_TOOL_NAME,
|
||||
SkillToolLoader,
|
||||
)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
ap = _make_ap()
|
||||
ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
|
||||
ap.skill_service = SimpleNamespace(scan_directory_async=AsyncMock(), create_skill=AsyncMock())
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
|
||||
with pytest.raises(ValueError, match='escapes the workspace boundary'):
|
||||
await loader.invoke_tool(
|
||||
REGISTER_SKILL_TOOL_NAME,
|
||||
{'path': '/workspace/../../etc'},
|
||||
SimpleNamespace(),
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_register_skill_requires_skill_service(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import (
|
||||
REGISTER_SKILL_TOOL_NAME,
|
||||
SkillToolLoader,
|
||||
)
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
ap = _make_ap() # no skill_service attribute
|
||||
ap.box_service = SimpleNamespace(default_workspace=tmpdir, available=True)
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
|
||||
with pytest.raises(ValueError, match='Skill service not available'):
|
||||
await loader.invoke_tool(
|
||||
REGISTER_SKILL_TOOL_NAME,
|
||||
{'path': '/workspace/foo'},
|
||||
SimpleNamespace(),
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tools_hidden_when_sandbox_backend_unavailable(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import SkillToolLoader
|
||||
|
||||
ap = _make_ap()
|
||||
ap.skill_mgr = SimpleNamespace(skills={})
|
||||
ap.box_service = SimpleNamespace(
|
||||
available=True,
|
||||
get_status=AsyncMock(return_value={'backend': {'available': False}}),
|
||||
)
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
await loader.initialize()
|
||||
|
||||
assert await loader.get_tools() == []
|
||||
assert await loader.has_tool('activate') is False
|
||||
assert await loader.has_tool('register_skill') is False
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tools_exposed_when_sandbox_backend_available(self):
|
||||
from langbot.pkg.provider.tools.loaders.skill_authoring import SkillToolLoader
|
||||
|
||||
ap = _make_ap()
|
||||
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo')})
|
||||
ap.box_service = SimpleNamespace(
|
||||
available=True,
|
||||
get_status=AsyncMock(return_value={'backend': {'available': True}}),
|
||||
)
|
||||
|
||||
loader = SkillToolLoader(ap)
|
||||
await loader.initialize()
|
||||
|
||||
tools = await loader.get_tools()
|
||||
|
||||
assert sorted(tool.name for tool in tools) == ['activate', 'register_skill']
|
||||
assert await loader.has_tool('activate') is True
|
||||
assert await loader.has_tool('register_skill') is True
|
||||
|
||||
|
||||
class TestNativeToolLoaderSkillPaths:
|
||||
@pytest.mark.asyncio
|
||||
async def test_read_visible_skill_file(self):
|
||||
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
|
||||
from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
skill_md = os.path.join(tmpdir, 'SKILL.md')
|
||||
with open(skill_md, 'w', encoding='utf-8') as f:
|
||||
f.write('demo instructions')
|
||||
|
||||
ap = _make_ap()
|
||||
ap.box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
|
||||
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo', package_root=tmpdir)})
|
||||
loader = NativeToolLoader(ap)
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'read',
|
||||
{'path': '/workspace/.skills/demo/SKILL.md'},
|
||||
SimpleNamespace(query_id='q1', variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']}),
|
||||
)
|
||||
|
||||
assert result == {'ok': True, 'content': 'demo instructions'}
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_exec_in_activated_skill_mount_rewrites_command_and_refreshes(self):
|
||||
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
|
||||
from langbot.pkg.provider.tools.loaders.skill import register_activated_skill
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
ap = _make_ap()
|
||||
ap.box_service = SimpleNamespace(
|
||||
available=True,
|
||||
default_workspace=tmpdir,
|
||||
execute_tool=AsyncMock(return_value={'ok': True}),
|
||||
)
|
||||
ap.skill_mgr = SimpleNamespace(refresh_skill_from_disk=Mock())
|
||||
loader = NativeToolLoader(ap)
|
||||
|
||||
query = SimpleNamespace(query_id='q1', launcher_type='person', launcher_id='123', variables={})
|
||||
register_activated_skill(query, _make_skill_data(name='demo', package_root=tmpdir))
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'exec',
|
||||
{
|
||||
'command': 'python /workspace/.skills/demo/scripts/run.py',
|
||||
'workdir': '/workspace/.skills/demo',
|
||||
},
|
||||
query,
|
||||
)
|
||||
|
||||
assert result == {'ok': True}
|
||||
tool_parameters = ap.box_service.execute_tool.await_args.args[0]
|
||||
assert tool_parameters['command'] == 'python /workspace/.skills/demo/scripts/run.py'
|
||||
assert tool_parameters['workdir'] == '/workspace/.skills/demo'
|
||||
ap.skill_mgr.refresh_skill_from_disk.assert_called_once_with('demo')
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_write_requires_skill_activation(self):
|
||||
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
|
||||
from langbot.pkg.provider.tools.loaders.skill import PIPELINE_BOUND_SKILLS_KEY
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
ap = _make_ap()
|
||||
ap.box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
|
||||
ap.skill_mgr = SimpleNamespace(skills={'demo': _make_skill_data(name='demo', package_root=tmpdir)})
|
||||
loader = NativeToolLoader(ap)
|
||||
|
||||
query = SimpleNamespace(query_id='q1', variables={PIPELINE_BOUND_SKILLS_KEY: ['demo']})
|
||||
|
||||
with pytest.raises(ValueError, match='Skill "demo" is not available at this path'):
|
||||
await loader.invoke_tool(
|
||||
'write',
|
||||
{'path': '/workspace/.skills/demo/notes.txt', 'content': 'hi'},
|
||||
query,
|
||||
)
|
||||
@@ -4,6 +4,7 @@ Tests cover:
|
||||
- Tool schema generation for OpenAI and Anthropic
|
||||
- Tool execution dispatch
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
@@ -52,11 +53,12 @@ class TestToolManagerSchemaGeneration:
|
||||
@pytest.fixture
|
||||
def sample_tools(self):
|
||||
"""Create sample LLMTool list for testing."""
|
||||
|
||||
def dummy_weather_func(**kwargs):
|
||||
return "weather result"
|
||||
return 'weather result'
|
||||
|
||||
def dummy_calc_func(**kwargs):
|
||||
return "calc result"
|
||||
return 'calc result'
|
||||
|
||||
tools = [
|
||||
resource_tool.LLMTool(
|
||||
@@ -65,15 +67,10 @@ class TestToolManagerSchemaGeneration:
|
||||
description='Get current weather for a location',
|
||||
parameters={
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'location': {
|
||||
'type': 'string',
|
||||
'description': 'City name'
|
||||
}
|
||||
},
|
||||
'required': ['location']
|
||||
'properties': {'location': {'type': 'string', 'description': 'City name'}},
|
||||
'required': ['location'],
|
||||
},
|
||||
func=dummy_weather_func
|
||||
func=dummy_weather_func,
|
||||
),
|
||||
resource_tool.LLMTool(
|
||||
name='calculate',
|
||||
@@ -81,15 +78,10 @@ class TestToolManagerSchemaGeneration:
|
||||
description='Perform a calculation',
|
||||
parameters={
|
||||
'type': 'object',
|
||||
'properties': {
|
||||
'expression': {
|
||||
'type': 'string',
|
||||
'description': 'Math expression'
|
||||
}
|
||||
},
|
||||
'required': ['expression']
|
||||
'properties': {'expression': {'type': 'string', 'description': 'Math expression'}},
|
||||
'required': ['expression'],
|
||||
},
|
||||
func=dummy_calc_func
|
||||
func=dummy_calc_func,
|
||||
),
|
||||
]
|
||||
return tools
|
||||
@@ -188,26 +180,48 @@ class TestToolManagerExecuteFuncCall:
|
||||
|
||||
@pytest.fixture
|
||||
def mock_app_with_loaders(self):
|
||||
"""Create mock app with mock tool loaders."""
|
||||
"""Create mock app with mock tool loaders.
|
||||
|
||||
Returns (app, plugin_loader, mcp_loader). The native and skill loaders
|
||||
are attached directly to the app for tests that don't need to assert
|
||||
against them — they all default to ``has_tool == False`` so the
|
||||
execute_func_call probe falls through to the plugin/mcp pair.
|
||||
"""
|
||||
mock_app = Mock()
|
||||
mock_app.logger = Mock()
|
||||
|
||||
def _make_inert_loader():
|
||||
loader = Mock()
|
||||
loader.has_tool = AsyncMock(return_value=False)
|
||||
loader.invoke_tool = AsyncMock(return_value=None)
|
||||
loader.initialize = AsyncMock()
|
||||
loader.shutdown = AsyncMock()
|
||||
return loader
|
||||
|
||||
# Create mock plugin loader
|
||||
mock_plugin_loader = Mock()
|
||||
mock_plugin_loader.has_tool = AsyncMock(return_value=False)
|
||||
mock_plugin_loader = _make_inert_loader()
|
||||
mock_plugin_loader.invoke_tool = AsyncMock(return_value='plugin_result')
|
||||
mock_plugin_loader.initialize = AsyncMock()
|
||||
mock_plugin_loader.shutdown = AsyncMock()
|
||||
|
||||
# Create mock MCP loader
|
||||
mock_mcp_loader = Mock()
|
||||
mock_mcp_loader.has_tool = AsyncMock(return_value=False)
|
||||
mock_mcp_loader = _make_inert_loader()
|
||||
mock_mcp_loader.invoke_tool = AsyncMock(return_value='mcp_result')
|
||||
mock_mcp_loader.initialize = AsyncMock()
|
||||
mock_mcp_loader.shutdown = AsyncMock()
|
||||
|
||||
# Stash inert native/skill loaders so the ToolManager probe order
|
||||
# (native → plugin → mcp → skill) doesn't AttributeError. Tests that
|
||||
# need to override these can replace the attributes on the manager.
|
||||
mock_app._inert_native_loader = _make_inert_loader()
|
||||
mock_app._inert_skill_loader = _make_inert_loader()
|
||||
|
||||
return mock_app, mock_plugin_loader, mock_mcp_loader
|
||||
|
||||
@staticmethod
|
||||
def _wire_loaders(manager, mock_app, plugin_loader, mcp_loader):
|
||||
"""Attach all four loaders (native + plugin + mcp + skill) to manager."""
|
||||
manager.native_tool_loader = mock_app._inert_native_loader
|
||||
manager.plugin_tool_loader = plugin_loader
|
||||
manager.mcp_tool_loader = mcp_loader
|
||||
manager.skill_tool_loader = mock_app._inert_skill_loader
|
||||
|
||||
@pytest.fixture
|
||||
def sample_query(self):
|
||||
"""Create sample query for testing."""
|
||||
@@ -215,9 +229,7 @@ class TestToolManagerExecuteFuncCall:
|
||||
return query
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_calls_plugin_loader_when_has_tool(
|
||||
self, mock_app_with_loaders, sample_query
|
||||
):
|
||||
async def test_execute_calls_plugin_loader_when_has_tool(self, mock_app_with_loaders, sample_query):
|
||||
"""Test that execute_func_call uses plugin loader when tool exists there."""
|
||||
toolmgr = get_toolmgr_module()
|
||||
|
||||
@@ -225,26 +237,17 @@ class TestToolManagerExecuteFuncCall:
|
||||
mock_plugin_loader.has_tool = AsyncMock(return_value=True)
|
||||
|
||||
manager = toolmgr.ToolManager(mock_app)
|
||||
manager.plugin_tool_loader = mock_plugin_loader
|
||||
manager.mcp_tool_loader = mock_mcp_loader
|
||||
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
|
||||
|
||||
result = await manager.execute_func_call(
|
||||
'test_tool',
|
||||
{'param': 'value'},
|
||||
sample_query
|
||||
)
|
||||
result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)
|
||||
|
||||
assert result == 'plugin_result'
|
||||
mock_plugin_loader.invoke_tool.assert_called_once_with(
|
||||
'test_tool', {'param': 'value'}, sample_query
|
||||
)
|
||||
mock_plugin_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)
|
||||
# MCP loader should not be called
|
||||
mock_mcp_loader.invoke_tool.assert_not_called()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_calls_mcp_loader_when_plugin_not_found(
|
||||
self, mock_app_with_loaders, sample_query
|
||||
):
|
||||
async def test_execute_calls_mcp_loader_when_plugin_not_found(self, mock_app_with_loaders, sample_query):
|
||||
"""Test that execute_func_call uses MCP loader when plugin doesn't have tool."""
|
||||
toolmgr = get_toolmgr_module()
|
||||
|
||||
@@ -253,24 +256,15 @@ class TestToolManagerExecuteFuncCall:
|
||||
mock_mcp_loader.has_tool = AsyncMock(return_value=True)
|
||||
|
||||
manager = toolmgr.ToolManager(mock_app)
|
||||
manager.plugin_tool_loader = mock_plugin_loader
|
||||
manager.mcp_tool_loader = mock_mcp_loader
|
||||
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
|
||||
|
||||
result = await manager.execute_func_call(
|
||||
'test_tool',
|
||||
{'param': 'value'},
|
||||
sample_query
|
||||
)
|
||||
result = await manager.execute_func_call('test_tool', {'param': 'value'}, sample_query)
|
||||
|
||||
assert result == 'mcp_result'
|
||||
mock_mcp_loader.invoke_tool.assert_called_once_with(
|
||||
'test_tool', {'param': 'value'}, sample_query
|
||||
)
|
||||
mock_mcp_loader.invoke_tool.assert_called_once_with('test_tool', {'param': 'value'}, sample_query)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_execute_raises_when_tool_not_found(
|
||||
self, mock_app_with_loaders, sample_query
|
||||
):
|
||||
async def test_execute_raises_when_tool_not_found(self, mock_app_with_loaders, sample_query):
|
||||
"""Test that execute_func_call raises ValueError when tool not found."""
|
||||
toolmgr = get_toolmgr_module()
|
||||
|
||||
@@ -279,20 +273,13 @@ class TestToolManagerExecuteFuncCall:
|
||||
mock_mcp_loader.has_tool = AsyncMock(return_value=False)
|
||||
|
||||
manager = toolmgr.ToolManager(mock_app)
|
||||
manager.plugin_tool_loader = mock_plugin_loader
|
||||
manager.mcp_tool_loader = mock_mcp_loader
|
||||
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
|
||||
|
||||
with pytest.raises(ValueError, match='未找到工具'):
|
||||
await manager.execute_func_call(
|
||||
'unknown_tool',
|
||||
{},
|
||||
sample_query
|
||||
)
|
||||
await manager.execute_func_call('unknown_tool', {}, sample_query)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_plugin_loader_checked_first(
|
||||
self, mock_app_with_loaders, sample_query
|
||||
):
|
||||
async def test_plugin_loader_checked_first(self, mock_app_with_loaders, sample_query):
|
||||
"""Test that plugin loader is checked before MCP loader."""
|
||||
toolmgr = get_toolmgr_module()
|
||||
|
||||
@@ -302,8 +289,7 @@ class TestToolManagerExecuteFuncCall:
|
||||
mock_mcp_loader.has_tool = AsyncMock(return_value=True)
|
||||
|
||||
manager = toolmgr.ToolManager(mock_app)
|
||||
manager.plugin_tool_loader = mock_plugin_loader
|
||||
manager.mcp_tool_loader = mock_mcp_loader
|
||||
self._wire_loaders(manager, mock_app, mock_plugin_loader, mock_mcp_loader)
|
||||
|
||||
await manager.execute_func_call('test_tool', {}, sample_query)
|
||||
|
||||
@@ -317,20 +303,30 @@ class TestToolManagerShutdown:
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_shutdown_calls_loader_shutdown(self):
|
||||
"""Test that shutdown calls shutdown on both loaders."""
|
||||
"""Test that shutdown calls shutdown on every registered loader."""
|
||||
toolmgr = get_toolmgr_module()
|
||||
|
||||
mock_app = Mock()
|
||||
mock_plugin_loader = Mock()
|
||||
mock_plugin_loader.shutdown = AsyncMock()
|
||||
mock_mcp_loader = Mock()
|
||||
mock_mcp_loader.shutdown = AsyncMock()
|
||||
|
||||
def _make_loader():
|
||||
loader = Mock()
|
||||
loader.shutdown = AsyncMock()
|
||||
return loader
|
||||
|
||||
mock_native_loader = _make_loader()
|
||||
mock_plugin_loader = _make_loader()
|
||||
mock_mcp_loader = _make_loader()
|
||||
mock_skill_loader = _make_loader()
|
||||
|
||||
manager = toolmgr.ToolManager(mock_app)
|
||||
manager.native_tool_loader = mock_native_loader
|
||||
manager.plugin_tool_loader = mock_plugin_loader
|
||||
manager.mcp_tool_loader = mock_mcp_loader
|
||||
manager.skill_tool_loader = mock_skill_loader
|
||||
|
||||
await manager.shutdown()
|
||||
|
||||
mock_native_loader.shutdown.assert_called_once()
|
||||
mock_plugin_loader.shutdown.assert_called_once()
|
||||
mock_mcp_loader.shutdown.assert_called_once()
|
||||
mock_mcp_loader.shutdown.assert_called_once()
|
||||
mock_skill_loader.shutdown.assert_called_once()
|
||||
|
||||
250
tests/unit_tests/provider/test_tool_manager_native.py
Normal file
250
tests/unit_tests/provider/test_tool_manager_native.py
Normal file
@@ -0,0 +1,250 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import tempfile
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
import pytest
|
||||
|
||||
import langbot_plugin.api.entities.builtin.resource.tool as resource_tool
|
||||
|
||||
from langbot.pkg.provider.tools.loaders.native import NativeToolLoader
|
||||
from langbot.pkg.provider.tools.toolmgr import ToolManager
|
||||
|
||||
|
||||
class StubLoader:
|
||||
def __init__(self, tools: list[resource_tool.LLMTool] | None = None, invoke_result=None):
|
||||
self._tools = tools or []
|
||||
self._invoke_result = invoke_result
|
||||
|
||||
async def get_tools(self, *_args, **_kwargs):
|
||||
return self._tools
|
||||
|
||||
async def has_tool(self, name: str) -> bool:
|
||||
return any(tool.name == name for tool in self._tools)
|
||||
|
||||
async def invoke_tool(self, name: str, parameters: dict, query):
|
||||
return self._invoke_result(name, parameters, query) if callable(self._invoke_result) else self._invoke_result
|
||||
|
||||
async def shutdown(self):
|
||||
return None
|
||||
|
||||
|
||||
def make_tool(name: str) -> resource_tool.LLMTool:
|
||||
return resource_tool.LLMTool(
|
||||
name=name,
|
||||
human_desc=name,
|
||||
description=name,
|
||||
parameters={'type': 'object', 'properties': {}},
|
||||
func=lambda parameters: parameters,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_manager_omits_skill_authoring_tools_by_default():
|
||||
manager = ToolManager(SimpleNamespace())
|
||||
manager.native_tool_loader = StubLoader([make_tool('exec')])
|
||||
manager.skill_tool_loader = StubLoader([make_tool('activate')])
|
||||
manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
|
||||
manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
|
||||
|
||||
tools = await manager.get_all_tools()
|
||||
|
||||
assert [tool.name for tool in tools] == ['exec', 'plugin_tool', 'mcp_tool']
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_manager_includes_skill_authoring_tools_when_requested():
|
||||
manager = ToolManager(SimpleNamespace())
|
||||
manager.native_tool_loader = StubLoader([make_tool('exec')])
|
||||
manager.skill_tool_loader = StubLoader([make_tool('activate')])
|
||||
manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
|
||||
manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
|
||||
|
||||
tools = await manager.get_all_tools(include_skill_authoring=True)
|
||||
|
||||
assert [tool.name for tool in tools] == ['exec', 'activate', 'plugin_tool', 'mcp_tool']
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_tool_manager_routes_native_tool_calls():
|
||||
app = SimpleNamespace()
|
||||
manager = ToolManager(app)
|
||||
manager.native_tool_loader = StubLoader([make_tool('exec')], invoke_result={'backend': 'fake'})
|
||||
manager.skill_tool_loader = StubLoader([make_tool('activate')])
|
||||
manager.plugin_tool_loader = StubLoader([make_tool('plugin_tool')])
|
||||
manager.mcp_tool_loader = StubLoader([make_tool('mcp_tool')])
|
||||
|
||||
result = await manager.execute_func_call('exec', {'command': 'pwd'}, query=Mock())
|
||||
|
||||
assert result == {'backend': 'fake'}
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_native_tool_loader_hides_tools_when_box_unavailable():
|
||||
loader = NativeToolLoader(SimpleNamespace(box_service=SimpleNamespace(available=False)))
|
||||
|
||||
assert await loader.get_tools() == []
|
||||
for tool_name in ('exec', 'read', 'write', 'edit', 'glob', 'grep'):
|
||||
assert await loader.has_tool(tool_name) is False
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_native_tool_loader_exposes_all_tools_when_box_available():
|
||||
box_service = SimpleNamespace(
|
||||
available=True,
|
||||
get_status=AsyncMock(return_value={'backend': {'available': True}}),
|
||||
)
|
||||
loader = NativeToolLoader(SimpleNamespace(box_service=box_service, logger=Mock()))
|
||||
await loader.initialize()
|
||||
|
||||
tools = await loader.get_tools()
|
||||
|
||||
assert [tool.name for tool in tools] == ['exec', 'read', 'write', 'edit', 'glob', 'grep']
|
||||
for tool_name in ('exec', 'read', 'write', 'edit', 'glob', 'grep'):
|
||||
assert await loader.has_tool(tool_name) is True
|
||||
|
||||
|
||||
# ── read/write/edit file tool tests ─────────────────────────────
|
||||
|
||||
|
||||
def _make_loader_with_workspace(tmpdir: str) -> tuple[NativeToolLoader, Mock]:
|
||||
logger = Mock()
|
||||
box_service = SimpleNamespace(available=True, default_workspace=tmpdir)
|
||||
ap = SimpleNamespace(box_service=box_service, logger=logger)
|
||||
return NativeToolLoader(ap), logger
|
||||
|
||||
|
||||
def _make_query() -> Mock:
|
||||
q = Mock()
|
||||
q.query_id = 'test-query-1'
|
||||
return q
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_read_file():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
with open(os.path.join(tmpdir, 'hello.txt'), 'w') as f:
|
||||
f.write('hello world')
|
||||
|
||||
result = await loader.invoke_tool('read', {'path': '/workspace/hello.txt'}, _make_query())
|
||||
|
||||
assert result['ok'] is True
|
||||
assert result['content'] == 'hello world'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_read_nonexistent_file():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
|
||||
result = await loader.invoke_tool('read', {'path': '/workspace/no_such.txt'}, _make_query())
|
||||
|
||||
assert result['ok'] is False
|
||||
assert 'not found' in result['error'].lower()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_read_directory():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
os.makedirs(os.path.join(tmpdir, 'subdir'))
|
||||
with open(os.path.join(tmpdir, 'a.txt'), 'w') as f:
|
||||
f.write('a')
|
||||
|
||||
result = await loader.invoke_tool('read', {'path': '/workspace'}, _make_query())
|
||||
|
||||
assert result['ok'] is True
|
||||
assert result['is_directory'] is True
|
||||
assert 'a.txt' in result['content']
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_write_creates_file():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'write', {'path': '/workspace/new.txt', 'content': 'new content'}, _make_query()
|
||||
)
|
||||
|
||||
assert result['ok'] is True
|
||||
with open(os.path.join(tmpdir, 'new.txt')) as f:
|
||||
assert f.read() == 'new content'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_write_creates_subdirectories():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'write', {'path': '/workspace/sub/deep/file.txt', 'content': 'nested'}, _make_query()
|
||||
)
|
||||
|
||||
assert result['ok'] is True
|
||||
with open(os.path.join(tmpdir, 'sub', 'deep', 'file.txt')) as f:
|
||||
assert f.read() == 'nested'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_edit_replaces_unique_string():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
with open(os.path.join(tmpdir, 'code.py'), 'w') as f:
|
||||
f.write('def foo():\n return 1\n')
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'edit',
|
||||
{'path': '/workspace/code.py', 'old_string': 'return 1', 'new_string': 'return 42'},
|
||||
_make_query(),
|
||||
)
|
||||
|
||||
assert result['ok'] is True
|
||||
with open(os.path.join(tmpdir, 'code.py')) as f:
|
||||
assert f.read() == 'def foo():\n return 42\n'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_edit_rejects_ambiguous_match():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
with open(os.path.join(tmpdir, 'dup.txt'), 'w') as f:
|
||||
f.write('aaa\naaa\n')
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'edit',
|
||||
{'path': '/workspace/dup.txt', 'old_string': 'aaa', 'new_string': 'bbb'},
|
||||
_make_query(),
|
||||
)
|
||||
|
||||
assert result['ok'] is False
|
||||
assert '2' in result['error']
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_edit_rejects_missing_string():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
with open(os.path.join(tmpdir, 'x.txt'), 'w') as f:
|
||||
f.write('hello')
|
||||
|
||||
result = await loader.invoke_tool(
|
||||
'edit',
|
||||
{'path': '/workspace/x.txt', 'old_string': 'nope', 'new_string': 'yes'},
|
||||
_make_query(),
|
||||
)
|
||||
|
||||
assert result['ok'] is False
|
||||
assert 'not found' in result['error'].lower()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_path_escape_blocked():
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
loader, _ = _make_loader_with_workspace(tmpdir)
|
||||
|
||||
with pytest.raises(ValueError, match='escapes'):
|
||||
await loader.invoke_tool('read', {'path': '/workspace/../../etc/passwd'}, _make_query())
|
||||
23
tests/unit_tests/test_paths.py
Normal file
23
tests/unit_tests/test_paths.py
Normal file
@@ -0,0 +1,23 @@
|
||||
from pathlib import Path
|
||||
|
||||
from src.langbot.pkg.utils import paths
|
||||
|
||||
|
||||
def test_get_data_root_uses_source_root_in_repo_checkout():
|
||||
data_root = Path(paths.get_data_root())
|
||||
repo_root = Path(__file__).resolve().parents[2]
|
||||
|
||||
assert data_root == repo_root / 'data'
|
||||
|
||||
|
||||
def test_get_data_path_joins_under_data_root():
|
||||
data_path = Path(paths.get_data_path('skills', 'demo-skill'))
|
||||
repo_root = Path(__file__).resolve().parents[2]
|
||||
|
||||
assert data_path == repo_root / 'data' / 'skills' / 'demo-skill'
|
||||
|
||||
|
||||
def test_get_data_root_honors_env_override(monkeypatch, tmp_path):
|
||||
monkeypatch.setenv('LANGBOT_DATA_ROOT', str(tmp_path / 'custom-data'))
|
||||
|
||||
assert Path(paths.get_data_root()) == (tmp_path / 'custom-data').resolve()
|
||||
204
tests/unit_tests/test_preproc.py
Normal file
204
tests/unit_tests/test_preproc.py
Normal file
@@ -0,0 +1,204 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import importlib
|
||||
import sys
|
||||
import types
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock, Mock
|
||||
|
||||
import pytest
|
||||
|
||||
from langbot_plugin.api.entities.builtin.pipeline.query import Query
|
||||
from langbot_plugin.api.entities.builtin.platform.entities import Friend
|
||||
from langbot_plugin.api.entities.builtin.platform.events import FriendMessage
|
||||
from langbot_plugin.api.entities.builtin.platform.message import MessageChain, Plain
|
||||
from langbot_plugin.api.entities.builtin.provider.message import Message
|
||||
from langbot_plugin.api.entities.builtin.provider.prompt import Prompt
|
||||
from langbot_plugin.api.entities.builtin.provider.session import Conversation, LauncherTypes, Session
|
||||
|
||||
|
||||
def _make_query() -> Query:
|
||||
message_chain = MessageChain([Plain(text='create a skill')])
|
||||
return Query(
|
||||
query_id=1,
|
||||
launcher_type=LauncherTypes.PERSON,
|
||||
launcher_id='launcher-1',
|
||||
sender_id='sender-1',
|
||||
message_event=FriendMessage(
|
||||
message_chain=message_chain,
|
||||
time=0,
|
||||
sender=Friend(id='sender-1', nickname='Tester', remark='Tester'),
|
||||
),
|
||||
message_chain=message_chain,
|
||||
bot_uuid='bot-1',
|
||||
pipeline_uuid='pipe-1',
|
||||
pipeline_config={
|
||||
'ai': {
|
||||
'runner': {'runner': 'local-agent'},
|
||||
'local-agent': {
|
||||
'model': {'primary': 'model-1', 'fallbacks': []},
|
||||
'prompt': 'default',
|
||||
'knowledge-bases': [],
|
||||
},
|
||||
},
|
||||
'trigger': {'misc': {}},
|
||||
},
|
||||
variables={},
|
||||
)
|
||||
|
||||
|
||||
def _make_conversation() -> Conversation:
|
||||
return Conversation(
|
||||
prompt=Prompt(name='default', messages=[Message(role='system', content='system prompt')]),
|
||||
messages=[],
|
||||
pipeline_uuid='pipe-1',
|
||||
bot_uuid='bot-1',
|
||||
uuid='conv-1',
|
||||
)
|
||||
|
||||
|
||||
def _make_app(*, skill_service) -> SimpleNamespace:
|
||||
session = Session(launcher_type=LauncherTypes.PERSON, launcher_id='launcher-1', sender_id='sender-1')
|
||||
conversation = _make_conversation()
|
||||
model = SimpleNamespace(model_entity=SimpleNamespace(uuid='model-1', abilities={'func_call'}))
|
||||
tool_mgr = SimpleNamespace(get_all_tools=AsyncMock(return_value=[]))
|
||||
|
||||
return SimpleNamespace(
|
||||
sess_mgr=SimpleNamespace(
|
||||
get_session=AsyncMock(return_value=session),
|
||||
get_conversation=AsyncMock(return_value=conversation),
|
||||
),
|
||||
model_mgr=SimpleNamespace(get_model_by_uuid=AsyncMock(return_value=model)),
|
||||
tool_mgr=tool_mgr,
|
||||
plugin_connector=SimpleNamespace(
|
||||
emit_event=AsyncMock(
|
||||
return_value=SimpleNamespace(
|
||||
event=SimpleNamespace(
|
||||
default_prompt=conversation.prompt.messages.copy(),
|
||||
prompt=conversation.messages.copy(),
|
||||
)
|
||||
)
|
||||
)
|
||||
),
|
||||
pipeline_service=SimpleNamespace(
|
||||
get_pipeline=AsyncMock(return_value={'extensions_preferences': {'enable_all_skills': True}})
|
||||
),
|
||||
skill_mgr=SimpleNamespace(
|
||||
build_skill_aware_prompt_addition=Mock(return_value=''),
|
||||
skills={},
|
||||
),
|
||||
skill_service=skill_service,
|
||||
logger=Mock(),
|
||||
)
|
||||
|
||||
|
||||
def _import_preproc_modules():
|
||||
fake_app_module = types.ModuleType('langbot.pkg.core.app')
|
||||
fake_app_module.Application = object
|
||||
sys.modules['langbot.pkg.core.app'] = fake_app_module
|
||||
|
||||
for module_name in (
|
||||
'langbot.pkg.pipeline.preproc.preproc',
|
||||
'langbot.pkg.pipeline.stage',
|
||||
):
|
||||
sys.modules.pop(module_name, None)
|
||||
|
||||
preproc_module = importlib.import_module('langbot.pkg.pipeline.preproc.preproc')
|
||||
entities_module = importlib.import_module('langbot.pkg.pipeline.entities')
|
||||
return preproc_module, entities_module
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_preproc_enables_skill_authoring_tools_when_skill_service_available():
|
||||
preproc_module, entities_module = _import_preproc_modules()
|
||||
|
||||
app = _make_app(skill_service=SimpleNamespace())
|
||||
stage = preproc_module.PreProcessor(app)
|
||||
|
||||
result = await stage.process(_make_query(), 'PreProcessor')
|
||||
|
||||
assert result.result_type == entities_module.ResultType.CONTINUE
|
||||
app.tool_mgr.get_all_tools.assert_awaited_once_with(None, None, include_skill_authoring=True)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_preproc_disables_skill_authoring_tools_when_skill_service_missing():
|
||||
preproc_module, entities_module = _import_preproc_modules()
|
||||
|
||||
app = _make_app(skill_service=None)
|
||||
stage = preproc_module.PreProcessor(app)
|
||||
|
||||
result = await stage.process(_make_query(), 'PreProcessor')
|
||||
|
||||
assert result.result_type == entities_module.ResultType.CONTINUE
|
||||
app.tool_mgr.get_all_tools.assert_awaited_once_with(None, None, include_skill_authoring=False)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_preproc_injects_skill_index_into_system_prompt():
|
||||
"""The Tool Call activation pattern still needs the LLM to know which
|
||||
skills exist. PreProcessor must append the SkillManager's index
|
||||
addendum to the first system message."""
|
||||
preproc_module, entities_module = _import_preproc_modules()
|
||||
|
||||
app = _make_app(skill_service=SimpleNamespace())
|
||||
addendum = '\n\nAvailable Skills:\n- demo (demo): Demo skill.\n\nCall activate ...'
|
||||
app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value=addendum)
|
||||
|
||||
query = _make_query()
|
||||
result = await stage_process_capture(preproc_module, app, query)
|
||||
|
||||
assert result.result_type == entities_module.ResultType.CONTINUE
|
||||
app.skill_mgr.build_skill_aware_prompt_addition.assert_called_once_with(bound_skills=None)
|
||||
head = query.prompt.messages[0]
|
||||
assert head.role == 'system'
|
||||
assert head.content.endswith(addendum)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_preproc_respects_pipeline_bound_skills_subset():
|
||||
"""When ``enable_all_skills`` is false the bound list is passed through
|
||||
so the addendum only mentions skills allowed for this pipeline."""
|
||||
preproc_module, entities_module = _import_preproc_modules()
|
||||
|
||||
app = _make_app(skill_service=SimpleNamespace())
|
||||
app.pipeline_service.get_pipeline = AsyncMock(
|
||||
return_value={
|
||||
'extensions_preferences': {
|
||||
'enable_all_skills': False,
|
||||
'skills': ['only-this'],
|
||||
}
|
||||
}
|
||||
)
|
||||
app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
|
||||
|
||||
query = _make_query()
|
||||
result = await stage_process_capture(preproc_module, app, query)
|
||||
|
||||
assert result.result_type == entities_module.ResultType.CONTINUE
|
||||
app.skill_mgr.build_skill_aware_prompt_addition.assert_called_once_with(bound_skills=['only-this'])
|
||||
assert query.variables.get('_pipeline_bound_skills') == ['only-this']
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_preproc_skips_injection_when_addendum_is_empty():
|
||||
"""No visible skills → system prompt is left untouched (no
|
||||
``Available Skills`` block appended)."""
|
||||
preproc_module, entities_module = _import_preproc_modules()
|
||||
|
||||
app = _make_app(skill_service=SimpleNamespace())
|
||||
app.skill_mgr.build_skill_aware_prompt_addition = Mock(return_value='')
|
||||
|
||||
query = _make_query()
|
||||
result = await stage_process_capture(preproc_module, app, query)
|
||||
|
||||
assert result.result_type == entities_module.ResultType.CONTINUE
|
||||
if query.prompt and query.prompt.messages:
|
||||
assert 'Available Skills' not in (query.prompt.messages[0].content or '')
|
||||
|
||||
|
||||
async def stage_process_capture(preproc_module, app, query):
|
||||
"""Run PreProcessor.process and return the result while keeping ``query``
|
||||
accessible to the assertions (process mutates query in place)."""
|
||||
stage = preproc_module.PreProcessor(app)
|
||||
return await stage.process(query, 'PreProcessor')
|
||||
89
tests/unit_tests/test_skill_service.py
Normal file
89
tests/unit_tests/test_skill_service.py
Normal file
@@ -0,0 +1,89 @@
|
||||
from types import SimpleNamespace
|
||||
from unittest.mock import AsyncMock
|
||||
|
||||
import pytest
|
||||
|
||||
from langbot.pkg.api.http.service.skill import SkillService
|
||||
|
||||
|
||||
class TestRequireBoxForWrite:
|
||||
"""Box is the only source of truth for skills — there is no local
|
||||
filesystem fallback. Every write and (most) read methods refuse cleanly
|
||||
when the Box runtime is disabled, unreachable, or simply not installed."""
|
||||
|
||||
def _ap_with_disabled_box(self):
|
||||
return SimpleNamespace(
|
||||
skill_mgr=SimpleNamespace(reload_skills=AsyncMock()),
|
||||
box_service=SimpleNamespace(
|
||||
available=False,
|
||||
enabled=False,
|
||||
_connector_error='Box runtime is disabled in config (box.enabled = false)',
|
||||
),
|
||||
)
|
||||
|
||||
def _ap_with_failed_box(self):
|
||||
return SimpleNamespace(
|
||||
skill_mgr=SimpleNamespace(reload_skills=AsyncMock()),
|
||||
box_service=SimpleNamespace(
|
||||
available=False,
|
||||
enabled=True,
|
||||
_connector_error='docker daemon not running',
|
||||
),
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_skill_refused_when_box_disabled(self):
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
with pytest.raises(ValueError, match='disabled in config'):
|
||||
await service.create_skill({'name': 'x'})
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_skill_refused_when_box_failed(self):
|
||||
service = SkillService(self._ap_with_failed_box())
|
||||
with pytest.raises(ValueError, match='docker daemon not running'):
|
||||
await service.create_skill({'name': 'x'})
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_update_skill_refused_when_box_disabled(self):
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
with pytest.raises(ValueError, match='Editing a skill requires the Box runtime'):
|
||||
await service.update_skill('x', {})
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_write_skill_file_refused_when_box_disabled(self):
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
with pytest.raises(ValueError, match='Editing skill files requires the Box runtime'):
|
||||
await service.write_skill_file('x', 'a.txt', 'hi')
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_install_from_github_refused_when_box_disabled(self):
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
with pytest.raises(ValueError, match='Installing a skill from GitHub'):
|
||||
await service.install_from_github({'owner': 'o', 'repo': 'r', 'asset_url': 'https://example/x.zip'})
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_install_from_zip_upload_refused_when_box_disabled(self):
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
with pytest.raises(ValueError, match='Installing a skill from upload'):
|
||||
await service.install_from_zip_upload(file_bytes=b'', filename='x.zip')
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_create_skill_refused_when_box_service_missing_entirely(self):
|
||||
"""No ap.box_service attribute at all (truly minimal setup):
|
||||
Box is the only source of truth, so creation must still refuse."""
|
||||
service = SkillService(SimpleNamespace(skill_mgr=SimpleNamespace(reload_skills=AsyncMock())))
|
||||
with pytest.raises(ValueError, match='not initialised'):
|
||||
await service.create_skill({'name': 'x'})
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_skills_returns_empty_when_box_unavailable(self):
|
||||
"""list_skills should render an empty surface (not crash) so the
|
||||
skills page can show a banner instead of a broken state."""
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
assert await service.list_skills() == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_read_skill_file_refused_when_box_unavailable(self):
|
||||
service = SkillService(self._ap_with_disabled_box())
|
||||
with pytest.raises(ValueError, match='Reading a skill file'):
|
||||
await service.read_skill_file('x', 'a.txt')
|
||||
@@ -11,7 +11,6 @@ Uses tmp_path for file system isolation where applicable.
|
||||
|
||||
import os
|
||||
import pytest
|
||||
from unittest.mock import patch
|
||||
|
||||
|
||||
class TestCheckIfSourceInstall:
|
||||
@@ -19,7 +18,7 @@ class TestCheckIfSourceInstall:
|
||||
|
||||
def test_returns_true_for_source_install(self, tmp_path, monkeypatch):
|
||||
"""Should return True when main.py with LangBot marker exists."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py = tmp_path / 'main.py'
|
||||
main_py.write_text('# LangBot/main.py\n# This is the entry point')
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
@@ -33,52 +32,14 @@ class TestCheckIfSourceInstall:
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_returns_false_when_no_main_py(self, tmp_path, monkeypatch):
|
||||
"""Should return False when main.py doesn't exist."""
|
||||
monkeypatch.chdir(tmp_path)
|
||||
|
||||
from langbot.pkg.utils import paths
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths._check_if_source_install()
|
||||
assert result is False
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_returns_false_when_main_py_without_marker(self, tmp_path, monkeypatch):
|
||||
"""Should return False when main.py exists but lacks LangBot marker."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py.write_text('# Some other project\nprint("hello")')
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
|
||||
from langbot.pkg.utils import paths
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths._check_if_source_install()
|
||||
assert result is False
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_handles_io_error_gracefully(self, tmp_path, monkeypatch):
|
||||
"""Should return False when main.py cannot be read."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py.write_text('# LangBot/main.py\n')
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
|
||||
from langbot.pkg.utils import paths
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
# Patch open to raise IOError
|
||||
with patch("builtins.open", side_effect=IOError("Cannot read")):
|
||||
result = paths._check_if_source_install()
|
||||
assert result is False
|
||||
|
||||
paths._is_source_install = None
|
||||
# Note: ``_check_if_source_install`` was refactored to walk
|
||||
# ``Path(__file__).resolve().parents`` looking for ``pyproject.toml`` +
|
||||
# ``main.py`` instead of relying on the cwd. That makes it robust to where
|
||||
# the process is launched from but also means the old "cwd doesn't have
|
||||
# main.py" / "main.py without marker" / "IOError on read" cases no longer
|
||||
# apply — there's no file read at all. The corresponding negative tests
|
||||
# were removed; ``test_returns_true_for_source_install`` still exercises
|
||||
# the positive path because the repo checkout itself is a source install.
|
||||
|
||||
|
||||
class TestGetFrontendPath:
|
||||
@@ -92,16 +53,16 @@ class TestGetFrontendPath:
|
||||
|
||||
result = paths.get_frontend_path()
|
||||
# The result should contain web/dist or be an absolute path to it
|
||||
assert "web/dist" in result or result.endswith("dist")
|
||||
assert 'web/dist' in result or result.endswith('dist')
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_finds_dist_directory_in_source_mode(self, tmp_path, monkeypatch):
|
||||
"""Should find web/dist when running from source mode."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py = tmp_path / 'main.py'
|
||||
main_py.write_text('# LangBot/main.py\n')
|
||||
|
||||
web_dist = tmp_path / "web" / "dist"
|
||||
web_dist = tmp_path / 'web' / 'dist'
|
||||
web_dist.mkdir(parents=True)
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
@@ -111,18 +72,18 @@ class TestGetFrontendPath:
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths.get_frontend_path()
|
||||
assert result == "web/dist"
|
||||
assert result == 'web/dist'
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_prefers_dist_over_out_in_source_mode(self, tmp_path, monkeypatch):
|
||||
"""Should prefer web/dist over web/out when both exist in source mode."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py = tmp_path / 'main.py'
|
||||
main_py.write_text('# LangBot/main.py\n')
|
||||
|
||||
web_dist = tmp_path / "web" / "dist"
|
||||
web_dist = tmp_path / 'web' / 'dist'
|
||||
web_dist.mkdir(parents=True)
|
||||
web_out = tmp_path / "web" / "out"
|
||||
web_out = tmp_path / 'web' / 'out'
|
||||
web_out.mkdir(parents=True)
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
@@ -132,7 +93,7 @@ class TestGetFrontendPath:
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths.get_frontend_path()
|
||||
assert result == "web/dist"
|
||||
assert result == 'web/dist'
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
@@ -148,19 +109,19 @@ class TestGetResourcePath:
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths.get_resource_path("nonexistent/file.txt")
|
||||
assert result == "nonexistent/file.txt"
|
||||
result = paths.get_resource_path('nonexistent/file.txt')
|
||||
assert result == 'nonexistent/file.txt'
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_finds_resource_in_current_directory_source_mode(self, tmp_path, monkeypatch):
|
||||
"""Should find resource in current directory when in source mode."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py = tmp_path / 'main.py'
|
||||
main_py.write_text('# LangBot/main.py\n')
|
||||
|
||||
resource_file = tmp_path / "templates" / "config.yaml"
|
||||
resource_file = tmp_path / 'templates' / 'config.yaml'
|
||||
resource_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
resource_file.write_text("test: value")
|
||||
resource_file.write_text('test: value')
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
|
||||
@@ -168,18 +129,18 @@ class TestGetResourcePath:
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths.get_resource_path("templates/config.yaml")
|
||||
result = paths.get_resource_path('templates/config.yaml')
|
||||
assert os.path.exists(result)
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
def test_returns_relative_path_in_source_mode(self, tmp_path, monkeypatch):
|
||||
"""Should return relative path if resource exists in source mode."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py = tmp_path / 'main.py'
|
||||
main_py.write_text('# LangBot/main.py\n')
|
||||
|
||||
resource_file = tmp_path / "test_resource.txt"
|
||||
resource_file.write_text("test content")
|
||||
resource_file = tmp_path / 'test_resource.txt'
|
||||
resource_file.write_text('test content')
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
|
||||
@@ -187,8 +148,8 @@ class TestGetResourcePath:
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
result = paths.get_resource_path("test_resource.txt")
|
||||
assert result == "test_resource.txt"
|
||||
result = paths.get_resource_path('test_resource.txt')
|
||||
assert result == 'test_resource.txt'
|
||||
|
||||
paths._is_source_install = None
|
||||
|
||||
@@ -198,7 +159,7 @@ class TestPathFunctionsCaching:
|
||||
|
||||
def test_source_install_cache_is_used(self, tmp_path, monkeypatch):
|
||||
"""_check_if_source_install should use cached result."""
|
||||
main_py = tmp_path / "main.py"
|
||||
main_py = tmp_path / 'main.py'
|
||||
main_py.write_text('# LangBot/main.py\n')
|
||||
|
||||
monkeypatch.chdir(tmp_path)
|
||||
@@ -219,5 +180,5 @@ class TestPathFunctionsCaching:
|
||||
paths._is_source_install = None
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
pytest.main([__file__, "-v"])
|
||||
if __name__ == '__main__':
|
||||
pytest.main([__file__, '-v'])
|
||||
|
||||
@@ -1,136 +0,0 @@
|
||||
"""
|
||||
Unit tests for version utility functions.
|
||||
|
||||
Tests version comparison logic without network calls.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from unittest.mock import Mock
|
||||
|
||||
from langbot.pkg.utils.version import VersionManager
|
||||
|
||||
|
||||
class TestVersionComparison:
|
||||
"""Tests for version comparison functions."""
|
||||
|
||||
def _create_version_manager(self):
|
||||
"""Create a VersionManager with mock app."""
|
||||
mock_app = Mock()
|
||||
mock_app.proxy_mgr = Mock()
|
||||
mock_app.proxy_mgr.get_forward_providers = Mock(return_value={})
|
||||
mock_app.logger = Mock()
|
||||
return VersionManager(mock_app)
|
||||
|
||||
def test_is_newer_same_version(self):
|
||||
"""is_newer returns False for same version."""
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v1.0.0', 'v1.0.0')
|
||||
assert result is False
|
||||
|
||||
def test_is_newer_different_major_version(self):
|
||||
"""is_newer returns False for different major version."""
|
||||
# Note: is_newer ignores major version changes
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v2.0.0', 'v1.0.0')
|
||||
assert result is False
|
||||
|
||||
def test_is_newer_minor_update(self):
|
||||
"""is_newer returns True for minor update within same major."""
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v1.1.0', 'v1.0.0')
|
||||
assert result is True
|
||||
|
||||
def test_is_newer_patch_update(self):
|
||||
"""is_newer returns True for patch update within same major."""
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v1.0.1', 'v1.0.0')
|
||||
assert result is True
|
||||
|
||||
def test_is_newer_with_fourth_segment(self):
|
||||
"""is_newer ignores fourth version segment."""
|
||||
# Both have same first 3 segments
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v1.0.0.1', 'v1.0.0.0')
|
||||
assert result is False
|
||||
|
||||
def test_is_newer_short_version(self):
|
||||
"""is_newer handles short version numbers."""
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v1.0', 'v1.0')
|
||||
assert result is False
|
||||
|
||||
def test_is_newer_older_version(self):
|
||||
"""is_newer returns True when new > old."""
|
||||
vm = self._create_version_manager()
|
||||
result = vm.is_newer('v1.2.0', 'v1.1.0')
|
||||
assert result is True
|
||||
|
||||
|
||||
class TestCompareVersionStr:
|
||||
"""Tests for compare_version_str static method."""
|
||||
|
||||
def test_compare_equal_versions(self):
|
||||
"""Equal versions return 0."""
|
||||
result = VersionManager.compare_version_str('v1.0.0', 'v1.0.0')
|
||||
assert result == 0
|
||||
|
||||
def test_compare_without_v_prefix(self):
|
||||
"""Versions without v prefix work the same."""
|
||||
result = VersionManager.compare_version_str('1.0.0', '1.0.0')
|
||||
assert result == 0
|
||||
|
||||
def test_compare_mixed_prefix(self):
|
||||
"""Mixed v prefix works correctly."""
|
||||
result = VersionManager.compare_version_str('v1.0.0', '1.0.0')
|
||||
assert result == 0
|
||||
|
||||
def test_compare_first_greater(self):
|
||||
"""First version greater returns 1."""
|
||||
result = VersionManager.compare_version_str('v1.1.0', 'v1.0.0')
|
||||
assert result == 1
|
||||
|
||||
def test_compare_first_smaller(self):
|
||||
"""First version smaller returns -1."""
|
||||
result = VersionManager.compare_version_str('v1.0.0', 'v1.1.0')
|
||||
assert result == -1
|
||||
|
||||
def test_compare_different_lengths(self):
|
||||
"""Different length versions are padded with zeros."""
|
||||
result = VersionManager.compare_version_str('v1.0', 'v1.0.0')
|
||||
assert result == 0
|
||||
|
||||
def test_compare_shorter_greater(self):
|
||||
"""Shorter version padded, first still greater."""
|
||||
result = VersionManager.compare_version_str('v1.1', 'v1.0.0')
|
||||
assert result == 1
|
||||
|
||||
def test_compare_longer_greater(self):
|
||||
"""Longer version, first smaller."""
|
||||
result = VersionManager.compare_version_str('v1.0', 'v1.0.1')
|
||||
assert result == -1
|
||||
|
||||
def test_compare_major_version(self):
|
||||
"""Major version comparison."""
|
||||
result = VersionManager.compare_version_str('v2.0.0', 'v1.9.9')
|
||||
assert result == 1
|
||||
|
||||
def test_compare_minor_version(self):
|
||||
"""Minor version comparison."""
|
||||
result = VersionManager.compare_version_str('v1.5.0', 'v1.4.9')
|
||||
assert result == 1
|
||||
|
||||
def test_compare_patch_version(self):
|
||||
"""Patch version comparison."""
|
||||
result = VersionManager.compare_version_str('v1.0.1', 'v1.0.0')
|
||||
assert result == 1
|
||||
|
||||
def test_compare_four_segments(self):
|
||||
"""Four segment version comparison."""
|
||||
result = VersionManager.compare_version_str('v1.0.0.1', 'v1.0.0.0')
|
||||
assert result == 1
|
||||
|
||||
def test_compare_long_versions(self):
|
||||
"""Long version strings work correctly."""
|
||||
result = VersionManager.compare_version_str('v1.2.3.4.5', 'v1.2.3.4.4')
|
||||
assert result == 1
|
||||
Reference in New Issue
Block a user