mirror of
https://github.com/langbot-app/LangBot.git
synced 2026-06-03 20:44:36 +00:00
f2153f736cb8838b7c7ecd6a5064dcdc4cd5defd
86 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f2153f736c | refactor(agent-runner): align config with agent semantics | ||
|
|
d0383e146e | refactor(agent-runner): remove host context windowing | ||
|
|
afaf09ccc7 | feat(agent-runner): normalize binding config boundaries | ||
|
|
f7775a8ed7 | fix: enforce agent run API permissions | ||
|
|
d0aa6eb7f2 | fix(agent-runner): authorize external runner tools | ||
|
|
0705680ed7 | docs(agent-runner): align runner protocol boundaries | ||
|
|
a25d9e0ef2 | fix(agent-runner): stabilize event context and streams | ||
|
|
2ee4880ff6 | refactor(agent-runner): tighten protocol v1 runtime boundaries | ||
|
|
69c4749e84 | feat(agent-runner): align protocol adapter terminology | ||
|
|
a0d15ea054 |
feat(agent-runner): route pipeline runs through event-first flow
- run_from_query() now delegates to run(event, binding) instead of maintaining a separate legacy execution path - Pipeline Query is converted to AgentEventEnvelope via PipelineCompatAdapter - Pipeline config is converted to AgentBinding with StatePolicy - bound_plugins authorization preserved from Pipeline - Legacy compatibility fields preserved: - query_id → context.runtime.query_id → session registry - prompt → context.compatibility.extra.prompt (not top-level) - params → context.compatibility.extra.params (with proper filtering) - max-round → bootstrap.messages and compatibility.legacy_messages - Pipeline path gains event-first host capabilities: - EventLog and Transcript writing - ArtifactStore registration - PersistentStateStore for state.updated - Removed legacy handlers: - _handle_artifact_created_query() (replaced by _handle_artifact_created) - _handle_state_updated() (replaced by _handle_state_updated_event) This change unifies the execution path while preserving backward compatibility for Pipeline-based runners. EventGateway is not implemented in this branch; only the event-first entry point is reserved. |
||
|
|
fa6b40a82b | feat(agent-runner): add persistent state APIs | ||
|
|
ad6bf5b478 | feat(agent-runner): scope event-first state by binding | ||
|
|
92d28bfcb0 | feat(agent-runner): persist created artifacts | ||
|
|
6fc93235f7 | feat(agent-runner): add artifact store pull APIs | ||
|
|
bf73414884 |
feat(agent-runner): add event-first context facts and pull APIs
Add EventLog and Transcript persistence entities for storing auditable event facts and conversation history projection. Implement event-first AgentRunContext builder that produces Protocol v1 compliant context payloads with required fields: event, delivery, context (ContextAccess). Key changes: - EventLog ORM: auditable event records with indexes - Transcript ORM: conversation history projection with composite indexes - AgentRunContextBuilder: Protocol v1 payload with delivery, context, bootstrap - EventLogStore/TranscriptStore: async stores for fact sources - Host action handlers: HISTORY_PAGE, HISTORY_SEARCH, EVENT_GET, EVENT_PAGE - Context validation: build_context output validates via SDK AgentRunContext - Alembic migration for event_log and transcript tables - Alembic env.py imports all ORM models for autogenerate discovery Legacy compatibility: max-round messages go into bootstrap.messages and compatibility.legacy_messages, not top-level messages field. |
||
|
|
bc4610a2a9 | fix(agent-runner): package context for plugin execution | ||
|
|
be8d30894a | feat: make agent runner config schema driven | ||
|
|
0a4009d14c | chore(agent): remove v1 wording from runner internals | ||
|
|
5b45c867b4 | feat(agent): reserve stable runner event names | ||
|
|
4e0a670fe2 | feat(agent-runner): enrich plugin runner host context | ||
|
|
c286f97b26 | fix: log agent runner best-effort failures | ||
|
|
95b3ab036c | test: address agent runner review comments | ||
|
|
711f12d71f | feat: support dynamic agent runner defaults | ||
|
|
d9280f64eb |
feat(plugin): implement INVOKE_RERANK handler with run-scoped authorization
- Add invoke_rerank action handler in plugin handler - Validate rerank model access via run session - Cap documents at 64 for API limit - Return sorted results by relevance score |
||
|
|
dc82fb584a |
perf(agent-runner): improve session registry and orchestrator efficiency
- Add pre-computed _authorized_ids (frozenset) at session registration for O(1) lookup - Refactor is_resource_allowed() from linear search to set membership check - Add thread-safe locking to get_session_registry() singleton - Cache _session_registry and _state_store references in orchestrator __init__ - Add asyncio.gather() for parallel resource building in AgentResourceBuilder - Create shared test fixtures in tests/unit_tests/agent/conftest.py - Update test files to import from shared conftest.py Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
d6b8f48e73 |
feat(agent-runner): integrate AgentRunner Protocol v1 with plugin system
Phase 0 integration complete - verified minimal loop with local-agent stub runner. Changes: - Add AgentRunOrchestrator for plugin-based agent execution - Add AgentResultNormalizer for Protocol v1 result conversion - Add AgentRunnerDescriptor for runner ID parsing (plugin:author/name/runner) - Update chat handler to use new orchestrator instead of direct runner lookup - Add plugin handler methods for list_agent_runners and run_agent - Add connector methods for AgentRunner protocol forwarding - Update pipeline API to include runner options in metadata - Add integration docs and implementation plan Integration verified: - Runner: plugin:langbot/local-agent/default - Input: "你好" - Output: [stub] Echo: 你好 - Date: 2026-05-10 10:09 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
96b041846d |
Feat/sandbox (#2072)
* feat: add mcp and skills
* feat: add filter
* feat: modify frontend
* feat(box): add sandbox_exec tool loop for local-agent calculations
* feat(box): add host workspace mounting and sandbox_exec guidance
* feat(box): add BoxProfile with resource limits and improved output truncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
* feat(box): add obs
* refactor(box): unify box service lifecycle and local runtime
management
* refactor(box): remove legacy in-process runtime code and clean up smells
After the architecture settled on always using an independent Box Runtime
service, several pieces of compatibility code and design shortcuts were
left behind. This commit cleans them up:
- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from
production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd`
default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config
access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing
tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls
`box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose
(auto-detected by code).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add test
* fix: fix box intergration test
* feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security
## Summary
When Podman/Docker is available, all stdio-mode MCP servers now automatically
run inside Box containers with dependency installation, path rewriting, and
lifecycle management. When no container runtime exists, LangBot starts normally
and stdio MCP falls back to host-direct execution.
## What changed
### MCP stdio → Box integration (mcp.py)
- Add `MCPServerBoxConfig` pydantic model for structured box configuration
with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes
`.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
`pip install` inside the container before starting the MCP server
- Copy project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is only set after all retries exhaust
or on success, not on first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
### Box security (security.py — new)
- `validate_sandbox_security()` blocks dangerous host paths:
`/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`
### Box models (models.py)
- Add `BoxHostMountMode.NONE` — skips volume mount entirely
- Adjust `validate_host_mount_consistency` to allow arbitrary workdir
when `host_path_mode=NONE`
### Box backend (backend.py)
- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
- Add `cleanup_orphaned_containers()` to base class (no-op default) and
CLI implementation (single batched `rm -f` command)
### Box runtime (runtime.py)
- Call `cleanup_orphaned_containers()` during `initialize()` to remove
lingering containers from previous runs
### Box service (service.py)
- Graceful degradation: `initialize()` catches runtime errors and sets
`available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and
`create_session()` — MCP paths are admin-configured and trusted,
bypassing `allowed_host_mount_roots` restrictions meant for
LLM-generated sandbox_exec commands
### Default behavior
- stdio MCP servers automatically use Box when `box_service.available`
is True (Podman/Docker detected); no explicit `box` config needed
- When no container runtime exists, falls back to host-direct stdio
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
(for site-packages), `host_path_mode=ro`, `startup_timeout=120s`
### Tests
- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv
unwrap, host_path inference, payload building, runtime info, box
availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result — computed once, passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
* refactor: use rpc
* fix: import
* refactor(box): clean up sandbox subsystem code quality and efficiency
- Fix O(n²) stderr trimming in runtime.py with running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
unused config_hash computation, unused imports
- Deduplicate connection callback in BoxRuntimeConnector, parse URL once
- Use enum comparison instead of stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache NativeToolLoader tool definition and sandbox system guidance
- Extract _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
- Add JSON startswith guard in logging_utils to skip futile json.loads
- Fix ruff lint errors (F401 unused imports, F841 unused variables)
* fix: ruff
* refactor(sandbox): keep box logic out of pipeline and localagent
- Move sandbox system-prompt guidance from LocalAgentRunner into
BoxService.get_system_guidance() so all box domain knowledge stays
in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into
MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool
results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were
unrelated to this feature.
- Skip two test files affected by a pre-existing circular import
(runner ↔ app) until the import cycle is resolved in a separate PR.
* fix: ruff
* refactor(box): move box runtime to langbot-plugin-sdk
Extract self-contained box runtime modules (actions, backend, client,
errors, models, runtime, security, server) to langbot-plugin-sdk and
update all imports to use `langbot_plugin.box.*`. Keep only service
and
connector in LangBot core as they depend on the Application context.
- Update docker-compose to use `langbot_plugin.box.server` entry
point
- Update pyproject.toml to use local SDK via `tool.uv.sources`
- Remove migrated source files and their unit/integration tests
- Update remaining test imports to match new module paths
* fix: ruff
* feat: enhance sandbox api
* refactor(box): derive paths from shared host root
* fix(box): tighten sandbox exposure and restore box integration coverage
* refactor(types): remove quoted annotations under postponed evaluation
* feat(box): unify native agent tools around exec/read/write/edit
* chore(sandbox): move MCP loader changes to follow-up branch
* feat(box): add session workspace quota enforcement and SDK quota metadata
* feat(skills): add Agent Skills management system (#1917)
* feat(skills): add Agent Skills management system
Implement comprehensive skills management feature inspired by agentskills spec:
Backend:
- Add Skill and SkillPipelineBinding database entities
- Add database migration (dbm018) for skills tables
- Implement SkillManager for skill loading, matching, and resolution
- Implement SkillService for CRUD operations
- Add skills API endpoints for skill and pipeline binding management
- Integrate skill index injection into pipeline preprocessor
- Add skill activation detection in LocalAgentRunner
Frontend:
- Add Skills page with listing, search, and type filter
- Add SkillDetailDialog for create/edit with preview
- Add SkillCard and SkillForm components
- Add skills API methods to BackendClient
- Add skills entry to sidebar navigation
- Add i18n translations (en-US, zh-Hans)
Features:
- Support skill and workflow types
- Sub-skill composition via {{INVOKE_SKILL: name}} syntax
- Progressive disclosure (index in prompt, full instructions on activation)
- Pipeline-specific skill bindings with priority
* fix: resolve cherry-pick conflicts for agentskills onto sandbox
- Remove non-existent external_kb service import
- Add skill_mgr mock to localagent sandbox_exec tests
- Keep database version at 24 (sandbox branch's latest)
* feat(skills): upgrade to package-backed skills with sandbox execution
Evolve the skills system from pure prompt-based to package-backed with
sandbox tool execution support:
- Add source_type/package_root/entry_file/skill_tools fields to Skill entity
- SkillManager loads SKILL.md from local package directories
- SkillToolLoader as 4th dispatch layer in ToolManager (query-scoped)
- LocalAgent injects skill tools into use_funcs on skill activation
- BoxService.execute_skill_tool() runs scripts in sandbox (ro mount, env params)
- Skill tool names auto-namespaced as skill__{skill}__{tool}
- API validation for package_root allowlist and entry path traversal
- Frontend source_type toggle, package_root input, skill_tools editor
- Migration renumbered to 025 with ALTER TABLE fallback for existing DBs
- Fix unclosed limitation section in i18n files
- Fix skills API methods misplaced outside BackendClient class
* fix: test info
* feat(skills): switch skills to package-backed storage and add import tooling
- skills 从 inline/package 双轨收敛成 package-first
- instructions 改为写入并读取 SKILL.md
- 新增本地目录扫描和 GitHub 安装 skill
- 前端把 skills 整合进 plugins 页,新增 SkillsComponent 和 GitHub 导入弹窗
- skill form 去掉 source_type / type 筛选,改成目录扫描驱动
- Box skill tool 挂载模式从 ro 改成 rw
- 测试和中英文文案同步更新
* feat: simplify langbot skill create and import
* refactor(skills): clean up legacy skill API and harden activation flow
* refactor(skills): remove skill dependency expansion and add skill_get
* fix: lint
* fix: delete
* fix(skills): align tool manager loader initialization
* refactor: remove sandbox execute skill
* fix(skills): hide activation markers and isolate skill activation flow
* refactor(skills): switch skill model to filesystem-backed packages
* refactor(skills): switch skill model to filesystem-backed packages
* refactor(skills): unify runtime skill access around filesystem paths
* refactor(skills): unify runtime skill access around filesystem paths
* feat(skills): align rw package design and fix skill activation, visibility, and lint issues
* refactor(skills): replace rich authoring API with import/reload flow and update
Box design doc
* feat(box): add sandbox_exec tool loop for local-agent calculations
* feat(box): add host workspace mounting and sandbox_exec guidance
* feat(box): add BoxProfile with resource limits and improved output truncation
- Implement head+tail output truncation (60/40 split) so LLM sees both
beginning and final results; add streaming byte-limited reads in backend
to prevent unbounded memory usage (_MAX_RAW_OUTPUT_BYTES = 1MB)
- Define BoxProfile model with locked fields and max_timeout_sec clamping
- Add four built-in profiles: default, offline_readonly, network_basic,
network_extended with differentiated resource and security constraints
- Add resource limit fields to BoxSpec (cpus, memory_mb, pids_limit,
read_only_rootfs) and pass corresponding container CLI flags
(--cpus, --memory, --pids-limit, --read-only, --tmpfs)
- Profile loaded from config (box.profile), applied in service layer
before BoxSpec validation; locked fields cannot be overridden by
tool-call parameters
* feat(box): add obs
* refactor(box): unify box service lifecycle and local runtime
management
* refactor(box): remove legacy in-process runtime code and clean up smells
After the architecture settled on always using an independent Box Runtime
service, several pieces of compatibility code and design shortcuts were
left behind. This commit cleans them up:
- Remove `LocalBoxRuntimeClient` and `create_box_runtime_client` from
production code (moved to test-only helper).
- Remove unused `_clip_bytes` method from backend.
- Remove `__langbot_session_placeholder__` hack by making `BoxSpec.cmd`
default to empty and validating non-empty only in `runtime.execute()`.
- Extract `get_box_config()` helper to eliminate 5× duplicated config
access boilerplate.
- Remove `session_id`/`host_path`/`host_path_mode` from the LLM-facing
tool schema to enforce request-scoped session isolation.
- Fix dual shutdown path: `NativeToolLoader.shutdown()` no longer calls
`box_service.shutdown()` (handled by `Application.dispose()`).
- Simplify `_assert_session_compatible` with a loop.
- Inline client creation in `BoxRuntimeConnector`.
- Remove redundant `BOX__RUNTIME_URL` env var from docker-compose
(auto-detected by code).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(box/mcp): integrate MCP stdio with Box sandbox — auto-isolation, dep install, security
## Summary
When Podman/Docker is available, all stdio-mode MCP servers now automatically
run inside Box containers with dependency installation, path rewriting, and
lifecycle management. When no container runtime exists, LangBot starts normally
and stdio MCP falls back to host-direct execution.
## What changed
### MCP stdio → Box integration (mcp.py)
- Add `MCPServerBoxConfig` pydantic model for structured box configuration
with validation and defaults (network, host_path_mode, timeouts, resources)
- Auto-infer `host_path` from command/args with venv detection: recognizes
`.venv/bin/python` patterns and walks up to the project root
- Rewrite host paths to container `/workspace` paths transparently
- Replace venv python commands with container-native `python`
- Auto-detect `pyproject.toml`/`setup.py`/`requirements.txt` and run
`pip install` inside the container before starting the MCP server
- Copy project to `/tmp` before install to handle read-only mounts
- Add retry with exponential backoff (3 retries, 2s/4s/8s delays)
- Add Box managed process health monitoring (poll every 5s)
- Fix session leak: `_cleanup_box_stdio_session()` now runs in `finally`
block of `_lifecycle_loop`, covering all exit paths
- Fix retry logic: `_ready_event` is only set after all retries exhaust
or on success, not on first failure
- Enhance `get_runtime_info_dict()` with `box_session_id` and `box_enabled`
### Box security (security.py — new)
- `validate_sandbox_security()` blocks dangerous host paths:
`/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`, `/run`,
docker.sock, podman socket
- Called at the start of `CLISandboxBackend.start_session()`
### Box models (models.py)
- Add `BoxHostMountMode.NONE` — skips volume mount entirely
- Adjust `validate_host_mount_consistency` to allow arbitrary workdir
when `host_path_mode=NONE`
### Box backend (backend.py)
- Add `validate_sandbox_security()` call in `start_session()`
- Add `langbot.box.config_hash` label on containers for drift detection
- Handle `BoxHostMountMode.NONE` — skip `-v` mount arg
- Add `cleanup_orphaned_containers()` to base class (no-op default) and
CLI implementation (single batched `rm -f` command)
### Box runtime (runtime.py)
- Call `cleanup_orphaned_containers()` during `initialize()` to remove
lingering containers from previous runs
### Box service (service.py)
- Graceful degradation: `initialize()` catches runtime errors and sets
`available=False` instead of crashing LangBot startup
- Add `available` property and guard on `execute_sandbox_tool()`
- Add `skip_host_mount_validation` parameter to `build_spec()` and
`create_session()` — MCP paths are admin-configured and trusted,
bypassing `allowed_host_mount_roots` restrictions meant for
LLM-generated sandbox_exec commands
### Default behavior
- stdio MCP servers automatically use Box when `box_service.available`
is True (Podman/Docker detected); no explicit `box` config needed
- When no container runtime exists, falls back to host-direct stdio
- MCP Box defaults: `network=on` (for pip install), `read_only_rootfs=false`
(for site-packages), `host_path_mode=ro`, `startup_timeout=120s`
### Tests
- `test_box_security.py`: blocked paths, safe paths, subpath rejection
- `test_mcp_box_integration.py`: config model, path rewriting, venv
unwrap, host_path inference, payload building, runtime info, box
availability check
- `test_box_service.py`: `BoxHostMountMode.NONE` validation tests
* feat(box/mcp): instance-based orphan cleanup, error classification, session API, and integration tests
## Changes
### Precise orphan container cleanup
- Runtime generates a unique instance_id on startup
- Every container gets a `langbot.box.instance_id` label
- `cleanup_orphaned_containers()` only removes containers from
previous instances, preserving containers owned by the current one
- Containers from older versions (no label) are also cleaned up
- `cleanup_orphaned_containers` added to `BaseSandboxBackend` as
a no-op default method, removing hasattr duck-typing
### Fine-grained MCP error classification
- New `MCPSessionErrorPhase` enum with 7 phases: session_create,
dep_install, process_start, relay_connect, mcp_init, runtime,
tool_call
- Each phase in `_init_box_stdio_server()` sets the error phase
before re-raising, enabling precise failure diagnosis
- `retry_count` tracked across retry attempts
- `get_runtime_info_dict()` exposes `error_phase` and `retry_count`
### GET /v1/sessions/{id} API
- `BoxRuntime.get_session()` returns session details including
managed process info when present
- `handle_get_session` HTTP handler + route in server.py
- `BoxRuntimeClient.get_session()` abstract method + remote impl
### stdio defaults to Box when runtime is available
- `_uses_box_stdio()` checks `box_service.available` instead of
requiring explicit `box` key in server_config
- `BoxService.initialize()` catches runtime errors gracefully,
sets `available=False` instead of crashing LangBot startup
- When no container runtime exists, stdio MCP falls back to
host-direct execution
### Code quality (from /simplify review)
- Extracted `_VENV_DIRS` / `_VENV_BIN_DIRS` module-level constants
- Removed dead `_box_network_mode()` method and unused `bc` variable
- Fixed broken import `from ....box.models` → `from ...box.models`
- Cached `_resolve_host_path()` result — computed once, passed through
- Config hash now includes `host_path` field
- Batched orphan cleanup into single `rm -f` command
### Session leak fix
- `_cleanup_box_stdio_session()` now runs in `_lifecycle_loop`'s
finally block, covering all exit paths (normal shutdown, error,
retry, final failure)
### Integration tests
- 6 end-to-end tests covering managed process lifecycle, WebSocket
stdio bidirectional IO, session cleanup verification, single
session query, process exit detection, and orphan cleanup safety
* refactor: use rpc
* fix: import
* refactor(box): clean up sandbox subsystem code quality and efficiency
- Fix O(n²) stderr trimming in runtime.py with running length tracker
- Remove dead code: RESERVED_CONTAINER_PATHS, _subprocess_wait_task,
unused config_hash computation, unused imports
- Deduplicate connection callback in BoxRuntimeConnector, parse URL once
- Use enum comparison instead of stringly-typed spec.network.value check
- Replace manual _result_to_dict/_session_to_dict with model_dump()
- Cache NativeToolLoader tool definition and sandbox system guidance
- Extract _is_path_under() helper to eliminate duplicated path checks
- Import SANDBOX_EXEC_TOOL_NAME from native.py instead of redefining
- Add JSON startswith guard in logging_utils to skip futile json.loads
- Fix ruff lint errors (F401 unused imports, F841 unused variables)
* fix: ruff
* refactor(sandbox): keep box logic out of pipeline and localagent
- Move sandbox system-prompt guidance from LocalAgentRunner into
BoxService.get_system_guidance() so all box domain knowledge stays
in the box module.
- Remove standalone logging_utils.py; merge format_result_log() into
MessageHandler base class alongside cut_str().
- Strip sandbox-specific JSON parsing from log formatting; tool
results now use generic truncation.
- Revert TYPE_CHECKING changes in stage.py and runner.py that were
unrelated to this feature.
- Skip two test files affected by a pre-existing circular import
(runner ↔ app) until the import cycle is resolved in a separate PR.
* refactor(box): move box runtime to langbot-plugin-sdk
Extract self-contained box runtime modules (actions, backend, client,
errors, models, runtime, security, server) to langbot-plugin-sdk and
update all imports to use `langbot_plugin.box.*`. Keep only service
and
connector in LangBot core as they depend on the Application context.
- Update docker-compose to use `langbot_plugin.box.server` entry
point
- Update pyproject.toml to use local SDK via `tool.uv.sources`
- Remove migrated source files and their unit/integration tests
- Update remaining test imports to match new module paths
* fix: ruff
* fix(box): tighten sandbox exposure and restore box integration coverage
* refactor(types): remove quoted annotations under postponed evaluation
* chore(sandbox): move MCP loader changes to follow-up branch
* refactor(plugins): simplify GitHub install flow to default master archive
* revert(api): restore plugin GitHub import flow in plugins controller
* Improve data-root handling and skill install previews
* Add managed skill authoring tools for local agents
* Refactor the skills UI around sidebar detail pages
* Document why managed skill authoring tools bypass box
* fix: lint
* feat(web): refactor plugin/skill install flows and fix skills page
- Fix sidebar skill icon
- Add skills route and error page component
- Refactor plugin GitHub install from dialog modal to inline card
- Add skill install dropdown menu (create/upload/github) in sidebar
- Wire sidebar → skills page communication via pendingSkillInstallAction context
- Add i18n keys for error page and skill install actions
* fix(web): persist sidebar collapsible section open state on navigation
Sections opened via sub-item navigation now retain their expanded state
when the user switches to a different section, instead of collapsing
because the isActive fallback becomes false.
---------
Co-authored-by: youhuanghe <1051233107@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Junyan Qin <rockchinq@gmail.com>
* feat(sandbox): add MCP box integration on top of sandbox base (#2083)
* refactor(mcp): extract box stdio runtime helper
* refactor(box): introduce reusable workspace session helper
* refactor(box): run Box Runtime as subprocess inside LangBot container
Remove the separate langbot_box_runtime Docker service. Box Runtime
now always launches as a local stdio subprocess, regardless of whether
LangBot runs in Docker or not. The WebSocket transport path is kept
only for explicit runtime_url configuration (remote deployment).
This simplifies deployment by eliminating cross-container path mapping
and network hops. Box Runtime is a pure scheduling process (talks to
Docker socket / nsjail), it does not execute user code or touch the
filesystem, so container isolation is unnecessary — unlike Plugin
Runtime.
* fix(web): prevent first-emission snapshot from swallowing unsaved changes in pipeline editor
When switching runner (e.g. local-agent → n8n), the newly mounted stage's
first emit would re-capture the saved snapshot, erasing the dirty state
caused by the runner change. The save button would incorrectly go dim.
- Skip snapshot re-capture in handleDynamicFormEmit when form is already dirty
- Add mount-time emit to N8nAuthFormComponent (matching DynamicFormComponent)
- Use stable onSubmitRef to prevent useEffect subscription churn
- Add previousInitialValues guard to prevent initialValues echo loops
* style(web): align plugin list header button heights
* docs(review): update Box architecture review documents
Replace old review docs with 5 focused documents:
- box-architecture.md: deep architecture analysis (LangBot + SDK)
- box-issues.md: 22 issues rated P0/P1/P2
- box-test-coverage.md: test coverage analysis
- box-tob-analysis.md: toB commercialization analysis
- box-vs-plugin-runtime.md: Box vs Plugin runtime comparison
* feat(web): improve login error layout and add Terms of Service link
- Improve backend connection error display with bordered container,
inline icon, and better visual hierarchy
- Extract actual error message from axios response object
- Add Terms of Service link (https://langbot.app/terms) to login footer
- Add termsOfService i18n key for all 7 locales
* refactor(web): replace all hardcoded SVG icons with lucide-react
Unify icon usage across the entire frontend by replacing 67 hardcoded
SVG icons with lucide-react components across ~25 files. This improves
consistency, maintainability, and reduces bundle duplication.
Key replacements:
- Sidebar nav: Zap, LayoutDashboard, Bot, Workflow, BookMarked, etc.
- MCP forms: Loader2, XCircle, Trash2
- Monitoring: Sparkles, MessageSquare, CheckCircle2, RefreshCw, etc.
- Cards: Clock, Star, Workflow, Hexagon, Puzzle, Github, etc.
- Misc: Paperclip, AudioLines, CloudUpload, Layers, Heart, Smile
Zero hardcoded <svg> tags remain in .tsx files.
* fix(web): stop polling plugin tasks when no active installs
The PluginInstallTaskProvider was unconditionally polling
getAsyncTasks every 3s on all /home/* routes. Now it only
syncs once on mount and starts periodic polling only when
there are active (non-terminal) install tasks.
* fix(deps): update langbot-plugin version and add new dependencies
* refactor: use Space API for release checks and stop idle polling
- version.py: switch release list API from GitHub to space.langbot.app,
remove unused in-place update logic (update_all, compare_version_str),
translate all comments/logs to English
- PluginInstallTaskContext: only poll when active install tasks exist
* feat(box): add --standalone-box flag and 3-way transport decision for Box runtime
Align Box runtime connection logic with Plugin runtime's pattern:
- Docker: WebSocket to langbot_box container (ws://langbot_box:5411)
- --standalone-box: WebSocket to external Box process (ws://localhost:5411)
- Windows: subprocess + WebSocket (workaround for async stdio limitation)
- Unix/macOS: subprocess + stdio pipe (unchanged)
BoxRuntimeConnector now inherits ManagedRuntimeConnector for subprocess
lifecycle reuse. Add langbot_box service to docker-compose.yaml.
* refactor(box): use single port with path-based routing for Box WS
Update connector to use ws://host:5410/rpc/ws instead of ws://host:5411.
Update review docs to reflect the single-port architecture.
* feat(web): show Box runtime status in plugin debug info popover
Add Box status section to the debug info popover on the plugin list page,
displaying connection status, backend info, profile, active sessions,
and recent error count. Fetched from GET /api/v1/box/status in parallel
with plugin debug info. Includes i18n for all 8 supported languages.
* fix(web): remove ephemeral sandbox count from Box status display
The active_sessions count reflects transient sandbox containers that
expire after 5 minutes of inactivity, making it misleading in the UI.
Keep only connection status, backend, profile, and error count.
* feat(box): configurable sandbox scope and unified skill containers
Replace the per-message session_id with a template-based system
configurable per pipeline via 'Sandbox Scope' in the local-agent panel.
Default scope is per-chat ({launcher_type}_{launcher_id}).
Unify skill exec into the same container as default exec — skills are
mounted at /workspace/.skills/{name}/ via extra_mounts instead of
getting separate containers. All pipeline-bound skills are injected
at container creation time.
- Add box-session-id-template to pipeline metadata (select, 4 options, 8 languages)
- Add resolve_box_session_id() and build_skill_extra_mounts() to BoxService
- Rewrite native.py skill exec path to use execute_tool with shared session
- Update tests for new session_id format
- Add design doc: docs/review/box-session-scope.md
* feat(web): show active sandbox details in Box status popover
Display sandbox count and a detailed list of active sessions including
session ID, image, backend, resources (CPU/memory), network mode, and
last used time. Fetched from GET /api/v1/box/sessions in parallel.
Includes i18n for all 8 supported languages.
* feat(box): add startup and availability logging for sandbox tools
Log Box runtime initialization result (success with profile info, or
failure warning). Log native tool availability status at ToolManager
startup so it's immediately clear whether exec/read/write/edit tools
are registered for the LLM.
* feat(box): support custom sandbox container image via config.yaml
Add 'image' field to box config section. When set, it overrides the
profile default image (python:3.11-slim) for all sandbox containers.
Priority: caller-specified > config.yaml image > profile default.
* feat(box): add heartbeat and reconnection for Box runtime connector
Add 20-second heartbeat ping loop to detect silent Box runtime
disconnections. On disconnect, set available=false and attempt
reconnection after 3 seconds via the disconnect callback chain.
- BoxRuntimeConnector: heartbeat loop, disconnect callback parameter,
disconnect detection in connection callback and WS failure handler
- BoxService: wire disconnect callback to toggle available state and
re-initialize the connector on reconnection
* feat(web): move runtime status to dashboard, clean up plugin debug popover
Add SystemStatusCards component to the monitoring dashboard showing
Plugin Runtime and Box Runtime connection status with details (backend,
profile, sandbox count). Remove all Box/session status from the plugin
page debug popover — it now only shows debug URL and key.
Includes i18n for all 8 supported languages.
* refactor(web): compact system status into a single card alongside metrics
Replace the separate two-card row with a single compact 'System Status'
card placed as the 5th column in the metrics grid. Shows green/red dots
for Plugin Runtime and Box Runtime. Click to expand a popover with
connection details (backend, profile, sandbox count).
* feat: show connector error details for Plugin and Box runtime status
Record Box connector error in BoxService and expose it as
'connector_error' in GET /api/v1/box/status when unavailable.
Display error messages in the dashboard System Status popover
for both Plugin Runtime (plugin_connector_error) and Box Runtime
(connector_error) when they are disconnected.
* fix(web): auto-refresh system status and show disconnect errors in real time
Poll Plugin Runtime and Box Runtime status every 30 seconds so the
dashboard reflects disconnections without a manual page refresh.
Also re-fetch when the popover is opened for immediate feedback.
* fix(box): handle RPC failure in get_status/get_sessions gracefully
When the Box runtime disconnects, there is a race between the heartbeat
flipping _available=false and the frontend polling get_status(). If the
poll arrives first, client.get_status() throws a ConnectionClosedError
which propagated as a 500, causing the frontend to show a grey dot
(null status) instead of a red dot with error details.
Now get_status() catches RPC errors and returns available=false with
the exception message as connector_error. get_sessions() returns an
empty list when unavailable or on RPC failure.
* fix(box): add persistent reconnection loop with exponential backoff
The previous disconnect handler only retried once and then gave up.
Now spawns a background task that retries with exponential backoff
(3s, 6s, 12s, ... up to 60s) until the Box runtime is reachable again.
Uses a _reconnecting guard to prevent duplicate loops. Calls
connector.dispose() before each retry to clean up stale tasks.
* fix(box): detect disconnect when handler.run() returns normally
The generic Handler.run() catches ConnectionClosedError and breaks out
of its loop (normal return) instead of raising, because it has no
disconnect_callback. The old code only triggered reconnection in the
except branch, so a clean WebSocket close was never detected.
Now treat handler.run() returning normally (after successful handshake)
as a disconnect event, triggering the reconnection callback.
* fix(web): refresh system status card when clicking Refresh Data button
Pass a refreshKey prop through OverviewCards to SystemStatusCard that
increments on each Refresh Data click, triggering a re-fetch of Plugin
and Box runtime status alongside the monitoring data refresh.
* fix(web): fix system status card stuck in loading state
fetchStatus(showLoading=false) never called setLoading(false), so the
initial loading=true was never cleared. Simplify to always setLoading
in the finally block — the spinner only shows on the very first load
since subsequent fetches complete near-instantly.
* feat(web): show active sandbox details in dashboard Box status popover
Fetch box sessions alongside status and display each active sandbox
in the popover with session ID, image, resources (CPU/memory), and
last used time.
* feat(box): add global sandbox scope option
Add a 'Global (shared by all)' option to the sandbox scope selector.
Uses a constant '{global}' template variable that always resolves to
'global', so all users and chats share one sandbox container.
* refactor(web): replace popover with dialog for system status details
Replace the dropdown popover with a proper Dialog for runtime status
details. Add a small info button on the System Status card that opens
the dialog. Session details now show in a spacious 2-column grid layout
with full image name, backend, CPU/memory, network, mount path, and
created/last-used timestamps.
* fix(web): widen system status dialog and fix scroll border issue
Use max-w-2xl (matching other dialogs) instead of max-w-lg. Move
overflow-y-auto to an inner container with overflow-hidden on
DialogContent to prevent padding bleed at scroll edges.
* feat(web): add tooltips for truncated fields in system status dialog
Wrap session_id, image, and mount path fields with Tooltip components
so hovering over truncated text shows the full value.
* feat: add download button
* feat: successfully install
* feat: delete old filter
* feat: youhua frontend
* fix: align box runtime launch args
* feat: translate
* feat: refactor market
* feat: youhua qianduan
* chore: rename extension zh translation
* feat(extensions): unify extensions endpoint and refresh extensions page UX
- Rename /home/plugins route to /home/extensions and update all sidebar links.
- Add unified GET /api/v1/extensions returning plugins, MCP servers and skills,
sorted by name; replace the three separate frontend fetches with this single call.
- Migrate the extensions page to shadcn primitives (Tabs/Card/Alert/Badge/Skeleton/
Switch/Label) and clean up hardcoded color tokens on the extension card.
- Add a localStorage-persisted "Group by type" switch that, when enabled in the
All Types tab, renders extensions grouped by type with a compact section header.
- Show a spinner while loading and rename the empty-state copy from
"No plugins installed" to "No extensions installed".
- Rename the "格式 / Formats" filter label to "类型 / Types" across all 8 locales.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(extensions): fallback lucide icon when extension icon is missing
Render a tinted lucide icon (Puzzle / Server / Sparkles) on the extension
card when the icon URL is empty or the image fails to load. Picked icons
distinct from EventListener (AudioWaveform) and KnowledgeEngine (Book) to
avoid visual collision with plugin component badges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(sidebar): unify installed-extensions list with plugins, MCP and skills
- Render plugins, MCP servers and skills together under the "Installed
Extensions" sidebar entry, alphabetically sorted to match the list page.
- Resolve per-item routes by extension type (plugin -> /home/extensions,
mcp -> /home/mcp, skill -> /home/skills) and gate the plugin-only hover
context menu on extensionType === 'plugin'.
- Lift the "group by type" toggle into SidebarDataContext (still persisted
in localStorage) so the sidebar groups items with section headers
whenever the list page has the toggle enabled.
- Show lucide fallback icons (Server / Sparkles / Puzzle) tinted in the
LangBot blue for MCP, skill, and missing-icon plugin items, overriding
the SidebarMenuSubButton svg color rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(extensions): mobile-friendly layout for extensions and add-extension pages
- Stack the extensions page header vertically on small screens, let the
filter Tabs scroll horizontally if they overflow, hide the debug
button label below sm and let the install/debug controls wrap.
- Constrain the debug popover and its inputs to the viewport width so
they no longer overflow on phone-sized screens.
- Drop the card grid from a fixed 30rem column to a min(100%, 22rem)
column at base / 28rem at sm, and reduce the gap, so cards render
cleanly at 360px+ widths in both flat and grouped views.
- Make the add-extension header actions wrap on lg- viewports and the
install dialog responsive instead of a hard 500px box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: change ui
* feat: delete version for mcp and skills
* fix: constrain home page content width
* fix: preserve monitoring card borders under sticky filters
* fix(box): restore sandbox config and shared mcp runtime
* fix(box): harden sandbox session isolation
* fix(skill): remove auto activation setting
* feat(skill): align skill system with Claude Code's Tool Call design
- Replace text marker activation with `activate` tool (Tool Call mechanism)
- Replace 7 authoring tools with 2: `activate` + `register_skill`
- Add builtin skills loading from templates/skills/
- Add create-skill as first builtin skill
- Remove SKILL_ACTIVATION_MARKER and text detection methods
- Tool Result returns SKILL.md content (protects KV Cache)
This aligns with Claude Code's progressive disclosure pattern:
- Metadata (name+description) always visible in tool description
- SKILL.md body loaded on activate via Tool Call
- Bundled resources accessible through virtual path mapping
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(tools): add glob and grep native sandbox tools
Add file discovery and content search capabilities to the sandbox:
- glob: Find files by pattern (supports ** recursive matching)
- grep: Search file contents with regex patterns
Both tools respect skill package paths and include safety limits
(max 100 files for glob, max 200 matches for grep).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* feat(skill): add skill file browsing capability
- Add API endpoints for listing/reading/writing skill files
- Add FileTree component in SkillForm for directory browsing
- Users can now view scripts/, references/, assets/ directories
- Files can be selected and edited in the instructions textarea
- Add translations for new file browsing features
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(skill): copy builtin skills to data/skills on startup
- Builtin skills (templates/skills/) are now copied to data/skills/
- Users can view and manage builtin skills in the UI
- Rename SkillAuthoringToolLoader to SkillToolLoader
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(skill): improve file browsing and fix path handling
- Fix nested directory display in skill file tree (preserve root entries)
- Fix file content display when clicking files in skill browser
- Add skill manager and tool manager as proper package modules
- Separate fileContent state to allow editing non-SKILL.md files
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(toolmgr): correct skill_tool_loader attribute name
Rename skill_authoring_tool_loader to skill_tool_loader in execute_func_call
and shutdown methods to match the attribute defined in initialize().
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(native): update tool descriptions to use register_skill
Replace references to removed import_skill_from_directory with
register_skill in exec/write/edit tool descriptions.
* feat(toolmgr): enhance tool initialization with backend availability checks
* refactor: remove unused imports and clean up code in various files
* feat: polish extension detail pages
* feat: persist sidebar list expansion
* fix: refine extension ui and backend errors
* fix: align add extension marketplace ui
* feat: manage skills through box runtime
* feat: support github skill installation
* fix: import github skill directories
* feat: install market extensions from card click
* feat(web): improve skill import flow
* feat: polish extension import flow
* fix(mcp): stabilize shared box managed processes
* fix(web): improve backend retry and sidebar scrolling
* docs(review): refresh box architecture review for feat/sandbox
Sync the docs/review/ suite to the current state of the feat/sandbox branch
(both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review.
- box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b}
config schema, add E2B backend, 6 native tools (incl. glob/grep), Skill
Tool Call activation, shared multi-process MCP container, SkillManager,
BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect
- box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail
image conflict, frontend monitoring card) into a Resolved section; add
new P0 (INIT/backend ordering), P1 (extra_mounts immutability after
container creation), P2 (skill_store test gap, integration tests not in CI)
- box-session-scope.md: add §0 Implementation Status — Phase 1 shipped,
MCP unification landed earlier than originally scoped
- box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC),
add 7 new test files including SDK backend_selection/e2b/skill_store
- box-tob-analysis.md: connection recovery now满足基本要求; add E2B and
backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat
- box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned
with Plugin Runtime; revise remaining gaps (WS auth, shared base class)
* refactor(box): use unified env-override mechanism for box.local config
The box module hand-rolled its own LANGBOT_BOX_LOCAL_* env parsing in two
places (connector._get_box_config and service._local_config), duplicating
logic that LoadConfigStage._apply_env_overrides_to_config already provides
generically via the SECTION__SUBSECTION__KEY convention.
- Drop the bespoke LANGBOT_BOX_LOCAL_* parsing; read box.local straight
from instance_config (the unified BOX__LOCAL__* overrides are already
applied before BoxService initializes)
- Harden _load_allowed_mount_roots to accept a comma-separated string,
since the generic mechanism stores a freshly-created key as a raw
string when config.yaml has no box.local.allowed_mount_roots entry
- docker-compose: rename the langbot container env vars to
BOX__LOCAL__* (the canonical convention); remove them entirely from
the langbot_box container — the Box runtime never reads box.local from
env/config.yaml, it is configured via the INIT RPC action
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: repair stale skill/sandbox tests for feat/sandbox
The skill subsystem moved to Tool-Call activation and a Box-managed
skill store; several tests still asserted removed APIs and a sys.modules
stub leaked across the suite. Full unit suite now green (was 23 failing).
- test_skill_tools: drop TestSkillManagerActivation (text-marker API
removed); rewrite TestSkillActivationHelper around the current
skill.activation.register_activated_skill; replace the CRUD
TestSkillAuthoringToolLoader with TestSkillToolLoader covering the
current activate/register_skill tools and sandbox-availability gating
- test_tool_manager_native: ToolManager attr is skill_tool_loader (not
skill_authoring_tool_loader); native loader now exposes 6 tools
(exec/read/write/edit/glob/grep) and requires initialize() with a
backend-available get_status()
- test_localagent_sandbox_exec: remove obsolete activation-marker
leakage tests and their helper providers
- test_model_service / pipeline conftest: give the mocks skill_mgr=None
so PreProcessor's local-agent skill-binding guard short-circuits
- test_n8nsvapi: stop permanently overwriting sys.modules
('langbot.pkg.provider.runner' etc.); save and restore around the
import so other modules get the real LocalAgentRunner base class
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci(tests): run unit tests on every push to feat/** branches
- Add feat/** to push branches so long-lived feature branches are
tested on every push (they accumulate large changes before a PR)
- Drop the push path filter entirely: every push to master/develop/
feat/** now runs the full unit suite (the old 'pkg/**' filter never
matched the real source path 'src/langbot/pkg/**', so backend-only
pushes silently skipped tests)
- Fix the same broken path glob on the pull_request trigger
('pkg/**' -> 'src/langbot/pkg/**')
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skill): harden mount/reload paths and HTTP errors against stale skill cache
The Box backends behave inconsistently when extra_mounts reference a
missing host directory (nsjail aborts the entire sandbox start, Docker
silently creates a root-owned empty dir on the host, E2B silently skips
the upload). The cache in skill_mgr.skills is only refreshed on
in-process mutations, so out-of-band changes — container rebuilds,
manual rm in the box volume, anything the LangBot API didn't drive —
leave a stale skill that later produces one of those bad mount paths.
- box/service.py: build_skill_extra_mounts now filters skills whose
package_root is not isdir on the LangBot-visible filesystem and logs
a warning, instead of passing the bad mount through to the backend
- skill/manager.py: reload_skills (Box path) drops skills whose
package_root is missing on the LangBot-side filesystem before they
reach the in-memory cache, with a summary warning
- api/http/controller/groups/skills.py: file/CRUD handlers now also
catch BoxError (RuntimeError subclass, previously slipping past
``except ValueError`` and surfacing as 500); list/get handlers gain
a try/except so a transient Box RPC failure becomes a clean 400
instead of a stack trace
Tests added for build_skill_extra_mounts (skip missing, skip empty,
no skill manager) and SkillManager.reload_skills (drop missing on Box
path). Full unit suite: 279 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(box): add box.enabled toggle and gate consumers on availability
Make the Box sandbox runtime optional. When ``box.enabled`` is false in
config (or when an enabled Box fails to connect), every dependent feature
degrades to the same disabled-state UX rather than crashing or silently
falling back to less safe code paths.
Backend:
- config.yaml: new top-level ``box.enabled: true`` flag (default true)
- BoxService:
- Read box.enabled on construction
- initialize() short-circuits when disabled — no remote WS connect, no
stdio subprocess fork
- _on_runtime_disconnect is a no-op when disabled (no reconnect loop
on a deliberately-off service)
- get_status() now exposes ``enabled`` so the frontend can tell
"disabled in config" from "configured but failed"
- MCP stdio loader (mcp_stdio.uses_box_stdio): requires box_service to
be available, not just installed
- MCP _init_stdio_python_server: when ap.box_service exists but is
unavailable, refuse the stdio server with an actionable error instead
of silently falling through to host-stdio (which bypasses the sandbox
the operator asked for). Setups without ap.box_service installed at
all keep the legacy host-stdio fallback for pre-Box dev mode
- SkillService._require_box_for_write: refuses create/update/install/
write_skill_file when ap.box_service is installed but unavailable.
Distinguishes disabled vs failed in the error message so the UI can
surface the right hint. Legacy setups (no ap.box_service) keep the
local fallback path — that distinction is what keeps the existing
local-skills tests valid
Tests:
- Box disabled-state behavior (4 cases)
- Skill write refusal in disabled & failed states (7 cases)
- MCP stdio runtime info policy updated to match new refuse-when-down
behavior
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): surface Box disabled/unavailable state across consumers
When Box is disabled in config (``box.enabled = false``) or fails to
connect, every dependent UI surface now degrades visibly:
- ``useBoxStatus`` hook: shared, polled 30s, exposes ``available``,
``disabled`` (config-off) and a single ``hint`` key so callers don't
have to re-derive the three states
- ``BoxUnavailableNotice`` reusable Alert banner driven by that hint
- Dashboard SystemStatusCards: three-state dot + label
(connected / disabled-gray / disconnected-red); disabled state shows
the ``boxDisabled`` hint, failed state continues to show the connector
error. Plugin block kept untouched
- Skills page (create view) and SkillDetailContent (edit view):
Save button disabled and banner inserted above the form when Box is
unavailable — matches the backend gate added in the previous commit
- PipelineExtension skill section: ``enable_all_skills`` switch, Add
Skill button and Remove buttons all gate on Box availability;
banner inline under the section header
- PipelineFormComponent: banner above the ``local-agent`` stage card
when Box is unavailable, since that stage carries the sandbox-bound
``box-session-id-template`` field
- Box status payload type (``ApiRespBoxStatus.enabled``) and 8 locale
files updated with ``boxDisabled`` / ``boxUnavailable`` /
``boxRequiredHint`` strings
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(box): document the box.enabled toggle and gate behavior matrix
- docker-compose: move ``langbot_box`` under compose profiles
(``box`` and ``all``) so ``docker compose up`` no longer requires
the sandbox container. Inline comment explains how to pair the
profile choice with ``box.enabled`` so the langbot service does not
thrash trying to reach a runtime that was never started
- docs/review/box-architecture.md:
- Annotate ``box.enabled`` in the config.yaml example, listing the
exact side effects (no remote/stdio connect; tools/skills/MCP
stdio off; reads still work)
- Replace the bare compose snippet with the actual profile-driven
invocation and the BOX__ENABLED pairing
- New "关闭/连接失败时的行为矩阵" section: a single table mapping
every consumer (native tools, activate/register_skill, stdio MCP,
skill list/CRUD, pipeline AI config, extensions page, dashboard)
to its disabled-state behavior, plus the legacy ``ap.box_service``
distinguisher note
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(pipeline-form): swap Box banner for field-level disable_if + tooltip
The previous commit hard-coded a BoxUnavailableNotice banner above the
``local-agent`` stage card. That works, but it shouts at the user about
every field in that stage when in reality only one field —
``box-session-id-template`` — depends on the sandbox.
Use the dynamic-form schema's existing variable-injection mechanism
(``__system.*`` references via ``systemContext``) and add a sibling to
``show_if``: ``disable_if`` + ``disabled_tooltip``. The field stays
visible, becomes inert, and an info icon next to its label exposes the
reason on hover. The rest of the AI tab is left untouched.
- entities/form/dynamic.ts: extend IDynamicFormItemSchema with
``disable_if: IShowIfCondition`` and ``disabled_tooltip: I18nObject``
- DynamicFormComponent: evaluate disable_if with the same resolver as
show_if; OR the result into isFieldDisabled; render an Info tooltip
trigger next to the label when the condition matches
- ai.yaml metadata: attach disable_if (__system.box_available eq false)
and a localized disabled_tooltip to box-session-id-template
- PipelineFormComponent: drop the BoxUnavailableNotice import and the
per-stage banner; pass ``systemContext={ box_available: boxAvailable }``
only for the local-agent stage so other stages aren't paying the
re-render cost
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(mcp): friendly UI message when stdio MCP refused by Box state
Previously the MCP detail dialog dumped the raw RuntimeError text from
``_init_stdio_python_server`` — English-only, prefixed with "Failed
after 4 attempts", and exposing internal config names. The retry
wrapper also kept retrying a refusal that is deterministically going
to fail again, polluting logs.
Replace the raw text with a structured signal:
- New ``MCPSessionErrorPhase.BOX_UNAVAILABLE`` enum value. The stdio
refusal path sets it before raising and uses a short opaque
discriminator (``box_disabled_in_config`` / ``box_unavailable``) as
the message body — never user-facing
- ``_lifecycle_loop_with_retry`` short-circuits on
``BOX_UNAVAILABLE``: surfaces the error immediately, no retries,
no "Failed after N attempts" prefix. Silences the warning storm
seen during smoke-testing
- ``MCPServerRuntimeInfo`` (TS type) now declares ``error_phase``,
``retry_count``, ``box_session_id``, ``box_enabled`` to match what
the backend already returns in get_runtime_info_dict()
- Both MCP detail forms (``mcp/components/mcp-form/MCPForm.tsx`` and
``plugins/mcp-server/mcp-form/MCPFormDialog.tsx``) detect
``error_phase === 'box_unavailable'`` and render a two-line
localized notice: state line ("Box disabled / unreachable") plus
remediation line ("enable Box or switch to http/sse")
- 8 locale files (en/zh-Hans/zh-Hant/ja/ru/vi/th/es) get
``mcp.boxDisabledStdioRefused``, ``mcp.boxUnavailableStdioRefused``,
``mcp.boxStdioRefusedSuggestion``
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(mcp-web): block stdio MCP creation at the form when Box is unavailable
When Box is disabled in config (``box.enabled = false``) or unreachable,
saving a new MCP server in stdio mode produced one that could never
start — the user would only learn that from the runtime error on the
detail page. Stop the user before they save instead.
Both MCP forms (the page-level ``MCPForm.tsx`` and the older dialog
``MCPFormDialog.tsx``) now:
- Disable the ``stdio`` option in the mode select when Box is
unavailable, with a small "(requires Box)" suffix so the reason is
obvious. Existing stdio configs still display their current value
- Show ``BoxUnavailableNotice`` inline under the mode select when the
currently-selected mode is stdio and Box is unavailable, so editing
a stale stdio config makes the cause visible
- Disable the Save / Submit button while stdio is selected under that
condition. ``MCPForm`` exposes a new ``onSaveBlockedChange`` prop
so the parent ``MCPDetailContent`` can disable both its Submit and
Save buttons. ``MCPFormDialog`` disables its Save button locally
- Refuse the submit handler too (Enter-key path) with a toast carrying
the same i18n message
i18n: ``mcp.boxRequired`` (short tag in the disabled option) and
``mcp.stdioBlockedByBoxToast`` added to all 8 locales.
Backend runtime gate (``_init_stdio_python_server`` refusal +
``BOX_UNAVAILABLE`` error_phase + retry short-circuit) stays in place
as the last line of defence for API bypass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(web): prevent plugin config form overflow
* refactor(skill): remove all local-filesystem fallbacks; Box is the sole source
Skills now flow exclusively through the Box runtime. Every read and write
method funnels through ``_box_service()``; when Box is unavailable
(disabled in config, connection failed, or simply not installed) the
operation either returns an empty surface (``list_skills`` → []) or
raises with a clear ``Box runtime ... not initialised / disabled /
unavailable: ...`` message via the new ``_require_box(action)`` helper.
Why: the legacy local-fallback path scanned ``data/skills/``, but Box
manages its own ``box.local.skills_root`` (default ``data/box/skills/``).
The two diverging directories caused stale / phantom skill lists when
Box flapped, and the local-fallback writes silently bypassed all the
sandboxing the operator had configured.
SkillService (``api/http/service/skill.py``):
- New ``_require_box(action)`` returns the box service or raises a
structured ValueError. ``_require_box_for_write`` kept as alias
- ``list_skills`` → returns [] when Box is down so the UI can render
the disabled banner cleanly
- ``get_skill`` / ``get_skill_by_name`` → return None
- All read-file / write-file / scan-dir / create / update / delete /
install / preview methods → ``_require_box`` then box delegate.
Local fallback bodies (shutil.copytree, tempfile.mkdtemp, preview
pipelines) removed entirely
SkillManager (``pkg/skill/manager.py``):
- ``reload_skills`` returns early with empty cache when Box is down.
data/skills/ discovery loop removed
- ``refresh_skill_from_disk`` now just reports cache presence; the
on-disk re-parse is gone since Box is the only writer
Tests:
- Drop 11 obsolete test_skill_service.py tests that exercised the
removed local-fallback paths (create/install/file/delete/update)
- Add list-empty + read-refused tests; flip the legacy-allow test to
legacy-refuses-too
- Rewrite refresh_skill_from_disk test to match the new behaviour
Several helper methods (_managed_skill_path, _resolve_skill_path,
_preview_skill_candidates, _install_preview_candidates, etc.) are now
unreachable; a follow-up commit will prune them so this diff stays
reviewable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(skill): prune dead local-filesystem helpers left over from Box migration
Follow-up to the Box-only refactor. The previous commit removed the
local-fallback BRANCHES from every public method; this one removes the
HELPERS those branches called, which are now unreachable.
SkillService (service/skill.py): 787 → 449 lines
Removed: scan_directory (sync), _read_skill_package, _write_skill_md,
_resolve_create_field, _managed_skill_path,
_managed_install_root_for_package, _normalize_package_root,
_resolve_skill_path, _find_skill_entry, _discover_skill_directories,
_safe_extract_zip, _extract_uploaded_skill_to_temp,
_download_github_skill_to_temp, _resolve_github_source_root,
_build_preview_target_dir, _preview_skill_candidates,
_select_preview_candidates, _install_preview_candidates,
_preview_source_root, _resolve_installed_skills, plus the
module-level _FRONTMATTER_FIELDS and _build_skill_md.
Kept (still needed by the surviving GitHub-import path):
_download_github_asset, _download_github_skill_directory_as_zip,
_find_github_skill_archive_entry, _copy_github_skill_directory_to_zip,
_is_github_skill_md_url, _parse_github_skill_md_url,
_resolve_github_skill_md_package_name, _validate_github_asset_url,
_uploaded_skill_target_stem, _validate_skill_name.
Imports dropped: shutil, tempfile, yaml, ....utils.paths.
SkillManager (skill/manager.py): 187 → 88 lines
Removed: get_managed_skills_root, _discover_skill_directories,
_find_skill_entry, _load_skill_file, _normalize_package_root.
Imports dropped: datetime, parse_frontmatter, paths.
Tests:
- test_skill_service.py: drop the 3 sync scan_directory tests +
skill_service fixture + _create_skill_file helper
- test_skill_tools.py: drop test_load_skill_file_success; rename
TestSkillManagerPackageLoading → TestSkillManagerCache
Full unit suite: 277 passed, 1 skipped. ``ruff check`` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(skill): re-inject skill index into local-agent system prompt
The contributor's original PR (#1917) appended an ``Available Skills``
index to the system prompt before the LLM saw the user message, so the
LLM could decide whether to activate a skill. ``7145447b`` removed the
text-marker activation flow and, together with it, the entire system
prompt injection — but the Tool Call replacement only put the available
skills inside the ``activate`` tool's description. In practice the LLM
ignores tool descriptions for selection and goes straight to native
tools, so user-visible skill activation silently broke.
Restore the injection, adapted for the Tool Call era:
- SkillManager regains ``get_skill_index(bound_skills)`` and
``build_skill_aware_prompt_addition(bound_skills)``. The addendum
carries only ``name (display_name): description`` for each
pipeline-visible skill plus one instruction line pointing at the
``activate`` tool. No SKILL.md contents — KV cache stays clean
- PreProcessor appends the addendum to the first system message (or
inserts a new one) of ``query.prompt.messages`` for the local-agent
runner. Handles plain-string and ContentElement[] bodies. Skips
cleanly when no skills are visible
- 3 new test_preproc cases: injection happens, bound-skills subset
honoured, empty addendum touches nothing. 280 passed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(box): downgrade get_status.available when backend probed unavailable
Until now ``BoxService.get_status`` returned ``available: true`` whenever
the runtime connector was healthy, even if the runtime itself reported
``backend: { available: false }`` (operator selected nsjail without the
binary, Docker daemon crashed mid-session, E2B credentials wrong, ...).
The dashboard / ``useBoxStatus`` hook / skill_service gate consumed the
top-level flag and showed "connected" while every actual call to native
exec or skill management would fail.
The native-tool loader already polled ``status.backend.available``
independently and hid its tools correctly, but every other consumer
(dashboard banner, the disabled-state hint, the LLM-facing message)
disagreed with it.
Combine the two in the payload: ``available = self._available AND
status.backend.available``. When ``backend.available`` is false we now
also surface a ``connector_error`` that names the backend
("Configured sandbox backend \"nsjail\" is unavailable") so the dialog
shows the actionable reason instead of an empty error pane. The
detailed ``backend`` object is preserved unchanged for the dialog.
Internal ``box_service.available`` (used by ``skill_service`` writes,
``mcp_stdio.uses_box_stdio``, the reconnect callback) is intentionally
NOT changed — it still tracks connector health only, so a backend blip
does not trigger spurious reconnect loops.
Tests:
- ``test_get_status_downgrades_available_when_backend_dead`` — exercise
the new branch (connector OK, backend.available=false → top-level
available=false, connector_error mentions the backend name)
- ``test_get_status_keeps_available_true_when_backend_ok`` — guard
against regressing the happy path
Live-verified with ``box.backend: nsjail`` on macOS (no nsjail binary):
``GET /api/v1/box/status`` now returns ``available: false`` with the
named connector_error, instead of the previous misleading
``available: true``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(web): surface the specific Box failure reason in unavailable banner
When Box is configured but the runtime reports its backend is dead
(e.g. ``box.backend = nsjail`` but the binary is missing, or Docker
daemon crashed), the backend now returns a structured
``connector_error`` like ``Configured sandbox backend "nsjail" is
unavailable``. The previous notice only said "Box sandbox is
unavailable" + a generic "enable Box" hint, hiding the actionable
detail.
- ``useBoxStatus``: derive ``reason`` from ``status.connector_error``.
Only exposed for the failed-state (``hint === 'boxUnavailable'``),
since the disabled-by-config message already carries its reason
- ``BoxUnavailableNotice``: insert the reason as a small monospaced
line between the state message and the action hint. The disabled
variant is unchanged (operator chose the state)
- Wire ``reason`` through every existing call site (Skills page +
detail, PipelineExtension, both MCP forms). Old unused ``context``
prop dropped
Net layout (3 lines, still compact):
⚠ Box sandbox is unavailable — sandbox tools, skill add/edit, ...
Configured sandbox backend "nsjail" is unavailable
This feature requires the Box runtime. Enable it in config ...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: reconcile master's unit tests with feat/sandbox refactors
The merge from master brought in new unit tests that target pre-refactor
APIs on feat/sandbox. Reconcile each:
- factories/app.py: FakeApp now exposes a Mock skill_mgr (with empty .skills
dict + inert prompt-addition builder) and a Mock pipeline_service so the
PreProcessor skill-index injection branch can run end-to-end in tests.
- pipeline/conftest.py: eagerly import langbot.pkg.pipeline.pipelinemgr so
pipeline.stage is fully initialised before any individual stage test
(preproc, longtext, ...) tries to lazy-load it. Without this preload,
running test_preproc.py in isolation hit a circular-import error via the
stage -> app -> pipelinemgr -> stage chain.
- provider/test_tool_manager.py: ToolManager now probes four loaders
(native -> plugin -> mcp -> skill). Inject inert native + skill mocks in
the execute_func_call fixture and assert all four shutdowns fire.
- utils/test_paths.py: drop the three cwd-dependent _check_if_source_install
cases. The refactor walks Path(__file__).resolve().parents looking for
pyproject.toml + main.py, so cwd no longer factors in and there's no
file read to mock-fail. The positive case and caching test still apply.
- utils/test_version.py: delete entirely. is_newer and compare_version_str
were removed when VersionManager was refactored to use the Space API for
release checks (
|
||
|
|
f0061817ea |
fix: remove /debug/exec endpoint that allows authenticated RCE via exec() (#2178)
The /api/v1/system/debug/exec endpoint passes user-supplied HTTP body directly to Python exec(), enabling arbitrary code execution for any authenticated user when debug_mode is enabled. This is a critical security risk (CWE-94): a single misconfiguration or compromised JWT grants full server-side code execution. Remove the endpoint entirely. The /debug/plugin/action endpoint (which does not use exec()) is left intact as it serves a different, scoped purpose. Co-authored-by: Junyan Chin <rockchinq@gmail.com> |
||
|
|
17bbc8bf10 |
Feat/test build (#2174)
* fix(ci): update unit-test workflow paths to match current source layout Replace stale pkg/** filter with src/langbot/** and add uv.lock. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(tests): update README to reflect current test layout - Fix stale paths: tests/pipeline → tests/unit_tests/pipeline - Update CI Python versions: 3.11, 3.12, 3.13 - Add test directory structure for box, config, platform, plugin, provider, storage - Document pytest markers and uv commands - Mention planned E2E tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add shared test factories package Create tests/factories/ with reusable test factories: - FakeApp: mock application with all dependencies - Message chains: text_chain, mention_chain, image_chain - Query factories: text_query, group_text_query, command_query, etc. No test changes - maintains backward compatibility. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add fake provider factory Add tests/factories/provider.py with: - FakeProvider: deterministic fake LLM provider - Error simulation: timeout, auth, rate-limit, malformed - Request capture for assertions - fake_model: mock model with attached provider Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add fake platform factory Add tests/factories/platform.py with: - FakePlatform: simulated platform adapter - Inbound message construction: friend/group/image - Mention-bot flag simulation - Outbound message capture for assertions - Streaming output support simulation - Send failure simulation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add comprehensive message/query factories Extend tests/factories/message.py with: - file_query: file attachment query - unsupported_query: unknown message segment - voice_query: audio/voice query - at_all_query: group @All mention - query_with_session: query with session object - query_with_config: query with custom pipeline config Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add fake message flow smoke test Create tests/smoke/test_fake_message_flow.py: - TestFakeMessageFlow: factory verification tests - TestMessageFlowIntegration: minimal flow smoke test - Tests FakeApp, FakeProvider, FakePlatform, query factories - Verifies LANGBOT_FAKE_PONG marker response - Captures outbound messages for assertions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add developer test-quick command Add scripts/test-quick.sh and Makefile with: - test-quick: runs ruff check + unit tests + smoke tests - No real provider keys or platform accounts required - Suitable for local branch self-test Update tests/README.md: - Document test-quick command - Document test factories package - Add smoke tests and factories directory structure Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(test): make test-quick reliable as developer gate Fixes for D-001验收问题: 1. test-quick.sh: use set -euo pipefail, uv run ruff, no tail pipe 2. Remove unused imports in factories (app.py, platform.py, provider.py) 3. Fix unused variable in smoke test 4. Add noqa: E402 to test_n8nsvapi.py lazy imports 5. Update smoke test docs: "minimal fake flow" not full pipeline Now test-quick is a reliable gate: lint failures exit 1, test failures propagate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): add preproc and taskmgr unit tests U-001: Pipeline Preprocessor tests - Normal text message processing - Empty message handling - Image segment with/without vision model - Model selection and fallback - Variable extraction U-004: Core Task Manager tests (pattern-based) - Task creation and tracking patterns - Task cancellation patterns - Scope-based cancellation - Task type filtering - Pruning completed tasks - Wait all tasks Taskmgr tests use pattern-based approach to avoid circular import in source code (taskmgr → app → http_controller → migration → taskmgr). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): add config loader unit tests U-005: Config Loader tests - Valid YAML config loading - Valid JSON config loading - Invalid YAML/JSON error behavior - Missing config file creation from template - Template completion for missing keys - ConfigManager load/dump operations - Exists check for both YAML and JSON All tests use tmp_path fixture, no real project config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): add chat and command handler pattern tests U-002: Chat Handler tests (pattern-based) - Normal message event emission pattern - prevent_default handling - User message alteration pattern - Runner selection pattern - Streaming/non-streaming response patterns - Exception handling modes (show-error, show-hint, hide) - Message history update pattern - Telemetry payload pattern U-003: Command Handler tests (pattern-based) - Command parsing and text extraction - Event creation pattern - Privilege/admin check pattern - Command result handling (text, error, image) - prevent_default handling - String truncation helper Uses pattern-based testing to avoid circular import issues in source code. Direct imports of handler modules trigger circular import chain. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style: fix unused imports after ruff auto-fix Remove unused imports in test files: - test_config_loader.py: remove unused os - test_taskmgr.py: remove unused Mock - test_preproc.py: remove unused unsupported_query, image_chain Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(unit): improve taskmgr tests to test real classes U-004 improved: Tests now import and test actual classes: - TaskContext: new(), trace(), to_dict(), placeholder() - TaskWrapper: task creation, context, exception/result capture, cancel, to_dict - AsyncTaskManager: create_task, create_user_task, cancel_task, cancel_by_scope - Task pruning behavior Uses pre-mocking technique: - Mock langbot.pkg.core.app before import (breaks circular chain) - Mock langbot.pkg.core.entities with proper Enum All 24 tests now test real class behavior, not patterns. taskmgr.py coverage should improve significantly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(test): consolidate FakeApp and add sys.modules isolation utility - Extract tests/utils/import_isolation.py with isolated_sys_modules context manager - Extend tests/factories/app.py FakeApp with handler-specific attributes - Refactor test_chat_handler.py to use centralized FakeApp and cached imports - Refactor test_command_handler.py with mock_execute_factory fixture - Refactor test_smoke.py to move import-time sys.modules manipulation into fixture - Add SQLite migration integration tests (G-002) - Add HTTP API smoke integration tests (G-005) - Update CI workflow to call pytest for SQLite migrations (G-004) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add developer quality gate consolidation (G-007) - Add scripts/test-integration-fast.sh for fast integration tests - Add scripts/test-coverage.sh with 12% baseline threshold - Update Makefile with test-integration-fast, test-coverage, test-all-local - Update CI workflow with integration and coverage jobs - Add smoke marker to pytest.ini - Update tests/README.md with quality gate layers documentation - Add tests/integration/pipeline/ for pipeline stage-chain tests Quality gate layers: - Quick: ruff + unit + smoke (~2 min) - Fast Integration: SQLite/API/Pipeline (~3 min) - Coverage: 12% threshold gate (~8 min) - Full Local: all three combined Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): add PostgreSQL migration slow integration tests (G-003) - Add tests/integration/persistence/test_migrations_postgres.py - All tests marked with @pytest.mark.slow - Tests skip when TEST_POSTGRES_URL is not set (no local PostgreSQL) - Database isolation via clean_tables and clean_alembic_version fixtures - Update CI workflow to use pytest instead of inline Python script - Remove TODO(G-003) comment - Update tests/README.md with PostgreSQL test documentation Covered scenarios: - Baseline stamp sets revision - Upgrade from baseline to head - Upgrade idempotent - Get current on unstamped DB returns None Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(test): Phase 1.5 coverage expansion - COV-001 to COV-013 Coverage baseline raised from 13.65% to 26% (+12.35%) Gate raised from 12% to 18% Tasks completed: - COV-001: Command system unit tests (100% coverage) - COV-002: API service unit tests batch 1 (user/apikey/model/provider) - COV-003: Provider model manager unit tests - COV-004: Pipeline remaining stage tests (aggregator/cntfilter/longtext/msgtrun) - COV-005: Storage and utils coverage pass - COV-006: Gate ratchet 12%→15% - COV-007: Gate ratchet 15%→18% - COV-008: API service batch 2 (bot/pipeline/webhook/space/maintenance/mcp) - COV-009: Blocked - API controller circular import issue documented - COV-010: Plugin runtime unit tests (+0.08%) - COV-011: RAG and vector unit tests (+0.68%) - COV-012: Core boot and migration unit tests - COV-013: Provider requester logic unit tests (+0.62%) Key additions: - tests/utils/import_isolation.py: sys.modules isolation for circular imports - Provider requester mock tests: proved HTTP-dependent code can be tested locally - Vector filter utilities: 100% coverage on pure functions - API services: fake persistence pattern for unit testing Blocked issue COV-009 documented in langbot-test-plan/1.5/issues/ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(phase1): add unit tests for telemetry, plugin, rag, persistence Add initial unit tests for Phase 1 of test coverage improvement: - telemetry: test initialization, payload sanitization, early returns (14.3% → 62.9%) - plugin: test _parse_plugin_id static method - rag: test _to_i18n_name static method - persistence: test serialize_model with datetime handling Overall core coverage: 41.9% → 42.2% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(phase2): add unit tests for core, persistence, plugin, utils - Add test_handler_helpers.py for plugin handler helpers (7 tests) - Add test_mgr_methods.py for persistence manager (5 tests) - Add test_app_config_validation.py for core app config (12 tests) - Add test_knowledge_service.py for API knowledge service (22 tests) - Add test_kbmgr.py for RAG knowledge base manager (39 tests) - Add test_survey_manager.py for survey manager (22 tests) - Add test_connector_methods.py for plugin connector (24 tests) - Add test_funcschema.py for utils function schema (9 tests) - Add test_platform.py for utils platform detection (7 tests) - Add test_extract_deps.py for plugin deps extraction (7 tests) - Add test_database_decorator.py for persistence decorator (7 tests) - Add test_load_config.py for core config loading (19 tests) - Add COVERAGE_EXCLUSIONS.md documenting external adapter exclusions - Fix test_chat_session_limit.py path for portability Coverage: core 28% → 30%, persistence 24% → 24.4%, plugin 27% → 28% Total: 1082 tests passed, core module coverage 45.5% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(integration): add API controller integration tests - Add test_pipelines.py (10 tests) covering pipelines CRUD operations - GET/POST/PUT/DELETE on /api/v1/pipelines - Extensions endpoint - Metadata endpoint - Coverage: pipelines controller 27% → 80% - Add test_providers.py (10 tests) covering provider/model management - Provider CRUD with model counts - LLM model CRUD - Coverage: providers controller 23% → 81%, models 29% → 45% Tests use Quart TestClient with mocked services for real HTTP behavior without external dependencies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(integration): add knowledge, bots, and model endpoints tests - Add test_knowledge.py (10 tests) covering knowledge base management - CRUD operations on /api/v1/knowledge/bases - Files management endpoints - Retrieve endpoint with validation - Coverage: knowledge/base.py 26% → 91% - Add test_bots.py (9 tests) covering bot management - CRUD operations on /api/v1/platform/bots - Logs endpoint - Send message endpoint with validation - Coverage: platform/bots.py 24% → 87% - Extend test_providers.py (+4 tests) for embedding/rerank models - Embedding models CRUD - Rerank models CRUD - Coverage: provider/models.py 29% → 60% Total integration tests: 53 (smoke 12 + pipelines 10 + providers 14 + knowledge 10 + bots 9) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(integration): add embed and monitoring endpoint tests Add integration tests for embed widget and monitoring API endpoints: - test_embed.py: 15 tests for widget.js, logo, turnstile, messages, reset, feedback - test_monitoring.py: 15 tests for overview, messages, llm-calls, sessions, errors, export Coverage improvements: - embed.py: 17% → 56% - monitoring.py: 17% → 93% Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(e2e): add minimal startup E2E tests Add E2E tests for LangBot startup flow: - tests/e2e/utils/config_factory.py: minimal config generation - tests/e2e/utils/process_manager.py: LangBot subprocess management - tests/e2e/conftest.py: E2E fixtures (session-scoped process) - tests/e2e/test_startup.py: 12 tests for startup verification Tests verify: - boot.py + stages execution - database initialization (SQLite) - API availability - migrations applied Uses embedded databases (SQLite, Chroma) - no external dependencies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(quality): fix fake tests and add missing coverage P0 fixes: - telemetry: rewrite fake tests with real behavior verification (25 tests) - config: delete copied-source tests, use proper imports (2 deleted) - persistence: fix try-except pass to verify specific errors P1 fixes: - pipeline: add real FixedWindowAlgo tests instead of mocks (12 tests) - provider: add SessionManager and ToolManager tests (25 tests) - storage: add S3StorageProvider tests with moto mock (16 tests) - plugin: add handler action tests for setting inheritance (15 tests) - rag: add file storage and ZIP processing tests (21 tests) - vector: add VDB filter conversion tests (30 tests) P2 fixes: - pipeline/msgtrun: strengthen assertions for exact message count - api: add response structure validation in integration tests New test files: - provider/test_session_manager.py - provider/test_tool_manager.py - storage/test_s3storage.py - plugin/test_handler_actions.py - rag/test_file_storage.py - vector/test_vdb_filter_conversion.py Source code bugs documented: - provider: TokenManager.next_token() ZeroDivisionError - telemetry: send_tasks class variable shared state - command: empty command IndexError, unused parameters - utils: funcschema KeyError - entity: vector.py independent declarative_base Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(test): update coverage stats and test structure - Update coverage from 22% to 30% - Add new test files to structure: - provider: session_manager, tool_manager - storage: s3storage - plugin: handler_actions - rag: file_storage - vector: vdb_filter_conversion - telemetry: rewritten tests - Update module coverage percentages Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: add 105 new unit tests for untested core functionality Add comprehensive tests for B-class issues (core functionality untested): Pipeline: - test_pool.py: QueryPool ID generation, caching, async context (12 tests) - test_ratelimit.py: Fixed timing-sensitive test tolerance - test_pipelinemgr.py: Use real Pydantic StageProcessResult instead of Mock Utils: - test_version.py: Version comparison functions (20 tests) - test_logcache.py: Log page management and retrieval (18 tests) - test_httpclient.py: HTTP session pool management (10 tests) - test_proxy.py: Proxy configuration from env and config (10 tests) - test_image.py: URL parsing and base64 extraction (12 tests) - test_pkgmgr.py: Pip command generation (8 tests) Discover: - test_engine.py: I18nString, Metadata, Component manifest (15 tests) Test count: 1193 → 1298 (+105 tests) Note: Some B-class issues cannot be tested due to circular import bugs filed as GitHub issues #2175 (pipeline) and #2176 (persistence). * test: tighten phase 1 coverage contracts * test: align ci integration isolation --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
4a4c0921a4 | fix(plugin): use specific runtime not connected error (#2199) | ||
|
|
e425cf079a | fix(pipeline): return query from QueryPool.add_query (#2198) | ||
|
|
245e798b79 | fix(pipeline): handle empty longtext response chain (#2197) | ||
|
|
27fdccce16 | fix(pipeline): preserve routed flag when aggregating (#2196) | ||
|
|
484643c0ee | fix(api): validate api key prefix (#2195) | ||
|
|
ec61459619 | fix(api): avoid mutating bot update payload (#2194) | ||
|
|
66ef744447 | fix(rag): reject unsafe runtime file paths (#2193) | ||
|
|
10d3a9cc92 | fix(api): avoid mutating pipeline update payload (#2192) | ||
|
|
885320e9ae | fix(utils): preserve QQ image URL scheme (#2188) | ||
|
|
ed02ac4710 |
fix(utils): classify runner URLs safely (#2191)
* fix(utils): classify runner URLs safely * fix(utils): keep runner parse failures unknown |
||
|
|
e4841edbaf | fix pkgmgr install requirements default (#2190) | ||
|
|
ef7a06b0db | fix telemetry send task isolation (#2187) | ||
|
|
6fe20c1812 | fix(core): handle sigint before app startup (#2189) | ||
|
|
9e8c8f79df | fix(plugin): validate plugin id format (#2185) | ||
|
|
01d06898fb | fix(provider): ignore empty token rotation (#2184) | ||
|
|
0a669c7016 | fix(utils): handle missing funcschema parameter docs (#2186) | ||
|
|
6713b57d01 | feat: enhance API key normalization and improve Space OAuth callback handling | ||
|
|
ea13ef87f2 | feat(provider): add API key normalization and update OpenAI requester initialization | ||
|
|
1fcdbd472f |
fix model runtime uuid after updates (#2160)
* fix model runtime uuid after updates * test: avoid local agent constructor coupling |
||
|
|
b9662250a6 |
add conversation expire config & user query text to dingtalk card (#2147)
* add conversation expire config * add user query text to card * fix(pipeline): move session limit to AI config * test(pipeline): cover AI session limit config * refactor(pipeline): merge session expire-time into AI runner stage Move the session validity duration field out of the standalone session-limit stage into the runner stage so it actually renders in the AI tab (the tab only shows the runner stage and the stage matching the selected runner — any other stage is filtered out). Read path, default config, metadata description, and tests updated accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pipeline): expire conversations from last update time * fix(n8n): sync generated conversation id into payload --------- Co-authored-by: RockChinQ <rockchinq@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
195f6efeff |
fix: prevent path traversal in LocalStorageProvider via key parameter (#2087)
Add _safe_resolve() helper that uses os.path.realpath() to canonicalize the joined path and verifies it stays within LOCAL_STORAGE_PATH. All six public methods (save, load, exists, delete, size, delete_dir_recursive) now validate the key before performing any I/O. This prevents absolute-path injection (e.g. key="/etc/passwd") and relative traversal (e.g. key="../../etc/passwd") from escaping the storage root directory. CWE-22 |