mirror of
https://github.com/langbot-app/LangBot.git
synced 2026-06-22 13:34:24 +00:00
test(qa): skill all-tool acceptance matrix + mcp-gateway discovery case
- references/skill-all-tool-acceptance.md: acceptance matrix for the skill all-tool model (runner x lifecycle x backend), case status, exit criteria, and the #2271 known issue (pre-existing box nested-mount, not this branch)
- cases/skill-discovery-via-mcp-gateway.yaml: schema-valid case proving an external harness discovers skills via langbot_list_assets (the new 'skills' asset class); marked blocked-env until remote claude-code is responsive
@@ -0,0 +1,58 @@
id: skill-discovery-via-mcp-gateway
title: "External harness discovers LangBot skills via langbot_list_assets (all-tool model)"
mode: agent-browser
area: sandbox
type: regression
priority: p2
risk: medium
ci_eligible: false
tags:
- skills
- mcp-gateway
- acp-agent-runner
- all-tool-model
- tools
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BACKEND_URL
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
preconditions:
- "An external-harness runner pipeline (e.g. ACP remote claude-code) is configured with langbot-assets-enabled=true so the LangBot MCP gateway is exposed to the harness."
- "The remote harness (claude-code) is reachable and responsive (claude -p returns within the runner timeout)."
- "At least one pipeline-visible skill exists in the Box skill store (otherwise the count is 0, which is still a valid pass for the discovery surface)."
automation: scripts/e2e/pipeline-debug-chat.mjs
automation_env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BROWSER_PROFILE
- LANGBOT_CHROMIUM_EXECUTABLE
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
automation_prompt: "You have LangBot tools available via an MCP server (tools prefixed langbot_). Call langbot_list_assets with asset_types = [\"skills\",\"tools\"]. Then reply with one single line: the literal token PROBEDONE, a space, the number of skills you found, a space, and the number of tools you found."
automation_expected_text: "PROBEDONE"
automation_response_timeout_ms: "540000"
steps:
- "Open LANGBOT_FRONTEND_URL and navigate to the external-harness (ACP) pipeline."
- "Open Debug Chat with langbot-assets-enabled on the runner."
- "Send the automation_prompt asking the harness to call langbot_list_assets with asset_types [skills, tools]."
- "Capture the final reply, backend logs, and the MCP gateway call trace."
checks:
- "UI: final reply contains PROBEDONE followed by a skill count and a tool count."
- "Logs: backend shows the harness invoked langbot_list_assets and the response included a 'skills' asset class (this is the all-tool-model discovery surface added on this branch)."
- "Behavior parity: a local-agent runner reaches the same skills via use_funcs / activate; the external harness reaches them via langbot_list_assets + langbot_call_tool."
evidence_required:
- ui
- screenshot
- backend_log
expected_failures:
- "runner.timeout when the remote claude-code harness is unauthenticated or slow to start; this is an environment issue, not a discovery-surface regression."
diagnostics:
- "If runner.timeout: ssh into the harness host and confirm `claude -p 'hi'` returns quickly; the ACP runner cannot complete until the harness responds."
- "Activated-skill OPERATE on docker+shared-fs is tracked separately by issue #2271 and is out of scope for this discovery case."
troubleshooting:
- sandbox-native-tools-unavailable
@@ -0,0 +1,55 @@
# Acceptance matrix: skill all-tool model

Acceptance criteria for the branch that unifies LangBot skills as **authorized
tools** (`feat/agent-runner-plugin`). Skills are no longer gated behind the
`skill_authoring` capability; `activate` / `register_skill` / native `exec` are
exposed like native tools, gated only on **sandbox + skill_mgr**. Discovery is
tool-driven (`langbot_list_assets` gains a `skills` asset class for external
harnesses). The host persists activated skills to `host.activated_skills`
(last-write-wins) and prefills `ToolResource.parameters` so runners skip
per-tool `get_tool_detail`.
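
The gating and persistence rules described above can be sketched as follows. This is a minimal illustration, not LangBot's implementation: `HostState`, `exposed_skill_tools`, and the signature of `persist_activated_skill` are assumptions made for the example; only the tool names, the `sandbox + skill_mgr` gate, and the last-write-wins behaviour come from the text.

```python
from dataclasses import dataclass, field

# Tool names taken from the paragraph above.
SKILL_TOOLS = ("activate", "register_skill", "exec")


@dataclass
class HostState:
    """Hypothetical stand-in for the host; not a real LangBot class."""

    sandbox_available: bool
    skill_mgr_available: bool
    # conversation_id -> {skill_name: mount_path}; mimics host.activated_skills.
    activated_skills: dict[str, dict[str, str]] = field(default_factory=dict)


def exposed_skill_tools(host: HostState) -> tuple[str, ...]:
    """Expose skill tools only when both sandbox and skill_mgr are present."""
    if host.sandbox_available and host.skill_mgr_available:
        return SKILL_TOOLS
    return ()


def persist_activated_skill(
    host: HostState, conversation_id: str, skill_name: str, mount_path: str
) -> None:
    """Record an activation; a later write for the same skill replaces the earlier one."""
    host.activated_skills.setdefault(conversation_id, {})[skill_name] = mount_path
```

Note that no capability flag appears in the gate: under these assumptions a runner without `skill_authoring` still receives the three tools as long as both services are available.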

## What changed (scope under test)

| Layer | Change |
| --- | --- |
| host | `toolmgr.get_all_tools` drops `include_skill_authoring`; `SkillToolLoader` self-gates on sandbox+skill_mgr |
| host | `preproc` drops the `include_skill_authoring` branch; bound-skills + skills resource gate on `skill_mgr` |
| host | `resource_builder` stops gating skills on `skill_authoring`; fills `ToolResource.parameters` via `tool_mgr.get_tool_schema` |
| host | `persist_activated_skill` writes `host.activated_skills` (conversation scope) |
| sdk | `ToolResource.parameters` (full JSON schema); `langbot_list_assets` `skills` asset class |
| local-agent | `build_llm_tools` prefers `ctx.resources.tools.parameters`, falls back to `get_tool_detail`; `DEFAULT_MAX_TOOL_ITERATIONS` 20→100 |
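
The local-agent row describes a prefer-then-fall-back lookup. A rough sketch of that pattern, assuming a simplified `ToolResource` with only `name` and `parameters` and a `get_tool_detail` callable that returns a JSON schema (both shapes are hypothetical here, not the SDK's actual definitions):

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolResource:
    """Simplified, hypothetical shape; the real SDK type has more fields."""

    name: str
    parameters: Optional[dict[str, Any]] = None  # prefilled JSON schema, if any


def build_llm_tools(
    tools: list[ToolResource],
    get_tool_detail: Callable[[str], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build LLM tool specs, fetching a schema only when none was prefilled."""
    llm_tools = []
    for tool in tools:
        schema = tool.parameters
        if schema is None:
            # Fallback path: one extra round trip per tool without a prefilled schema.
            schema = get_tool_detail(tool.name)
        llm_tools.append({"name": tool.name, "parameters": schema})
    return llm_tools
```

When every resource arrives with `parameters` set, the fallback is never called, which is the behaviour the prefill change is meant to enable.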

## Dimensions

- **Runner**: `local-agent` (in-process logic, direct Run API, skill tools in `use_funcs`) · `acp-agent-runner` (external harness, remote-ssh claude-code, MCP gateway) · `claude-code-agent` (external harness, claude-code CLI, MCP gateway; *no pipeline yet*).
- **Lifecycle**: discover → activate → operate (native exec under the activated mount path) → register.
- **Backend**: docker · nsjail · e2b.
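
The three dimensions above span a runner x lifecycle x backend grid. As a small illustration (the function name `acceptance_matrix` is invented for this sketch), the full set of cells can be enumerated with `itertools.product`:

```python
from itertools import product

# Values copied from the Dimensions list above.
RUNNERS = ("local-agent", "acp-agent-runner", "claude-code-agent")
LIFECYCLE = ("discover", "activate", "operate", "register")
BACKENDS = ("docker", "nsjail", "e2b")


def acceptance_matrix():
    """Yield every (runner, lifecycle stage, backend) cell of the matrix."""
    return product(RUNNERS, LIFECYCLE, BACKENDS)
```

That gives 3 x 4 x 3 = 36 cells in total; the cases listed in this document cover only some of them.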

## Cases & status

| Case | Asserts | Runner(s) | Status |
| --- | --- | --- | --- |
| `skill-tool-exposure-no-capability` | skill tools offered to a tool-calling runner **without** `skill_authoring`; gated only on sandbox+skill_mgr | local-agent | **covered (unit)**: `test_tool_manager_native.py`, `test_preproc.py` |
| `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)**: `test_skill_tools.py` |
| `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)**: `test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
| `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)**: full host/sdk/local-agent suites green, 0 new failures |
| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial**: authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs |
| `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | acp / claude-code | **blocked (env)**: remote claude-code unresponsive (`runner.timeout`); link is alive (runner started, reached execution) |
| `skill-activation-cross-runner-parity` | local-agent and external harness both reach `activate` via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + acp | **blocked (env)** |

## Known issues

- [#2271](https://github.com/langbot-app/LangBot/issues/2271): activated `/workspace/.skills/<name>` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs.

## Exit criteria

1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend once #2271 is fixed (or a backend that avoids nested mounts), proving real end-to-end skill use. **(BLOCKED on #2271)**
4. `skill-discovery-via-mcp-gateway` + `skill-activation-cross-runner-parity` pass on acp once remote claude-code is responsive. **(BLOCKED on env)**

## How to run

- **Unit**: LangBot `make test`; SDK `uv run pytest`; local-agent `uv run pytest tests/`.
- **Browser e2e**: per-pipeline Debug Chat; canonical skill prompt pattern in [`sandbox-skill-authoring.md`](./sandbox-skill-authoring.md). Automatable cases use the `automation_*` fields + `scripts/e2e/pipeline-debug-chat.mjs`.
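
For the discovery case, the automation only requires the reply to contain `PROBEDONE`. When reviewing captured evidence by hand, a stricter check that also extracts the two counts may be useful. The helper below is a hypothetical sketch (`parse_probe_reply` is not part of `pipeline-debug-chat.mjs`); it assumes the reply follows the format requested in `automation_prompt`: the token, a space, the skill count, a space, the tool count.

```python
import re
from typing import Optional

# Format requested by automation_prompt: "PROBEDONE <skills> <tools>".
PROBE_LINE = re.compile(r"PROBEDONE (\d+) (\d+)")


def parse_probe_reply(reply: str) -> Optional[tuple[int, int]]:
    """Return (skill_count, tool_count) if the reply contains a probe line, else None."""
    match = PROBE_LINE.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A skill count of 0 still parses successfully, consistent with the precondition that an empty skill store is a valid pass for the discovery surface.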