test(qa): skill all-tool acceptance matrix + mcp-gateway discovery case

- references/skill-all-tool-acceptance.md: acceptance matrix for the skill all-tool model (runner x lifecycle x backend), case status, exit criteria, and the #2271 known issue (pre-existing box nested-mount, not this branch)
- cases/skill-discovery-via-mcp-gateway.yaml: schema-valid case proving an external harness discovers skills via langbot_list_assets (the new 'skills' asset class); marked blocked-env until remote claude-code is responsive
2026-06-22 13:34:24 +00:00 · 2026-06-21 23:46:22 +08:00
parent 190028d5ab
commit e5a5188442
2 changed files with 113 additions and 0 deletions
@@ -0,0 +1,58 @@
+id: skill-discovery-via-mcp-gateway
+title: "External harness discovers LangBot skills via langbot_list_assets (all-tool model)"
+mode: agent-browser
+area: sandbox
+type: regression
+priority: p2
+risk: medium
+ci_eligible: false
+tags:
+  - skills
+  - mcp-gateway
+  - acp-agent-runner
+  - all-tool-model
+  - tools
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
+  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
+preconditions:
+  - "An external-harness runner pipeline (e.g. ACP remote claude-code) is configured with langbot-assets-enabled=true so the LangBot MCP gateway is exposed to the harness."
+  - "The remote harness (claude-code) is reachable and responsive (claude -p returns within the runner timeout)."
+  - "At least one pipeline-visible skill exists in the Box skill store (otherwise the count is 0, which is still a valid pass for the discovery surface)."
+automation: scripts/e2e/pipeline-debug-chat.mjs
+automation_env:
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_BROWSER_PROFILE
+  - LANGBOT_CHROMIUM_EXECUTABLE
+  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
+  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
+automation_pipeline_url_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
+automation_pipeline_name_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
+automation_prompt: "You have LangBot tools available via an MCP server (tools prefixed langbot_). Call langbot_list_assets with asset_types = [\"skills\",\"tools\"]. Then reply with one single line: the literal token PROBEDONE, a space, the number of skills you found, a space, and the number of tools you found."
+automation_expected_text: "PROBEDONE"
+automation_response_timeout_ms: "540000"
+steps:
+  - "Open LANGBOT_FRONTEND_URL and navigate to the external-harness (ACP) pipeline."
+  - "Open Debug Chat with langbot-assets-enabled on the runner."
+  - "Send the automation_prompt asking the harness to call langbot_list_assets with asset_types [skills, tools]."
+  - "Capture the final reply, backend logs, and the MCP gateway call trace."
+checks:
+  - "UI: final reply contains PROBEDONE followed by a skill count and a tool count."
+  - "Logs: backend shows the harness invoked langbot_list_assets and the response included a 'skills' asset class (this is the all-tool-model discovery surface added on this branch)."
+  - "Behavior parity: a local-agent runner reaches the same skills via use_funcs / activate; the external harness reaches them via langbot_list_assets + langbot_call_tool."
+evidence_required:
+  - ui
+  - screenshot
+  - backend_log
+expected_failures:
+  - "runner.timeout when the remote claude-code harness is unauthenticated or slow to start — this is an environment issue, not a discovery-surface regression."
+diagnostics:
+  - "If runner.timeout: ssh into the harness host and confirm `claude -p 'hi'` returns quickly; the ACP runner cannot complete until the harness responds."
+  - "Activated-skill OPERATE on docker+shared-fs is tracked separately by issue #2271 and is out of scope for this discovery case."
+troubleshooting:
+  - sandbox-native-tools-unavailable
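Review note (illustrative, not part of the commit): `automation_expected_text` only matches the literal `PROBEDONE`, while the UI check also expects a skill count and a tool count after it. A minimal sketch of how the captured reply could be validated against that check, assuming the reply is available as a plain string. The names `ProbeResult` and `parse_probe_reply` are hypothetical and do not exist in the repository.

```python
import re
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    """Counts reported by the harness (hypothetical helper type)."""
    skills: int
    tools: int


# "PROBEDONE <skills> <tools>" anywhere in the reply, whitespace-separated.
_PROBE_RE = re.compile(r"\bPROBEDONE\s+(\d+)\s+(\d+)\b")


def parse_probe_reply(reply: str) -> Optional[ProbeResult]:
    """Return the two counts, or None if the reply lacks the full token line."""
    match = _PROBE_RE.search(reply)
    if match is None:
        return None
    return ProbeResult(skills=int(match.group(1)), tools=int(match.group(2)))


if __name__ == "__main__":
    # A skill count of 0 still parses; the preconditions treat it as a valid pass.
    print(parse_probe_reply("PROBEDONE 0 12"))
```

A reply that contains only the bare token, with no counts, returns `None`, which separates "harness answered" from "harness answered in the requested format".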
@@ -0,0 +1,55 @@
+# Acceptance matrix — skill all-tool model
+
+Acceptance criteria for the branch that unifies LangBot skills as **authorized
+tools** (`feat/agent-runner-plugin`). Skills are no longer gated behind the
+`skill_authoring` capability; `activate` / `register_skill` / native `exec` are
+exposed like native tools, gated only on **sandbox + skill_mgr**. Discovery is
+tool-driven (`langbot_list_assets` gains a `skills` asset class for external
+harnesses). The host persists activated skills to `host.activated_skills`
+(last-write-wins) and prefills `ToolResource.parameters` so runners can skip
+per-tool `get_tool_detail` calls.
+
+## What changed (scope under test)
+
+| Layer | Change |
+| --- | --- |
+| host | `toolmgr.get_all_tools` drops `include_skill_authoring`; `SkillToolLoader` self-gates on sandbox+skill_mgr |
+| host | `preproc` drops the `include_skill_authoring` branch; bound-skills + skills resource gate on `skill_mgr` |
+| host | `resource_builder` stops gating skills on `skill_authoring`; fills `ToolResource.parameters` via `tool_mgr.get_tool_schema` |
+| host | `persist_activated_skill` writes `host.activated_skills` (conversation scope) |
+| sdk | `ToolResource.parameters` (full JSON schema); `langbot_list_assets` `skills` asset class |
+| local-agent | `build_llm_tools` prefers `ctx.resources.tools.parameters`, falls back to `get_tool_detail`; `DEFAULT_MAX_TOOL_ITERATIONS` 20→100 |
+
+## Dimensions
+
+- **Runner**: `local-agent` (in-process logic, direct Run API, skill tools in `use_funcs`) · `acp-agent-runner` (external harness, remote-ssh claude-code, MCP gateway) · `claude-code-agent` (external harness, claude-code CLI, MCP gateway — *no pipeline yet*).
+- **Lifecycle**: discover → activate → operate (native exec under the activated mount path) → register.
+- **Backend**: docker · nsjail · e2b.
+
+## Cases & status
+
+| Case | Asserts | Runner(s) | Status |
+| --- | --- | --- | --- |
+| `skill-tool-exposure-no-capability` | skill tools offered to a tool-calling runner **without** `skill_authoring`; gated only on sandbox+skill_mgr | local-agent | **covered (unit)** — `test_tool_manager_native.py`, `test_preproc.py` |
+| `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)** — `test_skill_tools.py` |
+| `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)** — `test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
+| `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures |
+| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial** — authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs |
+| `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | acp / claude-code | **blocked (env)** — remote claude-code unresponsive (`runner.timeout`); link is alive (runner started, reached execution) |
+| `skill-activation-cross-runner-parity` | local-agent and external harness both reach `activate` via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + acp | **blocked (env)** |
+
+## Known issues
+
+- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/<name>` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs.
+
+## Exit criteria
+
+1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
+2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
+3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend, either docker once #2271 is fixed or a backend that avoids nested mounts, proving real end-to-end skill use. **(BLOCKED on #2271)**
+4. `skill-discovery-via-mcp-gateway` + `skill-activation-cross-runner-parity` pass on acp once remote claude-code is responsive. **(BLOCKED on env)**
+
+## How to run
+
+- **Unit**: LangBot `make test`; SDK `uv run pytest`; local-agent `uv run pytest tests/`.
+- **Browser e2e**: per-pipeline Debug Chat; canonical skill prompt pattern in [`sandbox-skill-authoring.md`](./sandbox-skill-authoring.md). Automatable cases use the `automation_*` fields + `scripts/e2e/pipeline-debug-chat.mjs`.
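Review note (illustrative, not part of the commit): the local-agent row in the "What changed" table says `build_llm_tools` prefers the prefilled `ToolResource.parameters` and falls back to `get_tool_detail`. A minimal sketch of that prefer-then-fallback pattern, under stated assumptions: `ToolResourceSketch`, `build_llm_tools_sketch`, and the `fetch_detail` callback are hypothetical stand-ins, not the SDK's real types or signatures.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToolResourceSketch:
    """Hypothetical stand-in for a tool resource handed to the runner."""
    name: str
    # Full JSON schema when the host prefilled it, otherwise None.
    parameters: Optional[Dict[str, Any]] = None


def build_llm_tools_sketch(
    resources: List[ToolResourceSketch],
    fetch_detail: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Build LLM tool specs, fetching a schema only when none was prefilled."""
    tools = []
    for res in resources:
        schema = res.parameters
        if schema is None:
            # Fallback path: one extra round trip per tool without a schema.
            schema = fetch_detail(res.name)
        tools.append({"name": res.name, "parameters": schema})
    return tools


if __name__ == "__main__":
    fetched: List[str] = []

    def fake_fetch(name: str) -> Dict[str, Any]:
        fetched.append(name)
        return {"type": "object", "properties": {}}

    resources = [
        ToolResourceSketch("activate", {"type": "object", "properties": {"name": {"type": "string"}}}),
        ToolResourceSketch("exec"),
    ]
    build_llm_tools_sketch(resources, fake_fetch)
    print(fetched)  # only the tool without a prefilled schema is fetched
```

The sketch treats an empty schema (`{}`) as prefilled and falls back only on `None`; whether the real implementation draws the line the same way is not stated in this commit.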