test(qa): skill all-tool acceptance matrix + mcp-gateway discovery case

- references/skill-all-tool-acceptance.md: acceptance matrix for the skill
  all-tool model (runner x lifecycle x backend), case status, exit criteria,
  and the #2271 known issue (pre-existing box nested-mount, not this branch)
- cases/skill-discovery-via-mcp-gateway.yaml: schema-valid case proving an
  external harness discovers skills via langbot_list_assets (the new 'skills'
  asset class); marked blocked-env until remote claude-code is responsive
This commit is contained in:
huanghuoguoguo
2026-06-21 23:46:22 +08:00
parent 190028d5ab
commit e5a5188442
2 changed files with 113 additions and 0 deletions
@@ -0,0 +1,58 @@
id: skill-discovery-via-mcp-gateway
title: "External harness discovers LangBot skills via langbot_list_assets (all-tool model)"
mode: agent-browser
area: sandbox
type: regression
priority: p2
risk: medium
ci_eligible: false
tags:
- skills
- mcp-gateway
- acp-agent-runner
- all-tool-model
- tools
skills:
- langbot-env-setup
- langbot-testing
env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BACKEND_URL
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
preconditions:
- "An external-harness runner pipeline (e.g. ACP remote claude-code) is configured with langbot-assets-enabled=true so the LangBot MCP gateway is exposed to the harness."
- "The remote harness (claude-code) is reachable and responsive (claude -p returns within the runner timeout)."
- "At least one pipeline-visible skill exists in the Box skill store (otherwise the count is 0, which is still a valid pass for the discovery surface)."
automation: scripts/e2e/pipeline-debug-chat.mjs
automation_env:
- LANGBOT_FRONTEND_URL
- LANGBOT_BROWSER_PROFILE
- LANGBOT_CHROMIUM_EXECUTABLE
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
- LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
automation_pipeline_url_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
automation_pipeline_name_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
automation_prompt: "You have LangBot tools available via an MCP server (tools prefixed langbot_). Call langbot_list_assets with asset_types = [\"skills\",\"tools\"]. Then reply with one single line: the literal token PROBEDONE, a space, the number of skills you found, a space, and the number of tools you found."
automation_expected_text: "PROBEDONE"
automation_response_timeout_ms: "540000"
steps:
- "Open LANGBOT_FRONTEND_URL and navigate to the external-harness (ACP) pipeline."
- "Open Debug Chat with langbot-assets-enabled on the runner."
- "Send the automation_prompt asking the harness to call langbot_list_assets with asset_types [skills, tools]."
- "Capture the final reply, backend logs, and the MCP gateway call trace."
checks:
- "UI: final reply contains PROBEDONE followed by a skill count and a tool count."
- "Logs: backend shows the harness invoked langbot_list_assets and the response included a 'skills' asset class (this is the all-tool-model discovery surface added on this branch)."
- "Behavior parity: a local-agent runner reaches the same skills via use_funcs / activate; the external harness reaches them via langbot_list_assets + langbot_call_tool."
evidence_required:
- ui
- screenshot
- backend_log
expected_failures:
- "runner.timeout when the remote claude-code harness is unauthenticated or slow to start — this is an environment issue, not a discovery-surface regression."
diagnostics:
- "If runner.timeout: ssh into the harness host and confirm `claude -p 'hi'` returns quickly; the ACP runner cannot complete until the harness responds."
- "Activated-skill OPERATE on docker+shared-fs is tracked separately by issue #2271 and is out of scope for this discovery case."
troubleshooting:
- sandbox-native-tools-unavailable
@@ -0,0 +1,55 @@
# Acceptance matrix — skill all-tool model
Acceptance criteria for the branch that unifies LangBot skills as **authorized
tools** (`feat/agent-runner-plugin`). Skills are no longer gated behind the
`skill_authoring` capability; `activate` / `register_skill` / native `exec` are
exposed like native tools, gated only on **sandbox + skill_mgr**. Discovery is
tool-driven (`langbot_list_assets` gains a `skills` asset class for external
harnesses). Host persists activated skills to `host.activated_skills`
(last-write-wins) and prefills `ToolResource.parameters` so runners skip
per-tool `get_tool_detail`.
## What changed (scope under test)
| Layer | Change |
| --- | --- |
| host | `toolmgr.get_all_tools` drops `include_skill_authoring`; `SkillToolLoader` self-gates on sandbox+skill_mgr |
| host | `preproc` drops the `include_skill_authoring` branch; bound-skills + skills resource gate on `skill_mgr` |
| host | `resource_builder` stops gating skills on `skill_authoring`; fills `ToolResource.parameters` via `tool_mgr.get_tool_schema` |
| host | `persist_activated_skill` writes `host.activated_skills` (conversation scope) |
| sdk | `ToolResource.parameters` (full JSON schema); `langbot_list_assets` `skills` asset class |
| local-agent | `build_llm_tools` prefers `ctx.resources.tools.parameters`, falls back to `get_tool_detail`; `DEFAULT_MAX_TOOL_ITERATIONS` 20→100 |
## Dimensions
- **Runner**: `local-agent` (in-process logic, direct Run API, skill tools in `use_funcs`) · `acp-agent-runner` (external harness, remote-ssh claude-code, MCP gateway) · `claude-code-agent` (external harness, claude-code CLI, MCP gateway — *no pipeline yet*).
- **Lifecycle**: discover → activate → operate (native exec under the activated mount path) → register.
- **Backend**: docker · nsjail · e2b.
## Cases & status
| Case | Asserts | Runner(s) | Status |
| --- | --- | --- | --- |
| `skill-tool-exposure-no-capability` | skill tools offered to a tool-calling runner **without** `skill_authoring`; gated only on sandbox+skill_mgr | local-agent | **covered (unit)**`test_tool_manager_native.py`, `test_preproc.py` |
| `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)**`test_skill_tools.py` |
| `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)**`test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
| `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures |
| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial** — authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs |
| `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | acp / claude-code | **blocked (env)** — remote claude-code unresponsive (`runner.timeout`); link is alive (runner started, reached execution) |
| `skill-activation-cross-runner-parity` | local-agent and external harness both reach `activate` via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + acp | **blocked (env)** |
## Known issues
- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/<name>` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs.
## Exit criteria
1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend once #2271 is fixed (or a backend that avoids nested mounts), proving real end-to-end skill use. **(BLOCKED on #2271)**
4. `skill-discovery-via-mcp-gateway` + `skill-activation-cross-runner-parity` pass on acp once remote claude-code is responsive. **(BLOCKED on env)**
## How to run
- **Unit**: LangBot `make test`; SDK `uv run pytest`; local-agent `uv run pytest tests/`.
- **Browser e2e**: per-pipeline Debug Chat; canonical skill prompt pattern in [`sandbox-skill-authoring.md`](./sandbox-skill-authoring.md). Automatable cases use the `automation_*` fields + `scripts/e2e/pipeline-debug-chat.mjs`.