diff --git a/skills/skills/langbot-testing/cases/skill-discovery-via-mcp-gateway.yaml b/skills/skills/langbot-testing/cases/skill-discovery-via-mcp-gateway.yaml new file mode 100644 index 000000000..93e13f1d9 --- /dev/null +++ b/skills/skills/langbot-testing/cases/skill-discovery-via-mcp-gateway.yaml @@ -0,0 +1,58 @@ +id: skill-discovery-via-mcp-gateway +title: "External harness discovers LangBot skills via langbot_list_assets (all-tool model)" +mode: agent-browser +area: sandbox +type: regression +priority: p2 +risk: medium +ci_eligible: false +tags: + - skills + - mcp-gateway + - acp-agent-runner + - all-tool-model + - tools +skills: + - langbot-env-setup + - langbot-testing +env: + - LANGBOT_FRONTEND_URL + - LANGBOT_BACKEND_URL + - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL + - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME +preconditions: + - "An external-harness runner pipeline (e.g. ACP remote claude-code) is configured with langbot-assets-enabled=true so the LangBot MCP gateway is exposed to the harness." + - "The remote harness (claude-code) is reachable and responsive (claude -p returns within the runner timeout)." + - "At least one pipeline-visible skill exists in the Box skill store (otherwise the count is 0, which is still a valid pass for the discovery surface)." +automation: scripts/e2e/pipeline-debug-chat.mjs +automation_env: + - LANGBOT_FRONTEND_URL + - LANGBOT_BROWSER_PROFILE + - LANGBOT_CHROMIUM_EXECUTABLE + - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL + - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME +automation_pipeline_url_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL +automation_pipeline_name_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME +automation_prompt: "You have LangBot tools available via an MCP server (tools prefixed langbot_). Call langbot_list_assets with asset_types = [\"skills\",\"tools\"]. Then reply with one single line: the literal token PROBEDONE, a space, the number of skills you found, a space, and the number of tools you found." +automation_expected_text: "PROBEDONE" +automation_response_timeout_ms: "540000" +steps: + - "Open LANGBOT_FRONTEND_URL and navigate to the external-harness (ACP) pipeline." + - "Open Debug Chat with langbot-assets-enabled on the runner." + - "Send the automation_prompt asking the harness to call langbot_list_assets with asset_types [skills, tools]." + - "Capture the final reply, backend logs, and the MCP gateway call trace." +checks: + - "UI: final reply contains PROBEDONE followed by a skill count and a tool count." + - "Logs: backend shows the harness invoked langbot_list_assets and the response included a 'skills' asset class (this is the all-tool-model discovery surface added on this branch)." + - "Behavior parity: a local-agent runner reaches the same skills via use_funcs / activate; the external harness reaches them via langbot_list_assets + langbot_call_tool." +evidence_required: + - ui + - screenshot + - backend_log +expected_failures: + - "runner.timeout when the remote claude-code harness is unauthenticated or slow to start — this is an environment issue, not a discovery-surface regression." +diagnostics: + - "If runner.timeout: ssh into the harness host and confirm `claude -p 'hi'` returns quickly; the ACP runner cannot complete until the harness responds." + - "Activated-skill OPERATE on docker+shared-fs is tracked separately by issue #2271 and is out of scope for this discovery case." +troubleshooting: + - sandbox-native-tools-unavailable diff --git a/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md b/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md new file mode 100644 index 000000000..bc9ae7046 --- /dev/null +++ b/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md @@ -0,0 +1,55 @@ +# Acceptance matrix — skill all-tool model + +Acceptance criteria for the branch that unifies LangBot skills as **authorized +tools** (`feat/agent-runner-plugin`). Skills are no longer gated behind the +`skill_authoring` capability; `activate` / `register_skill` / native `exec` are +exposed like native tools, gated only on **sandbox + skill_mgr**. Discovery is +tool-driven (`langbot_list_assets` gains a `skills` asset class for external +harnesses). Host persists activated skills to `host.activated_skills` +(last-write-wins) and prefills `ToolResource.parameters` so runners skip +per-tool `get_tool_detail`. + +## What changed (scope under test) + +| Layer | Change | +| --- | --- | +| host | `toolmgr.get_all_tools` drops `include_skill_authoring`; `SkillToolLoader` self-gates on sandbox+skill_mgr | +| host | `preproc` drops the `include_skill_authoring` branch; bound-skills + skills resource gate on `skill_mgr` | +| host | `resource_builder` stops gating skills on `skill_authoring`; fills `ToolResource.parameters` via `tool_mgr.get_tool_schema` | +| host | `persist_activated_skill` writes `host.activated_skills` (conversation scope) | +| sdk | `ToolResource.parameters` (full JSON schema); `langbot_list_assets` `skills` asset class | +| local-agent | `build_llm_tools` prefers `ctx.resources.tools.parameters`, falls back to `get_tool_detail`; `DEFAULT_MAX_TOOL_ITERATIONS` 20→100 | + +## Dimensions + +- **Runner**: `local-agent` (in-process logic, direct Run API, skill tools in `use_funcs`) · `acp-agent-runner` (external harness, remote-ssh claude-code, MCP gateway) · `claude-code-agent` (external harness, claude-code CLI, MCP gateway — *no pipeline yet*). +- **Lifecycle**: discover → activate → operate (native exec under the activated mount path) → register. +- **Backend**: docker · nsjail · e2b. + +## Cases & status + +| Case | Asserts | Runner(s) | Status | +| --- | --- | --- | --- | +| `skill-tool-exposure-no-capability` | skill tools offered to a tool-calling runner **without** `skill_authoring`; gated only on sandbox+skill_mgr | local-agent | **covered (unit)** — `test_tool_manager_native.py`, `test_preproc.py` | +| `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)** — `test_skill_tools.py` | +| `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)** — `test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` | +| `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures | +| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial** — authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs | +| `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | acp / claude-code | **blocked (env)** — remote claude-code unresponsive (`runner.timeout`); link is alive (runner started, reached execution) | +| `skill-activation-cross-runner-parity` | local-agent and external harness both reach `activate` via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + acp | **blocked (env)** | + +## Known issues + +- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs. + +## Exit criteria + +1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)** +2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)** +3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend once #2271 is fixed (or a backend that avoids nested mounts), proving real end-to-end skill use. **(BLOCKED on #2271)** +4. `skill-discovery-via-mcp-gateway` + `skill-activation-cross-runner-parity` pass on acp once remote claude-code is responsive. **(BLOCKED on env)** + +## How to run + +- **Unit**: LangBot `make test`; SDK `uv run pytest`; local-agent `uv run pytest tests/`. +- **Browser e2e**: per-pipeline Debug Chat; canonical skill prompt pattern in [`sandbox-skill-authoring.md`](./sandbox-skill-authoring.md). Automatable cases use the `automation_*` fields + `scripts/e2e/pipeline-debug-chat.mjs`.