From 4b34d4cffda4f2afe525551a1e90c8b0defdd4d1 Mon Sep 17 00:00:00 2001
From: huanghuoguoguo <60681390+huanghuoguoguo@users.noreply.github.com>
Date: Mon, 22 Jun 2026 11:24:03 +0800
Subject: [PATCH] test(qa): sandbox-skill-authoring OPERATE passes on nsjail + docker (#2271 fixed)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- nsjail: full create→exec→register→activate→exec-from-activated-path chain
  returns exit 0; activated mount runs scripts/use.py (reads data/input.json)
  and writes activated_writeback.txt through to the host skill store.
- docker: same chain now passes after langbot-plugin-sdk#87 (recreate sandbox
  container when extra_mounts change). Corrected #2271 root cause from
  'docker masks nested bind mount' to container-reuse: extra_mounts was not in
  the box session compatibility check, so docker reused a running container
  and could not append the activated skill's bind mount.
- Exit criterion 3 (real end-to-end skill use) now DONE; all 5 criteria met.
- Documents the nsjail stale-docker-artifact environment gotcha.
---
 .../references/skill-all-tool-acceptance.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md b/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md
index d443f7b54..a72c1db70 100644
--- a/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md
+++ b/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md
@@ -45,19 +45,20 @@ This is a **runner-plugin transport detail, not a host all-tool-branch issue**
 | `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)** — `test_skill_tools.py` |
 | `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)** — `test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
 | `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures |
-| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial** — authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs |
+| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **PASS (nsjail + docker)** — full chain green via local-agent Debug Chat (pipeline `3e645b04`): create+run in `/workspace` → `exit 0` `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`; register+activate; exec in `/workspace/.skills/` runs `scripts/use.py` (reads `data/input.json`) and writes `activated_writeback.txt` → `exit 0`, both markers, file written through to host skill store. Verified on **nsjail** first, then on **docker** after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)). |
 | `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | claude-code / acp | **PASS (both)** — clean single-instance runtime, remote-ssh→101. claude-code-agent (pipeline `28fd37ac`, stdio bridge): `PROBEDONE skills=1 tools=15`. acp-agent-runner (pipeline `b00794d2`, HTTP proxy + SSH reverse tunnel, **no public-url**): `PROBEDONE skills=1 tools=17`, 8–24s. Both prove the all-tool `skills` asset class is discoverable end-to-end by an external harness. |
 | `skill-activation-cross-runner-parity` | local-agent and external harness both reach skills via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + claude-code + acp | **PASS** — local-agent (use_funcs) ✓, claude-code-agent (stdio gateway, `skills=1 tools=15`) ✓, and acp-agent-runner (HTTP-proxy gateway over reverse tunnel, `skills=1 tools=17`) ✓ all discover skills. `skills` count matches (1==1); the `tools` count (17 vs 15) is claude's self-reported tally and not yet checked against the authoritative gateway count — most likely model-counting variance, not an asset difference. |
 
 ## Known issues
 
-- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs.
+- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/` `scripts/`/`data/` missing on the docker backend. **FIXED** by [langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87) (`fix(box): recreate sandbox container when extra_mounts change`), rebased into this branch. **Corrected root cause:** not "docker masks the nested bind mount" (disproven) — the real bug is **container reuse**: `extra_mounts` was not part of the box session compatibility check, so when a skill is activated mid-conversation docker reused the already-running container and could not append the new bind mount; the activated skill therefore appeared empty. The fix records a mount signature on the session and recreates the container when the mount set changes (idempotent, no data loss). Pre-existing (Feat/sandbox #2072), reproduced on pure `origin/master` + the built-in local-agent runner, so not introduced by this branch — this branch only exposed the path end-to-end for the first time. After the fix, the OPERATE step passes on **both** docker and nsjail (see exit criterion 3). Merging needs a new SDK release + a `langbot-plugin` pin bump in LangBot's `pyproject.toml` to reach a released LangBot.
+- **nsjail + stale docker workspace artifacts (environment, not a code bug).** If a prior docker run left root-owned dirs under the workspace (e.g. `data/box/default/.skills/`, created root-owned because docker runs as root), nsjail — which runs as the invoking user — cannot create the nested skill mount target under that root-owned dir and `runChild()` fails with `Launching child process failed`, poisoning **every** exec in the session (the exact symptom documented in `box/service.py::build_skill_extra_mounts`). Fix: remove the root-owned leftovers (`sudo rm -rf data/box/default/.skills data/box/default/`) before running nsjail e2e. New nsjail runs create user-owned artifacts, so this is a one-time cleanup after switching off docker.
 
 ## Exit criteria
 
 1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
 2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
-3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend once #2271 is fixed (or a backend that avoids nested mounts), proving real end-to-end skill use. **(BLOCKED on #2271)**
+3. `sandbox-skill-authoring-e2e` OPERATE step passes on a real backend, proving end-to-end skill use. **(DONE — nsjail + docker)** — full create→exec→register→activate→exec-from-`/workspace/.skills/` chain returns `exit 0`; the activated mount runs `scripts/use.py` (reads `data/input.json` → `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`) and writes `activated_writeback.txt` through to the host skill store. Verified on nsjail, then on docker after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)).
 4. `skill-discovery-via-mcp-gateway` passes on an external harness. **(DONE — claude-code-agent: skills=1 tools=15, 24s)**
 5. `skill-activation-cross-runner-parity` passes on acp. **(DONE — acp: skills=1 tools=17, 8s, via SSH reverse tunnel with no public-url; clean single-instance runtime)**
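
Reviewer note: the #2271 entry says the fix records a mount signature on the session and recreates the container when the mount set changes. A minimal in-memory sketch of that compatibility check follows, for illustration only. `Mount`, `Session`, `SessionManager`, `mount_signature`, and `ensure_session` are hypothetical names, not the langbot-plugin-sdk API, and `_create_container` is a stand-in for whatever the real backend does.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical bind-mount spec (not the SDK's real type)."""
    host_path: str
    mount_path: str
    read_only: bool = False


def mount_signature(mounts):
    """Return an order-independent digest of a mount set."""
    canonical = sorted((m.host_path, m.mount_path, m.read_only) for m in mounts)
    return hashlib.sha256(json.dumps(canonical).encode("utf-8")).hexdigest()


@dataclass
class Session:
    """Hypothetical session record that remembers the mounts it was built with."""
    container_id: str
    mount_sig: str


class SessionManager:
    def __init__(self):
        self._sessions = {}
        self._created = 0  # number of simulated container creations

    def _create_container(self, mounts):
        # Stand-in for the real backend call (assumption, no docker involved).
        self._created += 1
        return f"container-{self._created}"

    def ensure_session(self, session_id, mounts):
        sig = mount_signature(mounts)
        existing = self._sessions.get(session_id)
        if existing is not None and existing.mount_sig == sig:
            # Same mount set: reusing the running container is safe.
            return existing
        # No session yet, or the mount set changed (e.g. a skill was
        # activated mid-conversation): build a fresh container.
        session = Session(self._create_container(mounts), sig)
        self._sessions[session_id] = session
        return session
```

Under these assumptions, calling `ensure_session` twice with the same mounts returns the same container, while adding a skill mount yields a new one. Sorting before hashing keeps the check idempotent when the same mounts arrive in a different order.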