From 4b34d4cffda4f2afe525551a1e90c8b0defdd4d1 Mon Sep 17 00:00:00 2001
From: huanghuoguoguo <60681390+huanghuoguoguo@users.noreply.github.com>
Date: Mon, 22 Jun 2026 11:24:03 +0800
Subject: [PATCH] test(qa): sandbox-skill-authoring OPERATE passes on nsjail +
 docker (#2271 fixed)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- nsjail: full create→exec→register→activate→exec-from-activated-path chain
  returns exit 0; activated mount runs scripts/use.py (reads data/input.json)
  and writes activated_writeback.txt through to the host skill store.
- docker: same chain now passes after langbot-plugin-sdk#87 (recreate sandbox
  container when extra_mounts change). Corrects the #2271 root cause from
  'docker masks nested bind mount' to container reuse: extra_mounts was not
  part of the box session compatibility check, so docker reused the running
  container and could not append the activated skill's bind mount.
- Exit criterion 3 (real end-to-end skill use) now DONE; all 5 criteria met.
- Documents the nsjail stale-docker-artifact environment gotcha.
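
The container-reuse root cause can be illustrated with a minimal sketch. It
assumes a session records an order-independent signature of its mount set and
is only reused when that signature still matches. Every name below (Mount,
Session, mount_signature, is_compatible, get_or_recreate) is hypothetical and
is not taken from langbot-plugin-sdk; the actual fix may be structured
differently.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    # Hypothetical stand-in for one extra_mounts entry.
    host_path: str
    container_path: str
    read_only: bool = False


def mount_signature(mounts):
    """Return an order-independent digest of a mount set."""
    canonical = sorted((m.host_path, m.container_path, m.read_only) for m in mounts)
    return hashlib.sha256(json.dumps(canonical).encode("utf-8")).hexdigest()


@dataclass
class Session:
    # Hypothetical session record: the image plus the mount signature it was created with.
    image: str
    mount_sig: str


def is_compatible(session, image, mounts):
    # Before the fix (as described above), the mount set was not compared,
    # so a running container was reused even after a skill was activated.
    return session.image == image and session.mount_sig == mount_signature(mounts)


def get_or_recreate(session, image, mounts):
    """Return (session, recreated); recreate when the mount set has changed."""
    if session is not None and is_compatible(session, image, mounts):
        return session, False
    return Session(image=image, mount_sig=mount_signature(mounts)), True
```

In this sketch, activating a skill mid-conversation adds one bind mount, the
signature changes, and the session is recreated once; repeating the request
with the same mount set (in any order) reuses the session.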
---
 .../references/skill-all-tool-acceptance.md                | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md b/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md
index d443f7b54..a72c1db70 100644
--- a/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md
+++ b/skills/skills/langbot-testing/references/skill-all-tool-acceptance.md
@@ -45,19 +45,20 @@ This is a **runner-plugin transport detail, not a host all-tool-branch issue** 
 | `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)** — `test_skill_tools.py` |
 | `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)** — `test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
 | `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures |
-| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial** — authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs |
+| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **PASS (nsjail + docker)** — full chain green via local-agent Debug Chat (pipeline `3e645b04`): create+run in `/workspace` → `exit 0` `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`; register+activate; exec in `/workspace/.skills/<name>` runs `scripts/use.py` (reads `data/input.json`) and writes `activated_writeback.txt` → `exit 0`, both markers, file written through to host skill store. Verified on **nsjail** first, then on **docker** after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)). |
 | `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | claude-code / acp | **PASS (both)** — clean single-instance runtime, remote-ssh→101. claude-code-agent (pipeline `28fd37ac`, stdio bridge): `PROBEDONE skills=1 tools=15`. acp-agent-runner (pipeline `b00794d2`, HTTP proxy + SSH reverse tunnel, **no public-url**): `PROBEDONE skills=1 tools=17`, 8–24s. Both prove the all-tool `skills` asset class is discoverable end-to-end by an external harness. |
 | `skill-activation-cross-runner-parity` | local-agent and external harness both reach skills via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + claude-code + acp | **PASS** — local-agent (use_funcs) ✓, claude-code-agent (stdio gateway, `skills=1 tools=15`) ✓, and acp-agent-runner (HTTP-proxy gateway over reverse tunnel, `skills=1 tools=17`) ✓ all discover skills. `skills` count matches (1==1); the `tools` count (17 vs 15) is claude's self-reported tally and not yet checked against the authoritative gateway count — most likely model-counting variance, not an asset difference. |
 
 ## Known issues
 
-- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/<name>` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs.
+- [#2271](https://github.com/langbot-app/LangBot/issues/2271): on the docker backend, the activated `/workspace/.skills/<name>` was missing `scripts/` and `data/`. **FIXED** by [langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87) (`fix(box): recreate sandbox container when extra_mounts change`), rebased into this branch. **Corrected root cause:** not "docker masks the nested bind mount" (disproven) but **container reuse**. `extra_mounts` was not part of the box session compatibility check, so when a skill was activated mid-conversation, docker reused the already-running container and could not append the new bind mount; the activated skill therefore appeared empty. The fix records a mount signature on the session and recreates the container when the mount set changes (idempotent, no data loss). The bug is pre-existing (Feat/sandbox #2072) and reproduces on pure `origin/master` with the built-in local-agent runner, so it was not introduced by this branch; this branch only exposed the path end-to-end for the first time. After the fix, the OPERATE step passes on **both** docker and nsjail (see exit criterion 3). For the fix to reach a released LangBot, the SDK needs a new release and LangBot's `pyproject.toml` needs a `langbot-plugin` pin bump.
+- **nsjail + stale docker workspace artifacts (environment, not a code bug).** If a prior docker run left root-owned directories under the workspace (e.g. `data/box/default/.skills/`, root-owned because docker runs as root), nsjail, which runs as the invoking user, cannot create the nested skill mount target beneath them. `runChild()` then fails with `Launching child process failed`, poisoning **every** exec in the session (the exact symptom documented in `box/service.py::build_skill_extra_mounts`). Fix: remove the root-owned leftovers (`sudo rm -rf data/box/default/.skills data/box/default/<stale-skill>`) before running the nsjail e2e. New nsjail runs create user-owned artifacts, so this is a one-time cleanup when switching from docker to nsjail.
 
 ## Exit criteria
 
 1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
 2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
-3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend once #2271 is fixed (or a backend that avoids nested mounts), proving real end-to-end skill use. **(BLOCKED on #2271)**
+3. `sandbox-skill-authoring-e2e` OPERATE step passes on a real backend, proving end-to-end skill use. **(DONE — nsjail + docker)** — full create→exec→register→activate→exec-from-`/workspace/.skills/<name>` chain returns `exit 0`; the activated mount runs `scripts/use.py` (reads `data/input.json` → `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`) and writes `activated_writeback.txt` through to the host skill store. Verified on nsjail, then on docker after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)).
 4. `skill-discovery-via-mcp-gateway` passes on an external harness. **(DONE — claude-code-agent: skills=1 tools=15, 24s)**
 5. `skill-activation-cross-runner-parity` passes on acp. **(DONE — acp: skills=1 tools=17, 8s, via SSH reverse tunnel with no public-url; clean single-instance runtime)**