test(qa): sandbox-skill-authoring OPERATE passes on nsjail + docker (#2271 fixed)

- nsjail: full create→exec→register→activate→exec-from-activated-path chain
  returns exit 0; activated mount runs scripts/use.py (reads data/input.json)
  and writes activated_writeback.txt through to the host skill store.
- docker: same chain now passes after langbot-plugin-sdk#87 (recreate sandbox
  container when extra_mounts change). Corrected #2271 root cause from
  'docker masks nested bind mount' to container-reuse: extra_mounts was not in
  the box session compatibility check, so docker reused a running container and
  could not append the activated skill's bind mount.
- Exit criterion 3 (real end-to-end skill use) now DONE; all 5 criteria met.
- Documents the nsjail stale-docker-artifact environment gotcha.
This commit is contained in:
huanghuoguoguo
2026-06-22 11:24:03 +08:00
parent 96f5b5e365
commit 4b34d4cffd
@@ -45,19 +45,20 @@ This is a **runner-plugin transport detail, not a host all-tool-branch issue**
| `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)**`test_skill_tools.py` |
| `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)**`test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
| `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures |
| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **partial** — authorization chain passes (agent calls exec/register/activate, skill registered 0→1); **OPERATE step blocked by [#2271](https://github.com/langbot-app/LangBot/issues/2271)** on docker+shared-fs |
| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **PASS (nsjail + docker)** — full chain green via local-agent Debug Chat (pipeline `3e645b04`): create+run in `/workspace``exit 0` `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`; register+activate; exec in `/workspace/.skills/<name>` runs `scripts/use.py` (reads `data/input.json`) and writes `activated_writeback.txt``exit 0`, both markers, file written through to host skill store. Verified on **nsjail** first, then on **docker** after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)). |
| `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | claude-code / acp | **PASS (both)** — clean single-instance runtime, remote-ssh→101. claude-code-agent (pipeline `28fd37ac`, stdio bridge): `PROBEDONE skills=1 tools=15`. acp-agent-runner (pipeline `b00794d2`, HTTP proxy + SSH reverse tunnel, **no public-url**): `PROBEDONE skills=1 tools=17`, 824s. Both prove the all-tool `skills` asset class is discoverable end-to-end by an external harness. |
| `skill-activation-cross-runner-parity` | local-agent and external harness both reach skills via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + claude-code + acp | **PASS** — local-agent (use_funcs) ✓, claude-code-agent (stdio gateway, `skills=1 tools=15`) ✓, and acp-agent-runner (HTTP-proxy gateway over reverse tunnel, `skills=1 tools=17`) ✓ all discover skills. `skills` count matches (1==1); the `tools` count (17 vs 15) is claude's self-reported tally and not yet checked against the authoritative gateway count — most likely model-counting variance, not an asset difference. |
## Known issues
- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/<name>` missing `scripts/`/`data/` on docker backend (nested bind mount). **Pre-existing** (Feat/sandbox #2072), not introduced by this branch (the mount/register chain is byte-identical to `origin/master` across host loader, `box/service.py`, SDK box backend, SDK box runtime). This branch only **exposed** the path end-to-end for the first time. Blocks the OPERATE step on docker+shared-fs.
- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/<name>` `scripts/`/`data/` missing on the docker backend. **FIXED** by [langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87) (`fix(box): recreate sandbox container when extra_mounts change`), rebased into this branch. **Corrected root cause:** not "docker masks the nested bind mount" (disproven) — the real bug is **container reuse**: `extra_mounts` was not part of the box session compatibility check, so when a skill is activated mid-conversation docker reused the already-running container and could not append the new bind mount; the activated skill therefore appeared empty. The fix records a mount signature on the session and recreates the container when the mount set changes (idempotent, no data loss). Pre-existing (Feat/sandbox #2072), reproduced on pure `origin/master` + the built-in local-agent runner, so not introduced by this branch — this branch only exposed the path end-to-end for the first time. After the fix, the OPERATE step passes on **both** docker and nsjail (see exit criterion 3). Merging needs a new SDK release + a `langbot-plugin` pin bump in LangBot's `pyproject.toml` to reach a released LangBot.
- **nsjail + stale docker workspace artifacts (environment, not a code bug).** If a prior docker run left root-owned dirs under the workspace (e.g. `data/box/default/.skills/`, created root-owned because docker runs as root), nsjail — which runs as the invoking user — cannot create the nested skill mount target under that root-owned dir and `runChild()` fails with `Launching child process failed`, poisoning **every** exec in the session (the exact symptom documented in `box/service.py::build_skill_extra_mounts`). Fix: remove the root-owned leftovers (`sudo rm -rf data/box/default/.skills data/box/default/<stale-skill>`) before running nsjail e2e. New nsjail runs create user-owned artifacts, so this is a one-time cleanup after switching off docker.
## Exit criteria
1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
3. `sandbox-skill-authoring-e2e` OPERATE step passes on at least one backend once #2271 is fixed (or a backend that avoids nested mounts), proving real end-to-end skill use. **(BLOCKED on #2271)**
3. `sandbox-skill-authoring-e2e` OPERATE step passes on a real backend, proving end-to-end skill use. **(DONE — nsjail + docker)** — full create→exec→register→activate→exec-from-`/workspace/.skills/<name>` chain returns `exit 0`; the activated mount runs `scripts/use.py` (reads `data/input.json``SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`) and writes `activated_writeback.txt` through to the host skill store. Verified on nsjail, then on docker after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)).
4. `skill-discovery-via-mcp-gateway` passes on an external harness. **(DONE — claude-code-agent: skills=1 tools=15, 24s)**
5. `skill-activation-cross-runner-parity` passes on acp. **(DONE — acp: skills=1 tools=17, 8s, via SSH reverse tunnel with no public-url; clean single-instance runtime)**