test(skills): clarify manual QA perf gates

test(skills): add debug chat timing and isolation probes
test(skills): extend fake provider load profiles
2026-06-25 15:04:19 +00:00 · 2026-06-25 20:46:31 +08:00 · 2026-06-25 13:34:30 +08:00 · 2026-06-25 12:54:08 +08:00 · 2026-06-25 11:48:59 +08:00 · 2026-06-25 10:07:04 +08:00
240 changed files with 14483 additions and 29323 deletions
@@ -48,6 +48,7 @@ coverage.xml
 .coverage
 src/langbot/web/
 testsdk/
+.qa/

 # Build artifacts
 /dist
@@ -52,6 +52,15 @@ RUN apt-get update \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian $(. /etc/os-release && echo \"$VERSION_CODENAME\") stable" > /etc/apt/sources.list.d/docker.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends docker-ce-cli \
+    # Install Node.js LTS so the sandbox (nsjail/Docker box) can run npx-based
+    # stdio MCP servers. node/npx land in /usr/bin, which is on the nsjail
+    # read-only mount whitelist (_READONLY_SYSTEM_MOUNTS), so they are bound
+    # into the sandbox chroot automatically. Without node, any npx-launched
+    # MCP server exits with return_code=127 (command not found).
+    && curl -fsSL https://deb.nodesource.com/setup_22.x -o /tmp/nodesource_setup.sh \
+    && bash /tmp/nodesource_setup.sh \
+    && apt-get install -y --no-install-recommends nodejs \
+    && rm -f /tmp/nodesource_setup.sh \
    && python -m pip install --no-cache-dir uv \
    && uv sync \
    && apt-get purge -y --auto-remove curl gnupg \
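The failure mode named in the comment above (exit status 127 when `node`/`npx` is missing) can be checked before a stdio MCP server is launched. This is a minimal sketch, not LangBot code: `run_stdio_server` is a hypothetical helper, and it assumes the launcher is resolved through `PATH` the way a shell would resolve it.

```python
import shutil
import subprocess
import sys

# Conventional shell exit status for "command not found".
COMMAND_NOT_FOUND = 127


def run_stdio_server(argv: list) -> int:
    """Run argv and return its exit code.

    Returns 127 without spawning anything when the executable cannot be
    resolved, mirroring what a shell reports for a missing command.
    """
    if shutil.which(argv[0]) is None:
        return COMMAND_NOT_FOUND
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # The current interpreter is used as a stand-in for a real launcher.
    print(run_stdio_server([sys.executable, "-c", "pass"]))
```

In an image built without Node.js, calling the helper with `["npx", ...]` would take the 127 branch, which is the situation the added Dockerfile lines are meant to avoid.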
@@ -55,6 +55,12 @@ LangBot is an **open-source, production-grade platform** for building AI-powered

 ---

+## 😎 Stay Updated
+
+Click the Star and Watch buttons in the top-right corner of the repository to get the latest updates.
+
+![star gif](https://langbot.app/star.gif)
+
 ## Quick Start

 ### ☁️ LangBot Cloud (Recommended)
@@ -74,7 +80,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### One-Click Cloud Deploy
@@ -55,6 +55,12 @@ LangBot 是一个**开源的生产级平台**，用于构建 AI 驱动的即时

 ---

+## 😎 保持更新
+
+点击[仓库首页](https://github.com/langbot-app/LangBot)右上角 Star 和 Watch 按钮，获取最新动态。
+
+![star gif](https://langbot.app/star.gif)
+
 ## 快速开始

 ### ☁️ LangBot Cloud（推荐）
@@ -74,7 +80,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### 一键云部署
@@ -54,6 +54,12 @@ LangBot es una **plataforma de código abierto y grado de producción** para con

 ---

+## 😎 Manténgase Actualizado
+
+Haga clic en los botones Star y Watch en la esquina superior derecha del repositorio para obtener las últimas actualizaciones.
+
+![star gif](https://langbot.app/star.gif)
+
 ## Inicio Rápido

 ### ☁️ LangBot Cloud (Recomendado)
@@ -73,7 +79,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### Despliegue en la Nube con un Clic
@@ -54,6 +54,12 @@ LangBot est une **plateforme open-source de niveau production** pour créer des

 ---

+## 😎 Restez à Jour
+
+Cliquez sur les boutons Star et Watch dans le coin supérieur droit du dépôt pour obtenir les dernières mises à jour.
+
+![star gif](https://langbot.app/star.gif)
+
 ## Démarrage Rapide

 ### ☁️ LangBot Cloud (Recommandé)
@@ -73,7 +79,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### Déploiement Cloud en un Clic
@@ -54,6 +54,12 @@ LangBot は、AI搭載のインスタントメッセージングボットを構

 ---

+## 😎 最新情報を入手
+
+リポジトリの右上にある Star と Watch ボタンをクリックして、最新の更新を取得してください。
+
+![star gif](https://langbot.app/star.gif)
+
 ## クイックスタート

 ### ☁️ LangBot Cloud（推奨）
@@ -73,7 +79,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### ワンクリッククラウドデプロイ
@@ -54,6 +54,12 @@ LangBot은 AI 기반 인스턴트 메시징 봇을 구축하기 위한 **오픈

 ---

+## 😎 최신 정보 받기
+
+리포지토리 오른쪽 상단의 Star 및 Watch 버튼을 클릭하여 최신 업데이트를 받으세요.
+
+![star gif](https://langbot.app/star.gif)
+
 ## 빠른 시작

 ### ☁️ LangBot Cloud (추천)
@@ -73,7 +79,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### 원클릭 클라우드 배포
@@ -54,6 +54,12 @@ LangBot — это **платформа с открытым исходным к

 ---

+## 😎 Оставайтесь в курсе
+
+Нажмите кнопки Star и Watch в правом верхнем углу репозитория, чтобы получать последние обновления.
+
+![star gif](https://langbot.app/star.gif)
+
 ## Быстрый старт

 ### ☁️ LangBot Cloud (Рекомендуется)
@@ -73,7 +79,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### Облачное развертывание одним кликом
@@ -56,6 +56,12 @@ LangBot 是一個**開源的生產級平台**，用於建構 AI 驅動的即時

 ---

+## 😎 保持更新
+
+點擊倉庫右上角 Star 和 Watch 按鈕，獲取最新動態。
+
+![star gif](https://langbot.app/star.gif)
+
 ## 快速開始

 ### ☁️ LangBot Cloud（推薦）
@@ -75,7 +81,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### 一鍵雲端部署
@@ -54,6 +54,12 @@ LangBot là một **nền tảng mã nguồn mở, cấp sản xuất** để x

 ---

+## 😎 Cập nhật Mới nhất
+
+Nhấp vào các nút Star và Watch ở góc trên bên phải của kho lưu trữ để nhận các bản cập nhật mới nhất.
+
+![star gif](https://langbot.app/star.gif)
+
 ## Bắt đầu nhanh

 ### ☁️ LangBot Cloud (Khuyên dùng)
@@ -73,7 +79,7 @@ uvx langbot
 ```bash
 git clone https://github.com/langbot-app/LangBot
 cd LangBot/docker
-docker compose up -d
+docker compose --profile all up -d
 ```

 ### Triển khai đám mây một cú nhấp
@@ -1,149 +0,0 @@
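-# Agent-owned Context Protocol Design
-
-This document explains the **design rationale** for context boundaries in the plugin-based AgentRunner scenario. The conclusion first: LangBot should not be the final agentic context manager. It provides a context substrate, and the AgentRunner or the runtime behind it decides how to manage history, compaction, recall, and the KV cache.
-
-> The data structures involved (`AgentRunContext`, `ContextAccess`, `AgentRunAPIProxy`, and others) are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md). This document covers semantics and constraints and does not repeat the schema.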
-
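-## 1. Design principles
-
-### 1.1 The agent owns the context strategy
-
-The runtimes behind different runners vary widely:
-
-- The official local-agent may depend on LangBot's models, tools, knowledge bases, and storage.
-- Runtimes such as the Claude Code SDK or Codex have their own session, transcript, tool loop, and context compaction.
-- The Pi Agent SDK or an external agent platform may need only the current event and an external conversation key.
-
-LangBot therefore should not dictate the history window that finally reaches the model. The Host provides only: the complete structured information for the current event; stable identity and session references; authorized history / event / state APIs; sandbox/workspace file capabilities; scoped context, an SDK-owned MCP bridge, and resource handles that can be projected to an external harness; a payload hard cap; and permission guardrails.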
-
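-### 1.2 The Host does not define a generic history window
-
-History window strategy is not a core concept of the AgentRunner protocol or the Query entry adapter. The Host provides only the history pull API, cursors, a hard cap, and permission boundaries; the runner decides whether to read, how much to read, and how to truncate and compact.
-
-The right question is not "how many turns of history does LangBot trim for the agent each round" but:
-
-- Does this kind of runner manage its own context?
-- What minimal information should the host inline when an event arrives?
-- Through which API does the agent pull more context when it needs it?
-- How does the host keep this secure, auditable, and pageable?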
-
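-### 1.3 The Host stores the source of truth; the agent manages the working context
-
-Three kinds of data must be kept separate:
-
-- `EventLog`: the Host stores raw events, tool calls, delivery results, errors, and system events.
-- `Transcript`: the conversation view the Host projects from the EventLog, used for the UI, auditing, and on-demand history reads.
-- `Working context`: the context the agent actually sends to the model or runtime in the current turn, decided by the AgentRunner.
-
-LangBot does not provide a host-side inline history window. A simple runner that needs a history window should pull and trim it inside the runner through the Host history API.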
-
-## 2. What is passed when an event arrives
-
-The default `AgentRunContext` (PROTOCOL_V1 §5.2) should be as small and stable as possible. Default rules:
-
-- Host MUST NOT inline full history by default.
-- Host SHOULD inline only current event / input and context handles.
-- Runner owns working-context assembly.
-- Runner MAY use Host history / event / state / storage API and sandbox/workspace file tools when authorized.
-- Official runners MUST consume Host infrastructure through the same public API as third-party runners.
-
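-### 2.1 Content that must be inlined
-
-The type, id, time, and source of the current event; the current input text and structured content; metadata, path, or URL for attachments, files, and images; actor / subject / conversation / thread / bot / workspace; delivery capabilities; the list of authorized resources; context cursors and available API capabilities; Agent/runner config. This is the minimum the agent needs to decide its next step.
-
-### 2.2 Content not inlined by default
-
-Full message history, the full text of large files, large tool results, entire knowledge base contents, large raw platform payload objects, and large summaries regenerated every turn. Inlining these raises cross-process serialization cost, widens the exposure surface, destabilizes the KV cache, and forces the host to make context-strategy decisions for the agent.
-
-### 2.3 No Host inline history window
-
-`AgentRunContext` does not contain a `bootstrap` field. The Host does not send a history window, and Pipeline configuration does not determine the window size. A runner that needs a `recent_tail`-style strategy should declare the parameters in its own manifest/config schema and read, trim, and compact inside the runner through the history API. The Host is responsible only for permissions, pagination, the hard cap, and the source of truth.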
-
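-## 3. The role of ContextAccess
-
-`ContextAccess` (PROTOCOL_V1 §5.8) describes the context read entry point that the host hands to the agent. It tells the agent which conversation / thread the current event belongs to, which cursor to start from when more history is needed, what the host inlined and what it did not, and which context API permissions the current run has.
-
-## 4. How the agent obtains more context
-
-All APIs go through `AgentRunAPIProxy` (PROTOCOL_V1 §8), and the host validates them with `run_id`.
-
-An external harness cannot access LangBot resources directly. History, events, state, models, tools, knowledge bases, and LangBot skills must all be forwarded through the SDK runtime to the Host API, where the Host validates them against the active `run_id`, runner identity, binding resource policy, and caller plugin identity. Once files for the current run are in the authorized sandbox/workspace, the runner accesses them on demand with read/write/exec tools. The harness's own native tools belong only to the harness execution environment and cannot bypass the SDK runtime to reach LangBot internal resources.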
-
-### 4.1 History
-
-```python
-await api.history_page(conversation_id=ctx.context.conversation_id,
-                       before_cursor=ctx.context.latest_cursor,
-                       limit=50, direction="backward", include_attachments=False)
-```
-
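-Returns a `HistoryPage` (schema in PROTOCOL_V1 §8).
-
-Constraints: `limit` is subject to a host hard cap; by default only the current conversation / thread can be read; cross-conversation reads require authorization from the binding policy / run authorization snapshot; attachment refs may be returned, but large file contents are not returned by default.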
-
-### 4.2 Search
-
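-```python
-await api.history_search(query="the database connection info the user mentioned earlier",
-                         filters={"conversation_id": ..., "event_types": ["message.received"]},
-                         top_k=10)
-```
-
-Search can start with the database full-text index and later add embedding recall. It is a host retrieval capability, not the agent's long-term memory strategy.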
-
-### 4.3 Event / State
-
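-- The Event API (`events.get` / `events.page`) reads non-message events, tool events, and system events. The agent should not treat every event as a user/assistant message.
-- The State API (`state.get` / `set`) is an optional hosted capability. A self-managed runtime can ignore it entirely; official runners that rely on LangBot can use it, for example for `external.session_id` and `summary.checkpoint`.
-
-### 4.4 Large files and tool collaboration
-
-Do not inline large files, multimodal input, or tool artifacts into the prompt or tool result: message/content carries only small text and necessary summaries; the Host stages the current event's attachments into the authorized sandbox/workspace and provides lightweight metadata/path in the input attachment. When tools pass large results to each other, pass a sandbox path or attachment ref rather than the full blob. The Host guarantees only the authorization scope of the current run and by default does not let plugins read arbitrary local paths; temporary files are managed by the sandbox lifecycle and cleanup mechanism.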
-
-### 4.5 External harness context projection
-
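-The overall boundary for external harnesses is defined in [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) §4.8. This section describes only the recommended shape of context projection.
-
-Runtimes such as Claude Code, Codex, and Kimi Code usually already have their own session, tool loop, MCP loading, context compaction, and working directory. LangBot should not turn them into a "host prompt assembler"; it should provide an auditable projection of events and resources. Recommended projection shape:
-
-- `agent-context.json`: structured JSON containing `run_id`, `event`, `actor`, `subject`, `input`, `delivery`, `resources`, `context`, `state`, and `runtime`.
-- `LANGBOT_CONTEXT.md`: a human-readable summary.
-- `resources`: contains only the resource handles and capability summaries authorized for this run; it does not expose Host-internal private objects, secrets, or resource contents.
-- `skills`: LangBot skills are not file capabilities projected directly into the harness's native tool loop; they are **a set of authorized tools**. Discovery goes through `list_skills` (or a skills category added to `langbot_list_assets`), activation/registration through `activate` / `register_skill`, and in-package operations through native exec/read/write, all exposed through `ctx.resources.tools`, `AgentRunAPIProxy`, or the SDK-owned MCP bridge. The Host does not inject a skill index into the prompt (no progressive-disclosure injection); the harness queries the skill list by calling the discovery tool. The `skills` field of `agent-context.json` serves only as the data source for the discovery tool and as input to the optional `suggested_skill_prompt`.
-- `MCP config`: projects only per-run, scoped SDK-owned bridge or external MCP connection config; access to LangBot resources must go back through the SDK runtime / Host API, and the harness must not read Host-internal resources directly through its own MCP/native tools.
-- `state pointers`: small JSON state such as the external session id, working directory, and checkpoint is saved through the Host state API.
-
-The current official external harness path is handled by runner plugins such as ACP / Claude Code / Codex (current status in OFFICIAL_RUNNER_PLUGINS §7). This projection means "handing LangBot's source of truth and authorized resource handles to the harness", not "handing the LangBot resources themselves or internal permissions to the harness", and not "letting LangBot decide the final model context".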
-
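-## 5. Runner context boundary
-
-The Host provides only the current event, the current input, and context handles. Whether the runner can pull history, events, state, or storage, and whether it can access sandbox/workspace files, is governed at runtime by `ctx.context.available_apis` and tool authorization; the runner decides whether to pull history, whether to search, when to summarize, and how to build the final prompt.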
-
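-## 6. KV-cache-friendly context management
-
-When supporting runtimes such as the Claude Code SDK, Codex, and the Pi Agent SDK, LangBot must not reassemble a large prompt every turn:
-
-- Stable session key: `workspace/bot/binding/runner/conversation/thread`.
-- Pass only the delta each turn: the current event, attachment refs/paths, and a small amount of runtime metadata.
-- Append-only history: do not rewrite the same span of history text every turn.
-- Stable summary checkpoints: a new checkpoint is produced only when compaction happens.
-- Write large files and tool results to the sandbox/workspace.
-- Keep the tool/context API schema stable; pull data through the API instead of stuffing it into the prompt.
-- For self-managed runtimes, prefer letting them reuse their own session/cache over forcing LangBot to replay the transcript every turn.
-- Expose model window metadata to the runner as resource/runtime metadata and let the runner decide budget and compaction strategy.
-
-The stable session key exists to isolate the external runtime's resume/cache/state; it does not change the Agent reuse and dispatch boundaries defined in PROTOCOL_V1 §13. Only when the same native session of an external harness does not support concurrent turns should the runner or a future runtime control plane serialize at the turn level by external session key.
-
-For long-running external harnesses / daemons, the recommended shape separates reader and writer: one session reader has exclusive read access to the stdout/SSE/native event stream and converts native events into `AgentRunResult` or task progress, while user input enters the session only as a turn write. The current one-shot CLI subprocess runner can keep collecting stdout synchronously within a single `run(ctx)`, but a later long-connection design should not let multiple requests read the same native stream at once.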
-
-## 7. Host guardrail
-
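-Agent-managed context does not mean unrestricted access. LangBot must still control: the active `run_id` of each run, runner identity, the resource policy of the current binding, conversation / actor / subject scope, page size / sandbox file read size / API rate limit, cross-conversation read permission, data masking and sensitive-variable filtering, and audit logs. The Host is not responsible for the "best context strategy", but it is responsible for preventing privilege escalation, memory blowups, and unauditable access.
-
-An external harness's native tools, shell, MCP, or skill mechanisms do not form a LangBot resource authorization boundary. Any access to a resource held by LangBot must be forwarded through the SDK runtime and pass Host validation; the full boundary is in HOST_SDK §4.8.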
-
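-## 8. Boundary between official runners and business orchestration
-
-Official runner plugins may host their state in LangBot, but they must consume it through the public Host API just like third-party runners. LangBot core does not embed the official agent's business flow (prompt assembly, tool loop, RAG orchestration, summary/compaction, "local-agent only" state fields).
-
-The official local-agent should serve as the "reference implementation of a complex runner that relies on LangBot infrastructure": it reads transcript/history through `api.history_page()` / `api.history_search()`; saves summaries, checkpoints, external session ids, and user preferences through `api.state_get()` / `api.state_set()` or storage methods; accesses images, files, and large tool results through sandbox/workspace read/write tools; and calls models, tools, and knowledge bases through `api.invoke_llm()` / `api.call_tool()` / `api.retrieve_knowledge()`. This keeps LangBot a general agent host rather than a built-in agent framework. Specific migration requirements are in [OFFICIAL_RUNNER_PLUGINS.md](./OFFICIAL_RUNNER_PLUGINS.md).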
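The removed document says a runner that wants a `recent_tail`-style window should build it itself by paging backward through the history API (§2.3 and §4.1). The sketch below illustrates that pattern under stated assumptions: `FakeHistoryAPI` is an in-memory stand-in, the `items` / `next_cursor` fields of `HistoryPage` are guesses (the real schema lives in PROTOCOL_V1 §8, which is not part of this diff), and `HOST_HARD_CAP` is an invented value.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional

HOST_HARD_CAP = 50  # assumed host-side page limit, for illustration only


@dataclass
class HistoryPage:
    """Hypothetical page shape; not the schema from PROTOCOL_V1."""
    items: List[str]
    next_cursor: Optional[int]


class FakeHistoryAPI:
    """In-memory stand-in for the Host history API."""

    def __init__(self, messages: List[str]) -> None:
        self._messages = list(messages)

    async def history_page(self, conversation_id, before_cursor, limit,
                           direction="backward", include_attachments=False):
        limit = min(limit, HOST_HARD_CAP)  # the host enforces its own cap
        end = len(self._messages) if before_cursor is None else before_cursor
        start = max(0, end - limit)
        cursor = start if start > 0 else None
        return HistoryPage(self._messages[start:end], cursor)


async def recent_tail(api, conversation_id, latest_cursor,
                      max_messages, page_size=20):
    """Runner-side window: page backward until max_messages are collected."""
    collected: List[str] = []
    cursor = latest_cursor
    while len(collected) < max_messages:
        page = await api.history_page(
            conversation_id=conversation_id,
            before_cursor=cursor,
            limit=min(page_size, max_messages - len(collected)),
            direction="backward",
            include_attachments=False,
        )
        collected = page.items + collected  # keep chronological order
        if page.next_cursor is None:
            break  # reached the start of the conversation
        cursor = page.next_cursor
    return collected
```

The window size and page size are runner parameters here, which matches the document's point that the Host only paginates and caps while the runner owns the trimming policy.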
@@ -1,227 +0,0 @@
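-# Agent Runner QA Guide
-
-This document is the single QA entry point for the next round of agent-runner pluginization testing. It merges and replaces the old Phase 1 acceptance matrix and the two local QA reports from 2026-05-18 and 2026-05-29.
-
-The goal is not to keep a complete historical log but to guide the test agent in judging, through a minimal but high-value path, whether the current branch is still healthy.
-
-## 1. Test scope
-
-The current mainline validates AgentRunner Protocol v1:
-
-```text
-event -> binding -> runner.run(ctx) -> result stream
-```
-
-This guide verifies:
-
-- The Host can enter the event-first `run(event, binding)` main path through the current Query entry adapter.
-- Runners come from the plugin registry, not from the old built-in runner branch.
-- `local-agent` can consume Host infrastructure such as models, tools, knowledge bases, history, state, and sandbox files.
-- External harness runners (direct runner plugins such as ACP / Claude Code / Codex) can consume the event-first context and write the external session pointer back to host-owned state.
-- Errors, permission trimming, no output, timeouts, and similar paths do not break the main chat flow.
-
-This guide does not verify:
-
-- Runtime Control Plane v2.
-- The full rollout of EventGateway / EventRouter, which is integration-tested by the external EBA branch; this guide verifies only this branch's Host substrate.
-- Release-grade path isolation, secret filtering, MCP allowlist, resource quotas, and workspace cleanup.
-- Real-credential integration testing for every external service runner.
-
-These are later capabilities or release gates; see [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md) and [SECURITY_HARDENING.md](./SECURITY_HARDENING.md) respectively.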
-
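-## 2. Status definitions
-
-Test reports use only the following statuses:
-
-| Status | Meaning |
-| --- | --- |
-| PASS | The steps were executed, and both user-visible behavior and log evidence meet the pass criteria. |
-| FAIL | The environment is usable, but the behavior does not meet the pass criteria. |
-| BLOCKED | Execution is impossible because credentials, a CLI, an external service, test data, or local configuration is missing. The blocking reason must be stated. |
-| N/A | The current runner or platform explicitly does not support the capability. A manifest, document, or configuration note must be cited. |
-
-Vague statuses such as "looks fine", "probably passes", or "basically no problems" are not allowed.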
-
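-## 3. Execution order
-
-Run the layers in the following order; when an earlier layer fails, do not keep widening the test surface:
-
-1. Host / SDK / runner unit tests.
-2. WebUI login and Pipeline Debug Chat basic smoke.
-3. `local-agent` high-value scenarios.
-4. External code-agent harness smoke.
-5. Supplementary permission and error-path checks.
-6. Summarize PASS / FAIL / BLOCKED and give next-step recommendations.
-
-User-visible flows must be verified through the WebUI or a real messaging platform. API / curl can serve only as diagnostic evidence and cannot by itself make a UI case PASS.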
-
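-## 4. Required baseline
-
-### 4.1 Unit test baseline
-
-Run in the LangBot repository:
-
-```bash
-uv run --frozen pytest tests/unit_tests/agent
-```
-
-If the change touches only default configuration or the API service, still run at least the relevant targeted tests, for example:
-
-```bash
-uv run pytest tests/unit_tests/api/test_pipeline_service_defaults.py
-```
-
-Pass criteria:
-
-- All agent unit tests PASS, or the failures are confirmed to be unrelated to this agent-runner path.
-- If a failure comes from `context_builder`, `orchestrator`, `session_registry`, `resource_builder`, or the run action permission path of `plugin/handler.py`, do not proceed to UI smoke.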
-
-### 4.2 Environment baseline
-
-Use `langbot-skills` for the environment check:
-
-```bash
-cd "$LANGBOT_SKILLS_REPO"
-bin/lbs env doctor
-bin/lbs case list
-```
-
-`LANGBOT_SKILLS_REPO` points to the `langbot-skills` repository in the current workspace. Prefer existing cases over inventing ad hoc test paths.
-
-Recommended first batch of cases:
-
-- `webui-login-state`
-- `pipeline-debug-chat`
-- `local-agent-basic-debug-chat`
-- `local-agent-rag-debug-chat` (when the change involves RAG / knowledge)
-- `local-agent-plugin-tool-call-debug-chat` (when the change involves tool / resource policy)
-
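-## 5. WebUI main path smoke
-
-### 5.1 Runner registry
-
-Steps:
-
-1. Open the WebUI Pipeline configuration page.
-2. Check the AI runner dropdown.
-3. Select `plugin:langbot/local-agent/default`.
-4. Save and refresh the page.
-
-Pass criteria:
-
-- Runner options come from the plugin registry.
-- After saving, the configuration is still `ai.runner.id` + `ai.runner_config[id]`.
-- `runner_config` represents Agent/runner config, not plugin instance state.
-- The old `ai.runner.runner` field is neither read nor written back.
-- No old built-in runner stage name (for example a bare `local-agent`) appears as the current selection or configuration surface.
-- The plugin does not restart in a loop or fail to load metadata.
-
-### 5.2 Main chat path
-
-Steps:
-
-1. Use a Pipeline bound to `plugin:langbot/local-agent/default`.
-2. Send deterministic plain text in Debug Chat.
-3. Check the WebUI reply and the backend logs.
-
-Pass criteria:
-
-- The user-visible reply is normal.
-- Backend logs show the `AgentRunOrchestrator` / `RUN_AGENT` path.
-- The old built-in local-agent main execution branch is not taken.
-- The conversation transcript records the user message and the assistant message.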
-
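-## 6. `local-agent` high-value tests
-
-Only the scenarios that best cover the architectural boundaries are kept.
-
-| ID | Scenario | Action | Pass criteria |
-| --- | --- | --- | --- |
-| LA-01 | Bound prompt | Configure a system prompt, then send text. | The runner uses `ctx.config.prompt` and does not read `ctx.adapter.extra["prompt"]`; the reply reflects the bound prompt. |
-| LA-02 | history API | Two consecutive turns; the second turn references a marker from the first. | The runner reads history through the Host history API or its self-managed context, without relying on an inline history window. |
-| LA-03 | Streaming / non-streaming | Send text through a streaming path and through a path with streaming disabled. | The streaming UI shows no duplicates or blanks; non-streaming outputs only the final message. |
-| LA-04 | Tool call | Bind a test tool and send a prompt that triggers it. | `ctx.resources.tools` contains only authorized tools; the tool call emits started/completed; the final reply contains the tool result. |
-| LA-05 | RAG | Bind a test knowledge base and send a prompt that hits a document. | `ctx.resources.knowledge_bases` contains the selected knowledge base; the runner retrieves through the authorized API; the reply uses the retrieved content. |
-| LA-06 | Multimodal | Send an image input. | `ctx.input.contents` keeps the image; with a vision-capable model it is processed normally, otherwise it fails in a controlled way. |
-| LA-07 | fallback / errors | Simulate a primary model failure or a runner exception. | fallback or `run.failed` behavior is controlled; subsequent requests are unaffected. |
-| LA-08 | No-output protection | Test a runner that completes without producing a message. | No blank success reply is produced; the case is handled as a controlled failure or a clear defect. |
-| LA-09 | steering / message appended mid-run | With a runner that supports steering, the first message triggers a long run; before the run ends, append a second message in the same conversation. | The second message is claimed by the active run and no concurrent run starts; the runner sees the appended input through `steering_pull`; the EventLog has `queued` -> `steering.injected`, or a terminal `steering.dropped` if the input is not consumed; subsequent ordinary messages can still be processed. |
-
-Scenarios such as rerank, remove-think, and file input are tested only when the change directly involves them and are not required every round.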
-
-## 7. Code-agent Harness Smoke
-
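-These tests verify that self-managed runtimes such as ACP, Claude Code, and Codex can follow the same Host protocol path. If the target harness lacks a CLI/daemon, login state, proxy configuration, or remote workspace, mark it BLOCKED; do not fake a PASS.
-
-Before smoke testing, keep a layer of lightweight unit or fixture tests: session creation/reuse, message sending, result parsing, `run_id` injection, and the LangBot MCP gateway must have stable test coverage. WebUI smoke proves the real path works but cannot replace tests of the conversion layer and error mapping.
-
-### 7.1 External harness runner
-
-Steps:
-
-1. Confirm the target harness (for example the ACP daemon, Claude Code, or Codex) is executable and logged in on the corresponding machine.
-2. Bind the target runner, for example `plugin:langbot/acp-agent-runner/default`, `plugin:langbot/claude-code-agent/default`, or `plugin:langbot/codex-agent/default`.
-3. Configure the required runner fields, for example remote target, workspace, provider, startup timeout, and reuse session.
-4. Run one deterministic real smoke in Debug Chat.
-5. Check the LangBot MCP gateway, `run_id` backfill, and host-owned state.
-
-Pass criteria:
-
-- The reply visible in the WebUI contains the expected sentinel.
-- The message sent to the harness contains the current LangBot `run_id` and a summary of accessible resources.
-- When the harness calls `langbot_history_page`, `langbot_retrieve_knowledge`, or `langbot_call_tool` through the gateway, it must carry the correct `run_id`; a wrong run id is rejected.
-- `external.session_id` is written to host-owned state.
-- External harness errors, timeouts, and empty output are all converted into a controlled `run.failed`.
-- When resuming the same external session, the global lock boundary conforms to PROTOCOL_V1 §13.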
-
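-### 7.2 API-type external runners
-
-External service runners such as Dify, n8n, Coze, DashScope, Langflow, and Tbox are not required every round. Run smoke only when the change touches the corresponding runner or credentials are already available.
-
-Pass criteria:
-
-- The runner is selectable and the configuration can be saved.
-- The request succeeds, or the external service error is returned clearly.
-- When external service credentials are missing, mark the case BLOCKED and record what is missing.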
-
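-## 8. Permission and isolation supplement
-
-The following are best covered by unit tests / targeted fixtures; constructing a malicious runner by hand through the UI each time is not required.
-
-| Scenario | Recommended evidence |
-| --- | --- |
-| Unauthorized model call is rejected | `plugin/handler.py` run action permission test or a targeted unit test. |
-| Unauthorized tool call is rejected | `ctx.resources.tools` and the host action rejection log. |
-| Unauthorized knowledge base retrieval is rejected | `ctx.resources.knowledge_bases` and the host action rejection log. |
-| Reuse of a run_id after the run ends is rejected | session registry unregister test. |
-| Plugin identity mismatch is rejected | `caller_plugin_identity` mismatch test. |
-| A run_id bound to a plugin identity is rejected when the caller identity is omitted | `_validate_run_authorization(..., caller_plugin_identity=None)` returns an error. |
-| A forged plugin identity on an unregistered Runtime connection is stripped | SDK runtime forwarding test: when a request carries its own `caller_plugin_identity`, an unregistered connection must `pop` it before forwarding, and a registered connection must overwrite it with the real plugin identity. |
-| Out-of-scope storage/state access is rejected | state/storage proxy unit test. |
-| A steering claim exception does not kill the consumer loop | controller unit test: an invalid runner / registry exception only sends the current message back to the ordinary session slot path, and the message consumer loop continues. |
-| An unconsumed steering queue reaches a terminal state | session registry / orchestrator unit test: the queue is bounded; on run unregister, items that were not pulled are written to the `steering.dropped` audit. |
-
-If these unit tests fail, a normal WebUI reply cannot substitute for them.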
-
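-## 9. Evidence requirements
-
-Each round's test report records at least:
-
-- The LangBot commit, SDK commit, and commits of the relevant runner plugins.
-- Pipeline UUID/name, runner id, and a summary of key runner config.
-- WebUI screenshots or Playwright operation records.
-- Key backend log lines for the corresponding query id / run id.
-- `langbot-skills` case/report paths.
-- For external harness runners: the context file, session id, working directory, and CLI error summary.
-- Reproduction steps for FAIL/BLOCKED results and a suggestion of which repository owns the issue.
-
-The report conclusion must answer:
-
-- Whether continuing to the next test phase is recommended.
-- Whether the main chat path is blocked.
-- Whether BLOCKED results are caused only by missing credentials / external services / local CLI.
-- Whether release-grade acceptance under [SECURITY_HARDENING.md](./SECURITY_HARDENING.md) is needed.
-
-## 10. High-value historical records
-
-High-value historical records and the current runner acceptance status are in [STATUS.md](./STATUS.md). This guide keeps only repeatable test steps and evidence requirements.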
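The status rules in §2 of the removed QA guide (four allowed statuses, a mandatory reason for BLOCKED, a mandatory reference for N/A) are mechanical enough to check in a report linter. The following is a hedged sketch only: `Status`, `CaseResult`, `parse_status`, and `report_problems` are hypothetical names, not part of `langbot-skills` or LangBot.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    """The only statuses the guide allows in a test report."""
    PASS = "PASS"
    FAIL = "FAIL"
    BLOCKED = "BLOCKED"
    NA = "N/A"


@dataclass
class CaseResult:
    case_id: str
    status: Status
    note: str = ""  # blocking reason, or the manifest/doc/config reference


def parse_status(text: str) -> Status:
    """Reject free-form verdicts; raises ValueError for anything else."""
    return Status(text.strip())


def report_problems(result: CaseResult) -> List[str]:
    """Return rule violations for one case; an empty list means it is valid."""
    problems = []
    if result.status is Status.BLOCKED and not result.note.strip():
        problems.append(f"{result.case_id}: BLOCKED requires a blocking reason")
    if result.status is Status.NA and not result.note.strip():
        problems.append(f"{result.case_id}: N/A requires a cited reference")
    return problems
```

A check like this covers only the bookkeeping rules. It cannot decide whether a PASS is backed by WebUI and log evidence, which the guide still requires a person or test agent to record.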
@@ -1,92 +0,0 @@
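-# Event Based Agent Integration Design
-
-> This document records how EBA integrates with the current AgentRunner Protocol v1 / Host substrate. EventGateway, EventRouter, and event subscription/notification are implemented and integration-tested by the external EBA branch; this branch keeps only the event-first entry point and the envelope/binding models.
->
-> Data structures are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md) (runner-visible) and [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) (Host-internal models); this document covers EBA semantics and does not repeat the schema.
-> For the boundary with the current runner externalization branch and the later Agent Platform / Runtime Control Plane, see [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md).
-
-This document describes how events enter LangBot during EBA integration, how they trigger an AgentRunner, and how the plugin-based agent infrastructure is reused. This branch does not implement the full EventBus / EventRouter / Platform API; those capabilities are being integration-tested in the external EBA branch. The goal here is to make the protocol boundary clear so that the current message entry point does not stay bound to Pipeline and user text messages.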
-
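-## 1. Design goals
-
-- Messages, recalls, group joins, friend requests, scheduled tasks, and API calls can all be abstracted as host events.
-- The EventRouter can resolve an `AgentBinding` from event type, bot, workspace, conversation, actor, and subject.
-- AgentRunners are invoked through the same orchestrator.
-- Non-message events are not faked as user text messages.
-- Platform action execution is reserved through explicit capability / permission / result types and is not mixed into ordinary text replies.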
-
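-## 2. Events are not messages
-
-`message.received` is only one kind of event. The protocol should not assume that there is always user text, that there is always conversation history, that a chat message must be returned, that the actor always equals the sender, or that the subject always equals the current message.
-
-| event_type | actor | subject | input |
-| --- | --- | --- | --- |
-| `message.received` | The person who sent the message | The current message | Text, images, files, and so on |
-| `message.recalled` | The operator who recalled it, or the system if unknown | The recalled message | Usually empty |
-| `group.member_joined` | The new member or the inviter | The group/member relationship | Usually empty |
-| `friend.request_received` | The requester | The friend request | Verification message or request reason |
-| `schedule.triggered` | System | The scheduled task | Task payload |
-| `api.invoked` | API caller | API request | request payload |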
-
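-## 3. Stable event names
-
-Stable event names reserved first (kept stable as part of the plugin protocol):
-
-- `message.received`
-- `message.recalled`
-- `group.member_joined`
-- `friend.request_received`
-
-Raw platform event names may appear only in `ctx.event.source_event_type` / `raw_ref` and must not become the public contract of `ctx.event.event_type`.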
-
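-## 4. Event Envelope and Binding
-
-- Entry events are carried by `AgentEventEnvelope` (HOST_SDK §4.1); top-level fields use LangBot's stable protocol names, and raw platform event names and raw payloads go in `metadata` / `raw_ref`.
-- Trigger relationships are expressed with `AgentBinding` (HOST_SDK §4.2). In the EBA phase, a binding uses `event_types`, `scope`, and `filters` to decide which events trigger the Agent bound to the current bot / channel.
-
-EBA dispatch cardinality, Agent reuse, and fan-out boundaries follow PROTOCOL_V1 §13; this section only explains how the external EBA branch's EventRouter produces the binding that the current v1 mainline needs.
-
-Binding scope examples: workspace-global, bot level, platform channel level, conversation / group / thread level, user / actor level. The old Pipeline can be migrated as a temporary binding source for `message.received`, but the target persistent configuration should be the Agent, not the Pipeline.
-
-Event sources can include `platform_adapter` (Feishu, QQ, WeChat, Telegram, and so on), `webui`, `http_api`, `scheduler`, and `system`. The EventRouter should not hardcode platform adapter class names.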
-
-## 5. EventRouter call chain
-
-```text
-Platform Adapter / WebUI / API
-  -> Event Gateway normalize payload
-  -> EventLog append raw event
-  -> EventRouter resolve one effective AgentBinding
-  -> AgentRunOrchestrator.run(event, binding)
-  -> AgentRunContextBuilder.build(event, binding)
-  -> PluginRuntimeConnector.run_agent()
-  -> AgentRunResult stream
-  -> DeliveryController render / platform action
-```
-
-Constraints: the existing orchestrator must be reused, and EBA must not implement a separate plugin runner invocation protocol; non-message events must not bypass resource authorization; delivery and platform actions use a unified permission model; external harness runners also integrate through the same envelope/binding/context/result protocol, with no separate queue protocol invented for Claude Code / Codex / Kimi. Additional semantics for observer / fan-out / parallel arbitration are still handled per PROTOCOL_V1 §13.
-
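-## 6. Platform action execution
-
-After EBA, `action.requested` (PROTOCOL_V1 §7.3, currently telemetry only and not executed) will be used to request that the host execute a platform action:
-
-```json
-{ "type": "action.requested",
-  "data": { "action": "friend.request.accept",
-            "target": {"platform": "wechat", "request_id": "..."},
-            "payload": {"reason": "policy matched"} } }
-```
-
-The Host must validate whether the binding / platform action policy authorizes the action, whether the actor / bot / workspace allows it, whether human approval is required, and whether the current run session / caller identity matches. EBA may also reserve `delivery.requested` (a request to deliver to a given surface).
-
-On delivery, an event does not necessarily reply to the current chat window: message events usually carry a reply target; system events may have no default reply target, so either the runner returns `action.requested` or the binding's delivery policy decides where to deliver (`DeliveryContext` is in PROTOCOL_V1 §5.7).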
-
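-## 7. Relationship to the Context protocol
-
-EBA events entering an AgentRunner still follow [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md): inline the current event, use raw/staged file refs for large payloads, do not inline full history by default, let the agent pull through the API on demand, and keep the EventLog and permission guardrails in the Host. Non-message events can be projected into the Transcript but must not be forced to masquerade as user messages; the AgentRunner decides by event type whether to include them in the model context.
-
-## 8. What the EBA branch integrates
-
-The external EBA branch is responsible for integration-testing the full EventGateway implementation, EventRouter and BindingResolver integration, the persistent `AgentBinding` model and UI, the full `DeliveryContext` implementation, the platform action permission model and executor, and real platform event integration.
-
-The current substrate has completed: ① adapting the current Pipeline message entry point into a `message.received` event → ② adding the `AgentBinding` abstraction, generated from current config for now → ③ changing the context builder to build from event + binding → ④ introducing EventLog / Transcript. On that basis the external EBA branch integrates: ⑤ protocol tests for non-message events and real event sources → ⑥ the real EventRouter, binding persistence / UI, and platform actions.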
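Two rules from the removed EBA document lend themselves to a small illustration: raw platform event names stay in `source_event_type` while `event_type` carries a stable name (§3), and non-message events are never given made-up user text (§1, §2). The sketch below is an assumption-laden example, not the `AgentEventEnvelope` model: `NormalizedEvent`, `normalize`, and the `example.*` raw event names are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Stable names listed in the removed document.
STABLE_EVENT_TYPES = {
    "message.received",
    "message.recalled",
    "group.member_joined",
    "friend.request_received",
}

# Hypothetical raw platform event names, invented for this example.
RAW_TO_STABLE = {
    "example.msg_in": "message.received",
    "example.msg_recall": "message.recalled",
    "example.group_join": "group.member_joined",
}


@dataclass
class NormalizedEvent:
    event_type: str              # stable, public contract
    source_event_type: str       # raw platform name, kept for reference only
    input_text: Optional[str] = None


def normalize(source_event_type: str,
              text: Optional[str] = None) -> NormalizedEvent:
    """Map a raw platform event to a stable event type.

    Input text is passed through unchanged; nothing is synthesized, so an
    event that arrives without text stays without text.
    """
    stable = RAW_TO_STABLE.get(source_event_type)
    if stable is None or stable not in STABLE_EVENT_TYPES:
        raise ValueError(f"no stable event type for {source_event_type!r}")
    return NormalizedEvent(stable, source_event_type, text)
```

Raising on an unmapped name is one possible policy chosen for the sketch; the document itself does not say how an unknown platform event should be handled.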
@@ -1,51 +0,0 @@
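-# AgentRunner Externalization Extension Boundary Matrix
-
-This document answers one question: when this branch does only AgentRunner externalization, which capabilities are already complete as an extension substrate, which are integrated by the external EBA / Agent Platform / Runtime Control Plane branches, and which extension point later branches should use.
-
-Conclusion: this branch implements neither the full Agent Platform nor the full EBA. The full EBA event gateway and event routing are integration-tested by the external EBA branch. This branch must make the Host / SDK boundary of runner externalization clean, so that external branches only need to plug in persistent models, event routing, or runtime tasks, without rewriting `AgentRunner Protocol v1`.
-
-The single source of truth for dispatch cardinality, Agent reuse, stateless plugin instances, the Pipeline adapter, and fan-out boundaries is [PROTOCOL_V1.md](./PROTOCOL_V1.md) §13; this matrix only states which extension point later capabilities should use.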
-
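-## 1. Branch boundary
-
-| Scope | Responsibility of this branch | Not done in this branch |
-| --- | --- | --- |
-| AgentRunner Protocol v1 | Defines the stable contract by which the Host calls a runner: discovery, `AgentRunContext`, result stream, Host pull API, errors, and permission boundaries. | Does not define the Agent Platform's product database model; does not define a runtime task queue. |
-| Host runner externalization substrate | Provides `AgentEventEnvelope`, the `AgentBinding` runtime projection, `run(event, binding)`, resource authorization, run-scoped sessions, and the EventLog / Transcript / State / sandbox file boundaries. | Does not implement EventGateway, a scheduler, integration providers, or the Agent management UI. |
-| Current Pipeline entry point | Uses `QueryEntryAdapter` to project the old Query / Pipeline config into event + binding as a migration-period entry point. | Does not keep treating Pipeline as the long-term agent configuration center. |
-| Official runner plugins | Act as protocol consumers that verify local-agent / external harness runners can plug into Host infrastructure. | Does not let the internal implementation of official runners dictate the shape of the Host / SDK protocol. |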
-
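-## 2. Extension matrix
-
-| Capability | Status in this branch | Later owner | Later integration approach | Prohibited |
-| --- | --- | --- | --- | --- |
-| Product `Agent` | A runtime `AgentConfig` / `AgentBinding` projection exists; there is no formal persistent product object yet. | Agent Platform / binding persistence UI. | A persistent Agent stores runner id, runner config, and resource/state/delivery policy; it is projected into an `AgentBinding` before a run. | Do not add the persistent Agent schema to the SDK protocol; plugin instance boundaries are in PROTOCOL_V1 §13. |
-| Bot / channel binding to Agent | A per-run `AgentBinding` resolution projection exists; the target dispatch semantics are in PROTOCOL_V1 §13. | EBA / Agent Platform. | The EventRouter resolves the effective `AgentBinding` from bot, channel, workspace, conversation, and event type. | Do not redefine fan-out / observer semantics in this matrix; add a new design per §3 when needed. |
-| Agent session / run | Currently only `run_id` and the active `AgentRunSessionRegistry` exist, used for permission checks and lifecycle. | Agent Platform / Runtime Control Plane. | Persistent `AgentRun` / `AgentSession` / task tables may be added if needed, but execution still goes back to `run(event, binding)` or an equivalent runtime-managed entry point. | Do not put persistent session fields at the top level of `AgentRunContext`; do not require every runner to hold a LangBot session long term. |
-| EventLog / Transcript / Sandbox files | The Host-owned store, history pull API, and sandbox file boundary are complete; runners do not write to the DB directly. | This branch keeps maintaining the substrate; Agent Platform can reuse it. | External EBA, scheduler, integration, and runtime tasks all write to the same EventLog / Transcript; current-run files are shared through sandbox/workspace staging. | Do not let runners / sandboxes access the Host DB directly; do not inline large payloads into the prompt. |
-| Host-owned state / storage | State snapshot, `state.updated` handling, and the State API exist; storage is kept as an authorized capability. | This branch keeps maintaining the substrate; Runtime / Platform can reuse it. | Small JSON such as the external session id, working directory, and checkpoint uses state; large objects for the current run use sandbox/workspace files. | Do not keep cross-turn state inside plugin instances; do not bypass run-scoped authorization. |
-| EventGateway / EventRouter | This branch provides only the event-first envelope and the `run(event, binding)` entry point. | EBA branch (integration in progress). | EventGateway normalizes platform/WebUI/API/scheduler events; EventRouter resolves one binding; the existing orchestrator is called. | Do not add another runner invocation protocol for EBA; do not disguise non-message events as user messages. |
-| Scheduler / Automation | Not implemented. The docs treat `scheduler` only as a future event source. | EBA / Agent Platform. | A scheduled task triggers a `schedule.triggered` host event and reuses EventGateway -> EventRouter -> `run(event, binding)`. | Do not call a specific runner plugin directly; do not bypass EventLog / authorization. |
-| Integration provider | Not implemented. The IM platform adapter remains the current platform integration system. | EBA / Agent Platform. | OAuth/webhook/outbound providers should first be converted into canonical host events or platform actions and then handed to the AgentRunner. | Do not spread provider-private payloads from Linear/Slack/GitHub and similar providers into the top level of the runner protocol. |
-| Platform action / delivery | `action.requested` is reserved but currently telemetry only and not executed. `DeliveryContext` serves only as a context/policy projection. | EBA / platform action executor. | A later executor validates runner capability, binding policy, actor/bot/workspace permissions, and approval before executing. | Do not let runners call platform adapter private APIs directly; do not disguise platform actions as side effects of a text reply. |
-| Runtime registry / worker / task queue | Not implemented. The current official external harnesses reach the target runtime through ACP, a remote daemon, a local subprocess, or an external HTTP API runner; this branch does not maintain a general worker. | Runtime Control Plane v2. | Phase one first adds Host-owned `AgentRun` / `AgentRunEvent` / run control primitives; the full runtime registry, heartbeat, task queue, daemon claim, and progress/audit are later optional phases. | Do not put heartbeat/task/warm pool into Protocol v1; do not let management plugins own the runtime/task source of truth. |
-| Warm pool / reconcile / diagnose | Not implemented. | Runtime Control Plane v2 / deployment layer. | Implemented as operational capabilities of task/runtime, built around Host-owned runtime/task/audit tables. | Do not write runtime operations semantics into the ordinary runner protocol; do not leak pod/task details to ordinary runners. |
-| Agent memory | No general long-term memory product layer is implemented; history/state/storage and sandbox file base capabilities are provided. | Agent Platform or a specific runner/plugin. | Platform memory can be implemented through Host storage/state or separate product tables, and runners pull it through authorized APIs. | Do not build a general agentic memory strategy into Host core; do not inline all memory into the context by default. |
-| External harness native session | Runners such as ACP / Claude Code / Codex support external session id state handoff and LangBot resource projection. | Later enhancement of official runners; Runtime Control Plane v2 can take over execution. | External harness calls keep going through `runner.run(ctx)`; if a long-connection/daemon mode is introduced later, turns are serialized by external session key and the reader has exclusive access to the native stream. | Do not turn a specific provider's native wire format into the LangBot protocol; the global lock boundary is in PROTOCOL_V1 §13. |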
-
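-## 3. Integration rules for later branches
-
-When the external EBA, Agent Platform, or Runtime Control Plane branches integrate, they follow these rules by default:
-
-- New entry points only produce or resolve Host-internal models: `AgentEventEnvelope`, the `AgentBinding` projected from a persistent Agent, and any necessary delivery/resource/state policy.
-- Runner invocation still goes through `AgentRunOrchestrator.run(event, binding)` unless the Runtime Control Plane explicitly introduces a runtime-managed execution mode; even then, the runner-visible contract should remain Protocol v1.
-- Host-owned facts continue to be written to EventLog / Transcript / State, and current-run files continue to go through the sandbox/workspace; the product layer may add higher-level views but cannot replace these sources of truth.
-- If a new capability needs persistence, prefer adding a Host-owned table or service; do not hide the source of truth in plugin storage or a runner subprocess.
-- New result types may be added under Protocol v1's evolution rules; entry-adapter private fields must not be used to bypass the schema.
-- Any fan-out, observer agent, parallel arbitration, or platform action execution must define its delivery, state conflict, approval, and audit semantics separately.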
-
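-## 4. Relationship to the Agent Platform product layer
-
-Here Agent Platform refers to the entity breakdown for the agent product layer: `Agent` describes a configurable agent, `Session` / `SessionMessage` describe conversation facts, `Automation` describes automatic triggers, `IntegrationBinding` describes external integration connections, `Memory` describes long-term memory, and `WarmTask` describes warm-up/background tasks. This breakdown is a useful reference for LangBot's later product layer but cannot be moved directly into this branch.
-
-The corresponding goal of LangBot's current branch is lower level: project entry points such as IM/WebUI/API uniformly into Host events, project Agent / binding configuration uniformly into runner bindings, and converge runner capabilities on Protocol v1. A full Agent Platform can be built on top of this substrate and should not, in turn, pollute this branch's runner externalization boundary.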
@@ -1,263 +0,0 @@
-# LangBot Host 与 SDK 基础设施设计
-
-本文档描述 LangBot 作为 agent host 的内部能力与分层架构，以及 Host 内部模型。
-
- SDK ↔ Host 的协议数据结构（`AgentRunContext`、`AgentRunnerManifest`、`AgentRunResult`、`AgentRunAPIProxy` 等）的**唯一定义在** [PROTOCOL_V1.md](./PROTOCOL_V1.md)；本文只引用，不重抄。
- 测试执行入口和 smoke 记录见 [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md)；安全发布门槛见 [SECURITY_HARDENING.md](./SECURITY_HARDENING.md)。
- 本文定义的 Host 内部模型（`AgentEventEnvelope`、`AgentBinding`、`AgentRunnerDescriptor`）不属于 SDK 协议字段。
-
-## 1. 目标
-
-LangBot 要转为 agent host，而不是内置 runner 容器：
-
- 接收 IM、WebUI、API 和外部 EBA 分支 EventRouter 产生的事件。
- 根据事件、bot、workspace、scope 解析应该调用的 Agent / agent binding。
- 发现、校验和调用插件提供的 AgentRunner。
- 为每次 run 提供受限资源、状态、存储、上下文引用和生命周期控制。
- 接收 AgentRunner 返回的事件流，投递到 IM、WebUI 或其他 output surface。
-
-## 2. 非目标
-
- 不把 Pipeline 当作长期架构中心。
- 不要求所有 AgentRunner 依赖 LangBot 的上下文管理。
- 不要求官方 local-agent 的旧行为反向塑造 host 协议。
- 不在 host 中实现通用 agentic prompt assembler。
- 不强制 runner 使用 LangBot state / storage；只提供可选、受控的寄宿能力。
- 不实现 EventGateway / EventRouter：它们由外部 EBA 分支提供并联调。本分支只定义 host-side envelope/binding models 和 `run(event, binding)` 入口。
-
-## 3. 分层架构
-
-```text
-IM / WebUI / API / EventRouter (external EBA branch)
-        |
-        v
-Event Gateway (external EBA branch)
-        |
-        v
-AgentBindingResolver
-        |
-        v
-AgentRunOrchestrator
-        |-- AgentRunnerRegistry
-        |-- AgentResourceBuilder
-        |-- AgentContextBuilder
-        |-- AgentRunSessionRegistry
-        |-- PersistentStateStore / EventLogStore / TranscriptStore
-        |-- Sandbox / workspace file tools
-        v
-Plugin Runtime / AgentRunner
-        |
-        v
-AgentRunResult stream
-        |
-        v
-Delivery / Renderer / Platform API
-```
-
-目标产品模型、单绑定调度、Agent 复用、插件实例无状态和 fan-out 边界以 [PROTOCOL_V1.md](./PROTOCOL_V1.md) §13 为准。本文只说明 Host 如何把当前入口投影为内部模型。当前 Pipeline 只应接入在 Query entry adapter 位置：它可以继续产生 `message.received` 并投影出临时 `AgentConfig` / `AgentBinding`，但不应再拥有 runner 选择、上下文裁剪和业务 agent 执行的核心语义。EventGateway / EventRouter 由外部 EBA 分支实现并联调。
-
-## 4. LangBot 侧能力
-
-### 4.1 Event Gateway / EventRouter（External EBA Branch Integration Point）
-
-> EventGateway / EventRouter 由外部 EBA 分支实现并联调，不在本分支范围。本分支只保留 event-first 入口和 envelope/binding models。
-
-Event Gateway 将把入口统一成 host event（IM 平台消息、WebUI debug chat、API 触发、后续非消息事件），输出稳定的 `AgentEventEnvelope`（Host 内部模型）：
-
-```python
-class AgentEventEnvelope(BaseModel):
-    event_id: str
-    event_type: str
-    event_time: int | None
-    source: str
-    bot_id: str | None
-    workspace_id: str | None
-    conversation_id: str | None
-    thread_id: str | None
-    actor: ActorRef | None
-    subject: SubjectRef | None
-    input: AgentInput          # 见 PROTOCOL_V1 §5.6
-    delivery: DeliveryContext  # 见 PROTOCOL_V1 §5.7
-    raw_ref: RawEventRef | None
-    metadata: dict[str, Any] = {}
-```
-
-`AgentEventEnvelope` 是 Host 内部入口模型；投影给 runner 的是 `ctx.event`（PROTOCOL_V1 §5.4）。原始平台 payload 存为 raw event 或 staged file reference，不扩散到 runner 协议顶层。
-
-**当前 adapter source**：`QueryEntryAdapter.query_to_event(query)` 从 Query 生成 `AgentEventEnvelope`。
-
-### 4.2 AgentConfig 与 AgentBinding
-
-`AgentConfig` 是迁移期的 Host 内部 Agent 配置投影（不暴露给 SDK）。当前 Query entry adapter 从 Pipeline config 投影出它；未来持久 Agent 也应先投影成这个运行期配置，再由 BindingResolver 结合事件和 scope 解析为 `AgentBinding`。
-
-```python
-class AgentConfig(BaseModel):
-    agent_id: str | None = None
-    runner_id: str
-    runner_config: dict[str, Any] = {}
-    resource_policy: ResourcePolicy = ResourcePolicy()
-    state_policy: StatePolicy = StatePolicy()
-    delivery_policy: DeliveryPolicy = DeliveryPolicy()
-    event_types: list[str] = ["message.received"]
-    enabled: bool = True
-    metadata: dict[str, Any] = {}
-```
-
-`AgentBinding` 是"什么事件调用哪个 AgentRunner、带什么 Agent 配置"的 Host 内部运行投影（不暴露给 SDK）。它是 EventRouter / 当前 QueryEntryAdapter 在一次运行前解析出的有效绑定。
-
-```python
-class AgentBinding(BaseModel):
-    binding_id: str
-    enabled: bool
-    scope: BindingScope
-    event_types: list[str]
-    filters: list[EventFilter] = []   # EBA 阶段使用，见 EVENT_BASED_AGENT
-    runner_id: str
-    runner_config: dict[str, Any]
-    resource_policy: ResourcePolicy
-    state_policy: StatePolicy
-    delivery_policy: DeliveryPolicy
-```
-
-BindingResolver 的基数、fan-out 和冲突处理约束见 PROTOCOL_V1 §13；本节只定义 Host 内部投影形态。
-
-**当前 adapter source**：`QueryEntryAdapter.config_to_agent_config(query, runner_id)`
-先把 current config 投影为迁移期 `AgentConfig`，再由
-`AgentBindingResolver.resolve_one(event, [agent_config])` 解析出唯一
-`AgentBinding`。Pipeline 当前只是迁移期 Agent config source（AI runner config
-→ runner_config、extension preference → resource_policy、output settings →
-delivery_policy），但新设计不再把这些字段命名为 Pipeline 专属概念。
-
-### 4.3 AgentRunnerRegistry
-
-Registry 收集 runner descriptor（来自插件 runtime、开发期本地插件）：
-
-```python
-class AgentRunnerDescriptor(BaseModel):
-    id: str
-    source: Literal["plugin"]
-    label: I18nObject
-    description: I18nObject | None = None
-    plugin_author: str
-    plugin_name: str
-    runner_name: str
-    capabilities: AgentRunnerCapabilities    # 见 PROTOCOL_V1 §4.3
-    permissions: AgentRunnerPermissions      # 见 PROTOCOL_V1 §4.4
-    config_schema: list[DynamicFormItemSchema]
-    plugin_version: str | None = None
-    raw_manifest: dict[str, Any] = {}
-```
-
-职责：调用 `plugin_connector.list_agent_runners()` 拉取 runner、校验 typed `AgentRunnerManifest`、输出 descriptor、缓存 discovery 结果并提供 `refresh()`。单个插件 manifest 失败只记 warning，不影响其它 runner。`plugin:author/name/runner` 是稳定 id 格式；插件实例边界见 PROTOCOL_V1 §13。
-
-Host 内置 runner / adapter 不能作为 `AgentRunnerDescriptor.source` 绕过插件
-runtime、`run_id`、`ctx.resources` 和 `AgentRunAPIProxy` 权限链。若需要
-开发期调试 adapter，应放在 Host 内部测试入口，不进入可选 runner 列表。
-
-刷新触发点：插件安装/卸载/升级/重启后；Pipeline metadata 请求时发现缓存为空；可选 TTL（优先保证正确性）。
-
-### 4.4 AgentRunOrchestrator
-
-Orchestrator 是唯一运行入口：
-
-```text
-run(event, binding)
-  -> resolve runner descriptor
-  -> build resources
-  -> build context
-  -> register run session
-  -> call plugin runtime
-  -> normalize result stream
-  -> update state
-  -> unregister run session
-```
-
-它负责：`run_id` 生成和生命周期、timeout/deadline/cancellation、插件异常隔离、result schema 校验和大小限制、`state.updated` 处理、delivery backpressure 和 telemetry。
-
-典型 run 时序：
-
-```text
-QueryEntryAdapter / EventRouter
-  -> AgentRunOrchestrator.run(event, binding)
-  -> AgentRunnerRegistry.resolve(runner_id)
-  -> AgentResourceBuilder.freeze_snapshot(binding, event)
-  -> AgentRunSessionRegistry.register(run_id, runner_id, snapshot)
-  -> AgentContextBuilder.build(event, binding, snapshot)
-  -> PluginRuntimeConnector.run_agent(ctx)
-       -> AgentRunAPIProxy action
-          -> validate active run session + caller identity + snapshot
-          -> Host API / Store
-       <- AgentRunResult stream
-  -> apply state.updated to PersistentStateStore
-  -> write message.completed to Transcript
-  -> keep current-run files and large tool outputs in sandbox/workspace
-  -> render delivery or raise RunnerExecutionError
-  -> AgentRunSessionRegistry.unregister(run_id)
-```
-
-`run_from_query()` 保留为 Query entry adapter 入口，但内部转换成 event + binding 后走统一 `run()`。约束：`ChatMessageHandler` 不解析 `plugin:*`、不实例化 wrapper、不知道 runner 组件细节；`PipelineService` 从 registry 读取 metadata，不直接访问插件 runtime；跨请求持久化状态必须走授权 storage / 外部服务。
-
-### 4.5 Resource Authorization
-
-LangBot 在每次 run 前生成 `ctx.resources`（PROTOCOL_V1 §6），来自 manifest permissions 与 binding policy 的交集：
-
-1. `descriptor.permissions` 声明 runner 需要的 LangBot 资源访问上限。
-2. binding / resource policy 允许的资源范围。
-3. Agent/runner config 中选择的模型、知识库、文件等资源。
-4. 当前 event / actor / bot / workspace 的实际权限。
-5. `ctx.context.available_apis` 暴露的 pull API 能力。
-
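-The result of this narrowing must be frozen as a run-scoped authorization snapshot and
-stored by `AgentRunSessionRegistry` under `run_id`. `ctx.resources` is the same
-authorization result projected to the runner; at run time every proxy action validates
-the active run session, caller plugin identity, resource id, scope, payload size, rate
-limit, and deadline against that snapshot only. Handlers must not re-run the
-authorization narrowing, otherwise build-time and runtime authorization logic will drift.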
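
The frozen, run-scoped snapshot described above can be pictured with a minimal sketch. `RunSnapshot` and `SessionRegistry` are hypothetical stand-ins, not the actual `AgentRunSessionRegistry` API, and only tool access is modelled; the point is that runtime checks read the snapshot stored at registration and never recompute authorization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSnapshot:
    """Hypothetical frozen authorization snapshot for a single run."""

    run_id: str
    caller_plugin: str
    allowed_tools: frozenset[str]


class SessionRegistry:
    """Hypothetical stand-in for AgentRunSessionRegistry (illustration only)."""

    def __init__(self) -> None:
        self._active: dict[str, RunSnapshot] = {}

    def register(self, snapshot: RunSnapshot) -> None:
        # The snapshot is stored as-is; it is immutable, so it cannot drift mid-run.
        self._active[snapshot.run_id] = snapshot

    def unregister(self, run_id: str) -> None:
        self._active.pop(run_id, None)

    def check_tool_call(self, run_id: str, caller_plugin: str, tool_name: str) -> bool:
        # Validate against the frozen snapshot only; no policy is re-evaluated here.
        snapshot = self._active.get(run_id)
        if snapshot is None:
            return False  # no active run session
        if snapshot.caller_plugin != caller_plugin:
            return False  # caller identity mismatch
        return tool_name in snapshot.allowed_tools
```

Once a run is unregistered, every later call with the same `run_id` is rejected, which matches the requirement that proxy actions need an active run session.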
-
-SDK 侧本地校验只用于开发体验，host 侧 run authorization snapshot 才是安全边界。`spec.capabilities` 只帮助 Host 判断 runner 是否需要 tool / knowledge 等资源投影，不能替代 permissions 或 binding policy。skill 不由独立 capability 决定是否投影——它通过统一 tool 授权（`resource_policy.allowed_tool_names`）消费，`skill_authoring` 仅作为「一键授权这组 skill tool + sandbox」的便捷开关。
-
-资源裁剪应通用，不写死 local-agent。selector 与资源的映射示例：`model-fallback-selector` → primary/fallback LLM、`llm-model-selector` → LLM、`rerank-model-selector` → rerank 模型、`knowledge-base-multi-selector` → 知识库；新增 selector 时在 resource builder 中统一扩展。
-
-构造 `ctx.resources.tools` 时，Host 一次塞齐每个工具的完整 schema（`ToolResource.parameters`），runner 不需再逐个 `get_tool_detail` 拉取，减少 N 次往返。
-
-执行/文件/skill/MCP 等能力的接入方向：先由 Host / sandbox 封装成普通 scoped tool，再通过 `ctx.resources.tools` 和 SDK runtime 转发进入 runner；runner 不应识别或硬编码执行环境 provider。外部 harness 的 native tools 不能直接访问 LangBot 资源。skill 的整个生命周期都走统一 tool：发现走 `list_skills` / `langbot_list_assets`，激活/注册走 `activate` / `register_skill`，包内操作走 native exec/read/write——runner 不需要独立的 skill 渲染或门控。
-
-### 4.6 State / Storage
-
-LangBot 可提供 host-owned state 让 runner 寄宿状态（conversation / actor / subject / runner / binding / workspace state），但**不是强制**。Host 只需提供：授权开关、scope key、get/set/list/delete API（见 PROTOCOL_V1 §8）、持久化 backend、审计和清理策略。外部 agent runtime 可维护自己的 session 和 memory。进程内 state store 只能作为过渡实现，不能作为正式生产语义。
-
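-Some host-owned state is written directly by the Host itself: for example, when the `activate` tool executes on the Host side, it writes the activated skill into `host.activated_skills` in the conversation scope. When a direct Host write and a runner `state.updated` write target the same key, they are merged **last-write-wins**, so the runner can overwrite the Host value.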
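
As a rough illustration of the last-write-wins rule, the sketch below applies an ordered list of writes to a state dict. `apply_writes` and the sample values are hypothetical; the real store may track timestamps or versions rather than simple arrival order.

```python
def apply_writes(state: dict[str, object], writes: list[tuple[str, object]]) -> dict[str, object]:
    """Apply writes in arrival order; a later write to the same key replaces the earlier one."""
    merged = dict(state)  # copy so the caller's snapshot is left untouched
    for key, value in writes:
        merged[key] = value
    return merged


# The Host writes first, then the runner's state.updated overwrites the same key.
result = apply_writes(
    {},
    [
        ("host.activated_skills", ["pdf-tools"]),  # direct Host write
        ("host.activated_skills", []),             # later runner state.updated
    ],
)
```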
-
-### 4.7 EventLog / Transcript / Sandbox Files（事实源）
-
- `EventLog`: durable append-only，保存原始事件、系统事件、工具调用、投递结果、错误。
- `Transcript`: 从 EventLog 投影出的对话视图，用于 UI、审计和按需历史读取。
- `Sandbox / workspace files`: 当前 run 的上传文件、平台附件、工具大结果和临时产物。Host 负责 staging 与授权边界，runner 通过 read/write/exec 类工具按需访问。
-
-三类数据与 working context 的边界、读取约束见 [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md)。AgentRunner 可读取这些能力，但不被迫使用 LangBot 作为唯一记忆系统。
-
-### 4.8 External harness resource projection
-
-Claude Code、Codex、Kimi Code 等外部 harness runner 可能不直接调用 LangBot 的 model/tool loop，而是把 LangBot 事件和授权资源句柄投影到自己的 harness 执行。Host 侧仍保持统一边界：Host 负责构造 event-first context、资源授权、state/storage、EventLog/Transcript、sandbox/workspace 文件边界和审计；Host 或 binding policy 决定哪些 MCP bridge、skill-backed tool、sandbox path、history/state 句柄可投影给 runner；runner plugin 把 scoped projection 转成目标 harness 可消费形式；所有 LangBot 资源访问必须经 SDK runtime / `AgentRunAPIProxy` / SDK-owned MCP bridge 转发并接受 Host 校验；外部 harness 负责自己的 native session、tool loop、压缩、权限模式和 resume，但不能用 native tools 绕过 Host 授权。
-
-投影的具体形态（context 文件、resource handles、LangBot MCP gateway、state pointers）见 AGENT_CONTEXT_PROTOCOL §4.5；当前 code-agent harness runner 形态见 OFFICIAL_RUNNER_PLUGINS §7。发布级隔离要求见 SECURITY_HARDENING。
-
-## 5. SDK 侧协议
-
-SDK 组件入口如下；所有数据结构定义见 PROTOCOL_V1。
-
-```python
-class AgentRunner(BaseComponent):
-    __kind__ = "AgentRunner"
-
-    @classmethod
-    def get_config_schema(cls) -> list[dict]: ...
-
-    async def run(self, ctx: AgentRunContext) -> AsyncGenerator[AgentRunResult, None]: ...
-    # ctx: PROTOCOL_V1 §5.2 ; AgentRunResult: PROTOCOL_V1 §7
-```
-
- Manifest / capabilities / effective access：PROTOCOL_V1 §4。Capabilities 来自组件 manifest 的 `spec.capabilities`，不是 SDK 基类 classmethod。
- `AgentRunContext`：PROTOCOL_V1 §5.2。`messages` / `bootstrap` 不是协议字段。
- `AgentRunResult`：PROTOCOL_V1 §7。
- `AgentRunAPIProxy`：PROTOCOL_V1 §8，是 runner 访问 host 能力的唯一入口，所有请求带 `run_id`。
@@ -1,138 +0,0 @@
-# 官方 AgentRunner 插件迁移计划
-
-本文档描述内置 `RequestRunner` 迁出 LangBot 后，官方 runner 插件如何组织、迁移和验收。它是 [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) 和 [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md) 的下游落地计划，不是 LangBot 宿主协议的设计前提。QA 入口和 smoke 记录见 [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md)。
-
-官方 `local-agent` 可以外移，也可以重写。设计重点不是保留旧内置 runner 的内部结构，而是验证一个依附 LangBot host 基础设施的官方 agent 能否完整工作。同时，LangBot host 协议必须服务 Claude Code SDK、Codex、Pi Agent SDK、外部 Agent 平台等自管 context/runtime 的 runner，不能被官方插件的实现细节绑死。
-
-## 1. 仓库组织
-
-官方 runner 插件与 LangBot 主仓库、SDK 仓库以不同节奏迭代：LangBot 主仓库只维护宿主协议和调度，SDK 仓库维护 AgentRunner 组件和 runtime 协议，官方 runner 插件承载业务 runner 的具体实现和第三方平台适配。
-
-当前推荐"官方插件可独立发布，必要时共享 SDK helper"。开发期采用本地多目录布局：
-
-```text
-langbot-app/
-  langbot-local-agent/                # plugin:langbot/local-agent/default
-    manifest.yaml
-    components/agent_runner/default.{yaml,py}
-  langbot-agent-runner/               # 外部服务 runner 仓库
-    acp-agent-runner/  claude-code-agent/  codex-agent/  dify-agent/  n8n-agent/  ...
-```
-
-后续可聚合进 monorepo，也可继续独立发布——这个选择不影响协议设计。重复逻辑优先沉淀到 SDK 或明确的共享 helper 包，不要把宿主私有结构泄漏给插件。旧 `src/langbot/pkg/provider/runners/*` 只作为历史行为对齐基准；当前未发布分支不提供旧内置 runner 的运行时 fallback。
-
-## 2. 插件命名和 runner id
-
-| 旧 runner | 官方插件 | runner id |
-| --- | --- | --- |
-| `local-agent` | `langbot/local-agent` | `plugin:langbot/local-agent/default` |
-| `dify-service-api` | `langbot/dify-agent` | `plugin:langbot/dify-agent/default` |
-| `n8n-service-api` | `langbot/n8n-agent` | `plugin:langbot/n8n-agent/default` |
-| `coze-api` | `langbot/coze-agent` | `plugin:langbot/coze-agent/default` |
-| - | `langbot/acp-agent-runner` | `plugin:langbot/acp-agent-runner/default` |
-| - | `langbot/claude-code-agent` | `plugin:langbot/claude-code-agent/default` |
-| - | `langbot/codex-agent` | `plugin:langbot/codex-agent/default` |
-| `dashscope-app-api` | `langbot/dashscope-agent` | `plugin:langbot/dashscope-agent/default` |
-| `deerflow-api` | `langbot/deerflow-agent` | `plugin:langbot/deerflow-agent/default` |
-| `langflow-api` | `langbot/langflow-agent` | `plugin:langbot/langflow-agent/default` |
-| `tbox-app-api` | `langbot/tbox-agent` | `plugin:langbot/tbox-agent/default` |
-| `weknora-api` | `langbot/weknora-agent` | `plugin:langbot/weknora-agent/default` |
-
-每个插件可后续提供多个 runner，但迁移目标的默认 runner 统一叫 `default`。
-
-## 3. 迁移批次
-
- **Batch 1（打通协议）**：`local-agent`（能力最完整基准）、`acp-agent-runner` / `claude-code-agent` / `codex-agent`（外部 code-agent harness 路径）、`dify-agent`（传统 service API runner）。
- **Batch 2（外部 workflow）**：`n8n-agent`、`langflow-agent`（webhook/workflow 输入输出、timeout、外部 conversation id）。
- **Batch 3（平台 Agent API）**：`coze-agent`、`dashscope-agent`、`tbox-agent`、`deerflow-agent`、`weknora-agent`（平台特有响应格式、引用资料、文件/图片输入、外部 thread/session 状态）。
-
-## 4. 每个官方插件的组件要求
-
-每个插件至少包含一个 `AgentRunner` 组件，manifest 示例：
-
-```yaml
-apiVersion: langbot/v1
-kind: AgentRunner
-metadata:
-  name: default
-  label: { en_US: Dify Agent, zh_Hans: Dify Agent }
-  description:
-    en_US: Run a Dify application as a LangBot AgentRunner.
-    zh_Hans: 将 Dify 应用作为 LangBot AgentRunner 运行。
-spec:
-  config: []
-  capabilities:        # 字段语义见 PROTOCOL_V1 §4.3
-    streaming: true
-execution:
-  python: { path: ./main.py, attr: DefaultAgentRunner }
-```
-
-## 5. local-agent 插件方向
-
-`local-agent` 是官方插件中能力最完整的消费者，但不是宿主协议的设计中心。它需要证明：一个主要依附 LangBot host 能力的 agent runner 可以通过公开协议完成模型、工具、知识库、状态、history、sandbox 文件访问、上下文压缩和消息投递。
-
-迁移或重写需覆盖旧内置 runner 的用户可见能力：model primary/fallback 选择、prompt、knowledge-bases、rerank-model、rerank-top-k、function calling、streaming、multimodal input、conversation history、monitoring metadata。
-
-责任边界与 Host API 消费方式见 AGENT_CONTEXT_PROTOCOL §8。关键约束：
-
- 从 `ctx.config` 读取静态绑定 `prompt`，**不**读取 `ctx.adapter.extra["prompt"]`；不消费 Query entry adapter 生成的历史窗口。
- 通过 `AgentRunAPIProxy.history` 拉取 transcript，而不是依赖 host 每轮强塞历史窗口。
- `ctx.input.contents` 保留图片/文件等多模态内容；RAG 只替换/插入文本部分，不丢图片/文件。
- 不能绕过 `ctx.resources` 调用未授权模型、工具或知识库。
- manifest 声明功能能力、LangBot 资源 permissions 和配置表单；实际授权来自 manifest permissions 与 binding resource policy、runner config、`ctx.context.available_apis` 和 Host run session snapshot 的交集。
-
-### 5.1 Native Execution / Skills 后续接入
-
-本阶段不把 sandbox/skills 做成 AgentRunner 协议字段。后续 sandbox/skills 分支合并后，命令执行、文件操作、skill、MCP managed process 应先由 Host / sandbox 封装成 scoped tools，再通过 `ctx.resources.tools` 和 SDK runtime 转发暴露给 runner。这让 local-agent 只消费授权后的 Host 基础设施，而不是直接持有宿主机执行能力。
-
-## 6. 外部 runner 插件要求
-
-外部平台 runner 迁移遵循：旧配置字段尽量保持同名便于 migration 复制；输出统一转换为 `AgentRunResult`；外部 API timeout 从 runner config 读取；平台 conversation id 存 plugin storage 或 context runtime state，不依赖 LangBot 内置 conversation uuid 私有结构；流式按平台能力声明，没有流式就只发 `message.completed`。
-
-### 6.1 Code-agent harness runner
-
-Claude Code、Codex、Kimi Code 这类 runner 不一定通过 LangBot 的模型/工具 loop 执行，可以依赖自己的 harness，但仍必须遵守统一 Host 边界。总体边界见 [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) §4.8；context projection 形态见 [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md) §4.5；发布级要求见 [SECURITY_HARDENING.md](./SECURITY_HARDENING.md)。
-
-本文件只补充官方 runner 的实现要求：输入来自 `ctx.event` / `ctx.input`，不依赖 Pipeline 私有 `Query`；外部 session id / workspace / checkpoint 写入 Host state 或 plugin storage；插件实例边界见 PROTOCOL_V1 §13；CLI / subprocess runner 必须处理 timeout、取消、空输出、非零退出和 stderr 映射。
-
-实现结构应把 provider-native output 解析与 LangBot result stream 组装分开：Claude stream-json、Codex JSONL、Kimi / OpenCode 事件等只在 runner adapter 内解析，输出统一归一为 `AgentRunResult`（`message.completed` / `message.delta`、`state.updated`、`run.completed` / `run.failed`）。文件和工具大结果留在当前 run 的 sandbox/workspace，通过消息 metadata、attachment ref 或 path 指向。未知 native event 不应导致 run 崩溃；应记录诊断 metadata 或 warning。新增 harness 时优先补 native fixture -> `AgentRunResult` 的转换测试，再接 WebUI smoke。
-
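-Concurrency constraints should be expressed per external session, not per Agent, runner id, or plugin instance; Agent reuse and global lock boundaries are covered in PROTOCOL_V1 §13. If a runner uses `external.session_id` / `thread_id` to resume the same native session and the harness does not support concurrent turns, the runner should serialize writes by a stable external session key. A one-shot subprocess runner can handle this within a single `run(ctx)`, whereas a long-lived connection/daemon runner should use a structure in which a reader exclusively owns the native stream and a turn writer writes serially.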
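
One way to serialize turns per external session key is a lock per key, sketched below with `asyncio`. `SessionTurnSerializer` is a hypothetical helper, not part of the SDK; it covers only the turn-writer side and never evicts locks, which a long-lived runner would need to handle.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class SessionTurnSerializer:
    """Hypothetical helper: at most one in-flight turn per external session key."""

    def __init__(self) -> None:
        # One lock per stable external session key (e.g. external.session_id).
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run_turn(self, session_key: str, turn: Callable[[], Awaitable[T]]) -> T:
        # Turns for the same key queue up; turns for different keys run concurrently.
        async with self._locks[session_key]:
            return await turn()
```

Keying the lock on the external session rather than on the runner id keeps unrelated conversations from blocking one another while one native session is busy.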
-
-### 6.2 LangBot MCP gateway
-
-外部 harness 不能直接持有进程内的 `plugin_runtime_handler`，也不能用自己的 native tools 直接访问 LangBot 资源。外部 harness runner 应通过稳定 HTTP MCP gateway 或 SDK-owned bridge 把 harness 的工具请求转回 SDK runtime / Host API：
-
- Gateway 由 runner 插件启动，暴露稳定的 `langbot_history_page`、`langbot_retrieve_knowledge`、`langbot_call_tool` 等最小工具面。
- Harness 每次调用必须携带当前 LangBot `run_id`；Host 仍按 run session、caller identity 和授权快照校验。
- Gateway 只转发 LangBot 资产访问，不承担外部 harness 的文件、进程或 native tool 权限边界。
-
-第一批工具保持很小：history page、knowledge retrieve、authorized tool call。新增工具必须先有 Host action 权限与 run-scoped authorization，再由 gateway 投影。
-
-## 7. Code-agent harness runner 当前形态
-
-外部 code-agent harness 由直接 runner 插件承接，例如 `acp-agent-runner`、`claude-code-agent`、`codex-agent`，每个 runner 负责把目标 harness 的 native session、workspace、MCP bridge 和输出事件转换为统一 `AgentRunResult`。本地 smoke 验收入口与记录见 [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md)。
-
-当前形态：
-
- Runner ID 示例：`plugin:langbot/acp-agent-runner/default`、`plugin:langbot/claude-code-agent/default`、`plugin:langbot/codex-agent/default`。
- Runner 可通过 ACP、远端 daemon、本机 subprocess 或外部 HTTP API 调用 harness；harness 的安装、登录态、workspace 和 provider-native 权限由该运行环境负责。
- Runner 会把当前 LangBot `run_id`、可访问资源摘要和 gateway 使用规则注入本次消息；harness 通过 gateway 回填 `run_id` 后访问 LangBot 资产。
- 外部 session id / workspace / checkpoint 写回 Host state 或 plugin storage，后续轮次可复用目标 harness 会话。
-
-### 7.1 当前限制
-
-这不是发布级安全边界实现；LangBot 只约束 LangBot 持有资产的访问，外部 harness 的文件、进程、workspace、provider-native MCP 和模型凭据由对应 runner 的运行环境承担。当前 `run_id` 可由系统提示词、ACP metadata 或 runner 自有 session metadata 传递给 harness 并由 gateway 校验。runtime 管控面方向见 [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md)。
-
-## 8. 发布和安装策略
-
-最终 LangBot 安装/升级时需保证官方 runner 插件可用，可选方案：首次启动检测缺失并提示安装；打包发行版预装；migration 前检查插件存在性。当前分支未发布，因此不把历史配置兼容或旧内置 runner fallback 写入运行时协议面。建议顺序：开发阶段用本地路径插件 → 发布前支持 marketplace 安装 → 若发布升级需要迁移历史配置，再在 release gate 中实现一次性 migration 并要求官方插件已可用。
-
-## 9. 验收标准
-
- 每个目标 runner 都有对应官方 AgentRunner 插件和稳定 runner id；当前配置只使用 `ai.runner.id` + `ai.runner_config[id]`。
- LangBot 主聊天路径不再通过 `RequestRunner` 执行业务 runner。
- 官方插件测试覆盖非流式、流式、错误、timeout、配置缺失。
- `local-agent` 能完成模型 fallback、tool calling、知识库检索、多模态输入、静态绑定 prompt 消费、history API 拉取、rerank。
- 外部 code-agent harness runner 能消费 event-first context、投影 scoped resources、保存 external session state，并通过 WebUI Debug Chat smoke。
- `local-agent` 覆盖旧内置 runner 的用户可见核心能力；代码结构和运行路径不需要相同。
@@ -1,736 +0,0 @@
-# LangBot AgentRunner Protocol v1
-
-本文档是 LangBot Host 与插件 SDK / Runtime / AgentRunner 之间协议合同的**唯一规范来源（single source of truth）**。
-
- 本文件描述当前 Protocol v1 稳定合同，不混入验收流水。当前实现状态见 [STATUS.md](./STATUS.md)，测试执行入口见 [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md)，安全发布门槛见 [SECURITY_HARDENING.md](./SECURITY_HARDENING.md)。
- 本文件之外的任何文档**不得重新定义这里的数据结构**，只能引用，例如"见 PROTOCOL_V1 §4.2"。
- Host 内部模型（`AgentEventEnvelope`、`AgentBinding`、Descriptor、各 Store）不属于 SDK 协议，定义在 [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md)。
-
-## 1. 协议目标
-
-Protocol v1 只解决四件事：
-
- LangBot 如何发现插件提供的 AgentRunner。
- LangBot 如何把一次事件调用封装成 `AgentRunContext`。
- AgentRunner 如何以事件流形式返回运行结果。
- AgentRunner 如何通过受限 API 访问 LangBot host 能力。
-
-Protocol v1 **不定义**：
-
- LangBot 内部如何持久化 `AgentBinding`（见 HOST_SDK）。
- AgentRunner 内部如何组装 prompt、压缩历史、管理 memory（见 [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md)）。
- 官方 runner 的具体实现（见 [OFFICIAL_RUNNER_PLUGINS.md](./OFFICIAL_RUNNER_PLUGINS.md)）。
- Pipeline 的长期配置模型。
- 发布级安全 hardening 的完整实现（见 [SECURITY_HARDENING.md](./SECURITY_HARDENING.md)）。
-
-## 2. 参与方
-
-| 名称 | 职责 |
-| --- | --- |
-| LangBot Host | 事件入口、绑定解析、权限、资源、存储、生命周期、结果投递。 |
-| Plugin Runtime | 加载插件，响应 Host 的 runner discovery 和 run 调用。 |
-| AgentRunner | 插件提供的 agent 执行组件。 |
-| AgentRunAPIProxy | AgentRunner 访问 Host 能力的受限 API。 |
-| AgentBinding | Host 内部的事件到 runner 绑定配置，不直接暴露给 SDK（见 HOST_SDK §4.2）。 |
-
-产品层的 `Agent` 替代旧 Pipeline 承载 agent 配置：bot / IM channel
-绑定一个 Agent，一个 Agent 可以被多个 bot / channel 复用。Host 内部的
-`AgentBinding` 是一次事件运行前解析出的有效绑定，只影响 Host 构造出的
-`ctx.config`、`ctx.resources`、`ctx.context` 和 `ctx.delivery`。SDK 不需要知道
-Agent / binding 的持久化形态。
-
-外部 harness runner（Claude Code、Codex、Kimi Code 等）也是 `AgentRunner`：它们消费 event-first `AgentRunContext`、返回 `AgentRunResult`，并通过 Host 授权的 state/storage API 保存跨轮次指针；当前运行文件和工具大结果进入 sandbox/workspace。它们内部可以继续使用自己的 session、tool loop、MCP、上下文压缩和权限模型。
-
-## 3. 协议演进
-
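-The current AgentRunner contract does not expose an explicit `protocol_version` field. For now, protocol evolution follows field-level compatibility rules:
-
-- Adding optional fields is backward compatible.
-- Removing a field or changing the semantics of an existing field must be done before the SDK is released; after release, a new explicit compatibility scheme is required.
-- Result stream evolution: the Host **must ignore unknown result types and log a warning** (unless the type explicitly requires strict validation). The SDK envelope accepts unknown inbound `type` strings, and the runner side may forward the original string or ignore it; adding a result type does not bump the major version.
-- Inbound context entities in the SDK are lenient so that non-core fields attached by the Host are tolerated; manifests, result payloads, page/result returns, and the error model are strict and forbid unknown fields by default. The security boundary still lives in the Host; SDK validation only improves the developer experience.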
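
The "ignore unknown result types and log a warning" rule could look roughly like the sketch below. The type names are the ones listed in §7.1; `filter_known_results`, the logger name, and the use of plain dicts in place of `AgentRunResult` are assumptions for illustration.

```python
import logging

logger = logging.getLogger("agent_run")  # hypothetical logger name

# Result types taken from the ResultType literal in §7.1.
KNOWN_RESULT_TYPES = frozenset({
    "message.delta",
    "message.completed",
    "tool.call.started",
    "tool.call.completed",
    "state.updated",
    "action.requested",
    "run.completed",
    "run.failed",
})


def filter_known_results(results: list[dict]) -> list[dict]:
    """Keep results with a known type; skip and warn on everything else."""
    accepted = []
    for result in results:
        result_type = result.get("type")
        if result_type in KNOWN_RESULT_TYPES:
            accepted.append(result)
        else:
            # Unknown types must not fail the run under the evolution rule.
            logger.warning("ignoring unknown result type: %r", result_type)
    return accepted
```

Because unknown types are skipped instead of raising, an older Host keeps working when a newer SDK starts emitting an additional result type.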
-
-## 4. Discovery 协议
-
-### 4.1 LIST_AGENT_RUNNERS
-
-Host 调用 Plugin Runtime 获取当前插件暴露的 runner 列表，请求无额外 payload。返回：
-
-```python
-class ListAgentRunnersResponse(BaseModel):
-    runners: list[AgentRunnerDiscovery]
-
-class AgentRunnerDiscovery(BaseModel):
-    plugin_author: str
-    plugin_name: str
-    runner_name: str
-    manifest: AgentRunnerManifest
-```
-
-`manifest` 是 SDK typed `AgentRunnerManifest`，由 Runtime 从插件组件 manifest 解析并校验后返回。`plugin_author` / `plugin_name` / `runner_name` 保留为 transport 寻址字段；Host 以它们生成稳定 runner id，并把 `manifest.id` 校验为 `plugin:author/name/runner`。单个 runner manifest 解析失败时 Runtime/Host 记录 warning 并跳过该 runner，不影响同一插件或其它插件的 runner discovery。
-
-### 4.2 AgentRunnerManifest
-
-这里的 manifest 指 Runtime 返回给 Host 的 typed runner manifest：
-
-```python
-class AgentRunnerManifest(BaseModel):
-    id: str
-    name: str
-    label: I18nObject
-    description: I18nObject | None = None
-    capabilities: AgentRunnerCapabilities = AgentRunnerCapabilities()
-    permissions: AgentRunnerPermissions = AgentRunnerPermissions()
-    config_schema: list[DynamicFormItemSchema] = []
-    metadata: dict[str, Any] = {}
-```
-
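-- The runner id is generated by the Host in the form `plugin:author/name/runner`.
-- `name` is the runner name within the plugin, for example `default`.
-- `config_schema` only describes the binding configuration form; it does not represent plugin instance state.
-- `capabilities` is a typed bool model the Host uses for UI and resource projection; it does not grant permissions.
-- `permissions` is the upper bound of LangBot resource access the runner requests; actual authorization must still be intersected with the binding policy.
-- `metadata` holds only display, diagnostic, and non-stable extension information.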
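
The `plugin:author/name/runner` id format can be built and checked with a small helper. Both function names and the character set allowed in each segment (no `/`, `:` or whitespace) are assumptions; the source only fixes the overall shape of the id.

```python
import re

# Assumed segment rule: non-empty, no "/", ":" or whitespace.
_RUNNER_ID = re.compile(r"^plugin:([^/:\s]+)/([^/:\s]+)/([^/:\s]+)$")


def make_runner_id(plugin_author: str, plugin_name: str, runner_name: str) -> str:
    """Build a runner id from the three transport addressing fields."""
    return f"plugin:{plugin_author}/{plugin_name}/{runner_name}"


def parse_runner_id(runner_id: str) -> tuple[str, str, str]:
    """Split a runner id into (author, name, runner); raise ValueError if malformed."""
    match = _RUNNER_ID.match(runner_id)
    if match is None:
        raise ValueError(f"malformed runner id: {runner_id!r}")
    return match.group(1), match.group(2), match.group(3)
```

A Host-side check of `manifest.id` could then compare it with `make_runner_id(...)` computed from the discovery addressing fields.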
-
-### 4.3 Capabilities
-
-```python
-class AgentRunnerCapabilities(BaseModel):
-    streaming: bool = False
-    tool_calling: bool = False
-    knowledge_retrieval: bool = False
-    multimodal_input: bool = False
-    skill_authoring: bool = False
-    interrupt: bool = False
-    steering: bool = False
-
-    model_config = ConfigDict(extra="forbid")
-```
-
- `streaming`: runner 可以返回 `message.delta`。
- `tool_calling`: runner 可能调用 Host tool API。
- `knowledge_retrieval`: runner 可能调用 Host knowledge API。
- `multimodal_input`: runner 可以处理非纯文本 input / attachment。
- `skill_authoring`:（降级为便捷开关，非访问硬前提）声明该 runner 期望使用 LangBot skill 工具链。skill 本身通过**统一 tool 授权**获得——发现走 `list_skills` / `langbot_list_assets`，激活/注册走 `activate` / `register_skill`，操作走 native exec/read/write，全部计入 `resource_policy.allowed_tool_names`。该 capability 仅作为「一键授权这组 skill tool + sandbox」的便捷开关，不再单独决定 skill 是否可用。
- `interrupt`: runner 支持取消或中断。
- `steering`: runner 支持在 turn 边界通过 Host pull API 消费同 conversation 在途追加消息。
-
-Capabilities 字段全部是 `bool`，未知 key 禁止进入 typed manifest。早期草案里的上下文/会话类 capability 已删除；对应语义由 event-first context 和 runner-owned context 原则表达。
-
-### 4.4 Permissions 与 Effective Access
-
-```python
-class AgentRunnerPermissions(BaseModel):
-    models: list[Literal["invoke", "stream", "rerank"]] = []
-    tools: list[Literal["detail", "call"]] = []
-    knowledge_bases: list[Literal["list", "retrieve"]] = []
-    history: list[Literal["page", "search"]] = []
-    events: list[Literal["get", "page"]] = []
-    storage: list[Literal["plugin", "workspace"]] = []
-    files: list[Literal["config", "knowledge"]] = []
-
-    model_config = ConfigDict(extra="forbid")
-```
-
-平台动作执行不属于当前 permissions。Platform action executor / EBA action 分支落地前，runner 只能返回 `action.requested` telemetry，Host 不执行平台动作。
-
-Runner 实际可用 LangBot 资源来自 Host 在 run 前冻结的授权快照：
-
-```text
-effective_access = manifest.permissions ∩ binding.resource_policy ∩ current scope/config
-```
-
-In practice:
-
-1. `AgentResourceBuilder` first intersects manifest permissions with the binding resource policy / runner config to produce `ctx.resources`.
-2. `AgentContextBuilder` intersects manifest permissions with the binding state/storage policy to produce `ctx.context.available_apis`.
-3. `AgentRunSessionRegistry` freezes the run-scoped resources and available APIs.
-4. The Runtime handler / `AgentRunAPIProxy` validates every call against the active `run_id`, runner identity, caller plugin identity, resource id, scope, payload size, rate limit, and deadline.
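
The intersection in the `effective_access` formula can be sketched for one permission category. `effective_operations` and its three set arguments are hypothetical; the real builders work on typed models, not bare sets of operation names.

```python
def effective_operations(
    manifest_ops: set[str],
    policy_ops: set[str],
    scope_ops: set[str],
) -> frozenset[str]:
    """Return only operations allowed by the manifest, the binding policy, and the current scope."""
    # frozenset mirrors the requirement that the result is frozen for the whole run.
    return frozenset(manifest_ops & policy_ops & scope_ops)


# Example for the `tools` category: the manifest asks for both operations,
# but the binding policy only allows "call".
tools_access = effective_operations({"detail", "call"}, {"call"}, {"detail", "call"})
```

An operation missing from any one of the three inputs never reaches the runner, so widening the manifest alone cannot grant extra access.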
-
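-Non-commitment: manifest permissions **only constrain access to resources held by LangBot**. They make no promise to restrict an external harness's native shell, file system, CLI, MCP, network, or local-machine privileges; those are constrained separately by the operator/runtime/sandbox, see HOST_SDK §4.8 and SECURITY_HARDENING.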
-
-默认原则：
-
- Host 不得默认 inline 全量历史。
- Host 只 inline 当前 event / input 和 context handles。
- Runner 拥有 working context assembly。
- Runner 可在授权后通过 Host history / event / state API 拉取更多上下文，并通过授权 sandbox/workspace 工具访问当前运行文件。
- 历史窗口策略不属于 Protocol v1 字段，也不属于 Host 通用语义。
-
-context 边界的设计理由见 [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md)。
-
-## 5. Run 协议
-
-### 5.1 RUN_AGENT
-
-Host 调用 Runtime：
-
-```python
-class AgentRunRequest(BaseModel):
-    runner_id: str
-    runner_name: str
-    context: AgentRunContext
-```
-
-Runtime 返回 `AgentRunResult` 异步流。底层 transport 可继续用 `plugin_author` / `plugin_name` / `runner_name` 定位组件，但协议语义以 `runner_id` 和 `context` 为准。
-
-### 5.2 AgentRunContext
-
-这是 SDK 看到的**唯一权威 context 定义**。
-
-```python
-class AgentRunContext(BaseModel):
-    run_id: str
-    trigger: AgentTrigger
-    event: AgentEventContext
-    conversation: ConversationContext | None = None
-    actor: ActorContext | None = None
-    subject: SubjectContext | None = None
-    input: AgentInput
-    delivery: DeliveryContext
-    resources: AgentResources
-    context: ContextAccess
-    state: AgentRunState
-    runtime: AgentRuntimeContext
-    config: dict[str, Any] = {}
-    adapter: AdapterContext | None = None
-    metadata: dict[str, Any] = {}
-```
-
-核心约束：
-
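-- `event` is required; Protocol v1 is event-first.
-- `input` is the primary input of the current event; it is not the message history.
-- `bootstrap` / `messages` **are not protocol fields**; the Host does not inline a history window.
-- `adapter` only carries non-core metadata from the entry adapter; runners should not depend on it for long-term capabilities.
-- `config` is the Agent/runner config, not plugin instance state.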
-
-### 5.3 AgentTrigger
-
-```python
-class AgentTrigger(BaseModel):
-    type: str
-    source: Literal["platform", "webui", "api", "scheduler", "system", "host_adapter"]
-    timestamp: int | None = None
-```
-
-`trigger.type` 应与 `event.event_type` 一致或更粗粒度。例如入口适配器触发消息时：
-
-```json
-{ "type": "message.received", "source": "host_adapter" }
-```
-
-### 5.4 AgentEventContext
-
-```python
-class AgentEventContext(BaseModel):
-    event_id: str
-    event_type: str
-    event_time: int | None = None
-    source: str
-    source_event_type: str | None = None
-    raw_ref: RawEventRef | None = None
-    data: dict[str, Any] = {}
-```
-
- `event_type` 使用 LangBot 稳定协议名，例如 `message.received`。稳定事件名清单见 [EVENT_BASED_AGENT.md](./EVENT_BASED_AGENT.md)。
- 平台原始事件名放入 `source_event_type`。
- 大型原始 payload 必须放入 `raw_ref` 或 staged file，不应直接塞入 `data`。
-
-### 5.5 Conversation / Actor / Subject
-
-```python
-class ConversationContext(BaseModel):
-    conversation_id: str | None = None
-    thread_id: str | None = None
-    launcher_type: str | None = None
-    launcher_id: str | None = None
-    sender_id: str | None = None
-    bot_id: str | None = None
-    workspace_id: str | None = None
-    session_id: str | None = None
-
-class ActorContext(BaseModel):
-    actor_type: str
-    actor_id: str | None = None
-    actor_name: str | None = None
-    metadata: dict[str, Any] = {}
-
-class SubjectContext(BaseModel):
-    subject_type: str
-    subject_id: str | None = None
-    data: dict[str, Any] = {}
-```
-
-示例：
-
- 消息事件：actor 是发消息的人，subject 是当前消息。
- 入群事件：actor 是新成员或邀请人，subject 是群/成员关系。
- 定时事件：actor 可以是 system，subject 是 schedule。
-
-### 5.6 AgentInput
-
-```python
-class AgentInput(BaseModel):
-    text: str | None = None
-    contents: list[ContentElement] = []
-    attachments: list[InputAttachment] = []
-```
-
- 文本、多模态、附件都属于当前 event input。
- 大文件、图片、音频、工具大结果应进入授权 sandbox/workspace，input attachment 只携带轻量 metadata/path/url/content。
- 平台原始消息链不属于 SDK `AgentInput`；需要诊断时放在 Host 内部 envelope 或 `ctx.adapter.extra` 的一次性兼容字段中，不作为长期 runner 合同。
-
-### 5.7 DeliveryContext
-
-```python
-class DeliveryContext(BaseModel):
-    surface: str
-    reply_target: dict[str, Any] | None = None
-    supports_streaming: bool = False
-    supports_edit: bool = False
-    supports_reaction: bool = False
-    max_message_size: int | None = None
-    platform_capabilities: dict[str, Any] = {}
-```
-
-Runner 可参考 delivery 能力决定返回 `message.delta`、`message.completed` 或 `action.requested`。
-
-### 5.8 ContextAccess
-
-```python
-class ContextAccess(BaseModel):
-    conversation_id: str | None = None
-    thread_id: str | None = None
-    latest_cursor: str | None = None
-    event_seq: int | None = None
-    transcript_seq: int | None = None
-    has_history_before: bool = False
-    inline_policy: InlineContextPolicy
-    available_apis: ContextAPICapabilities
-
-class InlineContextPolicy(BaseModel):
-    mode: Literal["none", "current_event", "recent_tail", "summary_tail"]
-    delivered_count: int = 0
-    source_total_count: int | None = None
-    messages_complete: bool = False
-    reason: str | None = None
-
-class ContextAPICapabilities(BaseModel):
-    prompt_get: bool = False
-    history_page: bool = False
-    history_search: bool = False
-    event_get: bool = False
-    event_page: bool = False
-    state: bool = False
-    storage: bool = False
-    steering_pull: bool = False
-```
-
-`ContextAccess` 告诉 runner：Host inline 了什么、没 inline 什么、需要更多上下文时走哪些 API。它是 runner 按需读取上下文的入口说明，不是 Host 的业务上下文编排策略。
-
-### 5.9 AgentRuntimeContext
-
-```python
-class AgentRuntimeContext(BaseModel):
-    langbot_version: str | None = None
-    trace_id: str | None = None
-    deadline_at: float | None = None
-    metadata: dict[str, Any] = {}
-```
-
-### 5.10 AgentRunState
-
-```python
-class AgentRunState(BaseModel):
-    conversation: dict[str, Any] = {}
-    actor: dict[str, Any] = {}
-    subject: dict[str, Any] = {}
-    runner: dict[str, Any] = {}
-```
-
-State 是可选 host-owned snapshot。Runner 也可以完全自管状态。
-
-## 6. Resources
-
-```python
-class ToolResource(BaseModel):
-    tool_name: str
-    tool_type: str | None = None
-    description: str | None = None
-    parameters: dict[str, Any] | None = None  # 完整 JSON schema，由 Host 一次塞齐
-    operations: list[Literal["detail", "call"]] = []
-
-class SkillResource(BaseModel):
-    skill_name: str
-    display_name: str | None = None
-    description: str | None = None
-
-class AgentResources(BaseModel):
-    models: list[ModelResource] = []
-    tools: list[ToolResource] = []
-    knowledge_bases: list[KnowledgeBaseResource] = []
-    skills: list[SkillResource] = []
-    storage: StorageResource = StorageResource()
-    platform_capabilities: dict[str, Any] = {}
-```
-
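-`tools` carries the full schema (`parameters`) of every authorized tool. The Host fills it in once when constructing `ctx.resources`, so the runner no longer needs to call `get_tool_detail` per tool, which saves N round trips.
-
-`skills` holds the pipeline-visible skill facts for this run (`skill_name`, `display_name`, `description`). **Skills are consumed through the unified tool form and are not a separate resource category**: discovery goes through the `list_skills` tool (or a skills category added to `langbot_list_assets`), activation goes through `activate`, and operations go through native exec/read/write. The Host does **not** inject the skill index into the system prompt and does not perform progressive-disclosure injection; the LLM queries the skill list actively by calling the discovery tool. The Host **may optionally** provide a pre-rendered `suggested_skill_prompt` in ctx (a first-turn latency optimization that the runner may ignore or override), but it is not a precondition for access. The `skills` field itself only serves as the data source for the discovery tool and as the input for that optional pre-rendering.
-
-The resource list is the authorization result for this run. Access to History / Event / State / Storage is controlled through `ctx.context.available_apis` and Host-side run session validation; it is not exposed as an enumerable resource list. A runner can reach these capabilities only through `AgentRunAPIProxy`. Files and large tool results for the current event go to the authorized sandbox/workspace first, and the runner reads them on demand through read/write/exec tools.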
-
-## 7. Result Stream
-
-### 7.1 AgentRunResult envelope
-
-```python
-JSONValue = str | int | float | bool | None | list["JSONValue"] | dict[str, "JSONValue"]
-
-AgentRunResultType = Literal[
-    "message.delta",
-    "message.completed",
-    "tool.call.started",
-    "tool.call.completed",
-    "state.updated",
-    "action.requested",
-    "run.completed",
-    "run.failed",
-]
-
-class AgentRunResult(BaseModel):
-    run_id: str
-    type: AgentRunResultType | str
-    data: dict[str, Any] = {}
-    usage: LLMTokenUsage | None = None
-    sequence: int | None = None
-    timestamp: int | None = None
-```
-
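-The SDK currently implements a single envelope: a `type` enum plus a `data` dict. Payloads are built from SDK typed models and then dumped, but the wire format is not changed to a discriminated union; this way, when old and new versions are skewed, the Host can still ignore unknown `type` values per §3.
-
-`usage` is the token usage a runner may optionally report. It reuses the SDK's `LLMTokenUsage`: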
-
-```python
-class LLMTokenUsage(BaseModel):
-    prompt_tokens: int | None = None
-    completion_tokens: int | None = None
-    total_tokens: int | None = None
-    # provider-specific detail/cached/reasoning counters are preserved as extra fields
-```
-
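-Constraints:
-
-- When the runtime can observe provider/runtime usage, it SHOULD report the final aggregated token usage of this run in the terminal `run.completed.usage`.
-- `run.failed.usage` MAY report the partial usage produced before the failure.
-- A runner that cannot observe usage may legitimately omit the field; a missing value means unknown, and the Host must not treat it as 0.
-- External protocols such as ACP do not guarantee uniform usage; an ACP runner can fill this field only when the specific provider/native event supplies usage.
-- Cost is not an authoritative field of the runner result. The Host should later compute billing cost from usage, model identity, time, and its own price table; if the provider's raw cost needs to be kept, it can go into an extra field of `usage` as non-authoritative telemetry.
-
-Tiered validation at the Host boundary:
-
-- `message.delta`, `message.completed`, `state.updated`, `action.requested`, `run.completed`, and `run.failed` are strict payloads that affect delivery or Host side effects; when validation fails, the result is dropped and a warning is logged.
-- `tool.call.started` and `tool.call.completed` currently serve only as telemetry, and their payloads are validated leniently.
-- An unknown `type` is ignored and a warning is logged.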
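
The tiering above can be sketched as a small triage function. This is only an illustration, not the Host's actual validator: `triage_result`, the `"apply"` / `"telemetry"` / `"drop"` labels, and the required-key sets (derived loosely from the §7.2 payload table) are assumptions made for the example.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger("host.result_validation")

# Hypothetical required keys per strict result type (assumed from the 7.2 table).
STRICT_REQUIRED_KEYS: dict[str, set[str]] = {
    "message.delta": {"chunk"},
    "message.completed": {"message"},
    "state.updated": {"scope", "key", "value"},
    "action.requested": {"action"},
    "run.completed": {"finish_reason"},
    "run.failed": {"code", "error", "retryable"},
}
TELEMETRY_TYPES = {"tool.call.started", "tool.call.completed"}


def triage_result(result: dict[str, Any]) -> str:
    """Classify one result envelope as 'apply', 'telemetry', or 'drop'."""
    rtype = result.get("type")
    if not isinstance(rtype, str):
        logger.warning("dropping result with non-string type: %r", rtype)
        return "drop"
    if rtype in TELEMETRY_TYPES:
        # Telemetry payloads are accepted leniently, whatever their shape.
        return "telemetry"
    required = STRICT_REQUIRED_KEYS.get(rtype)
    if required is None:
        logger.warning("ignoring unknown result type %r", rtype)
        return "drop"
    data = result.get("data")
    missing = required - set(data) if isinstance(data, dict) else required
    if missing:
        logger.warning("dropping %s: missing keys %s", rtype, sorted(missing))
        return "drop"
    return "apply"
```

A real validator would check payload types and sizes as well; the sketch only shows that strict types fail closed while telemetry types pass through.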
-
-### 7.2 Stable result payloads
-
-| type | `data` payload |
-| --- | --- |
-| `message.delta` | `{ "chunk": MessageChunk }` |
-| `message.completed` | `{ "message": Message }` |
-| `tool.call.started` | `{ "tool_call_id": str, "tool_name": str, "parameters": dict }` |
-| `tool.call.completed` | `{ "tool_call_id": str, "tool_name": str, "result": dict \| None, "error": str \| None }` |
-| `state.updated` | `{ "scope": "conversation" \| "actor" \| "subject" \| "runner", "key": str, "value": JSONValue }` |
-| `action.requested` | `{ "action": str, "target": dict \| None, "payload": dict \| None }` |
-| `run.completed` | `{ "finish_reason": str, "message"?: Message }` |
-| `run.failed` | `{ "code": str, "error": str, "retryable": bool }` |
-
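-Large files, tool output, and temporary artifacts generated by a runner are not sent back through result events; they should be written to the authorized sandbox/workspace of the current run and then referenced from message text, metadata, or an attachment reference.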
-
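-### 7.3 Stable result types
-
-| type | Description | Currently consumed |
-| --- | --- | --- |
-| `message.delta` | Streaming message fragment. | ✅ |
-| `message.completed` | Complete message. | ✅ |
-| `tool.call.started` | Observable event for the start of a tool call. | telemetry |
-| `tool.call.completed` | Observable event for the completion of a tool call. | telemetry |
-| `state.updated` | The runner requests an update to host-owned state. | ✅ |
-| `action.requested` | The runner requests that the Host perform a platform action. | **reserved / telemetry only, not executed** |
-| `run.completed` | The run ended normally. | ✅ |
-| `run.failed` | The run failed. | ✅ |
-
-`action.requested` is a protocol surface reserved for EBA and the platform API: on this branch the Host only records telemetry when it receives one and does **not execute** it, so runner authors should not rely on its side effects on the current Host foundation. The real executor is wired in by the external EBA / platform action branch; see EVENT_BASED_AGENT §6 for the execution model.
-
-The Host must validate the scope, key, value size, and JSON serializability of `state.updated`. On this branch `action.requested` is still recorded as telemetry only.
-
-Besides the writes a runner makes through `state.updated`, the Host itself may write some host-owned state directly. For example, when the `activate` tool executes on the Host side, it writes the activated skill straight into the `host.activated_skills` snapshot in the conversation scope. When a direct Host write and a runner `state.updated` write target the same key, they are merged **last-write-wins**, so a runner can overwrite a snapshot the Host wrote.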
-
-### 7.4 Stream delivery semantics
-
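-- The Host consumes results in Runtime stream order. v1 currently defines no cross-connection replay and does not promise at-least-once delivery; from the Host's point of view, a received result is applied at most once.
-- `sequence` is the result index within a single `run_id`. Naturally ordered online streams such as in-process / stdio may omit it; any transport that buffers, replays, crosses process queues, or runs runtime-managed tasks must provide a strictly increasing `sequence` that starts at 1.
-- When the Host sees a result that carries a `sequence`, it should detect duplicates by `(run_id, sequence)` and log a warning on gaps or out-of-order arrival; unless the transport explicitly declares replay semantics, the Host should not wait for missing sequence numbers in order to reorder user-visible output.
-- `run.failed.data.retryable` only indicates that the whole run could in principle be retried by an upper layer; Protocol v1 does not retry runs automatically, nor does it retry proxy actions automatically.
-- History / Event / Transcript cursors are opaque tokens. A runner must not parse a cursor and must not assume cursors are comparable across APIs, conversations, threads, or retention windows; even if the current implementation returns numeric strings, that is only an implementation detail.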
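
The `(run_id, sequence)` rule can be illustrated with a small in-memory tracker. This is a sketch under assumptions: `SequenceTracker` and `should_apply` are hypothetical names, and a production Host would bound or persist the bookkeeping.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("host.result_stream")


class SequenceTracker:
    """Applies each sequenced result at most once and warns on gaps or reordering."""

    def __init__(self) -> None:
        self._seen: dict[str, set[int]] = {}
        self._highest: dict[str, int] = {}

    def should_apply(self, run_id: str, sequence: int | None) -> bool:
        if sequence is None:
            # Naturally ordered online streams may omit the sequence entirely.
            return True
        seen = self._seen.setdefault(run_id, set())
        if sequence in seen:
            return False  # duplicate: already applied once
        highest = self._highest.get(run_id, 0)
        if sequence != highest + 1:
            # Gap or out-of-order arrival: warn, but do not wait to reorder.
            logger.warning("run %s: got sequence %d after %d", run_id, sequence, highest)
        seen.add(sequence)
        self._highest[run_id] = max(highest, sequence)
        return True
```

The tracker never buffers a result to wait for a missing number, which matches the rule that the Host should not reorder user-visible output on its own.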
-
-### 7.5 Examples
-
-```json
-{ "type": "message.delta",     "data": { "chunk": { "role": "assistant", "content": "hel" } } }
-{ "type": "message.completed", "data": { "message": { "role": "assistant", "content": "hello" } } }
-{ "type": "state.updated",     "data": { "scope": "conversation", "key": "external.session_id", "value": "abc" } }
-{ "type": "action.requested",  "data": { "action": "message.edit", "target": {"message_id": "..."}, "payload": {"text": "..."} } }
-```
-
-## 8. AgentRunAPIProxy
-
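-Every proxy action must carry a `run_id`. The Host must verify that an active run session exists, that the caller plugin identity matches, that the resource is authorized in this run's `ctx.resources`, that the scope is not exceeded, and that payload size / rate limit / deadline are valid.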
-
-```python
-# Model
-await api.invoke_llm(llm_model_uuid, messages, funcs=None, extra_args=None)
-await api.invoke_llm_with_usage(llm_model_uuid, messages, funcs=None, extra_args=None)
-async for chunk in api.invoke_llm_stream(llm_model_uuid, messages, funcs=None, extra_args=None):
-    ...
-async for event in api.invoke_llm_stream_events(llm_model_uuid, messages, funcs=None, extra_args=None):
-    ...
-await api.invoke_rerank(rerank_model_id, query, documents, top_k=None)
-
-# Tool
-await api.get_tool_detail(tool_name)
-await api.call_tool(tool_name, parameters)
-
-# Knowledge
-await api.retrieve_knowledge(kb_id, query_text, top_k=5, filters=None)
-
-# History (returns a Transcript projection, not the raw platform payload)
-await api.get_prompt()
-await api.history_page(conversation_id=None, before_cursor=None, after_cursor=None,
-                       limit=50, direction="backward", include_attachments=False)
-await api.history_search(query, filters=None, top_k=10)
-
-# Event (returns a stable event envelope or a restricted raw ref; large payloads are not returned by default)
-await api.event_get(event_id)
-await api.event_page(conversation_id=None, event_types=None, before_cursor=None, limit=50)
-await api.steering_pull(mode="all", limit=None)
-
-# State / Storage
-await api.state_get(scope, key);   await api.state_set(scope, key, value);   await api.state_delete(scope, key)
-await api.state_list(scope, prefix=None, limit=100)
-await api.get_plugin_storage(key); await api.set_plugin_storage(key, value); await api.delete_plugin_storage(key)
-await api.get_plugin_storage_keys()
-await api.get_workspace_storage(key); await api.set_workspace_storage(key, value); await api.delete_workspace_storage(key)
-await api.get_workspace_storage_keys()
-
-# Host info
-await api.get_langbot_version()
-```
-
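-The first parameter of `invoke_llm()` / `invoke_llm_stream()` is named
-`llm_model_uuid` in the SDK, and the wire payload field is also
-`llm_model_uuid`. To the runner the value remains an opaque identifier whose
-internal format should not be parsed.
-
-`invoke_llm()` and `invoke_llm_stream()` stay backward compatible: the former
-returns a `Message`, and the latter yields only `MessageChunk`. A runner that
-needs the provider's real token accounting should use
-`invoke_llm_with_usage()` or `invoke_llm_stream_events()`. The Host response
-may carry an optional `usage` field in addition to the existing
-`{message: ...}` / `{chunk: ...}`; in streaming, a usage-only event may be
-appended after all chunks. `usage` keeps at least the OpenAI-compatible
-`prompt_tokens`, `completion_tokens`, and `total_tokens`; if the provider
-returns `prompt_tokens_details` / `completion_tokens_details` or cache token
-counters, the Host / SDK should not drop those fields. A provider without
-usage must still return a successful response, and the SDK sets usage to `None`.
-
-`get_prompt()` returns the Host's effective prompt messages for the current
-query-backed run, as the JSON form of `list[Message]`. The capability is
-available only when `ctx.context.available_apis.prompt_get` is true; when there
-is no query cache, the prompt has expired, or the run is not a query entry run,
-the Host may return an error or an empty list. When it is unavailable, the
-runner should fall back to its own config/prompt strategy.
-
-`steering_pull(mode="all")` is the recommended default: the Host returns all pending steering input in claim order and empties the corresponding queue. `mode="one-at-a-time"` is only for runners that throttle deliberately and returns one item per call. The Host does not merge multiple user messages; the runner decides the model-side format at turn boundaries.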
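
The two pull modes can be pictured with a toy queue. This is a sketch only: `SteeringQueue`, `claim`, and the treatment of `limit` as an upper bound in `"all"` mode are assumptions for illustration, not the Host's implementation.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class SteeringQueue:
    """Pending steering inputs for one active run, kept in claim order."""

    def __init__(self) -> None:
        self._pending: deque[Any] = deque()

    def claim(self, item: Any) -> None:
        # Called when an incoming message is absorbed into the active run.
        self._pending.append(item)

    def pull(self, mode: str = "all", limit: int | None = None) -> list[Any]:
        if mode == "one-at-a-time":
            count = 1
        elif mode == "all":
            # Assumption: `limit`, when given, caps how many items are returned.
            count = len(self._pending) if limit is None else limit
        else:
            raise ValueError(f"unsupported steering pull mode: {mode!r}")
        count = max(0, min(count, len(self._pending)))
        # Items are returned individually; nothing is merged on the Host side.
        return [self._pending.popleft() for _ in range(count)]
```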
-
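-Steering audit uses the EventLog rather than an extension of the Transcript schema: an original `message.received` event that is absorbed by an active run keeps its original event type and is tagged in `metadata.steering` with `status="queued"`, `trigger_behavior="absorbed_into_active_run"`, `claimed_by_run_id`, `claimed_runner_id`, and `claimed_at`. After the runner pulls successfully, the Host appends a `steering.injected` EventLog record with `metadata.steering.status="injected"` that references `source_event_id`. If steering input that was claimed but never pulled remains when the run ends, the Host appends a `steering.dropped` EventLog record with `metadata.steering.status="dropped"` that references `source_event_id`; this does not delete the fact of the user message, it is only the terminal dispatch state. The Transcript continues to represent conversation facts only and carries no dispatch behavior markers.
-
-Suggested boundary between `state` and `storage`: `state` holds small JSON values (conversation / actor / subject / runner), while `storage` holds blobs or larger data (plugin-private data, workspace data, checkpoints).
-
-Recommended state convention for the compaction checkpoint:
-
-- scope: `conversation`
-- key: `runner.compaction.checkpoint`
-- value: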
-
-```json
-{
-  "schema_version": "langbot.local_agent.compaction_checkpoint.v1",
-  "summary": "<conversation_summary>...</conversation_summary>",
-  "covers_until": "transcript-cursor-or-seq",
-  "tokens_before": 12345,
-  "created_at": 1710000000,
-  "conversation_id": "conv-..."
-}
-```
-
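-`covers_until` is the transcript cursor anchor up to which the summary covers. After reading the checkpoint, a runner should pull only the transcript after that cursor; if the checkpoint is missing, the schema does not match, the conversation does not match, or the cursor is unusable, it should fall back to the tail-history pull used when there is no checkpoint.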
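
A runner-side check of these fallback conditions might look like the sketch below. `resume_cursor` is a hypothetical helper; it treats the cursor as an opaque non-empty string and cannot tell whether the Host would still accept it, so a rejected cursor would need the same fallback at call time.

```python
from __future__ import annotations

from typing import Any

EXPECTED_SCHEMA = "langbot.local_agent.compaction_checkpoint.v1"


def resume_cursor(checkpoint: Any, conversation_id: str) -> str | None:
    """Return the cursor to resume after, or None to fall back to a tail pull."""
    if not isinstance(checkpoint, dict):
        return None  # checkpoint missing or malformed
    if checkpoint.get("schema_version") != EXPECTED_SCHEMA:
        return None  # schema mismatch
    if checkpoint.get("conversation_id") != conversation_id:
        return None  # checkpoint belongs to another conversation
    cursor = checkpoint.get("covers_until")
    if not isinstance(cursor, str) or not cursor:
        return None  # no usable cursor anchor
    return cursor  # opaque token: pass it back to the Host unparsed
```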
-
-The data structures returned by the proxy are also part of this protocol:
-
-```python
-class TranscriptItem(BaseModel):
-    transcript_id: str
-    event_id: str
-    conversation_id: str | None = None
-    thread_id: str | None = None
-    role: str
-    item_type: str = "message"
-    content: str | None = None
-    content_json: dict[str, Any] | None = None
-    attachment_refs: list[dict[str, Any]] = []
-    seq: int | None = None
-    cursor: str | None = None
-    created_at: int | None = None
-    metadata: dict[str, Any] = {}
-
-class HistoryPage(BaseModel):
-    items: list[TranscriptItem] = []
-    next_cursor: str | None = None
-    prev_cursor: str | None = None
-    has_more: bool = False
-    total_count: int | None = None
-
-class HistorySearchResult(BaseModel):
-    items: list[TranscriptItem] = []
-    total_count: int | None = None
-    query: str
-
-class AgentEventRecord(BaseModel):
-    event_id: str
-    event_type: str
-    event_time: int | None = None
-    source: str
-    bot_id: str | None = None
-    workspace_id: str | None = None
-    conversation_id: str | None = None
-    thread_id: str | None = None
-    actor_type: str | None = None
-    actor_id: str | None = None
-    actor_name: str | None = None
-    subject_type: str | None = None
-    subject_id: str | None = None
-    input_summary: str | None = None
-    input_ref: str | None = None
-    raw_ref: str | None = None
-    seq: int | None = None
-    cursor: str | None = None
-    created_at: int | None = None
-    metadata: dict[str, Any] = {}
-
-class EventPage(BaseModel):
-    items: list[AgentEventRecord] = []
-    next_cursor: str | None = None
-    prev_cursor: str | None = None
-    has_more: bool = False
-    total_count: int | None = None
-
-class SteeringInputItem(BaseModel):
-    claimed_run_id: str
-    runner_id: str
-    claimed_at: int | None = None
-    event: AgentEventContext
-    input: AgentInput
-    conversation: ConversationContext | None = None
-    actor: ActorContext | None = None
-    subject: SubjectContext | None = None
-    metadata: dict[str, Any] = {}
-
-class SteeringPullResult(BaseModel):
-    items: list[SteeringInputItem] = []
-```
-
-## 9. Error Model
-
-```python
-class AgentAPIError(BaseModel):
-    code: str
-    message: str
-    retryable: bool = False
-    details: dict[str, Any] = {}
-```
-
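-| code | Description |
-| --- | --- |
-| `unauthorized` | Unauthorized access to a resource or scope. |
-| `not_found` | The resource does not exist or is not visible to the current runner. |
-| `deadline_exceeded` | The run deadline was exceeded. |
-| `payload_too_large` | The request or response is too large. |
-| `rate_limited` | Rate limited by the Host. |
-| `invalid_argument` | Invalid argument. |
-| `runtime_error` | Error in the Host or a downstream capability. |
-
-When the Host returns a structured error or a malformed response, the SDK's runner-facing proxy raises `AgentAPIException`, whose `error` field is an `AgentAPIError`. When a legacy transport returns only a string error, the SDK wraps it with `host.action_error`, so that runners no longer depend on bare `KeyError` or string matching.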
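
The wrapping behaviour can be sketched with standard-library stand-ins. The dataclass below only mirrors the `AgentAPIError` fields for illustration (the SDK model is a pydantic `BaseModel`), `to_exception` is a hypothetical helper, and reusing `host.action_error` for malformed dict responses is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentAPIError:  # stand-in that mirrors the protocol model's fields
    code: str
    message: str
    retryable: bool = False
    details: dict[str, Any] = field(default_factory=dict)


class AgentAPIException(Exception):
    def __init__(self, error: AgentAPIError) -> None:
        super().__init__(f"{error.code}: {error.message}")
        self.error = error


def to_exception(raw: Any) -> AgentAPIException:
    """Turn a Host error response into a structured exception."""
    if (
        isinstance(raw, dict)
        and isinstance(raw.get("code"), str)
        and isinstance(raw.get("message"), str)
    ):
        details = raw.get("details")
        return AgentAPIException(
            AgentAPIError(
                code=raw["code"],
                message=raw["message"],
                retryable=bool(raw.get("retryable", False)),
                details=details if isinstance(details, dict) else {},
            )
        )
    # Legacy string error (or, by assumption, any malformed response).
    return AgentAPIException(AgentAPIError(code="host.action_error", message=str(raw)))
```

Runner code can then branch on `exc.error.code` and `exc.error.retryable` instead of matching error strings.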
-
-A runner failure is reported with `run.failed`:
-
-```json
-{ "type": "run.failed", "data": { "code": "runner.error", "error": "failed to call external agent", "retryable": false } }
-```
-
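-## 10. Timeout and Cancellation
-
-- The Host sends the overall deadline in `ctx.runtime.deadline_at`; the SDK proxy must use that deadline to bound the timeout of each individual action.
-- The Host may cancel an active run; the Runtime should make a best effort to interrupt the runner.
-- A Protocol v1 run is bound to the current Host process and the current runtime channel, with no guarantee of recovery across Host restarts. When the Host restarts, the runtime channel disconnects, or the run session is lost, the Runtime / external harness connector must fail fast and make a best effort to cancel any runner still executing; it must not keep calling the Host API with the old `run_id`.
-- A runner that supports interruption should return or trigger `run.failed` with code `cancelled`.
-- The Host must unregister the active run session.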
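
Bounding a single action by the run deadline could be sketched as follows. The assumptions here are that `deadline_at` is a Unix timestamp in seconds and that the SDK has some per-action default; `action_timeout` and the 30-second default are hypothetical.

```python
from __future__ import annotations

import time


def action_timeout(
    deadline_at: float | None,
    default_timeout: float = 30.0,
    now: float | None = None,
) -> float:
    """Seconds one proxy action may take, never past the run deadline."""
    if deadline_at is None:
        return default_timeout  # no run deadline was issued
    current = time.time() if now is None else now
    remaining = deadline_at - current
    if remaining <= 0:
        # The run deadline has already passed: fail instead of calling the Host.
        raise TimeoutError("deadline_exceeded")
    return min(default_timeout, remaining)
```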
-
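-## 11. Security and Guardrails (Protocol Layer)
-
-In Protocol v1 the security boundary sits in the Host:
-
-- A runner cannot directly access an unauthorized model/tool/kb/history/storage/sandbox.
-- Local SDK validation only improves the developer experience and cannot replace Host validation.
-- Every resource id is opaque to the runner.
-- By default only the history of the current conversation / thread is accessible; cross-conversation or workspace-level access requires additional authorization.
-- Large payloads should not be stuffed into result events; files and large tool results of the current run should go into the authorized sandbox/workspace and be accessed on demand through read/write/exec tools.
-- The Host must record run_id, runner_id, action, resource, scope, and result.
-
-The Host is not responsible for business orchestration: it does not concatenate full history, does not do prompt assembly for the runner, and has no built-in agent memory / tool loop / context compaction strategy. These are implemented by official or third-party AgentRunner plugins.
-
-The boundary of external harness runners is defined in one place, HOST_SDK §4.8. In short: a harness's native permission mode, allowed/disallowed tools, and shell/MCP permissions are only additional execution constraints and cannot replace the Host's authorization of LangBot resources.
-
-> Release-grade path isolation, MCP allowlists, secret redaction, quotas, workspace cleanup, and the like are **not part of** the v1 protocol loop; they are release gates before being enabled by default in production. See [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).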
-
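-## 12. Pipeline Adapter Boundary
-
-Pipeline is the current entry adapter, not the center of the protocol. In the
-target product model, Agent replaces Pipeline as the carrier of runner config,
-resource policy, and delivery policy; the current Query entry adapter is only
-a migration bridge. It is responsible for:
-
-- Building `AgentEventContext` and a temporary `AgentBinding` from `Query` (see HOST_SDK §4.2).
-- Building `ctx.config` from the current Agent/runner config.
-- Putting Query-only fields into `ctx.adapter`, for example filtered params into `ctx.adapter.extra["params"]`.
-
-Constraints:
-
-- The adapter does **not** define the history window, prompt assembly, or agentic context strategy.
-- `ctx.adapter.extra` may carry only one-off, JSON-safe, entry-related non-core metadata such as `params`; it must not carry `prompt`, a history window, RAG results, tool schemas, or authorized resources.
-- A statically bound prompt belongs in `ctx.config.prompt`. Dynamic effective instructions produced after preprocessing / hooks are not pushed through `ctx.adapter.extra`; if this capability needs to be kept later, it should be exposed through a Host prompt/instruction pull API (placeholder in HOST_SDK §4.8).
-- New runners should not depend on `adapter` in the long term and should rely only on the event-first context and the Host API.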
-
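-## 13. Confirmed Constraints
-
-- The v1 / EBA main line is `one event -> one AgentBinding -> one run_id -> one runner`.
-- A bot / IM channel is bound to only one Agent responsible for agentic processing at any given time; one Agent can be reused by multiple bots / channels.
-- If multiple matching AgentBindings appear at the configuration layer, the BindingResolver must pick one by explicit rules or reject the configuration; it should not fan out by default.
-- Capabilities such as observer agents, multi-runner fan-out, parallel arbitration, and result merging require separately designed delivery, state, platform action, and audit semantics and are not part of the current v1 contract.
-- `AgentRunnerDescriptor.source` may only be `plugin`; a Host built-in adapter cannot act as a runner source to bypass the plugin/runtime/proxy permission chain.
-- `ctx.resources` and proxy action validation must come from the same run authorization snapshot; the runtime handler should not re-run resource trimming.
-- v1 does not require an Agent, an AgentRunner plugin instance, or a runner id to be globally serialized. Multiple bots / channels can reuse the same Agent; concurrency isolation relies on `run_id`, the binding, the conversation / thread scope, and the Host authorization snapshot.
-- External harness runners are currently an MVP / dev path that proves the protocol can be integrated; this does not mean that a release-grade security boundary or Docker production readiness is complete.
-
-## 14. Open Questions
-
-- Whether `AgentBinding` needs to enter the SDK documentation as read-only diagnostic information or stay entirely Host-internal.
-- Whether the boundary between State and Storage needs stronger typing.
-- How to express the approval model for platform actions.
-- Whether Host-side scoped MCP / workspace projection needs to be lifted from runner config into a first-class resource projection API. (The skill item has converged: skills are fully tool-based and exposed as authorized tools, no longer a separate projection.)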
@@ -1,154 +0,0 @@
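-# Agent Runner Pluginization Documentation Entry Point
-
-This document is the routing page for the agent-runner pluginization work. The detailed designs are split into separate documents so that the LangBot Host architecture, the SDK protocol, context management, the EBA integration boundary, and the official runner migration are not mixed into a single README.
-
-## Background and Problem
-
-The old runner path revolved mainly around Pipeline / Query and the built-in implementations under `pkg/provider/runners`. When extending to external agent runtimes, it tended to tie runner selection, context trimming, resource authorization, and message delivery to the same chat pipeline. This branch converges LangBot into an Agent Host: the Host owns events, bindings, authorization, the sources of truth, and result delivery; an AgentRunner, as a plugin or an external harness, consumes the unified protocol and manages prompt / history / memory on its own.
-
-## Documentation Maintenance Principle (Single Source of Truth)
-
-- **Protocol data structures (schemas) are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md).** Other documents must not copy schemas and may only reference them, for example "see PROTOCOL_V1 §4.2".
-- Current implementation status, spec gaps, and runner acceptance status belong in [STATUS.md](./STATUS.md); the test execution entry point belongs in [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md), and the security release gates belong in [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).
-- Host-internal models (`AgentEventEnvelope`, `AgentBinding`, Descriptor, the Stores) are defined in [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) and are not part of the SDK protocol.
-- The remaining topic documents cover only "why / boundaries / how to use" and avoid repeating descriptions.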
-
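-## Goals of This Branch
-
-**Goal of this branch: infrastructure for externalizing / pluginizing AgentRunner**
-
-This branch builds only the foundational capabilities of LangBot as an Agent
-Host, laying the groundwork for later replacing Pipeline with `Agent` as the
-carrier of agent configuration:
-
-- A stable protocol contract between LangBot and the SDK (Protocol v1)
-- Host-side `AgentEventEnvelope` / `AgentBinding` models
-- The event-first `run(event, binding)` entry point
-- `QueryEntryAdapter`: Query → AgentEventEnvelope + AgentBinding
-- EventLog / Transcript / PersistentStateStore
-- History / Event / State pull APIs
-- Sandbox/workspace read/write/exec file capabilities for the current run's uploaded files, large tool results, and temporary artifacts
-- SDK runtime forwarding pull APIs plus the `caller_plugin_identity` verification path
-
-## Not Implemented in This Branch
-
-The following capabilities are owned by other branches; this branch keeps only the integration point. The full EBA event gateway and event routing are currently integrated and tested through the external EBA branch:
-
-- **EventGateway / EventRouter**: the full event gateway implementation, event routing, and event persistence management
-- **Event subscription / Event notification**: event subscription and push notification
-- **BindingResolver persistence UI**: the persistence UI for binding configuration and the event router integration (if owned by other modules)
-- **Scheduler / Background event source**: scheduled tasks and background event sources
-- **Full Agent Platform / daemon control plane**: Host-owned `AgentRun` / `AgentRunEvent`, run control primitives, and a minimal runtime heartbeat/claim lease have landed as the v2 foundation; business queues, the Platform UI, the daemon supervisor, the runtime wakeup channel, and distributed runtime management are still outside the Protocol v1 main line.
-
-In this document EventGateway / EventRouter is described as an **external EBA branch integration point**, provided and integration-tested by the external EBA branch. This branch defines only the host-side envelope/binding models and the `run(event, binding)` orchestrator entry point.
-
-For the extension boundary between this branch and the external EBA / Agent Platform / Runtime Control Plane, see [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md).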
-
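-## Target Product Model
-
-At the product layer, `Agent` should in the future be understood as the replacement for Pipeline: previously a bot was bound to a Pipeline, and the Pipeline carried agent/provider/RAG/tool configuration; later a bot or IM channel should be bound to one Agent, and the Agent carries the runner id, runner config, resource/state/delivery policy, and other agent configuration.
-
-The normative source for dispatch cardinality, Agent reuse, stateless plugin instances, the Pipeline adapter, and the fan-out boundary is [PROTOCOL_V1.md](./PROTOCOL_V1.md) §13; the README does not restate these constraints.
-
-## Current Entry Relationship
-
-**Pipeline is currently the entry adapter and is no longer the core of the agent runner design.**
-
-The main entry can still be triggered by Pipeline, but internally it has been converted to the event-first path: `run_from_query()` uses `QueryEntryAdapter` to convert a `Query` into an `AgentEventEnvelope` + `AgentBinding` and then delegates to the unified `run(event, binding, ...)`. The Pipeline path thereby gains the event-first host capabilities (writes to EventLog / Transcript / PersistentStateStore, with the History / Event / State pull APIs and sandbox/workspace file read/write available).
-
-For the next round of test paths, status definitions, and smoke records, see [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md).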
-
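-## Glossary
-
-| Term | Meaning |
-| --- | --- |
-| Protocol v1 | The runner-visible contract by which the Host calls an AgentRunner: discovery, `AgentRunContext`, result stream, Host pull API, and error model. |
-| Agent | The target product-layer configuration object that stores the runner id, runner config, and resource/state/delivery policy; it is not the same as a plugin instance. |
-| AgentConfig | The Host-internal configuration projection for the migration period, generated from the current Pipeline config or a future persistent Agent. |
-| AgentBinding / binding | The effective binding the Host resolves before an event run; it decides which runner is called and with which policy. |
-| envelope | The Host-internal event wrapper, i.e. `AgentEventEnvelope`; what the runner sees is the `ctx.event` projected from it. |
-| descriptor / manifest | The capability and configuration description used for runner discovery; the manifest comes from the plugin, and the descriptor is the registry view after Host validation. |
-| EBA | Event Based Agent, the integration direction that unifies messages, recalls, group joins, scheduled tasks, and so on into host events; the full gateway and routing are integration-tested in the external EBA branch. |
-| harness runner | An external runtime adapter such as ACP, Claude Code, or Codex that already has its own session / tool loop / MCP / compaction mechanism. |
-| projection | The process by which the Host trims internal sources of truth, authorized resources, or configuration into a view that a runner / harness can consume. |
-| Runtime Control Plane | The v2 Host capability layer. The Host-owned run/result ledger, run control primitives, and a minimal runtime heartbeat/claim lease have landed; full daemon worker management, task wakeup, and the Agent Platform product form are not part of the Protocol v1 main line. |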
-
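-## Design Documents
-
-| Document | Focus |
-| --- | --- |
-| [PROTOCOL_V1.md](./PROTOCOL_V1.md) | **🔒 The only schema source of truth**. The protocol contract between the LangBot Host and the SDK / Runtime / AgentRunner: version negotiation, discovery, run context, result stream, proxy actions, errors, and the adapter boundary. |
-| [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) | LangBot Host capabilities and layered architecture, Host-internal models (`AgentEventEnvelope` / `AgentBinding` / Descriptor / the Stores), runner discovery, binding, resource authorization, state, storage, lifecycle, and call chain. |
-| [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md) | The agent-owned context direction: what LangBot passes when an event arrives, how an agent pulls more history / state on demand, how it accesses sandbox/workspace files, and how KV-cache-friendly context management is supported. |
-| [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md) | The extension boundary matrix between AgentRunner externalization and the external EBA / Agent Platform / Runtime Control Plane, stating what is the foundation of this branch and what is wired in by external branches. |
-| [EVENT_BASED_AGENT.md](./EVENT_BASED_AGENT.md) | The EBA integration boundary: event model, event sources, trigger bindings, and how non-message events reuse AgentRunner dispatch; the full EventGateway / EventRouter is integration-tested by the external EBA branch. |
-| [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md) | Decisions on Agent Platform v2 / the runtime control plane: `AgentRun` / `AgentRunEvent` / run control have landed as Host sources of truth, and the minimal runtime heartbeat/claim lease has landed; a full runtime registry / daemon management remains an optional later stage. |
-| [OFFICIAL_RUNNER_PLUGINS.md](./OFFICIAL_RUNNER_PLUGINS.md) | Migration of the official runner plugins, including local-agent and external runners. It is a downstream rollout plan, not a precondition for the design of LangBot's foundational capabilities. |
-| [RUN_STEERING_AND_CHECKPOINT.md](./RUN_STEERING_AND_CHECKPOINT.md) | Design and rollout status record for in-run message injection (steering / follow-up) and persisted compaction summaries (compaction checkpoint); PROTOCOL_V1 remains authoritative for schemas. |
-| [STATUS.md](./STATUS.md) | Current implementation status, known gaps between spec and implementation, runner acceptance status, and high-value historical records. |
-| [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md) | The Agent Runner QA guide: keeps the highest-value test paths and guides agents through the next round of WebUI / runner smoke verification. |
-| [SECURITY_HARDENING.md](./SECURITY_HARDENING.md) | The follow-up release gates for release-grade security hardening: path isolation, permission boundaries, secrets, resource quotas, MCP / skill projection, and audit. |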
-
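-## Work Breakdown
-
-### 1. LangBot + SDK Infrastructure
-
-The goal is to turn LangBot from a built-in runner executor into an agent host:
-
-- A stable protocol contract between LangBot and the SDK
-- runner manifest / descriptor / registry
-- Agent / binding configuration resolution
-- run orchestration and lifecycle management
-- resource authorization and `run_id`-level permission checks
-- host-owned state / storage / event log / transcript capabilities
-- sandbox/workspace file staging and read/write/exec capabilities
-- The SDK's `AgentRunner`, `AgentRunContext`, `AgentRunResult`, and `AgentRunAPIProxy`
-
-For the protocol contract, see [PROTOCOL_V1.md](./PROTOCOL_V1.md).
-
-For details, see [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md).
-
-### 2. Agent-owned context
-
-LangBot should not become the final agentic context manager. It should provide the sources of truth, default context references, and on-demand read APIs; the agent, or the runtime behind it, is responsible for history trimming, summarization, recall, and KV cache strategy.
-
-The Host defines no generic history window field or policy; a runner pulls history on demand through the Host pull API and manages its own working context.
-
-For details, see [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md).
-
-### 3. Event Based Agent (External Branch)
-
-A message is only one kind of event. Events in the external EBA branch such as `message.received`, `message.recalled`, `group.member_joined`, and `friend.request_received` should all be able to trigger an AgentRunner through the unified event envelope.
-
-The cardinality and fan-out boundary of EBA dispatch remain governed by PROTOCOL_V1 §13; this document only lists the entry points this branch offers for reuse by the external EBA branch.
-
-**This branch does not implement the full EBA capability and provides only:**
-- The event-first envelope (`AgentEventEnvelope`)
-- The AgentBinding model
-- The `run(event, binding)` entry point
-- QueryEntryAdapter (currently the Query entry adapter source of AgentEventEnvelope / AgentBinding)
-
-For details, see [EVENT_BASED_AGENT.md](./EVENT_BASED_AGENT.md).
-
-### 4. Official Runner Plugins
-
-Migrating the official `local-agent` and the external runners is downstream work. They need to build on the host capabilities LangBot provides, but they should not in turn dictate the host protocol.
-
-`local-agent` can be moved out or rewritten. The acceptance focus is that it can fully consume LangBot's models, tools, knowledge bases, storage, events, history API, and result stream, not that it preserves the internal structure of the old built-in runner.
-
-For details, see [OFFICIAL_RUNNER_PLUGINS.md](./OFFICIAL_RUNNER_PLUGINS.md).
-
-### 5. Runtime Control Plane v2 (Foundation Partial)
-
-The current AgentRunner v1 main line still uses `event -> binding -> runner.run(ctx) -> result stream` as the runner-visible contract. The Host side has already added persistent `AgentRun` / `AgentRunEvent`, result persistence, and generic run control primitives such as cancel/finalize/query, and it provides permission-protected minimal primitives for runtime register/heartbeat/list, claim/renew/release, and reconcile.
-
-On top of these Host capabilities, an independent agent control-plane plugin can be built; the plugin owns the UI, policy, and orchestration experience, while the Host still holds the source of truth for runtimes and tasks. The full daemon supervisor, task wakeup / long polling / WebSocket, cross-Host distributed locks, provider login-state diagnostics, and a productized business queue remain future work.
-
-For details, see [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md).
-
-## Sources of Truth for Constraints
-
-The constraints confirmed for this branch are not rewritten in the README:
-
-- For the runner-visible protocol, the result stream, and the dispatch boundary, see [PROTOCOL_V1.md](./PROTOCOL_V1.md).
-- For the Host-internal `AgentConfig` / `AgentBinding` projection, see [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md).
-- For the integration boundary with the external EBA / Agent Platform / Runtime Control Plane, see [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md).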
@@ -1,541 +0,0 @@
-# Agent Platform / Runtime Control Plane Decision Note
-
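-This document records how LangBot continues to evolve into the infrastructure layer of an Agent Platform after AgentRunner pluginization. It discusses the Host capability layer; it is not an `AgentRunner Protocol v2`, nor does it write any specific Agent Platform product into LangBot core.
-
-> This is the current decision version. Protocol data structures remain governed by [PROTOCOL_V1.md](./PROTOCOL_V1.md); for the test execution entry point see [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md); for the extension boundary see [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md).
->
-> Implementation status note: this document describes the target capabilities of Runtime Control Plane v2 and a phased rollout proposal. The current AgentRunner pluginization main line already has Host capabilities such as event-first context, run-scoped authorization, and EventLog / Transcript / State / sandbox files, and it has landed the persistent `AgentRun` / `AgentRunEvent` ledger, run control actions, a minimal runtime heartbeat/claim lease, and admin reconcile primitives. The full Agent Platform product form, the daemon supervisor, the runtime wakeup channel, and distributed runtime management are not yet complete. [STATUS.md](./STATUS.md) is authoritative for the current implementation status.
-
-## 1. Current Decision
-
-Going forward, LangBot should be positioned more as an **Agent Host / infrastructure provider / transfer layer** rather than baking a complete Agent Platform product into core.
-
-Conclusions:
-
-- **The Agent Platform product form is built as a plugin.** The plugin owns agent management, policy, business queues, UI, orchestration, multi-agent collaboration, and the product experience.
-- **The foundational sources of truth an Agent Platform needs are built into the Host.** The Host currently stores events, state, transcripts, the sandbox file boundary, the active run permission snapshot, the persistent run/result ledger, audit correlation, and generic control state.
-- **A minimal runtime registry / heartbeat / claim lease has landed as Host primitives, but that is not full daemon worker management.** Process hosting, the wakeup channel, provider login-state diagnostics, and distributed scheduling for remote harnesses / daemons can for now still be maintained by the AgentRunner plugin and the SDK remote layer themselves.
-- **Business scheduling is not written into the Host.** The Host provides generic run/result/control primitives, and the Platform plugin decides which events trigger which agents, how to queue, how to assign, and whether to fan out.
-
-Recommended layering: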
-
-```text
-LangBot Host
-  Current base: EventLog / runtime AgentBinding / State / Transcript / sandbox files / active run authorization
-  Current v2 foundation: Run / RunEvent / audit / result persistence / control primitives / minimal runtime heartbeat and claim lease
-  Planned: Agent / Binding persistence / daemon supervisor / wakeup channel / distributed runtime operations
-
-Agent Platform plugin
-  Agent management UI / project-task model / event routing policy
-  Business queue / multi-agent orchestration / runtime selection policy
-
-AgentRunner plugin / external harness runtime
-  Connects ACP / remote daemon / local subprocess / HTTP API
-  Executes and converts provider-native events to AgentRunResult
-```
-
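-## 2. Difference Between Platform and Non-Platform
-
-LangBot already has the core characteristics of an Agent Host:
-
-- It smooths over the differences between AgentRunners.
-- It triggers runners from the IM / Pipeline entry.
-- It has an event-first context direction.
-- It has Host-owned EventLog / Transcript / State and a sandbox/workspace file boundary.
-- It has runner config delivery and active run-scoped authorization.
-- It has a `run_id` that ties together the event, transcript, state, sandbox files, and the in-memory authorization context.
-
-This is not yet a complete Agent Platform. A complete Platform needs at least:
-
-- Manageable agent assets: agent profile, binding, resource policy, runner config, availability status.
-- An observable execution lifecycle: run status, result stream, failure reasons, file references, audit, replay.
-- An operable control plane: cancel, retry, queueing, concurrency, timeout, recovery, diagnostics.
-- A productizable scheduling experience: event subscription, routing policy, task board, multi-agent collaboration, project/workspace views.
-
-So the difference is not just "whether there is scheduling" but whether the following exist: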
-
-```text
-managed agent assets + observable run lifecycle + operational run control
-```
-
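-The Host is responsible for the generic sources of truth and the security boundary behind these capabilities; the Platform plugin assembles them into a concrete product.
-
-### 2.1 Current Implementation Boundary
-
-In the current code, `run_id` already connects active run authorization, the persistent run ledger, and several Host sources of truth:
-
-- `EventLog` stores input events and audit entries and records `run_id` / `runner_id`.
-- `Transcript` stores the conversation history projection and uses `run_id` to link assistant output.
-- The sandbox/workspace stores the current run's input files and runner artifacts and uses `run_id` as part of the access boundary.
-- `PersistentStateStore` stores runner state, but it is not the same thing as the run lifecycle.
-- `AgentRunSessionRegistry` stores the in-memory authorization snapshot of an active run for proxy action validation; after the process or the run ends, it does not serve as a replayable source of truth.
-- `AgentRun` stores the run lifecycle, scope, authorization snapshot, queue/claim state, cancel intent, usage/cost, and metadata.
-- `AgentRunEvent` stores the runner/result/admin event stream, with replayable paging by `run_id + sequence`.
-- `AgentRuntime` stores the minimal runtime registry / heartbeat facts used for runtime list, stale mark, and claim lease reconcile.
-
-Therefore the foundational primitives this document mentions later, such as `AgentRun` / `AgentRunEvent`, `run_append_result`, `run_finalize`, `run_cancel`, `runtime_register`, `runtime_heartbeat`, and `run_claim`, already exist. What is still incomplete is a standalone platform `run_create` action, the Host-owned Agent / Binding persistence model, the business queue product form, the daemon supervisor, the runtime wakeup channel, cross-Host distributed locks, and the provider/runtime diagnostics surface.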
-
-## 3. Basic Concepts
-
-### 3.1 Event
-
-An Event represents "what happened":
-
-```text
-message.received
-github.issue.opened
-scheduler.tick
-user.approved
-system.webhook.received
-```
-
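-EBA is responsible for normalizing external input into events. An Event is not itself a queue, nor is it the same as one agent execution. The current `EventLog` records input events and audit facts; the future `AgentRunEvent` records the output event stream of a particular run. The two must not be mixed.
-
-### 3.2 Run
-
-A Run represents "one execution of a given agent / binding / runner against a given event".
-
-A Run should be persisted by the Host and become the source of truth for execution status, results, permissions, and audit: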
-
-```text
-run_id
-event_id
-agent_id / binding_id
-runner_id
-status
-created_at / started_at / finished_at
-error / failure_reason
-delivery target
-metadata
-```
-
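-The current `AgentRunSessionRegistry` stores only the in-memory authorization information of active runs, which is not enough to support the Platform's replay, audit, cancel, retry, and asynchronous execution.
-
-### 3.3 RunEvent / RunResult
-
-A RunEvent is the stream of result events produced during a run and corresponds to the `AgentRunResult` returned by the runner. It differs from the input events of EBA/EventLog: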
-
-```text
-message.delta
-message.completed
-tool.call.started
-tool.call.completed
-state.updated
-action.requested
-run.completed
-run.failed
-```
-
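-The Host should store these output events so that they can be replayed by `run_id + sequence`. Transcript and State writes to the existing stores can be triggered by these result events while keeping a link that traces back to `AgentRunEvent`. Files and large tool results stay in the current run's sandbox/workspace and are not sent back as result event blobs.
-
-### 3.4 Queue
-
-A Queue is not a replacement for EBA.
-
-EBA produces events; the queue handles "when the execution work item for this event runs, who runs it, and how it is cancelled / retried / recovered".
-
-Queues can be split into two layers:
-
-- **Business queue**: managed by the Platform plugin, for example project tasks, priority, agent teams, workflows, and human approval.
-- **Execution queue / run queue**: an optional Host primitive, for example queued / running / completed / failed / cancelled, claim lease, dispatch timeout, and orphan recovery.
-
-The first phase does not require the Host to ship a complete execution queue. The Platform plugin can manage the business queue first; until the Phase 1 / Phase 2 capabilities land, the plugin can still obtain only limited run correlation through the existing synchronous `AgentRunOrchestrator.run(...)` execution path and the existing Host stores.
-
-### 3.5 Runtime / Daemon
-
-A Runtime / daemon represents an execution location or execution capability, for example Claude Code / Codex CLI on a particular machine.
-
-Current decision:
-
-- The Host does not maintain a full runtime registry in the first phase.
-- An AgentRunner plugin can keep its connection, heartbeat, and execution channel to the daemon through the SDK remote layer.
-- An external harness / agent should not access the LangBot Host or database directly. Access to LangBot resources must go through the daemon / AgentRunner plugin / SDK runtime / `AgentRunAPIProxy` / scoped MCP bridge and is subject to run-scoped authorization checks.
-- If multiple plugins later need to share runtime state, a thin `RuntimeLease` / registry can then be pushed down into the Host as a generic capability.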
-
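-## 4. Minimal Capabilities the Host Should Add
-
-In the first phase the most important thing is not the daemon registry but making the Host the source of truth for run/result.
-
-### 4.1 AgentRun Store
-
-Add a persistent `AgentRun`: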
-
-```text
-id / run_id
-event_id
-agent_id
-binding_id
-runner_id
-conversation_id / thread_id
-workspace_id / bot_id
-status
-status_reason
-created_at / started_at / finished_at / updated_at
-deadline_at
-cancel_requested_at
-usage_json
-cost_json
-metadata_json
-```
-
-The status should include at least:
-
-```text
-created
-running
-completed
-failed
-cancelled
-timeout
-```
-
-If an execution queue is added later, also introduce:
-
-```text
-queued
-claimed
-dispatching
-```
-
-### 4.2 AgentRunEvent Store
-
-Add a persistent `AgentRunEvent`:
-
-```text
-id
-run_id
-sequence
-type
-data_json
-usage_json
-created_at
-source
-metadata_json
-```
-
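-Constraints:
-
-- Within the same `run_id`, `sequence` increases monotonically.
-- Append must be idempotent so that remote daemons / plugins can retry.
-- An unknown result type may be stored, but the Host executes side effects only for known types.
-- Large payloads should still go into the sandbox/workspace rather than being stuffed directly into a result event.
-- `usage_json` stores the `AgentRunResult.usage` structure as is; a missing value means unknown, not 0.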
-
-### 4.3 Run Control API
-
-Host 提供通用控制原语：
-
-```text
-run.create
-run.get
-run.list
-run.events.page
-run.cancel
-run.append_result
-run.finalize
-```
-
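-Semantics:
-
- `run.create` creates a Host-owned run and its authorization snapshot.
- `run.append_result` may be called only over a trusted SDK/runtime path and must be bound to the authorization snapshot frozen at run creation; it writes an `AgentRunEvent` and triggers transcript/state/delivery side effects.
- `run.finalize` closes the run and updates the terminal status.
- `run.cancel` records the intent to cancel; synchronous runners observe it through context/deadline, remote runners through the plugin/daemon channel.
-
-In the first stage this can be exposed only as plugin runtime actions; a public HTTP API does not have to come first.
-
-### 4.4 Result Persistence In Orchestrator
-
-The current `AgentRunOrchestrator.run()` already handles: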
-
-```text
-event -> binding -> context -> runner invocation -> result normalization
-```
-
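-Still to be added:
-
- Create an `AgentRun` when the run starts.
- Write every `AgentRunResult` into `AgentRunEvent`.
- Mark completed on `run.completed` / normal generator exit.
- Mark failed or timeout on `run.failed` / exception / timeout.
- When a terminal result carries usage, write it to `AgentRunEvent.usage_json` and aggregate it into `AgentRun.usage_json`.
- `state.updated` and transcript writes keep using the existing journal, but should be traceable to the `AgentRunEvent`.
-
-### 4.5 Usage / Cost Accounting
-
-On the SDK side, `AgentRunResult` already provides an optional `usage` field that normalizes token usage from different runners / external harnesses / provider-native events into a single run result envelope.
-
-Semantics:
-
- `run.completed.usage` SHOULD represent the final aggregated token usage for the run.
- `run.failed.usage` MAY represent the partial token usage known before the failure.
- Missing usage means the upstream runtime did not report it or the adapter is not wired up yet; the Host must not bill as 0 or treat context consumption as 0.
- The Host should write event-level usage as-is to `AgentRunEvent.usage_json` and aggregate it into `AgentRun.usage_json` at the terminal event or during finalize.
- Cost should be computed by the Host from usage, runner/model identity, time of occurrence, and the price table, and written to `AgentRun.cost_json`; cost reported by the runner/provider is kept only as non-authoritative telemetry in metadata or usage extra.
-
-This constraint first settles where usage lives in the protocol and in persistence; how ACP, remote daemon, and local subprocess runners extract usage from native events can be adapted later in each plugin.
-
-### 4.6 Authorization Snapshot
-
-For asynchronous or remote execution, the authorization snapshot must be frozen at run creation: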
-
- runner identity
- binding identity
- caller plugin identity
- resource policy
- allowed tools/models/files/knowledge bases/storage scopes
- state scopes
- conversation/thread/workspace scope
-
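-Later append result, state API, history API, and sandbox/workspace file access are all checked against this snapshot, and permissions are never widened afterwards.
-
-## 5. Minimal Capabilities the SDK Should Add
-
-The SDK does not need to define a full daemon registry right away, but plugins and runners need to be able to use the Host run/result capabilities.
-
-### 5.1 Entities
-
-Add or complete: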
-
-```text
-AgentRun
-AgentRunStatus
-AgentRunEvent
-RunEventPage
-RunCreateRequest / RunCreateResult
-RunAppendResultRequest
-```
-
-These are Host control primitives; they do not replace `AgentRunContext` / `AgentRunResult`.
-
-### 5.2 Proxy Methods
-
-Provide in the SDK proxy:
-
-```python
-create_run(...)
-get_run(run_id)
-list_runs(...)
-page_run_events(run_id, cursor=None, limit=...)
-cancel_run(run_id)
-append_run_result(run_id, result, sequence=None)
-finalize_run(run_id, status, error=None)
-```
-
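-Access boundaries:
-
- An ordinary AgentRunner does not necessarily need to call these APIs directly inside a synchronous `run(ctx)`; the Host orchestrator can record automatically.
- Platform plugins can create / query / cancel runs.
- An AgentRunner plugin or daemon bridge can append/finalize the runs it is responsible for.
- External harnesses still cannot call the Host directly; they must go through the SDK runtime / proxy / bridge.
-
-### 5.3 Plugin-Daemon Heartbeat
-
-The initial heartbeat for a remote daemon can be a private capability of the SDK / AgentRunner plugin: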
-
-```text
-daemon <-> AgentRunner plugin / SDK remote layer <-> LangBot plugin runtime <-> Host
-```
-
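-In the first stage the Host only needs to know:
-
- Whether the relevant plugin is online.
- Whether the run has progress/results.
- Whether the run has timed out or been cancelled.
-
-If daemon availability later needs to be shared across plugins, push heartbeat/registry down into the Host.
-
-## 6. What the Platform Plugin Should Own
-
-The Agent Platform plugin can be responsible for:
-
- Managing which agents are available.
- Maintaining product-level agent profiles, projects, task boards, workflows, and teams.
- Subscribing to EBA events and deciding which events trigger which agents.
- Maintaining the business queue: priorities, retry policy, human approval, assignment rules.
- Choosing the runner / runtime / daemon.
- Once the Run Control API lands, calling the Host run API to create, cancel, and query executions.
- Displaying run status, result stream, file references, failure reasons, and audit.
-
-The Platform plugin should not:
-
- Privately keep a general run/result source of truth once the Host Run Ledger lands.
- Bypass the Host to write transcript/state directly, or access sandbox/workspace files beyond its authorization.
- Let external harnesses access the LangBot DB or Host internal resources directly.
- Force the semantics of a particular business queue into AgentRunner Protocol v1.
-
-## 7. Relationship to EBA
-
-Once EBA is in place, the event stream can follow two paths.
-
-Direct execution path: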
-
-```text
-EventGateway
-  -> EventRouter resolves AgentBinding
-  -> AgentRunOrchestrator.run(event, binding)
-  -> Host records AgentRun / AgentRunEvent (after Run Ledger lands)
-  -> delivery
-```
-
-Platform plugin orchestration path:
-
-```text
-EventGateway
-  -> Platform plugin receives/subscribes event
-  -> plugin applies policy / business queue
-  -> plugin creates Host run (after Run Control API lands)
-  -> runner/plugin/daemon executes
-  -> Host records result and state
-  -> plugin displays / Host delivers
-```
-
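-Both paths should ultimately share the Host run/result/state source of truth and the sandbox/workspace file boundary. What can be shared at the current stage is event/transcript/state, sandbox files, and the synchronous execution chain; the persistent run/result ledger needs Runtime Control Plane v2 Phase 1 to fill it in. The difference is whether a Platform plugin takes part in productized scheduling and the business queue.
-
-## 8. Relationship to AgentRunner Protocol v1
-
-This design does not change the runner-visible contract of v1: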
-
-```text
-AgentRunContext -> AgentRunner.run(ctx) -> AgentRunResult stream
-```
-
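-The following must be preserved:
-
- `AgentRunContext` does not carry daemon/worker/pod details.
- `AgentRunResult` remains the unified event stream of runner output.
- An ordinary runner does not need to know about the task queue / runtime registry.
- A remote harness can manage its own session, tool loop, MCP, and context compaction, but access to LangBot resources must go through the SDK proxy / bridge.
- Runtime-managed execution is a placement / transport choice, not a mandatory concept of the ordinary runner protocol.
-
-## 9. Phased Implementation Plan
-
-### Phase 1: Run Ledger (Foundation Implemented)
-
-Goal: the Host becomes the source of truth for execution status and results.
-
-Scope:
-
- `AgentRun` table.
- `AgentRunEvent` table.
- Orchestrator creates/updates runs automatically.
- Journal persists every `AgentRunResult`.
- Run query and event paging API.
- SDK entities + proxy methods.
-
-Complexity: medium.
-
-Estimated changes: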
-
-```text
-Host: 12-20 files
-SDK: 4-8 files
-Tests: 8-15 files
-```
-
-### Phase 2: Platform Plugin Queue On Host Run Primitives (Control Primitives Partially Implemented; Product Queue Pending)
-
-Goal: the Platform plugin manages the business queue; the Host provides run/result/cancel primitives.
-
-Scope:
-
- `run.create`
- `run.cancel`
- `run.append_result`
- `run.finalize`
- Sequence/idempotency for result append.
- Permission-protected remote append/finalize.
- Platform plugins can build task boards and scheduling experiences on top of Host runs.
-
-Complexity: medium to high.
-
-Estimated changes:
-
-```text
-Host: 20-35 files
-SDK: 8-14 files
-Tests: 15-25 files
-```
-
-### Phase 3: Optional Host Execution Queue / Claim Lease (Claim Lease Primitive Implemented; Full Queue Pending)
-
-Goal: push the execution queue down into the Host only once multiple plugins are re-implementing claim/cancel/retry/recovery.
-
-Scope:
-
- State machine extension for `queued/running/completed/failed/cancelled`.
- `claim_run` / `lease_until`.
- dispatch timeout.
- retry / orphan recovery.
- cancel propagation.
- Deduplication of concurrent claims.
-
-Complexity: high.
-
-Estimated changes:
-
-```text
-Host: 35-55 files
-SDK: 12-20 files
-Tests: 25-40 files
-```
-
-### Phase 4: Optional Runtime Registry (Minimal Registry Implemented; Full Daemon Control Pending)
-
-Goal: introduce a runtime registry only once the Host needs to manage multiple daemons / workers in a unified way.
-
-Scope:
-
- runtime register / heartbeat / deregister.
- capability report: provider, version, login status, workspace access, slot.
- runtime online/offline.
- runtime scoped auth.
- runtime audit.
- runtime gone recovery.
- task wakeup / long polling / websocket.
- relay / distributed lock across multiple Host instances.
-
-Complexity: very high.
-
-Estimated changes:
-
-```text
-Host: 55-80+ files
-SDK: 18-30 files
-Tests: 40+ files
-```
-
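-Entering this phase directly right now is not recommended.
-
-## 10. Design Principles
-
- Build the run/result source of truth into the Host first, then talk about a full runtime control plane.
- Agent Platform products are plugins; the Host is infrastructure.
- The Host does not write business scheduling policy, but it must store general state, results, permissions, and audit.
- An EBA event is not a queue; the queue is an execution lifecycle concern.
- The business queue can live in the Platform plugin first; the execution queue moves down into the Host only once the reuse need is clear.
- The daemon registry should not pollute AgentRunner Protocol v1.
- External harnesses do not access the LangBot Host or DB directly.
- All LangBot resource access must go through the SDK runtime / `AgentRunAPIProxy` / scoped MCP bridge.
- Docker / remote / local subprocess are only runtime placements, not runner protocol differences.
-
-## 11. Non-Goals
-
-Not done at the current stage:
-
- A full Multica-style runtime registry.
- Built-in project management, task board, agent team, or workflow product logic in the Host.
- Putting daemon heartbeat / worker liveness into `AgentRunContext`.
- Defining the business queue as AgentRunner Protocol fields.
- Letting Platform plugins privately keep the run/result source of truth.
- Letting external agents/harnesses connect directly to Host internal resources.
-
-## 12. Open Questions
-
- Whether the Host needs a minimal persistent `Agent` / `Binding` model, or whether Pipeline / Platform plugins keep projecting a runtime `AgentBinding`.
- When a Platform plugin creates a run, whether it passes a full `AgentBinding` snapshot or references a Host-owned binding id.
- The query relationship between `AgentRunEvent` and the existing `EventLog` / `Transcript`: a direct join, or aggregation through a dedicated view.
- The authentication granularity of `run.append_result`: runner plugin identity, run token, scoped capability token, or an SDK runtime internal channel.
- Cancel semantics: how synchronous runners and external harness runtimes/sessions observe cancel in a unified way.
- When to promote the plugin-private daemon heartbeat to a Host `RuntimeLease`.
- If the Host implements claim lease in the future, how the Platform plugin business queue and the Host execution queue avoid dual-queue confusion.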
@@ -1,154 +0,0 @@
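-# Run Steering and Compaction Checkpoint (Design Note)
-
-This document records two Host/runner collaboration capabilities: **mid-run message injection (steering / follow-up)** and
-**compaction summary persistence (compaction checkpoint)**. Both come from a gap analysis of the official local-agent against the
-Pi agent harness (`pi-mono/packages/agent`, referred to below as pi-agent-core):
-local-agent has already ported Pi's event lifecycle, parallel tool semantics, hook extension points, and compaction budget model;
-these two items need the Host protocol, authorization, and runner turn boundaries to work together before the loop is closed.
-
-> This document is a design memo, not the source of truth for schemas. The data structures involved ultimately land in
-> [PROTOCOL_V1.md](./PROTOCOL_V1.md); context boundary semantics are governed by
-> [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md);
-> run persistence and control primitives are governed by [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md).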
-
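-## 1. Run Steering / Follow-up (Mid-Run Message Injection)
-
-### 1.1 Problem
-
-In IM scenarios it is very common for users to send more messages while the agent is running (adding information, correcting direction, "never mind, stop looking").
-The current mainline is `one event -> one AgentBinding -> one run_id -> one runner`
-(PROTOCOL_V1 §13): a new message in the same conversation either waits for the current run to finish and then triggers a new run,
-or triggers an independent run concurrently. Neither behavior can deliver the new message into the **tool loop that is currently executing**,
-so the user experience is "the agent finishes a stale task on its own and only then sees the new message".
-
-Cancel (PROTOCOL_V1 §10) does not solve this problem: cancel discards the work already done;
-steering changes the subsequent direction while keeping the current progress.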
-
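-### 1.2 Pi's Reference Semantics
-
-pi-agent-core distinguishes two queues; both inject at turn boundaries and do not interrupt an in-flight model stream or tool execution:
-
- **steering**: inserted mid-run. After all tool calls of the current assistant message complete and
-  before the next model call, the queued user messages are injected; the model sees them in the next turn.
- **follow-up**: queues subsequent work. Checked only when there are no pending tool calls and no steering messages,
-  just as the run is about to end naturally; if messages are queued, they are injected and the next turn continues instead of the run ending.
-
-Each queue supports a `one-at-a-time` mode (one message per injection) and an `all` mode (everything injected at once).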
-
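-### 1.3 Design Direction
-
-The division of responsibility follows existing principles: the Host owns event routing and the conversation source of truth; the runner owns turn boundaries.
-
- **Host side**: the BindingResolver / dispatch layer identifies new message events where "the same conversation has an active run
-  and the runner declares steering support", writes them into a run-scoped steering queue,
-  and marks the event as claimed by the in-flight run (no new run is triggered, so the cardinality constraint of §13 is not broken).
-  The event still goes into EventLog / Transcript as usual (the source of truth is unchanged; only the trigger behavior changes).
- **Runner side**: at turn boundaries (after a tool batch completes and before the next model call, and just before the run
-  would end naturally), it pulls pending steering input through a run-scoped pull API
-  and injects it into the working context. local-agent's `AgentLoopHooks.prepare_next_turn` /
-  `should_stop_after_turn` already reserve the corresponding injection points.
- **Capability negotiation**: the runner manifest declares a `steering` capability (see PROTOCOL_V1 §4.3);
-  runners that do not declare it keep the current behavior (new messages start a separate run under the existing rules).
- **Receipts**: events consumed by steering are audited through EventLog. The original `message.received`
-  record marks queued/absorbed and `claimed_by_run_id` in `metadata.steering`;
-  after the runner pulls successfully, the Host appends a `steering.injected` record that references the source event.
-  For claimed input that has still not been pulled when the run ends, the Host appends a `steering.dropped` record as the
-  terminal dispatch state; the original Transcript fact is not deleted.
-  Transcript continues to represent only conversation facts and is not extended with dispatch behavior fields.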
-
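-Protocol surface already landed (the final definition belongs to PROTOCOL_V1):
-
-1. `ContextAccess.available_apis` gains a steering pull capability bit.
-2. `AgentRunAPIProxy` gains a steering pull action: the default is `mode=all`, where the Host returns all
-   pending input in order; `one-at-a-time` is only an option for the runner to throttle itself.
-3. The dispatch-layer "claim" rule: `message.received` can be absorbed by an active run in the same conversation;
-   the original event is written to EventLog / Transcript, and the dispatch behavior is written to EventLog metadata.
-4. The Host sets an in-memory cap on each run's steering queue; when the queue is full it stops claiming new messages and they return to
-   the normal dispatch path, so an active run cannot swallow same-conversation input without bound.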
-
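-### 1.4 Boundaries
-
- The Host does not do prompt assembly for the runner: the Host only hands over the queue; injection position and format are decided by the runner.
- Not to be confused with observer / fan-out: steering is still supplementary input within a single run and does not create a second runner.
- For remote / external harness runners (claude-code, codex, etc.) whose underlying session has its own
-  steering capability, the adapter can forward directly; the protocol surface stays the same.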
-
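-## 2. Compaction Checkpoint Persistence
-
-### 2.1 Problem
-
-local-agent is currently a stateless runner: every run re-fetches the transcript tail
-(50 entries by default), re-estimates tokens, and regenerates the compaction summary. Consequences:
-
- In long conversations every run repeats the compaction work and regenerates the summary, so wording drifts between runs,
-  which is unfriendly to the provider KV cache (AGENT_CONTEXT_PROTOCOL §"Summary checkpoint 稳定"
-  already states the expectation: a new checkpoint is produced only when compaction happens).
- Once history exceeds the fetch limit, earlier content becomes permanently invisible: no checkpoint records
-  "how far compaction has reached and what it produced".
-
-pi-agent-core persists compaction entries into the session tree: the summary carries
-`tokensBefore` and its coverage range, later turns reuse it directly, and compaction is redone incrementally only when the threshold is crossed again.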
-
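-### 2.2 Current State
-
-The protocol surface and the main consumption path are already in place:
-
- The State / Storage API is already defined (PROTOCOL_V1 §8 "State / Storage"),
-  and AGENT_CONTEXT_PROTOCOL already names `summary.checkpoint` as an expected use of state.
- The Host exposes `ContextAccess.available_apis.state` according to the binding state policy.
- local-agent reads/writes `runner.compaction.checkpoint` when the state API is available;
-  when it is missing, the schema does not match, the conversation does not match, or the cursor fails, it falls back to fetching the history tail.
- Generating the summary with an LLM does **not depend** on this Host capability: the runner can generate it with the already-authorized
-  `invoke_llm`; the checkpoint only solves "store it and reuse it next time".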
-
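-### 2.3 Design Direction
-
- **Storage location**: state, scope=`conversation` (small JSON, consistent with the PROTOCOL_V1 §8
-  guidance on the state/storage boundary). If the summary grows in the future, the overflow goes into storage with
-  a reference left in state.
- **Key convention**: `runner.compaction.checkpoint` (inside the runner namespace).
- **Content convention** (the schema lands in PROTOCOL_V1 or the runner docs; only the semantics are listed here):
-  - `schema_version`
-  - `summary`: the compaction summary text (LLM-generated or deterministically generated)
-  - `covers_until`: the transcript cursor (seq / message id) already covered by the summary,
-    which anchors incremental compaction and "where to resume fetching history"
-  - `tokens_before` / `created_at`: diagnostics and invalidation checks
- **Consumption flow**: read the checkpoint at run start → fetch only the transcript after `covers_until`
-  → when compaction triggers, generate the new summary incrementally from the old one and write back a new checkpoint.
-  When the checkpoint is missing or fails to parse, fall back to the current behavior (fetch the full tail) for backward compatibility.
- **Invalidation rule**: the checkpoint is void if `covers_until` does not exist in the Host transcript (conversation cleaned up / reset);
-  the runner must not trust a checkpoint from another conversation.
- **Authorization**: the Host enables `available_apis.state` for runner bindings that declare a need for state;
-  validation reuses the existing run-scoped state checks
-  (scope, key, value size, JSON serializability; see the PROTOCOL_V1 §7.2 requirements for
-  `state.updated`).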
-
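-### 2.4 Related but Independent Work
-
- **Tokenizer / usage metadata passthrough**: the runner currently estimates tokens with a chars/4 heuristic,
-  which is 3-4x too low for CJK, so compaction systematically triggers late. The Host should pass through, in the model response or
-  `ctx.runtime.metadata`, the provider usage (prompt/completion tokens) and the
-  model context window (the LiteLLM model-info work). This item does not block the checkpoint
-  from landing, but it determines how accurately compaction triggers.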
-
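-## 3. Implementation Breakdown
-
-| Item | Owner | Dependency |
-| --- | --- | --- |
-| steering queue, event claiming, basic audit | LangBot Host (dispatch / binding layer) | Landed, including the queue cap and the dropped terminal state for unconsumed input |
-| steering pull API + capability bit | PROTOCOL_V1 + SDK proxy | Landed |
-| turn-boundary pull and injection | langbot-local-agent | Landed |
-| local-agent checkpoint reads/writes via the state API | langbot-local-agent | Landed |
-| checkpoint key / content / invalidation conventions | PROTOCOL_V1 + local-agent README | Landed |
-| LLM compaction summary generation | langbot-local-agent | Landed (`invoke_llm`, falling back to a deterministic summary on failure) |
-| usage / context-window metadata passthrough | LangBot Host (model layer) | LiteLLM model-info |
-
-The remaining work should prioritize usage / context-window metadata. Streaming delivery integration depends on
-`ctx.delivery` edit/append semantics, and hard-coding it while the protocol capability is missing is not recommended.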
-
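-## 4. Open Questions
-
- Under streaming delivery, after steering is injected, how the content already streamed by the previous turn and the new turn's
-  output join up on the IM message-edit surface (involves the `ctx.delivery` capability; to be settled as delivery evolves).
- Whether the checkpoint needs active invalidation notification from the Host (for example deleting the state key when the conversation is cleared).
-  The current implementation relies on the runner validating on read and falling back, so functionality is not blocked.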
@@ -1,211 +0,0 @@
-# Agent Runner Security Boundary
-
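-This document records the security boundary and minimal guardrails after agent-runner pluginization.
-
-## Status
-
-**Current conclusion: a high-intensity supervision model is not adopted.**
-
-LangBot's goal is not to host a strongly isolated platform for untrusted code runners. AgentRunner plugins, especially external harnesses such as ACP / Claude Code / Codex / OpenCode / Kimi Code, are treated by default as **operator-owned execution**: the user or deployer configures them explicitly and bears the risk of their filesystem, process, network, workspace, provider login state, and native tools.
-
-What LangBot is responsible for is protecting **the resources LangBot itself holds**, including models, knowledge bases, LangBot tools, history, event, state, plugin/workspace storage, and sandbox/workspace file access. As long as access to these resources is run-scoped, permission-scoped, verifiable, and diagnosable, it is acceptable at the current stage.
-
-This means:
-
- LangBot is not required to implement a full OS sandbox, VM, cgroup, seccomp, or CPU / memory / network quota at the application layer.
- No complex approval flow is required for ACP runners; choosing an ACP runner is an explicit opt-in by the user.
- Strong supervision is not required in non-Docker process deployments; documenting who owns the risk is enough.
- Docker / K8s can provide deployment-level isolation, but it is not a prerequisite for releasing the LangBot agent-runner protocol.
- LangBot must not be advertised as already providing a managed sandbox, unless it really provides a managed execution environment in the future.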
-
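-## Responsibility Boundaries
-
-### LangBot Host Is Responsible For
-
- **Resource authorization**: generate the snapshot of resources accessible to this run from the runner manifest permissions, binding resource policy, and run scope.
- **Runtime validation**: every SDK / Host action carrying a `run_id` must validate the active run session, caller plugin identity, resource id, and operation.
- **Scoped projection**: project only the authorized resource summary, MCP server config, context, attachment/path ref, and state snapshot to the runner.
- **LangBot file path constraints**: files that LangBot itself stages and reads must be confined to the declared root to prevent path escape.
- **Basic secret policy**: do not proactively project API keys / tokens / secrets held by LangBot to the runner; redact common secret fields in logs and errors.
- **Basic runtime constraints**: provide basic capabilities for timeout, cancel propagation, output size limits, or error mapping.
- **audit-lite**: record event, run id, runner id, binding, authorized resource summary, key failures, and state/file/transcript facts.
-
-### Runner Plugin Is Responsible For
-
- Honoring the `ctx.resources`, `ctx.context.available_apis`, runner config, and state policy issued by the Host.
- Projecting LangBot resources into a form the target platform can consume, for example MCP config, context prompt, HTTP header, run token.
- Not bypassing SDK / Host actions to access LangBot internal resources directly.
- Wrapping the external processes it starts sensibly, including argument construction, timeout, cancellation, output parsing, and error mapping.
- Clearly documenting provider risks, deployment assumptions, and limitations in its own README.
-
-### Deployer / User Is Responsible For
-
- The workspace contents, filesystem access, process permissions, network access, and provider-native tool permissions of the ACP / external harness.
- Docker / K8s image, volume, secret, network policy, resource limit, namespace, and service account configuration.
- For local process deployments: OS user permissions, PATH, HOME, CLI login state, global configuration, and external MCP configuration.
- Whether the runner is allowed to perform real writes to a given directory.
-
-### External Harness Is Responsible For
-
-External tools such as Claude Code, Codex, OpenCode, Kimi Code, and Gemini CLI continue to use their own permission model, MCP loading policy, session/resume, sandbox, or approval capabilities. LangBot does not promise to restrict what these tools can do with resources already accessible to the container or host OS user they run as.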
-
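-## Deployment Scenario Policy
-
-| Scenario | LangBot policy | Not borne by LangBot |
-| --- | --- | --- |
-| Plain process deployment | Docs flag operator-owned execution; the Host protects only LangBot resources. | Preventing the external CLI from reading files, processes, HOME, and global CLI configuration accessible to the same OS user. |
-| Docker / K8s deployment | Keeps the same Host resource boundary; container isolation is provided by the deployment environment. | Re-implementing container/VM/cgroup/seccomp/network quota at the application layer. |
-| ACP runner | The user explicitly chooses the runner and workspace; LangBot injects scoped MCP / run token. | ACP CLI native tools, workspace writes, provider login state, and external MCP behavior. |
-| External SaaS runner, e.g. Dify | LangBot limits access to LangBot assets through run token / gateway. | The SaaS platform's internal agent execution policy, model/tool message format, and platform-side logs. |
-| Future managed runner | A strong-isolation SLA needs to be defined separately only when LangBot explicitly provides a managed execution environment. | The current protocol loop does not promise a managed sandbox. |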
-
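-## Minimal Guardrails
-
-The following are the minimum requirements to maintain at the current stage. They are requirements for protecting the LangBot resource boundary, not for fully supervising external processes.
-
-### Resource Permission Boundary
-
-The authorization snapshot must be frozen before every run:
-
- Runner manifest permissions are the upper bound of resource access.
- Binding resource policy / runner config decides what is actually authorized for this run.
- Runtime actions are validated by `run_id` + `caller_plugin_identity` + resource id + operation.
- Manifest permissions constrain only LangBot-held resources, not external harness native tools.
-
-The current implementation direction is correct: `AgentRunSessionRegistry` stores the run-scoped snapshot, `plugin/handler.py` performs runtime validation for model, tool, knowledge base, history, state, storage, and other actions, and sandbox/workspace file access is controlled by the scoped tool boundary.
-
-**Skill read/write gating (must not be weakened)**: pipeline-visible skills are mounted `rw` into the same sandbox in one go, and the mount layer does not distinguish "visible" from "activated"; native write operations (write/edit/exec) are allowed only for activated skills, while reads are allowed for visible + activated. This distinction is equivalent to asset authorization semantics and must be kept. Take particular care once skills are fully tool-based: "everything is a tool" does not mean "controlling asset authorization is enough"; the visible/activated gating at the native layer must not be cut. The only thing that can be weakened is the realpath out-of-bounds string check (backed by chroot/namespace).
-
-### MCP / Asset Gateway Boundary
-
-The LangBot MCP / asset gateway exposes only the tool surface authorized for the current run: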
-
- `langbot_list_assets`
- `langbot_get_current_event`
- `langbot_history_page`
- `langbot_retrieve_knowledge`
- `langbot_get_tool_detail`
- `langbot_call_tool`
-
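-External platforms need to use a short-lived `run_token` or an Authorization bearer token. Access must be denied when the token is missing, wrong, or expired.
-
-An admin-level MCP allowlist, dangerous tool approval, or a complex approval flow is not required at the current stage. Whether to register an external MCP provider is the deployer's / user's decision.
-
-### Workspace / Path Boundary
-
-LangBot only needs to constrain the paths it manages itself:
-
- Host staged files must be validated for `realpath` and root containment.
- Attachment/file metadata should not expose Host-only storage keys / host paths.
- Context files and sandbox/workspace files, if created by LangBot, should be placed in a location that can be cleaned up.
-
-The workspace a user configures for an ACP runner is outside LangBot's strong supervision scope. Under Docker/K8s this relies on the volume mount boundary; under plain process deployment it relies on OS user permissions and the user accepting the risk.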
-
-### Secret Handling
-
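-Here, secret refers to API keys, provider tokens, run tokens, MCP tokens, platform secrets, database passwords, and so on.
-
-Only a basic policy is required at the current stage:
-
- LangBot does not proactively project secrets it holds to the runner, unless they are external service credentials that the runner config explicitly needs.
- Run tokens are short-lived and run-scoped and should not be stored long term.
- Logs, errors, transcripts, and attachment/file metadata should avoid printing common secret fields as far as possible.
- Config UI / API responses continue to follow the existing secret masking rules.
-
-Full DLP, end-to-end sensitive data tracking, secret lineage, or an automatic rotation system is not required at the current stage.
-
-### Process / Runtime Bounds
-
-LangBot needs to provide basic controllability:
-
- Host run deadline / runner timeout.
- Runner-side request timeout.
- Generator close / cancel propagation.
- Upper limits on output and inline payload size.
- Errors mapped to a controlled runner failure.
-
-LangBot is not required to implement strong isolation of CPU, memory, disk, network, or process tree for external harnesses. When these are needed they are provided by Docker/K8s, systemd, the container platform, or the user's machine policy.
-
-### UI / Admin Surface
-
-The frontend can display a runner permission summary, but it is information disclosure, not an approval system.
-
-The permission summary refers to the LangBot resource permissions declared in the runner manifest, for example: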
-
- `tools.detail`
- `tools.call`
- `knowledge_bases.retrieve`
- `history.page`
- `storage.plugin`
-
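-Forced pop-ups, administrator approval, dangerous tool approval, or a production kill switch are not required at the current stage. A short notice can be shown in the runner configuration area: which LangBot resources this runner can access, and that external harness execution risk is borne by the user/deployer.
-
-### Audit Lite
-
-Enough facts need to be recorded to troubleshoot problems:
-
- run id, runner id, binding, event.
- Authorized resource summary.
- state update, file write/read event, transcript message.
- Warnings when MCP / pull API access is denied.
- steering queued / injected / dropped.
-
-A standalone security audit product, approval record system, or SIEM-grade event model is not required at the current stage.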
-
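-## Checklist After Downgrading
-
-| Item | Current requirement | Status |
-| --- | --- | --- |
-| Path isolation | Constrain only the context/sandbox file paths managed by LangBot; the runner workspace belongs to the user / deployment environment. | Minimal required |
-| Permission boundary | Must protect LangBot resources; does not constrain external CLI native capabilities. | Required |
-| Secret handling | Basic non-projection, basic masking, short-lived run tokens. | Basic required |
-| MCP policy | run-scoped token + scoped tool surface; no complex approval. | Required |
-| Skill access policy | Skills are exposed through Host-authorized tools (discovery / activate / register / native exec go through unified tool authorization); **the native-layer visible (read-only) vs activated (writable) gating must not be weakened**: all pipeline-visible skills are mounted `rw` into the same sandbox, so the read/write distinction rests entirely on the native layer; harness-native skill files are not a LangBot security boundary. | Required |
-| Process isolation | Handled by Docker/K8s / the user's machine. | Out of scope |
-| State lifecycle | Scope isolation, JSON size limit, basic cleanup primitive. | Basic required |
-| Audit | Record runtime facts and denial reasons. | Audit-lite |
-| UI / Admin control | Permission summary can be displayed; no approval flow required. | Optional |
-| Test matrix | Cover run auth, MCP token, permission deny, timeout, sandbox path, state size. | Focused tests |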
-
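-## Current Implementation Snapshot
-
-As of 2026-06-15, the existing implementation covers:
-
- SDK typed AgentRunner manifest, capabilities, permissions.
- Host resource builder generates `ctx.resources` from manifest permissions and binding policy.
- Active run session snapshot and `caller_plugin_identity` validation.
- Run-scoped validation of History / event / state / tool / knowledge runtime actions.
- Sandbox file path `realpath` + root containment.
- Persistent state scope isolation and JSON size limit.
- SDK-owned MCP bridge and long-lived asset gateway.
- Dify / ACP runner integration with the LangBot asset gateway.
- Runner timeout, Dify HTTP timeout, ACP startup / initialize / request timeout.
-
-Items that can still be improved but do not block the current release:
-
- Frontend display of the runner's LangBot resource permission summary.
- Consolidating redaction of common secret fields into a unified helper.
- Scheduling of context/sandbox file TTL cleanup.
- More complete audit of MCP calls.
- Better documentation notice: ACP runner is operator-owned execution.
-
-## Non-Goals
-
-The following are not security goals of the current agent-runner pluginization:
-
- Preventing the ACP / external harness from modifying its workspace.
- Preventing the external CLI from reading files already readable by the same container or OS user.
- Controlling the external harness's provider-native tools, approval, MCP, browser, or shell.
- Implementing VM / container / cgroup / seccomp / network policy at the LangBot application layer.
- Replacing the platform's own secret, volume, network, and resource limit management for Docker/K8s deployments.
- Implementing an enterprise-grade approval system, SIEM, DLP, or security operations dashboard.
-
-## Release Messaging
-
-May be stated externally:
-
-> AgentRunner plugins protect LangBot-held resources through run-scoped authorization and a scoped MCP gateway. Isolation of the external code harness's execution environment is the responsibility of the user or the deployment platform; LangBot does not currently provide a managed sandbox.
-
-Must not be stated externally:
-
-> LangBot has already securely sandboxed external runners such as Claude Code / Codex / OpenCode.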
@@ -1,59 +0,0 @@
-# AgentRunner Pluginization Status
-
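-This document is the status source of truth for `docs/agent-runner-pluginization/`. The protocol schema is still governed by [PROTOCOL_V1.md](./PROTOCOL_V1.md); test steps by [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md); the security release bar by [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).
-
-Status snapshot date: 2026-06-20.
-
-## Implementation Status
-
-| Area | Status | Notes |
-| --- | --- | --- |
-| SDK manifest schema | Done | `AgentRunnerManifest` contains typed `capabilities` / `permissions`; unknown capability / permission keys are not allowed into the typed model. |
-| Runner discovery | Done | The runtime returns a typed manifest; the Host registry validates each runner individually, and a failure results in warning + skip without affecting other runners. |
-| Host resource authorization | Done | `ctx.resources` and `ctx.context.available_apis` are generated from the intersection of manifest permissions with binding policy / run scope. |
-| Run authorization snapshot | Done | The active run session freezes run-scoped resources and available APIs; the runtime handler validates pull APIs against the snapshot. |
-| Result payload validation | Done | The wire format stays `{type, data}`; the Host validates delivery/side-effect payloads strictly, is lenient with tool-call telemetry, and ignores unknown types with a warning. |
-| Old built-in runners | Done | The old `src/langbot/pkg/provider/runners/*` and `RequestRunner` paths have been deleted from this branch. |
-| Official runner manifests | Done | `local-agent`, the ACP / Claude Code / Codex external harness runners, and the external service runners have re-declared the LangBot resource permissions that actually take effect. |
-| Skill chain | Broken → Redesigning | On this branch the skill activation chain is dangling end to end: `activate` calls the undefined `persist_activated_skill` (`AttributeError` as soon as it runs), `host.activated_skills` is read but never written, and skill awareness is neither injected nor consumed by the runner. The decision has been made to move to **fully tool-based skills**: discovery goes through `list_skills` / `langbot_list_assets` with an added skills category, `activate` / `register_skill` go through unified tool authorization, the `skill_authoring` capability is downgraded to a convenience switch, and the host writes `host.activated_skills` directly (last-write-wins). |
-| Runtime Control Plane v2 foundation | Partial | The Host-owned `AgentRun` / `AgentRunEvent` ledger, automatic ledger creation by the orchestrator, result event persistence, and the run get/list/event page/cancel/append/finalize actions have landed; the `agent_run:admin` / `runtime:admin` control permissions, minimal runtime register/heartbeat/list/reconcile, and the run claim/renew/release primitives have landed. The full Agent Platform product form, daemon supervisor, task wakeup / long polling / WebSocket, and distributed runtime management are still incomplete. |
-| Security boundary | Done | The current stance is downgraded to a lightweight boundary: LangBot protects the resources it holds; the OS / process / network / workspace risks of external harnesses are borne by the user or deployment environment; a managed sandbox is not a current commitment. |
-| Steering control path | Done | Claim exceptions no longer escape the consumer loop; the queue has a cap; claimed input that was never pulled gets a `steering.dropped` audit terminal state written when the run ends. |
-| SDK v1 contract closure | Done | The SDK provides `AgentAPIError` / `AgentAPIException`, typed `SteeringPullResult`, tolerant parsing of unknown result types, result `sequence` injection, and cancel propagation. |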
-
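-## Known Gaps Between Spec and Implementation
-
- `action.requested` is still only a telemetry / reserved surface; the platform action executor is not executed on this branch.
- The full EventGateway / EventRouter implementation is integrated from the external EBA branch; this branch only provides the event-first host envelope / binding / run entry point.
- The long-term type boundary between State and storage can still be narrowed further; the current contract only requires JSON-safe state and a controlled storage API.
- `ToolResource` currently carries only `tool_name` / `tool_type` / `description` / `operations` and does not include the full `parameters` schema; runners (such as local-agent `build_llm_tools`) need a `get_tool_detail` round trip per tool. The plan is to add `parameters` to `ToolResource`, filled in by the Host in one go when it builds `ctx.resources`.
- EventLog / Transcript already provide an explicit cleanup primitive; long-term retention defaults, TTL scheduling integration, and sandbox/workspace file cleanup are still operational wrap-up items and should be completed before the Runtime Control Plane is productized.
- The native shell / filesystem / CLI / MCP permissions of external harnesses are not constrained by manifest permissions; manifest permissions constrain only access to LangBot-held resources.
- LangBot does not currently promise a managed sandbox; OS/process/network quota, workspace GC, and provider-native tool permissions of external harnesses are borne by the user or deployment environment.
- Runtime Control Plane v2 has so far landed only the Host source of truth and control primitives; there is no built-in Agent Platform UI, business queue, daemon process hosting, runtime wakeup channel, cross-Host distributed lock, or provider login state diagnostics yet.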
-
-## Runner Acceptance Status
-
-| Runner | Status | Latest evidence |
-| --- | --- | --- |
-| `plugin:langbot/local-agent/default` | Unit-pass; UI smoke pending | 2026-06-10 local pytest / ruff passed; WebUI smoke is run manually in one batch. |
-| `plugin:langbot/acp-agent-runner/default` / `plugin:langbot/claude-code-agent/default` / `plugin:langbot/codex-agent/default` | Unit-pass; E2E pending | Runner repository unit tests cover session, run_id injection, and the LangBot MCP gateway; real harness E2E depends on the corresponding runtime environment, CLI/daemon availability, and provider login state. |
-| Dify / n8n / Coze / DashScope / Langflow / Tbox / DeerFlow / WeKnora | Unit-pass; credential smoke optional | 2026-06-13 plugin layout / parser tests passed; smoke with real service credentials does not have to run every round. |
-
-## Host / SDK Acceptance Status
-
-| Scope | Status | Latest evidence |
-| --- | --- | --- |
-| LangBot Runtime Control Plane v2 foundation | Unit-pass; product E2E pending | 2026-06-16 `tests/unit_tests/agent/test_run_ledger_store.py`, `test_run_ledger_api_auth.py`, and `test_orchestrator_integration.py` passed, covering ledger, admin permissions, runtime heartbeat, claim/reconcile, orchestrator persistence, and cancel propagation. |
-| SDK AgentRunner control entities / proxy | Unit-pass | 2026-06-16 SDK agent-runner unit tests passed, covering typed run ledger entities, AgentRunAPIProxy, MCP bridge, runtime manager, and pull API handlers. |
-
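-## High-Value Historical Records
-
-Historical reports have been merged into this status page and the QA guide; separate progress documents are no longer kept. If tracing back is needed later, look first at the original execution reports under `langbot-skills/reports/`.
-
-As of 2026-05-29, local smoke runs have shown:
-
- `local-agent` can go through the pluginized `AgentRunOrchestrator` main path via Pipeline Debug Chat.
- External harness runners can execute through the same `run(event, binding)` path; the current official implementation has converged on direct runner plugins such as ACP / Claude Code / Codex.
-
-These records only show that the local protocol loop works; they do not mean LangBot provides a managed sandbox or OS-level isolation for external harnesses.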
@@ -8,7 +8,7 @@ requires-python = ">=3.11,<4.0"
 dependencies = [
    "aiocqhttp>=1.4.4",
    "aiofiles>=24.1.0",
-    "aiohttp>=3.14.0",
+    "aiohttp>=3.14.1",
    "aioshutil>=1.5",
    "aiosqlite>=0.21.0",
    "anthropic>=0.51.0",
@@ -16,7 +16,7 @@ dependencies = [
    "async-lru>=2.0.5",
    "certifi>=2025.4.26",
    "colorlog~=6.6.0",
-    "cryptography>=46.0.7",
+    "cryptography>=48.0.1",
    "dashscope>=1.25.10",
    "dingtalk-stream>=0.24.0",
    "discord-py>=2.5.2",
@@ -61,9 +61,9 @@ dependencies = [
    "beautifulsoup4>=4.12.3",
    "ebooklib>=0.18",
    "html2text>=2024.2.26",
-    "langchain>=0.2.0",
+    "langchain>=1.3.9",
    "langchain-core>=1.3.3",
-    "langsmith>=0.8.0",
+    "langsmith>=0.8.18",
    "python-multipart>=0.0.27",
    "Mako>=1.3.12",
    "langchain-text-splitters>=1.1.2",
@@ -26,7 +26,7 @@ and LangBot's own Local Agent) working with the LangBot ecosystem.

 ## Quick start (for an AI agent)

-1. Read this README, `AGENTS.md`, and `qa-agent-docs/` to understand the layout.
+1. Read this README, `AGENTS.md`, and `docs/user-guide.md` to understand the layout.
 2. Read `skills/.env` for shared local defaults. On a new machine, copy
   `skills/.env.example` to `skills/.env.local` (gitignored) and override
   machine-specific values there. Never commit secrets.
@@ -48,6 +48,7 @@ bin/lbs env show     # inspect resolved env defaults (redacted)
 bin/lbs env doctor   # diagnose local environment readiness
 bin/lbs case list --ready
 bin/lbs test plan <case-id>
+bin/lbs suite plan langbot-debug-chat-load-gate
 ```

 ## Maintenance rule
@@ -0,0 +1,171 @@
+# LangBot QA Skills User Guide
+
+Use this guide as the first operational path after reading `README.md` and
+`AGENTS.md`.
+
+## 1. Configure Local Inputs
+
+Read `skills/.env`, then create `skills/.env.local` for machine-local values.
+Do not commit `.env.local`, browser profiles, reports, tokens, API keys, OAuth
+state, or provider credentials.
+
+Minimum local fields for live browser QA:
+
+```bash
+LANGBOT_REPO=/path/to/LangBot
+LANGBOT_WEB_REPO=/path/to/LangBot/web
+LANGBOT_BACKEND_URL=http://127.0.0.1:5300
+LANGBOT_FRONTEND_URL=http://127.0.0.1:3000
+LANGBOT_DEV_FRONTEND_URL=http://127.0.0.1:3000
+LANGBOT_BROWSER_PROFILE=/path/to/langbot-browser-profile
+LANGBOT_CHROMIUM_EXECUTABLE=/path/to/chromium-or-playwright-chrome
+LANGBOT_E2E_LOGIN_USER=qa-local@example.com
+```
+
+`LANGBOT_E2E_LOGIN_USER` is a local QA account. The setup automation uses the
+LangBot recovery key from the active checkout to initialize or refresh that
+local account and write a browser `localStorage` token. It does not need the
+user's GitHub or Space credentials.
+
+## 2. Check Readiness
+
+From `skills/`:
+
+```bash
+bin/lbs env show
+bin/lbs env doctor
+bin/lbs validate
+bin/lbs index --check
+```
+
+`env doctor` should report reachable backend and frontend URLs before live
+browser cases are run. Missing Space provider credentials are not a LangBot
+product pass; classify them as `env_issue` and configure the local Space
+provider before measuring Debug Chat performance.
+
+## 3. Start Services
+
+Start the backend from `LANGBOT_REPO`:
+
+```bash
+cd "$LANGBOT_REPO"
+uv run main.py
+```
+
+Start the standalone frontend from `LANGBOT_WEB_REPO` and point it at the
+backend:
+
+```bash
+cd "$LANGBOT_WEB_REPO"
+VITE_API_BASE_URL="$LANGBOT_BACKEND_URL" pnpm dev --host 0.0.0.0
+```
+
+If `VITE_API_BASE_URL` is missing, browser tests can load the Vite page but send
+API requests to the frontend port, which produces false UI failures.
+
+## 4. Prepare User-Path Fixtures
+
+For local-agent Debug Chat cases and the user-path performance gate:
+
+```bash
+node scripts/e2e/ensure-local-agent-pipeline.mjs --write-env
+```
+
+The script:
+
+- refreshes the local QA login and browser token;
+- marks the local wizard as skipped;
+- creates or updates a local QA pipeline;
+- scans Space LLM models, tests candidates, and switches to the first working
+  Space model with tested fallback models;
+- writes `LANGBOT_PIPELINE_URL`, `LANGBOT_PIPELINE_NAME`, and local-agent
+  pipeline/model variables into `skills/.env.local`;
+- returns `env_issue` when no Space model can be scanned or tested.
+
+Useful model controls:
+
+```bash
+LANGBOT_E2E_MODEL_TEST_LIMIT=8
+LANGBOT_E2E_MODEL_FALLBACK_COUNT=3
+LANGBOT_E2E_SKIP_MODEL_UUIDS=uuid-a,uuid-b
+LANGBOT_E2E_SKIP_MODEL_NAMES=model-a,model-b
+LANGBOT_E2E_SCAN_SPACE_MODELS=true
+```
+
+The setup writes a `max-round` value into the pipeline config for compatibility
+with the current runtime: the backend still reads that field directly during
+message truncation. Do not treat it as a long-term QA contract.
+
+## 5. Run Gates
+
+Fast contract gate, no live service required:
+
+```bash
+bin/lbs suite run langbot-performance-contract-gate --run-id langbot-contract-local
+```
+
+Live backend gate:
+
+```bash
+bin/lbs suite run langbot-live-backend-gate --run-id langbot-backend-local
+```
+
+Browser-visible user-path performance gate:
+
+```bash
+bin/lbs suite plan langbot-user-path-performance-gate
+bin/lbs suite run langbot-user-path-performance-gate --run-id langbot-user-path-local --include-manual-check
+```
+
+Controlled Debug Chat message-path load gate (manual/non-required; run fake-provider cases serially when they share `LANGBOT_FAKE_PROVIDER_URL`):
+
+```bash
+bin/lbs suite plan langbot-debug-chat-load-gate
+bin/lbs test run langbot-fake-provider-debug-chat-load --run-id langbot-fake-load-local
+bin/lbs test run langbot-fake-provider-debug-chat-slow-load --run-id langbot-fake-slow-local
+bin/lbs test run langbot-fake-provider-debug-chat-fault-recovery --run-id langbot-fake-fault-local
+bin/lbs test run langbot-space-debug-chat-concurrency-smoke --run-id langbot-space-smoke-local
+```
+
+Cross-pipeline Debug Chat isolation is a separate manual regression gate because
+current releases may fail it due to product bug #2286:
+
+```bash
+bin/lbs suite plan langbot-debug-chat-isolation-gate
+bin/lbs suite run langbot-debug-chat-isolation-gate --run-id langbot-debug-chat-isolation-local --include-manual-check
+```
+
+Start with `langbot-fake-provider-debug-chat-load`. It launches a local
+OpenAI-compatible fake provider, creates the matching provider/model/pipeline,
+then sends concurrent WebSocket Debug Chat messages through the real backend.
+Use `langbot-fake-provider-debug-chat-slow-load` to measure the same path under
+deterministic streaming latency. Use
+`langbot-fake-provider-debug-chat-fault-recovery` to inject bounded provider
+HTTP failures and confirm later Debug Chat requests recover. Use the separate
+`langbot-debug-chat-isolation-gate` to verify that concurrent Debug Chat traffic
+on two pipelines does not leak assistant responses across pipeline boundaries;
+current releases may fail that gate because of #2286, so keep it out of the
+normal load gate until the product fix lands.
+Use `langbot-space-debug-chat-concurrency-smoke` only as a low-volume live
+provider smoke; it includes Space/model/network latency and should be compared
+against the fake-provider baseline before attributing failures to LangBot.
+
+`manual_check` means the agent must confirm the declared preconditions for that
+run window. When setup automation is declared, run output may stop early with
+`env_issue`; fix that environment input before treating the product path as
+measured.
+
+## 6. Read Results
+
+Suite reports live under `skills/reports/`. Evidence lives under
+`skills/reports/evidence/<run-id>/`.
+
+For performance cases, inspect:
+
+- `metrics.json` for p50/p95/p99, error rate, and total duration;
+- `automation-result.json` for threshold decisions and artifacts;
+- `console.log` and `network.log` for frontend/API failures;
+- backend logs for provider, runner, WebSocket, or persistence failures.
+
+Do not call a user-path performance result a LangBot overhead regression until
+provider/tool/network time has been separated or ruled out.
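The p50/p95/p99 and error-rate figures that the runbook lists for `metrics.json` can be reproduced from raw latency samples. The following is a minimal sketch, assuming nearest-rank percentiles; `percentile`, `summarize`, `overheadMs`, and the output field names are hypothetical, and the suite's own calculation is not shown in this PR.

```javascript
// Hypothetical helpers; the suite's own percentile method is not shown in this PR.

// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  // Clamp the 1-based rank into the valid range before indexing.
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Summarize successful-request latencies plus a count of failed requests.
function summarize(latenciesMs, errorCount) {
  const total = latenciesMs.length + errorCount;
  return {
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
    error_rate: total === 0 ? 0 : errorCount / total,
  };
}

// Rough overhead estimate: end-to-end time minus time attributed to the provider.
function overheadMs(endToEndMs, providerMs) {
  return Math.max(0, endToEndMs - providerMs);
}
```

Comparing `overheadMs` between a fake-provider run and a live Space run is one way to check whether a slow p95 comes from the provider rather than from LangBot, which is the separation the runbook asks for before calling a result a regression.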
@@ -48,7 +48,18 @@
    },
    "type": {
      "type": "string",
-      "enum": ["smoke", "regression", "feature", "provider", "exploratory"]
+      "enum": [
+        "smoke",
+        "regression",
+        "feature",
+        "provider",
+        "exploratory",
+        "contract",
+        "performance",
+        "reliability",
+        "chaos",
+        "security"
+      ]
    },
    "priority": {
      "type": "string",
@@ -102,7 +113,11 @@
          "backend_log",
          "frontend_log",
          "api_diagnostic",
-          "filesystem"
+          "filesystem",
+          "metrics",
+          "trace",
+          "profile",
+          "resource_log"
        ]
      },
      "minItems": 1
@@ -188,9 +203,101 @@
      "type": "string",
      "enum": ["person", "group"]
    },
+    "automation_debug_chat_response_p95_ms": {
+      "type": "string"
+    },
+    "automation_debug_chat_max_error_rate": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_requests": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_concurrency": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_timeout_ms": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_response_p95_ms": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_first_response_p95_ms": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_max_error_rate": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_min_error_rate": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_min_error_count": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_min_ok_count": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_min_provider_fault_count": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_expected_prefix": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_prompt_template": {
+      "type": "string"
+    },
+    "automation_debug_chat_load_stream": {
+      "type": "string",
+      "enum": ["0", "1", "false", "true"]
+    },
+    "automation_debug_chat_load_reset": {
+      "type": "string",
+      "enum": ["0", "1", "false", "true"]
+    },
+    "automation_debug_chat_load_fail_on_final_mismatch": {
+      "type": "string",
+      "enum": ["0", "1", "false", "true"]
+    },
+    "automation_fake_provider_response_text": {
+      "type": "string"
+    },
+    "automation_fake_provider_first_token_delay_ms": {
+      "type": "string"
+    },
+    "automation_fake_provider_chunk_delay_ms": {
+      "type": "string"
+    },
+    "automation_fake_provider_chunk_count": {
+      "type": "string"
+    },
+    "automation_fake_provider_fail_first_n": {
+      "type": "string"
+    },
+    "automation_fake_provider_fail_every_n": {
+      "type": "string"
+    },
+    "automation_fake_provider_fault_status": {
+      "type": "string"
+    },
+    "automation_fake_provider_fail_after_first_chunk": {
+      "type": "string",
+      "enum": ["0", "1", "false", "true"]
+    },
+    "automation_fake_provider_dynamic_response": {
+      "type": "string",
+      "enum": ["0", "1", "false", "true"]
+    },
    "automation_filesystem_checks_json": {
      "type": "string"
    },
+    "metrics_thresholds_json": {
+      "type": "string"
+    },
+    "load_profile_json": {
+      "type": "string"
+    },
+    "fault_model_json": {
+      "type": "string"
+    },
    "automation_pipeline_url_env": {
      "type": "string",
      "pattern": "^[A-Z][A-Z0-9_]*$"
@@ -18,7 +18,17 @@
    },
    "type": {
      "type": "string",
-      "enum": ["smoke", "regression", "release_gate", "exploratory"]
+      "enum": [
+        "smoke",
+        "regression",
+        "release_gate",
+        "exploratory",
+        "contract",
+        "performance",
+        "reliability",
+        "chaos",
+        "security"
+      ]
    },
    "priority": {
      "type": "string",
@@ -0,0 +1,205 @@
+#!/usr/bin/env node
+
+import { spawn } from "node:child_process";
+import { mkdir, readFile, writeFile } from "node:fs/promises";
+import { dirname, resolve } from "node:path";
+import { env } from "node:process";
+import {
+  appendLine,
+  ensureEvidence,
+  evidencePaths,
+  loadEnvFiles,
+  redact,
+  writeResult,
+} from "./lib/langbot-e2e.mjs";
+
+const caseId = "ensure-fake-provider-cross-pipelines";
+const DEFAULT_PIPELINE_A_NAME = "LangBot QA Fake Provider Debug Chat A";
+const DEFAULT_PIPELINE_B_NAME = "LangBot QA Fake Provider Debug Chat B";
+
+await loadEnvFiles();
+const paths = evidencePaths(caseId);
+await ensureEvidence(paths);
+
+const writeEnv = process.argv.includes("--write-env");
+const envLocalPath = resolve("skills/.env.local");
+const pipelineAName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME || DEFAULT_PIPELINE_A_NAME;
+const pipelineBName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME || DEFAULT_PIPELINE_B_NAME;
+
+const result = {
+  source: "setup_automation",
+  case_id: caseId,
+  run_id: paths.runId,
+  status: "fail",
+  reason: "",
+  pipeline_a: {
+    name: pipelineAName,
+    id: "",
+    url: "",
+  },
+  pipeline_b: {
+    name: pipelineBName,
+    id: "",
+    url: "",
+  },
+  fake_provider: {
+    url: "",
+    base_url: "",
+    pid: null,
+  },
+  wrote_env: false,
+  evidence: {
+    console_log: paths.consoleLog,
+    automation_result_json: paths.automationResultJson,
+    result_json: paths.resultJson,
+  },
+  evidence_collected: ["api_diagnostic", "filesystem"],
+};
+
+try {
+  console.error(`[langbot-qa] configuring cross-pipeline QA fixtures: pipeline_a="${pipelineAName}", pipeline_b="${pipelineBName}"`);
+  console.error("[langbot-qa] run these fake-provider setup/probe commands serially when they share LANGBOT_FAKE_PROVIDER_URL.");
+  if (pipelineAName === pipelineBName) {
+    throw new Error("LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME and LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME must be different.");
+  }
+
+  const setupA = await runPipelineSetup(pipelineAName, "A");
+  const setupB = await runPipelineSetup(pipelineBName, "B");
+  result.pipeline_a = {
+    name: setupA.pipeline_name || pipelineAName,
+    id: setupA.pipeline_id || "",
+    url: setupA.pipeline_url || "",
+  };
+  result.pipeline_b = {
+    name: setupB.pipeline_name || pipelineBName,
+    id: setupB.pipeline_id || "",
+    url: setupB.pipeline_url || "",
+  };
+  result.fake_provider = {
+    url: setupB.fake_provider?.url || setupA.fake_provider?.url || "",
+    base_url: setupB.fake_provider?.base_url || setupA.fake_provider?.base_url || "",
+    pid: setupB.fake_provider?.pid ?? setupA.fake_provider?.pid ?? null,
+  };
+
+  if (!result.pipeline_a.url || !result.pipeline_b.url || !result.fake_provider.url) {
+    throw new Error("Cross-pipeline fake provider setup did not return both pipeline URLs and provider URL.");
+  }
+
+  if (writeEnv) {
+    await upsertEnvLocal(envLocalPath, {
+      LANGBOT_FAKE_PROVIDER_URL: result.fake_provider.url,
+      LANGBOT_FAKE_PROVIDER_BASE_URL: result.fake_provider.base_url,
+      LANGBOT_FAKE_PROVIDER_PID: result.fake_provider.pid ? String(result.fake_provider.pid) : "",
+      LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL: result.pipeline_a.url,
+      LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME: result.pipeline_a.name,
+      LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL: result.pipeline_b.url,
+      LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME: result.pipeline_b.name,
+    });
+    result.wrote_env = true;
+  }
+
+  result.status = "pass";
+  result.reason = "Fake provider cross-pipeline fixtures are configured.";
+} catch (error) {
+  result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
+  result.reason = safeReason(error.message);
+} finally {
+  await writeResult(paths, result);
+  console.log(JSON.stringify(result, null, 2));
+}
+
+process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
+
+function runPipelineSetup(pipelineName, label) {
+  return new Promise((resolvePromise, rejectPromise) => {
+    const child = spawn(process.execPath, ["scripts/e2e/ensure-fake-provider-pipeline.mjs"], {
+      cwd: resolve("."),
+      env: {
+        ...env,
+        LANGBOT_FAKE_PROVIDER_PIPELINE_NAME: pipelineName,
+        LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS: env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS || "25",
+        LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS: env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS || "10",
+        LANGBOT_FAKE_PROVIDER_CHUNK_COUNT: env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT || "0",
+        LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N: "0",
+        LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N: "0",
+        LANGBOT_FAKE_PROVIDER_FAULT_STATUS: env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS || "500",
+        LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK: "false",
+        LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE: "true",
+      },
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    let stdout = "";
+    let stderr = "";
+    child.stdout.on("data", (chunk) => {
+      const text = chunk.toString();
+      stdout += text;
+      appendLine(paths.consoleLog, `[setup ${label} stdout] ${text.trimEnd()}`).catch(() => {});
+    });
+    child.stderr.on("data", (chunk) => {
+      const text = chunk.toString();
+      stderr += text;
+      appendLine(paths.consoleLog, `[setup ${label} stderr] ${text.trimEnd()}`).catch(() => {});
+    });
+    child.on("error", rejectPromise);
+    child.on("close", (code) => {
+      const parsed = parseJsonOutput(stdout);
+      if (code !== 0 || parsed.status !== "pass") {
+        rejectPromise(new Error(parsed.reason || stderr || `Fake provider pipeline setup ${label} exited with ${code}.`));
+        return;
+      }
+      resolvePromise(parsed);
+    });
+  });
+}
+
+function parseJsonOutput(text) {
+  const trimmed = String(text || "").trim();
+  if (!trimmed) return {};
+  try {
+    return JSON.parse(trimmed);
+  } catch {
+    const start = trimmed.indexOf("{");
+    const end = trimmed.lastIndexOf("}");
+    if (start >= 0 && end > start) {
+      try {
+        return JSON.parse(trimmed.slice(start, end + 1));
+      } catch {
+        return {};
+      }
+    }
+    return {};
+  }
+}
+
+async function upsertEnvLocal(path, updates) {
+  await mkdir(dirname(path), { recursive: true });
+  let text = "";
+  try {
+    text = await readFile(path, "utf8");
+  } catch {
+    text = "";
+  }
+  const lines = text.trim() === "" ? [] : text.replace(/(?:\r?\n)+$/, "").split(/\r?\n/);
+  const seen = new Set();
+  const next = lines.map((line) => {
+    const trimmed = line.trim();
+    const match = trimmed.match(/^([A-Z][A-Z0-9_]*)=/);
+    if (!match || updates[match[1]] === undefined) return line;
+    seen.add(match[1]);
+    return `${match[1]}=${updates[match[1]]}`;
+  });
+  for (const [key, value] of Object.entries(updates)) {
+    if (!seen.has(key)) next.push(`${key}=${value}`);
+  }
+  await writeFile(path, `${next.join("\n").replace(/\n+$/, "")}\n`, "utf8");
+}
+
+function looksLikeEnvIssue(error) {
+  const message = String(error?.message || error || "");
+  return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*(?:not configured|is required)|Could not read recovery_key|Backend did not respond/i.test(message);
+}
+
+function safeReason(value) {
+  return redact(String(value || "")).slice(0, 1000);
+}
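The env-file update in the script above replaces keys that already exist and appends the rest. A minimal sketch of the same idea as a pure function on text, which makes the behavior easy to check without touching `skills/.env.local`; `upsertEnvText` is a hypothetical name, and dropping trailing blank lines before appending is an assumption of this sketch.

```javascript
// Hypothetical pure variant of the env-file upsert; it works on text only
// and performs no file I/O.
function upsertEnvText(text, updates) {
  // Assumption: trailing blank lines are dropped so appended keys stay contiguous.
  const body = text.replace(/(?:\r?\n)+$/, "");
  const lines = body === "" ? [] : body.split(/\r?\n/);
  const seen = new Set();
  const next = lines.map((line) => {
    // Only KEY=value lines with an upper-case key are candidates for replacement.
    const match = line.trim().match(/^([A-Z][A-Z0-9_]*)=/);
    if (!match || updates[match[1]] === undefined) return line;
    seen.add(match[1]);
    return `${match[1]}=${updates[match[1]]}`;
  });
  // Keys that were not already present are appended in insertion order.
  for (const [key, value] of Object.entries(updates)) {
    if (!seen.has(key)) next.push(`${key}=${value}`);
  }
  return `${next.join("\n")}\n`;
}
```

Comments and unrelated keys pass through unchanged, so repeated setup runs only touch the variables they own.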
@@ -0,0 +1,635 @@
+#!/usr/bin/env node
+
+import { spawn } from "node:child_process";
+import { open, readFile, mkdir, writeFile } from "node:fs/promises";
+import { dirname, resolve } from "node:path";
+import { env } from "node:process";
+import {
+  apiJson,
+  ensureEvidence,
+  evidencePaths,
+  loadEnvFiles,
+  redact,
+  resetAndAuthLocalUser,
+  writeResult,
+} from "./lib/langbot-e2e.mjs";
+
+const RUNNER_ID = "local-agent";
+const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
+const DEFAULT_PIPELINE_NAME = "LangBot QA Fake Provider Debug Chat";
+const DEFAULT_PROVIDER_NAME = "LangBot QA Fake OpenAI Provider";
+const QA_RESOURCE_DESCRIPTION = "Managed by LangBot skills QA automation for controlled fake-provider Debug Chat tests. Safe to delete when local QA fixtures are no longer needed.";
+const DEFAULT_MODEL_NAME = "gpt-4o-mini";
+const DEFAULT_REQUESTER = "openai-chat-completions";
+
+const caseId = "ensure-fake-provider-pipeline";
+
+await loadEnvFiles();
+const paths = evidencePaths(caseId);
+await ensureEvidence(paths);
+
+const writeEnv = process.argv.includes("--write-env");
+const frontendUrl = env.LANGBOT_FRONTEND_URL || "";
+const backendUrl = env.LANGBOT_BACKEND_URL || "";
+const envLocalPath = resolve("skills/.env.local");
+const repoRoot = resolve(env.LANGBOT_REPO || "..");
+const fakeStateDir = resolve(env.LANGBOT_FAKE_PROVIDER_STATE_DIR || resolve(repoRoot, ".qa/fake-provider"));
+const fakeStatePath = resolve(fakeStateDir, "state.json");
+const fakeStdoutPath = resolve(fakeStateDir, "fake-provider.stdout.log");
+const fakeStderrPath = resolve(fakeStateDir, "fake-provider.stderr.log");
+const pipelineName = env.LANGBOT_FAKE_PROVIDER_PIPELINE_NAME || DEFAULT_PIPELINE_NAME;
+const providerName = env.LANGBOT_FAKE_PROVIDER_NAME || DEFAULT_PROVIDER_NAME;
+const requester = env.LANGBOT_FAKE_PROVIDER_REQUESTER || DEFAULT_REQUESTER;
+const modelName = env.LANGBOT_FAKE_PROVIDER_MODEL_NAME || DEFAULT_MODEL_NAME;
+
+const result = {
+  source: "automation",
+  case_id: caseId,
+  run_id: paths.runId,
+  status: "fail",
+  reason: "",
+  frontend_url: frontendUrl,
+  backend_url: backendUrl,
+  fake_provider: {
+    url: "",
+    base_url: "",
+    pid: null,
+    reused: false,
+    config: {},
+    state_file: fakeStatePath,
+    stdout_log: fakeStdoutPath,
+    stderr_log: fakeStderrPath,
+  },
+  provider: {
+    uuid: "",
+    name: providerName,
+    requester,
+    created: false,
+    updated: false,
+  },
+  model: {
+    uuid: "",
+    name: modelName,
+    created: false,
+    updated: false,
+    test_status: "not_run",
+    test_reason: "",
+  },
+  pipeline_id: "",
+  pipeline_name: pipelineName,
+  pipeline_url: "",
+  created: false,
+  updated: false,
+  wrote_env: false,
+  evidence: {
+    console_log: paths.consoleLog,
+    network_log: paths.networkLog,
+    automation_result_json: paths.automationResultJson,
+    result_json: paths.resultJson,
+  },
+  evidence_collected: ["api_diagnostic", "network", "filesystem"],
+};
+
+try {
+  console.error(`[langbot-qa] configuring QA-owned fake-provider fixtures: provider="${providerName}", pipeline="${pipelineName}"`);
+  console.error("[langbot-qa] this setup may create or update local QA provider/model/pipeline resources on the selected backend.");
+  if (!backendUrl) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_BACKEND_URL is not configured.");
+  }
+  if (!frontendUrl) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_FRONTEND_URL is not configured.");
+  }
+
+  const fakeProvider = await ensureFakeProvider();
+  const setupConfig = await configureFakeProvider(fakeProvider.url, healthyFakeProviderConfig(), true);
+  result.fake_provider = {
+    ...result.fake_provider,
+    ...fakeProvider,
+    config: setupConfig.config || healthyFakeProviderConfig(),
+  };
+
+  const user = env.LANGBOT_E2E_LOGIN_USER || "";
+  const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
+  if (!user) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the fake provider pipeline.");
+  }
+
+  const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
+  const wizard = await skipWizard({ backendUrl, token: auth.token });
+  if (wizard.status !== "pass") {
+    result.status = "fail";
+    throw new Error(wizard.reason || "Failed to mark the local QA wizard as skipped.");
+  }
+
+  const provider = await ensureProvider({
+    backendUrl,
+    token: auth.token,
+    name: providerName,
+    requester,
+    baseUrl: fakeProvider.base_url,
+  });
+  result.provider = provider;
+
+  const model = await ensureModel({
+    backendUrl,
+    token: auth.token,
+    providerUuid: provider.uuid,
+    name: modelName,
+  });
+  result.model = model;
+
+  const pipeline = await ensurePipeline({
+    backendUrl,
+    token: auth.token,
+    name: pipelineName,
+    modelUuid: model.uuid,
+  });
+  Object.assign(result, pipeline);
+  result.pipeline_url = `${frontendUrl.replace(/\/$/, "")}/home/pipelines?id=${encodeURIComponent(pipeline.pipeline_id)}`;
+
+  const runConfig = await configureFakeProvider(fakeProvider.url, targetFakeProviderConfig(), true);
+  result.fake_provider.config = runConfig.config || targetFakeProviderConfig();
+
+  if (writeEnv) {
+    await upsertEnvLocal(envLocalPath, {
+      LANGBOT_E2E_LOGIN_USER: user,
+      LANGBOT_FAKE_PROVIDER_URL: fakeProvider.url,
+      LANGBOT_FAKE_PROVIDER_BASE_URL: fakeProvider.base_url,
+      LANGBOT_FAKE_PROVIDER_PID: fakeProvider.pid ? String(fakeProvider.pid) : "",
+      LANGBOT_FAKE_PROVIDER_PROVIDER_UUID: provider.uuid,
+      LANGBOT_FAKE_PROVIDER_MODEL_UUID: model.uuid,
+      LANGBOT_FAKE_PROVIDER_PIPELINE_URL: result.pipeline_url,
+      LANGBOT_FAKE_PROVIDER_PIPELINE_NAME: pipelineName,
+    });
+    result.wrote_env = true;
+  }
+
+  result.status = "pass";
+  result.reason = `Fake provider pipeline is configured with ${requester}/${modelName}.`;
+} catch (error) {
+  result.status = result.status === "env_issue" ? "env_issue" : "fail";
+  result.reason = result.reason || safeReason(error.message);
+} finally {
+  await writeResult(paths, result);
+  console.log(JSON.stringify(result, null, 2));
+}
+
+process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);
+
+async function ensureFakeProvider() {
+  const envUrl = normalizeProviderRootUrl(env.LANGBOT_FAKE_PROVIDER_URL || "");
+  if (envUrl && await fakeProviderHealthy(envUrl) && await fakeProviderConfigurable(envUrl)) {
+    return {
+      url: envUrl,
+      base_url: `${envUrl}/v1`,
+      pid: null,
+      reused: true,
+    };
+  }
+
+  const state = await readState(fakeStatePath);
+  const stateUrl = normalizeProviderRootUrl(state.url || "");
+  if (stateUrl && await fakeProviderHealthy(stateUrl)) {
+    if (await fakeProviderConfigurable(stateUrl)) {
+      return {
+        url: stateUrl,
+        base_url: state.base_url || `${stateUrl}/v1`,
+        pid: Number.isInteger(state.pid) ? state.pid : null,
+        reused: true,
+      };
+    }
+    if (Number.isInteger(state.pid)) await stopProcess(state.pid);
+  }
+
+  await mkdir(fakeStateDir, { recursive: true });
+  await writeFile(fakeStatePath, `${JSON.stringify({ status: "starting", started_at: new Date().toISOString() }, null, 2)}\n`, "utf8");
+  const stdout = await open(fakeStdoutPath, "a");
+  const stderr = await open(fakeStderrPath, "a");
+  const scriptPath = resolve("scripts/e2e/fake-openai-provider.mjs");
+  const host = env.LANGBOT_FAKE_PROVIDER_HOST || "127.0.0.1";
+  const port = env.LANGBOT_FAKE_PROVIDER_PORT || "0";
+  const child = spawn(process.execPath, [
+    scriptPath,
+    `--host=${host}`,
+    `--port=${port}`,
+    `--state-file=${fakeStatePath}`,
+  ], {
+    cwd: resolve("."),
+    detached: true,
+    env: {
+      ...env,
+      LANGBOT_FAKE_PROVIDER_MODEL_NAME: modelName,
+    },
+    stdio: ["ignore", stdout.fd, stderr.fd],
+  });
+  child.unref();
+  await stdout.close();
+  await stderr.close();
+
+  const started = await waitForFakeProviderState(fakeStatePath, child.pid, 10_000);
+  if (!started.url || !await fakeProviderHealthy(started.url) || !await fakeProviderConfigurable(started.url)) {
+    throw new Error(`Fake provider did not become healthy. See ${fakeStderrPath}`);
+  }
+
+  return {
+    url: started.url,
+    base_url: started.base_url || `${started.url}/v1`,
+    pid: child.pid ?? started.pid ?? null,
+    reused: false,
+  };
+}
+
+async function configureFakeProvider(rootUrl, config, resetRequestCount) {
+  const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
+    method: "POST",
+    headers: { "content-type": "application/json" },
+    body: JSON.stringify({
+      config,
+      reset_request_count: resetRequestCount,
+    }),
+    signal: AbortSignal.timeout(3000),
+  });
+  const json = await response.json().catch(() => ({}));
+  if (!response.ok || json.ok !== true) {
+    throw new Error(`Fake provider config failed with HTTP ${response.status}.`);
+  }
+  return json;
+}
+
+async function fakeProviderHealthy(rootUrl) {
+  try {
+    const response = await fetch(`${rootUrl.replace(/\/$/, "")}/healthz`, {
+      signal: AbortSignal.timeout(2000),
+    });
+    if (!response.ok) return false;
+    const json = await response.json().catch(() => ({}));
+    return json.ok === true;
+  } catch {
+    return false;
+  }
+}
+
+async function fakeProviderConfigurable(rootUrl) {
+  try {
+    const response = await fetch(`${rootUrl.replace(/\/$/, "")}/__qa/config`, {
+      signal: AbortSignal.timeout(2000),
+    });
+    if (!response.ok) return false;
+    const json = await response.json().catch(() => ({}));
+    return json.ok === true && json.config && typeof json.config === "object";
+  } catch {
+    return false;
+  }
+}
+
+async function stopProcess(pid) {
+  try {
+    process.kill(pid, "SIGTERM");
+  } catch {
+    return;
+  }
+  await sleep(500);
+}
+
+async function waitForFakeProviderState(path, expectedPid, timeoutMs) {
+  const startedAt = Date.now();
+  let lastState = {};
+  while (Date.now() - startedAt < timeoutMs) {
+    const state = await readState(path);
+    if (state.url && (!expectedPid || state.pid === expectedPid)) return state;
+    lastState = state;
+    await sleep(150);
+  }
+  return lastState;
+}
+
+async function readState(path) {
+  try {
+    return JSON.parse(await readFile(path, "utf8"));
+  } catch {
+    return {};
+  }
+}
+
+function normalizeProviderRootUrl(value) {
+  const trimmed = String(value || "").trim().replace(/\/$/, "");
+  return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
+}
+
+function healthyFakeProviderConfig() {
+  return {
+    response_text: "OK",
+    first_token_delay_ms: 25,
+    chunk_delay_ms: 10,
+    chunk_count: 0,
+    fault_status: 500,
+    fail_first_n: 0,
+    fail_every_n: 0,
+    fail_after_first_chunk: false,
+    dynamic_response: true,
+  };
+}
+
+function targetFakeProviderConfig() {
+  return {
+    response_text: env.LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT || "OK",
+    first_token_delay_ms: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS, 25),
+    chunk_delay_ms: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS, 10),
+    chunk_count: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT, 0),
+    fault_status: httpFaultStatus(env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS, 500),
+    fail_first_n: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N, 0),
+    fail_every_n: nonNegativeInteger(env.LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N, 0),
+    fail_after_first_chunk: envBool(env.LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK, false),
+    dynamic_response: envBool(env.LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE, true),
+  };
+}
+
+async function skipWizard({ backendUrl, token }) {
+  const response = await apiJson(backendUrl, "/api/v1/system/wizard/completed", {
+    method: "POST",
+    token,
+    body: { status: "skipped" },
+  });
+  const ok = response.status < 400 && response.json.code === 0;
+  return {
+    status: ok ? "pass" : "fail",
+    http_status: response.status,
+    code: response.json.code ?? null,
+    reason: ok ? "Wizard marked skipped for local QA." : response.json.msg || "Wizard status update failed.",
+  };
+}
+
+async function ensureProvider({ backendUrl, token, name, requester, baseUrl }) {
+  const list = await apiJson(backendUrl, "/api/v1/provider/providers", { token });
+  if (isApiFailure(list)) {
+    throw new Error(list.json.msg || "Failed to list providers.");
+  }
+  const providers = list.json.data?.providers || [];
+  const existing = providers.find((provider) => (
+    provider.name === name
+      || (provider.requester === requester && String(provider.base_url || "").replace(/\/$/, "") === baseUrl.replace(/\/$/, ""))
+  ));
+  const body = {
+    name,
+    requester,
+    base_url: baseUrl,
+    api_keys: [env.LANGBOT_FAKE_PROVIDER_API_KEY || "langbot-fake-provider-key"],
+  };
+
+  if (existing?.uuid) {
+    const update = await apiJson(backendUrl, `/api/v1/provider/providers/${encodeURIComponent(existing.uuid)}`, {
+      method: "PUT",
+      token,
+      body,
+    });
+    if (isApiFailure(update)) {
+      throw new Error(update.json.msg || "Failed to update fake provider.");
+    }
+    return {
+      uuid: existing.uuid,
+      name,
+      requester,
+      created: false,
+      updated: true,
+    };
+  }
+
+  const create = await apiJson(backendUrl, "/api/v1/provider/providers", {
+    method: "POST",
+    token,
+    body,
+  });
+  const uuid = create.json.data?.uuid || "";
+  if (isApiFailure(create) || !uuid) {
+    throw new Error(create.json.msg || "Failed to create fake provider.");
+  }
+  return {
+    uuid,
+    name,
+    requester,
+    created: true,
+    updated: false,
+  };
+}
+
+async function ensureModel({ backendUrl, token, providerUuid, name }) {
+  const list = await apiJson(backendUrl, `/api/v1/provider/models/llm?provider_uuid=${encodeURIComponent(providerUuid)}`, { token });
+  if (isApiFailure(list)) {
+    throw new Error(list.json.msg || "Failed to list fake provider models.");
+  }
+  const models = list.json.data?.models || [];
+  const existing = models.find((model) => model.name === name);
+  const body = {
+    name,
+    provider_uuid: providerUuid,
+    abilities: [],
+    context_length: positiveInteger(env.LANGBOT_FAKE_PROVIDER_CONTEXT_LENGTH, 8192),
+    extra_args: {},
+    prefered_ranking: 0,
+  };
+  let modelUuid = existing?.uuid || "";
+  let created = false;
+  let updated = false;
+
+  if (modelUuid) {
+    const update = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}`, {
+      method: "PUT",
+      token,
+      body,
+    });
+    if (isApiFailure(update)) {
+      throw new Error(update.json.msg || "Failed to update fake provider model.");
+    }
+    updated = true;
+  } else {
+    const create = await apiJson(backendUrl, "/api/v1/provider/models/llm", {
+      method: "POST",
+      token,
+      body,
+    });
+    modelUuid = create.json.data?.uuid || "";
+    if (isApiFailure(create) || !modelUuid) {
+      throw new Error(create.json.msg || "Failed to create fake provider model.");
+    }
+    created = true;
+  }
+
+  const test = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}/test`, {
+    method: "POST",
+    token,
+    body: { extra_args: {} },
+  });
+  if (isApiFailure(test)) {
+    throw new Error(safeReason(test.json.msg || test.json.message || "Fake provider model test failed."));
+  }
+
+  return {
+    uuid: modelUuid,
+    name,
+    created,
+    updated,
+    test_status: "pass",
+    test_reason: "",
+  };
+}
+
+async function ensurePipeline({ backendUrl, token, name, modelUuid }) {
+  const list = await apiJson(backendUrl, "/api/v1/pipelines", { token });
+  if (isApiFailure(list)) {
+    throw new Error(list.json.msg || "Failed to list pipelines.");
+  }
+  const pipelines = list.json.data?.pipelines || [];
+  let pipeline = pipelines.find((item) => item.name === name) || null;
+  let created = false;
+
+  if (!pipeline) {
+    const create = await apiJson(backendUrl, "/api/v1/pipelines", {
+      method: "POST",
+      token,
+      body: {
+        name,
+        description: QA_RESOURCE_DESCRIPTION,
+        emoji: "QA",
+      },
+    });
+    const pipelineId = create.json.data?.uuid || "";
+    if (isApiFailure(create) || !pipelineId) {
+      throw new Error(create.json.msg || "Failed to create fake provider pipeline.");
+    }
+    created = true;
+    pipeline = { uuid: pipelineId };
+  }
+
+  const loaded = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.uuid)}`, { token });
+  pipeline = loaded.json.data?.pipeline || null;
+  if (isApiFailure(loaded) || !pipeline?.uuid) {
+    throw new Error(loaded.json.msg || "Failed to load fake provider pipeline.");
+  }
+
+  const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {};
+  const ai = config.ai && typeof config.ai === "object" ? config.ai : {};
+  const existingLocalAgentConfig = ai["local-agent"] && typeof ai["local-agent"] === "object"
+    ? ai["local-agent"]
+    : {};
+  const localAgentConfig = {
+    timeout: 60,
+    prompt: [{ role: "system", content: "You are a deterministic QA assistant. Reply exactly as instructed." }],
+    "remove-think": false,
+    "knowledge-bases": [],
+    "box-session-id-template": "{launcher_type}_{launcher_id}",
+    "retrieval-top-k": 5,
+    "rerank-model": "",
+    "rerank-top-k": 5,
+    "max-tool-iterations": 20,
+    "tool-execution-mode": "parallel",
+    "max-tool-result-chars": 20000,
+    "context-history-fetch-limit": 20,
+    "context-window-tokens": 8192,
+    "context-reserve-tokens": 1024,
+    "context-keep-recent-tokens": 2048,
+    "context-summary-tokens": 1024,
+    ...existingLocalAgentConfig,
+    // The backend's truncation logic still reads "max-round" directly, so keep it a positive integer (default 10).
+    "max-round": positiveInteger(existingLocalAgentConfig["max-round"], 10),
+    model: {
+      primary: modelUuid,
+      fallbacks: [],
+    },
+  };
+  const updatedConfig = {
+    ...config,
+    ai: {
+      ...ai,
+      runner: {
+        ...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}),
+        id: RUNNER_ID,
+        runner: RUNNER_ID,
+        "expire-time": 0,
+      },
+      "local-agent": localAgentConfig,
+    },
+  };
+
+  const update = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.uuid)}`, {
+    method: "PUT",
+    token,
+    body: {
+      name,
+      description: QA_RESOURCE_DESCRIPTION,
+      emoji: "QA",
+      config: updatedConfig,
+    },
+  });
+  if (isApiFailure(update)) {
+    throw new Error(update.json.msg || "Failed to update fake provider pipeline.");
+  }
+
+  return {
+    pipeline_id: pipeline.uuid,
+    pipeline_name: name,
+    created,
+    updated: true,
+  };
+}
+
+function isApiFailure(response) {
+  return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
+}
+
+function positiveInteger(value, fallback) {
+  const parsed = Number(value);
+  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
+}
+
+function nonNegativeInteger(value, fallback) {
+  const parsed = Number(value);
+  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
+}
+
+function httpFaultStatus(value, fallback) {
+  const parsed = Number(value);
+  return Number.isInteger(parsed) && parsed >= 400 && parsed <= 599 ? parsed : fallback;
+}
+
+function envBool(value, fallback) {
+  if (value === undefined || value === "") return fallback;
+  if (/^(1|true|yes|on)$/i.test(String(value))) return true;
+  if (/^(0|false|no|off)$/i.test(String(value))) return false;
+  return fallback;
+}
+
+function sleep(ms) {
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}
+
+function safeReason(value) {
+  return redact(String(value || "")).slice(0, 1000);
+}
+
+async function upsertEnvLocal(path, updates) {
+  await mkdir(dirname(path), { recursive: true });
+  let text = "";
+  try {
+    text = await readFile(path, "utf8");
+  } catch {
+    text = "";
+  }
+  const lines = text ? text.replace(/\r?\n$/, "").split(/\r?\n/) : [];
+  const seen = new Set();
+  const next = lines.map((line) => {
+    const trimmed = line.trim();
+    const equals = trimmed.indexOf("=");
+    if (equals <= 0 || trimmed.startsWith("#")) return line;
+    const key = trimmed.slice(0, equals).trim();
+    if (!(key in updates)) return line;
+    seen.add(key);
+    return `${key}=${updates[key]}`;
+  });
+  for (const [key, value] of Object.entries(updates)) {
+    if (!seen.has(key)) next.push(`${key}=${value}`);
+  }
+  await writeFile(path, `${next.filter((line, index) => line !== "" || index < next.length - 1).join("\n")}\n`, "utf8");
+}
@@ -10,6 +10,7 @@ import {
  ensureEvidence,
  evidencePaths,
  loadEnvFiles,
+  redact,
  resetAndAuthLocalUser,
  safeScreenshot,
  setBrowserToken,
@@ -17,9 +18,12 @@ import {
  writeResult,
 } from "./lib/langbot-e2e.mjs";

-const RUNNER_ID = "plugin:langbot/local-agent/default";
+const RUNNER_ID = "local-agent";
+const SPACE_PROVIDER_UUID = "00000000-0000-0000-0000-000000000000";
 const DEFAULT_PIPELINE_NAME = "Agent QA Local Agent Debug Chat";
 const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
+const DEFAULT_MODEL_TEST_LIMIT = 8;
+const DEFAULT_MODEL_FALLBACK_COUNT = 3;
 const caseId = "ensure-local-agent-pipeline";

 await loadEnvFiles();
@@ -45,11 +49,18 @@ const result = {
  pipeline_url: "",
  runner_id: RUNNER_ID,
  selected_model_id: "",
+  selected_model_name: "",
+  fallback_model_ids: [],
  model_count: 0,
+  space_model_count: 0,
+  scanned_space_model_count: 0,
+  tested_model_count: 0,
+  model_tests: [],
  created: false,
  updated: false,
  wrote_env: false,
  auth: null,
+  wizard: null,
  browser_token_check: null,
  page_signal: "",
  evidence: {
@@ -71,6 +82,7 @@ try {
  const user = env.LANGBOT_E2E_LOGIN_USER || "";
  const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
  if (!user) {
+    result.status = "env_issue";
    throw new Error("LANGBOT_E2E_LOGIN_USER is required so this setup can create/update the pipeline via backend API.");
  }

@@ -81,6 +93,13 @@ try {
    backend_token_check: auth.check,
  };

+  const wizard = await skipWizard({ backendUrl, token: auth.token });
+  result.wizard = wizard;
+  if (wizard.status !== "pass") {
+    result.status = "fail";
+    throw new Error(wizard.reason || "Failed to mark the local QA wizard as skipped.");
+  }
+
  const prepared = await ensureLocalAgentPipeline({
    backendUrl,
    token: auth.token,
@@ -99,6 +118,10 @@ try {
      LANGBOT_PIPELINE_NAME: result.pipeline_name || pipelineName,
      LANGBOT_LOCAL_AGENT_PIPELINE_URL: result.pipeline_url,
      LANGBOT_LOCAL_AGENT_PIPELINE_NAME: result.pipeline_name || pipelineName,
+      ...(result.selected_model_id ? {
+        LANGBOT_LOCAL_AGENT_MODEL_UUID: result.selected_model_id,
+        LANGBOT_E2E_MODEL_UUID: result.selected_model_id,
+      } : {}),
    });
    result.wrote_env = true;
  }
@@ -127,6 +150,21 @@ try {

 process.exit(result.status === "pass" ? 0 : result.status === "env_issue" ? 2 : 1);

+async function skipWizard({ backendUrl, token }) {
+  const response = await apiJson(backendUrl, "/api/v1/system/wizard/completed", {
+    method: "POST",
+    token,
+    body: { status: "skipped" },
+  });
+  const ok = response.status < 400 && response.json.code === 0;
+  return {
+    status: ok ? "pass" : "fail",
+    http_status: response.status,
+    code: response.json.code ?? null,
+    reason: ok ? "Wizard marked skipped for local QA." : response.json.msg || "Wizard status update failed.",
+  };
+}
+
 async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runnerId }) {
  const [pipelineList, modelList] = await Promise.all([
    apiJson(backendUrl, "/api/v1/pipelines", { token }),
@@ -149,7 +187,19 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
  }

  const models = modelList.json.data?.models || [];
-  const selectedModel = models.find((model) => model.uuid) || null;
+  const skippedModelIds = new Set(
+    String(env.LANGBOT_E2E_SKIP_MODEL_UUIDS || "")
+      .split(",")
+      .map((item) => item.trim())
+      .filter(Boolean),
+  );
+  const skippedModelNames = new Set(
+    String(env.LANGBOT_E2E_SKIP_MODEL_NAMES || "")
+      .split(",")
+      .map((item) => item.trim())
+      .filter(Boolean),
+  );
+  const spaceModels = models.filter((model) => isSpaceModel(model) && !skippedModelIds.has(model.uuid) && !skippedModelNames.has(model.name));
  const pipelines = pipelineList.json.data?.pipelines || [];
  let pipeline = pipelines.find((item) => item.name === pipelineName) || null;
  let created = false;
@@ -170,6 +220,7 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
        reason: createdResponse.json.msg || "Failed to create pipeline.",
        create_status: createdResponse.status,
        model_count: models.length,
+        space_model_count: spaceModels.length,
      };
    }
    const pipelineId = createdResponse.json.data?.uuid || "";
@@ -183,6 +234,7 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
      status: "fail",
      reason: "Pipeline was not created or resolved.",
      model_count: models.length,
+      space_model_count: spaceModels.length,
    };
  }

@@ -194,27 +246,37 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
      get_status: loaded.status,
      pipeline_id: pipeline.uuid,
      model_count: models.length,
+      space_model_count: spaceModels.length,
    };
  }
  pipeline = loaded.json.data.pipeline;

  const config = pipeline.config && typeof pipeline.config === "object" ? pipeline.config : {};
  const ai = config.ai && typeof config.ai === "object" ? config.ai : {};
-  const runnerConfig = ai.runner_config && typeof ai.runner_config === "object" ? ai.runner_config : {};
-  const rawExistingLocalAgentConfig = runnerConfig[runnerId] && typeof runnerConfig[runnerId] === "object"
-    ? runnerConfig[runnerId]
+  const rawExistingLocalAgentConfig = ai["local-agent"] && typeof ai["local-agent"] === "object"
+    ? ai["local-agent"]
    : {};
  const existingLocalAgentConfig = rawExistingLocalAgentConfig;
  const existingModel = existingLocalAgentConfig.model && typeof existingLocalAgentConfig.model === "object"
    ? existingLocalAgentConfig.model
    : {};
  const requestedModelId = env.LANGBOT_LOCAL_AGENT_MODEL_UUID || env.LANGBOT_E2E_MODEL_UUID || "";
-  const selectedModelId = requestedModelId || existingModel.primary || selectedModel?.uuid || "";
+  const selected = await selectWorkingSpaceModel({
+    backendUrl,
+    token,
+    models,
+    skippedModelIds,
+    skippedModelNames,
+    requestedModelId,
+    existingModelId: existingModel.primary || "",
+  });
+  const selectedModelId = selected.selected_model_id || "";
  const localAgentConfig = {
    timeout: 300,
    prompt: [{ role: "system", content: "You are a helpful assistant." }],
    "remove-think": false,
    "knowledge-bases": [],
+    "box-session-id-template": "{launcher_type}_{launcher_id}",
    "retrieval-top-k": 5,
    "rerank-model": "",
    "rerank-top-k": 5,
@@ -227,9 +289,11 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
    "context-keep-recent-tokens": 20000,
    "context-summary-tokens": 8000,
    ...existingLocalAgentConfig,
+    // The backend's truncation logic still reads "max-round" directly, so keep it a positive integer (default 10).
+    "max-round": positiveInteger(existingLocalAgentConfig["max-round"], 10),
    model: {
      primary: selectedModelId,
-      fallbacks: requestedModelId ? [] : Array.isArray(existingModel.fallbacks) ? existingModel.fallbacks : [],
+      fallbacks: selected.fallback_model_ids || [],
    },
  };
  const updatedConfig = {
@@ -239,12 +303,10 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
      runner: {
        ...(ai.runner && typeof ai.runner === "object" ? ai.runner : {}),
        id: runnerId,
+        runner: runnerId,
        "expire-time": 0,
      },
-      runner_config: {
-        ...runnerConfig,
-        [runnerId]: localAgentConfig,
-      },
+      "local-agent": localAgentConfig,
    },
  };

@@ -265,19 +327,31 @@ async function ensureLocalAgentPipeline({ backendUrl, token, pipelineName, runne
      update_status: updateResponse.status,
      pipeline_id: pipeline.uuid,
      model_count: models.length,
+      space_model_count: spaceModels.length,
+      scanned_space_model_count: selected.scanned_space_model_count,
+      tested_model_count: selected.tested_model_count,
+      model_tests: selected.model_tests,
      selected_model_id: selectedModelId,
+      selected_model_name: selected.selected_model_name,
+      fallback_model_ids: selected.fallback_model_ids,
    };
  }

  return {
    status: selectedModelId ? "pass" : "env_issue",
    reason: selectedModelId
-      ? "Local-agent pipeline is configured for Debug Chat."
-      : "Pipeline was created but no LLM model is configured in this LangBot instance.",
+      ? `Local-agent pipeline is configured for Debug Chat with Space model ${selected.selected_model_name || selectedModelId} and ${selected.fallback_model_ids.length} fallback(s).`
+      : selected.reason || "No working Space LLM model is configured in this LangBot instance.",
    pipeline_id: pipeline.uuid,
-    pipeline_name: pipeline.name,
+    pipeline_name: pipelineName,
    model_count: models.length,
+    space_model_count: spaceModels.length,
+    scanned_space_model_count: selected.scanned_space_model_count,
+    tested_model_count: selected.tested_model_count,
+    model_tests: selected.model_tests,
    selected_model_id: selectedModelId,
+    selected_model_name: selected.selected_model_name,
+    fallback_model_ids: selected.fallback_model_ids,
    created,
    updated: true,
  };
@@ -287,6 +361,229 @@ function isApiFailure(response) {
  return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
 }

+function isSpaceModel(model) {
+  const provider = model?.provider && typeof model.provider === "object" ? model.provider : {};
+  return model?.provider_uuid === SPACE_PROVIDER_UUID
+    || provider.uuid === SPACE_PROVIDER_UUID
+    || provider.requester === "space-chat-completions"
+    || provider.name === "LangBot Models";
+}
+
+async function selectWorkingSpaceModel({
+  backendUrl,
+  token,
+  models,
+  skippedModelIds,
+  skippedModelNames,
+  requestedModelId,
+  existingModelId,
+}) {
+  const modelTests = [];
+  const testLimit = positiveInteger(env.LANGBOT_E2E_MODEL_TEST_LIMIT, DEFAULT_MODEL_TEST_LIMIT);
+  const fallbackCount = positiveInteger(env.LANGBOT_E2E_MODEL_FALLBACK_COUNT, DEFAULT_MODEL_FALLBACK_COUNT);
+  const workingModels = [];
+  const spaceModels = rankModels(models.filter((model) => (
+    model.uuid
+      && isSpaceModel(model)
+      && !skippedModelIds.has(model.uuid)
+      && !skippedModelNames.has(model.name)
+  )));
+  const requestedModel = requestedModelId
+    ? spaceModels.find((model) => model.uuid === requestedModelId) || null
+    : null;
+  const existingModel = existingModelId
+    ? spaceModels.find((model) => model.uuid === existingModelId) || null
+    : null;
+  const candidates = uniqueCandidates([
+    ...(requestedModel ? [existingCandidate(requestedModel, "requested")] : []),
+    ...(existingModel ? [existingCandidate(existingModel, "existing-pipeline")] : []),
+    ...spaceModels.map((model) => existingCandidate(model, "configured-space")),
+  ]);
+
+  let scanResult = { status: "skipped", models: [], reason: "" };
+  if (env.LANGBOT_E2E_SCAN_SPACE_MODELS !== "false") {
+    scanResult = await scanSpaceModels({ backendUrl, token });
+    if (scanResult.status === "pass") {
+      const knownNames = new Set(spaceModels.map((model) => model.name));
+      candidates.push(...scanResult.models
+        .filter((model) => model.name && !knownNames.has(model.name) && !skippedModelNames.has(model.name))
+        .map((model) => scannedCandidate(model)));
+    }
+  }
+
+  const unique = uniqueCandidates(candidates);
+  for (const candidate of unique.slice(0, testLimit)) {
+    const test = await ensureAndTestModel({ backendUrl, token, candidate });
+    modelTests.push(test);
+    if (test.status === "pass" && test.model_uuid) {
+      workingModels.push(test);
+      if (workingModels.length >= fallbackCount + 1) break;
+    }
+  }
+
+  if (workingModels.length > 0) {
+    const [primary, ...fallbacks] = workingModels;
+    return {
+      status: "pass",
+      reason: "",
+      selected_model_id: primary.model_uuid,
+      selected_model_name: primary.model_name,
+      fallback_model_ids: fallbacks.map((model) => model.model_uuid),
+      scanned_space_model_count: scanResult.models.length,
+      tested_model_count: modelTests.length,
+      model_tests: modelTests,
+    };
+  }
+
+  const baseReason = unique.length === 0
+    ? scanResult.reason || "No Space LLM model candidates are available."
+    : `No working Space LLM model found after testing ${modelTests.length} candidate(s).`;
+  return {
+    status: "env_issue",
+    reason: requestedModelId && !requestedModel
+      ? `Requested Space LLM model ${requestedModelId} is missing or skipped; ${baseReason}`
+      : baseReason,
+    selected_model_id: "",
+    selected_model_name: "",
+    fallback_model_ids: [],
+    scanned_space_model_count: scanResult.models.length,
+    tested_model_count: modelTests.length,
+    model_tests: modelTests,
+  };
+}
+
+async function scanSpaceModels({ backendUrl, token }) {
+  const response = await apiJson(
+    backendUrl,
+    `/api/v1/provider/providers/${encodeURIComponent(SPACE_PROVIDER_UUID)}/scan-models?type=llm`,
+    { token },
+  );
+  if (isApiFailure(response)) {
+    return {
+      status: "env_issue",
+      models: [],
+      reason: safeReason(response.json.msg || response.json.message || "Failed to scan Space LLM models."),
+    };
+  }
+  return {
+    status: "pass",
+    models: response.json.data?.models || [],
+    reason: "",
+  };
+}
+
+async function ensureAndTestModel({ backendUrl, token, candidate }) {
+  let modelUuid = candidate.uuid || "";
+  let created = false;
+  if (!modelUuid) {
+    const create = await apiJson(backendUrl, "/api/v1/provider/models/llm", {
+      method: "POST",
+      token,
+      body: {
+        name: candidate.name,
+        provider_uuid: SPACE_PROVIDER_UUID,
+        abilities: candidate.abilities || [],
+        context_length: candidate.context_length ?? null,
+        extra_args: {},
+        prefered_ranking: positiveInteger(candidate.prefered_ranking, 0),
+      },
+    });
+    modelUuid = create.json.data?.uuid || "";
+    if (isApiFailure(create) || !modelUuid) {
+      return modelTestResult(candidate, {
+        status: "fail",
+        reason: safeReason(create.json.msg || "Failed to create scanned Space model."),
+        http_status: create.status,
+      });
+    }
+    created = true;
+  }
+
+  const test = await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}/test`, {
+    method: "POST",
+    token,
+    body: { extra_args: {} },
+  });
+  const passed = !isApiFailure(test);
+  if (!passed && created) {
+    await apiJson(backendUrl, `/api/v1/provider/models/llm/${encodeURIComponent(modelUuid)}`, {
+      method: "DELETE",
+      token,
+    }).catch(() => {});
+  }
+  return modelTestResult(candidate, {
+    status: passed ? "pass" : "fail",
+    reason: passed ? "" : safeReason(test.json.msg || test.json.message || "Space model test failed."),
+    http_status: test.status,
+    model_uuid: modelUuid,
+    created,
+  });
+}
+
+function modelTestResult(candidate, details) {
+  return {
+    source: candidate.source,
+    model_uuid: details.model_uuid || candidate.uuid || "",
+    model_name: candidate.name,
+    status: details.status,
+    reason: details.reason || "",
+    http_status: details.http_status ?? null,
+    created: Boolean(details.created),
+  };
+}
+
+function existingCandidate(model, source) {
+  return {
+    source,
+    uuid: model.uuid,
+    name: model.name,
+    abilities: model.abilities || [],
+    context_length: model.context_length,
+    prefered_ranking: model.prefered_ranking,
+  };
+}
+
+function scannedCandidate(model) {
+  return {
+    source: "scanned-space",
+    uuid: "",
+    name: model.name || model.id,
+    abilities: model.abilities || [],
+    context_length: model.context_length,
+    prefered_ranking: model.prefered_ranking,
+  };
+}
+
+function uniqueCandidates(candidates) {
+  const seen = new Set();
+  const result = [];
+  for (const candidate of candidates) {
+    const key = candidate.uuid ? `uuid:${candidate.uuid}` : `name:${candidate.name}`;
+    if (!candidate.name || seen.has(key)) continue;
+    seen.add(key);
+    result.push(candidate);
+  }
+  return result;
+}
+
+function rankModels(models) {
+  return [...models].sort((left, right) => {
+    const leftRank = Number.isFinite(Number(left.prefered_ranking)) ? Number(left.prefered_ranking) : 9999;
+    const rightRank = Number.isFinite(Number(right.prefered_ranking)) ? Number(right.prefered_ranking) : 9999;
+    if (leftRank !== rightRank) return leftRank - rightRank;
+    return String(left.name || "").localeCompare(String(right.name || ""));
+  });
+}
+
+function positiveInteger(value, fallback) {
+  const parsed = Number(value);
+  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
+}
+
+function safeReason(value) {
+  return redact(String(value || "")).slice(0, 1000);
+}
+
 async function upsertEnvLocal(path, updates) {
  let text = "";
  try {
@@ -0,0 +1,496 @@
+#!/usr/bin/env node
+
+import { createServer } from "node:http";
+import { mkdir, writeFile } from "node:fs/promises";
+import { dirname, resolve } from "node:path";
+import { env, exit } from "node:process";
+
+const args = parseArgs(process.argv.slice(2));
+const host = args.host || env.LANGBOT_FAKE_PROVIDER_HOST || "127.0.0.1";
+const port = integer(args.port ?? env.LANGBOT_FAKE_PROVIDER_PORT, 0);
+const stateFile = args["state-file"] || env.LANGBOT_FAKE_PROVIDER_STATE_FILE || "";
+const modelName = env.LANGBOT_FAKE_PROVIDER_MODEL_NAME || "gpt-4o-mini";
+const config = {
+  response_text: env.LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT || "OK",
+  first_token_delay_ms: integer(env.LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS, 25),
+  chunk_delay_ms: integer(env.LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS, 10),
+  chunk_count: integer(env.LANGBOT_FAKE_PROVIDER_CHUNK_COUNT, 0),
+  fault_status: integer(env.LANGBOT_FAKE_PROVIDER_FAULT_STATUS, 500),
+  fail_first_n: integer(env.LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N, 0),
+  fail_every_n: integer(env.LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N, 0),
+  fail_after_first_chunk: bool(env.LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK, false),
+  dynamic_response: bool(env.LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE, true),
+  request_log_limit: integer(env.LANGBOT_FAKE_PROVIDER_REQUEST_LOG_LIMIT, 500),
+};
+
+let requestCount = 0;
+const recentRequests = [];
+
+const server = createServer(async (request, response) => {
+  const startedAt = Date.now();
+  const startedPerf = performance.now();
+  let requestRecord = null;
+  const url = new URL(request.url || "/", `http://${request.headers.host || `${host}:${port}`}`);
+  try {
+    if (request.method === "GET" && url.pathname === "/healthz") {
+      sendJson(response, 200, {
+        ok: true,
+        model: modelName,
+        config,
+        request_count: requestCount,
+        recent_request_count: recentRequests.length,
+      });
+      return;
+    }
+
+    if (request.method === "GET" && url.pathname === "/__qa/config") {
+      sendJson(response, 200, {
+        ok: true,
+        model: modelName,
+        config,
+        request_count: requestCount,
+        recent_requests: recentRequests,
+      });
+      return;
+    }
+
+    if (request.method === "POST" && url.pathname === "/__qa/config") {
+      const body = await readJson(request);
+      applyConfig(body.config && typeof body.config === "object" ? body.config : body);
+      if (body.reset_request_count !== false) resetRequestState();
+      sendJson(response, 200, {
+        ok: true,
+        model: modelName,
+        config,
+        request_count: requestCount,
+      });
+      return;
+    }
+
+    if (request.method === "POST" && url.pathname === "/__qa/reset") {
+      resetRequestState();
+      sendJson(response, 200, {
+        ok: true,
+        model: modelName,
+        config,
+        request_count: requestCount,
+      });
+      return;
+    }
+
+    if (request.method === "GET" && ["/models", "/v1/models"].includes(url.pathname)) {
+      sendJson(response, 200, {
+        object: "list",
+        data: [
+          {
+            id: modelName,
+            object: "model",
+            created: 1,
+            owned_by: "langbot-qa",
+            type: "llm",
+          },
+        ],
+      });
+      return;
+    }
+
+    if (request.method === "POST" && ["/chat/completions", "/v1/chat/completions"].includes(url.pathname)) {
+      requestCount += 1;
+      const body = await readJson(request);
+      const requestId = `chatcmpl-langbot-fake-${requestCount}`;
+      const shouldFail = requestCount <= config.fail_first_n
+        || (config.fail_every_n > 0 && requestCount % config.fail_every_n === 0);
+      const replyText = responseTextForBody(body);
+      requestRecord = recordRequest({
+        id: requestId,
+        request_number: requestCount,
+        path: url.pathname,
+        stream: Boolean(body.stream),
+        model: body.model || "",
+        message_count: Array.isArray(body.messages) ? body.messages.length : 0,
+        should_fail: shouldFail,
+        status: "running",
+        http_status: null,
+        expected_text: replyText,
+        response_text_preview: previewText(replyText),
+        started_at: new Date(startedAt).toISOString(),
+        started_epoch_ms: startedAt,
+        configured_first_token_delay_ms: config.first_token_delay_ms,
+        configured_chunk_delay_ms: config.chunk_delay_ms,
+        configured_chunk_count: config.chunk_count,
+      });
+
+      if (shouldFail) {
+        await sleep(config.first_token_delay_ms);
+        sendJson(response, config.fault_status, {
+          error: {
+            message: `LangBot fake provider injected HTTP ${config.fault_status}`,
+            type: "fake_provider_fault",
+            code: "fake_provider_fault",
+          },
+        });
+        finishRequestRecord(requestRecord, startedPerf, {
+          status: "http_fault",
+          http_status: config.fault_status,
+        });
+        return;
+      }
+
+      if (body.stream) {
+        await streamCompletion(response, {
+          requestId,
+          model: body.model || modelName,
+          content: replyText,
+          failAfterFirstChunk: config.fail_after_first_chunk,
+          requestRecord,
+          startedPerf,
+        });
+      } else {
+        await sleep(config.first_token_delay_ms + config.chunk_delay_ms);
+        sendJson(response, 200, completionPayload({
+          requestId,
+          model: body.model || modelName,
+          content: replyText,
+        }));
+        markRequestTiming(requestRecord, "first_chunk", startedPerf);
+        markRequestTiming(requestRecord, "first_content_chunk", startedPerf);
+        requestRecord.content_chunk_count = 1;
+        finishRequestRecord(requestRecord, startedPerf, {
+          status: "ok",
+          http_status: 200,
+        });
+      }
+      return;
+    }
+
+    sendJson(response, 404, {
+      error: {
+        message: `No fake provider route for ${request.method} ${url.pathname}`,
+        type: "not_found",
+      },
+    });
+  } catch (error) {
+    if (requestRecord) {
+      finishRequestRecord(requestRecord, startedPerf, {
+        status: "fake_provider_error",
+        http_status: 500,
+        error: error instanceof Error ? error.message : String(error),
+      });
+    }
+    sendJson(response, 500, {
+      error: {
+        message: error instanceof Error ? error.message : String(error),
+        type: "fake_provider_error",
+      },
+    });
+  } finally {
+    const durationMs = Date.now() - startedAt;
+    if (url.pathname !== "/healthz") {
+      console.log(JSON.stringify({
+        at: new Date().toISOString(),
+        method: request.method,
+        path: url.pathname,
+        duration_ms: durationMs,
+      }));
+    }
+  }
+});
+
+server.listen(port, host, async () => {
+  const address = server.address();
+  const selectedPort = typeof address === "object" && address ? address.port : port;
+  const url = `http://${host}:${selectedPort}`;
+  const state = {
+    status: "ready",
+    pid: process.pid,
+    url,
+    base_url: `${url}/v1`,
+    model: modelName,
+    started_at: new Date().toISOString(),
+  };
+  if (stateFile) {
+    const path = resolve(stateFile);
+    await mkdir(dirname(path), { recursive: true });
+    await writeFile(path, `${JSON.stringify(state, null, 2)}\n`, "utf8");
+  }
+  console.log(JSON.stringify(state));
+});
+
+server.on("error", (error) => {
+  console.error(JSON.stringify({
+    status: "error",
+    reason: error instanceof Error ? error.message : String(error),
+  }));
+  exit(1);
+});
+
+process.on("SIGTERM", () => {
+  server.close(() => exit(0));
+});
+
+function parseArgs(argv) {
+  const result = {};
+  for (const item of argv) {
+    const match = item.match(/^--([^=]+)(?:=(.*))?$/);
+    if (!match) continue;
+    result[match[1]] = match[2] ?? "1";
+  }
+  return result;
+}
+
+function integer(value, fallback) {
+  const parsed = Number.parseInt(String(value ?? ""), 10);
+  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
+}
+
+function bool(value, fallback) {
+  if (value === undefined || value === "") return fallback;
+  if (/^(1|true|yes|on)$/i.test(String(value))) return true;
+  if (/^(0|false|no|off)$/i.test(String(value))) return false;
+  return fallback;
+}
+
+function sleep(ms) {
+  return new Promise((resolve) => setTimeout(resolve, Math.max(0, ms)));
+}
+
+async function readJson(request) {
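+  const chunks = []; // raw Buffers; decoding once keeps multi-byte UTF-8 split across chunks intact
+  for await (const chunk of request) chunks.push(chunk);
+  const text = Buffer.concat(chunks).toString("utf8");
+  return text ? JSON.parse(text) : {};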
+}
+
+function sendJson(response, status, payload) {
+  const text = `${JSON.stringify(payload)}\n`;
+  response.writeHead(status, {
+    "content-type": "application/json",
+    "content-length": Buffer.byteLength(text),
+  });
+  response.end(text);
+}
+
+function completionPayload({ requestId, model, content }) {
+  const completionTokens = tokenEstimate(content);
+  return {
+    id: requestId,
+    object: "chat.completion",
+    created: Math.floor(Date.now() / 1000),
+    model,
+    choices: [
+      {
+        index: 0,
+        message: {
+          role: "assistant",
+          content,
+        },
+        finish_reason: "stop",
+      },
+    ],
+    usage: {
+      prompt_tokens: 8,
+      completion_tokens: completionTokens,
+      total_tokens: 8 + completionTokens,
+    },
+  };
+}
+
+async function streamCompletion(response, {
+  requestId,
+  model,
+  content,
+  failAfterFirstChunk: failMidStream,
+  requestRecord,
+  startedPerf,
+}) {
+  response.writeHead(200, {
+    "content-type": "text/event-stream; charset=utf-8",
+    "cache-control": "no-cache",
+    "connection": "keep-alive",
+  });
+
+  await sleep(config.first_token_delay_ms);
+  markRequestTiming(requestRecord, "first_chunk", startedPerf);
+  writeSse(response, {
+    id: requestId,
+    object: "chat.completion.chunk",
+    created: Math.floor(Date.now() / 1000),
+    model,
+    choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }],
+  });
+
+  const chunks = splitContent(content);
+  for (let index = 0; index < chunks.length; index += 1) {
+    await sleep(config.chunk_delay_ms);
+    if (index === 0) markRequestTiming(requestRecord, "first_content_chunk", startedPerf);
+    requestRecord.content_chunk_count = (requestRecord.content_chunk_count || 0) + 1;
+    writeSse(response, {
+      id: requestId,
+      object: "chat.completion.chunk",
+      created: Math.floor(Date.now() / 1000),
+      model,
+      choices: [{ index: 0, delta: { content: chunks[index] }, finish_reason: null }],
+    });
+    if (failMidStream && index === 0) {
+      finishRequestRecord(requestRecord, startedPerf, {
+        status: "mid_stream_disconnect",
+        http_status: 200,
+      });
+      response.destroy(new Error("LangBot fake provider injected mid-stream disconnect"));
+      return;
+    }
+  }
+
+  await sleep(config.chunk_delay_ms);
+  const completionTokens = tokenEstimate(content);
+  writeSse(response, {
+    id: requestId,
+    object: "chat.completion.chunk",
+    created: Math.floor(Date.now() / 1000),
+    model,
+    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
+    usage: {
+      prompt_tokens: 8,
+      completion_tokens: completionTokens,
+      total_tokens: 8 + completionTokens,
+    },
+  });
+  response.write("data: [DONE]\n\n");
+  response.end();
+  finishRequestRecord(requestRecord, startedPerf, {
+    status: "ok",
+    http_status: 200,
+  });
+}
+
+function writeSse(response, payload) {
+  response.write(`data: ${JSON.stringify(payload)}\n\n`);
+}
+
+// Splits content into at most config.chunk_count equal-sized pieces (the last may be shorter).
+function splitContent(content) {
+  const text = String(content);
+  const requested = config.chunk_count;
+  if (requested <= 1 || text.length <= 1) return [text];
+  const chunkSize = Math.max(1, Math.ceil(text.length / requested));
+  const chunks = [];
+  for (let index = 0; index < text.length; index += chunkSize) {
+    chunks.push(text.slice(index, index + chunkSize));
+  }
+  return chunks;
+}
+
+// Rough token estimate: about four characters per token, never less than one.
+function tokenEstimate(content) {
+  return Math.max(1, Math.ceil(String(content || "").length / 4));
+}
+
+function responseTextForBody(body) {
+  if (!config.dynamic_response) {
+    return config.response_text;
+  }
+  const messages = Array.isArray(body.messages) ? body.messages : [];
+  const lastUser = [...messages].reverse().find((message) => message?.role === "user");
+  const text = flattenContent(lastUser?.content || "");
+  const quoted = text.match(/["'“”](.{1,80}?)["'“”]/);
+  if (quoted?.[1]) return quoted[1].trim();
+  const exact = text.match(/(?:reply|回复|输出|return)\s+(?:exactly\s+)?([A-Za-z0-9_.:@-]{1,80})/i);
+  if (exact?.[1]) return exact[1].trim().replace(/[。.!?]+$/, "");
+  const only = text.match(/只回复\s*([A-Za-z0-9_.:@-]{1,80})/);
+  if (only?.[1]) return only[1].trim().replace(/[。.!?]+$/, "");
+  return config.response_text;
+}
+
+function flattenContent(content) {
+  if (typeof content === "string") return content;
+  if (Array.isArray(content)) {
+    return content
+      .map((item) => {
+        if (typeof item === "string") return item;
+        if (item && typeof item === "object") return item.text || "";
+        return "";
+      })
+      .join("\n");
+  }
+  return "";
+}
+
+function recordRequest(entry) {
+  const item = {
+    ...entry,
+    at: new Date().toISOString(),
+    finished_at: null,
+    finished_epoch_ms: null,
+    duration_ms: null,
+    first_chunk_at: null,
+    first_chunk_epoch_ms: null,
+    first_chunk_ms: null,
+    first_content_chunk_at: null,
+    first_content_chunk_epoch_ms: null,
+    first_content_chunk_ms: null,
+    content_chunk_count: 0,
+  };
+  recentRequests.push(item);
+  while (recentRequests.length > config.request_log_limit) recentRequests.shift();
+  return item;
+}
+
+function markRequestTiming(entry, key, startedPerf) {
+  if (!entry || entry[`${key}_at`]) return;
+  const now = Date.now();
+  entry[`${key}_at`] = new Date(now).toISOString();
+  entry[`${key}_epoch_ms`] = now;
+  entry[`${key}_ms`] = rounded(performance.now() - startedPerf);
+}
+
+function finishRequestRecord(entry, startedPerf, updates = {}) {
+  if (!entry || entry.finished_at) return;
+  const now = Date.now();
+  Object.assign(entry, updates);
+  entry.finished_at = new Date(now).toISOString();
+  entry.finished_epoch_ms = now;
+  entry.duration_ms = rounded(performance.now() - startedPerf);
+}
+
+function rounded(value) {
+  return Number(value.toFixed(3));
+}
+
+function previewText(value) {
+  return String(value || "").slice(0, 120);
+}
+
+function resetRequestState() {
+  requestCount = 0;
+  recentRequests.length = 0;
+}
+
+function applyConfig(updates) {
+  if (!updates || typeof updates !== "object") return;
+  assignString(updates, "response_text");
+  assignNonNegativeInteger(updates, "first_token_delay_ms");
+  assignNonNegativeInteger(updates, "chunk_delay_ms");
+  assignNonNegativeInteger(updates, "chunk_count");
+  assignNonNegativeInteger(updates, "fail_first_n");
+  assignNonNegativeInteger(updates, "fail_every_n");
+  assignNonNegativeInteger(updates, "request_log_limit");
+  if (updates.fault_status !== undefined) {
+    const parsed = Number.parseInt(String(updates.fault_status), 10);
+    if (Number.isInteger(parsed) && parsed >= 400 && parsed <= 599) config.fault_status = parsed;
+  }
+  assignBoolean(updates, "fail_after_first_chunk");
+  assignBoolean(updates, "dynamic_response");
+}
+
+function assignString(updates, key) {
+  if (updates[key] !== undefined) config[key] = String(updates[key]);
+}
+
+function assignNonNegativeInteger(updates, key) {
+  if (updates[key] === undefined) return;
+  const parsed = Number.parseInt(String(updates[key]), 10);
+  if (Number.isInteger(parsed) && parsed >= 0) config[key] = parsed;
+}
+
+function assignBoolean(updates, key) {
+  if (updates[key] === undefined) return;
+  config[key] = bool(updates[key], config[key]);
+}
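To make the streaming path above easier to reason about, here is a minimal sketch of how the chunking rule and the SSE framing fit together from a client's point of view. `splitIntoChunks` mirrors `splitContent` but takes the chunk count as a parameter, and `collectStreamedContent` is a hypothetical helper that is not part of this diff; both are assumptions made for illustration only.

```javascript
// Illustrative only: mirrors the chunking rule in splitContent, but takes the
// chunk count as a parameter instead of reading the provider's config object.
function splitIntoChunks(content, requested) {
  const text = String(content);
  if (requested <= 1 || text.length <= 1) return [text];
  const chunkSize = Math.max(1, Math.ceil(text.length / requested));
  const chunks = [];
  for (let index = 0; index < text.length; index += chunkSize) {
    chunks.push(text.slice(index, index + chunkSize));
  }
  return chunks;
}

// Hypothetical client-side helper: rebuilds the assistant text from a raw SSE
// body made of "data: <json>" frames separated by blank lines and terminated
// by "data: [DONE]".
function collectStreamedContent(rawBody) {
  let text = "";
  for (const frame of rawBody.split("\n\n")) {
    const line = frame.trim();
    if (!line.startsWith("data:")) continue;
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break;
    const delta = JSON.parse(data).choices?.[0]?.delta ?? {};
    text += delta.content ?? "";
  }
  return text;
}

// Frame each chunk the same way writeSse does, then append the terminator.
const frames = splitIntoChunks("hello world", 3).map(
  (chunk) => `data: ${JSON.stringify({ choices: [{ index: 0, delta: { content: chunk } }] })}\n\n`,
);
const rawBody = `${frames.join("")}data: [DONE]\n\n`;

console.log(splitIntoChunks("hello world", 3));
console.log(collectStreamedContent(rawBody));
```

Because the chunk size is rounded up, the requested count is an upper bound rather than an exact count: splitting a five-character string into four requested chunks yields three chunks of at most two characters each.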
@@ -72,6 +72,7 @@ export async function writeResult(paths, result) {
 }

 export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"]) {
+  const processEnvKeys = new Set(Object.keys(env));
  for (const path of paths) {
    let text = "";
    try {
@@ -86,7 +87,7 @@ export async function loadEnvFiles(paths = ["skills/.env", "skills/.env.local"])
      if (equals <= 0) continue;
      const key = trimmed.slice(0, equals).trim();
      const value = trimmed.slice(equals + 1).trim().replace(/^["']|["']$/g, "");
-      if (!(key in env)) env[key] = value;
+      if (!processEnvKeys.has(key)) env[key] = value;
    }
  }
 }
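The `processEnvKeys` snapshot appears to change the precedence rule: keys already present in the real process environment still win, but a later file such as `skills/.env.local` can now override an earlier one such as `skills/.env`. The sketch below restates that rule as a pure function. `mergeEnvFiles` and the sample values are hypothetical; the real `loadEnvFiles` reads files from disk and mutates the shared `env` object.

```javascript
// Hypothetical pure-function version of the precedence rule. It takes the
// process environment and the file contents as plain arguments.
function mergeEnvFiles(processEnv, fileTexts) {
  const merged = { ...processEnv };
  // Snapshot taken before any file is applied, as in the diff.
  const processEnvKeys = new Set(Object.keys(processEnv));
  for (const text of fileTexts) {
    for (const line of text.split(/\r?\n/)) {
      const trimmed = line.trim();
      if (!trimmed || trimmed.startsWith("#")) continue;
      const equals = trimmed.indexOf("=");
      if (equals <= 0) continue;
      const key = trimmed.slice(0, equals).trim();
      const value = trimmed.slice(equals + 1).trim().replace(/^["']|["']$/g, "");
      // Only keys from the original process environment are protected.
      if (!processEnvKeys.has(key)) merged[key] = value;
    }
  }
  return merged;
}

// Sample values are made up for the example.
const merged = mergeEnvFiles(
  { LANGBOT_BACKEND_URL: "http://127.0.0.1:5300" },
  [
    "LANGBOT_BACKEND_URL=http://example.invalid\nLANGBOT_PIPELINE_NAME=from-env",
    "LANGBOT_PIPELINE_NAME=from-env-local",
  ],
);
console.log(merged);
```

With the previous `key in env` check, the first file to define `LANGBOT_PIPELINE_NAME` would have won, so values written to `skills/.env.local` by setup automation could be shadowed by stale entries in `skills/.env`.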
@@ -54,6 +54,7 @@ const debugChatSessionType = env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person"
 const pipelineConfigDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-diagnostic.json");
 const debugChatResetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
 const pipelineConfigRestoreDiagnosticPath = resolve(paths.evidenceDir, "pipeline-config-restore-diagnostic.json");
+const metricsPath = resolve(paths.evidenceDir, "metrics.json");
 const startedAt = new Date();

 let browser;
@@ -80,10 +81,11 @@ let result = {
    console_log: paths.consoleLog,
    network_log: paths.networkLog,
    screenshot: paths.screenshot,
+    metrics_json: metricsPath,
    automation_result_json: paths.automationResultJson,
    result_json: paths.resultJson,
  },
-  evidence_collected: ["ui", "screenshot", "console", "network"],
+  evidence_collected: ["ui", "screenshot", "console", "network", "metrics"],
 };

 function boolFromEnv(value, defaultValue) {
@@ -103,6 +105,29 @@ function parseJsonEnv(key, fallback) {
  }
 }

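+function positiveNumberEnv(key, fallback) {
+  // Unset or blank values must fall back: Number("") is 0, which would pass the check below.
+  const raw = String(env[key] ?? "").trim();
+  if (raw === "") return fallback;
+  const value = Number(raw);
+  return Number.isFinite(value) && value >= 0 ? value : fallback;
+}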
+
+function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const index = Math.max(0, Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1));
+  return Number(sorted[index].toFixed(3));
+}
+
+function stats(values) {
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: Number(Math.min(...values).toFixed(3)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: Number(Math.max(...values).toFixed(3)),
+  };
+}
+
 function promptStepsFromEnv() {
  const rawSteps = parseJsonEnv("LANGBOT_E2E_PROMPTS_JSON", null);
  if (rawSteps === null) {
@@ -658,6 +683,7 @@ try {
      } else {
        for (let index = 0; index < promptSteps.length; index += 1) {
          const step = promptSteps[index];
+          const promptStartedAt = Date.now();
          const chatResult = await runDebugChatPrompt(page, {
            prompt: step.prompt,
            expectedText: step.expectedText,
@@ -665,11 +691,13 @@ try {
            imagePath: index === 0 ? imagePath : "",
            failureSignals: failureSignals.length > 0 ? failureSignals : undefined,
          });
+          const promptDurationMs = Date.now() - promptStartedAt;
          result.chat_results.push({
            index,
            expected_text: step.expectedText,
            status: chatResult.status,
            reason: chatResult.reason,
+            response_duration_ms: promptDurationMs,
            min_expected_count: chatResult.min_expected_count,
            final_count: chatResult.final_count,
            before_assistant_expected_count: chatResult.before_assistant_expected_count,
@@ -714,6 +742,56 @@ try {
  const finishedAt = new Date();
  result.finished_at = finishedAt.toISOString();
  result.finished_at_local = localIsoWithOffset(finishedAt);
+  result.duration_ms = finishedAt.getTime() - startedAt.getTime();
+  const responseDurations = result.chat_results
+    .map((item) => item.response_duration_ms)
+    .filter((value) => Number.isFinite(value));
+  const passedPrompts = result.chat_results.filter((item) => item.status === "pass").length;
+  const attemptedPrompts = result.chat_results.length;
+  const errorRate = attemptedPrompts === 0 ? 1 : Number(((attemptedPrompts - passedPrompts) / attemptedPrompts).toFixed(4));
+  const responseStats = stats(responseDurations);
+  const responseP95BudgetMs = positiveNumberEnv(
+    "LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS",
+    positiveNumberEnv("LANGBOT_DEBUG_CHAT_RESPONSE_P95_MS", safeResponseTimeoutMs),
+  );
+  const maxErrorRate = positiveNumberEnv("LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE", 0);
+  const metrics = {
+    probe: caseId,
+    url: result.url,
+    prompt_count: result.prompt_count,
+    attempted_prompt_count: attemptedPrompts,
+    passed_prompt_count: passedPrompts,
+    error_rate: errorRate,
+    response_duration_ms: responseStats,
+    total_duration_ms: result.duration_ms,
+    chat_results: result.chat_results,
+  };
+  result.metrics_summary = {
+    prompt_count: metrics.prompt_count,
+    attempted_prompt_count: metrics.attempted_prompt_count,
+    passed_prompt_count: metrics.passed_prompt_count,
+    error_rate: metrics.error_rate,
+    response_p50_ms: metrics.response_duration_ms.p50,
+    response_p95_ms: metrics.response_duration_ms.p95,
+    total_duration_ms: metrics.total_duration_ms,
+  };
+  result.thresholds_summary = {
+    response_p95_ms: {
+      actual: metrics.response_duration_ms.p95,
+      max: responseP95BudgetMs,
+      pass: attemptedPrompts > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
+    },
+    error_rate: {
+      actual: metrics.error_rate,
+      max: maxErrorRate,
+      pass: metrics.error_rate <= maxErrorRate,
+    },
+  };
+  await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
+  if (result.status === "pass" && !Object.values(result.thresholds_summary).every((item) => item.pass)) {
+    result.status = "fail";
+    result.reason = "Debug Chat performance breached response latency or error-rate thresholds.";
+  }
  const existingEvidence = {};
  for (const [key, value] of Object.entries(result.evidence)) {
    if (typeof value !== "string") continue;
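The gating logic in this hunk can be summarized as a nearest-rank percentile plus two threshold checks. The following is a minimal sketch under that reading. `evaluateGates` and the sample durations are hypothetical and only illustrate how a single slow prompt affects the p95 gate when the sample is small.

```javascript
// Nearest-rank percentile, following the percentile() helper in the diff.
function percentile(values, percentileValue) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((percentileValue / 100) * sorted.length) - 1;
  const index = Math.max(0, Math.min(sorted.length - 1, rank));
  return Number(sorted[index].toFixed(3));
}

// Hypothetical helper: evaluates the latency and error-rate gates for a list
// of chat results shaped like the entries pushed to result.chat_results.
function evaluateGates(chatResults, { p95BudgetMs, maxErrorRate }) {
  const durations = chatResults
    .map((item) => item.response_duration_ms)
    .filter((value) => Number.isFinite(value));
  const attempted = chatResults.length;
  const passed = chatResults.filter((item) => item.status === "pass").length;
  // No attempts counts as a full failure, as in the diff.
  const errorRate = attempted === 0 ? 1 : Number(((attempted - passed) / attempted).toFixed(4));
  const p95 = percentile(durations, 95);
  return {
    response_p95_ms: { actual: p95, max: p95BudgetMs, pass: attempted > 0 && p95 <= p95BudgetMs },
    error_rate: { actual: errorRate, max: maxErrorRate, pass: errorRate <= maxErrorRate },
  };
}

// Made-up sample: four fast prompts and one slow outlier, all passing.
const sample = [120, 180, 150, 900, 200].map((ms) => ({ status: "pass", response_duration_ms: ms }));
const gates = evaluateGates(sample, { p95BudgetMs: 500, maxErrorRate: 0 });
console.log(gates);
```

With only five samples, the nearest-rank p95 is the maximum, so one slow prompt is enough to breach the latency budget even though the error rate is zero. Larger prompt sets make the p95 gate less sensitive to a single outlier.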
@@ -130,6 +130,7 @@
        "references/local-agent-runner.md",
        "references/mcp-stdio-testing.md",
        "references/model-provider-testing.md",
+        "references/performance-reliability-testing.md",
        "references/pipeline-debug-chat.md",
        "references/plugin-e2e-smoke.md",
        "references/sandbox-skill-authoring.md",
@@ -150,6 +151,16 @@
        "agent-runner-release-preflight",
        "agent-runner-runtime-chaos",
        "dify-agent-debug-chat",
+        "langbot-fake-provider-debug-chat-cross-pipeline-isolation",
+        "langbot-fake-provider-debug-chat-fault-recovery",
+        "langbot-fake-provider-debug-chat-load",
+        "langbot-fake-provider-debug-chat-slow-load",
+        "langbot-fault-taxonomy-contract",
+        "langbot-live-backend-latency",
+        "langbot-live-backend-log-health",
+        "langbot-live-control-plane-api",
+        "langbot-overhead-accounting-contract",
+        "langbot-space-debug-chat-concurrency-smoke",
        "langrag-kb-retrieve",
        "langrag-parser-golden-e2e",
        "langrag-sentinel-kb-discover",
@@ -165,6 +176,7 @@
        "mcp-stdio-register",
        "mcp-stdio-tool-call",
        "pipeline-debug-chat",
+        "pipeline-debug-chat-performance",
        "plugin-e2e-smoke",
        "provider-deepseek",
        "qa-plugin-smoke-live-install",
@@ -486,6 +498,316 @@
            "backend_log"
          ]
        },
+        {
+          "id": "langbot-fake-provider-debug-chat-cross-pipeline-isolation",
+          "title": "LangBot Debug Chat fake-provider cross-pipeline isolation probe",
+          "mode": "probe",
+          "area": "reliability",
+          "type": "reliability",
+          "priority": "p1",
+          "risk": "high",
+          "ci_eligible": false,
+          "tags": [
+            "reliability",
+            "debug-chat",
+            "websocket",
+            "fake-provider",
+            "isolation",
+            "concurrency",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-debug-chat-cross-pipeline-isolation.mjs",
+          "setup_automation": [
+            "node:scripts/e2e/ensure-fake-provider-cross-pipelines.mjs --write-env"
+          ],
+          "setup_provides_env": [
+            "LANGBOT_FAKE_PROVIDER_URL",
+            "LANGBOT_FAKE_PROVIDER_BASE_URL",
+            "LANGBOT_FAKE_PROVIDER_PID",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME"
+          ],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-fake-provider-debug-chat-fault-recovery",
+          "title": "LangBot Debug Chat fake-provider fault recovery probe",
+          "mode": "probe",
+          "area": "reliability",
+          "type": "chaos",
+          "priority": "p1",
+          "risk": "high",
+          "ci_eligible": false,
+          "tags": [
+            "reliability",
+            "chaos",
+            "debug-chat",
+            "websocket",
+            "fake-provider",
+            "fault-injection",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
+          "setup_automation": [
+            "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
+          ],
+          "setup_provides_env": [
+            "LANGBOT_FAKE_PROVIDER_URL",
+            "LANGBOT_FAKE_PROVIDER_BASE_URL",
+            "LANGBOT_FAKE_PROVIDER_PID",
+            "LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
+            "LANGBOT_FAKE_PROVIDER_MODEL_UUID",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
+          ],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-fake-provider-debug-chat-load",
+          "title": "LangBot Debug Chat controlled fake-provider load probe",
+          "mode": "probe",
+          "area": "performance",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": false,
+          "tags": [
+            "performance",
+            "debug-chat",
+            "websocket",
+            "fake-provider",
+            "load",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
+          "setup_automation": [
+            "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
+          ],
+          "setup_provides_env": [
+            "LANGBOT_FAKE_PROVIDER_URL",
+            "LANGBOT_FAKE_PROVIDER_BASE_URL",
+            "LANGBOT_FAKE_PROVIDER_PID",
+            "LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
+            "LANGBOT_FAKE_PROVIDER_MODEL_UUID",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
+          ],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-fake-provider-debug-chat-slow-load",
+          "title": "LangBot Debug Chat slow fake-provider load probe",
+          "mode": "probe",
+          "area": "performance",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": false,
+          "tags": [
+            "performance",
+            "debug-chat",
+            "websocket",
+            "fake-provider",
+            "slow-provider",
+            "load",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
+          "setup_automation": [
+            "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
+          ],
+          "setup_provides_env": [
+            "LANGBOT_FAKE_PROVIDER_URL",
+            "LANGBOT_FAKE_PROVIDER_BASE_URL",
+            "LANGBOT_FAKE_PROVIDER_PID",
+            "LANGBOT_FAKE_PROVIDER_PROVIDER_UUID",
+            "LANGBOT_FAKE_PROVIDER_MODEL_UUID",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_URL",
+            "LANGBOT_FAKE_PROVIDER_PIPELINE_NAME"
+          ],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-fault-taxonomy-contract",
+          "title": "LangBot fault taxonomy and cleanup contract",
+          "mode": "probe",
+          "area": "reliability",
+          "type": "chaos",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": true,
+          "tags": [
+            "reliability",
+            "chaos",
+            "contract",
+            "synthetic"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-fault-taxonomy-contract.mjs",
+          "setup_automation": [],
+          "setup_provides_env": [],
+          "evidence_required": [
+            "metrics",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-live-backend-latency",
+          "title": "LangBot live backend basic latency probe",
+          "mode": "probe",
+          "area": "performance",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": false,
+          "tags": [
+            "performance",
+            "live-backend",
+            "latency",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-live-backend-latency.mjs",
+          "setup_automation": [],
+          "setup_provides_env": [],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-live-backend-log-health",
+          "title": "LangBot live backend log health probe",
+          "mode": "probe",
+          "area": "reliability",
+          "type": "reliability",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": false,
+          "tags": [
+            "reliability",
+            "live-backend",
+            "backend-log",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-live-backend-log-health.mjs",
+          "setup_automation": [],
+          "setup_provides_env": [],
+          "evidence_required": [
+            "metrics",
+            "backend_log",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-live-control-plane-api",
+          "title": "LangBot live control-plane API probe",
+          "mode": "probe",
+          "area": "performance",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": false,
+          "tags": [
+            "performance",
+            "reliability",
+            "live-backend",
+            "control-plane",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-live-control-plane-api.mjs",
+          "setup_automation": [],
+          "setup_provides_env": [],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-overhead-accounting-contract",
+          "title": "LangBot overhead accounting metrics contract",
+          "mode": "probe",
+          "area": "performance",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": true,
+          "tags": [
+            "performance",
+            "metrics",
+            "contract",
+            "synthetic"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-overhead-accounting-contract.mjs",
+          "setup_automation": [],
+          "setup_provides_env": [],
+          "evidence_required": [
+            "metrics",
+            "resource_log",
+            "filesystem"
+          ]
+        },
+        {
+          "id": "langbot-space-debug-chat-concurrency-smoke",
+          "title": "LangBot Debug Chat real Space-provider concurrency smoke",
+          "mode": "probe",
+          "area": "performance",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "high",
+          "ci_eligible": false,
+          "tags": [
+            "performance",
+            "debug-chat",
+            "websocket",
+            "space",
+            "live-provider",
+            "smoke",
+            "metrics"
+          ],
+          "automation": "skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs",
+          "setup_automation": [
+            "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
+          ],
+          "setup_provides_env": [
+            "LANGBOT_PIPELINE_URL",
+            "LANGBOT_PIPELINE_NAME",
+            "LANGBOT_LOCAL_AGENT_PIPELINE_URL",
+            "LANGBOT_LOCAL_AGENT_PIPELINE_NAME",
+            "LANGBOT_LOCAL_AGENT_MODEL_UUID",
+            "LANGBOT_E2E_MODEL_UUID"
+          ],
+          "evidence_required": [
+            "metrics",
+            "network",
+            "api_diagnostic",
+            "filesystem"
+          ]
+        },
        {
          "id": "langrag-kb-retrieve",
          "title": "LangRAG knowledge base ingests and retrieves a sentinel document",
@@ -911,6 +1233,38 @@
            "backend_log"
          ]
        },
+        {
+          "id": "pipeline-debug-chat-performance",
+          "title": "Pipeline Debug Chat user-path performance probe",
+          "mode": "agent-browser",
+          "area": "pipeline",
+          "type": "performance",
+          "priority": "p1",
+          "risk": "medium",
+          "ci_eligible": false,
+          "tags": [
+            "performance",
+            "pipeline",
+            "debug-chat",
+            "user-path",
+            "metrics"
+          ],
+          "automation": "scripts/e2e/pipeline-debug-chat.mjs",
+          "setup_automation": [
+            "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
+          ],
+          "setup_provides_env": [
+            "LANGBOT_PIPELINE_URL",
+            "LANGBOT_PIPELINE_NAME"
+          ],
+          "evidence_required": [
+            "ui",
+            "screenshot",
+            "console",
+            "network",
+            "metrics"
+          ]
+        },
        {
          "id": "plugin-e2e-smoke",
          "title": "Plugin system installs a local plugin and exposes tool/page APIs",
@@ -1059,6 +1413,12 @@
      "suites": [
        "agent-runner-release-gate",
        "core-smoke",
+        "langbot-debug-chat-isolation-gate",
+        "langbot-debug-chat-load-gate",
+        "langbot-live-backend-gate",
+        "langbot-performance-contract-gate",
+        "langbot-performance-reliability-gate",
+        "langbot-user-path-performance-gate",
        "local-agent-gate"
      ],
      "suite_summaries": [
@@ -1121,6 +1481,113 @@
            "local-agent-basic-debug-chat"
          ]
        },
+        {
+          "id": "langbot-debug-chat-isolation-gate",
+          "title": "LangBot Debug Chat isolation gate",
+          "description": "Manual/non-required cross-pipeline Debug Chat isolation gate. Current releases may fail this gate because of product bug #2286; use it as regression evidence after the routing fix lands.",
+          "type": "reliability",
+          "priority": "p1",
+          "tags": [
+            "reliability",
+            "debug-chat",
+            "websocket",
+            "isolation",
+            "concurrency"
+          ],
+          "cases": [
+            "langbot-fake-provider-debug-chat-cross-pipeline-isolation"
+          ]
+        },
+        {
+          "id": "langbot-debug-chat-load-gate",
+          "title": "LangBot Debug Chat load gate",
+          "description": "Manual/non-required message-path load checks for Pipeline Debug Chat: controlled fake-provider baseline, slow-provider and fault-recovery profiles, plus optional real Space-provider smoke. Cross-pipeline isolation is split into langbot-debug-chat-isolation-gate because current releases may fail it due to product bug #2286.",
+          "type": "performance",
+          "priority": "p1",
+          "tags": [
+            "performance",
+            "debug-chat",
+            "websocket",
+            "load"
+          ],
+          "cases": [
+            "langbot-fake-provider-debug-chat-load",
+            "langbot-fake-provider-debug-chat-slow-load",
+            "langbot-fake-provider-debug-chat-fault-recovery",
+            "langbot-space-debug-chat-concurrency-smoke"
+          ]
+        },
+        {
+          "id": "langbot-live-backend-gate",
+          "title": "LangBot live backend reliability gate",
+          "description": "Live backend control-plane responsiveness and runtime log health checks for a locally running LangBot instance.",
+          "type": "reliability",
+          "priority": "p1",
+          "tags": [
+            "performance",
+            "reliability",
+            "live-backend",
+            "metrics"
+          ],
+          "cases": [
+            "langbot-live-backend-latency",
+            "langbot-live-control-plane-api",
+            "langbot-live-backend-log-health"
+          ]
+        },
+        {
+          "id": "langbot-performance-contract-gate",
+          "title": "LangBot performance contract gate",
+          "description": "Fast synthetic contract checks for performance metric accounting and non-destructive reliability fault taxonomy.",
+          "type": "contract",
+          "priority": "p1",
+          "tags": [
+            "performance",
+            "reliability",
+            "contract",
+            "metrics"
+          ],
+          "cases": [
+            "langbot-overhead-accounting-contract",
+            "langbot-fault-taxonomy-contract"
+          ]
+        },
+        {
+          "id": "langbot-performance-reliability-gate",
+          "title": "LangBot performance and reliability starter gate",
+          "description": "Starter gate for LangBot performance accounting, live backend control-plane latency, and non-destructive fault taxonomy checks.",
+          "type": "reliability",
+          "priority": "p1",
+          "tags": [
+            "performance",
+            "reliability",
+            "metrics",
+            "chaos"
+          ],
+          "cases": [
+            "langbot-overhead-accounting-contract",
+            "langbot-fault-taxonomy-contract",
+            "langbot-live-backend-latency",
+            "langbot-live-control-plane-api",
+            "langbot-live-backend-log-health"
+          ]
+        },
+        {
+          "id": "langbot-user-path-performance-gate",
+          "title": "LangBot user-path performance gate",
+          "description": "Browser-visible performance checks for user-facing LangBot paths such as Pipeline Debug Chat.",
+          "type": "performance",
+          "priority": "p1",
+          "tags": [
+            "performance",
+            "browser",
+            "debug-chat",
+            "user-path"
+          ],
+          "cases": [
+            "pipeline-debug-chat-performance"
+          ]
+        },
        {
          "id": "local-agent-gate",
          "title": "Local Agent runner regression gate",
@@ -1265,6 +1732,7 @@
        "sandbox-native-tools-unavailable",
        "socks-proxy-without-socksio",
        "survey-widget-blocks-debug-chat",
+        "telemetry-proxy-noise",
        "tool-name-collision-between-mcp-and-plugin",
        "uv-run-resyncs-local-sdk"
      ],
@@ -1449,6 +1917,14 @@
            "mcp-stdio-tool-call"
          ]
        },
+        {
+          "id": "telemetry-proxy-noise",
+          "title": "Telemetry posting fails through the proxy while the target flow succeeds",
+          "category": "env_issue",
+          "related_cases": [
+            "langbot-space-debug-chat-concurrency-smoke"
+          ]
+        },
        {
          "id": "tool-name-collision-between-mcp-and-plugin",
          "title": "MCP and plugin expose the same tool name",
@@ -26,6 +26,23 @@ LANGBOT_NO_PROXY=localhost,127.0.0.1,::1
 LANGBOT_PIPELINE_URL=
 LANGBOT_PIPELINE_NAME=

+# Optional fake OpenAI-compatible provider controls for Debug Chat load tests.
+# Leave URL empty to let setup automation start a local provider and write the
+# selected URL to skills/.env.local.
+LANGBOT_FAKE_PROVIDER_URL=
+LANGBOT_FAKE_PROVIDER_HOST=127.0.0.1
+LANGBOT_FAKE_PROVIDER_PORT=
+LANGBOT_FAKE_PROVIDER_MODEL_NAME=gpt-4o-mini
+LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT=OK
+LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS=25
+LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS=10
+LANGBOT_FAKE_PROVIDER_CHUNK_COUNT=0
+LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N=0
+LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N=0
+LANGBOT_FAKE_PROVIDER_FAULT_STATUS=500
+LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK=false
+LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE=true
+
 # Optional case-specific runner targets. Prefer these for runner-specific cases
 # so the automation cannot silently test the wrong runner.
 LANGBOT_LOCAL_AGENT_PIPELINE_URL=
@@ -53,7 +53,7 @@ Start the new frontend from the web repo:

 ```bash
 cd "$LANGBOT_WEB_REPO"
-npm run dev
+VITE_API_BASE_URL="$LANGBOT_BACKEND_URL" pnpm dev --host 0.0.0.0
 ```

 Healthy startup includes:
@@ -68,6 +68,10 @@ Quick check:
 curl -I --max-time 3 "$LANGBOT_FRONTEND_URL"
 ```

+If `VITE_API_BASE_URL` is missing, Vite still serves the page, but frontend API
+calls may go to the frontend port instead of the backend port. This causes
+false failures in the login, wizard, pipeline, and Debug Chat browser cases.
+
 ## Completion Signal

 Environment setup is not complete until the required frontend/backend URLs are reachable and the chosen browser-control path can open the WebUI.
@@ -21,6 +21,7 @@ Use this skill when an agent needs to verify LangBot behavior through the WebUI
 - **Sandbox-backed skill authoring**: read `references/sandbox-skill-authoring.md`.
 - **LangRAG knowledge bases**: read `references/langrag-knowledge-base.md`.
 - **MCP stdio tool testing**: read `references/mcp-stdio-testing.md`.
+- **Performance, reliability, or chaos probes**: read `references/performance-reliability-testing.md`.
 - **Drive a live instance over MCP (not raw HTTP)**: use the `langbot-mcp-ops` skill — the instance exposes an MCP server at `http://<host>:5300/mcp` (reuses API keys). Useful for setting up bots/pipelines/models as test fixtures programmatically.
 - **Known failures and fixes**: read `references/troubleshooting.md`.
 - **Reusable test groups**: run `bin/lbs suite list` and `bin/lbs suite plan <suite-id>` before manually assembling a case set.
@@ -36,6 +37,8 @@ Use this skill when an agent needs to verify LangBot behavior through the WebUI
 - Use an authenticated browser profile prepared by `langbot-env-setup`.
 - Do not expose API keys, OAuth secrets, tokens, or localStorage token values in output.
 - A WebUI test is not complete until the visible UI result is checked against backend logs or network behavior.
+- A performance result is not complete without `metrics` evidence and a clear split between LangBot overhead and external provider/tool/network time.
+- A chaos or reliability result is not complete until the fault scope, cleanup, and recovery checks are recorded.
 - For a suite, use `bin/lbs suite start <suite-id>` to create the suite evidence root, per-case directories, and `suite-start.json`/`suite-start.md` handoff files; use `bin/lbs test result <case-id>` to write final per-case `result.json`, then run `bin/lbs suite report <suite-id> --evidence-dir <dir>`.
 - Do not mark a case `pass` until `test result --evidence` covers every value in the case's `evidence_required`.
 - For runner-specific Debug Chat cases, use the case-specific pipeline env declared by `automation_pipeline_url_env` / `automation_pipeline_name_env`; do not silently reuse a generic `LANGBOT_PIPELINE_URL`.
@@ -0,0 +1,84 @@
+id: langbot-fake-provider-debug-chat-cross-pipeline-isolation
+title: "LangBot Debug Chat fake-provider cross-pipeline isolation probe"
+mode: probe
+area: reliability
+type: reliability
+priority: p1
+risk: high
+ci_eligible: false
+tags:
+  - reliability
+  - debug-chat
+  - websocket
+  - fake-provider
+  - isolation
+  - concurrency
+  - metrics
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+automation: skills/langbot-testing/probes/langbot-debug-chat-cross-pipeline-isolation.mjs
+automation_env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+  - LANGBOT_FAKE_PROVIDER_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME
+automation_debug_chat_load_requests: "6"
+automation_debug_chat_load_concurrency: "4"
+automation_debug_chat_load_timeout_ms: "30000"
+automation_debug_chat_load_response_p95_ms: "5000"
+automation_debug_chat_load_max_error_rate: "0"
+automation_debug_chat_load_prompt_template: '请只回复 "{expected}"，不要解释，不要添加其他字符。'
+automation_debug_chat_load_stream: "true"
+automation_debug_chat_load_reset: "true"
+metrics_thresholds_json: '{"cross_pipeline_leak_count":{"max":0},"response_p95_ms":{"max":5000},"error_rate":{"max":0}}'
+load_profile_json: '{"requests_per_pipeline":6,"pipelines":2,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","metric":"cross-pipeline response isolation and send-to-final-assistant-response"}'
+setup_automation:
+  - "node:scripts/e2e/ensure-fake-provider-cross-pipelines.mjs --write-env"
+setup_provides_env:
+  - LANGBOT_FAKE_PROVIDER_URL
+  - LANGBOT_FAKE_PROVIDER_BASE_URL
+  - LANGBOT_FAKE_PROVIDER_PID
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME
+steps:
+  - "Start or reuse the local fake OpenAI-compatible provider."
+  - "Create or update two local-agent pipelines that both point at the controlled fake provider."
+  - "Reset both Debug Chat sessions and the fake-provider request log."
+  - "Open concurrent WebSocket Debug Chat connections to both pipelines and send unique pipeline-scoped response tokens."
+checks:
+  - "automation-result.json status is pass only when every request receives its own expected token and cross_pipeline_leak_count is zero."
+  - "metrics_summary includes by_pipeline status counts, fake-provider request count, and LangBot/provider timing estimates."
+  - "samples.json contains per-request pipeline labels so any leak can be attributed to the receiving pipeline."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This probe targets Debug Chat isolation under concurrent traffic from two pipelines."
+  - "It is designed to expose regressions where global pipeline state causes one pipeline's assistant response to be delivered to another pipeline's Debug Chat session."
+  - "Same-pipeline foreign responses are tolerated because Debug Chat intentionally broadcasts within the same pipeline/session; cross-pipeline tokens are never tolerated."
+  - "Known product bug: current releases may fail this probe because Debug Chat replies can read singleton WebSocket proxy pipeline state after another pipeline overwrites it. See https://github.com/langbot-app/LangBot/issues/2286."
+expected_failures:
+  - "https://github.com/langbot-app/LangBot/issues/2286"
+success_patterns:
+  - "Debug Chat cross-pipeline isolation probe passed"
+failure_patterns:
+  - "cross_pipeline_leak"
+  - "Timed out after"
+  - "WebSocket connection error"
+  - "Final assistant response did not include"
+troubleshooting:
+  - backend-not-listening
+  - debug-chat-history-contaminates-automation
+  - local-agent-model-route-unavailable
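The isolation rule in this case (same-pipeline foreign responses tolerated, cross-pipeline tokens never tolerated) could be evaluated roughly as below. This is a minimal sketch, not the probe's actual code: the sample shape, the `prefixByPipeline` map, and the `ISOA-`/`ISOB-` token prefixes are hypothetical names chosen for illustration.

```javascript
// Assumed sample shape (hypothetical):
//   { pipeline: "A", expected: "ISOA-1", received: ["ISOA-1", "ISOA-2"] }
// Assumption: every response token starts with a pipeline-scoped prefix.
function summarizeIsolation(samples, prefixByPipeline) {
  let crossPipelineLeakCount = 0;
  let missingExpectedCount = 0;
  for (const sample of samples) {
    // Each request must see its own expected token at least once.
    if (!sample.received.includes(sample.expected)) missingExpectedCount += 1;
    for (const token of sample.received) {
      // A token is a leak only if it carries another pipeline's prefix.
      // Tokens from the same pipeline are tolerated (session broadcast).
      const foreign = Object.entries(prefixByPipeline).some(
        ([pipeline, prefix]) =>
          pipeline !== sample.pipeline && token.startsWith(prefix)
      );
      if (foreign) crossPipelineLeakCount += 1;
    }
  }
  const status =
    crossPipelineLeakCount === 0 && missingExpectedCount === 0 ? "pass" : "fail";
  return {
    cross_pipeline_leak_count: crossPipelineLeakCount,
    missing_expected_count: missingExpectedCount,
    status,
  };
}
```

Keeping the pipeline label on every sample is what lets a non-zero leak count be attributed to the receiving pipeline.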
@@ -0,0 +1,95 @@
+id: langbot-fake-provider-debug-chat-fault-recovery
+title: "LangBot Debug Chat fake-provider fault recovery probe"
+mode: probe
+area: reliability
+type: chaos
+priority: p1
+risk: high
+ci_eligible: false
+tags:
+  - reliability
+  - chaos
+  - debug-chat
+  - websocket
+  - fake-provider
+  - fault-injection
+  - metrics
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
+automation_env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+automation_debug_chat_load_requests: "6"
+automation_debug_chat_load_concurrency: "1"
+automation_debug_chat_load_timeout_ms: "15000"
+automation_debug_chat_load_response_p95_ms: "5000"
+automation_debug_chat_load_max_error_rate: "0"
+automation_debug_chat_load_min_ok_count: "6"
+automation_debug_chat_load_min_provider_fault_count: "2"
+automation_debug_chat_load_expected_prefix: "FAULTQA"
+automation_debug_chat_load_prompt_template: '请只回复 "{expected}"，不要解释，不要添加其他字符。'
+automation_debug_chat_load_stream: "true"
+automation_debug_chat_load_reset: "true"
+automation_debug_chat_load_fail_on_final_mismatch: "true"
+automation_fake_provider_first_token_delay_ms: "25"
+automation_fake_provider_chunk_delay_ms: "10"
+automation_fake_provider_chunk_count: "0"
+automation_fake_provider_fail_first_n: "2"
+automation_fake_provider_fail_every_n: "0"
+automation_fake_provider_fault_status: "503"
+metrics_thresholds_json: '{"response_p95_ms":{"max":5000},"error_rate":{"max":0},"ok_count_min":{"min":6},"fake_provider_fault_count_min":{"min":2}}'
+fault_model_json: '{"provider_fault":"HTTP 503 for first 2 fake-provider chat completions after reset","expected_behavior":"LangBot retries or otherwise recovers from bounded provider failures so every Debug Chat request receives its expected response without backend crash."}'
+load_profile_json: '{"requests":6,"concurrency":1,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","classification":"fault-recovery-not-throughput-benchmark"}'
+setup_automation:
+  - "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
+setup_provides_env:
+  - LANGBOT_FAKE_PROVIDER_URL
+  - LANGBOT_FAKE_PROVIDER_BASE_URL
+  - LANGBOT_FAKE_PROVIDER_PID
+  - LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
+  - LANGBOT_FAKE_PROVIDER_MODEL_UUID
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+steps:
+  - "Configure the local fake provider to return HTTP 503 for the first two chat completions after reset."
+  - "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
+  - "Reset the target Debug Chat session and fake-provider request counter."
+  - "Send a sequential Debug Chat batch and verify later requests recover after the injected provider faults."
+checks:
+  - "automation-result.json status is pass when the fake provider records at least two injected faults, every Debug Chat request succeeds, and total user-visible error rate stays at zero."
+  - "metrics_summary includes fake_provider_fault_count and status_counts for the same run window."
+  - "backend logs show request handling for the same run window without unexpected Traceback or task-leak findings."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This is a fault-recovery probe, not a throughput benchmark."
+  - "Provider faults may be retried inside the provider/requester path; judge this case by fake_provider_fault_count plus user-visible success/error metrics."
+  - "The profile uses concurrency 1 because Debug Chat broadcasts assistant responses to every connection in a session, and failed responses do not carry the unique success token needed for concurrent attribution."
+success_patterns:
+  - "Debug Chat WebSocket concurrency probe passed"
+  - "Streaming completed"
+failure_patterns:
+  - "fake_provider_fault"
+  - "HTTP 503"
+  - "Timed out after"
+  - "All models failed during streaming setup"
+expected_failures:
+  - "fake_provider_fault"
+  - "HTTP 503"
+troubleshooting:
+  - backend-not-listening
+  - debug-chat-history-contaminates-automation
+  - local-agent-model-route-unavailable
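The fault profile above (`fail_first_n: 2`, `fail_every_n: 0`, `fault_status: 503`) can be read as a per-request decision inside the fake provider. The sketch below is an assumption about those semantics, in particular that `fail_every_n` means "every Nth request since reset"; the function names are hypothetical and the real fake provider may behave differently.

```javascript
// requestNumber is 1-based and counted since the last fake-provider reset.
// Assumed semantics: failFirstN fails the first N requests; failEveryN > 0
// additionally fails every Nth request; 0 disables either rule.
function plannedStatus(requestNumber, profile) {
  const { failFirstN = 0, failEveryN = 0, faultStatus = 500 } = profile;
  if (requestNumber <= failFirstN) return faultStatus;
  if (failEveryN > 0 && requestNumber % failEveryN === 0) return faultStatus;
  return 200;
}

// Count how many of the first `total` provider requests would be faulted.
function countPlannedFaults(total, profile) {
  let faults = 0;
  for (let n = 1; n <= total; n += 1) {
    if (plannedStatus(n, profile) !== 200) faults += 1;
  }
  return faults;
}
```

Under these assumptions the profile injects exactly two faults after a reset, which is the floor that `min_provider_fault_count: "2"` checks for.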
@@ -0,0 +1,81 @@
+id: langbot-fake-provider-debug-chat-load
+title: "LangBot Debug Chat controlled fake-provider load probe"
+mode: probe
+area: performance
+type: performance
+priority: p1
+risk: medium
+ci_eligible: false
+tags:
+  - performance
+  - debug-chat
+  - websocket
+  - fake-provider
+  - load
+  - metrics
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
+automation_env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+automation_debug_chat_load_requests: "12"
+automation_debug_chat_load_concurrency: "4"
+automation_debug_chat_load_timeout_ms: "30000"
+automation_debug_chat_load_response_p95_ms: "5000"
+automation_debug_chat_load_first_response_p95_ms: "3000"
+automation_debug_chat_load_max_error_rate: "0"
+automation_debug_chat_load_expected_prefix: "FAKEQA"
+automation_debug_chat_load_prompt_template: '请只回复 "{expected}"，不要解释，不要添加其他字符。'
+automation_debug_chat_load_stream: "true"
+automation_debug_chat_load_reset: "true"
+metrics_thresholds_json: '{"response_p95_ms":{"max":5000},"first_response_p95_ms":{"max":3000},"error_rate":{"max":0}}'
+load_profile_json: '{"requests":12,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled fake OpenAI-compatible provider","metric":"send-to-final-assistant-response"}'
+setup_automation:
+  - "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
+setup_provides_env:
+  - LANGBOT_FAKE_PROVIDER_URL
+  - LANGBOT_FAKE_PROVIDER_BASE_URL
+  - LANGBOT_FAKE_PROVIDER_PID
+  - LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
+  - LANGBOT_FAKE_PROVIDER_MODEL_UUID
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+steps:
+  - "Start or reuse the local fake OpenAI-compatible provider."
+  - "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
+  - "Reset the target Debug Chat session."
+  - "Open concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the real backend pipeline."
+checks:
+  - "automation-result.json status is pass when every request receives its own expected assistant response."
+  - "metrics_summary includes request count, concurrency, p50/p95 response latency, first response latency, throughput, and error rate."
+  - "thresholds_summary shows response_p95_ms, first_response_p95_ms, and error_rate pass."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This probe removes external model latency from the measurement; it still exercises the live LangBot backend, provider requester, local-agent runner, pipeline, and Debug Chat WebSocket adapter."
+  - "Use this as the repeatable message-path baseline before comparing against Space or another real provider."
+success_patterns:
+  - "Debug Chat WebSocket concurrency probe passed"
+  - "Streaming completed"
+failure_patterns:
+  - "WebSocket connection error"
+  - "Timed out after"
+  - "Final assistant response did not include"
+  - "All models failed during streaming setup"
+troubleshooting:
+  - backend-not-listening
+  - debug-chat-history-contaminates-automation
+  - local-agent-model-route-unavailable
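The `metrics_thresholds_json` gate in this case maps metric names to `max` (and elsewhere `min`) bounds. A minimal sketch of how such a gate could be evaluated follows; it assumes a nearest-rank percentile and hypothetical helper names, and the real harness may compute p95 differently.

```javascript
// Nearest-rank percentile over a list of latencies in milliseconds.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Evaluate metrics against thresholds shaped like {"name": {"max": n, "min": n}}.
// A metric that is missing or non-numeric fails its threshold.
function evaluateThresholds(metrics, thresholds) {
  const results = {};
  for (const [name, limit] of Object.entries(thresholds)) {
    const value = metrics[name];
    const known = typeof value === "number";
    const underMax = limit.max === undefined || (known && value <= limit.max);
    const overMin = limit.min === undefined || (known && value >= limit.min);
    results[name] = { value, pass: known && underMax && overMin };
  }
  return results;
}
```

With only 12 requests, nearest-rank p95 is the slowest sample, so a single slow response decides the gate.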
@@ -0,0 +1,88 @@
+id: langbot-fake-provider-debug-chat-slow-load
+title: "LangBot Debug Chat slow fake-provider load probe"
+mode: probe
+area: performance
+type: performance
+priority: p1
+risk: medium
+ci_eligible: false
+tags:
+  - performance
+  - debug-chat
+  - websocket
+  - fake-provider
+  - slow-provider
+  - load
+  - metrics
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
+automation_env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+automation_pipeline_url_env: LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+automation_pipeline_name_env: LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+automation_debug_chat_load_requests: "8"
+automation_debug_chat_load_concurrency: "4"
+automation_debug_chat_load_timeout_ms: "45000"
+automation_debug_chat_load_response_p95_ms: "10000"
+automation_debug_chat_load_first_response_p95_ms: "7000"
+automation_debug_chat_load_max_error_rate: "0"
+automation_debug_chat_load_expected_prefix: "SLOWQA"
+automation_debug_chat_load_prompt_template: '请只回复 "{expected}"，不要解释，不要添加其他字符。'
+automation_debug_chat_load_stream: "true"
+automation_debug_chat_load_reset: "true"
+automation_fake_provider_first_token_delay_ms: "1000"
+automation_fake_provider_chunk_delay_ms: "250"
+automation_fake_provider_chunk_count: "4"
+automation_fake_provider_fail_first_n: "0"
+automation_fake_provider_fail_every_n: "0"
+automation_fake_provider_fault_status: "500"
+metrics_thresholds_json: '{"response_p95_ms":{"max":10000},"first_response_p95_ms":{"max":7000},"error_rate":{"max":0}}'
+load_profile_json: '{"requests":8,"concurrency":4,"path":"Pipeline Debug Chat WebSocket","provider":"controlled slow fake OpenAI-compatible provider","metric":"send-to-final-assistant-response","provider_profile":{"first_token_delay_ms":1000,"chunk_delay_ms":250,"chunk_count":4}}'
+setup_automation:
+  - "node:scripts/e2e/ensure-fake-provider-pipeline.mjs --write-env"
+setup_provides_env:
+  - LANGBOT_FAKE_PROVIDER_URL
+  - LANGBOT_FAKE_PROVIDER_BASE_URL
+  - LANGBOT_FAKE_PROVIDER_PID
+  - LANGBOT_FAKE_PROVIDER_PROVIDER_UUID
+  - LANGBOT_FAKE_PROVIDER_MODEL_UUID
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_URL
+  - LANGBOT_FAKE_PROVIDER_PIPELINE_NAME
+steps:
+  - "Configure the local fake provider with deterministic slow streaming latency."
+  - "Create or update the LangBot provider, model, and local-agent pipeline that points at the fake provider."
+  - "Reset the target Debug Chat session."
+  - "Open concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the real backend pipeline."
+checks:
+  - "automation-result.json status is pass when every request receives its own expected assistant response."
+  - "metrics_summary shows zero errors under the slow-provider profile."
+  - "thresholds_summary shows response_p95_ms, first_response_p95_ms, and error_rate pass."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This probe keeps the model deterministic while injecting provider latency, so it catches backend timeout, streaming, and WebSocket backpressure issues without Space variability."
+  - "Compare with langbot-fake-provider-debug-chat-load to separate fixed LangBot overhead from provider-latency amplification."
+success_patterns:
+  - "Debug Chat WebSocket concurrency probe passed"
+  - "Streaming completed"
+failure_patterns:
+  - "WebSocket connection error"
+  - "Timed out after"
+  - "Final assistant response did not include"
+  - "All models failed during streaming setup"
+troubleshooting:
+  - backend-not-listening
+  - debug-chat-history-contaminates-automation
+  - local-agent-model-route-unavailable
@@ -0,0 +1,35 @@
+id: langbot-fault-taxonomy-contract
+title: "LangBot fault taxonomy and cleanup contract"
+mode: probe
+area: reliability
+type: chaos
+priority: p1
+risk: medium
+ci_eligible: true
+tags:
+  - reliability
+  - chaos
+  - contract
+  - synthetic
+skills:
+  - langbot-testing
+automation: skills/langbot-testing/probes/langbot-fault-taxonomy-contract.mjs
+fault_model_json: '{"kind":"taxonomy-contract","destructive":false,"scenarios":["provider-timeout","plugin-runtime-disconnect","mcp-stdio-server-exit","operator-missing-login","transient-marketplace-timeout"]}'
+steps:
+  - "Run `rtk bin/lbs test run langbot-fault-taxonomy-contract --dry-run` first; remove `--dry-run` after checking the evidence directory."
+  - "Automation validates that representative fault scenarios declare target, injected fault, expected status, recovery check, and cleanup."
+  - "Review metrics.json, fault-model.json, and automation-result.json under LBS_EVIDENCE_DIR."
+checks:
+  - "automation-result.json status is pass."
+  - "Every scenario has an expected status in pass, fail, blocked, env_issue, or flaky."
+  - "Every scenario declares a cleanup action and recovery check."
+evidence_required:
+  - metrics
+  - filesystem
+diagnostics:
+  - "This is a non-destructive taxonomy contract probe; it does not inject real runtime faults."
+  - "Use it as a gate before adding live chaos cases that kill runtimes, route traffic through a proxy, or disrupt a backend dependency."
+success_patterns:
+  - "Fault taxonomy contract declares status"
+failure_patterns:
+  - "missing required scenario fields"
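The contract in this case requires each scenario to declare target, injected fault, expected status, recovery check, and cleanup, with the status drawn from pass, fail, blocked, env_issue, or flaky. A sketch of that validation is below; the snake_case field names and the scenario object shape are assumptions, not the probe's confirmed schema.

```javascript
// Hypothetical field names for the five declarations the contract requires.
const REQUIRED_FIELDS = [
  "target",
  "injected_fault",
  "expected_status",
  "recovery_check",
  "cleanup",
];
const ALLOWED_STATUSES = new Set(["pass", "fail", "blocked", "env_issue", "flaky"]);

function validateScenarios(scenarios) {
  const problems = [];
  for (const scenario of scenarios) {
    // A field counts as declared only if it is a non-empty string.
    const missing = REQUIRED_FIELDS.filter(
      (field) => typeof scenario[field] !== "string" || scenario[field].trim() === ""
    );
    if (missing.length > 0) {
      problems.push({
        id: scenario.id,
        reason: `missing required scenario fields: ${missing.join(", ")}`,
      });
    } else if (!ALLOWED_STATUSES.has(scenario.expected_status)) {
      problems.push({
        id: scenario.id,
        reason: `unexpected status: ${scenario.expected_status}`,
      });
    }
  }
  return { status: problems.length === 0 ? "pass" : "fail", problems };
}
```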
@@ -0,0 +1,42 @@
+id: langbot-live-backend-latency
+title: "LangBot live backend basic latency probe"
+mode: probe
+area: performance
+type: performance
+priority: p1
+risk: medium
+ci_eligible: false
+tags:
+  - performance
+  - live-backend
+  - latency
+  - metrics
+skills:
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+automation: skills/langbot-testing/probes/langbot-live-backend-latency.mjs
+metrics_thresholds_json: '{"backend_p95_ms":{"max":1000},"error_rate":{"max":0}}'
+load_profile_json: '{"requests":12,"concurrency":2,"endpoints":["/healthz"]}'
+steps:
+  - "Confirm the selected LangBot backend is the intended test target."
+  - "Run `rtk bin/lbs test run langbot-live-backend-latency --dry-run` first; remove `--dry-run` after checking LANGBOT_BACKEND_URL and evidence directory."
+  - "Automation sends a small request batch to LANGBOT_BACKEND_URL/healthz and records latency, status counts, and network errors."
+checks:
+  - "automation-result.json status is pass when the backend responds and p95/error-rate thresholds pass."
+  - "automation-result.json status is env_issue when the backend is not reachable."
+  - "metrics.json and network.log are written under LBS_EVIDENCE_DIR."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This probe measures only the reachability latency of the backend health endpoint; it does not cover model/provider, browser, Debug Chat, RAG, or plugin runtime latency."
+success_patterns:
+  - "Live backend latency probe passed"
+failure_patterns:
+  - "Backend did not respond"
+  - "breached latency or error-rate thresholds"
+troubleshooting:
+  - socks-proxy-without-socksio
@@ -0,0 +1,45 @@
+id: langbot-live-backend-log-health
+title: "LangBot live backend log health probe"
+mode: probe
+area: reliability
+type: reliability
+priority: p1
+risk: medium
+ci_eligible: false
+tags:
+  - reliability
+  - live-backend
+  - backend-log
+  - metrics
+skills:
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+automation: skills/langbot-testing/probes/langbot-live-backend-log-health.mjs
+metrics_thresholds_json: '{"fail_count":{"max":0}}'
+load_profile_json: '{"lookback_seconds":300,"log_source":"LANGBOT_BACKEND_LOG or latest LANGBOT_REPO/data/logs/langbot-*.log"}'
+steps:
+  - "Confirm the selected LangBot backend log belongs to the intended test target."
+  - "Run `rtk bin/lbs test run langbot-live-backend-log-health --dry-run` first; remove `--dry-run` after checking evidence directory and log source."
+  - "Automation scans the recent backend log window for fail-severity runtime findings such as Traceback, ImportError, ERROR, unclosed sessions, and unawaited coroutines."
+checks:
+  - "automation-result.json status is pass only when fail_count is 0."
+  - "metrics_summary includes scanned_line_count, fail_count, warning_count, and finding_count."
+  - "findings.json and scanned-backend.log are written under LBS_EVIDENCE_DIR."
+evidence_required:
+  - metrics
+  - backend_log
+  - filesystem
+diagnostics:
+  - "Set LANGBOT_BACKEND_LOG to an explicit log path when the latest log file is not the run target."
+  - "Set LANGBOT_BACKEND_LOG_SINCE or LANGBOT_BACKEND_LOG_LOOKBACK_SECONDS to control the scan window."
+  - "This probe measures runtime log health; it does not prove user-facing Debug Chat, plugin, model, or RAG behavior."
+success_patterns:
+  - "Live backend log health passed"
+failure_patterns:
+  - "Traceback"
+  - "ImportError"
+  - "ERROR"
+  - "unclosed"
+troubleshooting:
+  - socks-proxy-without-socksio
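The log scan described in this case can be sketched as a pattern pass over the recent log window. The pattern list below is an approximation built from the findings named in the steps; treating all of them as fail-severity and matching Python's "was never awaited" warning text for unawaited coroutines are assumptions. Warning-level findings are left out of this sketch.

```javascript
// Assumed fail-severity patterns; the real probe may use a different list.
const FAIL_PATTERNS = [
  /Traceback/,
  /ImportError/,
  /\bERROR\b/,
  /unclosed/i,
  /was never awaited/,
];

function scanLog(text) {
  // Ignore empty lines so scanned_line_count reflects real log entries.
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const findings = [];
  lines.forEach((line, index) => {
    const hit = FAIL_PATTERNS.find((pattern) => pattern.test(line));
    if (hit) {
      findings.push({ line_number: index + 1, pattern: hit.source, line });
    }
  });
  return {
    scanned_line_count: lines.length,
    fail_count: findings.length,
    finding_count: findings.length,
    status: findings.length === 0 ? "pass" : "fail",
    findings,
  };
}
```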
@@ -0,0 +1,44 @@
+id: langbot-live-control-plane-api
+title: "LangBot live control-plane API probe"
+mode: probe
+area: performance
+type: performance
+priority: p1
+risk: medium
+ci_eligible: false
+tags:
+  - performance
+  - reliability
+  - live-backend
+  - control-plane
+  - metrics
+skills:
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+automation: skills/langbot-testing/probes/langbot-live-control-plane-api.mjs
+metrics_thresholds_json: '{"error_rate":{"max":0},"response_shape_failures":{"max":0},"healthz_p95_ms":{"max":500},"system_info_p95_ms":{"max":1000}}'
+load_profile_json: '{"requests":20,"concurrency":4,"endpoints":["/healthz","/api/v1/system/info"],"auth_required":false}'
+steps:
+  - "Confirm the selected LangBot backend is the intended test target."
+  - "Run `rtk bin/lbs test run langbot-live-control-plane-api --dry-run` first; remove `--dry-run` after checking LANGBOT_BACKEND_URL and evidence directory."
+  - "Automation sends a small request batch to /healthz and /api/v1/system/info, then validates status code, JSON shape, and latency budgets."
+checks:
+  - "automation-result.json status is pass when every control-plane request returns HTTP 200, JSON code 0, and required response fields."
+  - "metrics_summary includes per-endpoint p50/p95 latency, error rate, status counts, and response_shape_failures."
+  - "thresholds_summary shows error_rate, response_shape_failures, healthz_p95_ms, and system_info_p95_ms all pass."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This probe measures unauthenticated backend control-plane readiness; it does not cover authenticated UI flows, Debug Chat, model calls, plugins, or RAG."
+  - "A system_info shape failure usually means the API contract or startup state changed and should be investigated before treating latency as healthy."
+success_patterns:
+  - "Live control-plane API probe passed"
+failure_patterns:
+  - "Backend did not respond"
+  - "breached shape, latency, or error-rate thresholds"
+troubleshooting:
+  - socks-proxy-without-socksio
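The pass condition here (HTTP 200, JSON code 0, and required response fields) could be checked per response along these lines. This is a sketch only: the `data` envelope and the caller-supplied `requiredFields` list are assumptions, since the case does not name the required fields.

```javascript
// Returns { ok, failures } for one control-plane response.
// Assumption: the JSON body looks like { code: 0, data: { ... } }.
function checkControlPlaneResponse(httpStatus, body, requiredFields) {
  const failures = [];
  if (httpStatus !== 200) failures.push(`http status ${httpStatus}`);
  if (body === null || typeof body !== "object") {
    failures.push("body is not a JSON object");
  } else {
    if (body.code !== 0) failures.push(`json code ${body.code}`);
    const data =
      body.data !== null && typeof body.data === "object" ? body.data : {};
    for (const field of requiredFields) {
      if (!(field in data)) failures.push(`missing data.${field}`);
    }
  }
  return { ok: failures.length === 0, failures };
}
```

Counting responses where `ok` is false would give a value comparable to `response_shape_failures`.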
@@ -0,0 +1,37 @@
+id: langbot-overhead-accounting-contract
+title: "LangBot overhead accounting metrics contract"
+mode: probe
+area: performance
+type: performance
+priority: p1
+risk: medium
+ci_eligible: true
+tags:
+  - performance
+  - metrics
+  - contract
+  - synthetic
+skills:
+  - langbot-testing
+automation: skills/langbot-testing/probes/langbot-overhead-accounting-contract.mjs
+metrics_thresholds_json: '{"sample_count":{"min":50},"langbot_overhead_p95_ms":{"max":25},"accounting_gap_max_ms":{"max":0.001}}'
+load_profile_json: '{"kind":"synthetic-overhead-accounting","samples":80,"external_latency_segments":["provider","external_tool","network"]}'
+steps:
+  - "Run `rtk bin/lbs test run langbot-overhead-accounting-contract --dry-run` first; remove `--dry-run` after checking the evidence directory."
+  - "Automation generates deterministic message-path latency samples and separates LangBot overhead from provider/tool/network latency."
+  - "Review metrics.json, thresholds.json, resource-log.json, and automation-result.json under LBS_EVIDENCE_DIR."
+checks:
+  - "automation-result.json status is pass."
+  - "metrics_summary includes sample_count, langbot_overhead_p95_ms, e2e_latency_p95_ms, external_latency_p95_ms, and accounting_gap_max_ms."
+  - "thresholds_summary shows sample_count, langbot_overhead_p95_ms, and accounting_gap_max_ms all pass."
+evidence_required:
+  - metrics
+  - resource_log
+  - filesystem
+diagnostics:
+  - "This is a synthetic contract probe for the QA harness; it does not measure live product performance."
+  - "Use it to verify that reports can carry overhead accounting metrics before running live backend or browser performance probes."
+success_patterns:
+  - "Overhead accounting contract passed"
+failure_patterns:
+  - "breached one or more thresholds"
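The accounting this contract checks can be illustrated with one sample: LangBot overhead is what remains of end-to-end latency after subtracting provider, external tool, and network time, and the accounting gap is any time that no segment explains. The field names below are hypothetical and the real probe's sample format is not shown in this diff.

```javascript
// Assumed sample shape (hypothetical), all values in milliseconds:
//   { e2e_ms, langbot_ms, provider_ms, external_tool_ms, network_ms }
function accountSample(sample) {
  const externalMs =
    sample.provider_ms + sample.external_tool_ms + sample.network_ms;
  // Overhead attributed to LangBot once external time is removed.
  const langbotOverheadMs = sample.e2e_ms - externalMs;
  // Gap between measured end-to-end time and the sum of reported segments.
  const accountingGapMs = Math.abs(
    sample.e2e_ms - (sample.langbot_ms + externalMs)
  );
  return {
    external_ms: externalMs,
    langbot_overhead_ms: langbotOverheadMs,
    accounting_gap_ms: accountingGapMs,
  };
}

// Largest unexplained gap across all samples.
function maxAccountingGap(samples) {
  return samples.reduce(
    (max, sample) => Math.max(max, accountSample(sample).accounting_gap_ms),
    0
  );
}
```

A near-zero `accounting_gap_max_ms` bound, as in the thresholds above, only holds when every millisecond of a sample is assigned to some segment.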
@@ -0,0 +1,84 @@
+id: langbot-space-debug-chat-concurrency-smoke
+title: "LangBot Debug Chat real Space-provider concurrency smoke"
+mode: probe
+area: performance
+type: performance
+priority: p1
+risk: high
+ci_eligible: false
+tags:
+  - performance
+  - debug-chat
+  - websocket
+  - space
+  - live-provider
+  - smoke
+  - metrics
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+automation: skills/langbot-testing/probes/langbot-debug-chat-concurrency.mjs
+automation_env:
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_E2E_LOGIN_USER
+  - LANGBOT_LOCAL_AGENT_PIPELINE_URL
+  - LANGBOT_LOCAL_AGENT_PIPELINE_NAME
+automation_pipeline_url_env: LANGBOT_LOCAL_AGENT_PIPELINE_URL
+automation_pipeline_name_env: LANGBOT_LOCAL_AGENT_PIPELINE_NAME
+automation_debug_chat_load_requests: "3"
+automation_debug_chat_load_concurrency: "2"
+automation_debug_chat_load_timeout_ms: "120000"
+automation_debug_chat_load_response_p95_ms: "120000"
+automation_debug_chat_load_max_error_rate: "0"
+automation_debug_chat_load_expected_prefix: "SPACEQA"
+automation_debug_chat_load_prompt_template: '请只回复 "{expected}"，不要解释，不要添加其他字符。'
+automation_debug_chat_load_stream: "true"
+automation_debug_chat_load_reset: "true"
+metrics_thresholds_json: '{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}'
+load_profile_json: '{"requests":3,"concurrency":2,"path":"Pipeline Debug Chat WebSocket","provider":"LangBot Space model route","metric":"send-to-final-assistant-response","classification":"smoke-not-benchmark"}'
+setup_automation:
+  - "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
+setup_provides_env:
+  - LANGBOT_PIPELINE_URL
+  - LANGBOT_PIPELINE_NAME
+  - LANGBOT_LOCAL_AGENT_PIPELINE_URL
+  - LANGBOT_LOCAL_AGENT_PIPELINE_NAME
+  - LANGBOT_LOCAL_AGENT_MODEL_UUID
+  - LANGBOT_E2E_MODEL_UUID
+preconditions:
+  - "The selected local LangBot instance is safe for a low-volume real Space model smoke run."
+  - "Treat Space/provider/network failures as environment or dependency findings until fake-provider baseline evidence separates LangBot overhead from provider and network time."
+steps:
+  - "Prepare a local-agent pipeline with a tested Space model and fallback models."
+  - "Reset the target Debug Chat session."
+  - "Open a small number of concurrent WebSocket Debug Chat connections and send unique deterministic prompts through the live Space provider path."
+checks:
+  - "automation-result.json status is pass when every request receives its own expected assistant response."
+  - "metrics_summary includes request count, concurrency, p95 response latency, throughput, and error rate."
+  - "The report classifies the result as a live-provider smoke, not a stable LangBot overhead benchmark."
+evidence_required:
+  - metrics
+  - network
+  - api_diagnostic
+  - filesystem
+diagnostics:
+  - "This probe measures real user-path latency through Space and includes provider latency, model behavior, and network effects."
+  - "Compare with langbot-fake-provider-debug-chat-load before attributing slow or failed runs to LangBot itself."
+success_patterns:
+  - "Debug Chat WebSocket concurrency probe passed"
+  - "Streaming completed"
+failure_patterns:
+  - "invalid api key"
+  - "WebSocket connection error"
+  - "Timed out after"
+  - "Final assistant response did not include"
+  - "All models failed during streaming setup"
+troubleshooting:
+  - local-agent-model-route-unavailable
+  - marketplace-network-flaky
+  - proxy-env-mismatch
+  - telemetry-proxy-noise
@@ -0,0 +1,80 @@
+id: pipeline-debug-chat-performance
+title: "Pipeline Debug Chat user-path performance probe"
+mode: agent-browser
+area: pipeline
+type: performance
+priority: p1
+risk: medium
+ci_eligible: false
+tags:
+  - performance
+  - pipeline
+  - debug-chat
+  - user-path
+  - metrics
+skills:
+  - langbot-env-setup
+  - langbot-testing
+env:
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_BACKEND_URL
+env_any:
+  - LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME
+automation: scripts/e2e/pipeline-debug-chat.mjs
+automation_env:
+  - LANGBOT_FRONTEND_URL
+  - LANGBOT_BACKEND_URL
+  - LANGBOT_BROWSER_PROFILE
+  - LANGBOT_CHROMIUM_EXECUTABLE
+  - LANGBOT_E2E_PROMPT
+  - LANGBOT_E2E_EXPECTED_TEXT
+  - LANGBOT_E2E_RESPONSE_TIMEOUT_MS
+automation_env_any:
+  - LANGBOT_PIPELINE_URL|LANGBOT_PIPELINE_NAME
+automation_prompt: "请只回复 OK，用于性能测试。"
+automation_expected_text: "OK"
+automation_response_timeout_ms: "120000"
+automation_reset_debug_chat: "true"
+automation_debug_chat_response_p95_ms: "120000"
+automation_debug_chat_max_error_rate: "0"
+metrics_thresholds_json: '{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}'
+load_profile_json: '{"prompts":1,"browser":true,"path":"Pipeline Debug Chat","metric":"send-to-visible-completion"}'
+setup_automation:
+  - "node:scripts/e2e/ensure-local-agent-pipeline.mjs --write-env"
+setup_provides_env:
+  - LANGBOT_PIPELINE_URL
+  - LANGBOT_PIPELINE_NAME
+preconditions:
+  - "LANGBOT_PIPELINE_URL or LANGBOT_PIPELINE_NAME points to the pipeline intended for this Debug Chat performance run."
+  - "The target pipeline is safe to reset Debug Chat history for this run."
+  - "The target pipeline has a known-good runner/model; provider latency should be interpreted separately from LangBot overhead."
+steps:
+  - "Open LANGBOT_FRONTEND_URL with the prepared browser profile."
+  - "Open the target pipeline and select Debug Chat."
+  - "Reset Debug Chat history through the backend API when configured."
+  - "Send the deterministic prompt and wait for the expected assistant response."
+checks:
+  - "automation-result.json status is pass when the expected assistant response appears."
+  - "metrics_summary includes response_p50_ms, response_p95_ms, error_rate, and total_duration_ms."
+  - "thresholds_summary shows response_p95_ms and error_rate pass."
+evidence_required:
+  - ui
+  - screenshot
+  - console
+  - network
+  - metrics
+diagnostics:
+  - "This case measures browser-visible send-to-completion latency; it does not split provider latency from LangBot overhead."
+  - "Use backend logs and provider diagnostics to explain slow runs before calling them LangBot regressions."
+success_patterns:
+  - "Processing request from person_websocket"
+  - "Streaming completed"
+failure_patterns:
+  - "Action invoke_llm_stream call timed out"
+  - "Task exception was never retrieved"
+  - "All models failed during streaming setup"
+troubleshooting:
+  - debug-chat-history-contaminates-automation
+  - local-agent-model-route-unavailable
+  - plugin-runtime-timeout
+  - proxy-env-mismatch
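The `metrics_thresholds_json` field and the `thresholds_summary` check in the case above imply a per-metric comparison of measured values against `max` bounds. A minimal sketch of that comparison follows. It assumes the thresholds map has the shape shown in the case file. `evaluateThresholds`, the `min` bound, and the sample metric values are hypothetical and are not taken from the runner.

```javascript
// Hypothetical helper (not part of the LangBot runner): compares a metrics
// summary against a thresholds map shaped like metrics_thresholds_json.
function evaluateThresholds(thresholds, metrics) {
  const summary = {};
  for (const [name, bounds] of Object.entries(thresholds)) {
    const actual = metrics[name];
    // A missing or non-numeric metric fails instead of passing silently.
    const hasValue = Number.isFinite(actual);
    const underMax = bounds.max === undefined || actual <= bounds.max;
    // "min" is an assumed extension; the case file above only uses "max".
    const overMin = bounds.min === undefined || actual >= bounds.min;
    summary[name] = {
      actual: hasValue ? actual : null,
      ...bounds,
      pass: hasValue && underMax && overMin,
    };
  }
  return summary;
}

// Thresholds copied from the case file; the metric values are made up.
const thresholds = JSON.parse('{"response_p95_ms":{"max":120000},"error_rate":{"max":0}}');
const summary = evaluateThresholds(thresholds, { response_p95_ms: 8421.5, error_rate: 0 });
console.log(Object.values(summary).every((item) => item.pass));
```

Treating a missing metric as a failure keeps a run that produced no samples from being reported as a pass.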
@@ -1,58 +0,0 @@
-id: skill-discovery-via-mcp-gateway
-title: "External harness discovers LangBot skills via langbot_list_assets (all-tool model)"
-mode: agent-browser
-area: sandbox
-type: regression
-priority: p2
-risk: medium
-ci_eligible: false
-tags:
-  - skills
-  - mcp-gateway
-  - acp-agent-runner
-  - all-tool-model
-  - tools
-skills:
-  - langbot-env-setup
-  - langbot-testing
-env:
-  - LANGBOT_FRONTEND_URL
-  - LANGBOT_BACKEND_URL
-  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
-  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
-preconditions:
-  - "An external-harness runner pipeline (e.g. ACP remote claude-code) is configured with langbot-assets-enabled=true so the LangBot MCP gateway is exposed to the harness."
-  - "The remote harness (claude-code) is reachable and responsive (claude -p returns within the runner timeout)."
-  - "At least one pipeline-visible skill exists in the Box skill store (otherwise the count is 0, which is still a valid pass for the discovery surface)."
-automation: scripts/e2e/pipeline-debug-chat.mjs
-automation_env:
-  - LANGBOT_FRONTEND_URL
-  - LANGBOT_BROWSER_PROFILE
-  - LANGBOT_CHROMIUM_EXECUTABLE
-  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
-  - LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
-automation_pipeline_url_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_URL
-automation_pipeline_name_env: LANGBOT_ACP_AGENT_RUNNER_PIPELINE_NAME
-automation_prompt: "You have LangBot tools available via an MCP server (tools prefixed langbot_). Call langbot_list_assets with asset_types = [\"skills\",\"tools\"]. Then reply with one single line: the literal token PROBEDONE, a space, the number of skills you found, a space, and the number of tools you found."
-automation_expected_text: "PROBEDONE"
-automation_response_timeout_ms: "540000"
-steps:
-  - "Open LANGBOT_FRONTEND_URL and navigate to the external-harness (ACP) pipeline."
-  - "Open Debug Chat with langbot-assets-enabled on the runner."
-  - "Send the automation_prompt asking the harness to call langbot_list_assets with asset_types [skills, tools]."
-  - "Capture the final reply, backend logs, and the MCP gateway call trace."
-checks:
-  - "UI: final reply contains PROBEDONE followed by a skill count and a tool count."
-  - "Logs: backend shows the harness invoked langbot_list_assets and the response included a 'skills' asset class (this is the all-tool-model discovery surface added on this branch)."
-  - "Behavior parity: a local-agent runner reaches the same skills via use_funcs / activate; the external harness reaches them via langbot_list_assets + langbot_call_tool."
-evidence_required:
-  - ui
-  - screenshot
-  - backend_log
-expected_failures:
-  - "runner.timeout when the remote claude-code harness is unauthenticated or slow to start — this is an environment issue, not a discovery-surface regression."
-diagnostics:
-  - "If runner.timeout: ssh into the harness host and confirm `claude -p 'hi'` returns quickly; the ACP runner cannot complete until the harness responds."
-  - "Activated-skill OPERATE on docker+shared-fs is tracked separately by issue #2271 and is out of scope for this discovery case."
-troubleshooting:
-  - sandbox-native-tools-unavailable
@@ -1 +1,3 @@
-dist/
+dist/*
+!dist/
+!dist/qa-plugin-smoke-0.1.0.lbpkg
@@ -0,0 +1,837 @@
+#!/usr/bin/env node
+
+import crypto from "node:crypto";
+import net from "node:net";
+import tls from "node:tls";
+import { mkdir, writeFile } from "node:fs/promises";
+import { join, resolve } from "node:path";
+import { env, exit } from "node:process";
+import {
+  apiJson,
+  appendLine,
+  ensureEvidence,
+  evidencePaths,
+  loadEnvFiles,
+  localIsoWithOffset,
+  redact,
+  resetAndAuthLocalUser,
+  writeResult,
+} from "../../../scripts/e2e/lib/langbot-e2e.mjs";
+import {
+  buildProviderTimingMetrics,
+  summarizeFakeProviderState,
+} from "./lib/fake-provider-timing.mjs";
+
+const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
+
+await loadEnvFiles();
+const caseId = env.LBS_CASE_ID || "langbot-debug-chat-concurrency";
+const paths = evidencePaths(caseId);
+await ensureEvidence(paths);
+
+const startedAt = new Date();
+const metricsPath = resolve(paths.evidenceDir, "metrics.json");
+const samplesPath = resolve(paths.evidenceDir, "samples.json");
+const fakeProviderStatePath = resolve(paths.evidenceDir, "fake-provider-state.json");
+const resetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
+const backendUrl = env.LANGBOT_BACKEND_URL || "";
+const fakeProviderUrl = env.LANGBOT_FAKE_PROVIDER_URL || "";
+const pipelineUrl = env.LANGBOT_E2E_PIPELINE_URL || env.LANGBOT_PIPELINE_URL || "";
+const pipelineName = env.LANGBOT_E2E_PIPELINE_NAME || env.LANGBOT_PIPELINE_NAME || "";
+const sessionType = env.LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE || env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person";
+const totalRequests = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_REQUESTS, defaultRequests(caseId));
+const concurrency = Math.min(totalRequests, positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY, defaultConcurrency(caseId)));
+const timeoutMs = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS, defaultTimeout(caseId));
+const expectedPrefix = env.LANGBOT_DEBUG_CHAT_LOAD_EXPECTED_PREFIX || "LBQA";
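+// Default prompt, in English: Reply with only "{expected}"; do not explain and do not add any other characters.
+const promptTemplate = env.LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE
+  || "请只回复 \"{expected}\"，不要解释，不要添加其他字符。";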
+const stream = bool(env.LANGBOT_DEBUG_CHAT_LOAD_STREAM, true);
+const resetBeforeRun = bool(env.LANGBOT_DEBUG_CHAT_LOAD_RESET, true);
+const responseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS, defaultP95Budget(caseId));
+const firstResponseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_FIRST_RESPONSE_P95_MS, 0);
+const maxErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE, 0);
+const minErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_RATE, 0);
+const minErrorCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_COUNT, 0);
+const minOkCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_OK_COUNT, 0);
+const minProviderFaultCount = nonNegativeInteger(env.LANGBOT_DEBUG_CHAT_LOAD_MIN_PROVIDER_FAULT_COUNT, 0);
+const failOnFinalMismatch = bool(env.LANGBOT_DEBUG_CHAT_LOAD_FAIL_ON_FINAL_MISMATCH, false);
+const failureSignals = textList(env.LANGBOT_E2E_FAILURE_SIGNALS || env.LANGBOT_DEBUG_CHAT_LOAD_FAILURE_SIGNALS || "");
+
+const result = {
+  source: "automation",
+  case_id: caseId,
+  run_id: paths.runId,
+  status: "fail",
+  reason: "",
+  started_at: startedAt.toISOString(),
+  started_at_local: localIsoWithOffset(startedAt),
+  finished_at: "",
+  finished_at_local: "",
+  duration_ms: 0,
+  backend_url: backendUrl,
+  pipeline_url: pipelineUrl,
+  pipeline_name: pipelineName,
+  pipeline_id: "",
+  session_type: sessionType,
+  load_profile: {
+    requests: totalRequests,
+    concurrency,
+    timeout_ms: timeoutMs,
+    stream,
+    reset_before_run: resetBeforeRun,
+    fail_on_final_mismatch: failOnFinalMismatch,
+  },
+  evidence: {
+    network_log: paths.networkLog,
+    metrics_json: metricsPath,
+    samples_json: samplesPath,
+    fake_provider_state_json: fakeProviderStatePath,
+    debug_chat_reset_diagnostic_json: resetDiagnosticPath,
+    automation_result_json: paths.automationResultJson,
+    result_json: paths.resultJson,
+  },
+  evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
+};
+
+try {
+  if (!backendUrl) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_BACKEND_URL is not configured.");
+  }
+  if (!["person", "group"].includes(sessionType)) {
+    throw new Error(`LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE must be person or group, got ${sessionType}.`);
+  }
+  const backendReady = await backendReachable(backendUrl);
+  if (!backendReady) {
+    result.status = "env_issue";
+    throw new Error(`Backend did not respond at ${backendUrl}.`);
+  }
+
+  const user = env.LANGBOT_E2E_LOGIN_USER || "";
+  const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
+  if (!user) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_E2E_LOGIN_USER is required so this probe can resolve/reset the Debug Chat session.");
+  }
+  const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
+
+  const pipeline = await resolvePipeline({ backendUrl, token: auth.token, pipelineUrl, pipelineName });
+  result.pipeline_id = pipeline.id;
+  result.pipeline_name = pipeline.name || pipelineName;
+  if (!result.pipeline_url && env.LANGBOT_FRONTEND_URL) {
+    result.pipeline_url = `${env.LANGBOT_FRONTEND_URL.replace(/\/$/, "")}/home/pipelines?id=${encodeURIComponent(pipeline.id)}`;
+  }
+
+  if (resetBeforeRun) {
+    const reset = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.id)}/ws/reset/${encodeURIComponent(sessionType)}`, {
+      method: "POST",
+      token: auth.token,
+    });
+    const resetDiagnostic = {
+      status: isApiFailure(reset) ? "fail" : "ready",
+      http_status: reset.status,
+      code: reset.json.code ?? null,
+      reason: isApiFailure(reset) ? reset.json.msg || "Debug Chat reset failed." : "Debug Chat session reset.",
+    };
+    await writeFile(resetDiagnosticPath, `${JSON.stringify(resetDiagnostic, null, 2)}\n`, "utf8");
+    if (resetDiagnostic.status === "fail") {
+      throw new Error(resetDiagnostic.reason);
+    }
+  }
+
+  const wsUrl = websocketUrl(backendUrl, pipeline.id, sessionType);
+  const loadStartedAt = performance.now();
+  const samples = await runLoad({
+    wsUrl,
+    totalRequests,
+    concurrency,
+    timeoutMs,
+    promptTemplate,
+    expectedPrefix,
+    stream,
+    failOnFinalMismatch,
+    failureSignals,
+  });
+  const loadDurationMs = performance.now() - loadStartedAt;
+  const fakeProviderState = await readFakeProviderState(fakeProviderUrl);
+  if (fakeProviderState) {
+    await writeFile(fakeProviderStatePath, `${JSON.stringify(fakeProviderState, null, 2)}\n`, "utf8");
+  }
+  const metrics = buildMetrics({
+    samples,
+    totalRequests,
+    concurrency,
+    timeoutMs,
+    loadDurationMs,
+    backendUrl,
+    pipelineId: pipeline.id,
+    sessionType,
+    fakeProviderState,
+  });
+  const thresholds = buildThresholds(metrics);
+  const passed = Object.values(thresholds).every((item) => item.pass);
+  result.status = passed ? "pass" : "fail";
+  result.reason = passed
+    ? "Debug Chat WebSocket concurrency probe passed all thresholds."
+    : "Debug Chat WebSocket concurrency probe breached latency or error-rate thresholds.";
+  result.metrics_summary = {
+    requests: metrics.total_requests,
+    concurrency: metrics.concurrency,
+    ok_count: metrics.ok_count,
+    error_count: metrics.error_count,
+    timeout_count: metrics.timeout_count,
+    error_rate: metrics.error_rate,
+    response_p50_ms: metrics.response_duration_ms.p50,
+    response_p95_ms: metrics.response_duration_ms.p95,
+    first_assistant_event_p95_ms: metrics.first_assistant_event_ms.p95,
+    first_assistant_content_p95_ms: metrics.first_assistant_content_ms.p95,
+    first_response_p95_ms: metrics.first_response_ms.p95,
+    throughput_rps: metrics.throughput_rps,
+    status_counts: metrics.status_counts,
+    fake_provider_request_count: metrics.fake_provider?.request_count ?? null,
+    fake_provider_fault_count: metrics.fake_provider?.fault_count ?? null,
+    fake_provider_duration_p95_ms: metrics.provider_timing?.provider_duration_ms.p95 ?? null,
+    langbot_overhead_estimate_p95_ms: metrics.provider_timing?.langbot_overhead_estimate_ms.p95 ?? null,
+    send_to_provider_start_p95_ms: metrics.provider_timing?.send_to_provider_start_ms.p95 ?? null,
+    provider_finish_to_ws_final_p95_ms: metrics.provider_timing?.provider_finish_to_ws_final_ms.p95 ?? null,
+    provider_timing_matched_request_count: metrics.provider_timing?.matched_request_count ?? null,
+  };
+  result.thresholds_summary = thresholds;
+  result.artifacts = {
+    metrics_json: metricsPath,
+    samples_json: samplesPath,
+    fake_provider_state_json: fakeProviderState ? fakeProviderStatePath : "",
+    network_log: paths.networkLog,
+    automation_result_json: paths.automationResultJson,
+    result_json: paths.resultJson,
+  };
+
+  await writeFile(metricsPath, `${JSON.stringify({ ...metrics, thresholds }, null, 2)}\n`, "utf8");
+  await writeFile(samplesPath, `${JSON.stringify(samples, null, 2)}\n`, "utf8");
+} catch (error) {
+  if (!["env_issue", "blocked"].includes(result.status)) {
+    result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
+  }
+  result.reason = result.reason || safeReason(error.message);
+} finally {
+  const finishedAt = new Date();
+  result.finished_at = finishedAt.toISOString();
+  result.finished_at_local = localIsoWithOffset(finishedAt);
+  result.duration_ms = finishedAt.getTime() - startedAt.getTime();
+  await mkdir(paths.evidenceDir, { recursive: true });
+  await writeResult(paths, result);
+  console.log(JSON.stringify(result, null, 2));
+}
+
+// Exit codes: 0 = pass, 2 = env_issue or blocked, 1 = any other status.
+exit(result.status === "pass" ? 0 : result.status === "env_issue" || result.status === "blocked" ? 2 : 1);
+
+function defaultRequests(id) {
+  return id.includes("space") ? 3 : 12;
+}
+
+function defaultConcurrency(id) {
+  return id.includes("space") ? 1 : 4;
+}
+
+function defaultTimeout(id) {
+  return id.includes("space") ? 120_000 : 30_000;
+}
+
+function defaultP95Budget(id) {
+  return id.includes("space") ? 120_000 : 5_000;
+}
+
+function positiveInteger(value, fallback) {
+  const parsed = Number.parseInt(String(value || ""), 10);
+  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
+}
+
+function nonNegativeInteger(value, fallback) {
+  const parsed = Number.parseInt(String(value ?? ""), 10);
+  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
+}
+
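+// Accepts any finite number >= 0 (zero is a valid budget, despite the name).
+function positiveNumber(value, fallback) {
+  // Number("") is 0, so an unset or blank variable must return the fallback before parsing.
+  if (value === undefined || value === null || String(value).trim() === "") return fallback;
+  const parsed = Number(value);
+  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
+}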
+
+function bool(value, fallback) {
+  if (value === undefined || value === "") return fallback;
+  if (/^(1|true|yes|on)$/i.test(String(value))) return true;
+  if (/^(0|false|no|off)$/i.test(String(value))) return false;
+  return fallback;
+}
+
+function textList(value) {
+  return String(value || "")
+    .split(/\r?\n|,/)
+    .map((item) => item.trim())
+    .filter(Boolean);
+}
+
+async function backendReachable(baseUrl) {
+  try {
+    const response = await fetch(`${baseUrl.replace(/\/$/, "")}/healthz`, {
+      signal: AbortSignal.timeout(3000),
+    });
+    return response.status < 500;
+  } catch {
+    return false;
+  }
+}
+
+async function readFakeProviderState(rootUrl) {
+  if (!rootUrl) return null;
+  try {
+    const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
+      signal: AbortSignal.timeout(3000),
+    });
+    const json = await response.json().catch(() => ({}));
+    return {
+      status: response.ok && json.ok === true ? "loaded" : "unavailable",
+      url: normalizeProviderRootUrl(rootUrl),
+      http_status: response.status,
+      model: json.model || "",
+      config: json.config || {},
+      request_count: Number.isFinite(json.request_count) ? json.request_count : null,
+      recent_requests: Array.isArray(json.recent_requests) ? json.recent_requests : [],
+    };
+  } catch (error) {
+    return {
+      status: "unavailable",
+      url: normalizeProviderRootUrl(rootUrl),
+      reason: safeReason(error.message),
+      request_count: null,
+      recent_requests: [],
+    };
+  }
+}
+
+function normalizeProviderRootUrl(value) {
+  const trimmed = String(value || "").trim().replace(/\/$/, "");
+  return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
+}
+
+function pipelineIdFromUrl(url) {
+  if (!url) return "";
+  try {
+    const parsed = new URL(url);
+    return parsed.searchParams.get("id") || "";
+  } catch {
+    return "";
+  }
+}
+
+async function resolvePipeline({ backendUrl, token, pipelineUrl, pipelineName }) {
+  const idFromUrl = pipelineIdFromUrl(pipelineUrl);
+  if (idFromUrl) {
+    const response = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(idFromUrl)}`, { token });
+    const pipeline = response.json.data?.pipeline;
+    if (isApiFailure(response) || !pipeline?.uuid) {
+      throw new Error(response.json.msg || `Could not load pipeline ${idFromUrl}.`);
+    }
+    return { id: pipeline.uuid, name: pipeline.name || "" };
+  }
+  if (!pipelineName) {
+    throw new Error("Set LANGBOT_E2E_PIPELINE_URL or LANGBOT_E2E_PIPELINE_NAME before running this probe.");
+  }
+  const response = await apiJson(backendUrl, "/api/v1/pipelines", { token });
+  if (isApiFailure(response)) {
+    throw new Error(response.json.msg || "Failed to list pipelines.");
+  }
+  const pipeline = (response.json.data?.pipelines || []).find((item) => item.name === pipelineName);
+  if (!pipeline?.uuid) {
+    throw new Error(`Could not find pipeline named ${pipelineName}.`);
+  }
+  return { id: pipeline.uuid, name: pipeline.name || pipelineName };
+}
+
+function isApiFailure(response) {
+  return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
+}
+
+function websocketUrl(baseUrl, pipelineId, sessionType) {
+  const parsed = new URL(baseUrl);
+  parsed.protocol = parsed.protocol === "https:" ? "wss:" : "ws:";
+  parsed.pathname = `/api/v1/pipelines/${encodeURIComponent(pipelineId)}/ws/connect`;
+  parsed.search = `?session_type=${encodeURIComponent(sessionType)}`;
+  return parsed.toString();
+}
+
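+// Fixed-size worker pool: each worker claims the next index until all requests are issued.
+// nextIndex is read and incremented synchronously (no await in between), so no index is claimed twice.
+async function runLoad(options) {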
+  const samples = [];
+  let nextIndex = 0;
+  const workers = Array.from({ length: options.concurrency }, async () => {
+    while (nextIndex < options.totalRequests) {
+      const index = nextIndex;
+      nextIndex += 1;
+      const sample = await runSingleRequest({ ...options, index });
+      samples.push(sample);
+    }
+  });
+  await Promise.all(workers);
+  return samples.sort((left, right) => left.index - right.index);
+}
+
+function expectedForIndex(prefix, index) {
+  return `${prefix}-${String(index + 1).padStart(4, "0")}`;
+}
+
+function promptForIndex(template, expected) {
+  return template.replaceAll("{expected}", expected);
+}
+
+function runSingleRequest({
+  wsUrl,
+  index,
+  timeoutMs,
+  promptTemplate,
+  expectedPrefix,
+  stream,
+  failOnFinalMismatch,
+  failureSignals,
+}) {
+  return new Promise((resolve) => {
+    const expected = expectedForIndex(expectedPrefix, index);
+    const prompt = promptForIndex(promptTemplate, expected);
+    const sample = {
+      index,
+      status: "running",
+      ok: false,
+      expected_text: expected,
+      prompt,
+      response_text: "",
+      started_at: new Date().toISOString(),
+      started_epoch_ms: Date.now(),
+      connected_at: null,
+      connected_epoch_ms: null,
+      sent_at: null,
+      sent_epoch_ms: null,
+      first_assistant_event_at: null,
+      first_assistant_event_epoch_ms: null,
+      first_assistant_event_ms: null,
+      first_assistant_content_at: null,
+      first_assistant_content_epoch_ms: null,
+      first_assistant_content_ms: null,
+      first_response_at: null,
+      first_response_epoch_ms: null,
+      connected_ms: null,
+      first_response_ms: null,
+      response_duration_ms: null,
+      finished_at: null,
+      finished_epoch_ms: null,
+      event_count: 0,
+      foreign_response_count: 0,
+      last_foreign_response_text: "",
+      error: "",
+      close_code: null,
+      close_reason: "",
+    };
+    let closed = false;
+    let connectedAt = 0;
+    let sentAt = 0;
+    const startedAt = performance.now();
+    let client = null;
+    const timer = setTimeout(() => {
+      finish("timeout", `Timed out after ${timeoutMs} ms.`);
+    }, timeoutMs);
+
+    client = openRawWebSocket(wsUrl, {
+      onOpen() {
+        connectedAt = performance.now();
+        const now = Date.now();
+        sample.connected_at = new Date(now).toISOString();
+        sample.connected_epoch_ms = now;
+        sample.connected_ms = rounded(connectedAt - startedAt);
+      },
+      onMessage(text) {
+        sample.event_count += 1;
+        let data;
+        try {
+          data = JSON.parse(String(text || ""));
+        } catch (error) {
+          finish("error", `Invalid WebSocket JSON: ${error.message}`);
+          return;
+        }
+        appendLine(paths.networkLog, JSON.stringify({
+          request_index: index,
+          type: data.type,
+          session_type: data.session_type || "",
+          role: data.data?.role || "",
+          is_final: data.data?.is_final ?? null,
+          content_preview: redact(String(data.data?.content || data.message || "").slice(0, 200)),
+        })).catch(() => {});
+
+        if (data.type === "connected") {
+          sentAt = performance.now();
+          const now = Date.now();
+          sample.sent_at = new Date(now).toISOString();
+          sample.sent_epoch_ms = now;
+          client.send(JSON.stringify({
+            type: "message",
+            message: [{ type: "Plain", text: prompt }],
+            stream,
+          }));
+          return;
+        }
+        if (data.type === "error") {
+          finish("error", data.message || "WebSocket error message.");
+          return;
+        }
+        if (data.type !== "response" || data.data?.role !== "assistant") return;
+
+        const content = String(data.data.content || "");
+        markFirstAssistantEvent(sample, sentAt);
+        if (content) sample.response_text = content;
+        if (content) markFirstAssistantContent(sample, sentAt);
+        if (content.includes(expected) && sample.first_response_ms === null && sentAt > 0) {
+          const now = Date.now();
+          sample.first_response_at = new Date(now).toISOString();
+          sample.first_response_epoch_ms = now;
+          sample.first_response_ms = rounded(performance.now() - sentAt);
+        }
+        if (data.data.is_final === true) {
+          const ok = sample.response_text.includes(expected);
+          if (ok) {
+            if (sample.first_response_ms === null && sentAt > 0) {
+              sample.first_response_ms = rounded(performance.now() - sentAt);
+            }
+            finish("pass", "");
+          } else if (matchesFailureSignal(sample.response_text, failureSignals)) {
+            finish("app_error", `Assistant final response matched a failure signal: ${sample.response_text}`);
+          } else if (failOnFinalMismatch && !containsLoadToken(sample.response_text, expectedPrefix)) {
+            finish("mismatch", `Final assistant response did not include ${expected}: ${sample.response_text}`);
+          } else {
+            sample.foreign_response_count += 1;
+            sample.last_foreign_response_text = sample.response_text;
+          }
+        }
+      },
+      onError(error) {
+        finish("connection_error", `WebSocket connection error: ${error.message}`);
+      },
+      onClose(event) {
+        sample.close_code = event.code;
+        sample.close_reason = event.reason || "";
+        if (!closed) finish("closed", `WebSocket closed before final assistant response: ${event.code}`);
+      },
+    });
+
+    function finish(status, reason) {
+      if (closed) return;
+      closed = true;
+      clearTimeout(timer);
+      sample.status = status;
+      sample.ok = status === "pass";
+      sample.error = status === "timeout" && sample.foreign_response_count > 0
+        ? `${reason || ""} Saw ${sample.foreign_response_count} foreign assistant response(s); last=${sample.last_foreign_response_text}`
+        : reason || "";
+      if (sentAt > 0) sample.response_duration_ms = rounded(performance.now() - sentAt);
+      else sample.response_duration_ms = rounded(performance.now() - startedAt);
+      const now = Date.now();
+      sample.finished_at = new Date(now).toISOString();
+      sample.finished_epoch_ms = now;
+      try {
+        client?.close();
+      } catch {
+        // Closing a failed socket should not hide the sample result.
+      }
+      resolve(sample);
+    }
+  });
+}
+
+function markFirstAssistantEvent(sample, sentAt) {
+  if (sample.first_assistant_event_ms !== null || sentAt <= 0) return;
+  const now = Date.now();
+  sample.first_assistant_event_at = new Date(now).toISOString();
+  sample.first_assistant_event_epoch_ms = now;
+  sample.first_assistant_event_ms = rounded(performance.now() - sentAt);
+}
+
+function markFirstAssistantContent(sample, sentAt) {
+  if (sample.first_assistant_content_ms !== null || sentAt <= 0) return;
+  const now = Date.now();
+  sample.first_assistant_content_at = new Date(now).toISOString();
+  sample.first_assistant_content_epoch_ms = now;
+  sample.first_assistant_content_ms = rounded(performance.now() - sentAt);
+}
+
+function containsLoadToken(text, prefix) {
+  const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+  return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
+}
+
+function matchesFailureSignal(text, signals) {
+  const lower = String(text || "").toLowerCase();
+  return signals.some((signal) => lower.includes(signal.toLowerCase()));
+}
+
+function openRawWebSocket(wsUrl, handlers) {
+  const parsed = new URL(wsUrl);
+  const secure = parsed.protocol === "wss:";
+  const port = Number(parsed.port || (secure ? 443 : 80));
+  const host = parsed.hostname;
+  const path = `${parsed.pathname}${parsed.search}`;
+  const key = crypto.randomBytes(16).toString("base64");
+  const socket = secure
+    ? tls.connect({ host, port, servername: host })
+    : net.connect({ host, port });
+  let opened = false;
+  let closed = false;
+  let buffer = Buffer.alloc(0);
+
+  socket.setNoDelay(true);
+  socket.on("connect", () => {
+    const originProtocol = secure ? "https" : "http";
+    const request = [
+      `GET ${path} HTTP/1.1`,
+      `Host: ${parsed.host}`,
+      "Upgrade: websocket",
+      "Connection: Upgrade",
+      `Sec-WebSocket-Key: ${key}`,
+      "Sec-WebSocket-Version: 13",
+      `Origin: ${originProtocol}://${parsed.host}`,
+      "",
+      "",
+    ].join("\r\n");
+    socket.write(request);
+  });
+  socket.on("data", (chunk) => {
+    buffer = Buffer.concat([buffer, chunk]);
+    if (!opened) {
+      const headerEnd = buffer.indexOf("\r\n\r\n");
+      if (headerEnd === -1) return;
+      const headerText = buffer.slice(0, headerEnd).toString("utf8");
+      buffer = buffer.slice(headerEnd + 4);
+      if (!/^HTTP\/1\.1 101\b/i.test(headerText)) {
+        handlers.onError(new Error(`Handshake failed: ${headerText.split("\r\n")[0] || "missing status"}`));
+        socket.destroy();
+        return;
+      }
+      opened = true;
+      handlers.onOpen();
+    }
+    processFrames();
+  });
+  socket.on("error", (error) => {
+    if (!closed) handlers.onError(error);
+  });
+  socket.on("close", () => {
+    if (closed) return;
+    closed = true;
+    handlers.onClose({ code: null, reason: "" });
+  });
+
+  function processFrames() {
+    while (true) {
+      const frame = readFrame(buffer);
+      if (!frame) return;
+      buffer = buffer.slice(frame.consumed);
+      if (frame.opcode === 0x1) {
+        handlers.onMessage(frame.payload.toString("utf8"));
+      } else if (frame.opcode === 0x8) {
+        const code = frame.payload.length >= 2 ? frame.payload.readUInt16BE(0) : null;
+        const reason = frame.payload.length > 2 ? frame.payload.slice(2).toString("utf8") : "";
+        closed = true;
+        handlers.onClose({ code, reason });
+        socket.end();
+        return;
+      } else if (frame.opcode === 0x9) {
+        writeFrame(socket, 0xA, frame.payload);
+      }
+    }
+  }
+
+  return {
+    send(text) {
+      if (closed || !opened) return;
+      writeFrame(socket, 0x1, Buffer.from(text, "utf8"));
+    },
+    close() {
+      if (closed) return;
+      closed = true;
+      if (!socket.destroyed) {
+        if (opened) writeFrame(socket, 0x8, Buffer.alloc(0));
+        setTimeout(() => socket.end(), 50).unref();
+      }
+    },
+  };
+}
+
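+// Parses one frame from the head of `buffer`; returns null until a whole frame has arrived.
+// The FIN bit is not inspected, so fragmented messages (continuation opcode 0x0) are not reassembled.
+function readFrame(buffer) {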
+  if (buffer.length < 2) return null;
+  const first = buffer[0];
+  const second = buffer[1];
+  const opcode = first & 0x0f;
+  const masked = Boolean(second & 0x80);
+  let length = second & 0x7f;
+  let offset = 2;
+  if (length === 126) {
+    if (buffer.length < offset + 2) return null;
+    length = buffer.readUInt16BE(offset);
+    offset += 2;
+  } else if (length === 127) {
+    if (buffer.length < offset + 8) return null;
+    const high = buffer.readUInt32BE(offset);
+    const low = buffer.readUInt32BE(offset + 4);
+    length = high * 2 ** 32 + low;
+    offset += 8;
+  }
+  let mask = null;
+  if (masked) {
+    if (buffer.length < offset + 4) return null;
+    mask = buffer.slice(offset, offset + 4);
+    offset += 4;
+  }
+  if (buffer.length < offset + length) return null;
+  let payload = buffer.slice(offset, offset + length);
+  if (mask) {
+    payload = Buffer.from(payload);
+    for (let index = 0; index < payload.length; index += 1) {
+      payload[index] ^= mask[index % 4];
+    }
+  }
+  return {
+    opcode,
+    payload,
+    consumed: offset + length,
+  };
+}
+
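+// Writes a single unfragmented frame (FIN set). Client-to-server frames must be masked (RFC 6455, section 5.3).
+function writeFrame(socket, opcode, payload) {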
+  const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload || "");
+  const mask = crypto.randomBytes(4);
+  const headerLength = body.length < 126 ? 2 : body.length <= 0xffff ? 4 : 10;
+  const header = Buffer.alloc(headerLength);
+  header[0] = 0x80 | opcode;
+  if (body.length < 126) {
+    header[1] = 0x80 | body.length;
+  } else if (body.length <= 0xffff) {
+    header[1] = 0x80 | 126;
+    header.writeUInt16BE(body.length, 2);
+  } else {
+    header[1] = 0x80 | 127;
+    header.writeUInt32BE(Math.floor(body.length / 2 ** 32), 2);
+    header.writeUInt32BE(body.length >>> 0, 6);
+  }
+  const masked = Buffer.from(body);
+  for (let index = 0; index < masked.length; index += 1) {
+    masked[index] ^= mask[index % 4];
+  }
+  socket.write(Buffer.concat([header, mask, masked]));
+}
+
+function rounded(value) {
+  return Number(value.toFixed(3));
+}
+
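+// Nearest-rank percentile: the value at 1-based rank ceil(p / 100 * n) of the sorted samples.
+function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const rank = Math.ceil((percentileValue / 100) * sorted.length);
+  // Clamp so that percentileValue = 0 maps to the minimum instead of index -1.
+  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
+  return rounded(sorted[index]);
+}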
+
+function stats(values) {
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: rounded(Math.min(...values)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: rounded(Math.max(...values)),
+  };
+}
+
+function buildMetrics({ samples, totalRequests, concurrency, timeoutMs, loadDurationMs, backendUrl, pipelineId, sessionType, fakeProviderState }) {
+  const okSamples = samples.filter((sample) => sample.ok);
+  const statusCounts = {};
+  for (const sample of samples) {
+    statusCounts[sample.status] = (statusCounts[sample.status] || 0) + 1;
+  }
+  const errorCount = samples.length - okSamples.length;
+  return {
+    probe: caseId,
+    backend_url: backendUrl,
+    pipeline_id: pipelineId,
+    session_type: sessionType,
+    total_requests: totalRequests,
+    completed_requests: samples.length,
+    concurrency,
+    timeout_ms: timeoutMs,
+    ok_count: okSamples.length,
+    error_count: errorCount,
+    timeout_count: samples.filter((sample) => sample.status === "timeout").length,
+    error_rate: samples.length === 0 ? 1 : rounded(errorCount / samples.length),
+    load_duration_ms: rounded(loadDurationMs),
+    throughput_rps: loadDurationMs <= 0 ? 0 : rounded(okSamples.length / (loadDurationMs / 1000)),
+    status_counts: statusCounts,
+    connected_ms: stats(samples.map((sample) => sample.connected_ms).filter(Number.isFinite)),
+    first_assistant_event_ms: stats(samples.map((sample) => sample.first_assistant_event_ms).filter(Number.isFinite)),
+    first_assistant_content_ms: stats(samples.map((sample) => sample.first_assistant_content_ms).filter(Number.isFinite)),
+    first_response_ms: stats(okSamples.map((sample) => sample.first_response_ms).filter(Number.isFinite)),
+    response_duration_ms: stats(okSamples.map((sample) => sample.response_duration_ms).filter(Number.isFinite)),
+    fake_provider: summarizeFakeProviderState(fakeProviderState),
+    provider_timing: buildProviderTimingMetrics(samples, fakeProviderState),
+    samples,
+  };
+}
+
+function buildThresholds(metrics) {
+  const thresholds = {
+    error_rate: { actual: metrics.error_rate, max: maxErrorRate, pass: metrics.error_rate <= maxErrorRate },
+    response_p95_ms: {
+      actual: metrics.response_duration_ms.p95,
+      max: responseP95BudgetMs,
+      pass: metrics.ok_count > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
+    },
+  };
+  if (minErrorRate > 0) {
+    thresholds.error_rate_min = {
+      actual: metrics.error_rate,
+      min: minErrorRate,
+      pass: metrics.error_rate >= minErrorRate,
+    };
+  }
+  if (minErrorCount > 0) {
+    thresholds.error_count_min = {
+      actual: metrics.error_count,
+      min: minErrorCount,
+      pass: metrics.error_count >= minErrorCount,
+    };
+  }
+  if (minOkCount > 0) {
+    thresholds.ok_count_min = {
+      actual: metrics.ok_count,
+      min: minOkCount,
+      pass: metrics.ok_count >= minOkCount,
+    };
+  }
+  if (minProviderFaultCount > 0) {
+    const actual = metrics.fake_provider?.fault_count ?? 0;
+    thresholds.fake_provider_fault_count_min = {
+      actual,
+      min: minProviderFaultCount,
+      pass: actual >= minProviderFaultCount,
+    };
+  }
+  if (firstResponseP95BudgetMs > 0) {
+    thresholds.first_response_p95_ms = {
+      actual: metrics.first_response_ms.p95,
+      max: firstResponseP95BudgetMs,
+      pass: metrics.ok_count > 0 && metrics.first_response_ms.p95 <= firstResponseP95BudgetMs,
+    };
+  }
+  return thresholds;
+}
+
+function looksLikeEnvIssue(error) {
+  const message = String(error?.message || error || "");
+  return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
+}
+
+function safeReason(value) {
+  return redact(String(value || "")).slice(0, 1000);
+}
@@ -0,0 +1,861 @@
+#!/usr/bin/env node
+
+import crypto from "node:crypto";
+import net from "node:net";
+import tls from "node:tls";
+import { mkdir, writeFile } from "node:fs/promises";
+import { resolve } from "node:path";
+import { env, exit } from "node:process";
+import {
+  apiJson,
+  appendLine,
+  ensureEvidence,
+  evidencePaths,
+  loadEnvFiles,
+  localIsoWithOffset,
+  redact,
+  resetAndAuthLocalUser,
+  writeResult,
+} from "../../../scripts/e2e/lib/langbot-e2e.mjs";
+import {
+  buildProviderTimingMetrics,
+  summarizeFakeProviderState,
+} from "./lib/fake-provider-timing.mjs";
+
+const DEFAULT_LOCAL_PASSWORD = "LangBotE2ELocalPass!2026";
+
+await loadEnvFiles();
+const caseId = env.LBS_CASE_ID || "langbot-debug-chat-cross-pipeline-isolation";
+const paths = evidencePaths(caseId);
+await ensureEvidence(paths);
+
+const startedAt = new Date();
+const metricsPath = resolve(paths.evidenceDir, "metrics.json");
+const samplesPath = resolve(paths.evidenceDir, "samples.json");
+const fakeProviderStatePath = resolve(paths.evidenceDir, "fake-provider-state.json");
+const resetDiagnosticPath = resolve(paths.evidenceDir, "debug-chat-reset-diagnostic.json");
+const backendUrl = env.LANGBOT_BACKEND_URL || "";
+const fakeProviderUrl = env.LANGBOT_FAKE_PROVIDER_URL || "";
+const sessionType = env.LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE || env.LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE || "person";
+const requestsPerPipeline = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_REQUESTS, 6);
+const concurrency = Math.min(requestsPerPipeline * 2, positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY, 4));
+const timeoutMs = positiveInteger(env.LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS, 30_000);
+const stream = bool(env.LANGBOT_DEBUG_CHAT_LOAD_STREAM, true);
+const resetBeforeRun = bool(env.LANGBOT_DEBUG_CHAT_LOAD_RESET, true);
+const responseP95BudgetMs = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS, 5_000);
+const maxErrorRate = positiveNumber(env.LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE, 0);
+const promptTemplate = env.LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE
+  || "请只回复 \"{expected}\"，不要解释，不要添加其他字符。"; // English: reply with only "{expected}"; no explanation, no extra characters.
+const failureSignals = textList(env.LANGBOT_E2E_FAILURE_SIGNALS || env.LANGBOT_DEBUG_CHAT_LOAD_FAILURE_SIGNALS || "");
+
+const pipelineTargets = [
+  {
+    label: "A",
+    expectedPrefix: "PIPEA",
+    otherPrefix: "PIPEB",
+    url: env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_URL || "",
+    name: env.LANGBOT_FAKE_PROVIDER_PIPELINE_A_NAME || "",
+  },
+  {
+    label: "B",
+    expectedPrefix: "PIPEB",
+    otherPrefix: "PIPEA",
+    url: env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_URL || "",
+    name: env.LANGBOT_FAKE_PROVIDER_PIPELINE_B_NAME || "",
+  },
+];
+
+const result = {
+  source: "automation",
+  case_id: caseId,
+  run_id: paths.runId,
+  status: "fail",
+  reason: "",
+  started_at: startedAt.toISOString(),
+  started_at_local: localIsoWithOffset(startedAt),
+  finished_at: "",
+  finished_at_local: "",
+  duration_ms: 0,
+  backend_url: backendUrl,
+  session_type: sessionType,
+  pipelines: [],
+  load_profile: {
+    requests_per_pipeline: requestsPerPipeline,
+    total_requests: requestsPerPipeline * 2,
+    concurrency,
+    timeout_ms: timeoutMs,
+    stream,
+    reset_before_run: resetBeforeRun,
+  },
+  evidence: {
+    network_log: paths.networkLog,
+    metrics_json: metricsPath,
+    samples_json: samplesPath,
+    fake_provider_state_json: fakeProviderStatePath,
+    debug_chat_reset_diagnostic_json: resetDiagnosticPath,
+    automation_result_json: paths.automationResultJson,
+    result_json: paths.resultJson,
+  },
+  evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
+};
+
+try {
+  if (!backendUrl) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_BACKEND_URL is not configured.");
+  }
+  if (!["person", "group"].includes(sessionType)) {
+    throw new Error(`LANGBOT_DEBUG_CHAT_LOAD_SESSION_TYPE must be person or group, got ${sessionType}.`);
+  }
+  for (const target of pipelineTargets) {
+    if (!target.url && !target.name) {
+      result.status = "env_issue";
+      throw new Error(`Set LANGBOT_FAKE_PROVIDER_PIPELINE_${target.label}_URL or LANGBOT_FAKE_PROVIDER_PIPELINE_${target.label}_NAME.`);
+    }
+  }
+
+  const backendReady = await backendReachable(backendUrl);
+  if (!backendReady) {
+    result.status = "env_issue";
+    throw new Error(`Backend did not respond at ${backendUrl}.`);
+  }
+
+  const user = env.LANGBOT_E2E_LOGIN_USER || "";
+  const password = env.LANGBOT_E2E_LOGIN_PASSWORD || DEFAULT_LOCAL_PASSWORD;
+  if (!user) {
+    result.status = "env_issue";
+    throw new Error("LANGBOT_E2E_LOGIN_USER is required so this probe can resolve/reset Debug Chat sessions.");
+  }
+  const auth = await resetAndAuthLocalUser({ backendUrl, user, password });
+  const pipelines = [];
+  for (const target of pipelineTargets) {
+    const pipeline = await resolvePipeline({
+      backendUrl,
+      token: auth.token,
+      pipelineUrl: target.url,
+      pipelineName: target.name,
+    });
+    pipelines.push({
+      ...target,
+      id: pipeline.id,
+      name: pipeline.name || target.name,
+      wsUrl: websocketUrl(backendUrl, pipeline.id, sessionType),
+    });
+  }
+  result.pipelines = pipelines.map((pipeline) => ({
+    label: pipeline.label,
+    id: pipeline.id,
+    name: pipeline.name,
+    url: pipeline.url,
+  }));
+
+  if (resetBeforeRun) {
+    const resetDiagnostics = [];
+    for (const pipeline of pipelines) {
+      const reset = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(pipeline.id)}/ws/reset/${encodeURIComponent(sessionType)}`, {
+        method: "POST",
+        token: auth.token,
+      });
+      resetDiagnostics.push({
+        pipeline_label: pipeline.label,
+        pipeline_id: pipeline.id,
+        status: isApiFailure(reset) ? "fail" : "ready",
+        http_status: reset.status,
+        code: reset.json.code ?? null,
+        reason: isApiFailure(reset) ? reset.json.msg || "Debug Chat reset failed." : "Debug Chat session reset.",
+      });
+    }
+    await writeFile(resetDiagnosticPath, `${JSON.stringify(resetDiagnostics, null, 2)}\n`, "utf8");
+    const failedReset = resetDiagnostics.find((item) => item.status === "fail");
+    if (failedReset) throw new Error(failedReset.reason);
+  }
+  await resetFakeProvider(fakeProviderUrl);
+
+  const jobs = [];
+  for (let index = 0; index < requestsPerPipeline; index += 1) {
+    for (const pipeline of pipelines) {
+      jobs.push({ ...pipeline, index });
+    }
+  }
+
+  const loadStartedAt = performance.now();
+  const samples = await runLoad({
+    jobs,
+    concurrency,
+    timeoutMs,
+    promptTemplate,
+    stream,
+    failureSignals,
+  });
+  const loadDurationMs = performance.now() - loadStartedAt;
+  const fakeProviderState = await readFakeProviderState(fakeProviderUrl);
+  if (fakeProviderState) {
+    await writeFile(fakeProviderStatePath, `${JSON.stringify(fakeProviderState, null, 2)}\n`, "utf8");
+  }
+  const metrics = buildMetrics({
+    samples,
+    requestsPerPipeline,
+    concurrency,
+    timeoutMs,
+    loadDurationMs,
+    backendUrl,
+    sessionType,
+    fakeProviderState,
+  });
+  const thresholds = buildThresholds(metrics);
+  const passed = Object.values(thresholds).every((item) => item.pass);
+  result.status = passed ? "pass" : "fail";
+  result.reason = passed
+    ? "Debug Chat cross-pipeline isolation probe passed all thresholds."
+    : "Debug Chat cross-pipeline isolation probe found leaks, errors, or latency threshold breaches.";
+  result.metrics_summary = {
+    requests_per_pipeline: metrics.requests_per_pipeline,
+    total_requests: metrics.total_requests,
+    concurrency: metrics.concurrency,
+    ok_count: metrics.ok_count,
+    error_count: metrics.error_count,
+    cross_pipeline_leak_count: metrics.cross_pipeline_leak_count,
+    timeout_count: metrics.timeout_count,
+    error_rate: metrics.error_rate,
+    response_p95_ms: metrics.response_duration_ms.p95,
+    first_response_p95_ms: metrics.first_response_ms.p95,
+    throughput_rps: metrics.throughput_rps,
+    status_counts: metrics.status_counts,
+    by_pipeline: metrics.by_pipeline,
+    fake_provider_request_count: metrics.fake_provider?.request_count ?? null,
+    fake_provider_duration_p95_ms: metrics.provider_timing?.provider_duration_ms.p95 ?? null,
+    langbot_overhead_estimate_p95_ms: metrics.provider_timing?.langbot_overhead_estimate_ms.p95 ?? null,
+    send_to_provider_start_p95_ms: metrics.provider_timing?.send_to_provider_start_ms.p95 ?? null,
+    provider_finish_to_ws_final_p95_ms: metrics.provider_timing?.provider_finish_to_ws_final_ms.p95 ?? null,
+  };
+  result.thresholds_summary = thresholds;
+  result.artifacts = {
+    metrics_json: metricsPath,
+    samples_json: samplesPath,
+    fake_provider_state_json: fakeProviderState ? fakeProviderStatePath : "",
+    network_log: paths.networkLog,
+    automation_result_json: paths.automationResultJson,
+    result_json: paths.resultJson,
+  };
+
+  await writeFile(metricsPath, `${JSON.stringify({ ...metrics, thresholds }, null, 2)}\n`, "utf8");
+  await writeFile(samplesPath, `${JSON.stringify(samples, null, 2)}\n`, "utf8");
+} catch (error) {
+  if (!["env_issue", "blocked"].includes(result.status)) {
+    result.status = looksLikeEnvIssue(error) ? "env_issue" : "fail";
+  }
+  result.reason = result.reason || safeReason(error?.message || error);
+} finally {
+  const finishedAt = new Date();
+  result.finished_at = finishedAt.toISOString();
+  result.finished_at_local = localIsoWithOffset(finishedAt);
+  result.duration_ms = finishedAt.getTime() - startedAt.getTime();
+  await mkdir(paths.evidenceDir, { recursive: true });
+  await writeResult(paths, result);
+  console.log(JSON.stringify(result, null, 2));
+}
+
+exit(result.status === "pass" ? 0 : result.status === "env_issue" || result.status === "blocked" ? 2 : 1);
+
+async function backendReachable(baseUrl) {
+  try {
+    const response = await fetch(`${baseUrl.replace(/\/$/, "")}/healthz`, {
+      signal: AbortSignal.timeout(3000),
+    });
+    return response.status < 500;
+  } catch {
+    return false;
+  }
+}
+
+async function resetFakeProvider(rootUrl) {
+  if (!rootUrl) return;
+  try {
+    await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/reset`, {
+      method: "POST",
+      signal: AbortSignal.timeout(3000),
+    });
+  } catch {
+    // Missing fake-provider diagnostics should not hide the isolation result.
+  }
+}
+
+async function readFakeProviderState(rootUrl) {
+  if (!rootUrl) return null;
+  try {
+    const response = await fetch(`${normalizeProviderRootUrl(rootUrl)}/__qa/config`, {
+      signal: AbortSignal.timeout(3000),
+    });
+    const json = await response.json().catch(() => ({}));
+    return {
+      status: response.ok && json.ok === true ? "loaded" : "unavailable",
+      url: normalizeProviderRootUrl(rootUrl),
+      http_status: response.status,
+      model: json.model || "",
+      config: json.config || {},
+      request_count: Number.isFinite(json.request_count) ? json.request_count : null,
+      recent_requests: Array.isArray(json.recent_requests) ? json.recent_requests : [],
+    };
+  } catch (error) {
+    return {
+      status: "unavailable",
+      url: normalizeProviderRootUrl(rootUrl),
+      reason: safeReason(error.message),
+      request_count: null,
+      recent_requests: [],
+    };
+  }
+}
+
+function normalizeProviderRootUrl(value) {
+  const trimmed = String(value || "").trim().replace(/\/$/, "");
+  return trimmed.endsWith("/v1") ? trimmed.slice(0, -3) : trimmed;
+}
+
+function pipelineIdFromUrl(url) {
+  if (!url) return "";
+  try {
+    const parsed = new URL(url);
+    return parsed.searchParams.get("id") || "";
+  } catch {
+    return "";
+  }
+}
+
+async function resolvePipeline({ backendUrl, token, pipelineUrl, pipelineName }) {
+  const idFromUrl = pipelineIdFromUrl(pipelineUrl);
+  if (idFromUrl) {
+    const response = await apiJson(backendUrl, `/api/v1/pipelines/${encodeURIComponent(idFromUrl)}`, { token });
+    const pipeline = response.json.data?.pipeline;
+    if (isApiFailure(response) || !pipeline?.uuid) {
+      throw new Error(response.json.msg || `Could not load pipeline ${idFromUrl}.`);
+    }
+    return { id: pipeline.uuid, name: pipeline.name || "" };
+  }
+  if (!pipelineName) {
+    throw new Error("Set pipeline URL or name before running this probe.");
+  }
+  const response = await apiJson(backendUrl, "/api/v1/pipelines", { token });
+  if (isApiFailure(response)) {
+    throw new Error(response.json.msg || "Failed to list pipelines.");
+  }
+  const pipeline = (response.json.data?.pipelines || []).find((item) => item.name === pipelineName);
+  if (!pipeline?.uuid) {
+    throw new Error(`Could not find pipeline named ${pipelineName}.`);
+  }
+  return { id: pipeline.uuid, name: pipeline.name || pipelineName };
+}
+
+function isApiFailure(response) {
+  return response.status >= 400 || (response.json.code !== undefined && response.json.code !== 0);
+}
+
+function websocketUrl(baseUrl, pipelineId, sessionTypeValue) {
+  const parsed = new URL(baseUrl);
+  parsed.protocol = parsed.protocol === "https:" ? "wss:" : "ws:";
+  parsed.pathname = `/api/v1/pipelines/${encodeURIComponent(pipelineId)}/ws/connect`;
+  parsed.search = `?session_type=${encodeURIComponent(sessionTypeValue)}`;
+  return parsed.toString();
+}
+
+async function runLoad(options) {
+  const samples = [];
+  const queue = [...options.jobs];
+  const workers = Array.from({ length: options.concurrency }, async () => {
+    while (queue.length > 0) {
+      const job = queue.shift();
+      if (!job) continue;
+      const sample = await runSingleRequest({ ...options, job });
+      samples.push(sample);
+    }
+  });
+  await Promise.all(workers);
+  return samples.sort((left, right) => (
+    left.pipeline_label.localeCompare(right.pipeline_label) || left.index - right.index
+  ));
+}
+
+function expectedForIndex(prefix, index) {
+  return `${prefix}-${String(index + 1).padStart(4, "0")}`;
+}
+
+function promptForIndex(template, expected) {
+  return template.replaceAll("{expected}", expected);
+}
+
+function runSingleRequest({
+  job,
+  timeoutMs,
+  promptTemplate,
+  stream,
+  failureSignals,
+}) {
+  return new Promise((resolvePromise) => {
+    const expected = expectedForIndex(job.expectedPrefix, job.index);
+    const prompt = promptForIndex(promptTemplate, expected);
+    const sample = {
+      index: job.index,
+      pipeline_label: job.label,
+      pipeline_id: job.id,
+      pipeline_name: job.name,
+      status: "running",
+      ok: false,
+      expected_text: expected,
+      expected_prefix: job.expectedPrefix,
+      other_prefix: job.otherPrefix,
+      prompt,
+      response_text: "",
+      started_at: new Date().toISOString(),
+      started_epoch_ms: Date.now(),
+      connected_at: null,
+      connected_epoch_ms: null,
+      sent_at: null,
+      sent_epoch_ms: null,
+      first_assistant_event_at: null,
+      first_assistant_event_epoch_ms: null,
+      first_assistant_event_ms: null,
+      first_assistant_content_at: null,
+      first_assistant_content_epoch_ms: null,
+      first_assistant_content_ms: null,
+      first_response_at: null,
+      first_response_epoch_ms: null,
+      connected_ms: null,
+      first_response_ms: null,
+      response_duration_ms: null,
+      finished_at: null,
+      finished_epoch_ms: null,
+      event_count: 0,
+      same_pipeline_foreign_response_count: 0,
+      cross_pipeline_leak_count: 0,
+      last_foreign_response_text: "",
+      error: "",
+      close_code: null,
+      close_reason: "",
+    };
+    let closed = false;
+    let connectedAt = 0;
+    let sentAt = 0;
+    const startedPerf = performance.now();
+    let client = null;
+    const timer = setTimeout(() => {
+      finish("timeout", `Timed out after ${timeoutMs} ms.`);
+    }, timeoutMs);
+
+    client = openRawWebSocket(job.wsUrl, {
+      onOpen() {
+        connectedAt = performance.now();
+        const now = Date.now();
+        sample.connected_at = new Date(now).toISOString();
+        sample.connected_epoch_ms = now;
+        sample.connected_ms = rounded(connectedAt - startedPerf);
+      },
+      onMessage(text) {
+        sample.event_count += 1;
+        let data;
+        try {
+          data = JSON.parse(String(text || ""));
+        } catch (error) {
+          finish("error", `Invalid WebSocket JSON: ${error.message}`);
+          return;
+        }
+        appendLine(paths.networkLog, JSON.stringify({
+          pipeline_label: job.label,
+          request_index: job.index,
+          type: data.type,
+          session_type: data.session_type || "",
+          role: data.data?.role || "",
+          is_final: data.data?.is_final ?? null,
+          content_preview: redact(String(data.data?.content || data.message || "").slice(0, 200)),
+        })).catch(() => {});
+
+        if (data.type === "connected") {
+          sentAt = performance.now();
+          const now = Date.now();
+          sample.sent_at = new Date(now).toISOString();
+          sample.sent_epoch_ms = now;
+          client.send(JSON.stringify({
+            type: "message",
+            message: [{ type: "Plain", text: prompt }],
+            stream,
+          }));
+          return;
+        }
+        if (data.type === "error") {
+          finish("error", data.message || "WebSocket error message.");
+          return;
+        }
+        if (data.type !== "response" || data.data?.role !== "assistant") return;
+
+        const content = String(data.data.content || "");
+        markFirstAssistantEvent(sample, sentAt);
+        if (content) sample.response_text = content;
+        if (content) markFirstAssistantContent(sample, sentAt);
+        if (containsPipelineToken(content, job.otherPrefix)) {
+          sample.cross_pipeline_leak_count += 1;
+          finish("cross_pipeline_leak", `Pipeline ${job.label} received response from ${job.otherPrefix}: ${content}`);
+          return;
+        }
+        if (content.includes(expected) && sample.first_response_ms === null && sentAt > 0) {
+          const now = Date.now();
+          sample.first_response_at = new Date(now).toISOString();
+          sample.first_response_epoch_ms = now;
+          sample.first_response_ms = rounded(performance.now() - sentAt);
+        }
+        if (data.data.is_final === true) {
+          const ok = sample.response_text.includes(expected);
+          if (ok) {
+            if (sample.first_response_ms === null && sentAt > 0) {
+              const now = Date.now();
+              sample.first_response_at = new Date(now).toISOString();
+              sample.first_response_epoch_ms = now;
+              sample.first_response_ms = rounded(performance.now() - sentAt);
+            }
+            finish("pass", "");
+          } else if (matchesFailureSignal(sample.response_text, failureSignals)) {
+            finish("app_error", `Assistant final response matched a failure signal: ${sample.response_text}`);
+          } else if (containsPipelineToken(sample.response_text, job.expectedPrefix)) {
+            sample.same_pipeline_foreign_response_count += 1;
+            sample.last_foreign_response_text = sample.response_text;
+          } else {
+            finish("mismatch", `Final assistant response did not include ${expected}: ${sample.response_text}`);
+          }
+        }
+      },
+      onError(error) {
+        finish("connection_error", `WebSocket connection error: ${error.message}`);
+      },
+      onClose(event) {
+        sample.close_code = event.code;
+        sample.close_reason = event.reason || "";
+        if (!closed) finish("closed", `WebSocket closed before final assistant response: ${event.code}`);
+      },
+    });
+
+    function finish(status, reason) {
+      if (closed) return;
+      closed = true;
+      clearTimeout(timer);
+      sample.status = status;
+      sample.ok = status === "pass";
+      sample.error = status === "timeout" && sample.same_pipeline_foreign_response_count > 0
+        ? `${reason || ""} Saw ${sample.same_pipeline_foreign_response_count} same-pipeline foreign assistant response(s); last=${sample.last_foreign_response_text}`
+        : reason || "";
+      if (sentAt > 0) sample.response_duration_ms = rounded(performance.now() - sentAt);
+      else sample.response_duration_ms = rounded(performance.now() - startedPerf);
+      const now = Date.now();
+      sample.finished_at = new Date(now).toISOString();
+      sample.finished_epoch_ms = now;
+      try {
+        client?.close();
+      } catch {
+        // Closing a failed socket should not hide the sample result.
+      }
+      resolvePromise(sample);
+    }
+  });
+}
+
+function markFirstAssistantEvent(sample, sentAt) {
+  if (sample.first_assistant_event_ms !== null || sentAt <= 0) return;
+  const now = Date.now();
+  sample.first_assistant_event_at = new Date(now).toISOString();
+  sample.first_assistant_event_epoch_ms = now;
+  sample.first_assistant_event_ms = rounded(performance.now() - sentAt);
+}
+
+function markFirstAssistantContent(sample, sentAt) {
+  if (sample.first_assistant_content_ms !== null || sentAt <= 0) return;
+  const now = Date.now();
+  sample.first_assistant_content_at = new Date(now).toISOString();
+  sample.first_assistant_content_epoch_ms = now;
+  sample.first_assistant_content_ms = rounded(performance.now() - sentAt);
+}
+
+function containsPipelineToken(text, prefix) {
+  const escaped = String(prefix).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+  return new RegExp(`${escaped}-\\d{4}`).test(String(text || ""));
+}
+
+function matchesFailureSignal(text, signals) {
+  const lower = String(text || "").toLowerCase();
+  return signals.some((signal) => lower.includes(signal.toLowerCase()));
+}
+
+function openRawWebSocket(wsUrl, handlers) {
+  const parsed = new URL(wsUrl);
+  const secure = parsed.protocol === "wss:";
+  const port = Number(parsed.port || (secure ? 443 : 80));
+  const host = parsed.hostname;
+  const path = `${parsed.pathname}${parsed.search}`;
+  const key = crypto.randomBytes(16).toString("base64");
+  const socket = secure
+    ? tls.connect({ host, port, servername: host })
+    : net.connect({ host, port });
+  let opened = false;
+  let closed = false;
+  let buffer = Buffer.alloc(0);
+
+  socket.setNoDelay(true);
+  socket.on("connect", () => {
+    const originProtocol = secure ? "https" : "http";
+    const request = [
+      `GET ${path} HTTP/1.1`,
+      `Host: ${parsed.host}`,
+      "Upgrade: websocket",
+      "Connection: Upgrade",
+      `Sec-WebSocket-Key: ${key}`,
+      "Sec-WebSocket-Version: 13",
+      `Origin: ${originProtocol}://${parsed.host}`,
+      "",
+      "",
+    ].join("\r\n");
+    socket.write(request);
+  });
+  socket.on("data", (chunk) => {
+    buffer = Buffer.concat([buffer, chunk]);
+    if (!opened) {
+      const headerEnd = buffer.indexOf("\r\n\r\n");
+      if (headerEnd === -1) return;
+      const headerText = buffer.slice(0, headerEnd).toString("utf8");
+      buffer = buffer.slice(headerEnd + 4);
+      if (!/^HTTP\/1\.1 101\b/i.test(headerText)) {
+        handlers.onError(new Error(`Handshake failed: ${headerText.split("\r\n")[0] || "missing status"}`));
+        socket.destroy();
+        return;
+      }
+      opened = true;
+      handlers.onOpen();
+    }
+    processFrames();
+  });
+  socket.on("error", (error) => {
+    if (!closed) handlers.onError(error);
+  });
+  socket.on("close", () => {
+    if (closed) return;
+    closed = true;
+    handlers.onClose({ code: null, reason: "" });
+  });
+
+  function processFrames() {
+    while (true) {
+      const frame = readFrame(buffer);
+      if (!frame) return;
+      buffer = buffer.slice(frame.consumed);
+      if (frame.opcode === 0x1) { // Text frame. Fragmented messages (FIN=0 plus 0x0 continuations) are not reassembled.
+        handlers.onMessage(frame.payload.toString("utf8"));
+      } else if (frame.opcode === 0x8) {
+        const code = frame.payload.length >= 2 ? frame.payload.readUInt16BE(0) : null;
+        const reason = frame.payload.length > 2 ? frame.payload.slice(2).toString("utf8") : "";
+        closed = true;
+        handlers.onClose({ code, reason });
+        socket.end();
+        return;
+      } else if (frame.opcode === 0x9) {
+        writeFrame(socket, 0xA, frame.payload);
+      }
+    }
+  }
+
+  return {
+    send(text) {
+      if (closed || !opened) return;
+      writeFrame(socket, 0x1, Buffer.from(text, "utf8"));
+    },
+    close() {
+      if (closed) return;
+      closed = true;
+      if (!socket.destroyed) {
+        if (opened) writeFrame(socket, 0x8, Buffer.alloc(0));
+        setTimeout(() => socket.end(), 50).unref();
+      }
+    },
+  };
+}
+
+function readFrame(buffer) {
+  if (buffer.length < 2) return null;
+  const first = buffer[0];
+  const second = buffer[1];
+  const opcode = first & 0x0f;
+  const masked = Boolean(second & 0x80);
+  let length = second & 0x7f;
+  let offset = 2;
+  if (length === 126) {
+    if (buffer.length < offset + 2) return null;
+    length = buffer.readUInt16BE(offset);
+    offset += 2;
+  } else if (length === 127) {
+    if (buffer.length < offset + 8) return null;
+    const high = buffer.readUInt32BE(offset);
+    const low = buffer.readUInt32BE(offset + 4);
+    length = high * 2 ** 32 + low;
+    offset += 8;
+  }
+  let mask = null;
+  if (masked) {
+    if (buffer.length < offset + 4) return null;
+    mask = buffer.slice(offset, offset + 4);
+    offset += 4;
+  }
+  if (buffer.length < offset + length) return null;
+  let payload = buffer.slice(offset, offset + length);
+  if (mask) {
+    payload = Buffer.from(payload);
+    for (let index = 0; index < payload.length; index += 1) {
+      payload[index] ^= mask[index % 4];
+    }
+  }
+  return {
+    opcode,
+    payload,
+    consumed: offset + length,
+  };
+}
+
+function writeFrame(socket, opcode, payload) {
+  const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload || "");
+  const mask = crypto.randomBytes(4);
+  const headerLength = body.length < 126 ? 2 : body.length <= 0xffff ? 4 : 10;
+  const header = Buffer.alloc(headerLength);
+  header[0] = 0x80 | opcode;
+  if (body.length < 126) {
+    header[1] = 0x80 | body.length;
+  } else if (body.length <= 0xffff) {
+    header[1] = 0x80 | 126;
+    header.writeUInt16BE(body.length, 2);
+  } else {
+    header[1] = 0x80 | 127;
+    header.writeUInt32BE(Math.floor(body.length / 2 ** 32), 2);
+    header.writeUInt32BE(body.length >>> 0, 6);
+  }
+  const masked = Buffer.from(body);
+  for (let index = 0; index < masked.length; index += 1) {
+    masked[index] ^= mask[index % 4];
+  }
+  socket.write(Buffer.concat([header, mask, masked]));
+}
+
+function buildMetrics({ samples, requestsPerPipeline, concurrency, timeoutMs, loadDurationMs, backendUrl, sessionType, fakeProviderState }) {
+  const okSamples = samples.filter((sample) => sample.ok);
+  const statusCounts = {};
+  const byPipeline = {};
+  for (const sample of samples) {
+    statusCounts[sample.status] = (statusCounts[sample.status] || 0) + 1;
+    if (!byPipeline[sample.pipeline_label]) {
+      byPipeline[sample.pipeline_label] = {
+        ok_count: 0,
+        error_count: 0,
+        cross_pipeline_leak_count: 0,
+        timeout_count: 0,
+      };
+    }
+    if (sample.ok) byPipeline[sample.pipeline_label].ok_count += 1;
+    else byPipeline[sample.pipeline_label].error_count += 1;
+    byPipeline[sample.pipeline_label].cross_pipeline_leak_count += sample.cross_pipeline_leak_count || 0;
+    if (sample.status === "timeout") byPipeline[sample.pipeline_label].timeout_count += 1;
+  }
+  const errorCount = samples.length - okSamples.length;
+  return {
+    probe: caseId,
+    backend_url: backendUrl,
+    session_type: sessionType,
+    requests_per_pipeline: requestsPerPipeline,
+    total_requests: requestsPerPipeline * 2,
+    completed_requests: samples.length,
+    concurrency,
+    timeout_ms: timeoutMs,
+    ok_count: okSamples.length,
+    error_count: errorCount,
+    timeout_count: samples.filter((sample) => sample.status === "timeout").length,
+    cross_pipeline_leak_count: samples.reduce((count, sample) => count + (sample.cross_pipeline_leak_count || 0), 0),
+    error_rate: samples.length === 0 ? 1 : rounded(errorCount / samples.length),
+    load_duration_ms: rounded(loadDurationMs),
+    throughput_rps: loadDurationMs <= 0 ? 0 : rounded(okSamples.length / (loadDurationMs / 1000)),
+    status_counts: statusCounts,
+    by_pipeline: byPipeline,
+    connected_ms: stats(samples.map((sample) => sample.connected_ms).filter(Number.isFinite)),
+    first_assistant_event_ms: stats(samples.map((sample) => sample.first_assistant_event_ms).filter(Number.isFinite)),
+    first_assistant_content_ms: stats(samples.map((sample) => sample.first_assistant_content_ms).filter(Number.isFinite)),
+    first_response_ms: stats(okSamples.map((sample) => sample.first_response_ms).filter(Number.isFinite)),
+    response_duration_ms: stats(okSamples.map((sample) => sample.response_duration_ms).filter(Number.isFinite)),
+    fake_provider: summarizeFakeProviderState(fakeProviderState),
+    provider_timing: buildProviderTimingMetrics(samples, fakeProviderState),
+    samples,
+  };
+}
+
+function buildThresholds(metrics) {
+  return {
+    cross_pipeline_leak_count: {
+      actual: metrics.cross_pipeline_leak_count,
+      max: 0,
+      pass: metrics.cross_pipeline_leak_count === 0,
+    },
+    error_rate: {
+      actual: metrics.error_rate,
+      max: maxErrorRate,
+      pass: metrics.error_rate <= maxErrorRate,
+    },
+    response_p95_ms: {
+      actual: metrics.response_duration_ms.p95,
+      max: responseP95BudgetMs,
+      pass: metrics.ok_count > 0 && metrics.response_duration_ms.p95 <= responseP95BudgetMs,
+    },
+  };
+}
+
+function positiveInteger(value, fallback) {
+  const parsed = Number.parseInt(String(value || ""), 10);
+  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
+}
+
+function positiveNumber(value, fallback) {
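+  const parsed = String(value ?? "").trim() === "" ? Number.NaN : Number(value); // unset or blank input uses the fallback instead of coercing to 0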
+  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
+}
+
+function bool(value, fallback) {
+  if (value === undefined || value === "") return fallback;
+  if (/^(1|true|yes|on)$/i.test(String(value))) return true;
+  if (/^(0|false|no|off)$/i.test(String(value))) return false;
+  return fallback;
+}
+
+function textList(value) {
+  return String(value || "")
+    .split(/\r?\n|,/)
+    .map((item) => item.trim())
+    .filter(Boolean);
+}
+
+function rounded(value) {
+  return Number(value.toFixed(3));
+}
+
+function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
+  return rounded(sorted[index]);
+}
+
+function stats(values) {
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: rounded(Math.min(...values)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: rounded(Math.max(...values)),
+  };
+}
+
+function looksLikeEnvIssue(error) {
+  const message = String(error?.message || error || "");
+  return /fetch failed|ECONNREFUSED|ENOTFOUND|LANGBOT_.*not configured|Could not read recovery_key|Backend did not respond/i.test(message);
+}
+
+function safeReason(value) {
+  return redact(String(value || "")).slice(0, 1000);
+}
@@ -0,0 +1,159 @@
+#!/usr/bin/env node
+
+import { mkdir, writeFile } from "node:fs/promises";
+import { join, resolve } from "node:path";
+import { env, exit } from "node:process";
+
+function pad(value, size = 2) {
+  return String(value).padStart(size, "0");
+}
+
+function localIsoWithOffset(date = new Date()) {
+  const offsetMinutes = -date.getTimezoneOffset();
+  const sign = offsetMinutes >= 0 ? "+" : "-";
+  const absolute = Math.abs(offsetMinutes);
+  return [
+    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
+    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
+    `${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
+  ].join("");
+}
+
+function timestampSlug(date = new Date()) {
+  return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
+}
+
+const scenarios = [
+  {
+    id: "provider-timeout",
+    target: "provider",
+    injected_fault: "fake provider request exceeds the configured timeout",
+    expected_status: "env_issue",
+    recovery_check: "provider route is reachable or the case remains outside product pass/fail",
+    cleanup: "stop fake provider or reset proxy route",
+  },
+  {
+    id: "plugin-runtime-disconnect",
+    target: "plugin-runtime",
+    injected_fault: "runtime control channel disconnects during an action",
+    expected_status: "fail",
+    recovery_check: "runtime reconnects and a deterministic plugin action succeeds",
+    cleanup: "restart the local plugin runtime process",
+  },
+  {
+    id: "mcp-stdio-server-exit",
+    target: "mcp",
+    injected_fault: "stdio server exits mid-call",
+    expected_status: "fail",
+    recovery_check: "server can be registered again and exposes the expected tool",
+    cleanup: "remove temporary MCP server registration",
+  },
+  {
+    id: "operator-missing-login",
+    target: "webui",
+    injected_fault: "browser profile is not authenticated",
+    expected_status: "blocked",
+    recovery_check: "authenticated profile can open the same WebUI origin",
+    cleanup: "no product cleanup; refresh local login state",
+  },
+  {
+    id: "transient-marketplace-timeout",
+    target: "marketplace",
+    injected_fault: "marketplace request times out once and then succeeds",
+    expected_status: "flaky",
+    recovery_check: "rerun passes with the same product revision and no code change",
+    cleanup: "clear retry-only evidence and keep the run classified as flaky",
+  },
+];
+
+function validateScenario(scenario) {
+  const missing = ["id", "target", "injected_fault", "expected_status", "recovery_check", "cleanup"]
+    .filter((key) => !scenario[key]);
+  const allowedStatuses = new Set(["pass", "fail", "blocked", "env_issue", "flaky"]);
+  return {
+    id: scenario.id,
+    pass: missing.length === 0 && allowedStatuses.has(scenario.expected_status),
+    missing,
+    expected_status: scenario.expected_status,
+  };
+}
+
+async function main() {
+  const root = resolve(env.LBS_ROOT || process.cwd());
+  const caseId = "langbot-fault-taxonomy-contract";
+  const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
+  const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
+  await mkdir(evidenceDir, { recursive: true });
+
+  const startedAt = new Date();
+  const validations = scenarios.map(validateScenario);
+  const statusCounts = {};
+  for (const scenario of scenarios) {
+    statusCounts[scenario.expected_status] = (statusCounts[scenario.expected_status] || 0) + 1;
+  }
+  const metrics = {
+    probe: caseId,
+    scenario_count: scenarios.length,
+    status_counts: statusCounts,
+    scenarios,
+    validations,
+  };
+  const thresholds = {
+    scenario_count: { actual: scenarios.length, min: 5, pass: scenarios.length >= 5 },
+    invalid_scenario_count: {
+      actual: validations.filter((item) => !item.pass).length,
+      max: 0,
+      pass: validations.every((item) => item.pass),
+    },
+    cleanup_declared_count: {
+      actual: scenarios.filter((item) => item.cleanup).length,
+      min: scenarios.length,
+      pass: scenarios.every((item) => item.cleanup),
+    },
+  };
+  const status = Object.values(thresholds).every((item) => item.pass) ? "pass" : "fail";
+  const metricsPath = join(evidenceDir, "metrics.json");
+  const faultModelPath = join(evidenceDir, "fault-model.json");
+  const automationResultPath = join(evidenceDir, "automation-result.json");
+  const resultPath = join(evidenceDir, "result.json");
+
+  await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
+  await writeFile(faultModelPath, `${JSON.stringify({ scenarios }, null, 2)}\n`, "utf8");
+
+  const finishedAt = new Date();
+  const result = {
+    source: "automation",
+    case_id: caseId,
+    run_id: runId,
+    status,
+    reason: status === "pass"
+      ? "Fault taxonomy contract declares status, recovery, and cleanup for every scenario."
+      : "Fault taxonomy contract is missing required scenario fields.",
+    started_at: startedAt.toISOString(),
+    started_at_local: localIsoWithOffset(startedAt),
+    finished_at: finishedAt.toISOString(),
+    finished_at_local: localIsoWithOffset(finishedAt),
+    duration_ms: finishedAt.getTime() - startedAt.getTime(),
+    metrics_summary: {
+      scenario_count: metrics.scenario_count,
+      status_counts: metrics.status_counts,
+      invalid_scenario_count: thresholds.invalid_scenario_count.actual,
+    },
+    thresholds_summary: thresholds,
+    artifacts: {
+      metrics_json: metricsPath,
+      fault_model_json: faultModelPath,
+      automation_result_json: automationResultPath,
+      result_json: resultPath,
+    },
+    evidence_collected: ["metrics", "filesystem"],
+  };
+
+  const resultText = `${JSON.stringify(result, null, 2)}\n`;
+  await writeFile(automationResultPath, resultText, "utf8");
+  await writeFile(resultPath, resultText, "utf8");
+  console.log(JSON.stringify(result, null, 2));
+  exit(status === "pass" ? 0 : 1);
+}
+
+await main();
@@ -0,0 +1,212 @@
+#!/usr/bin/env node
+
+import { mkdir, writeFile } from "node:fs/promises";
+import { join, resolve } from "node:path";
+import { env, exit } from "node:process";
+
+function pad(value, size = 2) {
+  return String(value).padStart(size, "0");
+}
+
+function localIsoWithOffset(date = new Date()) {
+  const offsetMinutes = -date.getTimezoneOffset();
+  const sign = offsetMinutes >= 0 ? "+" : "-";
+  const absolute = Math.abs(offsetMinutes);
+  return [
+    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
+    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
+    `${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
+  ].join("");
+}
+
+function timestampSlug(date = new Date()) {
+  return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
+}
+
+function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
+  return Number(sorted[index].toFixed(3));
+}
+
+function stats(values) {
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: Number(Math.min(...values).toFixed(3)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: Number(Math.max(...values).toFixed(3)),
+  };
+}
+
+function parseJsonList(value, fallback) {
+  if (!value) return fallback;
+  try {
+    const parsed = JSON.parse(value);
+    return Array.isArray(parsed) && parsed.every((item) => typeof item === "string") ? parsed : fallback;
+  } catch {
+    return fallback;
+  }
+}
+
+function joinUrl(baseUrl, path) {
+  const base = baseUrl.replace(/\/+$/, "");
+  const suffix = path.startsWith("/") ? path : `/${path}`;
+  return `${base}${suffix}`;
+}
+
+async function fetchOnce(url, timeoutMs) {
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), timeoutMs);
+  const started = performance.now();
+  try {
+    const response = await fetch(url, { method: "GET", signal: controller.signal });
+    await response.arrayBuffer();
+    const latencyMs = performance.now() - started;
+    return {
+      url,
+      ok: response.status < 500,
+      status: response.status,
+      latency_ms: Number(latencyMs.toFixed(3)),
+      error: "",
+    };
+  } catch (error) {
+    const latencyMs = performance.now() - started;
+    return {
+      url,
+      ok: false,
+      status: 0,
+      latency_ms: Number(latencyMs.toFixed(3)),
+      error: error instanceof Error ? error.message : String(error),
+    };
+  } finally {
+    clearTimeout(timeout);
+  }
+}
+
+async function runBatches(urls, totalRequests, concurrency, timeoutMs) {
+  const queue = Array.from({ length: totalRequests }, (_, index) => urls[index % urls.length]);
+  const results = [];
+  while (queue.length > 0) {
+    const batch = queue.splice(0, Math.max(1, Math.floor(concurrency) || 1)); // a 0 or NaN batch size would never drain the queue
+    results.push(...await Promise.all(batch.map((url) => fetchOnce(url, timeoutMs))));
+  }
+  return results;
+}
+
+async function main() {
+  const root = resolve(env.LBS_ROOT || process.cwd());
+  const caseId = "langbot-live-backend-latency";
+  const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
+  const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
+  await mkdir(evidenceDir, { recursive: true });
+
+  const startedAt = new Date();
+  const backendUrl = env.LANGBOT_BACKEND_URL || "";
+  const endpoints = parseJsonList(env.LANGBOT_PERF_ENDPOINTS_JSON, ["/healthz"]);
+  const totalRequests = Number(env.LANGBOT_PERF_REQUESTS || "12");
+  const concurrency = Number(env.LANGBOT_PERF_CONCURRENCY || "2");
+  const timeoutMs = Number(env.LANGBOT_PERF_TIMEOUT_MS || "5000");
+  const p95BudgetMs = Number(env.LANGBOT_PERF_BACKEND_P95_MS || "1000");
+  const maxErrorRate = Number(env.LANGBOT_PERF_MAX_ERROR_RATE || "0");
+  const metricsPath = join(evidenceDir, "metrics.json");
+  const networkLogPath = join(evidenceDir, "network.log");
+  const automationResultPath = join(evidenceDir, "automation-result.json");
+  const resultPath = join(evidenceDir, "result.json");
+
+  let status = "fail";
+  let reason = "";
+  let results = [];
+  if (!backendUrl) {
+    status = "env_issue";
+    reason = "LANGBOT_BACKEND_URL is not configured.";
+  } else {
+    const urls = endpoints.map((path) => joinUrl(backendUrl, path));
+    results = await runBatches(urls, totalRequests, concurrency, timeoutMs);
+    const okCount = results.filter((item) => item.ok).length;
+    const errorCount = results.length - okCount;
+    const errorRate = results.length === 0 ? 1 : errorCount / results.length;
+    const latencies = results.filter((item) => item.ok).map((item) => item.latency_ms);
+    const latencyStats = stats(latencies);
+    const allConnectionFailures = results.length > 0 && results.every((item) => item.status === 0);
+    if (allConnectionFailures) {
+      status = "env_issue";
+      reason = `Backend did not respond at ${backendUrl}.`;
+    } else if (okCount > 0 && latencyStats.p95 <= p95BudgetMs && errorRate <= maxErrorRate) {
+      status = "pass";
+      reason = "Live backend latency probe passed all thresholds.";
+    } else {
+      status = "fail";
+      reason = "Live backend latency probe breached latency or error-rate thresholds.";
+    }
+  }
+
+  const statusCounts = {};
+  for (const item of results) {
+    const key = item.status === 0 ? "network_error" : String(item.status);
+    statusCounts[key] = (statusCounts[key] || 0) + 1;
+  }
+  const okResults = results.filter((item) => item.ok);
+  const metrics = {
+    probe: caseId,
+    backend_url: backendUrl,
+    endpoints,
+    total_requests: totalRequests,
+    concurrency,
+    timeout_ms: timeoutMs,
+    ok_count: okResults.length,
+    error_count: results.length - okResults.length,
+    error_rate: results.length === 0 ? 1 : Number(((results.length - okResults.length) / results.length).toFixed(4)),
+    latency_ms: stats(okResults.map((item) => item.latency_ms)),
+    status_counts: statusCounts,
+  };
+  const thresholds = {
+    backend_p95_ms: { actual: metrics.latency_ms.p95, max: p95BudgetMs, pass: metrics.ok_count > 0 && metrics.latency_ms.p95 <= p95BudgetMs },
+    error_rate: { actual: metrics.error_rate, max: maxErrorRate, pass: metrics.error_rate <= maxErrorRate },
+  };
+
+  await writeFile(metricsPath, `${JSON.stringify({ ...metrics, samples: results }, null, 2)}\n`, "utf8");
+  await writeFile(networkLogPath, results.map((item) => JSON.stringify(item)).join("\n") + (results.length > 0 ? "\n" : ""), "utf8");
+
+  const finishedAt = new Date();
+  const result = {
+    source: "automation",
+    case_id: caseId,
+    run_id: runId,
+    status,
+    reason,
+    started_at: startedAt.toISOString(),
+    started_at_local: localIsoWithOffset(startedAt),
+    finished_at: finishedAt.toISOString(),
+    finished_at_local: localIsoWithOffset(finishedAt),
+    duration_ms: finishedAt.getTime() - startedAt.getTime(),
+    url: backendUrl,
+    metrics_summary: {
+      requests: metrics.total_requests,
+      concurrency: metrics.concurrency,
+      ok_count: metrics.ok_count,
+      error_rate: metrics.error_rate,
+      latency_p50_ms: metrics.latency_ms.p50,
+      latency_p95_ms: metrics.latency_ms.p95,
+      status_counts: metrics.status_counts,
+    },
+    thresholds_summary: thresholds,
+    artifacts: {
+      metrics_json: metricsPath,
+      network_log: networkLogPath,
+      automation_result_json: automationResultPath,
+      result_json: resultPath,
+    },
+    evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
+  };
+
+  const resultText = `${JSON.stringify(result, null, 2)}\n`;
+  await writeFile(automationResultPath, resultText, "utf8");
+  await writeFile(resultPath, resultText, "utf8");
+  console.log(JSON.stringify(result, null, 2));
+  exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
+}
+
+await main();
@@ -0,0 +1,205 @@
+#!/usr/bin/env node
+
+import { existsSync, readdirSync, statSync } from "node:fs";
+import { mkdir, readFile, writeFile } from "node:fs/promises";
+import { join, resolve } from "node:path";
+import { env, exit } from "node:process";
+
+function pad(value, size = 2) {
+  return String(value).padStart(size, "0");
+}
+
+function localIsoWithOffset(date = new Date()) {
+  const offsetMinutes = -date.getTimezoneOffset();
+  const sign = offsetMinutes >= 0 ? "+" : "-";
+  const absolute = Math.abs(offsetMinutes);
+  return [
+    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
+    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
+    `${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
+  ].join("");
+}
+
+function timestampSlug(date = new Date()) {
+  return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
+}
+
+function repoRootFromEnv(root) {
+  return env.LANGBOT_REPO ? resolve(env.LANGBOT_REPO) : resolve(root, "..");
+}
+
+function latestBackendLog(root) {
+  const explicit = env.LANGBOT_BACKEND_LOG;
+  if (explicit) return resolve(explicit);
+
+  const logsDir = join(repoRootFromEnv(root), "data", "logs");
+  if (!existsSync(logsDir)) return "";
+  const candidates = readdirSync(logsDir)
+    .filter((name) => /^langbot-.*\.log$/.test(name))
+    .flatMap((name) => {
+      try {
+        const info = statSync(join(logsDir, name)); // stat once; a file rotated away mid-scan is skipped, not thrown
+        return info.isFile() ? [{ path: join(logsDir, name), mtimeMs: info.mtimeMs }] : [];
+      } catch {
+        return [];
+      }
+    })
+    .sort((left, right) => right.mtimeMs - left.mtimeMs);
+  return candidates[0]?.path || "";
+}
+
+function parseSince(startedAt) {
+  const explicit = new Date(env.LANGBOT_BACKEND_LOG_SINCE || ""); // Invalid Date when unset or unparseable
+  const lookback = Number(env.LANGBOT_BACKEND_LOG_LOOKBACK_SECONDS || "300");
+  return Number.isNaN(explicit.getTime()) ? new Date(startedAt.getTime() - (Number.isFinite(lookback) ? lookback : 300) * 1000) : explicit;
+}
+
+function parseTimestamp(line, year) {
+  const localMatch = line.match(/^\[(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]/);
+  if (localMatch) {
+    const [, month, day, hour, minute, second, millisecond] = localMatch;
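+    return new Date(`${year}-${month}-${day}T${hour}:${minute}:${second}.${millisecond}+08:00`); // assumes the backend logs in UTC+8; these lines carry no year, so the caller's year is used (inexact across New Year)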
+  }
+
+  const accessMatch = line.match(/^\[(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]/);
+  if (accessMatch) {
+    const [, fullYear, month, day, hour, minute, second, offset] = accessMatch;
+    const normalizedOffset = `${offset.slice(0, 3)}:${offset.slice(3)}`;
+    return new Date(`${fullYear}-${month}-${day}T${hour}:${minute}:${second}${normalizedOffset}`);
+  }
+
+  return null;
+}
+
+function findingForLine(line, number) {
+  const rules = [
+    { severity: "fail", kind: "python_traceback", pattern: /\bTraceback(?: \(most recent call last\))?/i },
+    { severity: "fail", kind: "unretrieved_task_exception", pattern: /Task exception was never retrieved/i },
+    { severity: "fail", kind: "unawaited_coroutine", pattern: /RuntimeWarning:\s+coroutine .* was never awaited/i },
+    { severity: "fail", kind: "unclosed_client_session", pattern: /Unclosed client session/i },
+    { severity: "fail", kind: "unclosed_connector", pattern: /Unclosed connector/i },
+    { severity: "fail", kind: "import_error", pattern: /\bImportError\b/i },
+    { severity: "fail", kind: "error_log", pattern: /\b(?:ERROR|CRITICAL)\b/ },
+    { severity: "warning", kind: "warning_log", pattern: /\bWARNING\b/ },
+  ];
+
+  for (const rule of rules) {
+    if (rule.pattern.test(line)) {
+      return {
+        severity: rule.severity,
+        kind: rule.kind,
+        line: number,
+        excerpt: line,
+      };
+    }
+  }
+  return null;
+}
+
+function scanLines(text, since, year) {
+  const findings = [];
+  const scanned = [];
+  let includeContinuation = false;
+  const lines = text.split(/\r?\n/);
+  for (const [index, line] of lines.entries()) {
+    const number = index + 1;
+    const timestamp = parseTimestamp(line, year);
+    if (timestamp) includeContinuation = timestamp >= since;
+    if (!includeContinuation) continue;
+    scanned.push({ number, text: line });
+    const finding = findingForLine(line, number);
+    if (finding) findings.push(finding);
+  }
+  return { findings, scanned, total_lines: lines.length };
+}
+
+async function main() {
+  const root = resolve(env.LBS_ROOT || process.cwd());
+  const caseId = "langbot-live-backend-log-health";
+  const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
+  const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
+  await mkdir(evidenceDir, { recursive: true });
+
+  const startedAt = new Date();
+  const since = parseSince(startedAt);
+  const logPath = latestBackendLog(root);
+  const metricsPath = join(evidenceDir, "metrics.json");
+  const findingsPath = join(evidenceDir, "findings.json");
+  const scannedLogPath = join(evidenceDir, "scanned-backend.log");
+  const automationResultPath = join(evidenceDir, "automation-result.json");
+  const resultPath = join(evidenceDir, "result.json");
+
+  let status = "fail";
+  let reason = "";
+  let scan = { findings: [], scanned: [], total_lines: 0 };
+  if (!logPath || !existsSync(logPath)) {
+    status = "env_issue";
+    reason = "No LangBot backend log file was found. Set LANGBOT_BACKEND_LOG or LANGBOT_REPO.";
+  } else {
+    const text = await readFile(logPath, "utf8");
+    scan = scanLines(text, since, startedAt.getFullYear());
+    const failCount = scan.findings.filter((item) => item.severity === "fail").length;
+    status = failCount === 0 ? "pass" : "fail";
+    reason = status === "pass"
+      ? "Live backend log health passed; no fail-severity findings in the scanned window."
+      : "Live backend log health found fail-severity backend log findings.";
+  }
+
+  const warningCount = scan.findings.filter((item) => item.severity === "warning").length;
+  const failCount = scan.findings.filter((item) => item.severity === "fail").length;
+  const metrics = {
+    probe: caseId,
+    backend_log: logPath,
+    since: since.toISOString(),
+    scanned_line_count: scan.scanned.length,
+    total_line_count: scan.total_lines,
+    fail_count: failCount,
+    warning_count: warningCount,
+    finding_count: scan.findings.length,
+  };
+  const thresholds = {
+    fail_count: { actual: failCount, max: 0, pass: failCount === 0 },
+  };
+
+  await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
+  await writeFile(findingsPath, `${JSON.stringify(scan.findings, null, 2)}\n`, "utf8");
+  await writeFile(scannedLogPath, scan.scanned.map((item) => `${item.number}: ${item.text}`).join("\n") + (scan.scanned.length > 0 ? "\n" : ""), "utf8");
+
+  const finishedAt = new Date();
+  const result = {
+    source: "automation",
+    case_id: caseId,
+    run_id: runId,
+    status,
+    reason,
+    started_at: startedAt.toISOString(),
+    started_at_local: localIsoWithOffset(startedAt),
+    finished_at: finishedAt.toISOString(),
+    finished_at_local: localIsoWithOffset(finishedAt),
+    duration_ms: finishedAt.getTime() - startedAt.getTime(),
+    url: logPath,
+    metrics_summary: {
+      scanned_line_count: metrics.scanned_line_count,
+      fail_count: metrics.fail_count,
+      warning_count: metrics.warning_count,
+      finding_count: metrics.finding_count,
+    },
+    thresholds_summary: thresholds,
+    artifacts: {
+      metrics_json: metricsPath,
+      findings_json: findingsPath,
+      scanned_backend_log: scannedLogPath,
+      automation_result_json: automationResultPath,
+      result_json: resultPath,
+    },
+    evidence_collected: ["metrics", "backend_log", "filesystem"],
+  };
+
+  const resultText = `${JSON.stringify(result, null, 2)}\n`;
+  await writeFile(automationResultPath, resultText, "utf8");
+  await writeFile(resultPath, resultText, "utf8");
+  console.log(JSON.stringify(result, null, 2));
+  exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
+}
+
+await main();
@@ -0,0 +1,311 @@
+#!/usr/bin/env node
+
+import { mkdir, writeFile } from "node:fs/promises";
+import { join, resolve } from "node:path";
+import { env, exit } from "node:process";
+
+function pad(value, size = 2) {
+  return String(value).padStart(size, "0");
+}
+
+function localIsoWithOffset(date = new Date()) {
+  const offsetMinutes = -date.getTimezoneOffset();
+  const sign = offsetMinutes >= 0 ? "+" : "-";
+  const absolute = Math.abs(offsetMinutes);
+  return [
+    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
+    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
+    `${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
+  ].join("");
+}
+
+function timestampSlug(date = new Date()) {
+  return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
+}
+
+function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
+  return Number(sorted[index].toFixed(3));
+}
+
+function stats(values) {
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: Number(Math.min(...values).toFixed(3)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: Number(Math.max(...values).toFixed(3)),
+  };
+}
+
+function joinUrl(baseUrl, path) {
+  const base = baseUrl.replace(/\/+$/, "");
+  const suffix = path.startsWith("/") ? path : `/${path}`;
+  return `${base}${suffix}`;
+}
+
+function parseJsonObject(value, fallback) {
+  if (!value) return fallback;
+  try {
+    const parsed = JSON.parse(value);
+    return parsed && typeof parsed === "object" && !Array.isArray(parsed) ? parsed : fallback;
+  } catch {
+    return fallback;
+  }
+}
+
+function controlPlaneEndpoints() {
+  return [
+    {
+      id: "healthz",
+      path: "/healthz",
+      expected_status: 200,
+      expected_code: 0,
+      p95_budget_ms: Number(env.LANGBOT_PERF_HEALTHZ_P95_MS || "500"),
+      required_data_fields: [],
+    },
+    {
+      id: "system_info",
+      path: "/api/v1/system/info",
+      expected_status: 200,
+      expected_code: 0,
+      p95_budget_ms: Number(env.LANGBOT_PERF_SYSTEM_INFO_P95_MS || "1000"),
+      required_data_fields: ["version", "edition", "enable_marketplace"],
+    },
+  ];
+}
+
+async function fetchEndpoint(backendUrl, endpoint, timeoutMs) {
+  const url = joinUrl(backendUrl, endpoint.path);
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), timeoutMs);
+  const started = performance.now();
+  let bodyText = "";
+  let json = null;
+  let jsonValid = false;
+  let error = "";
+
+  try {
+    const response = await fetch(url, {
+      method: "GET",
+      headers: { "accept": "application/json" },
+      signal: controller.signal,
+    });
+    bodyText = await response.text();
+    try {
+      json = bodyText ? JSON.parse(bodyText) : null;
+      jsonValid = json !== null;
+    } catch (parseError) {
+      error = parseError instanceof Error ? parseError.message : String(parseError);
+    }
+
+    const data = json && typeof json === "object" && json.data && typeof json.data === "object" ? json.data : {};
+    const missingFields = endpoint.required_data_fields.filter((field) => !(field in data));
+    const statusOk = response.status === endpoint.expected_status;
+    const codeOk = !json || typeof json !== "object" ? false : json.code === endpoint.expected_code;
+    const shapeOk = jsonValid && missingFields.length === 0;
+    const latencyMs = performance.now() - started;
+    return {
+      endpoint_id: endpoint.id,
+      path: endpoint.path,
+      url,
+      status: response.status,
+      ok: statusOk && codeOk && shapeOk,
+      status_ok: statusOk,
+      code_ok: codeOk,
+      json_valid: jsonValid,
+      missing_fields: missingFields,
+      response_code: json && typeof json === "object" ? json.code : null,
+      latency_ms: Number(latencyMs.toFixed(3)),
+      error,
+    };
+  } catch (fetchError) {
+    const latencyMs = performance.now() - started;
+    return {
+      endpoint_id: endpoint.id,
+      path: endpoint.path,
+      url,
+      status: 0,
+      ok: false,
+      status_ok: false,
+      code_ok: false,
+      json_valid: false,
+      missing_fields: endpoint.required_data_fields,
+      response_code: null,
+      latency_ms: Number(latencyMs.toFixed(3)),
+      error: fetchError instanceof Error ? fetchError.message : String(fetchError),
+    };
+  } finally {
+    clearTimeout(timeout);
+  }
+}
+
+async function runBatches(backendUrl, endpoints, totalRequests, concurrency, timeoutMs) {
+  const queue = Array.from({ length: totalRequests }, (_, index) => endpoints[index % endpoints.length]);
+  const results = [];
+  while (queue.length > 0) {
+    const batch = queue.splice(0, Math.max(1, Math.floor(concurrency) || 1)); // a 0 or NaN batch size would never drain the queue
+    results.push(...await Promise.all(batch.map((endpoint) => fetchEndpoint(backendUrl, endpoint, timeoutMs))));
+  }
+  return results;
+}
+
+function endpointMetrics(endpoints, results) {
+  return Object.fromEntries(endpoints.map((endpoint) => {
+    const samples = results.filter((item) => item.endpoint_id === endpoint.id);
+    const okSamples = samples.filter((item) => item.ok);
+    return [
+      endpoint.id,
+      {
+        path: endpoint.path,
+        requests: samples.length,
+        ok_count: okSamples.length,
+        error_rate: samples.length === 0 ? 1 : Number(((samples.length - okSamples.length) / samples.length).toFixed(4)),
+        latency_ms: stats(okSamples.map((item) => item.latency_ms)),
+        p95_budget_ms: endpoint.p95_budget_ms,
+      },
+    ];
+  }));
+}
+
+async function main() {
+  const root = resolve(env.LBS_ROOT || process.cwd());
+  const caseId = "langbot-live-control-plane-api";
+  const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
+  const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
+  await mkdir(evidenceDir, { recursive: true });
+
+  const startedAt = new Date();
+  const backendUrl = env.LANGBOT_BACKEND_URL || "";
+  const endpoints = controlPlaneEndpoints();
+  const configuredBudgets = parseJsonObject(env.LANGBOT_CONTROL_PLANE_P95_BUDGETS_JSON, {});
+  for (const endpoint of endpoints) {
+    const budget = configuredBudgets[endpoint.id];
+    if (typeof budget === "number" && Number.isFinite(budget)) endpoint.p95_budget_ms = budget;
+  }
+  const totalRequests = Number(env.LANGBOT_CONTROL_PLANE_REQUESTS || "20");
+  const concurrency = Number(env.LANGBOT_CONTROL_PLANE_CONCURRENCY || "4");
+  const timeoutMs = Number(env.LANGBOT_CONTROL_PLANE_TIMEOUT_MS || "5000");
+  const maxErrorRate = Number(env.LANGBOT_CONTROL_PLANE_MAX_ERROR_RATE || "0");
+  const metricsPath = join(evidenceDir, "metrics.json");
+  const endpointsPath = join(evidenceDir, "endpoints.json");
+  const networkLogPath = join(evidenceDir, "network.log");
+  const automationResultPath = join(evidenceDir, "automation-result.json");
+  const resultPath = join(evidenceDir, "result.json");
+
+  let status = "fail";
+  let reason = "";
+  let results = [];
+  if (!backendUrl) {
+    status = "env_issue";
+    reason = "LANGBOT_BACKEND_URL is not configured.";
+  } else {
+    results = await runBatches(backendUrl, endpoints, totalRequests, concurrency, timeoutMs);
+    const allConnectionFailures = results.length > 0 && results.every((item) => item.status === 0);
+    if (allConnectionFailures) {
+      status = "env_issue";
+      reason = `Backend did not respond at ${backendUrl}.`;
+    }
+  }
+
+  const okResults = results.filter((item) => item.ok);
+  const statusCounts = {};
+  for (const item of results) {
+    const key = item.status === 0 ? "network_error" : String(item.status);
+    statusCounts[key] = (statusCounts[key] || 0) + 1;
+  }
+  const perEndpoint = endpointMetrics(endpoints, results);
+  const responseShapeFailures = results.filter((item) => !item.json_valid || item.missing_fields.length > 0 || !item.code_ok).length;
+  const errorRate = results.length === 0 ? 1 : Number(((results.length - okResults.length) / results.length).toFixed(4));
+  const thresholds = {
+    error_rate: { actual: errorRate, max: maxErrorRate, pass: errorRate <= maxErrorRate },
+    response_shape_failures: { actual: responseShapeFailures, max: 0, pass: responseShapeFailures === 0 },
+  };
+  for (const endpoint of endpoints) {
+    const actual = perEndpoint[endpoint.id].latency_ms.p95;
+    thresholds[`${endpoint.id}_p95_ms`] = {
+      actual,
+      max: endpoint.p95_budget_ms,
+      pass: actual <= endpoint.p95_budget_ms,
+    };
+  }
+
+  if (status !== "env_issue") {
+    const passed = Object.values(thresholds).every((item) => item.pass);
+    status = passed ? "pass" : "fail";
+    reason = passed
+      ? "Live control-plane API probe passed all thresholds."
+      : "Live control-plane API probe breached shape, latency, or error-rate thresholds.";
+  }
+
+  const metrics = {
+    probe: caseId,
+    backend_url: backendUrl,
+    total_requests: totalRequests,
+    concurrency,
+    timeout_ms: timeoutMs,
+    ok_count: okResults.length,
+    error_count: results.length - okResults.length,
+    error_rate: errorRate,
+    status_counts: statusCounts,
+    response_shape_failures: responseShapeFailures,
+    endpoints: perEndpoint,
+  };
+
+  await writeFile(metricsPath, `${JSON.stringify({ ...metrics, samples: results }, null, 2)}\n`, "utf8");
+  await writeFile(endpointsPath, `${JSON.stringify(endpoints, null, 2)}\n`, "utf8");
+  await writeFile(networkLogPath, results.map((item) => JSON.stringify(item)).join("\n") + (results.length > 0 ? "\n" : ""), "utf8");
+
+  const finishedAt = new Date();
+  const result = {
+    source: "automation",
+    case_id: caseId,
+    run_id: runId,
+    status,
+    reason,
+    started_at: startedAt.toISOString(),
+    started_at_local: localIsoWithOffset(startedAt),
+    finished_at: finishedAt.toISOString(),
+    finished_at_local: localIsoWithOffset(finishedAt),
+    duration_ms: finishedAt.getTime() - startedAt.getTime(),
+    url: backendUrl,
+    metrics_summary: {
+      requests: metrics.total_requests,
+      concurrency: metrics.concurrency,
+      ok_count: metrics.ok_count,
+      error_rate: metrics.error_rate,
+      response_shape_failures: metrics.response_shape_failures,
+      endpoints: Object.fromEntries(Object.entries(metrics.endpoints).map(([id, value]) => [
+        id,
+        {
+          path: value.path,
+          ok_count: value.ok_count,
+          error_rate: value.error_rate,
+          latency_p50_ms: value.latency_ms.p50,
+          latency_p95_ms: value.latency_ms.p95,
+        },
+      ])),
+      status_counts: metrics.status_counts,
+    },
+    thresholds_summary: thresholds,
+    artifacts: {
+      metrics_json: metricsPath,
+      endpoints_json: endpointsPath,
+      network_log: networkLogPath,
+      automation_result_json: automationResultPath,
+      result_json: resultPath,
+    },
+    evidence_collected: ["metrics", "network", "api_diagnostic", "filesystem"],
+  };
+
+  const resultText = `${JSON.stringify(result, null, 2)}\n`;
+  await writeFile(automationResultPath, resultText, "utf8");
+  await writeFile(resultPath, resultText, "utf8");
+  console.log(JSON.stringify(result, null, 2));
+  exit(status === "pass" ? 0 : status === "env_issue" ? 2 : 1);
+}
+
+await main();
@@ -0,0 +1,162 @@
+#!/usr/bin/env node
+
+import { mkdir, writeFile } from "node:fs/promises";
+import { join, resolve } from "node:path";
+import { env, exit } from "node:process";
+
+function pad(value, size = 2) {
+  return String(value).padStart(size, "0");
+}
+
+function localIsoWithOffset(date = new Date()) {
+  const offsetMinutes = -date.getTimezoneOffset();
+  const sign = offsetMinutes >= 0 ? "+" : "-";
+  const absolute = Math.abs(offsetMinutes);
+  return [
+    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`,
+    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}.${pad(date.getMilliseconds(), 3)}`,
+    `${sign}${pad(Math.floor(absolute / 60))}:${pad(absolute % 60)}`,
+  ].join("");
+}
+
+function timestampSlug(date = new Date()) {
+  return date.toISOString().replace(/\.\d{3}Z$/, "Z").replace(/[^0-9A-Za-z]+/g, "-").replace(/^-|-$/g, "");
+}
+
+function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
+  return Number(sorted[index].toFixed(3));
+}
+
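+function stats(values) {
+  // Math.min/Math.max return Infinity/-Infinity for an empty list, which
+  // JSON.stringify writes as null; report zeros instead, as percentile() does.
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: Number(Math.min(...values).toFixed(3)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: Number(Math.max(...values).toFixed(3)),
+  };
+}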
+
+function threshold(actual, limit, operator) {
+  const pass = operator === "<=" ? actual <= limit : actual >= limit;
+  return { actual, [operator === "<=" ? "max" : "min"]: limit, pass };
+}
+
+function makeSample(index) {
+  const ingress = 1 + (index % 5) * 0.22;
+  const pipeline = 2.8 + (index % 7) * 0.31;
+  const persistence = 1.1 + (index % 4) * 0.2;
+  const pluginIpc = 1.9 + (index % 6) * 0.27;
+  const rag = index % 3 === 0 ? 4.4 : 0.8 + (index % 5) * 0.18;
+  const streaming = 1.5 + (index % 8) * 0.24;
+  const provider = 80 + (index % 13) * 11;
+  const externalTool = index % 4 === 0 ? 25 + (index % 9) * 3 : 0;
+  const network = 8 + (index % 10) * 1.7;
+  const overhead = ingress + pipeline + persistence + pluginIpc + rag + streaming;
+  const external = provider + externalTool + network;
+  const total = overhead + external;
+  return {
+    index,
+    segments_ms: {
+      ingress,
+      pipeline,
+      persistence,
+      plugin_ipc: pluginIpc,
+      rag,
+      streaming,
+      provider,
+      external_tool: externalTool,
+      network,
+    },
+    langbot_overhead_ms: Number(overhead.toFixed(3)),
+    external_latency_ms: Number(external.toFixed(3)),
+    e2e_latency_ms: Number(total.toFixed(3)),
+    accounting_gap_ms: Number((total - external - overhead).toFixed(6)),
+  };
+}
+
+async function main() {
+  const root = resolve(env.LBS_ROOT || process.cwd());
+  const caseId = "langbot-overhead-accounting-contract";
+  const runId = env.LBS_RUN_ID || `${timestampSlug()}-${caseId}`;
+  const evidenceDir = resolve(env.LBS_EVIDENCE_DIR || join(root, "reports", "evidence", runId));
+  await mkdir(evidenceDir, { recursive: true });
+
+  const startedAt = new Date();
+  const sampleCount = Number(env.LANGBOT_PERF_CONTRACT_SAMPLES || "80");
+  const overheadP95BudgetMs = Number(env.LANGBOT_PERF_OVERHEAD_P95_MS || "25");
+  const samples = Array.from({ length: sampleCount }, (_, index) => makeSample(index));
+  const overheads = samples.map((sample) => sample.langbot_overhead_ms);
+  const e2e = samples.map((sample) => sample.e2e_latency_ms);
+  const external = samples.map((sample) => sample.external_latency_ms);
+  const gaps = samples.map((sample) => Math.abs(sample.accounting_gap_ms));
+  const memory = process.memoryUsage();
+
+  const metrics = {
+    probe: caseId,
+    sample_count: sampleCount,
+    langbot_overhead_ms: stats(overheads),
+    e2e_latency_ms: stats(e2e),
+    external_latency_ms: stats(external),
+    accounting_gap_max_ms: gaps.length === 0 ? 0 : Number(Math.max(...gaps).toFixed(6)),
+    samples,
+  };
+  const thresholds = {
+    sample_count: threshold(sampleCount, 50, ">="),
+    langbot_overhead_p95_ms: threshold(metrics.langbot_overhead_ms.p95, overheadP95BudgetMs, "<="),
+    accounting_gap_max_ms: threshold(metrics.accounting_gap_max_ms, 0.001, "<="),
+  };
+  const status = Object.values(thresholds).every((item) => item.pass) ? "pass" : "fail";
+  const metricsPath = join(evidenceDir, "metrics.json");
+  const thresholdsPath = join(evidenceDir, "thresholds.json");
+  const resourceLogPath = join(evidenceDir, "resource-log.json");
+  const automationResultPath = join(evidenceDir, "automation-result.json");
+  const resultPath = join(evidenceDir, "result.json");
+
+  await writeFile(metricsPath, `${JSON.stringify(metrics, null, 2)}\n`, "utf8");
+  await writeFile(thresholdsPath, `${JSON.stringify(thresholds, null, 2)}\n`, "utf8");
+  await writeFile(resourceLogPath, `${JSON.stringify({ memory, pid: process.pid }, null, 2)}\n`, "utf8");
+
+  const finishedAt = new Date();
+  const result = {
+    source: "automation",
+    case_id: caseId,
+    run_id: runId,
+    status,
+    reason: status === "pass"
+      ? "Overhead accounting contract passed all thresholds."
+      : "Overhead accounting contract breached one or more thresholds.",
+    started_at: startedAt.toISOString(),
+    started_at_local: localIsoWithOffset(startedAt),
+    finished_at: finishedAt.toISOString(),
+    finished_at_local: localIsoWithOffset(finishedAt),
+    duration_ms: finishedAt.getTime() - startedAt.getTime(),
+    metrics_summary: {
+      sample_count: metrics.sample_count,
+      langbot_overhead_p95_ms: metrics.langbot_overhead_ms.p95,
+      e2e_latency_p95_ms: metrics.e2e_latency_ms.p95,
+      external_latency_p95_ms: metrics.external_latency_ms.p95,
+      accounting_gap_max_ms: metrics.accounting_gap_max_ms,
+    },
+    thresholds_summary: thresholds,
+    artifacts: {
+      metrics_json: metricsPath,
+      thresholds_json: thresholdsPath,
+      resource_log_json: resourceLogPath,
+      automation_result_json: automationResultPath,
+      result_json: resultPath,
+    },
+    evidence_collected: ["metrics", "resource_log", "filesystem"],
+  };
+
+  const resultText = `${JSON.stringify(result, null, 2)}\n`;
+  await writeFile(automationResultPath, resultText, "utf8");
+  await writeFile(resultPath, resultText, "utf8");
+  console.log(JSON.stringify(result, null, 2));
+  exit(status === "pass" ? 0 : 1);
+}
+
+await main();
@@ -0,0 +1,134 @@
+export function summarizeFakeProviderState(state) {
+  if (!state) return null;
+  const recentRequests = Array.isArray(state.recent_requests) ? state.recent_requests : [];
+  const chatRequests = recentRequests.filter((request) => String(request?.path || "").includes("/chat/completions"));
+  const successfulRequests = chatRequests.filter((request) => request?.status === "ok");
+  const faultRequests = chatRequests.filter((request) => (
+    request?.should_fail === true
+      || request?.status === "http_fault"
+      || (Number.isFinite(request?.http_status) && request.http_status >= 400)
+  ));
+
+  return {
+    status: state.status || "unknown",
+    url: state.url || "",
+    request_count: Number.isFinite(state.request_count) ? state.request_count : recentRequests.length,
+    recent_request_count: recentRequests.length,
+    chat_request_count: chatRequests.length,
+    fault_count: faultRequests.length,
+    streamed_request_count: chatRequests.filter((request) => request?.stream === true).length,
+    duration_ms: stats(chatRequests.map((request) => numberOrNull(request?.duration_ms)).filter(Number.isFinite)),
+    successful_duration_ms: stats(successfulRequests.map((request) => numberOrNull(request?.duration_ms)).filter(Number.isFinite)),
+    first_chunk_ms: stats(successfulRequests.map((request) => numberOrNull(request?.first_chunk_ms)).filter(Number.isFinite)),
+    first_content_chunk_ms: stats(successfulRequests.map((request) => numberOrNull(request?.first_content_chunk_ms)).filter(Number.isFinite)),
+    content_chunk_count: stats(successfulRequests.map((request) => numberOrNull(request?.content_chunk_count)).filter(Number.isFinite)),
+    config: state.config || {},
+  };
+}
+
+export function buildProviderTimingMetrics(samples, state) {
+  const recentRequests = Array.isArray(state?.recent_requests) ? state.recent_requests : [];
+  const byExpectedText = new Map();
+  for (const request of recentRequests) {
+    const expected = String(request?.expected_text || "");
+    if (!expected) continue;
+    if (!byExpectedText.has(expected)) byExpectedText.set(expected, []);
+    byExpectedText.get(expected).push(request);
+  }
+
+  const segments = [];
+  const missingExpectedText = [];
+  for (const sample of samples) {
+    const expected = String(sample?.expected_text || "");
+    if (!expected) continue;
+    const request = (byExpectedText.get(expected) || []).shift();
+    if (!request) {
+      missingExpectedText.push(expected);
+      continue;
+    }
+    const segment = buildTimingSegment(sample, request);
+    if (segment) segments.push(segment);
+  }
+
+  const values = (key) => segments.map((segment) => numberOrNull(segment[key])).filter(Number.isFinite);
+  return {
+    matched_request_count: segments.length,
+    missing_provider_match_count: missingExpectedText.length,
+    missing_expected_text: missingExpectedText.slice(0, 20),
+    send_to_provider_start_ms: stats(values("send_to_provider_start_ms")),
+    provider_duration_ms: stats(values("provider_duration_ms")),
+    provider_finish_to_ws_final_ms: stats(values("provider_finish_to_ws_final_ms")),
+    langbot_overhead_estimate_ms: stats(values("langbot_overhead_estimate_ms")),
+    e2e_minus_provider_ms: stats(values("e2e_minus_provider_ms")),
+    provider_first_content_to_ws_first_content_ms: stats(values("provider_first_content_to_ws_first_content_ms")),
+    segments,
+  };
+}
+
+function buildTimingSegment(sample, request) {
+  const sentEpochMs = numberOrNull(sample.sent_epoch_ms);
+  const finishedEpochMs = numberOrNull(sample.finished_epoch_ms);
+  const providerStartedEpochMs = numberOrNull(request.started_epoch_ms);
+  const providerFinishedEpochMs = numberOrNull(request.finished_epoch_ms);
+  const providerFirstContentEpochMs = numberOrNull(request.first_content_chunk_epoch_ms);
+  const wsFirstContentEpochMs = numberOrNull(sample.first_assistant_content_epoch_ms);
+  const responseDurationMs = numberOrNull(sample.response_duration_ms);
+  const providerDurationMs = numberOrNull(request.duration_ms);
+
+  const sendToProviderStartMs = finiteDelta(providerStartedEpochMs, sentEpochMs);
+  const providerFinishToWsFinalMs = finiteDelta(finishedEpochMs, providerFinishedEpochMs);
+  const e2eMinusProviderMs = Number.isFinite(responseDurationMs) && Number.isFinite(providerDurationMs)
+    ? rounded(responseDurationMs - providerDurationMs)
+    : null;
+  const overheadEstimateMs = Number.isFinite(sendToProviderStartMs) && Number.isFinite(providerFinishToWsFinalMs)
+    ? rounded(sendToProviderStartMs + providerFinishToWsFinalMs)
+    : e2eMinusProviderMs;
+
+  return {
+    sample_index: sample.index,
+    pipeline_label: sample.pipeline_label || "",
+    expected_text: sample.expected_text || "",
+    provider_request_id: request.id || "",
+    provider_request_number: request.request_number ?? null,
+    response_duration_ms: responseDurationMs,
+    provider_duration_ms: providerDurationMs,
+    send_to_provider_start_ms: sendToProviderStartMs,
+    provider_finish_to_ws_final_ms: providerFinishToWsFinalMs,
+    langbot_overhead_estimate_ms: overheadEstimateMs,
+    e2e_minus_provider_ms: e2eMinusProviderMs,
+    provider_first_content_to_ws_first_content_ms: finiteDelta(wsFirstContentEpochMs, providerFirstContentEpochMs),
+    provider_status: request.status || "",
+    provider_http_status: request.http_status ?? null,
+  };
+}
+
+function finiteDelta(left, right) {
+  return Number.isFinite(left) && Number.isFinite(right) ? rounded(left - right) : null;
+}
+
+export function stats(values) {
+  if (values.length === 0) return { min: 0, p50: 0, p95: 0, p99: 0, max: 0 };
+  return {
+    min: rounded(Math.min(...values)),
+    p50: percentile(values, 50),
+    p95: percentile(values, 95),
+    p99: percentile(values, 99),
+    max: rounded(Math.max(...values)),
+  };
+}
+
+export function percentile(values, percentileValue) {
+  if (values.length === 0) return 0;
+  const sorted = [...values].sort((a, b) => a - b);
+  const index = Math.min(sorted.length - 1, Math.ceil((percentileValue / 100) * sorted.length) - 1);
+  return rounded(sorted[index]);
+}
+
+export function rounded(value) {
+  return Number(value.toFixed(3));
+}
+
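+function numberOrNull(value) {
+  // Number(null) and Number("") are both 0, so reject missing values explicitly
+  // instead of reporting them as a zero-millisecond measurement.
+  if (value === null || value === undefined || value === "") return null;
+  const number = Number(value);
+  return Number.isFinite(number) ? number : null;
+}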
@@ -0,0 +1,285 @@
+# Performance And Reliability Testing
+
+Use this reference when a QA request asks whether LangBot is fast enough,
+stable under load, or resilient to controlled faults.
+
+These probes are manual/non-required QA gates unless a case or suite explicitly
+states otherwise. They depend on a live local backend, mutable QA fixtures, and
+operator-selected environment variables, so do not promote them to required CI
+checks until fake-provider isolation, ownership markers, and cleanup are in
+place.
+
+## Scope
+
+Treat `skills/` as the QA control plane:
+
+- Cases define intent, readiness, thresholds, and required evidence.
+- Probe scripts collect metrics, traces, resource logs, and artifacts.
+- Reports classify each run as `pass`, `fail`, `blocked`, `env_issue`, or
+  `flaky`.
+
+Do not turn `skills/` into a load generator or chaos engine. Call a focused
+tool from a `mode: probe` case when the test needs one, for example k6,
+Locust, pytest-benchmark, Playwright trace collection, Toxiproxy, Docker, or a
+Kubernetes disruption tool.
+
+## LangBot Performance Model
+
+For LangBot, performance is the cost LangBot adds around external systems:
+
+```text
+LangBot overhead = end-to-end latency - provider latency - external tool latency - network/fault injection latency
+```
+
+Measure user experience and internal composition separately:
+
+- WebUI load and interaction latency.
+- Debug Chat send-to-first-visible-token and send-to-completion latency.
+- Pipeline, RAG, plugin runtime, MCP, AgentRunner, and persistence segment
+  latency.
+- Queue wait time, concurrency, throughput, timeout rate, and p95/p99 latency.
+- Startup, plugin install, knowledge-base ingestion, migration, and recovery
+  time.
+
+Do not report a single message round-trip time as "LangBot performance" unless
+the report also explains external provider/tool/network time.
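As a minimal sketch of the accounting above, the following subtracts externally owned time from one end-to-end sample. The function name `langbotOverheadMs` and its field names are illustrative assumptions, not identifiers from the LangBot codebase.

```javascript
// Hypothetical helper: field names are illustrative, not LangBot identifiers.
function langbotOverheadMs({ e2eMs, providerMs = 0, externalToolMs = 0, networkMs = 0 }) {
  // Overhead is whatever remains after removing time owned by external systems.
  const overhead = e2eMs - providerMs - externalToolMs - networkMs;
  return Number(overhead.toFixed(3));
}

// One message round trip in which most of the time belongs to the provider.
const sample = { e2eMs: 131.5, providerMs: 102, externalToolMs: 0, networkMs: 11.4 };
console.log(langbotOverheadMs(sample));
```

Reporting the subtracted segments next to the result keeps the round-trip time from being read as LangBot's own cost.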
+
+## Evidence Contract
+
+Performance and reliability cases should declare the evidence they need:
+
+- `metrics`: machine-readable latency, throughput, error-rate, or recovery
+  metrics, usually `metrics.json`.
+- `resource_log`: CPU, memory, process, connection, queue, or file descriptor
+  samples.
+- `trace`: browser, HTTP, database, or runtime trace artifacts.
+- `profile`: CPU, memory, or flamegraph profile artifacts.
+- `backend_log`, `network`, `api_diagnostic`, and `filesystem` as supporting
+  evidence when relevant.
+
+Automation should write `automation-result.json` with these fields when
+available:
+
+```json
+{
+  "status": "pass",
+  "reason": "Probe passed all thresholds.",
+  "metrics_summary": {
+    "langbot_overhead_p95_ms": 12.4,
+    "error_rate": 0
+  },
+  "thresholds_summary": {
+    "langbot_overhead_p95_ms": { "actual": 12.4, "max": 50, "pass": true }
+  },
+  "artifacts": {
+    "metrics_json": "/path/to/metrics.json"
+  },
+  "evidence_collected": ["metrics", "filesystem"]
+}
+```
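One way a probe might derive `status` and `thresholds_summary` from its metrics is sketched below. The helper names `maxThreshold` and `summarize` are hypothetical; only the output field names come from the example above.

```javascript
// Hypothetical helper: builds one thresholds_summary entry for an upper bound.
function maxThreshold(actual, max) {
  return { actual, max, pass: actual <= max };
}

// Hypothetical helper: derives the overall status from every threshold entry.
// A metric that is missing from metricsSummary compares as undefined <= max,
// which is false, so a missing metric fails its threshold.
function summarize(metricsSummary, budgets) {
  const thresholdsSummary = {};
  for (const [name, max] of Object.entries(budgets)) {
    thresholdsSummary[name] = maxThreshold(metricsSummary[name], max);
  }
  const passed = Object.values(thresholdsSummary).every((entry) => entry.pass);
  return {
    status: passed ? "pass" : "fail",
    reason: passed ? "Probe passed all thresholds." : "Probe breached one or more thresholds.",
    metrics_summary: metricsSummary,
    thresholds_summary: thresholdsSummary,
  };
}

const report = summarize(
  { langbot_overhead_p95_ms: 12.4, error_rate: 0 },
  { langbot_overhead_p95_ms: 50, error_rate: 0 },
);
console.log(JSON.stringify(report, null, 2));
```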
+
+Synthetic contract probes are useful for checking the QA harness, but they are
+not live product performance results. Label them as contract probes in the case
+title, checks, and report.
+
+## Chaos And Reliability Rules
+
+Chaos tests must be narrow and reversible:
+
+- Declare the fault model in `fault_model_json`.
+- Record blast radius, target component, injection method, duration, and abort
+  conditions.
+- Capture recovery checks and cleanup steps in the case.
+- Classify unavailable dependencies as `env_issue` unless the target behavior
+  is LangBot's handling of that dependency failure.
+- Do not run destructive fault injection against a shared or production-like
+  instance without explicit operator approval.
+
+Recommended first fault models:
+
+- Provider timeout or HTTP 429 from a fake provider endpoint.
+- Plugin runtime disconnect/reconnect in a local instance.
+- MCP stdio server exits mid-call.
+- RAG parser fixture fails once and recovers on retry.
+- Backend API endpoint returns 5xx from a controlled local proxy.
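The rules above could be enforced before a chaos case runs with a small completeness check such as the sketch below. The field names inside the fault model are assumptions made for illustration; this reference only states which facts must be recorded, not how `fault_model_json` names them.

```javascript
// Hypothetical field names: the reference lists the facts to record, not the keys.
const REQUIRED_FIELDS = [
  "target_component",
  "injection_method",
  "blast_radius",
  "duration_ms",
  "abort_conditions",
  "recovery_checks",
  "cleanup_steps",
];

// Returns the required fields that are missing or empty in a parsed fault model.
function missingFaultModelFields(faultModel) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = faultModel?.[field];
    if (Array.isArray(value)) return value.length === 0;
    return value === undefined || value === null || value === "";
  });
}

const faultModel = JSON.parse(`{
  "target_component": "fake-provider",
  "injection_method": "HTTP 429 from the fake provider endpoint",
  "blast_radius": "local QA instance only",
  "duration_ms": 30000,
  "abort_conditions": ["backend stops answering /healthz"],
  "recovery_checks": ["a later request succeeds"]
}`);

// This model has no cleanup steps, so the check reports that field.
console.log(missingFaultModelFields(faultModel));
```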
+
+## Starter Live Probes
+
+The starter gate separates QA-harness contracts from live product checks:
+
+- `langbot-overhead-accounting-contract` verifies that reports can carry
+  overhead accounting metrics. It uses deterministic synthetic samples and is
+  not live product performance.
+- `langbot-fault-taxonomy-contract` verifies that fault scenarios declare
+  expected status, recovery, and cleanup before destructive chaos tests are
+  added.
+- `langbot-live-backend-latency` checks the unauthenticated `/healthz`
+  endpoint for basic backend responsiveness.
+- `langbot-live-control-plane-api` checks `/healthz` and
+  `/api/v1/system/info` for HTTP 200, JSON `code: 0`, response shape, and
+  per-endpoint p95 latency.
+- `langbot-live-backend-log-health` scans the recent backend log window for
+  fail-severity runtime findings. It is the reliability guard that should fail
+  the gate when HTTP probes pass but backend logs contain Traceback, ImportError,
+  ERROR, unclosed sessions, or unawaited coroutine signals.
+
+Do not treat these starter live probes as Debug Chat or model-provider
+performance. They are control-plane readiness checks; user-facing performance
+needs browser/WebSocket/message-path measurements.
+
+## Debug Chat Load And Fake Provider Baseline
+
+Use `langbot-fake-provider-debug-chat-load` before real-provider load checks.
+The setup automation starts a local OpenAI-compatible fake provider, registers
+it as a normal LangBot provider/model, configures a local-agent pipeline, resets
+Debug Chat, and then drives concurrent WebSocket messages through the live
+backend.
+
+This is not a mocked backend test. It still exercises:
+
+- provider/model persistence and runtime reload;
+- LiteLLM OpenAI-compatible requester path;
+- local-agent runner selection and pipeline execution;
+- Debug Chat WebSocket adapter and broadcast behavior;
+- backend concurrency, timeout, and error-rate accounting.
+
+The fake provider is deterministic and can inject controlled latency or faults
+with `LANGBOT_FAKE_PROVIDER_*` variables, so it is the baseline for LangBot
+message-path overhead. A fake-provider process keeps process-global config,
+request counters, and recent request history; run fake-provider probes serially
+or give each run its own provider instance. Concurrent probes against the same
+fake-provider URL can reset or reconfigure each other's metrics.
+
+The probe uses unique expected response tokens per request because Debug Chat
+broadcasts messages to every connection in the same session; unique tokens
+prevent one connection from counting another connection's response as its own.
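The attribution problem can be sketched as follows. The token format and the message shape (`{ content }`) are assumptions chosen for illustration and may differ from what the probe actually sends.

```javascript
// Hypothetical token format: any string that is unique per request works.
// Fixed-width padding keeps one token from being a prefix of another.
function makeExpectedToken(runId, index) {
  return `qa-${runId}-${String(index).padStart(4, "0")}`;
}

// Every connection in the session sees every broadcast message, so a connection
// counts a reply as its own only when the reply contains its own token.
function ownReplies(broadcastMessages, token) {
  return broadcastMessages.filter((message) => String(message.content || "").includes(token));
}

const tokens = [0, 1].map((index) => makeExpectedToken("run-a", index));
const broadcast = [
  { content: `echo ${tokens[1]}` },
  { content: `echo ${tokens[0]}` },
];
console.log(ownReplies(broadcast, tokens[0]).length);
```

Without the token filter, both connections would see two replies and each would count the other's response.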
+
+When the fake provider is used, reports also include provider-side timing in
+`metrics.json`:
+
+- `fake_provider.duration_ms` and `fake_provider.first_content_chunk_ms`
+  measure the controlled provider itself.
+- `provider_timing.send_to_provider_start_ms` estimates WebSocket ingress,
+  pipeline dispatch, runner setup, and requester time before the provider
+  receives the request.
+- `provider_timing.provider_finish_to_ws_final_ms` estimates the path from
+  provider completion back to the final Debug Chat WebSocket response.
+- `provider_timing.langbot_overhead_estimate_ms` is the sum of those two
+  LangBot-side segments when wall-clock timestamps can be matched by the
+  unique expected response token.
+
+After the baseline passes, run `langbot-fake-provider-debug-chat-slow-load` to
+keep the same live backend path while injecting deterministic streaming latency.
+Run `langbot-fake-provider-debug-chat-fault-recovery` to inject bounded HTTP
+provider failures and require both observed failures and later successful
+requests. The fault-recovery case is deliberately sequential because failed
+Debug Chat responses do not carry a unique success token that can be attributed
+to one concurrent connection.
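The "observed failures and later successful requests" requirement can be expressed over an ordered sample list, as in this sketch. The sample shape `{ ok: boolean }` and the function name are hypothetical.

```javascript
// Hypothetical sample shape: { ok: boolean }, listed in the order requests were sent.
function observedFaultAndRecovery(sequentialSamples) {
  const firstFailure = sequentialSamples.findIndex((sample) => !sample.ok);
  if (firstFailure === -1) return { faultObserved: false, recovered: false };
  // Recovery only counts when a success happens after the first failure.
  const recovered = sequentialSamples.slice(firstFailure + 1).some((sample) => sample.ok);
  return { faultObserved: true, recovered };
}

const run = [{ ok: true }, { ok: false }, { ok: false }, { ok: true }];
console.log(observedFaultAndRecovery(run));
```

The ordering check depends on the requests being sequential, which is one reason the case avoids concurrency.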
+
+Run `langbot-fake-provider-debug-chat-cross-pipeline-isolation` separately via
+`langbot-debug-chat-isolation-gate`. Current LangBot releases may fail it because
+of product bug [#2286](https://github.com/langbot-app/LangBot/issues/2286), where
+Debug Chat replies can read singleton WebSocket proxy pipeline state after a
+later message overwrites it. Treat that failure as regression evidence for the
+product fix rather than as a fake-provider latency finding.
+
+Use `langbot-space-debug-chat-concurrency-smoke` after the fake-provider
+baseline. It runs a deliberately small real Space-provider batch and reports
+user-visible latency, not pure LangBot overhead. Space/model/network failures
+are dependency findings until the fake baseline shows the same symptom.
+If a Space smoke run passes but the log guard finds Tracebacks from telemetry
+posting, classify that separately as `telemetry-proxy-noise`; do not clear the
+proxy or treat the Debug Chat path as failed.
+
+Useful commands:
+
+```bash
+rtk bin/lbs test run langbot-fake-provider-debug-chat-load --run-id langbot-fake-load-local
+rtk bin/lbs test run langbot-fake-provider-debug-chat-slow-load --run-id langbot-fake-slow-local
+rtk bin/lbs test run langbot-fake-provider-debug-chat-fault-recovery --run-id langbot-fake-fault-local
+rtk bin/lbs suite run langbot-debug-chat-isolation-gate --run-id langbot-debug-chat-isolation-local --include-manual-check
+rtk bin/lbs test run langbot-space-debug-chat-concurrency-smoke --run-id langbot-space-smoke-local
+rtk bin/lbs suite run langbot-debug-chat-load-gate --run-id langbot-debug-chat-load-local --include-manual-check
+```
+
+## Gate Layers
+
+Use the smallest gate that answers the quality question:
+
+- `langbot-performance-contract-gate`: fast synthetic checks for report shape,
+  threshold accounting, and fault taxonomy. Good for PR feedback when no live
+  service is running.
+- `langbot-live-backend-gate`: live backend `/healthz`,
+  `/api/v1/system/info`, and backend log health. Good after starting a local
+  LangBot backend.
+- `langbot-user-path-performance-gate`: browser-visible user path performance,
+  starting with Pipeline Debug Chat send-to-visible-completion latency. Run it
+  only when the browser profile and target pipeline are ready.
+- `langbot-debug-chat-load-gate`: manual WebSocket Debug Chat load checks,
+  starting with controlled fake-provider baseline, slow-provider, and
+  fault-recovery profiles, plus an optional low-volume real Space-provider
+  smoke. Run fake-provider cases serially when they share a provider URL.
+- `langbot-debug-chat-isolation-gate`: manual cross-pipeline Debug Chat
+  isolation regression gate. Current releases may fail because of #2286; keep it
+  separate from the normal load gate until that product fix lands.
+- `langbot-performance-reliability-gate`: combined starter gate for synthetic
+  contracts plus live backend checks.
+
+Keep environment diagnostics separate from product regressions. For example, a
+SOCKS proxy without Python `socksio` support should be fixed or clearly
+classified by `bin/lbs env doctor`; do not hide the resulting backend
+Traceback in reports.
+
+## Debug Chat Performance
+
+`pipeline-debug-chat-performance` reuses the browser Debug Chat automation and
+adds `metrics.json`, `metrics_summary`, and `thresholds_summary` to
+`automation-result.json`.
+
+Current metric:
+
+```text
+response_duration_ms = prompt send -> expected assistant response visible and stable
+```
+
+This is a user-path metric, not pure LangBot overhead. If it regresses, inspect
+provider latency, model route health, plugin/runtime logs, WebSocket behavior,
+and browser console/network evidence before attributing the whole duration to
+LangBot.
+
+### User-Path Gate Runbook
+
+1. Start the backend and frontend. The frontend must be launched with
+   `VITE_API_BASE_URL="$LANGBOT_BACKEND_URL"` so browser API calls reach the
+   backend.
+2. Run `node scripts/e2e/ensure-local-agent-pipeline.mjs --write-env`. The
+   setup refreshes the local QA login, skips the wizard, prepares a Debug Chat
+   pipeline, scans Space models, tests candidates, writes tested fallback
+   models, and writes the selected pipeline/model env values to
+   `skills/.env.local`.
+3. If setup returns `env_issue`, read `model_tests` and provider errors first.
+   A missing Space key, failed Space scan, or unavailable model route is not a
+   LangBot performance regression.
+4. Run
+   `bin/lbs suite run langbot-user-path-performance-gate --include-manual-check`.
+5. Interpret `response_p95_ms` as browser-visible send-to-completion time. It
+   includes provider latency; use backend logs and model test evidence to
+   separate LangBot overhead from the external model route.
+
+The setup keeps a `max-round` value in the generated pipeline config only
+because the current backend truncator still reads that field directly. Do not
+use it as a quality requirement for future local-agent behavior.
+
+## Running The First Gate
+
+Start with the reusable suite:
+
+```bash
+rtk bin/lbs suite plan langbot-performance-reliability-gate
+rtk bin/lbs suite start langbot-performance-reliability-gate --run-id langbot-perf-rel-local
+```
+
+Run synthetic contract probes first. Run live probes only after the selected
+backend/frontend instance is reachable and the run owner accepts any fault
+scope.
@@ -1,68 +0,0 @@
-# Acceptance matrix — skill all-tool model
-
-Acceptance criteria for the branch that unifies LangBot skills as **authorized
-tools** (`feat/agent-runner-plugin`). Skills are no longer gated behind the
-`skill_authoring` capability; `activate` / `register_skill` / native `exec` are
-exposed like native tools, gated only on **sandbox + skill_mgr**. Discovery is
-tool-driven (`langbot_list_assets` gains a `skills` asset class for external
-harnesses). Host persists activated skills to `host.activated_skills`
-(last-write-wins) and prefills `ToolResource.parameters` so runners skip
-per-tool `get_tool_detail`.
-
-## What changed (scope under test)
-
-| Layer | Change |
-| --- | --- |
-| host | `toolmgr.get_all_tools` drops `include_skill_authoring`; `SkillToolLoader` self-gates on sandbox+skill_mgr |
-| host | `preproc` drops the `include_skill_authoring` branch; bound-skills + skills resource gate on `skill_mgr` |
-| host | `resource_builder` stops gating skills on `skill_authoring`; fills `ToolResource.parameters` via `tool_mgr.get_tool_schema` |
-| host | `persist_activated_skill` writes `host.activated_skills` (conversation scope) |
-| sdk | `ToolResource.parameters` (full JSON schema); `langbot_list_assets` `skills` asset class |
-| local-agent | `build_llm_tools` prefers `ctx.resources.tools.parameters`, falls back to `get_tool_detail`; `DEFAULT_MAX_TOOL_ITERATIONS` 20→100 |
-
-## Dimensions
-
- **Runner**: `local-agent` (in-process logic, direct Run API, skill tools in `use_funcs`) · `acp-agent-runner` (external harness, remote-ssh claude-code over ACP, MCP gateway via **HTTP proxy**) · `claude-code-agent` (external harness, claude-code CLI, MCP gateway via **stdio bridge** — pipeline `28fd37ac`, remote-ssh→101).
-
-### Runner transport difference (both work out-of-the-box on remote-ssh)
-
-Both external runners receive the same host-generated gateway `AgentMCPServerConfig`, but inject it differently — and **both are made remote-reachable automatically; neither requires `public-url` on remote-ssh**:
-
- **claude-code-agent → stdio bridge.** The mcp config is shipped to the remote host base64-over-SSH-stdin and consumed via `--mcp-config`; the gateway entry is a `command/args` (stdio) MCP server whose process tunnels back to the host over the SSH stdio pipe.
- **acp-agent-runner → HTTP proxy + SSH reverse tunnel.** The gateway is a localhost HTTP MCP proxy passed via ACP `session/new {mcpServers}`. On `remote-ssh` with no `public-url`, the SDK's `AgentRunMCPAccess` (`mcp_access.py` `_remote_reverse_tunnel`: location==remote-ssh and empty public_url) emits an `ssh -R 127.0.0.1:<port>:127.0.0.1:<port>` reverse tunnel — consumed by acp `default.py:521` (`ssh_args.extend(access.reverse_tunnel.ssh_args())`) — and points `server_config.public_url` at the host-local `http_mcp_endpoint`. The remote claude hits `127.0.0.1:<port>` which tunnels back to the host bridge. **`langbot-assets-gateway-public-url` is an optional alternative for topologies where the reverse tunnel can't be used — not a requirement.**
-
-This is a **runner-plugin transport detail, not a host all-tool-branch issue** — proven by **both** runners discovering skills end-to-end with the unmodified branch (see cases below).
-
-> **Correction (2026-06-22).** An earlier revision of this doc claimed acp was "blocked" on remote-ssh and *required* `langbot-assets-gateway-public-url`, based on a run that returned `PROBEDONE 0 0` / timeout. That was an **environment artifact, not an acp defect**: a duplicate backend instance (a second checkout `LangBot-master/` whose box runtime contended for the same `--ws-control-port 5410`) plus a wedged plugin runtime (host `emit_event` / `list_agent_runners` action calls timing out with `ActionCallTimeoutError`). Re-run on a clean single-instance runtime, **acp passes via the reverse tunnel with no `public-url`** (`PROBEDONE 1 17`, 8–24s).
-- **Lifecycle**: discover → activate → operate (native exec under the activated mount path) → register.
-- **Backend**: docker · nsjail · e2b.
-
-## Cases & status
-
-| Case | Asserts | Runner(s) | Status |
-| --- | --- | --- | --- |
-| `skill-tool-exposure-no-capability` | skill tools offered to a tool-calling runner **without** `skill_authoring`; gated only on sandbox+skill_mgr | local-agent | **covered (unit)** — `test_tool_manager_native.py`, `test_preproc.py` |
-| `skill-activation-persistence` | activated skill survives a new run in the same conversation (`host.activated_skills` restore) | local-agent | **covered (unit)** — `test_skill_tools.py` |
-| `toolresource-parameters-prefill` | runner builds LLM tools from `ctx.resources.tools.parameters` without per-tool `get_tool_detail` | local-agent | **covered (unit)** — `test_run_assembly.py::test_build_llm_tools_uses_prefilled_schema_without_fetch` |
-| `regression-existing-runner-behavior` | existing local-agent cases (basic/rag/tool-call/steering/multimodal) unchanged | local-agent | **covered (unit)** — full host/sdk/local-agent suites green, 0 new failures |
-| `sandbox-skill-authoring-e2e` | create → register → activate → exec-from-activated-path → `E2E_OK` | local-agent | **PASS (nsjail + docker)** — full chain green via local-agent Debug Chat (pipeline `3e645b04`): create+run in `/workspace` → `exit 0` `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`; register+activate; exec in `/workspace/.skills/<name>` runs `scripts/use.py` (reads `data/input.json`) and writes `activated_writeback.txt` → `exit 0`, both markers, file written through to host skill store. Verified on **nsjail** first, then on **docker** after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)). |
-| `skill-discovery-via-mcp-gateway` | external harness calls `langbot_list_assets(['skills'])` and receives pipeline-visible skills | claude-code / acp | **PASS (both)** — clean single-instance runtime, remote-ssh→101. claude-code-agent (pipeline `28fd37ac`, stdio bridge): `PROBEDONE skills=1 tools=15`. acp-agent-runner (pipeline `b00794d2`, HTTP proxy + SSH reverse tunnel, **no public-url**): `PROBEDONE skills=1 tools=17`, 8–24s. Both prove the all-tool `skills` asset class is discoverable end-to-end by an external harness. |
-| `skill-activation-cross-runner-parity` | local-agent and external harness both reach skills via their paths (`use_funcs` vs `langbot_call_tool`) | local-agent + claude-code + acp | **PASS** — local-agent (use_funcs) ✓, claude-code-agent (stdio gateway, `skills=1 tools=15`) ✓, and acp-agent-runner (HTTP-proxy gateway over reverse tunnel, `skills=1 tools=17`) ✓ all discover skills. `skills` count matches (1==1); the `tools` count (17 vs 15) is claude's self-reported tally and not yet checked against the authoritative gateway count — most likely model-counting variance, not an asset difference. |
-
-## Known issues
-
- [#2271](https://github.com/langbot-app/LangBot/issues/2271) — activated `/workspace/.skills/<name>` `scripts/`/`data/` missing on the docker backend. **FIXED** by [langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87) (`fix(box): recreate sandbox container when extra_mounts change`), rebased into this branch. **Corrected root cause:** not "docker masks the nested bind mount" (disproven) — the real bug is **container reuse**: `extra_mounts` was not part of the box session compatibility check, so when a skill is activated mid-conversation docker reused the already-running container and could not append the new bind mount; the activated skill therefore appeared empty. The fix records a mount signature on the session and recreates the container when the mount set changes (idempotent, no data loss). Pre-existing (Feat/sandbox #2072), reproduced on pure `origin/master` + the built-in local-agent runner, so not introduced by this branch — this branch only exposed the path end-to-end for the first time. After the fix, the OPERATE step passes on **both** docker and nsjail (see exit criterion 3). Merging needs a new SDK release + a `langbot-plugin` pin bump in LangBot's `pyproject.toml` to reach a released LangBot.
- **nsjail + stale docker workspace artifacts (environment, not a code bug).** If a prior docker run left root-owned dirs under the workspace (e.g. `data/box/default/.skills/`, created root-owned because docker runs as root), nsjail — which runs as the invoking user — cannot create the nested skill mount target under that root-owned dir and `runChild()` fails with `Launching child process failed`, poisoning **every** exec in the session (the exact symptom documented in `box/service.py::build_skill_extra_mounts`). Fix: remove the root-owned leftovers (`sudo rm -rf data/box/default/.skills data/box/default/<stale-skill>`) before running nsjail e2e. New nsjail runs create user-owned artifacts, so this is a one-time cleanup after switching off docker.
-
-## Exit criteria
-
-1. Unit matrix green across host/sdk/local-agent, 0 new failures. **(DONE)**
-2. `skill-tool-exposure-no-capability` + `skill-activation-persistence` + `toolresource-parameters-prefill` covered by unit. **(DONE)**
-3. `sandbox-skill-authoring-e2e` OPERATE step passes on a real backend, proving end-to-end skill use. **(DONE — nsjail + docker)** — full create→exec→register→activate→exec-from-`/workspace/.skills/<name>` chain returns `exit 0`; the activated mount runs `scripts/use.py` (reads `data/input.json` → `SANDBOX_COMPLEX_SKILL_OK sum=10 product=24`) and writes `activated_writeback.txt` through to the host skill store. Verified on nsjail, then on docker after the #2271 fix ([langbot-plugin-sdk#87](https://github.com/langbot-app/langbot-plugin-sdk/pull/87)).
-4. `skill-discovery-via-mcp-gateway` passes on an external harness. **(DONE — claude-code-agent: skills=1 tools=15, 24s)**
-5. `skill-activation-cross-runner-parity` passes on acp. **(DONE — acp: skills=1 tools=17, 8s, via SSH reverse tunnel with no public-url; clean single-instance runtime)**
-
-## How to run
-
-- **Unit**: LangBot `make test`; SDK `uv run pytest`; local-agent `uv run pytest tests/`.
-- **Browser e2e**: per-pipeline Debug Chat; canonical skill prompt pattern in [`sandbox-skill-authoring.md`](./sandbox-skill-authoring.md). Automatable cases use the `automation_*` fields + `scripts/e2e/pipeline-debug-chat.mjs`.
@@ -0,0 +1,13 @@
+id: langbot-debug-chat-isolation-gate
+title: "LangBot Debug Chat isolation gate"
+description: "Manual/non-required cross-pipeline Debug Chat isolation gate. Current releases may fail this gate because of product bug #2286; use it as regression evidence after the routing fix lands."
+type: reliability
+priority: p1
+tags:
+  - reliability
+  - debug-chat
+  - websocket
+  - isolation
+  - concurrency
+cases:
+  - langbot-fake-provider-debug-chat-cross-pipeline-isolation
@@ -0,0 +1,15 @@
+id: langbot-debug-chat-load-gate
+title: "LangBot Debug Chat load gate"
+description: "Manual/non-required message-path load checks for Pipeline Debug Chat: controlled fake-provider baseline, slow-provider and fault-recovery profiles, plus optional real Space-provider smoke. Cross-pipeline isolation is split into langbot-debug-chat-isolation-gate because current releases may fail it due to product bug #2286."
+type: performance
+priority: p1
+tags:
+  - performance
+  - debug-chat
+  - websocket
+  - load
+cases:
+  - langbot-fake-provider-debug-chat-load
+  - langbot-fake-provider-debug-chat-slow-load
+  - langbot-fake-provider-debug-chat-fault-recovery
+  - langbot-space-debug-chat-concurrency-smoke
@@ -0,0 +1,14 @@
+id: langbot-live-backend-gate
+title: "LangBot live backend reliability gate"
+description: "Live backend control-plane responsiveness and runtime log health checks for a locally running LangBot instance."
+type: reliability
+priority: p1
+tags:
+  - performance
+  - reliability
+  - live-backend
+  - metrics
+cases:
+  - langbot-live-backend-latency
+  - langbot-live-control-plane-api
+  - langbot-live-backend-log-health
@@ -0,0 +1,13 @@
+id: langbot-performance-contract-gate
+title: "LangBot performance contract gate"
+description: "Fast synthetic contract checks for performance metric accounting and non-destructive reliability fault taxonomy."
+type: contract
+priority: p1
+tags:
+  - performance
+  - reliability
+  - contract
+  - metrics
+cases:
+  - langbot-overhead-accounting-contract
+  - langbot-fault-taxonomy-contract
@@ -0,0 +1,16 @@
+id: langbot-performance-reliability-gate
+title: "LangBot performance and reliability starter gate"
+description: "Starter gate for LangBot performance accounting, live backend control-plane latency, and non-destructive fault taxonomy checks."
+type: reliability
+priority: p1
+tags:
+  - performance
+  - reliability
+  - metrics
+  - chaos
+cases:
+  - langbot-overhead-accounting-contract
+  - langbot-fault-taxonomy-contract
+  - langbot-live-backend-latency
+  - langbot-live-control-plane-api
+  - langbot-live-backend-log-health
@@ -0,0 +1,12 @@
+id: langbot-user-path-performance-gate
+title: "LangBot user-path performance gate"
+description: "Browser-visible performance checks for user-facing LangBot paths such as Pipeline Debug Chat."
+type: performance
+priority: p1
+tags:
+  - performance
+  - browser
+  - debug-chat
+  - user-path
+cases:
+  - pipeline-debug-chat-performance
@@ -0,0 +1,23 @@
+id: telemetry-proxy-noise
+title: "Telemetry posting fails through the proxy while the target flow succeeds"
+date: 2026-06-25
+category: env_issue
+symptoms:
+  - "The target Debug Chat or provider smoke request completes successfully."
+  - "The same log window contains a Traceback for telemetry posting."
+  - "The traceback references the Space telemetry endpoint."
+patterns:
+  - "Failed to post telemetry"
+  - "https://space.langbot.app/api/v1/telemetry"
+  - "httpx.ConnectError"
+likely_causes:
+  - "The backend process inherited proxy settings that are required for model/provider access but unreliable for telemetry posting."
+  - "The telemetry endpoint is temporarily unreachable through the local proxy route."
+  - "TLS or proxy negotiation failed for the non-critical telemetry request."
+fix_steps:
+  - "Keep the proxy configuration needed for model/provider access; do not clear it only to hide telemetry noise."
+  - "Check that uppercase and lowercase proxy variables are consistent before rerunning a live Space smoke."
+  - "Classify the target flow and log-health result separately: a successful Debug Chat run can still have an environment log-health finding."
+verification: "A rerun shows the target case success patterns and no telemetry Traceback in the scanned log window, or the report explicitly records the telemetry issue as environment noise."
+related_cases:
+  - langbot-space-debug-chat-concurrency-smoke
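Reviewer note: a rough sketch of how the `patterns` list in this troubleshooting entry might be checked against a scanned log window. The `matchedPatterns` helper, the literal-substring matching, and the sample log are assumptions; the real matcher in the tooling may use different semantics.

```typescript
// Patterns copied from the `telemetry-proxy-noise` entry above.
const TELEMETRY_NOISE_PATTERNS = [
  "Failed to post telemetry",
  "https://space.langbot.app/api/v1/telemetry",
  "httpx.ConnectError",
];

// Hypothetical helper: return every pattern that occurs literally in the log window.
function matchedPatterns(logWindow: string, patterns: string[]): string[] {
  return patterns.filter((pattern) => logWindow.includes(pattern));
}

// Hypothetical log excerpt, for illustration only.
const sampleLog = [
  "INFO debug chat response completed",
  "ERROR Failed to post telemetry",
  "Traceback (most recent call last):",
  "httpx.ConnectError: All connection attempts failed",
].join("\n");

const hits = matchedPatterns(sampleLog, TELEMETRY_NOISE_PATTERNS);
console.log(hits);
```

A non-empty match would only suggest this entry as a candidate; per the fix steps, the target flow result and the log-health finding are still classified separately.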
@@ -1,5 +1,7 @@
 import { existsSync } from "node:fs";
+import { spawnSync } from "node:child_process";
 import { Socket } from "node:net";
+import { join } from "node:path";
 import type { CommandContext } from "../types.ts";
 import { parseOptions } from "../cli.ts";
 import { loadEnv } from "../fs.ts";
@@ -88,6 +90,37 @@ function compareProxyPair(env: Record<string, string>, upper: string, lower: str
  return null;
 }

+function envValue(env: Record<string, string>, key: string): string {
+  return process.env[key] ?? env[key] ?? "";
+}
+
+function activeSocksProxy(env: Record<string, string>): { key: string; value: string } | null {
+  for (const key of ["ALL_PROXY", "all_proxy", "HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"]) {
+    const value = envValue(env, key);
+    if (/^socks/i.test(value)) return { key, value };
+  }
+  return null;
+}
+
+function checkSocksio(env: Record<string, string>): string | null {
+  const proxy = activeSocksProxy(env);
+  if (!proxy) return null;
+
+  const repo = env.LANGBOT_REPO;
+  const python = repo ? join(repo, ".venv", "bin", "python") : "";
+  if (!python || !existsSync(python)) {
+    return `SOCKS proxy ${proxy.key} is configured (${redactEnvValue(proxy.key, proxy.value)}), but LangBot venv python was not found; after creating the venv, verify it can import socksio.`;
+  }
+
+  const result = spawnSync(python, ["-c", "import socksio"], {
+    encoding: "utf8",
+    timeout: 5000,
+  });
+  if (result.status === 0) return null;
+
+  return `SOCKS proxy ${proxy.key} is configured (${redactEnvValue(proxy.key, proxy.value)}), but ${python} cannot import socksio; run \`${python} -m pip install socksio\` or start LangBot without SOCKS proxy env.`;
+}
+
 export async function commandEnvDoctor(ctx: CommandContext): Promise<number> {
  const env = loadEnv(ctx.root);
  const failures: string[] = [];
@@ -117,6 +150,8 @@ export async function commandEnvDoctor(ctx: CommandContext): Promise<number> {
  ]) {
    if (mismatch) failures.push(mismatch);
  }
+  const socksioFailure = checkSocksio(env);
+  if (socksioFailure) failures.push(socksioFailure);

  for (const [label, result] of await Promise.all([
    checkUrl("LANGBOT_BACKEND_URL", env.LANGBOT_BACKEND_URL).then((result) => ["LANGBOT_BACKEND_URL", result] as const),
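Reviewer note: the lookup order in `envValue` and `activeSocksProxy` can be exercised in isolation. The sketch below mirrors that logic but takes both environments as explicit arguments so it runs without touching `process.env`; the names `lookup` and `findSocksProxy` and the proxy URLs are hypothetical.

```typescript
type EnvMap = Record<string, string | undefined>;

// Same key order as the hunk: ALL_PROXY first, then HTTPS, then HTTP.
const PROXY_KEYS = ["ALL_PROXY", "all_proxy", "HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"];

// The process environment wins over the env file, as in `envValue`.
function lookup(processEnv: EnvMap, fileEnv: EnvMap, key: string): string {
  return processEnv[key] ?? fileEnv[key] ?? "";
}

// Return the first proxy variable whose value starts with "socks" (case-insensitive).
function findSocksProxy(processEnv: EnvMap, fileEnv: EnvMap): { key: string; value: string } | null {
  for (const key of PROXY_KEYS) {
    const value = lookup(processEnv, fileEnv, key);
    if (/^socks/i.test(value)) return { key, value };
  }
  return null;
}

// Only the env file sets a SOCKS proxy.
const fromFile = findSocksProxy({}, { https_proxy: "socks5://127.0.0.1:1080" });
// The process environment shadows the file value with a plain HTTP proxy.
const overridden = findSocksProxy(
  { https_proxy: "http://127.0.0.1:8080" },
  { https_proxy: "socks5://127.0.0.1:1080" },
);
// ALL_PROXY is checked before http_proxy, and the scheme match ignores case.
const upperFirst = findSocksProxy(
  { ALL_PROXY: "SOCKS5h://proxy.local:1080", http_proxy: "socks4://other:1080" },
  {},
);

console.log(fromFile?.key, overridden, upperFirst?.key);
```

Because the lookup uses `??`, a variable that is set but empty in the process environment also shadows the env-file value, which may be worth keeping in mind when a doctor run reports no SOCKS proxy.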
@@ -465,6 +465,41 @@ function outputTail(value: string | Buffer | null | undefined): string {
  return String(value ?? "").trim().slice(-4000);
 }

+function exitStatusFromResultStatus(status: string): number {
+  if (status === "pass") return 0;
+  if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
+  return 1;
+}
+
+function executionStatusFromExitStatus(status: number): string {
+  if (status === 0) return "ok";
+  if (status === 2) return "classified";
+  return "nonzero";
+}
+
+function executionFromCaseResultFile(caseItem: Record<string, unknown>): Record<string, unknown> | null {
+  const resultPath = join(String(caseItem.evidence_dir), "result.json");
+  if (!existsSync(resultPath)) return null;
+  try {
+    const parsed = JSON.parse(readFileSync(resultPath, "utf8")) as Record<string, unknown>;
+    if (
+      parsed.case_id !== caseItem.id ||
+      parsed.run_id !== caseItem.run_id ||
+      typeof parsed.status !== "string"
+    ) return null;
+    const exitStatus = exitStatusFromResultStatus(parsed.status);
+    return {
+      status: executionStatusFromExitStatus(exitStatus),
+      exit_status: exitStatus,
+      reason: typeof parsed.reason === "string" ? parsed.reason : "result.json completed",
+      result_status: parsed.status,
+      result_json: resultPath,
+    };
+  } catch {
+    return null;
+  }
+}
+
 function executionProblemStatus(executions: Array<Record<string, unknown>>): string {
  const statuses = executions.map((item) => String(item.status));
  if (statuses.includes("nonzero")) return "fail";
@@ -523,12 +558,18 @@ export function commandSuiteRun(ctx: CommandContext): number {
      encoding: "utf8",
      stdio: options.json === true ? "pipe" : "inherit",
    });
-    const status = result.error ? 1 : result.status ?? 1;
+    const fileExecution = result.error ? executionFromCaseResultFile(caseItem) : null;
+    const status = typeof fileExecution?.exit_status === "number"
+      ? fileExecution.exit_status
+      : result.error ? 1 : result.status ?? 1;
    executions.push({
      id: caseItem.id,
-      status: status === 0 ? "ok" : "nonzero",
+      status: fileExecution?.status ?? executionStatusFromExitStatus(status),
      exit_status: status,
-      reason: result.error?.message || "",
+      reason: fileExecution?.reason ?? result.error?.message ?? "",
+      result_status: fileExecution?.result_status,
+      result_json: fileExecution?.result_json,
+      spawn_error: fileExecution && result.error ? result.error.message : undefined,
      stdout: outputTail(result.stdout),
      stderr: outputTail(result.stderr),
    });
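Reviewer note: the two mapping helpers added in this hunk compose into a small status table. The sketch below copies them as shown in the diff and prints the mapping for each value of `testResultStatusValues`; the `ResultStatus` type and the `table` variable are added here for illustration only.

```typescript
type ResultStatus = "pass" | "fail" | "blocked" | "env_issue" | "flaky";

// Copied from the hunk: result.json status to process exit status.
function exitStatusFromResultStatus(status: string): number {
  if (status === "pass") return 0;
  if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
  return 1;
}

// Copied from the hunk: exit status to the execution status stored per case.
function executionStatusFromExitStatus(status: number): string {
  if (status === 0) return "ok";
  if (status === 2) return "classified";
  return "nonzero";
}

const statuses: ResultStatus[] = ["pass", "fail", "blocked", "env_issue", "flaky"];
const table = statuses.map((resultStatus) => {
  const exitStatus = exitStatusFromResultStatus(resultStatus);
  return {
    result_status: resultStatus,
    exit_status: exitStatus,
    execution_status: executionStatusFromExitStatus(exitStatus),
  };
});

console.table(table);
```

Only `fail` (and any unrecognized status string) ends up as `nonzero`; `blocked`, `env_issue`, and `flaky` share exit status 2 and are reported as `classified`.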
@@ -271,7 +271,7 @@ function reportTemplate(mode: string): Record<string, string> {
      target_tested: "Probe target, endpoint, file, command, or service actually checked",
      execution_path: "automation script | shell command | direct API | other",
      probe_result: "What the probe observed",
-      logs_or_artifacts: "Log, filesystem, API, or other artifact paths collected",
+      metrics_or_artifacts: "Metrics, logs, filesystem artifacts, traces, or profiles collected",
      diagnostics: "Extra diagnostics used, if any",
      matched_troubleshooting: "Troubleshooting ids matched, if any",
      assets_to_update: "New case/reference/troubleshooting entries to add",
@@ -320,7 +320,7 @@ function manualEvidenceTemplate(mode: string): ManualEvidenceTemplate {
      target_tested: "TODO: probe target, endpoint, file, command, or service actually checked",
      execution_path: "TODO: automation script | shell command | direct API | other",
      probe_result: "TODO: observed probe result",
-      logs_or_artifacts: "TODO: evidence paths or skipped reason",
+      metrics_or_artifacts: "TODO: metrics, logs, filesystem artifacts, traces, or profiles collected",
      diagnostics: "TODO: additional diagnostics used, if any",
      matched_troubleshooting: "TODO: troubleshooting ids matched, if any",
      assets_to_update: "TODO: case/reference/troubleshooting updates to make",
@@ -1099,6 +1099,41 @@ function executionTail(value: string | Buffer | null | undefined): string {
  return String(value ?? "").trim().slice(-4000);
 }

+function exitStatusFromResultStatus(status: string): number {
+  if (status === "pass") return 0;
+  if (status === "blocked" || status === "env_issue" || status === "flaky") return 2;
+  return 1;
+}
+
+function executionStatusFromExitStatus(status: number): string {
+  if (status === 0) return "ok";
+  if (status === 2) return "classified";
+  return "nonzero";
+}
+
+function executionFromAutomationResultFile(
+  evidenceDir: string,
+  caseId: string,
+  runId: string,
+): { status: string; exit_status: number; reason: string; result_status: string; path: string } | null {
+  const resultPath = join(evidenceDir, "automation-result.json");
+  if (!existsSync(resultPath)) return null;
+  try {
+    const parsed = JSON.parse(readFileSync(resultPath, "utf8")) as Record<string, unknown>;
+    if (parsed.case_id !== caseId || parsed.run_id !== runId || typeof parsed.status !== "string") return null;
+    const exitStatus = exitStatusFromResultStatus(parsed.status);
+    return {
+      status: executionStatusFromExitStatus(exitStatus),
+      exit_status: exitStatus,
+      reason: typeof parsed.reason === "string" ? parsed.reason : "automation-result.json completed",
+      result_status: parsed.status,
+      path: resultPath,
+    };
+  } catch {
+    return null;
+  }
+}
+
 function runSetupAutomation(
  ctx: CommandContext,
  item: StructuredItem,
@@ -1224,6 +1259,30 @@ export function commandTestRun(ctx: CommandContext): number {
  });

  if (result.error) {
+    const fileExecution = executionFromAutomationResultFile(
+      run.automation.evidence_dir,
+      String(run.case.id),
+      run.run_id,
+    );
+    if (fileExecution) {
+      if (options.json !== true) {
+        console.error(`WARN: automation spawn reported an error, but ${fileExecution.path} completed: ${result.error.message}`);
+      }
+      if (options.json === true) {
+        console.log(JSON.stringify({
+          run,
+          setup_executions: setupExecutions,
+          automation_execution: {
+            ...fileExecution,
+            spawn_error: result.error.message,
+            stdout: executionTail(result.stdout),
+            stderr: executionTail(result.stderr),
+          },
+          exit_status: fileExecution.exit_status,
+        }, null, 2));
+      }
+      return fileExecution.exit_status;
+    }
    if (options.json !== true) console.error(`ERROR: failed to run automation: ${result.error.message}`);
    if (options.json === true) {
      console.log(JSON.stringify({
@@ -1247,7 +1306,7 @@ export function commandTestRun(ctx: CommandContext): number {
      run,
      setup_executions: setupExecutions,
      automation_execution: {
-        status: status === 0 ? "ok" : "nonzero",
+        status: executionStatusFromExitStatus(status),
        exit_status: status,
        stdout: executionTail(result.stdout),
        stderr: executionTail(result.stderr),
@@ -1311,6 +1370,7 @@ function renderMarkdownReport(report: TestReport): string {
  const environment = report.environment;
  const logGuard = report.log_guard;
  const troubleshooting = report.troubleshooting;
+  const automation = report.automation_result;
  const lines: string[] = [];

  lines.push(`# Test Report: ${reportCase.id}`);
@@ -1323,20 +1383,41 @@ function renderMarkdownReport(report: TestReport): string {
  lines.push(`Type: ${reportCase.type}`);
  lines.push("");
  lines.push("## Result");
-  lines.push(`- result: ${evidence.result}`);
-  for (const [key, value] of Object.entries(evidence)) {
-    if (key !== "result") lines.push(`- ${key}: ${value}`);
+  if (automation.status === "loaded" && automation.result) {
+    lines.push(`- result: ${automation.result}`);
+    if (automation.reason) lines.push(`- reason: ${automation.reason}`);
+    if (automation.url) lines.push(`- target_tested: ${automation.url}`);
+    if (automation.path) lines.push(`- automation_result: ${automation.path}`);
+    if (automation.artifacts) lines.push(`- artifacts: ${JSON.stringify(automation.artifacts)}`);
+  } else {
+    lines.push(`- result: ${evidence.result}`);
+    for (const [key, value] of Object.entries(evidence)) {
+      if (key !== "result") lines.push(`- ${key}: ${value}`);
+    }
  }
  lines.push("");
  lines.push("## Automation Result");
-  lines.push(`- status: ${report.automation_result.status}`);
-  if (report.automation_result.path) lines.push(`- path: ${report.automation_result.path}`);
-  if (report.automation_result.result) lines.push(`- result: ${report.automation_result.result}`);
-  if (report.automation_result.reason) lines.push(`- reason: ${report.automation_result.reason}`);
-  if (report.automation_result.started_at_local) lines.push(`- started_at_local: ${report.automation_result.started_at_local}`);
-  if (report.automation_result.finished_at_local) lines.push(`- finished_at_local: ${report.automation_result.finished_at_local}`);
-  if (report.automation_result.url) lines.push(`- url: ${report.automation_result.url}`);
-  if (report.automation_result.expected_text) lines.push(`- expected_text: ${report.automation_result.expected_text}`);
+  lines.push(`- status: ${automation.status}`);
+  if (automation.path) lines.push(`- path: ${automation.path}`);
+  if (automation.result) lines.push(`- result: ${automation.result}`);
+  if (automation.reason) lines.push(`- reason: ${automation.reason}`);
+  if (automation.duration_ms !== undefined) lines.push(`- duration_ms: ${automation.duration_ms}`);
+  if (automation.started_at_local) lines.push(`- started_at_local: ${automation.started_at_local}`);
+  if (automation.finished_at_local) lines.push(`- finished_at_local: ${automation.finished_at_local}`);
+  if (automation.url) lines.push(`- url: ${automation.url}`);
+  if (automation.expected_text) lines.push(`- expected_text: ${automation.expected_text}`);
+  if (automation.metrics_summary) {
+    lines.push("- metrics_summary:");
+    lines.push(`  ${JSON.stringify(automation.metrics_summary)}`);
+  }
+  if (automation.thresholds_summary) {
+    lines.push("- thresholds_summary:");
+    lines.push(`  ${JSON.stringify(automation.thresholds_summary)}`);
+  }
+  if (automation.artifacts) {
+    lines.push("- artifacts:");
+    lines.push(`  ${JSON.stringify(automation.artifacts)}`);
+  }
  lines.push("");
  lines.push("## Environment");
  for (const [key, value] of Object.entries(environment)) lines.push(`- ${key}=${value}`);
@@ -126,6 +126,9 @@ function validateCaseItem(root: string, item: StructuredItem, skillNames: Set<st
    ...validateEnvKeyScalar(item, "automation_pipeline_url_env"),
    ...validateEnvKeyScalar(item, "automation_pipeline_name_env"),
    ...validateJsonScalar(item, "automation_filesystem_checks_json"),
+    ...validateJsonScalar(item, "metrics_thresholds_json"),
+    ...validateJsonScalar(item, "load_profile_json"),
+    ...validateJsonScalar(item, "fault_model_json"),
    ...listValue(item.fields, "setup_automation").flatMap((entry) => (
      validateSetupAutomationEntry(root, entry, caseIds).map((error) => `${item.path}: ${error}`)
    )),
@@ -183,10 +186,62 @@ function validateCaseItem(root: string, item: StructuredItem, skillNames: Set<st
  if (timeout && (!/^\d+$/.test(timeout) || Number.parseInt(timeout, 10) <= 0)) {
    errors.push(`${item.path}: 'automation_response_timeout_ms' must be a positive integer string`);
  }
+  for (const key of [
+    "automation_debug_chat_load_requests",
+    "automation_debug_chat_load_concurrency",
+    "automation_debug_chat_load_timeout_ms",
+    "automation_debug_chat_load_response_p95_ms",
+    "automation_debug_chat_load_first_response_p95_ms",
+  ]) {
+    const value = scalar(item.fields, key);
+    if (value && (!/^\d+$/.test(value) || Number.parseInt(value, 10) <= 0)) {
+      errors.push(`${item.path}: '${key}' must be a positive integer string`);
+    }
+  }
+  for (const key of [
+    "automation_debug_chat_load_min_error_count",
+    "automation_debug_chat_load_min_ok_count",
+    "automation_debug_chat_load_min_provider_fault_count",
+    "automation_fake_provider_first_token_delay_ms",
+    "automation_fake_provider_chunk_delay_ms",
+    "automation_fake_provider_chunk_count",
+    "automation_fake_provider_fail_first_n",
+    "automation_fake_provider_fail_every_n",
+  ]) {
+    const value = scalar(item.fields, key);
+    if (value && (!/^\d+$/.test(value) || Number.parseInt(value, 10) < 0)) {
+      errors.push(`${item.path}: '${key}' must be a non-negative integer string`);
+    }
+  }
+  for (const key of ["automation_debug_chat_load_max_error_rate", "automation_debug_chat_load_min_error_rate"]) {
+    const value = scalar(item.fields, key);
+    if (value && (!/^(?:0(?:\.\d+)?|1(?:\.0+)?)$/.test(value))) {
+      errors.push(`${item.path}: '${key}' must be a number string between 0 and 1`);
+    }
+  }
+  const fakeProviderFaultStatus = scalar(item.fields, "automation_fake_provider_fault_status");
+  if (fakeProviderFaultStatus) {
+    const parsed = Number.parseInt(fakeProviderFaultStatus, 10);
+    if (!/^\d+$/.test(fakeProviderFaultStatus) || parsed < 400 || parsed > 599) {
+      errors.push(`${item.path}: 'automation_fake_provider_fault_status' must be an HTTP 4xx or 5xx status string`);
+    }
+  }
  const streamOutput = scalar(item.fields, "automation_stream_output");
  if (streamOutput && !["0", "1", "false", "true"].includes(streamOutput)) {
    errors.push(`${item.path}: 'automation_stream_output' must be one of 0, 1, false, or true`);
  }
+  for (const key of [
+    "automation_debug_chat_load_stream",
+    "automation_debug_chat_load_reset",
+    "automation_debug_chat_load_fail_on_final_mismatch",
+    "automation_fake_provider_fail_after_first_chunk",
+    "automation_fake_provider_dynamic_response",
+  ]) {
+    const value = scalar(item.fields, key);
+    if (value && !["0", "1", "false", "true"].includes(value)) {
+      errors.push(`${item.path}: '${key}' must be one of 0, 1, false, or true`);
+    }
+  }
  const imageBase64Fixture = scalar(item.fields, "automation_image_base64_fixture");
  if (imageBase64Fixture && !existsSync(join(root, imageBase64Fixture))) {
    errors.push(`${item.path}: automation image fixture does not exist: ${imageBase64Fixture}`);
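Reviewer note: the string checks added in this hunk are strict about format, which is easy to miss when writing case files. The sketch below reuses the rate regex and the fault-status range check from the diff; the helper names `isRateString` and `isFaultStatusString` are hypothetical wrappers, not functions in the validator.

```typescript
// Same pattern as the hunk: "0", "0.<digits>", "1", or "1.<zeros>".
const RATE_PATTERN = /^(?:0(?:\.\d+)?|1(?:\.0+)?)$/;

// Hypothetical wrapper around the error-rate check.
function isRateString(value: string): boolean {
  return RATE_PATTERN.test(value);
}

// Hypothetical wrapper around the fault-status check (HTTP 4xx or 5xx only).
function isFaultStatusString(value: string): boolean {
  if (!/^\d+$/.test(value)) return false;
  const parsed = Number.parseInt(value, 10);
  return parsed >= 400 && parsed <= 599;
}

const rateSamples = ["0", "0.05", "1", "1.0", "1.5", ".5", "0.", "5%", "1e-2"];
const statusSamples = ["429", "503", "200", "600", "4xx"];

for (const value of rateSamples) console.log("rate", JSON.stringify(value), isRateString(value));
for (const value of statusSamples) console.log("status", JSON.stringify(value), isFaultStatusString(value));
```

Values such as `.5`, `0.`, and `1e-2` are rejected even though they parse as numbers in JavaScript, so thresholds in case files need a leading zero and plain decimal notation.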
@@ -9,7 +9,18 @@ export const requiredEnvKeys = [
 ];

 export const caseModeValues = ["agent-browser", "probe"];
-export const caseTypeValues = ["smoke", "regression", "feature", "provider", "exploratory"];
+export const caseTypeValues = [
+  "smoke",
+  "regression",
+  "feature",
+  "provider",
+  "exploratory",
+  "contract",
+  "performance",
+  "reliability",
+  "chaos",
+  "security",
+];
 export const casePriorityValues = ["p0", "p1", "p2"];
 export const caseRiskValues = ["low", "medium", "high"];
 export const caseEvidenceValues = [
@@ -21,10 +32,24 @@ export const caseEvidenceValues = [
  "frontend_log",
  "api_diagnostic",
  "filesystem",
+  "metrics",
+  "trace",
+  "profile",
+  "resource_log",
 ];
 export const testResultStatusValues = ["pass", "fail", "blocked", "env_issue", "flaky"];
 export const troubleshootingCategoryValues = ["product", "env_issue", "external_dependency", "blocked", "flaky"];
-export const suiteTypeValues = ["smoke", "regression", "release_gate", "exploratory"];
+export const suiteTypeValues = [
+  "smoke",
+  "regression",
+  "release_gate",
+  "exploratory",
+  "contract",
+  "performance",
+  "reliability",
+  "chaos",
+  "security",
+];
 export const suiteRequiredStrings = ["id", "title", "description", "type", "priority"];
 export const suiteRequiredLists = ["tags", "cases"];

@@ -91,6 +91,7 @@ export type AutomationResultEvidence = {
  path?: string;
  result?: string;
  reason?: string;
+  duration_ms?: number;
  started_at?: string;
  started_at_local?: string;
  finished_at?: string;
@@ -98,6 +99,9 @@ export type AutomationResultEvidence = {
  url?: string;
  prompt?: string;
  expected_text?: string;
+  metrics_summary?: Record<string, unknown>;
+  thresholds_summary?: Record<string, unknown>;
+  artifacts?: Record<string, unknown>;
 };

 type MutableScanState = {
@@ -594,6 +598,18 @@ function stringField(data: Record<string, unknown>, key: string): string | undef
  return typeof value === "string" && value.trim() ? value : undefined;
 }

+function numberField(data: Record<string, unknown>, key: string): number | undefined {
+  const value = data[key];
+  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
+}
+
+function objectField(data: Record<string, unknown>, key: string): Record<string, unknown> | undefined {
+  const value = data[key];
+  return value && typeof value === "object" && !Array.isArray(value)
+    ? value as Record<string, unknown>
+    : undefined;
+}
+
 function evidenceDirFromOptions(options: Record<string, string | boolean>): string | undefined {
  const explicit = typeof options["evidence-dir"] === "string" ? options["evidence-dir"] : undefined;
  if (explicit) return resolve(explicit);
@@ -628,6 +644,7 @@ export function readAutomationResultEvidence(options: Record<string, string | bo
      path: resultPath,
      result: stringField(result, "status"),
      reason: stringField(result, "reason"),
+      duration_ms: numberField(result, "duration_ms"),
      started_at: stringField(result, "started_at"),
      started_at_local: stringField(result, "started_at_local"),
      finished_at: stringField(result, "finished_at"),
@@ -635,6 +652,9 @@ export function readAutomationResultEvidence(options: Record<string, string | bo
      url: stringField(result, "url"),
      prompt: redactSecrets(stringField(result, "prompt") ?? ""),
      expected_text: stringField(result, "expected_text"),
+      metrics_summary: objectField(result, "metrics_summary"),
+      thresholds_summary: objectField(result, "thresholds_summary"),
+      artifacts: objectField(result, "artifacts"),
    };
  } catch (error) {
    return { status: "invalid", path: resultPath, reason: String(error) };
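Review note: a small sketch of how the new `numberField` and `objectField` helpers filter a parsed `result.json`. The two helpers are copied from the hunk above so the example runs on its own. The sample payload and the `parsed` variable are hypothetical and only illustrate which shapes survive.

```typescript
// Copies of the two helpers from the diff, so the sketch is self-contained.
function numberField(data: Record<string, unknown>, key: string): number | undefined {
  const value = data[key];
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

function objectField(data: Record<string, unknown>, key: string): Record<string, unknown> | undefined {
  const value = data[key];
  return value && typeof value === "object" && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Hypothetical result.json payload; the values are made up for illustration.
const sample: Record<string, unknown> = JSON.parse(
  '{"duration_ms": 1250, "metrics_summary": {"p95_ms": 900}, "thresholds_summary": [1, 2], "artifacts": null}',
);

const parsed = {
  duration_ms: numberField(sample, "duration_ms"), // finite number is kept
  metrics_summary: objectField(sample, "metrics_summary"), // plain object is kept
  thresholds_summary: objectField(sample, "thresholds_summary"), // array is dropped
  artifacts: objectField(sample, "artifacts"), // null is dropped
  missing: numberField(sample, "not_present"), // absent key is dropped
};
console.log(parsed);
```

Because `numberField` checks `typeof value === "number"`, a numeric string such as `"1250"` would also come back as `undefined` rather than being parsed.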
@@ -114,6 +114,32 @@ export function automationEnvDefaults(item: StructuredItem, env: EnvSource = pro
    ["automation_expected_runner_id", "LANGBOT_E2E_EXPECTED_RUNNER_ID"],
    ["automation_reset_debug_chat", "LANGBOT_E2E_RESET_DEBUG_CHAT"],
    ["automation_debug_chat_session_type", "LANGBOT_E2E_DEBUG_CHAT_SESSION_TYPE"],
+    ["automation_debug_chat_response_p95_ms", "LANGBOT_E2E_DEBUG_CHAT_RESPONSE_P95_MS"],
+    ["automation_debug_chat_max_error_rate", "LANGBOT_E2E_DEBUG_CHAT_MAX_ERROR_RATE"],
+    ["automation_debug_chat_load_requests", "LANGBOT_DEBUG_CHAT_LOAD_REQUESTS"],
+    ["automation_debug_chat_load_concurrency", "LANGBOT_DEBUG_CHAT_LOAD_CONCURRENCY"],
+    ["automation_debug_chat_load_timeout_ms", "LANGBOT_DEBUG_CHAT_LOAD_TIMEOUT_MS"],
+    ["automation_debug_chat_load_response_p95_ms", "LANGBOT_DEBUG_CHAT_LOAD_RESPONSE_P95_MS"],
+    ["automation_debug_chat_load_first_response_p95_ms", "LANGBOT_DEBUG_CHAT_LOAD_FIRST_RESPONSE_P95_MS"],
+    ["automation_debug_chat_load_max_error_rate", "LANGBOT_DEBUG_CHAT_LOAD_MAX_ERROR_RATE"],
+    ["automation_debug_chat_load_min_error_rate", "LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_RATE"],
+    ["automation_debug_chat_load_min_error_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_ERROR_COUNT"],
+    ["automation_debug_chat_load_min_ok_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_OK_COUNT"],
+    ["automation_debug_chat_load_min_provider_fault_count", "LANGBOT_DEBUG_CHAT_LOAD_MIN_PROVIDER_FAULT_COUNT"],
+    ["automation_debug_chat_load_expected_prefix", "LANGBOT_DEBUG_CHAT_LOAD_EXPECTED_PREFIX"],
+    ["automation_debug_chat_load_prompt_template", "LANGBOT_DEBUG_CHAT_LOAD_PROMPT_TEMPLATE"],
+    ["automation_debug_chat_load_stream", "LANGBOT_DEBUG_CHAT_LOAD_STREAM"],
+    ["automation_debug_chat_load_reset", "LANGBOT_DEBUG_CHAT_LOAD_RESET"],
+    ["automation_debug_chat_load_fail_on_final_mismatch", "LANGBOT_DEBUG_CHAT_LOAD_FAIL_ON_FINAL_MISMATCH"],
+    ["automation_fake_provider_response_text", "LANGBOT_FAKE_PROVIDER_RESPONSE_TEXT"],
+    ["automation_fake_provider_first_token_delay_ms", "LANGBOT_FAKE_PROVIDER_FIRST_TOKEN_DELAY_MS"],
+    ["automation_fake_provider_chunk_delay_ms", "LANGBOT_FAKE_PROVIDER_CHUNK_DELAY_MS"],
+    ["automation_fake_provider_chunk_count", "LANGBOT_FAKE_PROVIDER_CHUNK_COUNT"],
+    ["automation_fake_provider_fail_first_n", "LANGBOT_FAKE_PROVIDER_FAIL_FIRST_N"],
+    ["automation_fake_provider_fail_every_n", "LANGBOT_FAKE_PROVIDER_FAIL_EVERY_N"],
+    ["automation_fake_provider_fault_status", "LANGBOT_FAKE_PROVIDER_FAULT_STATUS"],
+    ["automation_fake_provider_fail_after_first_chunk", "LANGBOT_FAKE_PROVIDER_FAIL_AFTER_FIRST_CHUNK"],
+    ["automation_fake_provider_dynamic_response", "LANGBOT_FAKE_PROVIDER_DYNAMIC_RESPONSE"],
    ["automation_filesystem_checks_json", "LANGBOT_E2E_FILESYSTEM_CHECKS_JSON"],
    ["automation_plugin_package", "LANGBOT_E2E_PLUGIN_PACKAGE"],
    ["automation_expected_plugin_id", "LANGBOT_E2E_EXPECTED_PLUGIN_ID"],
@@ -1,6 +1,6 @@
 import assert from "node:assert/strict";
 import { test } from "node:test";
-import { appendFileSync, existsSync, mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
+import { appendFileSync, chmodSync, existsSync, mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
 import { spawnSync } from "node:child_process";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
@@ -676,6 +676,82 @@ test("suite run JSON captures failed case output", () => {
  }
 });

+test("suite run preserves classified env_issue automation results", () => {
+  const tmp = mkdtempSync(join(tmpdir(), "lbs-suite-run-env-issue-"));
+  try {
+    const skillDir = join(tmp, "skills", "langbot-testing");
+    const casesDir = join(skillDir, "cases");
+    const suitesDir = join(skillDir, "suites");
+    const scriptsDir = join(tmp, "scripts");
+    mkdirSync(casesDir, { recursive: true });
+    mkdirSync(suitesDir, { recursive: true });
+    mkdirSync(scriptsDir, { recursive: true });
+    writeFileSync(join(skillDir, "SKILL.md"), "---\nname: langbot-testing\ndescription: Testing.\n---\n\n# Testing\n");
+    writeFileSync(join(tmp, "skills", ".env"), "");
+    writeFileSync(
+      join(casesDir, "env-case.yaml"),
+      [
+        "id: env-case",
+        "title: Env Case",
+        "mode: probe",
+        "area: qa",
+        "type: smoke",
+        "priority: p2",
+        "risk: low",
+        "ci_eligible: true",
+        "automation: scripts/env-issue.mjs",
+        "evidence_required:",
+        "  - filesystem",
+      ].join("\n"),
+    );
+    writeFileSync(
+      join(suitesDir, "mini.yaml"),
+      [
+        "id: mini",
+        "title: Mini",
+        "description: Mini suite.",
+        "type: smoke",
+        "priority: p2",
+        "tags:",
+        "  - qa",
+        "cases:",
+        "  - env-case",
+      ].join("\n"),
+    );
+    writeFileSync(
+      join(scriptsDir, "env-issue.mjs"),
+      [
+        "import { mkdirSync, writeFileSync } from 'node:fs';",
+        "import { join } from 'node:path';",
+        "mkdirSync(process.env.LBS_EVIDENCE_DIR, { recursive: true });",
+        "const result = {",
+        "  case_id: process.env.LBS_CASE_ID,",
+        "  run_id: process.env.LBS_RUN_ID,",
+        "  status: 'env_issue',",
+        "  reason: 'backend not reachable',",
+        "  evidence_collected: ['filesystem']",
+        "};",
+        "writeFileSync(join(process.env.LBS_EVIDENCE_DIR, 'result.json'), JSON.stringify(result));",
+        "writeFileSync(join(process.env.LBS_EVIDENCE_DIR, 'automation-result.json'), JSON.stringify({ ...result, source: 'automation' }));",
+        "process.exit(2);",
+      ].join("\n"),
+    );
+
+    const result = capture(() => commandSuiteRun({
+      root: tmp,
+      args: ["suite", "run", "mini", "--run-id", "mini-run", "--evidence-dir", join(tmp, "evidence"), "--json"],
+    }));
+
+    assert.equal(result.code, 2);
+    const payload = JSON.parse(result.output);
+    assert.equal(payload.executions[0].status, "classified");
+    assert.equal(payload.report.status, "env_issue");
+    assert.equal(payload.report.execution_status, "ok");
+  } finally {
+    rmSync(tmp, { recursive: true, force: true });
+  }
+});
+
 test("suite run failure cannot be masked by stale pass result", () => {
  const tmp = mkdtempSync(join(tmpdir(), "lbs-suite-run-stale-pass-"));
  try {
@@ -1369,6 +1445,56 @@ test("env doctor does not require proxy variables", async () => {
  }
 });

+test("env doctor reports missing socksio for active SOCKS proxy", async () => {
+  const tmp = mkdtempSync(join(tmpdir(), "lbs-env-doctor-socksio-"));
+  const originalAllProxy = process.env.ALL_PROXY;
+  const originalAllProxyLower = process.env.all_proxy;
+  try {
+    delete process.env.ALL_PROXY;
+    delete process.env.all_proxy;
+    const skillsDir = join(tmp, "skills");
+    const repoDir = join(tmp, "LangBot");
+    const webDir = join(repoDir, "web");
+    const venvBin = join(repoDir, ".venv", "bin");
+    const browserProfile = join(tmp, "browser-profile");
+    const chromium = join(tmp, "chromium");
+    mkdirSync(skillsDir, { recursive: true });
+    mkdirSync(webDir, { recursive: true });
+    mkdirSync(venvBin, { recursive: true });
+    mkdirSync(browserProfile, { recursive: true });
+    writeFileSync(chromium, "");
+    const python = join(venvBin, "python");
+    writeFileSync(python, "#!/bin/sh\nexit 1\n");
+    chmodSync(python, 0o755);
+    writeFileSync(
+      join(skillsDir, ".env"),
+      [
+        "LANGBOT_BACKEND_URL=http://127.0.0.1:59996",
+        "LANGBOT_FRONTEND_URL=http://127.0.0.1:59996",
+        "LANGBOT_DEV_FRONTEND_URL=http://127.0.0.1:59996",
+        `LANGBOT_REPO=${repoDir}`,
+        `LANGBOT_WEB_REPO=${webDir}`,
+        `LANGBOT_BROWSER_PROFILE=${browserProfile}`,
+        `LANGBOT_CHROMIUM_EXECUTABLE=${chromium}`,
+        "ALL_PROXY=socks5://127.0.0.1:7890",
+      ].join("\n"),
+    );
+
+    const result = await captureAsync(() => commandEnvDoctor({ root: tmp, args: ["env", "doctor"] }));
+
+    assert.equal(result.code, 1);
+    assert.match(result.output, /FAIL: SOCKS proxy ALL_PROXY is configured/);
+    assert.match(result.output, /cannot import socksio/);
+    assert.match(result.output, /-m pip install socksio/);
+  } finally {
+    if (originalAllProxy === undefined) delete process.env.ALL_PROXY;
+    else process.env.ALL_PROXY = originalAllProxy;
+    if (originalAllProxyLower === undefined) delete process.env.all_proxy;
+    else process.env.all_proxy = originalAllProxyLower;
+    rmSync(tmp, { recursive: true, force: true });
+  }
+});
+
 test("env show redacts secret-like values by default", () => {
  const tmp = mkdtempSync(join(tmpdir(), "lbs-env-show-redact-"));
  try {
@@ -2521,6 +2647,38 @@ test("test report renders a reusable evidence template", () => {
  assert.match(result.output, /no log files provided/);
 });

+test("test report promotes loaded automation evidence into result section", () => {
+  const tmp = mkdtempSync(join(tmpdir(), "lbs-report-automation-"));
+  try {
+    writeFileSync(
+      join(tmp, "automation-result.json"),
+      JSON.stringify({
+        status: "pass",
+        reason: "latency thresholds passed",
+        url: "http://127.0.0.1:5300",
+        artifacts: { metrics_json: join(tmp, "metrics.json") },
+      }),
+    );
+
+    const result = capture(() => commandTestReport(ctx([
+      "test",
+      "report",
+      "langbot-live-backend-latency",
+      "--evidence-dir",
+      tmp,
+      "--no-auto-log",
+    ])));
+
+    assert.equal(result.code, 0);
+    assert.match(result.output, /## Result\n- result: pass\n- reason: latency thresholds passed/);
+    assert.match(result.output, /- target_tested: http:\/\/127\.0\.0\.1:5300/);
+    assert.doesNotMatch(result.output, /target_tested: TODO/);
+    assert.match(result.output, /## Automation Result/);
+  } finally {
+    rmSync(tmp, { recursive: true, force: true });
+  }
+});
+
 test("validate rejects dangling case references and missing automation scripts", () => {
  const tmp = mkdtempSync(join(tmpdir(), "lbs-validate-strict-"));
  try {
@@ -1,37 +0,0 @@
-"""Agent runner subsystem for LangBot."""
-from __future__ import annotations
-
-from .runner.descriptor import AgentRunnerDescriptor
-from .runner.id import parse_runner_id, format_runner_id, RunnerIdParts, is_plugin_runner_id
-from .runner.errors import (
-    AgentRunnerError,
-    RunnerNotFoundError,
-    RunnerNotAuthorizedError,
-    RunnerProtocolError,
-    RunnerExecutionError,
-)
-from .runner.registry import AgentRunnerRegistry
-from .runner.context_builder import AgentRunContextBuilder
-from .runner.resource_builder import AgentResourceBuilder
-from .runner.result_normalizer import AgentResultNormalizer
-from .runner.orchestrator import AgentRunOrchestrator
-from .runner.config_migration import ConfigMigration
-
-__all__ = [
-    'AgentRunnerDescriptor',
-    'parse_runner_id',
-    'format_runner_id',
-    'is_plugin_runner_id',
-    'RunnerIdParts',
-    'AgentRunnerError',
-    'RunnerNotFoundError',
-    'RunnerNotAuthorizedError',
-    'RunnerProtocolError',
-    'RunnerExecutionError',
-    'AgentRunnerRegistry',
-    'AgentRunContextBuilder',
-    'AgentResourceBuilder',
-    'AgentResultNormalizer',
-    'AgentRunOrchestrator',
-    'ConfigMigration',
-]
@@ -1,66 +0,0 @@
-"""Agent runner modules."""
-
-from __future__ import annotations
-
-from .descriptor import AgentRunnerDescriptor
-from .id import parse_runner_id, format_runner_id, RunnerIdParts
-from .errors import (
-    AgentRunnerError,
-    RunnerNotFoundError,
-    RunnerNotAuthorizedError,
-    RunnerProtocolError,
-    RunnerExecutionError,
-)
-from .registry import AgentRunnerRegistry
-from .context_builder import AgentRunContextBuilder
-from .resource_builder import AgentResourceBuilder
-from .result_normalizer import AgentResultNormalizer
-from .orchestrator import AgentRunOrchestrator
-from .config_migration import ConfigMigration
-from .default_config import AgentRunnerDefaultConfigService
-from .binding_resolver import AgentBindingResolver, AgentBindingResolutionError
-from .session_registry import (
-    AgentRunSessionRegistry,
-    AgentRunSession,
-    RunAuthorizationSnapshot,
-    get_session_registry,
-)
-from .run_ledger_store import RunLedgerStore
-from .events import (
-    MESSAGE_RECEIVED,
-    MESSAGE_RECALLED,
-    GROUP_MEMBER_JOINED,
-    FRIEND_REQUEST_RECEIVED,
-    RESERVED_EVENT_TYPES,
-)
-
-__all__ = [
-    'AgentRunnerDescriptor',
-    'parse_runner_id',
-    'format_runner_id',
-    'RunnerIdParts',
-    'AgentRunnerError',
-    'RunnerNotFoundError',
-    'RunnerNotAuthorizedError',
-    'RunnerProtocolError',
-    'RunnerExecutionError',
-    'AgentRunnerRegistry',
-    'AgentRunContextBuilder',
-    'AgentResourceBuilder',
-    'AgentResultNormalizer',
-    'AgentRunOrchestrator',
-    'ConfigMigration',
-    'AgentRunnerDefaultConfigService',
-    'AgentBindingResolver',
-    'AgentBindingResolutionError',
-    'AgentRunSessionRegistry',
-    'AgentRunSession',
-    'RunAuthorizationSnapshot',
-    'get_session_registry',
-    'RunLedgerStore',
-    'MESSAGE_RECEIVED',
-    'MESSAGE_RECALLED',
-    'GROUP_MEMBER_JOINED',
-    'FRIEND_REQUEST_RECEIVED',
-    'RESERVED_EVENT_TYPES',
-]
@@ -1,70 +0,0 @@
-"""Resolve host events to one effective Agent binding."""
-
-from __future__ import annotations
-
-from .host_models import AgentConfig, AgentBinding, AgentEventEnvelope, BindingScope
-
-
-class AgentBindingResolutionError(Exception):
-    """Raised when an event cannot resolve to exactly one Agent binding."""
-
-
-class AgentBindingResolver:
-    """Resolve an event to a single AgentBinding.
-
-    The target product model is one bot / IM channel -> one Agent. Fan-out,
-    observer agents, or multi-runner arbitration require separate delivery and
-    state semantics and are intentionally not hidden in this resolver.
-    """
-
-    def resolve_one(
-        self,
-        event: AgentEventEnvelope,
-        agents: list[AgentConfig],
-    ) -> AgentBinding:
-        """Resolve exactly one enabled Agent for the event.
-
-        Callers that source agents from bot/workspace/global configuration must
-        pre-filter candidates to the event scope before calling this resolver.
-        The current AgentConfig model represents one already-selected product
-        Agent and does not carry enough scope metadata to make that decision
-        safely here.
-        """
-        matches = [
-            agent
-            for agent in agents
-            if agent.enabled and event.event_type in agent.event_types
-        ]
-
-        if not matches:
-            raise AgentBindingResolutionError(
-                f'No Agent binding matches event_type={event.event_type}'
-            )
-
-        if len(matches) > 1:
-            agent_ids = ', '.join(agent.agent_id or '<anonymous>' for agent in matches)
-            raise AgentBindingResolutionError(
-                f'Multiple Agent bindings match event_type={event.event_type}: {agent_ids}'
-            )
-
-        return self._to_binding(matches[0])
-
-    def _to_binding(self, agent: AgentConfig) -> AgentBinding:
-        """Project product-level Agent config into the run-time binding model."""
-        scope = BindingScope(
-            scope_type='agent',
-            scope_id=agent.agent_id,
-        )
-
-        return AgentBinding(
-            binding_id=f"agent_{agent.agent_id or 'default'}_{agent.runner_id}",
-            scope=scope,
-            event_types=list(agent.event_types),
-            runner_id=agent.runner_id,
-            runner_config=agent.runner_config,
-            resource_policy=agent.resource_policy,
-            state_policy=agent.state_policy,
-            delivery_policy=agent.delivery_policy,
-            enabled=agent.enabled,
-            agent_id=agent.agent_id,
-        )
@@ -1,171 +0,0 @@
-"""Helpers for the current AgentRunner config shape."""
-
-from __future__ import annotations
-
-import typing
-
-
-LEGACY_RUNNER_ID_MAP: dict[str, str] = {
-    'local-agent': 'plugin:langbot/local-agent/default',
-    'dify-service-api': 'plugin:langbot/dify-agent/default',
-    'n8n-service-api': 'plugin:langbot/n8n-agent/default',
-    'coze-api': 'plugin:langbot/coze-agent/default',
-    'dashscope-app-api': 'plugin:langbot/dashscope-agent/default',
-    'deerflow-api': 'plugin:langbot/deerflow-agent/default',
-    'langflow-api': 'plugin:langbot/langflow-agent/default',
-    'tbox-app-api': 'plugin:langbot/tbox-agent/default',
-    'weknora-api': 'plugin:langbot/weknora-agent/default',
-}
-
-
-class ConfigMigration:
-    """Configuration helper for agent runner IDs.
-
-    Responsibilities:
-    - Resolve runner ID from ai.runner.id
-    - Migrate legacy ai.runner.runner + ai.<runner-name> blocks
-    - Extract current Agent/runner config from ai.runner_config
-    - Keep the current config container shape stable on save
-    """
-
-    @staticmethod
-    def resolve_runner_id(pipeline_config: dict[str, typing.Any]) -> str | None:
-        """Resolve runner ID from current configuration.
-
-        Args:
-            pipeline_config: Current configuration container
-
-        Returns:
-            Runner ID string, or None if not configured
-        """
-        ai_config = pipeline_config.get('ai', {})
-        runner_config = ai_config.get('runner', {})
-
-        runner_id = runner_config.get('id')
-        if runner_id:
-            return runner_id
-
-        legacy_runner = runner_config.get('runner')
-        if isinstance(legacy_runner, str):
-            return LEGACY_RUNNER_ID_MAP.get(legacy_runner)
-
-        return None
-
-    @staticmethod
-    def resolve_runner_config(
-        pipeline_config: dict[str, typing.Any],
-        runner_id: str,
-    ) -> dict[str, typing.Any]:
-        """Resolve Agent/runner configuration from the current container.
-
-        Args:
-            pipeline_config: Current configuration container
-            runner_id: Resolved runner ID
-
-        Returns:
-            Runner configuration dict (empty if not found)
-        """
-        ai_config = pipeline_config.get('ai', {})
-
-        runner_configs = ai_config.get('runner_config', {})
-        if runner_id in runner_configs:
-            return runner_configs[runner_id]
-
-        legacy_runner = ConfigMigration._legacy_runner_name_for_id(runner_id)
-        if legacy_runner and isinstance(ai_config.get(legacy_runner), dict):
-            return ConfigMigration._normalize_legacy_runner_config(
-                legacy_runner,
-                ai_config[legacy_runner],
-            )
-
-        return {}
-
-    @staticmethod
-    def get_expire_time(pipeline_config: dict[str, typing.Any]) -> int:
-        """Get conversation expire time from configuration.
-
-        Args:
-            pipeline_config: Current configuration container
-
-        Returns:
-            Expire time in seconds (0 means no expiry)
-        """
-        ai_config = pipeline_config.get('ai', {})
-        runner_config = ai_config.get('runner', {})
-        return runner_config.get('expire-time', 0)
-
-    @staticmethod
-    def migrate_pipeline_config(pipeline_config: dict[str, typing.Any]) -> dict[str, typing.Any]:
-        """Normalize the current config container before saving.
-
-        Args:
-            pipeline_config: Original configuration
-
-        Returns:
-            Configuration with explicit ai.runner and ai.runner_config containers
-        """
-        new_config = dict(pipeline_config)
-        if 'ai' not in new_config:
-            return new_config
-
-        ai_config = dict(new_config.get('ai', {}))
-
-        runner_config = dict(ai_config.get('runner', {}))
-        runner_configs = dict(ai_config.get('runner_config', {}))
-
-        legacy_runner = runner_config.get('runner')
-        mapped_runner_id = None
-        if isinstance(legacy_runner, str):
-            mapped_runner_id = LEGACY_RUNNER_ID_MAP.get(legacy_runner)
-
-        if mapped_runner_id and not runner_config.get('id'):
-            runner_config = {
-                key: value
-                for key, value in runner_config.items()
-                if key != 'runner'
-            }
-            runner_config['id'] = mapped_runner_id
-
-        if mapped_runner_id and mapped_runner_id not in runner_configs:
-            legacy_config = ai_config.get(legacy_runner)
-            if isinstance(legacy_config, dict):
-                runner_configs[mapped_runner_id] = ConfigMigration._normalize_legacy_runner_config(
-                    legacy_runner,
-                    legacy_config,
-                )
-
-        ai_config['runner'] = runner_config
-        ai_config['runner_config'] = runner_configs
-        if mapped_runner_id and legacy_runner in ai_config:
-            ai_config.pop(legacy_runner, None)
-        new_config['ai'] = ai_config
-
-        return new_config
-
-    @staticmethod
-    def _legacy_runner_name_for_id(runner_id: str) -> str | None:
-        for legacy_runner, mapped_runner_id in LEGACY_RUNNER_ID_MAP.items():
-            if mapped_runner_id == runner_id:
-                return legacy_runner
-        return None
-
-    @staticmethod
-    def _normalize_legacy_runner_config(
-        legacy_runner: str,
-        legacy_config: dict[str, typing.Any],
-    ) -> dict[str, typing.Any]:
-        """Normalize legacy runner config blocks to current plugin schema quirks."""
-        normalized = dict(legacy_config)
-
-        if legacy_runner == 'local-agent':
-            model = normalized.get('model')
-            if isinstance(model, str):
-                normalized['model'] = {
-                    'primary': model,
-                    'fallbacks': [],
-                }
-            knowledge_base = normalized.pop('knowledge-base', None)
-            if 'knowledge-bases' not in normalized and isinstance(knowledge_base, str):
-                normalized['knowledge-bases'] = [] if knowledge_base in {'', '__none__', '__none'} else [knowledge_base]
-
-        return normalized
@@ -1,204 +0,0 @@
-"""Helpers for interpreting AgentRunner DynamicForm configuration."""
-from __future__ import annotations
-
-import typing
-
-from .descriptor import AgentRunnerDescriptor
-
-
-FORM_ITEM_TYPE_ALIASES = {
-    'select-llm-model': 'llm-model-selector',
-    'select-knowledge-bases': 'knowledge-base-multi-selector',
-}
-LLM_MODEL_SELECTOR_TYPES = {'model-fallback-selector', 'llm-model-selector'}
-KB_SELECTOR_TYPES = {'knowledge-base-multi-selector'}
-PROMPT_EDITOR_TYPES = {'prompt-editor'}
-NONE_SENTINELS = {'', '__none__', '__none'}
-
-
-def normalize_schema_item_type(item_type: typing.Any) -> typing.Any:
-    """Normalize legacy/frontend DynamicForm aliases to protocol field types."""
-    if not isinstance(item_type, str):
-        return item_type
-    return FORM_ITEM_TYPE_ALIASES.get(item_type, item_type)
-
-
-def iter_schema_items(
-    descriptor: AgentRunnerDescriptor | None,
-    field_types: set[str],
-) -> typing.Iterator[dict[str, typing.Any]]:
-    """Yield descriptor config schema items whose type is in field_types."""
-    if descriptor is None:
-        return
-    for item in descriptor.config_schema or []:
-        if not isinstance(item, dict):
-            continue
-        if normalize_schema_item_type(item.get('type')) in field_types:
-            yield item
-
-
-def uses_host_models(descriptor: AgentRunnerDescriptor | None) -> bool:
-    """Return whether LangBot should resolve model resources for this runner."""
-    return any(True for _ in iter_schema_items(descriptor, LLM_MODEL_SELECTOR_TYPES))
-
-
-def uses_host_tools(descriptor: AgentRunnerDescriptor | None) -> bool:
-    """Return whether LangBot should expose tool resources to this runner."""
-    return descriptor is not None and descriptor.supports_tool_calling()
-
-
-def uses_host_knowledge_bases(descriptor: AgentRunnerDescriptor | None) -> bool:
-    """Return whether LangBot should expose knowledge-base resources to this runner."""
-    return descriptor is not None and descriptor.supports_knowledge_retrieval()
-
-
-def supports_skill_authoring(descriptor: AgentRunnerDescriptor | None) -> bool:
-    """Return whether the runner wants Host skill-authoring tools."""
-    if descriptor is None:
-        return False
-    return descriptor.capabilities.skill_authoring
-
-
-def extract_prompt_config(
-    descriptor: AgentRunnerDescriptor | None,
-    runner_config: dict[str, typing.Any],
-    default_prompt: list[dict[str, typing.Any]],
-) -> list[dict[str, typing.Any]]:
-    """Extract the prompt-editor value selected by the runner schema."""
-    for item in iter_schema_items(descriptor, PROMPT_EDITOR_TYPES):
-        field_name = item.get('name')
-        if field_name and field_name in runner_config:
-            configured_prompt = runner_config[field_name]
-            if isinstance(configured_prompt, list):
-                return configured_prompt
-        default_value = item.get('default')
-        if isinstance(default_value, list):
-            return default_value
-    return default_prompt
-
-
-def extract_model_selection(
-    descriptor: AgentRunnerDescriptor | None,
-    runner_config: dict[str, typing.Any],
-) -> tuple[str, list[str]]:
-    """Extract primary/fallback LLM selections from schema-defined fields."""
-    primary_uuid = ''
-    fallback_uuids: list[str] = []
-
-    for item in iter_schema_items(descriptor, LLM_MODEL_SELECTOR_TYPES):
-        field_name = item.get('name')
-        if not field_name:
-            continue
-
-        value = runner_config.get(field_name, item.get('default'))
-        item_type = normalize_schema_item_type(item.get('type'))
-        if item_type == 'model-fallback-selector':
-            if isinstance(value, str):
-                primary_uuid = value
-            elif isinstance(value, dict):
-                primary_uuid = value.get('primary') or ''
-                fallbacks = value.get('fallbacks', [])
-                if isinstance(fallbacks, list):
-                    fallback_uuids = [fallback for fallback in fallbacks if isinstance(fallback, str)]
-            break
-
-        if item_type == 'llm-model-selector' and isinstance(value, str):
-            primary_uuid = value
-            break
-
-    return primary_uuid, fallback_uuids
-
-
-def extract_knowledge_base_uuids(
-    descriptor: AgentRunnerDescriptor | None,
-    runner_config: dict[str, typing.Any],
-) -> list[str]:
-    """Extract configured knowledge-base UUIDs from schema-defined fields."""
-    if not uses_host_knowledge_bases(descriptor):
-        return []
-
-    kb_uuids: list[str] = []
-    for item in iter_schema_items(descriptor, KB_SELECTOR_TYPES):
-        field_name = item.get('name')
-        if not field_name:
-            continue
-        value = runner_config.get(field_name, item.get('default', []))
-        if isinstance(value, list):
-            kb_uuids.extend(
-                kb_uuid for kb_uuid in value if isinstance(kb_uuid, str) and kb_uuid not in NONE_SENTINELS
-            )
-
-    return list(dict.fromkeys(kb_uuids))
-
-
-def iter_config_model_refs(
-    descriptor: AgentRunnerDescriptor,
-    runner_config: dict[str, typing.Any],
-) -> typing.Iterator[tuple[str, str]]:
-    """Yield model references declared by schema-defined model selector fields."""
-    for item in descriptor.config_schema or []:
-        if not isinstance(item, dict):
-            continue
-
-        field_name = item.get('name')
-        field_type = normalize_schema_item_type(item.get('type'))
-        if not field_name or field_name not in runner_config:
-            continue
-
-        value = runner_config.get(field_name)
-        if field_type == 'model-fallback-selector':
-            if isinstance(value, str) and value not in NONE_SENTINELS:
-                yield 'llm', value
-            elif isinstance(value, dict):
-                primary = value.get('primary')
-                if isinstance(primary, str) and primary not in NONE_SENTINELS:
-                    yield 'llm', primary
-                fallbacks = value.get('fallbacks', [])
-                if isinstance(fallbacks, list):
-                    for fallback_uuid in fallbacks:
-                        if isinstance(fallback_uuid, str) and fallback_uuid not in NONE_SENTINELS:
-                            yield 'llm', fallback_uuid
-        elif field_type == 'llm-model-selector':
-            if isinstance(value, str) and value not in NONE_SENTINELS:
-                yield 'llm', value
-        elif field_type == 'rerank-model-selector':
-            if isinstance(value, str) and value not in NONE_SENTINELS:
-                yield 'rerank', value
-
-
-def set_empty_llm_model_selection(
-    descriptor: AgentRunnerDescriptor,
-    runner_config: dict[str, typing.Any],
-    model_uuid: str,
-) -> bool:
-    """Set the first empty schema-defined LLM selector to model_uuid."""
-    for item in iter_schema_items(descriptor, LLM_MODEL_SELECTOR_TYPES):
-        field_name = item.get('name')
-        field_type = normalize_schema_item_type(item.get('type'))
-        if not field_name:
-            continue
-
-        value = runner_config.get(field_name, item.get('default'))
-        if field_type == 'model-fallback-selector':
-            if isinstance(value, dict):
-                primary = value.get('primary') or ''
-                if primary not in NONE_SENTINELS:
-                    return False
-                fallbacks = value.get('fallbacks', [])
-                runner_config[field_name] = {
-                    'primary': model_uuid,
-                    'fallbacks': fallbacks if isinstance(fallbacks, list) else [],
-                }
-                return True
-            if isinstance(value, str) and value not in NONE_SENTINELS:
-                return False
-            runner_config[field_name] = {'primary': model_uuid, 'fallbacks': []}
-            return True
-
-        if field_type == 'llm-model-selector':
-            if isinstance(value, str) and value not in NONE_SENTINELS:
-                return False
-            runner_config[field_name] = model_uuid
-            return True
-
-    return False
@@ -1,490 +0,0 @@
-"""Agent run context builder for provisioning AgentRunContext envelopes."""
-
-from __future__ import annotations
-
-import uuid
-import time
-import typing
-
-from ...core import app
-from .descriptor import AgentRunnerDescriptor
-from .persistent_state_store import get_persistent_state_store
-from .host_models import AgentEventEnvelope, AgentBinding
-
-
-DEFAULT_RUNNER_TIMEOUT_SECONDS = 300
-
-
-# Internal models for the agent runner context protocol.
-
-
-class AgentTrigger(typing.TypedDict):
-    """Agent trigger information."""
-
-    type: str
-    source: str
-    timestamp: int | None
-
-
-class ConversationContext(typing.TypedDict):
-    """Conversation context."""
-
-    conversation_id: str | None
-    thread_id: str | None
-    launcher_type: str | None
-    launcher_id: str | None
-    sender_id: str | None
-    bot_id: str | None
-    workspace_id: str | None
-    session_id: str | None
-
-
-class AgentInput(typing.TypedDict):
-    """Agent input."""
-
-    text: str | None
-    contents: list[dict[str, typing.Any]]
-    attachments: list[dict[str, typing.Any]]
-
-
-class AgentRunState(typing.TypedDict):
-    """Agent run state with 4 scopes."""
-
-    conversation: dict[str, typing.Any]
-    actor: dict[str, typing.Any]
-    subject: dict[str, typing.Any]
-    runner: dict[str, typing.Any]
-
-
-# Resource payload models matching langbot-plugin-sdk/resources.py.
-
-
-class ModelResource(typing.TypedDict):
-    """Model resource payload."""
-
-    model_id: str
-    model_type: str | None
-    provider: str | None
-    operations: list[str]
-
-
-class ToolResource(typing.TypedDict):
-    """Tool resource payload."""
-
-    tool_name: str
-    tool_type: str | None
-    description: str | None
-    operations: list[str]
-
-
-class KnowledgeBaseResource(typing.TypedDict):
-    """Knowledge base resource payload."""
-
-    kb_id: str
-    kb_name: str | None
-    kb_type: str | None
-    operations: list[str]
-
-
-class SkillResource(typing.TypedDict):
-    """Skill resource payload."""
-
-    skill_name: str
-    display_name: str | None
-    description: str | None
-
-
-class StorageResource(typing.TypedDict):
-    """Storage resource payload."""
-
-    plugin_storage: bool
-    workspace_storage: bool
-
-
-class AgentResources(typing.TypedDict):
-    """Agent resources payload."""
-
-    models: list[ModelResource]
-    tools: list[ToolResource]
-    knowledge_bases: list[KnowledgeBaseResource]
-    skills: list[SkillResource]
-    storage: StorageResource
-    platform_capabilities: dict[str, typing.Any]
-
-
-class AgentRuntimeContext(typing.TypedDict):
-    """Agent runtime context."""
-
-    langbot_version: str | None
-    trace_id: str | None
-    deadline_at: float | None
-    metadata: dict[str, typing.Any]
-
-
-class AgentRunContextPayload(typing.TypedDict):
-    """AgentRunContext payload passed to an agent runner.
-
-    Protocol v1 structure - matches SDK AgentRunContext.
-
-    Note: The 'config' field contains the current Agent/runner config
-    from ai.runner_config[runner_id] while the current Query entry remains
-    a temporary configuration container. It is not plugin instance config.
-    """
-
-    run_id: str
-    trigger: AgentTrigger
-    conversation: ConversationContext | None
-    event: dict[str, typing.Any]  # REQUIRED for Protocol v1
-    actor: dict[str, typing.Any] | None
-    subject: dict[str, typing.Any] | None
-    input: AgentInput
-    delivery: dict[str, typing.Any]  # REQUIRED for Protocol v1
-    resources: AgentResources
-    context: dict[str, typing.Any]  # ContextAccess - REQUIRED for Protocol v1
-    state: AgentRunState
-    runtime: AgentRuntimeContext
-    config: dict[str, typing.Any]  # Agent/runner config from ai.runner_config[runner_id]
-    adapter: dict[str, typing.Any] | None  # Entry adapter context
-    metadata: dict[str, typing.Any]  # Additional metadata
-
-
-class AgentRunContextBuilder:
-    """Builder for provisioning AgentRunContext.
-
-    Responsibilities:
-    - Generate new run_id (UUID, not query id)
-    - Set trigger type based on event source
-    - Build conversation context from event
-    - Build input from event
-    - Build state snapshot from PersistentStateStore
-    - Build runtime context with host info, trace_id, deadline
-    - Set config from current Agent/runner configuration
-
-    Query adaptation belongs to QueryEntryAdapter, not this builder.
-    """
-
-    ap: app.Application
-
-    def __init__(self, ap: app.Application):
-        self.ap = ap
-
-    @staticmethod
-    def _positive_int(value: typing.Any) -> int | None:
-        if isinstance(value, bool):
-            return None
-        if isinstance(value, int) and value > 0:
-            return value
-        if isinstance(value, str) and value.isdecimal():  # isdigit() accepts chars int() rejects, e.g. '²'
-            parsed_value = int(value)
-            if parsed_value > 0:
-                return parsed_value
-        return None
-
-    @staticmethod
-    def _is_llm_model_resource(model_resource: ModelResource) -> bool:
-        operations = model_resource.get('operations')
-        if isinstance(operations, list) and operations:
-            return bool({'invoke', 'stream'} & {str(operation) for operation in operations})
-        return model_resource.get('model_type') != 'rerank'
-
-    async def _build_model_context_window_tokens(self, resources: AgentResources) -> int | None:
-        model_mgr = getattr(self.ap, 'model_mgr', None)
-        if model_mgr is None:
-            return None
-
-        for model_resource in resources.get('models', []):
-            if not self._is_llm_model_resource(model_resource):
-                continue
-
-            model_uuid = model_resource.get('model_id')
-            if not isinstance(model_uuid, str) or not model_uuid:
-                continue
-
-            try:
-                model = await model_mgr.get_model_by_uuid(model_uuid)
-            except Exception as exc:
-                logger = getattr(self.ap, 'logger', None)
-                if logger is not None:
-                    logger.debug(f'Failed to resolve model context window for {model_uuid}: {exc}')
-                continue
-
-            model_entity = getattr(model, 'model_entity', None)
-            context_length = self._positive_int(getattr(model_entity, 'context_length', None))
-            return context_length
-
-        return None
-
-    async def build_context_from_event(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-        resources: AgentResources,
-    ) -> AgentRunContextPayload:
-        """Build AgentRunContext from event-first envelope.
-
-        This is the main entry point for Protocol v1.
-        Does NOT inline full history by default.
-
-        Args:
-            event: Event envelope
-            binding: Agent binding
-            descriptor: Runner descriptor
-            resources: Built resources
-
-        Returns:
-            AgentRunContextPayload for the runner
-        """
-        # Generate new run_id
-        run_id = str(uuid.uuid4())
-
-        # Build trigger from event
-        trigger: AgentTrigger = {
-            'type': event.event_type,
-            'source': event.source,
-            'timestamp': event.event_time or int(time.time()),
-        }
-
-        # Build conversation context from event
-        conversation: ConversationContext | None = None
-        if event.conversation_id:
-            conversation = {
-                'session_id': None,
-                'conversation_id': event.conversation_id,
-                'thread_id': event.thread_id,
-                'launcher_type': None,  # Will be filled from actor/subject if needed
-                'launcher_id': None,
-                'sender_id': event.actor.actor_id if event.actor else None,
-                'bot_id': event.bot_id,
-                'workspace_id': event.workspace_id,
-            }
-
-        # Build event context (Protocol v1 event-first)
-        event_context = {
-            'event_id': event.event_id,
-            'event_type': event.event_type,
-            'event_time': event.event_time,
-            'source': event.source,
-            'source_event_type': event.source_event_type,
-            'raw_ref': event.raw_ref.model_dump(mode='json') if event.raw_ref else None,
-            'data': event.data,
-        }
-
-        # Build actor context
-        actor_context = None
-        if event.actor:
-            actor_context = {
-                'actor_type': event.actor.actor_type,
-                'actor_id': event.actor.actor_id,
-                'actor_name': event.actor.actor_name,
-            }
-
-        # Build subject context
-        subject_context = None
-        if event.subject:
-            subject_context = {
-                'subject_type': event.subject.subject_type,
-                'subject_id': event.subject.subject_id,
-                'data': event.subject.data,
-            }
-
-        # Build input from event
-        input: AgentInput = {
-            'text': event.input.text,
-            'contents': [c.model_dump(mode='json') if hasattr(c, 'model_dump') else c for c in event.input.contents],
-            'attachments': [
-                a.model_dump(mode='json') if hasattr(a, 'model_dump') else a for a in event.input.attachments
-            ],
-        }
-
-        # Build context access (no history inlined by default for Protocol v1)
-        # Populate with actual values from stores
-        context_access = await self._build_context_access(event, descriptor, binding)
-
-        # Build state snapshot from persistent state store (event-first Protocol v1)
-        persistent_state_store = get_persistent_state_store(self.ap.persistence_mgr.get_db_engine())
-        state: AgentRunState = await persistent_state_store.build_snapshot_from_event(event, binding, descriptor)
-
-        model_context_window_tokens = await self._build_model_context_window_tokens(resources)
-
-        # Build runtime context
-        runtime: AgentRuntimeContext = {
-            'langbot_version': self.ap.ver_mgr.get_current_version(),
-            'trace_id': run_id,
-            'deadline_at': self._build_deadline_from_binding(binding),
-            'metadata': {
-                'bot_id': event.bot_id,
-                'workspace_id': event.workspace_id,
-                'streaming_supported': event.delivery.supports_streaming,
-                'model_context_window_tokens': model_context_window_tokens,
-            },
-        }
-
-        # Build delivery context
-        delivery_context = {
-            'surface': event.delivery.surface,
-            'reply_target': event.delivery.reply_target,
-            'supports_streaming': event.delivery.supports_streaming,
-            'supports_edit': event.delivery.supports_edit,
-            'supports_reaction': event.delivery.supports_reaction,
-            'max_message_size': event.delivery.max_message_size,
-            'platform_capabilities': event.delivery.platform_capabilities,
-        }
-
-        # Build adapter context (empty for event-first)
-        adapter_context = {
-            'extra': {},
-        }
-
-        # Build full context - Protocol v1 structure
-        context: AgentRunContextPayload = {
-            'run_id': run_id,
-            'trigger': trigger,
-            'conversation': conversation,
-            'event': event_context,  # REQUIRED
-            'actor': actor_context,
-            'subject': subject_context,
-            'input': input,
-            'delivery': delivery_context,  # REQUIRED
-            'resources': resources,
-            'context': context_access,  # ContextAccess - REQUIRED
-            'state': state,
-            'runtime': runtime,
-            'config': binding.runner_config,
-            'adapter': adapter_context,
-            'metadata': {},  # Additional metadata
-        }
-
-        return context
-
-    def _build_deadline_from_binding(self, binding: AgentBinding) -> float | None:
-        """Build deadline timestamp from binding timeout config.
-
-        Args:
-            binding: Agent binding with runner_config
-
-        Returns:
-            Deadline timestamp or None
-        """
-        timeout = binding.runner_config.get('timeout', DEFAULT_RUNNER_TIMEOUT_SECONDS)
-        if timeout is None:
-            return None
-
-        try:
-            timeout_seconds = float(timeout)
-        except (TypeError, ValueError):
-            return None
-
-        if timeout_seconds <= 0:
-            return None
-
-        return time.time() + timeout_seconds
-
-    async def _build_context_access(
-        self,
-        event: AgentEventEnvelope,
-        descriptor: AgentRunnerDescriptor,
-        binding: AgentBinding | None = None,
-    ) -> dict[str, typing.Any]:
-        """Build ContextAccess with actual values from stores.
-
-        Args:
-            event: Event envelope
-            descriptor: Runner descriptor
-            binding: Agent binding (required for state_policy in event-first mode)
-
-        Returns:
-            ContextAccess dict
-        """
-        conversation_id = event.conversation_id
-        permissions = descriptor.permissions
-        history_perms = set(permissions.history)
-        event_perms = set(permissions.events)
-        storage_perms = set(permissions.storage)
-
-        history_page_enabled = 'page' in history_perms and conversation_id is not None
-        history_search_enabled = 'search' in history_perms and conversation_id is not None
-        event_get_enabled = 'get' in event_perms
-        event_page_enabled = 'page' in event_perms and conversation_id is not None
-        steering_pull_enabled = (
-            bool(getattr(descriptor.capabilities, 'steering', False)) and conversation_id is not None
-        )
-        run_get_enabled = True
-        run_list_enabled = conversation_id is not None
-        run_events_page_enabled = True
-        run_cancel_enabled = True
-        run_append_result_enabled = False
-        run_finalize_enabled = False
-        run_claim_enabled = False
-        run_renew_claim_enabled = False
-        run_release_claim_enabled = False
-        runtime_register_enabled = False
-        runtime_heartbeat_enabled = False
-        runtime_list_enabled = False
-
-        # Determine state API availability based on binding state_policy.
-        state_enabled = False
-        storage_enabled = False
-        if binding is not None:
-            state_policy = binding.state_policy
-            if state_policy.enable_state and state_policy.state_scopes:
-                state_enabled = True
-
-            resource_policy = binding.resource_policy
-            storage_enabled = ('plugin' in storage_perms and resource_policy.allow_plugin_storage) or (
-                'workspace' in storage_perms and resource_policy.allow_workspace_storage
-            )
-
-        # Get latest cursor and has_history_before if conversation exists
-        latest_cursor = None
-        has_history_before = False
-
-        if conversation_id:
-            try:
-                from .transcript_store import TranscriptStore
-
-                store = TranscriptStore(self.ap.persistence_mgr.get_db_engine())
-
-                latest_cursor = await store.get_latest_cursor(conversation_id)
-                if latest_cursor:
-                    has_history_before = True
-            except Exception as e:
-                self.ap.logger.warning(f'Failed to get transcript cursor: {e}')
-
-        return {
-            'conversation_id': conversation_id,
-            'thread_id': event.thread_id,
-            'latest_cursor': latest_cursor,
-            'event_seq': None,  # Will be populated when EventLog is written
-            'transcript_seq': int(latest_cursor) if latest_cursor else None,
-            'has_history_before': has_history_before,
-            'inline_policy': {
-                'mode': 'current_event',
-                'delivered_count': 0,
-                'source_total_count': None,
-                'messages_complete': False,
-                'reason': 'current_event_only',
-            },
-            'available_apis': {
-                'prompt_get': False,
-                'history_page': history_page_enabled,
-                'history_search': history_search_enabled,
-                'event_get': event_get_enabled,
-                'event_page': event_page_enabled,
-                'state': state_enabled,
-                'storage': storage_enabled,
-                'steering_pull': steering_pull_enabled,
-                'run_get': run_get_enabled,
-                'run_list': run_list_enabled,
-                'run_events_page': run_events_page_enabled,
-                'run_cancel': run_cancel_enabled,
-                'run_append_result': run_append_result_enabled,
-                'run_finalize': run_finalize_enabled,
-                'run_claim': run_claim_enabled,
-                'run_renew_claim': run_renew_claim_enabled,
-                'run_release_claim': run_release_claim_enabled,
-                'runtime_register': runtime_register_enabled,
-                'runtime_heartbeat': runtime_heartbeat_enabled,
-                'runtime_list': runtime_list_enabled,
-            },
-        }
@@ -1,72 +0,0 @@
-"""Default AgentRunner binding configuration helpers."""
-
-from __future__ import annotations
-
-import sqlalchemy
-
-from ...core import app
-from ...entity.persistence import pipeline as persistence_pipeline
-from . import config_schema
-from .config_migration import ConfigMigration
-
-
-class AgentRunnerDefaultConfigService:
-    """Apply AgentRunner schema-defined defaults to host binding config."""
-
-    ap: app.Application
-
-    def __init__(self, ap: app.Application) -> None:
-        self.ap = ap
-
-    async def _get_runner_descriptor(self, runner_id: str):
-        registry = getattr(self.ap, 'agent_runner_registry', None)
-        if registry is None:
-            return None
-        try:
-            return await registry.get(runner_id, bound_plugins=None)
-        except Exception as e:
-            logger = getattr(self.ap, 'logger', None)
-            if logger:
-                logger.warning(f'Failed to load AgentRunner descriptor while setting default model: {e}')
-            return None
-
-    async def auto_set_default_pipeline_llm_model(self, model_uuid: str) -> bool:
-        """Set model_uuid into the default pipeline runner config when the selector is empty."""
-        result = await self.ap.persistence_mgr.execute_async(
-            sqlalchemy.select(persistence_pipeline.LegacyPipeline).where(
-                persistence_pipeline.LegacyPipeline.is_default == True
-            )
-        )
-        pipeline = result.first()
-        if pipeline is None:
-            return False
-
-        return await self.set_pipeline_llm_model_if_empty(pipeline, model_uuid)
-
-    async def set_pipeline_llm_model_if_empty(
-        self,
-        pipeline: persistence_pipeline.LegacyPipeline,
-        model_uuid: str,
-    ) -> bool:
-        """Set model_uuid into a pipeline's schema-defined LLM selector if it is empty."""
-        pipeline_config = pipeline.config
-        if not isinstance(pipeline_config, dict):
-            return False
-
-        runner_id = ConfigMigration.resolve_runner_id(pipeline_config)
-        if not runner_id:
-            return False
-
-        descriptor = await self._get_runner_descriptor(runner_id)
-        if descriptor is None:
-            return False
-
-        ai_config = pipeline_config.setdefault('ai', {})
-        runner_configs = ai_config.setdefault('runner_config', {})
-        runner_config = runner_configs.setdefault(runner_id, {})
-
-        if not config_schema.set_empty_llm_model_selection(descriptor, runner_config, model_uuid):
-            return False
-
-        await self.ap.pipeline_service.update_pipeline(pipeline.uuid, {'config': pipeline_config})
-        return True
@@ -1,82 +0,0 @@
-"""Agent runner descriptor."""
-from __future__ import annotations
-
-import typing
-import pydantic
-
-from langbot_plugin.api.entities.builtin.agent_runner.manifest import (
-    AgentRunnerCapabilities,
-    AgentRunnerPermissions,
-)
-
-
-class AgentRunnerDescriptor(pydantic.BaseModel):
-    """Descriptor for an agent runner.
-
-    Represents the discovered metadata for a runner, including
-    its identity, capabilities, permissions, and configuration schema.
-    """
-
-    id: str
-    """Unique runner ID: plugin:author/plugin_name/runner_name"""
-
-    source: typing.Literal['plugin']
-    """Runner source type"""
-
-    label: dict[str, str]
-    """Display labels keyed by locale (e.g., en_US, zh_Hans)"""
-
-    description: dict[str, str] | None = None
-    """Optional description keyed by locale"""
-
-    plugin_author: str
-    """Plugin author from manifest"""
-
-    plugin_name: str
-    """Plugin name from manifest"""
-
-    runner_name: str
-    """AgentRunner component name from manifest"""
-
-    plugin_version: str | None = None
-    """Optional plugin version"""
-
-    config_schema: list[dict[str, typing.Any]] = pydantic.Field(default_factory=list)
-    """Configuration schema using DynamicForm format"""
-
-    capabilities: AgentRunnerCapabilities = pydantic.Field(
-        default_factory=AgentRunnerCapabilities
-    )
-    """Runner capabilities: streaming, tool_calling, knowledge_retrieval, etc."""
-
-    permissions: AgentRunnerPermissions = pydantic.Field(
-        default_factory=AgentRunnerPermissions
-    )
-    """Requested LangBot resource permissions."""
-
-    raw_manifest: dict[str, typing.Any] = pydantic.Field(default_factory=dict)
-    """Original manifest for reference"""
-
-    model_config = pydantic.ConfigDict(
-        extra='allow',
-    )
-
-    def get_plugin_id(self) -> str:
-        """Return plugin identifier as author/name."""
-        return f'{self.plugin_author}/{self.plugin_name}'
-
-    def supports_streaming(self) -> bool:
-        """Check if runner supports streaming output."""
-        return self.capabilities.streaming
-
-    def supports_tool_calling(self) -> bool:
-        """Check if runner supports tool calling."""
-        return self.capabilities.tool_calling
-
-    def supports_knowledge_retrieval(self) -> bool:
-        """Check if runner supports knowledge retrieval."""
-        return self.capabilities.knowledge_retrieval
-
-    def supports_steering(self) -> bool:
-        """Check if runner supports run steering/follow-up input."""
-        return bool(getattr(self.capabilities, 'steering', False))
@@ -1,37 +0,0 @@
-"""Agent runner errors."""
-from __future__ import annotations
-
-
-class AgentRunnerError(Exception):
-    """Base error for agent runner operations."""
-    pass
-
-
-class RunnerNotFoundError(AgentRunnerError):
-    """Runner not found in registry."""
-    def __init__(self, runner_id: str):
-        self.runner_id = runner_id
-        super().__init__(f'Agent runner not found: {runner_id}')
-
-
-class RunnerNotAuthorizedError(AgentRunnerError):
-    """Runner not authorized for this binding."""
-    def __init__(self, runner_id: str, bound_plugins: list[str] | None):
-        self.runner_id = runner_id
-        self.bound_plugins = bound_plugins
-        super().__init__(f'Agent runner {runner_id} not authorized for bound_plugins={bound_plugins}')
-
-
-class RunnerProtocolError(AgentRunnerError):
-    """Runner protocol version mismatch or invalid manifest."""
-    def __init__(self, runner_id: str, message: str):
-        self.runner_id = runner_id
-        super().__init__(f'Agent runner protocol error for {runner_id}: {message}')
-
-
-class RunnerExecutionError(AgentRunnerError):
-    """Runner execution failed."""
-    def __init__(self, runner_id: str, message: str, retryable: bool = False):
-        self.runner_id = runner_id
-        self.retryable = retryable
-        super().__init__(f'Agent runner {runner_id} execution failed: {message}')
@@ -1,315 +0,0 @@
-"""EventLog store for writing and querying event records."""
-from __future__ import annotations
-
-import json
-import datetime
-import typing
-import uuid
-
-import sqlalchemy
-from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession
-from sqlalchemy.orm import sessionmaker
-
-from ...entity.persistence.event_log import EventLog
-
-
-UTC = datetime.timezone.utc
-
-
-def _utc_now() -> datetime.datetime:
-    return datetime.datetime.now(UTC)
-
-
-def _datetime_to_epoch(value: datetime.datetime | None) -> int | None:
-    if value is None:
-        return None
-    if value.tzinfo is None:
-        value = value.replace(tzinfo=UTC)
-    else:
-        value = value.astimezone(UTC)
-    return int(value.timestamp())
-
-
-class EventLogStore:
-    """Store for EventLog records.
-
-    Handles writing events to the event log and querying them.
-    All methods are async and use the provided database engine.
-    """
-
-    engine: AsyncEngine
-
-    # Hard limits
-    MAX_INPUT_SUMMARY_LENGTH = 1000
-
-    def __init__(self, engine: AsyncEngine):
-        self.engine = engine
-        self._session_factory = sessionmaker(
-            engine, class_=AsyncSession, expire_on_commit=False
-        )
-
-    async def append_event(
-        self,
-        event_id: str | None,
-        event_type: str,
-        source: str,
-        bot_id: str | None = None,
-        workspace_id: str | None = None,
-        conversation_id: str | None = None,
-        thread_id: str | None = None,
-        actor_type: str | None = None,
-        actor_id: str | None = None,
-        actor_name: str | None = None,
-        subject_type: str | None = None,
-        subject_id: str | None = None,
-        input_summary: str | None = None,
-        input_json: dict[str, typing.Any] | None = None,
-        raw_ref: str | None = None,
-        run_id: str | None = None,
-        runner_id: str | None = None,
-        event_time: datetime.datetime | None = None,
-        metadata: dict[str, typing.Any] | None = None,
-    ) -> str:
-        """Append an event to the event log.
-
-        Args:
-            event_id: Unique event ID (generated if None)
-            event_type: Event type
-            source: Event source
-            bot_id: Bot UUID
-            workspace_id: Workspace ID
-            conversation_id: Conversation ID
-            thread_id: Thread ID
-            actor_type: Actor type
-            actor_id: Actor ID
-            actor_name: Actor display name
-            subject_type: Subject type
-            subject_id: Subject ID
-            input_summary: Brief input summary
-            input_json: Full input JSON
-            raw_ref: Reference to raw event payload
-            run_id: Run ID processing this event
-            runner_id: Runner ID processing this event
-            event_time: When the event occurred
-            metadata: Additional metadata
-
-        Returns:
-            The event_id
-        """
-        if event_id is None:
-            event_id = str(uuid.uuid4())
-
-        # Truncate input summary if too long
-        if input_summary and len(input_summary) > self.MAX_INPUT_SUMMARY_LENGTH:
-            input_summary = input_summary[: self.MAX_INPUT_SUMMARY_LENGTH - 3] + '...'
-
-        async with self._session_factory() as session:
-            event = EventLog(
-                event_id=event_id,
-                event_type=event_type,
-                event_time=event_time,
-                source=source,
-                bot_id=bot_id,
-                workspace_id=workspace_id,
-                conversation_id=conversation_id,
-                thread_id=thread_id,
-                actor_type=actor_type,
-                actor_id=actor_id,
-                actor_name=actor_name,
-                subject_type=subject_type,
-                subject_id=subject_id,
-                input_summary=input_summary,
-                input_json=json.dumps(input_json) if input_json else None,
-                raw_ref=raw_ref,
-                run_id=run_id,
-                runner_id=runner_id,
-                metadata_json=json.dumps(metadata) if metadata else None,
-                created_at=_utc_now(),
-            )
-            session.add(event)
-            await session.commit()
-
-        return event_id
-
-    async def get_event(
-        self,
-        event_id: str,
-    ) -> dict[str, typing.Any] | None:
-        """Get a single event by ID.
-
-        Args:
-            event_id: Event ID
-
-        Returns:
-            Event record as dict, or None if not found
-        """
-        async with self._session_factory() as session:
-            result = await session.execute(
-                sqlalchemy.select(EventLog).where(EventLog.event_id == event_id)
-            )
-            row = result.scalars().first()
-            if row is None:
-                return None
-            return self._row_to_dict(row)
-
-    async def page_events(
-        self,
-        conversation_id: str | None = None,
-        event_types: list[str] | None = None,
-        before_seq: int | None = None,
-        limit: int = 50,
-        bot_id: str | None = None,
-        workspace_id: str | None = None,
-        thread_id: str | None = None,
-        strict_thread: bool = False,
-    ) -> tuple[list[dict[str, typing.Any]], int | None, bool]:
-        """Page through event records.
-
-        Args:
-            conversation_id: Filter by conversation ID
-            event_types: Filter by event types
-            before_seq: Get events before this sequence number
-            limit: Maximum items to return (capped at 100)
-            bot_id: Optional bot scope filter
-            workspace_id: Optional workspace scope filter
-            thread_id: Optional thread scope filter
-            strict_thread: When true, require thread_id equality including NULL
-
-        Returns:
-            Tuple of (items, next_seq, has_more)
-        """
-        limit = min(limit, 100)  # Hard cap
-
-        async with self._session_factory() as session:
-            query = sqlalchemy.select(EventLog)
-
-            if conversation_id is not None:
-                query = query.where(EventLog.conversation_id == conversation_id)
-            query = self._apply_scope_filters(query, bot_id, workspace_id, thread_id, strict_thread)
-
-            if event_types:
-                query = query.where(EventLog.event_type.in_(event_types))
-
-            if before_seq is not None:
-                query = query.where(EventLog.id < before_seq)
-
-            query = query.order_by(EventLog.id.desc()).limit(limit + 1)
-
-            result = await session.execute(query)
-            rows = result.scalars().all()
-
-            items = [self._row_to_dict(row) for row in rows[:limit]]
-            has_more = len(rows) > limit
-            next_seq = items[-1]['id'] if items and has_more else None
-
-            return items, next_seq, has_more
-
-    async def get_latest_cursor(
-        self,
-        conversation_id: str,
-    ) -> str | None:
-        """Get the latest cursor for a conversation.
-
-        Args:
-            conversation_id: Conversation ID
-
-        Returns:
-            Cursor string (seq number), or None if no events
-        """
-        async with self._session_factory() as session:
-            result = await session.execute(
-                sqlalchemy.select(EventLog.id)
-                .where(EventLog.conversation_id == conversation_id)
-                .order_by(EventLog.id.desc())
-                .limit(1)
-            )
-            row = result.scalars().first()
-            if row is None:
-                return None
-            return str(row)
-
-    async def has_events_before(
-        self,
-        conversation_id: str,
-        seq: int,
-        bot_id: str | None = None,
-        workspace_id: str | None = None,
-        thread_id: str | None = None,
-        strict_thread: bool = False,
-    ) -> bool:
-        """Check if there are events before a sequence number.
-
-        Args:
-            conversation_id: Conversation ID
-            seq: Sequence number
-
-        Returns:
-            True if there are events before
-        """
-        async with self._session_factory() as session:
-            query = (
-                sqlalchemy.select(sqlalchemy.func.count())
-                .select_from(EventLog)
-                .where(EventLog.conversation_id == conversation_id, EventLog.id < seq)
-            )
-            query = self._apply_scope_filters(query, bot_id, workspace_id, thread_id, strict_thread)
-            result = await session.execute(query)
-            count = result.scalar()
-            return count > 0
-
-    def _apply_scope_filters(
-        self,
-        query: typing.Any,
-        bot_id: str | None,
-        workspace_id: str | None,
-        thread_id: str | None,
-        strict_thread: bool,
-    ) -> typing.Any:
-        if bot_id is not None:
-            query = query.where(EventLog.bot_id == bot_id)
-        if workspace_id is not None:
-            query = query.where(EventLog.workspace_id == workspace_id)
-        if strict_thread:
-            if thread_id is None:
-                query = query.where(EventLog.thread_id.is_(None))
-            else:
-                query = query.where(EventLog.thread_id == thread_id)
-        return query
-
-    async def cleanup_events_older_than(
-        self,
-        before: datetime.datetime,
-    ) -> int:
-        """Delete EventLog rows created before the supplied timestamp."""
-        async with self._session_factory() as session:
-            result = await session.execute(
-                sqlalchemy.delete(EventLog).where(EventLog.created_at < before)
-            )
-            await session.commit()
-            return result.rowcount or 0
-
-    def _row_to_dict(self, row: EventLog) -> dict[str, typing.Any]:
-        """Convert an EventLog row to dict."""
-        return {
-            'id': row.id,
-            'event_id': row.event_id,
-            'event_type': row.event_type,
-            'event_time': _datetime_to_epoch(row.event_time),
-            'source': row.source,
-            'bot_id': row.bot_id,
-            'workspace_id': row.workspace_id,
-            'conversation_id': row.conversation_id,
-            'thread_id': row.thread_id,
-            'actor_type': row.actor_type,
-            'actor_id': row.actor_id,
-            'actor_name': row.actor_name,
-            'subject_type': row.subject_type,
-            'subject_id': row.subject_id,
-            'input_summary': row.input_summary,
-            'input_json': json.loads(row.input_json) if row.input_json else None,
-            'raw_ref': row.raw_ref,
-            'run_id': row.run_id,
-            'runner_id': row.runner_id,
-            'created_at': _datetime_to_epoch(row.created_at),
-            'metadata': json.loads(row.metadata_json) if row.metadata_json else {},
-        }
@@ -1,25 +0,0 @@
-"""Canonical AgentRunner event names reserved for future EBA integration."""
-from __future__ import annotations
-
-
-MESSAGE_RECEIVED = 'message.received'
-"""A normal message entered the current Pipeline."""
-
-MESSAGE_RECALLED = 'message.recalled'
-"""A platform message was recalled or deleted."""
-
-GROUP_MEMBER_JOINED = 'group.member_joined'
-"""A new member joined a group/channel conversation."""
-
-FRIEND_REQUEST_RECEIVED = 'friend.request_received'
-"""A new friend/contact request was received."""
-
-
-RESERVED_EVENT_TYPES = frozenset(
-    {
-        MESSAGE_RECEIVED,
-        MESSAGE_RECALLED,
-        GROUP_MEMBER_JOINED,
-        FRIEND_REQUEST_RECEIVED,
-    }
-)
@@ -1,210 +0,0 @@
-"""Agent event envelope and binding models for LangBot Host.
-
-These are Host-internal models, not exposed to the SDK.
-"""
-from __future__ import annotations
-
-import typing
-import pydantic
-
-from langbot_plugin.api.entities.builtin.agent_runner.event import (
-    ActorContext,
-    SubjectContext,
-    RawEventRef,
-)
-from langbot_plugin.api.entities.builtin.agent_runner.input import AgentInput
-from langbot_plugin.api.entities.builtin.agent_runner.delivery import DeliveryContext
-
-
-class AgentEventEnvelope(pydantic.BaseModel):
-    """Event envelope for LangBot Host event gateway.
-
-    This is the unified input model that replaces the Query-first approach.
-    IM / WebUI / API / EventRouter all produce this envelope.
-    """
-
-    event_id: str
-    """Unique event identifier."""
-
-    event_type: str
-    """Event type (message.received, message.recalled, etc.)."""
-
-    event_time: int | None = None
-    """Event timestamp (epoch seconds)."""
-
-    source: str
-    """Event source (platform, webui, api, scheduler, system)."""
-
-    source_event_type: str | None = None
-    """Original source event type, when available."""
-
-    bot_id: str | None = None
-    """Bot UUID handling this event."""
-
-    workspace_id: str | None = None
-    """Workspace ID (for multi-tenant)."""
-
-    conversation_id: str | None = None
-    """Conversation ID."""
-
-    thread_id: str | None = None
-    """Thread ID (for platforms supporting threads)."""
-
-    actor: ActorContext | None = None
-    """Actor (who triggered the event)."""
-
-    subject: SubjectContext | None = None
-    """Subject (what the event is about)."""
-
-    input: AgentInput
-    """Event input."""
-
-    delivery: DeliveryContext
-    """Delivery context."""
-
-    raw_ref: RawEventRef | None = None
-    """Reference to raw event payload."""
-
-    data: dict[str, typing.Any] = pydantic.Field(default_factory=dict)
-    """Small structured event payload. Large payloads should be referenced via raw_ref."""
-
-
-# Binding scope types
-class BindingScope(pydantic.BaseModel):
-    """Scope for agent binding."""
-
-    scope_type: typing.Literal["agent", "bot", "workspace", "global"] = "agent"
-    """Scope type."""
-
-    scope_id: str | None = None
-    """Scope identifier (agent_id, bot_uuid, etc.)."""
-
-
-class ResourcePolicy(pydantic.BaseModel):
-    """Resource policy for agent binding.
-
-    Controls what resources the runner can access.
-    """
-
-    allowed_model_uuids: list[str] | None = None
-    """Additional model UUID grants. None means no additional model grants."""
-
-    allowed_tool_names: list[str] | None = None
-    """Additional tool name grants. None means no additional tool grants."""
-
-    allowed_kb_uuids: list[str] | None = None
-    """Additional knowledge base UUID grants. None means no additional KB grants."""
-
-    allowed_skill_names: list[str] | None = None
-    """Allowed skill names. None means all currently visible skills are allowed."""
-
-    allow_plugin_storage: bool = True
-    """Whether plugin storage is allowed."""
-
-    allow_workspace_storage: bool = False
-    """Whether workspace storage is allowed."""
-
-
-class StatePolicy(pydantic.BaseModel):
-    """State policy for agent binding.
-
-    Controls state management behavior.
-    """
-
-    enable_state: bool = True
-    """Whether host-owned state is enabled."""
-
-    state_scopes: list[typing.Literal["conversation", "actor", "subject", "runner"]] = (
-        pydantic.Field(default_factory=lambda: ["conversation", "actor"])
-    )
-    """Enabled state scopes."""
-
-
-class DeliveryPolicy(pydantic.BaseModel):
-    """Delivery policy for agent binding.
-
-    Controls how results are delivered.
-    """
-
-    enable_streaming: bool = True
-    """Whether streaming output is enabled."""
-
-    enable_reply: bool = True
-    """Whether reply is enabled."""
-
-    max_message_size: int | None = None
-    """Maximum message size."""
-
-
-class AgentConfig(pydantic.BaseModel):
-    """Host-side Agent configuration.
-
-    The product-level Agent is the intended replacement for Pipeline-owned
-    agent config. Current Pipeline entry paths can project their config into
-    this model during migration.
-    """
-
-    agent_id: str | None = None
-    """Host-side Agent/config identifier."""
-
-    runner_id: str
-    """Runner ID to invoke."""
-
-    runner_config: dict[str, typing.Any] = pydantic.Field(default_factory=dict)
-    """Agent/runner binding configuration."""
-
-    resource_policy: ResourcePolicy = pydantic.Field(default_factory=ResourcePolicy)
-    """Resource policy for this Agent."""
-
-    state_policy: StatePolicy = pydantic.Field(default_factory=StatePolicy)
-    """State policy for this Agent."""
-
-    delivery_policy: DeliveryPolicy = pydantic.Field(default_factory=DeliveryPolicy)
-    """Delivery policy for this Agent."""
-
-    event_types: list[str] = pydantic.Field(default_factory=lambda: ["message.received"])
-    """Event types this Agent handles."""
-
-    enabled: bool = True
-    """Whether this Agent can be selected by a binding resolver."""
-
-    metadata: dict[str, typing.Any] = pydantic.Field(default_factory=dict)
-    """Non-protocol diagnostic metadata, such as legacy config source."""
-
-
-class AgentBinding(pydantic.BaseModel):
-    """Binding configuration for mapping events to runners.
-
-    This is a Host-internal model for event-to-runner binding.
-    It replaces the role of the old Pipeline runner config.
-    """
-
-    binding_id: str
-    """Unique binding identifier."""
-
-    scope: BindingScope = pydantic.Field(default_factory=BindingScope)
-    """Binding scope."""
-
-    event_types: list[str] = pydantic.Field(default_factory=lambda: ["message.received"])
-    """Event types this binding handles."""
-
-    runner_id: str
-    """Runner ID to invoke."""
-
-    runner_config: dict[str, typing.Any] = pydantic.Field(default_factory=dict)
-    """Current Agent/runner configuration."""
-
-    resource_policy: ResourcePolicy = pydantic.Field(default_factory=ResourcePolicy)
-    """Resource policy."""
-
-    state_policy: StatePolicy = pydantic.Field(default_factory=StatePolicy)
-    """State policy."""
-
-    delivery_policy: DeliveryPolicy = pydantic.Field(default_factory=DeliveryPolicy)
-    """Delivery policy."""
-
-    enabled: bool = True
-    """Whether binding is enabled."""
-
-    agent_id: str | None = None
-    """Host-side Agent/config identifier for this binding."""
@@ -1,91 +0,0 @@
-"""Agent runner ID parsing and formatting."""
-from __future__ import annotations
-
-import dataclasses
-
-
-@dataclasses.dataclass(frozen=True)
-class RunnerIdParts:
-    """Parsed runner ID components."""
-    source: str  # 'plugin' (future: 'builtin')
-    plugin_author: str
-    plugin_name: str
-    runner_name: str
-
-    def to_plugin_id(self) -> str:
-        """Return plugin identifier as author/name."""
-        return f'{self.plugin_author}/{self.plugin_name}'
-
-
-def parse_runner_id(runner_id: str) -> RunnerIdParts:
-    """Parse runner ID string into components.
-
-    Args:
-        runner_id: Runner ID in format 'plugin:author/plugin_name/runner_name'
-
-    Returns:
-        RunnerIdParts with parsed components
-
-    Raises:
-        ValueError: If runner_id format is invalid
-    """
-    if runner_id.startswith('plugin:'):
-        parts = runner_id.removeprefix('plugin:').split('/')
-        if len(parts) != 3:
-            raise ValueError(
-                f'Invalid plugin runner ID format: {runner_id}. '
-                f'Expected: plugin:author/plugin_name/runner_name'
-            )
-        plugin_author, plugin_name, runner_name = parts
-        if not plugin_author or not plugin_name or not runner_name:
-            raise ValueError(
-                f'Invalid plugin runner ID: {runner_id}. '
-                f'author, plugin_name, and runner_name must be non-empty'
-            )
-        return RunnerIdParts(
-            source='plugin',
-            plugin_author=plugin_author,
-            plugin_name=plugin_name,
-            runner_name=runner_name,
-        )
-    else:
-        # Only plugin runner IDs are valid at the protocol boundary.
-        raise ValueError(
-            f'Invalid runner ID format: {runner_id}. '
-            f'Expected: plugin:author/plugin_name/runner_name'
-        )
-
-
-def format_runner_id(
-    source: str,
-    plugin_author: str,
-    plugin_name: str,
-    runner_name: str,
-) -> str:
-    """Format runner ID from components.
-
-    Args:
-        source: Runner source ('plugin')
-        plugin_author: Plugin author
-        plugin_name: Plugin name
-        runner_name: Runner component name
-
-    Returns:
-        Runner ID string
-    """
-    if source == 'plugin':
-        return f'plugin:{plugin_author}/{plugin_name}/{runner_name}'
-    else:
-        raise ValueError(f'Invalid runner source: {source}')
-
-
-def is_plugin_runner_id(runner_id: str) -> bool:
-    """Check if runner ID is a plugin runner.
-
-    Args:
-        runner_id: Runner ID string
-
-    Returns:
-        True if runner ID starts with 'plugin:'
-    """
-    return runner_id.startswith('plugin:')
@@ -1,131 +0,0 @@
-"""Plugin-runtime invocation for AgentRunner executions."""
-
-from __future__ import annotations
-
-import asyncio
-import time
-import traceback
-import typing
-
-from langbot_plugin.entities.io.errors import ActionCallTimeoutError
-
-from ...core import app
-from .context_builder import AgentRunContextPayload
-from .descriptor import AgentRunnerDescriptor
-from .errors import RunnerExecutionError
-
-
-class AgentRunnerInvoker:
-    """Invoke an AgentRunner through the plugin runtime.
-
-    This keeps runtime transport, deadline enforcement, and transport error
-    mapping out of the orchestration state machine.
-    """
-
-    ap: app.Application
-
-    def __init__(self, ap: app.Application):
-        self.ap = ap
-
-    async def invoke(
-        self,
-        descriptor: AgentRunnerDescriptor,
-        context: AgentRunContextPayload,
-    ) -> typing.AsyncGenerator[dict[str, typing.Any], None]:
-        """Invoke the runner and yield raw result dictionaries."""
-        if not self.ap.plugin_connector.is_enable_plugin:
-            raise RunnerExecutionError(
-                descriptor.id,
-                'Plugin system is disabled',
-                retryable=False,
-            )
-
-        try:
-            gen = self.ap.plugin_connector.run_agent(
-                plugin_author=descriptor.plugin_author,
-                plugin_name=descriptor.plugin_name,
-                runner_name=descriptor.runner_name,
-                context=context,
-            )
-
-            while True:
-                try:
-                    result_dict = await self._next_with_deadline(gen, descriptor, context)
-                except StopAsyncIteration:
-                    break
-                yield result_dict
-
-        except asyncio.TimeoutError as e:
-            raise RunnerExecutionError(
-                descriptor.id,
-                'Runner timed out (code: runner.timeout)',
-                retryable=True,
-            ) from e
-        except ActionCallTimeoutError as e:
-            raise RunnerExecutionError(
-                descriptor.id,
-                f'{e} (code: runner.timeout)',
-                retryable=True,
-            ) from e
-        except RunnerExecutionError:
-            raise
-        except Exception as e:
-            self.ap.logger.error(
-                f'Runner {descriptor.id} unexpected error: {traceback.format_exc()}'
-            )
-            raise RunnerExecutionError(
-                descriptor.id,
-                str(e),
-                retryable=False,
-            ) from e
-
-    async def _next_with_deadline(
-        self,
-        gen: typing.AsyncGenerator[dict[str, typing.Any], None],
-        descriptor: AgentRunnerDescriptor,
-        context: AgentRunContextPayload,
-    ) -> dict[str, typing.Any]:
-        """Read the next runner result while enforcing the run deadline."""
-        remaining = self._remaining_deadline_seconds(context)
-        if remaining is not None and remaining <= 0:
-            await self._close_generator(gen, descriptor)
-            raise asyncio.TimeoutError
-
-        try:
-            if remaining is None:
-                return await anext(gen)
-            return await asyncio.wait_for(anext(gen), timeout=remaining)
-        except StopAsyncIteration:
-            if self._is_deadline_exhausted(context):
-                raise asyncio.TimeoutError
-            raise
-        except asyncio.TimeoutError:
-            await self._close_generator(gen, descriptor)
-            raise
-
-    def _remaining_deadline_seconds(
-        self,
-        context: AgentRunContextPayload,
-    ) -> float | None:
-        runtime = context.get('runtime') or {}
-        deadline_at = runtime.get('deadline_at')
-        if deadline_at is None:
-            return None
-        try:
-            return float(deadline_at) - time.time()
-        except (TypeError, ValueError):
-            return None
-
-    def _is_deadline_exhausted(self, context: AgentRunContextPayload) -> bool:
-        remaining = self._remaining_deadline_seconds(context)
-        return remaining is not None and remaining <= 0
-
-    async def _close_generator(
-        self,
-        gen: typing.AsyncGenerator[dict[str, typing.Any], None],
-        descriptor: AgentRunnerDescriptor,
-    ) -> None:
-        try:
-            await gen.aclose()
-        except Exception as e:
-            self.ap.logger.warning(f'Failed to close timed-out runner {descriptor.id}: {e}')
@@ -1,536 +0,0 @@
-"""Agent run orchestrator for coordinating runner execution."""
-
-from __future__ import annotations
-
-import time
-import typing
-
-from langbot_plugin.api.entities.builtin.provider import message as provider_message
-from langbot_plugin.api.entities.builtin.pipeline import query as pipeline_query
-
-from ...core import app
-from .binding_resolver import AgentBindingResolver
-from .context_builder import AgentRunContextBuilder, AgentRunContextPayload
-from .descriptor import AgentRunnerDescriptor
-from .host_models import AgentBinding, AgentEventEnvelope
-from .invoker import AgentRunnerInvoker
-from .query_bridge import QueryRunBridge
-from .registry import AgentRunnerRegistry
-from .resource_builder import AgentResourceBuilder
-from .result_normalizer import AgentResultNormalizer
-from .run_journal import AgentRunJournal
-from .session_registry import AgentRunSessionRegistry, get_session_registry
-from .state_scope import build_state_context
-from ...provider.tools.loaders import skill as skill_loader
-
-
-ACTIVATED_SKILL_NAMES_STATE_KEY = 'host.activated_skills'
-
-
-class AgentRunOrchestrator:
-    """Coordinate one AgentRunner execution.
-
-    The orchestrator keeps the run state machine readable and delegates
-    transport, Query bridging, and persistence side effects to narrower
-    collaborators.
-    """
-
-    ap: app.Application
-    registry: AgentRunnerRegistry
-    context_builder: AgentRunContextBuilder
-    resource_builder: AgentResourceBuilder
-    result_normalizer: AgentResultNormalizer
-    binding_resolver: AgentBindingResolver
-    query_bridge: QueryRunBridge
-    invoker: AgentRunnerInvoker
-    journal: AgentRunJournal
-    _session_registry: AgentRunSessionRegistry
-
-    def __init__(
-        self,
-        ap: app.Application,
-        registry: AgentRunnerRegistry,
-    ):
-        self.ap = ap
-        self.registry = registry
-        self.context_builder = AgentRunContextBuilder(ap)
-        self.resource_builder = AgentResourceBuilder(ap)
-        self.result_normalizer = AgentResultNormalizer(ap)
-        self.binding_resolver = AgentBindingResolver()
-        self.query_bridge = QueryRunBridge(self.binding_resolver)
-        self.invoker = AgentRunnerInvoker(ap)
-        self.journal = AgentRunJournal(ap)
-        self._session_registry = get_session_registry()
-
-    async def run(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        bound_plugins: list[str] | None = None,
-        adapter_context: dict[str, typing.Any] | None = None,
-    ) -> typing.AsyncGenerator[provider_message.Message | provider_message.MessageChunk, None]:
-        """Run an AgentRunner from an event-first envelope."""
-        runner_id = binding.runner_id
-        descriptor = await self.registry.get(runner_id, bound_plugins)
-
-        resources = await self.resource_builder.build_resources_from_binding(
-            event=event,
-            binding=binding,
-            descriptor=descriptor,
-        )
-
-        context = await self.context_builder.build_context_from_event(
-            event=event,
-            binding=binding,
-            descriptor=descriptor,
-            resources=resources,
-        )
-
-        session_query_id = None
-        if adapter_context:
-            query = adapter_context.get('_query')
-            if query is not None:
-                skill_loader.restore_activated_skills_from_state(
-                    self.ap,
-                    query,
-                    context.get('state', {}),
-                )
-            session_query_id = adapter_context.get('query_id')
-            if query is not None or session_query_id is not None:
-                context['context']['available_apis']['prompt_get'] = True
-            if 'params' in adapter_context:
-                context['adapter']['extra']['params'] = adapter_context['params']
-
-        state_context = build_state_context(event, binding, descriptor)
-        run_id = context['run_id']
-        available_apis = context.get('context', {}).get('available_apis')
-        run_authorization = {
-            'runner_id': descriptor.id,
-            'binding_id': binding.binding_id,
-            'plugin_identity': descriptor.get_plugin_id(),
-            'resources': resources,
-            'available_apis': available_apis,
-            'conversation_id': event.conversation_id,
-            'bot_id': event.bot_id,
-            'workspace_id': event.workspace_id,
-            'thread_id': event.thread_id,
-            'state_policy': {
-                'enable_state': binding.state_policy.enable_state,
-                'state_scopes': list(binding.state_policy.state_scopes),
-            },
-            'state_context': state_context,
-        }
-
-        seen_sequences: set[int] = set()
-        last_sequence = 0
-        assistant_transcript_written = False
-        terminal_status: str | None = None
-        terminal_reason: str | None = None
-        terminal_usage: dict[str, typing.Any] | None = None
-
-        try:
-            await self.journal.create_run(
-                event=event,
-                binding=binding,
-                descriptor=descriptor,
-                context=context,
-                authorization=run_authorization,
-            )
-            await self._session_registry.register(
-                run_id=run_id,
-                runner_id=descriptor.id,
-                query_id=session_query_id,
-                plugin_identity=descriptor.get_plugin_id(),
-                resources=resources,
-                available_apis=available_apis,
-                conversation_id=event.conversation_id,
-                bot_id=event.bot_id,
-                workspace_id=event.workspace_id,
-                thread_id=event.thread_id,
-                state_policy={
-                    'enable_state': binding.state_policy.enable_state,
-                    'state_scopes': list(binding.state_policy.state_scopes),
-                },
-                state_context=state_context,
-            )
-
-            event_log_id = await self.journal.write_event_log(
-                event=event,
-                binding=binding,
-                run_id=run_id,
-                runner_id=descriptor.id,
-            )
-            if event.event_type == 'message.received' and event.conversation_id:
-                await self.journal.write_user_transcript(
-                    event=event,
-                    event_log_id=event_log_id,
-                )
-
-            async for result_dict in self.invoker.invoke(descriptor, context):
-                result_dict = dict(result_dict)
-                sequence = result_dict.get('sequence')
-                if sequence is not None:
-                    try:
-                        sequence_int = int(sequence)
-                    except (TypeError, ValueError):
-                        self.ap.logger.warning(f'Runner {descriptor.id} returned invalid result sequence: {sequence}')
-                        sequence_int = last_sequence + 1
-                        result_dict['sequence'] = sequence_int
-                    else:
-                        if sequence_int in seen_sequences:
-                            self.ap.logger.warning(
-                                f'Runner {descriptor.id} returned duplicate result sequence '
-                                f'{sequence_int} for run {run_id}; dropping duplicate'
-                            )
-                            continue
-                        if sequence_int <= 0:
-                            self.ap.logger.warning(
-                                f'Runner {descriptor.id} returned non-positive result sequence '
-                                f'{sequence_int} for run {run_id}'
-                            )
-                            sequence_int = last_sequence + 1
-                            result_dict['sequence'] = sequence_int
-                        elif last_sequence and sequence_int != last_sequence + 1:
-                            self.ap.logger.warning(
-                                f'Runner {descriptor.id} result sequence gap or out-of-order '
-                                f'for run {run_id}: previous={last_sequence}, current={sequence_int}'
-                            )
-                        seen_sequences.add(sequence_int)
-                        last_sequence = max(last_sequence, sequence_int)
-                else:
-                    sequence_int = last_sequence + 1
-                    result_dict['sequence'] = sequence_int
-                    seen_sequences.add(sequence_int)
-                    last_sequence = sequence_int
-
-                result_type = result_dict.get('type')
-                if result_type and not self.result_normalizer.validate_payload(
-                    result_type,
-                    result_dict.get('data', {}),
-                    descriptor,
-                ):
-                    continue
-
-                await self.journal.append_run_result(
-                    result_dict=result_dict,
-                    run_id=run_id,
-                    sequence=sequence_int,
-                )
-
-                if result_type == 'state.updated':
-                    await self.journal.handle_state_updated_event(
-                        result_dict,
-                        event,
-                        binding,
-                        descriptor,
-                        run_id=run_id,
-                    )
-                    await self.result_normalizer.normalize(result_dict, descriptor)
-                    continue
-
-                if result_type == 'run.completed':
-                    terminal_status = 'completed'
-                    terminal_reason = (
-                        result_dict.get('data', {}).get('finish_reason')
-                        if isinstance(result_dict.get('data'), dict)
-                        else None
-                    )
-                    usage = result_dict.get('usage')
-                    if isinstance(usage, dict):
-                        terminal_usage = usage
-                elif result_type == 'run.failed':
-                    terminal_status = 'failed'
-                    data = result_dict.get('data') if isinstance(result_dict.get('data'), dict) else {}
-                    terminal_reason = data.get('error') or data.get('code')
-                    usage = result_dict.get('usage')
-                    if isinstance(usage, dict):
-                        terminal_usage = usage
-
-                has_completed_message = result_type == 'message.completed' or (
-                    result_type == 'run.completed'
-                    and isinstance(result_dict.get('data'), dict)
-                    and bool(result_dict['data'].get('message'))
-                )
-                if has_completed_message and event.conversation_id and not assistant_transcript_written:
-                    await self.journal.write_assistant_transcript(
-                        result_dict=result_dict,
-                        event=event,
-                        run_id=run_id,
-                        runner_id=descriptor.id,
-                    )
-                    assistant_transcript_written = True
-
-                result = await self.result_normalizer.normalize(result_dict, descriptor)
-                if result is not None:
-                    yield result
-
-                run_snapshot = await self.journal.get_run(run_id)
-                if run_snapshot and run_snapshot.get('cancel_requested_at') is not None:
-                    terminal_status = 'cancelled'
-                    terminal_reason = run_snapshot.get('status_reason') or 'cancel_requested'
-                    break
-            await self.journal.finalize_run(
-                run_id=run_id,
-                status=terminal_status or 'completed',
-                status_reason=terminal_reason,
-                usage=terminal_usage,
-            )
-        except Exception as exc:
-            failed_usage = terminal_usage
-            await self.journal.finalize_run(
-                run_id=run_id,
-                status='timeout' if self._is_deadline_exhausted(context) else 'failed',
-                status_reason=str(exc),
-                usage=failed_usage,
-            )
-            raise
-        finally:
-            session = await self._session_registry.unregister(run_id)
-            pending_steering = session.get('steering_queue', []) if session else []
-            if pending_steering:
-                try:
-                    await self.journal.write_steering_dropped_audits(
-                        pending_steering,
-                        run_id,
-                        descriptor.id,
-                    )
-                except Exception as exc:
-                    self.ap.logger.warning(
-                        f'Failed to write dropped steering audit for run {run_id}: {exc}',
-                        exc_info=True,
-                    )
-
-    async def run_from_query(
-        self,
-        query: pipeline_query.Query,
-    ) -> typing.AsyncGenerator[provider_message.Message | provider_message.MessageChunk, None]:
-        """Run an AgentRunner from the current Pipeline Query entry point."""
-        plan = self.query_bridge.build_plan(query)
-        adapter_context = dict(plan.adapter_context)
-        adapter_context['_query'] = query
-
-        # Materialize inbound attachments into sandbox before running
-        await self._materialize_inbound_attachments(query, plan.event)
-
-        async for result in self.run(
-            plan.event,
-            plan.binding,
-            bound_plugins=plan.bound_plugins,
-            adapter_context=adapter_context,
-        ):
-            yield result
-
-    async def _materialize_inbound_attachments(
-        self,
-        query: pipeline_query.Query,
-        event: AgentEventEnvelope,
-    ) -> None:
-        """Persist inbound attachments into the sandbox and update event.input.attachments.
-
-        No-op when the box service is unavailable or there are no attachments.
-        On success, updates each attachment in event.input.attachments with the
-        sandbox path so runners can tell the model where to find the files.
-        """
-        box_service = getattr(self.ap, 'box_service', None)
-        if box_service is None or not getattr(box_service, 'available', False):
-            return
-
-        try:
-            materialized = await box_service.materialize_inbound_attachments(query)
-        except Exception as e:
-            # Never break the chat turn over attachment IO
-            self.ap.logger.warning(f'Inbound attachment materialization failed: {e}')
-            return
-
-        if not materialized:
-            return
-
-        # Build a lookup by name for matching
-        materialized_by_name = {att.get('name'): att for att in materialized if att.get('name')}
-
-        # Update event.input.attachments with sandbox paths
-        if event.input and event.input.attachments:
-            for attachment in event.input.attachments:
-                name = attachment.name
-                if name and name in materialized_by_name:
-                    mat = materialized_by_name[name]
-                    # Update the attachment with sandbox path
-                    attachment.path = mat.get('path')
-                    attachment.size = mat.get('size') or attachment.size
-                    attachment.mime_type = attachment.mime_type or mat.get('mime_type')
-
-        # Store materialized descriptors in query variables for downstream use
-        query.variables['_sandbox_inbound_attachments'] = materialized
-
-    def resolve_runner_id_for_telemetry(self, query: pipeline_query.Query) -> str | None:
-        """Resolve runner ID for telemetry/logging without full execution."""
-        return self.query_bridge.resolve_runner_id_for_telemetry(query)
-
-    async def try_claim_steering_from_query(
-        self,
-        query: pipeline_query.Query,
-    ) -> bool:
-        """Claim a query as steering input for an active run when possible."""
-        plan = self.query_bridge.build_plan(query)
-        event = plan.event
-        binding = plan.binding
-
-        if event.event_type != 'message.received' or not event.conversation_id:
-            return False
-
-        descriptor = await self.registry.get(binding.runner_id, plan.bound_plugins)
-        if not descriptor.supports_steering():
-            return False
-
-        target_run_id = await self._session_registry.find_steering_target(
-            conversation_id=event.conversation_id,
-            runner_id=descriptor.id,
-            bot_id=event.bot_id,
-            workspace_id=event.workspace_id,
-            thread_id=event.thread_id,
-        )
-        if target_run_id is None:
-            return False
-
-        steering_item = self._build_steering_item(event, target_run_id, descriptor.id)
-        if not await self._session_registry.enqueue_steering(target_run_id, steering_item):
-            return False
-
-        try:
-            event_log_id = await self.journal.write_event_log(
-                event=event,
-                binding=binding,
-                run_id=target_run_id,
-                runner_id=descriptor.id,
-                metadata={
-                    'steering': {
-                        'status': 'queued',
-                        'trigger_behavior': 'absorbed_into_active_run',
-                        'claimed_by_run_id': target_run_id,
-                        'claimed_runner_id': descriptor.id,
-                        'claimed_at': steering_item.get('claimed_at'),
-                    },
-                },
-            )
-            await self.journal.write_user_transcript(event, event_log_id)
-        except Exception as exc:
-            self.ap.logger.warning(
-                f'Failed to persist steering event {event.event_id} for run {target_run_id}: {exc}',
-                exc_info=True,
-            )
-
-        self.ap.logger.info(f'Claimed event {event.event_id} as steering input for run {target_run_id}')
-        return True
-
-    def _build_steering_item(
-        self,
-        event: AgentEventEnvelope,
-        run_id: str,
-        runner_id: str,
-    ) -> dict[str, typing.Any]:
-        """Build the run-scoped steering item returned by the Host pull API."""
-        return {
-            'claimed_run_id': run_id,
-            'runner_id': runner_id,
-            'claimed_at': int(time.time()),
-            'event': {
-                'event_id': event.event_id,
-                'event_type': event.event_type,
-                'event_time': event.event_time,
-                'source': event.source,
-                'source_event_type': event.source_event_type,
-                'raw_ref': event.raw_ref.model_dump(mode='json') if event.raw_ref else None,
-                'data': event.data,
-            },
-            'conversation': {
-                'conversation_id': event.conversation_id,
-                'thread_id': event.thread_id,
-                'bot_id': event.bot_id,
-                'workspace_id': event.workspace_id,
-            },
-            'actor': event.actor.model_dump(mode='json') if event.actor else None,
-            'subject': event.subject.model_dump(mode='json') if event.subject else None,
-            'input': {
-                'text': event.input.text if event.input else None,
-                'contents': [
-                    c.model_dump(mode='json') if hasattr(c, 'model_dump') else c
-                    for c in (event.input.contents if event.input else [])
-                ],
-                'attachments': [
-                    a.model_dump(mode='json') if hasattr(a, 'model_dump') else a
-                    for a in (event.input.attachments if event.input else [])
-                ],
-            },
-        }
-
-    async def _invoke_runner(
-        self,
-        descriptor: AgentRunnerDescriptor,
-        context: AgentRunContextPayload,
-    ) -> typing.AsyncGenerator[dict[str, typing.Any], None]:
-        """Compatibility delegate for older tests and internal callers."""
-        async for result in self.invoker.invoke(descriptor, context):
-            yield result
-
-    async def _next_with_deadline(
-        self,
-        gen: typing.AsyncGenerator[dict[str, typing.Any], None],
-        descriptor: AgentRunnerDescriptor,
-        context: AgentRunContextPayload,
-    ) -> dict[str, typing.Any]:
-        return await self.invoker._next_with_deadline(gen, descriptor, context)
-
-    def _remaining_deadline_seconds(
-        self,
-        context: AgentRunContextPayload,
-    ) -> float | None:
-        return self.invoker._remaining_deadline_seconds(context)
-
-    def _is_deadline_exhausted(self, context: AgentRunContextPayload) -> bool:
-        return self.invoker._is_deadline_exhausted(context)
-
-    async def _close_generator(
-        self,
-        gen: typing.AsyncGenerator[dict[str, typing.Any], None],
-        descriptor: AgentRunnerDescriptor,
-    ) -> None:
-        await self.invoker._close_generator(gen, descriptor)
-
-    async def _handle_state_updated_event(
-        self,
-        result_dict: dict[str, typing.Any],
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-    ) -> None:
-        await self.journal.handle_state_updated_event(result_dict, event, binding, descriptor)
-
-    async def _write_event_log(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        run_id: str,
-        runner_id: str,
-    ) -> str:
-        return await self.journal.write_event_log(event, binding, run_id, runner_id)
-
-    async def _write_user_transcript(
-        self,
-        event: AgentEventEnvelope,
-        event_log_id: str,
-    ) -> None:
-        await self.journal.write_user_transcript(event, event_log_id)
-
-    async def _write_assistant_transcript(
-        self,
-        result_dict: dict[str, typing.Any],
-        event: AgentEventEnvelope,
-        run_id: str,
-        runner_id: str,
-    ) -> None:
-        await self.journal.write_assistant_transcript(
-            result_dict=result_dict,
-            event=event,
-            run_id=run_id,
-            runner_id=runner_id,
-        )
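Review note: the removed `_materialize_inbound_attachments` helper matched sandbox descriptors back to `event.input.attachments` by attachment name. The sketch below restates that merge rule in isolation. `Attachment` and `apply_materialized` are hypothetical stand-ins for illustration, not names from the removed module, and the descriptor keys (`name`, `path`, `size`, `mime_type`) are taken from the deleted lines above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Attachment:
    """Hypothetical stand-in for an entry in event.input.attachments."""

    name: str | None
    path: str | None = None
    size: int | None = None
    mime_type: str | None = None


def apply_materialized(
    attachments: list[Attachment],
    materialized: list[dict[str, Any]],
) -> None:
    """Copy sandbox paths onto attachments whose name matches a descriptor."""
    # Descriptors without a name cannot be matched, so they are skipped.
    by_name = {m.get('name'): m for m in materialized if m.get('name')}
    for att in attachments:
        mat = by_name.get(att.name) if att.name else None
        if mat is None:
            continue
        att.path = mat.get('path')
        # The sandbox size wins when present; the original mime type wins when present.
        att.size = mat.get('size') or att.size
        att.mime_type = att.mime_type or mat.get('mime_type')


atts = [Attachment(name='a.txt', size=1), Attachment(name='b.png', mime_type='image/png')]
apply_materialized(
    atts,
    [
        {'name': 'a.txt', 'path': '/sandbox/a.txt', 'size': 10, 'mime_type': 'text/plain'},
        {'path': '/sandbox/unnamed'},
    ],
)
```

Matching by name means two attachments that share a name resolve to the same descriptor, which may be worth keeping in mind wherever this logic now lives.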
@@ -1,435 +0,0 @@
-"""Persistent state store for AgentRunner protocol state.
-
-This module provides a database-backed state store for event-first Protocol v1.
-"""
-from __future__ import annotations
-
-import typing
-import json
-import threading
-from datetime import datetime
-
-import sqlalchemy
-from sqlalchemy.ext.asyncio import AsyncEngine
-from sqlalchemy import select, delete, update
-from sqlalchemy.dialects.postgresql import insert as postgresql_insert
-from sqlalchemy.dialects.sqlite import insert as sqlite_insert
-from sqlalchemy.exc import IntegrityError
-
-from .descriptor import AgentRunnerDescriptor
-from .host_models import AgentEventEnvelope, AgentBinding
-from .state_scope import (
-    VALID_STATE_SCOPES,
-    build_state_scope_key,
-    get_binding_identity,
-    normalize_state_key,
-)
-from ...entity.persistence.agent_runner_state import AgentRunnerState
-
-
-# Maximum value_json size (256KB)
-MAX_VALUE_JSON_BYTES = 256 * 1024
-
-
-class PersistentStateStore:
-    """Database-backed state store for AgentRunner protocol state.
-
-    IMPORTANT: This is HOST-OWNED protocol state, NOT plugin instance state.
-
-    This store provides:
-    1. Persistent storage across runs via database
-    2. Scope isolation by runner_id + binding_identity + scope
-    3. Policy enforcement (enable_state, state_scopes)
-    4. JSON value validation and size limits
-
-    Used by:
-    - Event-first Protocol v1 (async methods)
-    - State API handlers (get/set/delete/list)
-    """
-
-    def __init__(self, db_engine: AsyncEngine):
-        self._db_engine = db_engine
-
-    def _get_scope_key(
-        self,
-        scope: str,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-    ) -> str | None:
-        """Get scope key for given scope."""
-        return build_state_scope_key(scope, event, binding, descriptor)
-
-    def _check_scope_enabled(self, scope: str, binding: AgentBinding) -> bool:
-        """Check if scope is enabled by binding's state_policy."""
-        state_policy = binding.state_policy
-        if not state_policy.enable_state:
-            return False
-        return scope in state_policy.state_scopes
-
-    def _validate_json_value(
-        self,
-        value: typing.Any,
-        logger: typing.Any = None,
-    ) -> tuple[str | None, str | None]:
-        """Validate and serialize value to JSON.
-
-        Returns:
-            Tuple of (json_string, error_message). If error_message is not None,
-            json_string will be None.
-        """
-        try:
-            json_str = json.dumps(value, ensure_ascii=False)
-        except (TypeError, ValueError) as e:
-            return None, f'Value is not JSON-serializable: {e}'
-
-        # Check size limit
-        json_bytes = len(json_str.encode('utf-8'))
-        if json_bytes > MAX_VALUE_JSON_BYTES:
-            return None, f'Value size {json_bytes} bytes exceeds limit {MAX_VALUE_JSON_BYTES} bytes'
-
-        return json_str, None
-
-    async def _upsert_state_row(
-        self,
-        conn: typing.Any,
-        values: dict[str, typing.Any],
-    ) -> None:
-        """Insert or update a state row by the logical scope/key identity."""
-        update_values = {
-            'value_json': values['value_json'],
-            'updated_at': values['updated_at'],
-        }
-        constraint_columns = ['scope_key', 'state_key']
-        dialect_name = self._db_engine.dialect.name
-
-        if dialect_name == 'sqlite':
-            stmt = sqlite_insert(AgentRunnerState).values(**values)
-            await conn.execute(
-                stmt.on_conflict_do_update(
-                    index_elements=constraint_columns,
-                    set_=update_values,
-                )
-            )
-            return
-
-        if dialect_name == 'postgresql':
-            stmt = postgresql_insert(AgentRunnerState).values(**values)
-            await conn.execute(
-                stmt.on_conflict_do_update(
-                    index_elements=constraint_columns,
-                    set_=update_values,
-                )
-            )
-            return
-
-        try:
-            await conn.execute(sqlalchemy.insert(AgentRunnerState).values(**values))
-        except IntegrityError:
-            await conn.execute(
-                update(AgentRunnerState)
-                .where(AgentRunnerState.scope_key == values['scope_key'])
-                .where(AgentRunnerState.state_key == values['state_key'])
-                .values(**update_values)
-            )
-
-    # ========== Async DB Operations ==========
-
-    async def build_snapshot_from_event(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-    ) -> dict[str, dict[str, typing.Any]]:
-        """Build state snapshot for all scopes from event and binding.
-
-        Reads from database, respects state_policy.
-        """
-        state_policy = binding.state_policy
-
-        # If state is disabled, return all empty scopes
-        if not state_policy.enable_state:
-            return {
-                'conversation': {},
-                'actor': {},
-                'subject': {},
-                'runner': {},
-            }
-
-        snapshot: dict[str, dict[str, typing.Any]] = {
-            'conversation': {},
-            'actor': {},
-            'subject': {},
-            'runner': {},
-        }
-
-        async with self._db_engine.connect() as conn:
-            for scope in VALID_STATE_SCOPES:
-                if not self._check_scope_enabled(scope, binding):
-                    continue
-
-                scope_key = self._get_scope_key(scope, event, binding, descriptor)
-                if not scope_key:
-                    continue
-
-                # Query all state entries for this scope_key
-                result = await conn.execute(
-                    select(AgentRunnerState.state_key, AgentRunnerState.value_json)
-                    .where(AgentRunnerState.scope_key == scope_key)
-                )
-                rows = result.fetchall()
-
-                for row in rows:
-                    key = row.state_key
-                    value_json = row.value_json
-                    if value_json:
-                        try:
-                            snapshot[scope][key] = json.loads(value_json)
-                        except json.JSONDecodeError:
-                            pass  # Skip invalid JSON
-
-        # Seed external.conversation_id from event.conversation_id if not set
-        if self._check_scope_enabled('conversation', binding) and event.conversation_id:
-            if 'external.conversation_id' not in snapshot['conversation']:
-                snapshot['conversation']['external.conversation_id'] = event.conversation_id
-
-        return snapshot
-
-    async def apply_update_from_event(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-        scope: str,
-        key: str,
-        value: typing.Any,
-        logger: typing.Any = None,
-    ) -> tuple[bool, str | None]:
-        """Apply a state update from event context.
-
-        Returns:
-            Tuple of (success, error_message). If success is False, error_message
-            contains the reason.
-        """
-        state_policy = binding.state_policy
-
-        # Check if state is disabled
-        if not state_policy.enable_state:
-            return False, 'State is disabled by binding policy'
-
-        # Validate scope
-        if scope not in VALID_STATE_SCOPES:
-            return False, f'Invalid scope: {scope}'
-
-        # Check if scope is enabled
-        if not self._check_scope_enabled(scope, binding):
-            return False, f'Scope "{scope}" not enabled by binding policy'
-
-        # Map accepted key aliases
-        key = normalize_state_key(key)
-
-        # Get scope key
-        scope_key = self._get_scope_key(scope, event, binding, descriptor)
-        if not scope_key:
-            return False, f'Missing identity for scope "{scope}"'
-
-        # Validate and serialize value
-        value_json, error = self._validate_json_value(value, logger)
-        if error:
-            return False, error
-
-        # Build context fields
-        binding_identity = get_binding_identity(binding)
-
-        now = datetime.utcnow()
-        async with self._db_engine.begin() as conn:
-            await self._upsert_state_row(
-                conn,
-                {
-                    'runner_id': descriptor.id,
-                    'binding_identity': binding_identity,
-                    'scope': scope,
-                    'scope_key': scope_key,
-                    'state_key': key,
-                    'value_json': value_json,
-                    'bot_id': event.bot_id,
-                    'workspace_id': event.workspace_id,
-                    'conversation_id': event.conversation_id,
-                    'thread_id': event.thread_id,
-                    'actor_type': event.actor.actor_type if event.actor else None,
-                    'actor_id': event.actor.actor_id if event.actor else None,
-                    'subject_type': event.subject.subject_type if event.subject else None,
-                    'subject_id': event.subject.subject_id if event.subject else None,
-                    'created_at': now,
-                    'updated_at': now,
-                },
-            )
-
-        return True, None
-
-    async def state_get(
-        self,
-        scope_key: str,
-        state_key: str,
-    ) -> typing.Any:
-        """Get a single state value by scope_key and state_key.
-
-        Used by State API handlers.
-        """
-        state_key = normalize_state_key(state_key)
-
-        async with self._db_engine.connect() as conn:
-            result = await conn.execute(
-                select(AgentRunnerState.value_json)
-                .where(AgentRunnerState.scope_key == scope_key)
-                .where(AgentRunnerState.state_key == state_key)
-            )
-            row = result.first()
-
-            if not row or not row.value_json:
-                return None
-
-            try:
-                return json.loads(row.value_json)
-            except json.JSONDecodeError:
-                return None
-
-    async def state_set(
-        self,
-        scope_key: str,
-        state_key: str,
-        value: typing.Any,
-        runner_id: str,
-        binding_identity: str,
-        scope: str,
-        context: dict[str, typing.Any] | None = None,
-        logger: typing.Any = None,
-    ) -> tuple[bool, str | None]:
-        """Set a state value.
-
-        Used by State API handlers.
-        Context contains optional fields like bot_id, conversation_id, etc.
-        """
-        state_key = normalize_state_key(state_key)
-
-        # Validate and serialize value
-        value_json, error = self._validate_json_value(value, logger)
-        if error:
-            return False, error
-
-        context = context or {}
-
-        now = datetime.utcnow()
-        async with self._db_engine.begin() as conn:
-            await self._upsert_state_row(
-                conn,
-                {
-                    'runner_id': runner_id,
-                    'binding_identity': binding_identity,
-                    'scope': scope,
-                    'scope_key': scope_key,
-                    'state_key': state_key,
-                    'value_json': value_json,
-                    'bot_id': context.get('bot_id'),
-                    'workspace_id': context.get('workspace_id'),
-                    'conversation_id': context.get('conversation_id'),
-                    'thread_id': context.get('thread_id'),
-                    'actor_type': context.get('actor_type'),
-                    'actor_id': context.get('actor_id'),
-                    'subject_type': context.get('subject_type'),
-                    'subject_id': context.get('subject_id'),
-                    'created_at': now,
-                    'updated_at': now,
-                },
-            )
-
-        return True, None
-
-    async def state_delete(
-        self,
-        scope_key: str,
-        state_key: str,
-    ) -> bool:
-        """Delete a state value.
-
-        Returns True if deleted, False if not found.
-        """
-        state_key = normalize_state_key(state_key)
-
-        async with self._db_engine.begin() as conn:
-            result = await conn.execute(
-                delete(AgentRunnerState)
-                .where(AgentRunnerState.scope_key == scope_key)
-                .where(AgentRunnerState.state_key == state_key)
-            )
-            return (result.rowcount or 0) > 0
-
-    async def state_list(
-        self,
-        scope_key: str,
-        prefix: str | None = None,
-        limit: int = 100,
-    ) -> tuple[list[str], bool]:
-        """List state keys in a scope.
-
-        Returns tuple of (keys, has_more).
-        """
-        # Enforce limit cap
-        limit = min(limit, 100)
-
-        async with self._db_engine.connect() as conn:
-            query = (
-                select(AgentRunnerState.state_key)
-                .where(AgentRunnerState.scope_key == scope_key)
-                .order_by(AgentRunnerState.state_key)
-                .limit(limit + 1)  # Fetch one extra to check has_more
-            )
-
-            if prefix:
-                prefix = normalize_state_key(prefix)
-                query = query.where(
-                    AgentRunnerState.state_key.like(f'{prefix}%')
-                )
-
-            result = await conn.execute(query)
-            rows = result.fetchall()
-
-            keys = [row.state_key for row in rows[:limit]]
-            has_more = len(rows) > limit
-
-            return keys, has_more
-
-    async def clear_all(self) -> None:
-        """Clear all state entries (for testing)."""
-        async with self._db_engine.begin() as conn:
-            await conn.execute(delete(AgentRunnerState))
-
-
-# Global singleton persistent state store
-_persistent_state_store: PersistentStateStore | None = None
-_persistent_state_store_lock = threading.Lock()
-
-
-def get_persistent_state_store(db_engine: AsyncEngine | None = None) -> PersistentStateStore:
-    """Get the global persistent state store singleton.
-
-    Args:
-        db_engine: Database engine (required on first call)
-
-    Returns:
-        PersistentStateStore singleton
-    """
-    global _persistent_state_store
-    with _persistent_state_store_lock:
-        if _persistent_state_store is None:
-            if db_engine is None:
-                raise RuntimeError("db_engine required for first call to get_persistent_state_store")
-            _persistent_state_store = PersistentStateStore(db_engine)
-        return _persistent_state_store
-
-
-def reset_persistent_state_store() -> None:
-    """Reset the global persistent state store (for testing)."""
-    global _persistent_state_store
-    with _persistent_state_store_lock:
-        _persistent_state_store = None
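Review note: two small mechanics of the removed `PersistentStateStore` are easy to lose in a deletion this size: the 256 KB cap applied to the serialized JSON, and the `limit + 1` fetch that `state_list` used to compute `has_more`. The sketch below reproduces both without a database. `validate_json_value` mirrors the removed `_validate_json_value`; `page_keys` is a hypothetical in-memory stand-in for the SQL query, assuming the keys are already sorted.

```python
from __future__ import annotations

import json
from typing import Any

MAX_VALUE_JSON_BYTES = 256 * 1024  # same cap as the removed module


def validate_json_value(value: Any) -> tuple[str | None, str | None]:
    """Return (json_string, None) on success or (None, error_message) on failure."""
    try:
        json_str = json.dumps(value, ensure_ascii=False)
    except (TypeError, ValueError) as exc:
        return None, f'Value is not JSON-serializable: {exc}'
    # The limit applies to the UTF-8 encoded size, not the character count.
    size = len(json_str.encode('utf-8'))
    if size > MAX_VALUE_JSON_BYTES:
        return None, f'Value size {size} bytes exceeds limit {MAX_VALUE_JSON_BYTES} bytes'
    return json_str, None


def page_keys(
    sorted_keys: list[str],
    prefix: str | None = None,
    limit: int = 100,
) -> tuple[list[str], bool]:
    """Return (keys, has_more) by fetching one row more than the page size."""
    limit = min(limit, 100)  # callers cannot raise the cap above 100
    matching = [k for k in sorted_keys if not prefix or k.startswith(prefix)]
    fetched = matching[: limit + 1]  # one extra row signals another page
    return fetched[:limit], len(fetched) > limit


ok_json, ok_err = validate_json_value({'a': 1})
bad_json, bad_err = validate_json_value({1, 2})  # a set is not JSON-serializable
big_json, big_err = validate_json_value('x' * MAX_VALUE_JSON_BYTES)  # quotes push it over
keys, has_more = page_keys(['a.x', 'a.y', 'b'], prefix='a.', limit=1)
```

Fetching `limit + 1` rows avoids a separate `COUNT` query at the cost of one extra row per page.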
@@ -1,56 +0,0 @@
-"""Pipeline Query bridge for AgentRunner execution."""
-
-from __future__ import annotations
-
-import dataclasses
-import typing
-
-from langbot_plugin.api.entities.builtin.pipeline import query as pipeline_query
-
-from .binding_resolver import AgentBindingResolver
-from .config_migration import ConfigMigration
-from .errors import RunnerNotFoundError
-from .host_models import AgentBinding, AgentEventEnvelope
-from .query_entry_adapter import QueryEntryAdapter
-
-
-@dataclasses.dataclass(frozen=True)
-class QueryRunPlan:
-    """Projected event-first execution request for a Query-backed run."""
-
-    event: AgentEventEnvelope
-    binding: AgentBinding
-    bound_plugins: list[str] | None
-    adapter_context: dict[str, typing.Any]
-
-
-class QueryRunBridge:
-    """Project the current Pipeline Query entry point into Protocol v1 inputs."""
-
-    binding_resolver: AgentBindingResolver
-
-    def __init__(self, binding_resolver: AgentBindingResolver):
-        self.binding_resolver = binding_resolver
-
-    def build_plan(self, query: pipeline_query.Query) -> QueryRunPlan:
-        """Build an event-first run plan from a Pipeline Query."""
-        runner_id = ConfigMigration.resolve_runner_id(query.pipeline_config)
-        if not runner_id:
-            raise RunnerNotFoundError('no runner configured')
-
-        event = QueryEntryAdapter.query_to_event(query)
-        agent_config = QueryEntryAdapter.config_to_agent_config(query, runner_id)
-        binding = self.binding_resolver.resolve_one(event, [agent_config])
-        bound_plugins = query.variables.get('_pipeline_bound_plugins')
-        adapter_context = QueryEntryAdapter.build_adapter_context(query, binding)
-
-        return QueryRunPlan(
-            event=event,
-            binding=binding,
-            bound_plugins=bound_plugins,
-            adapter_context=adapter_context,
-        )
-
-    def resolve_runner_id_for_telemetry(self, query: pipeline_query.Query) -> str | None:
-        """Resolve runner ID for telemetry/logging without full execution."""
-        return ConfigMigration.resolve_runner_id(query.pipeline_config)
@@ -1,649 +0,0 @@
-"""Query entry adapter for converting Query to event-first envelope.
-
-This adapter bridges the current Query entry point with the event-first
-Protocol v1 architecture without exposing Query internals to runners.
-"""
-from __future__ import annotations
-
-import hashlib
-import typing
-
-from langbot_plugin.api.entities.builtin.pipeline import query as pipeline_query
-from langbot_plugin.api.entities.builtin.platform import message as platform_message
-from langbot_plugin.api.entities.builtin.agent_runner.event import (
-    AgentEventContext,
-    ConversationContext,
-    ActorContext,
-    SubjectContext,
-    RawEventRef,
-)
-from langbot_plugin.api.entities.builtin.agent_runner.input import AgentInput
-from langbot_plugin.api.entities.builtin.agent_runner.delivery import DeliveryContext
-
-from .host_models import (
-    AgentConfig,
-    AgentEventEnvelope,
-    ResourcePolicy,
-    StatePolicy,
-    DeliveryPolicy,
-)
-from .config_migration import ConfigMigration
-from . import events as runner_events
-
-
-class QueryEntryAdapter:
-    """Adapter for converting Query to event-first envelope.
-
-    This adapter is responsible for:
-    - Converting Query to AgentEventEnvelope
-    - Projecting current Pipeline config to temporary AgentConfig
-    - Putting Query-only fields into adapter context
-    """
-
-    INTERNAL_PREFIX = '_'
-    SENSITIVE_PATTERNS = ('secret', 'token', 'key', 'password', 'credential', 'api_key', 'apikey')
-    PERMISSION_VARS = ('_pipeline_bound_plugins', '_authorized', '_permission')
-    EVENT_DATA_MAX_STRING_BYTES = 512
-
-    @classmethod
-    def query_to_event(
-        cls,
-        query: pipeline_query.Query,
-    ) -> AgentEventEnvelope:
-        """Convert Query to AgentEventEnvelope.
-
-        Args:
-            query: Current entry query
-
-        Returns:
-            AgentEventEnvelope for event-first processing
-        """
-        # Build event context
-        event = cls._build_event_context(query)
-
-        # Build conversation context
-        conversation = cls._build_conversation_context(query)
-
-        # Build actor context
-        actor = cls._build_actor_context(query)
-
-        # Build subject context
-        subject = cls._build_subject_context(query)
-
-        # Build input
-        input = cls._build_input(query)
-
-        # Build delivery context
-        delivery = cls._build_delivery_context(query)
-
-        # Build raw ref
-        raw_ref = cls._build_raw_ref(query)
-
-        return AgentEventEnvelope(
-            event_id=event.event_id or str(query.query_id),
-            event_type=event.event_type or runner_events.MESSAGE_RECEIVED,
-            event_time=event.event_time,
-            source="host_adapter",
-            source_event_type=event.source_event_type,
-            bot_id=query.bot_uuid,
-            workspace_id=None,  # Not available in Query
-            conversation_id=conversation.conversation_id,
-            thread_id=conversation.thread_id,
-            actor=actor,
-            subject=subject,
-            input=input,
-            delivery=delivery,
-            raw_ref=raw_ref,
-            data=event.data,
-        )
-
-    @classmethod
-    def config_to_agent_config(
-        cls,
-        query: pipeline_query.Query,
-        runner_id: str,
-    ) -> AgentConfig:
-        """Project the current Pipeline config container into target Agent config."""
-        pipeline_config = query.pipeline_config or {}
-        runner_config = ConfigMigration.resolve_runner_config(pipeline_config, runner_id)
-        agent_id = getattr(query, 'pipeline_uuid', None)
-
-        # Build resource policy from current config
-        resource_policy = ResourcePolicy(
-            allowed_model_uuids=cls._extract_allowed_models(query),
-            allowed_tool_names=cls._extract_allowed_tools(query),
-            allowed_kb_uuids=cls._extract_allowed_kbs(query),
-            allowed_skill_names=cls._extract_allowed_skills(query),
-        )
-
-        # Build state policy
-        state_policy = StatePolicy(
-            enable_state=True,
-            state_scopes=["conversation", "actor", "subject", "runner"],
-        )
-
-        # Build delivery policy
-        delivery_policy = DeliveryPolicy(
-            enable_streaming=True,
-            enable_reply=True,
-        )
-
-        return AgentConfig(
-            agent_id=agent_id,
-            runner_id=runner_id,
-            runner_config=runner_config,
-            resource_policy=resource_policy,
-            state_policy=state_policy,
-            delivery_policy=delivery_policy,
-            event_types=[runner_events.MESSAGE_RECEIVED],
-            enabled=True,
-            metadata={'source': 'pipeline_adapter'},
-        )
-
-    @classmethod
-    def build_adapter_context(
-        cls,
-        query: pipeline_query.Query,
-        binding: AgentBinding,
-    ) -> dict[str, typing.Any]:
-        """Build Query-derived fields for the current entry adapter."""
-        return {
-            'params': cls.build_params(query),
-            'query_id': getattr(query, 'query_id', None),
-        }
-
-    @classmethod
-    def build_params(cls, query: pipeline_query.Query) -> dict[str, typing.Any]:
-        """Build adapter params from Pipeline variables with host filtering."""
-        params: dict[str, typing.Any] = {}
-        variables = getattr(query, 'variables', None)
-        if not variables:
-            return params
-
-        for key, value in variables.items():
-            if key.startswith(cls.INTERNAL_PREFIX):
-                continue
-            key_lower = key.lower()
-            if any(pattern in key_lower for pattern in cls.SENSITIVE_PATTERNS):
-                continue
-            if any(key == perm_var or key.startswith(perm_var) for perm_var in cls.PERMISSION_VARS):
-                continue
-            if cls.is_json_serializable(value):
-                params[key] = value
-
-        return params
-
-    @classmethod
-    def is_json_serializable(cls, value: typing.Any) -> bool:
-        """Return whether a value can safely cross the adapter boundary as JSON."""
-        if value is None or isinstance(value, (str, int, float, bool)):
-            return True
-        if isinstance(value, (list, tuple)):
-            return all(cls.is_json_serializable(item) for item in value)
-        if isinstance(value, dict):
-            return all(
-                isinstance(k, str) and cls.is_json_serializable(v)
-                for k, v in value.items()
-            )
-        return False
-
-    # Private helper methods
-
-    @classmethod
-    def _build_event_context(
-        cls,
-        query: pipeline_query.Query,
-    ) -> AgentEventContext:
-        """Build AgentEventContext from Query."""
-        message_event = getattr(query, 'message_event', None)
-
-        event_data: dict[str, typing.Any] = {}
-        if message_event and hasattr(message_event, 'model_dump'):
-            try:
-                raw_event_data = message_event.model_dump(mode='json')
-            except TypeError:
-                raw_event_data = message_event.model_dump()
-            except Exception:
-                raw_event_data = {}
-            if isinstance(raw_event_data, dict):
-                event_data = cls._compact_event_data(raw_event_data)
-
-        source_event_type = None
-        if message_event:
-            source_event_type = getattr(message_event, 'type', None)
-
-        message_chain = getattr(query, 'message_chain', None)
-        message_id = getattr(message_chain, 'message_id', None)
-        if message_id == -1:
-            message_id = None
-
-        event_time = None
-        if message_event:
-            event_time = getattr(message_event, 'time', None)
-        if isinstance(event_time, (int, float)):
-            event_time = int(event_time)
-
-        source_event_id = str(message_id or query.query_id)
-        return AgentEventContext(
-            event_id=cls._build_scoped_event_id(query, source_event_id, event_time),
-            event_type=runner_events.MESSAGE_RECEIVED,
-            event_time=event_time,
-            source="host_adapter",
-            source_event_type=source_event_type,
-            data=event_data,
-        )
-
-    @classmethod
-    def _compact_event_data(
-        cls,
-        event_data: dict[str, typing.Any],
-    ) -> dict[str, typing.Any]:
-        """Keep only small scalar source-event metadata in event.data."""
-        compact: dict[str, typing.Any] = {}
-        for key, value in event_data.items():
-            if key == 'source_platform_object' or key.startswith('_'):
-                continue
-            if value is None or isinstance(value, (bool, int, float)):
-                compact[key] = value
-                continue
-            if isinstance(value, str):
-                if len(value.encode('utf-8')) <= cls.EVENT_DATA_MAX_STRING_BYTES:
-                    compact[key] = value
-                continue
-        return compact
-
-    @classmethod
-    def _build_scoped_event_id(
-        cls,
-        query: pipeline_query.Query,
-        source_event_id: str,
-        event_time: int | None,
-    ) -> str:
-        """Build a globally unique host event id from pipeline-local ids."""
-        launcher_type = getattr(query, 'launcher_type', None)
-        launcher_type_value = getattr(launcher_type, 'value', launcher_type) if launcher_type is not None else None
-        scope_parts = [
-            'host_adapter',
-            getattr(query, 'pipeline_uuid', None),
-            getattr(query, 'bot_uuid', None),
-            launcher_type_value,
-            getattr(query, 'launcher_id', None),
-            getattr(query, 'sender_id', None),
-            source_event_id,
-            event_time,
-        ]
-        scoped = '|'.join('' if part is None else str(part) for part in scope_parts)
-        digest = hashlib.sha256(scoped.encode('utf-8')).hexdigest()[:32]
-        return f'host:{digest}'
-
-    @classmethod
-    def _build_conversation_context(
-        cls,
-        query: pipeline_query.Query,
-    ) -> ConversationContext:
-        """Build ConversationContext from Query."""
-        # Handle launcher_type safely
-        launcher_type = getattr(query, 'launcher_type', None)
-        launcher_type_value = None
-        if launcher_type is not None:
-            launcher_type_value = getattr(launcher_type, 'value', launcher_type)
-
-        # Handle launcher_id
-        launcher_id = getattr(query, 'launcher_id', None)
-
-        # Build session_id from launcher info if available
-        session_id = None
-        if launcher_type_value and launcher_id:
-            session_id = f'{launcher_type_value}_{launcher_id}'
-
-        # Handle session and conversation_id
-        conversation_id = None
-        session = getattr(query, 'session', None)
-        if session:
-            conversation = getattr(session, 'using_conversation', None)
-            if conversation:
-                conversation_id = getattr(conversation, 'uuid', None)
-
-        if not conversation_id:
-            variables = getattr(query, 'variables', None) or {}
-            conversation_id = variables.get('conversation_id') or None
-
-        if not conversation_id:
-            conversation_id = session_id
-
-        # Handle sender_id
-        sender_id = getattr(query, 'sender_id', None)
-        if sender_id is not None:
-            sender_id = str(sender_id)
-
-        # Handle bot_uuid
-        bot_uuid = getattr(query, 'bot_uuid', None)
-
-        return ConversationContext(
-            conversation_id=str(conversation_id) if conversation_id is not None else None,
-            thread_id=None,
-            launcher_type=launcher_type_value,
-            launcher_id=launcher_id,
-            sender_id=sender_id,
-            bot_id=bot_uuid,
-            workspace_id=None,
-            session_id=session_id,
-        )
-
-    @classmethod
-    def _build_actor_context(
-        cls,
-        query: pipeline_query.Query,
-    ) -> ActorContext:
-        """Build ActorContext from Query."""
-        message_event = getattr(query, 'message_event', None)
-        sender = getattr(message_event, 'sender', None) if message_event else None
-        sender_id = getattr(query, 'sender_id', None)
-        actor_id = getattr(sender, 'id', None) if sender else None
-        if actor_id is None:
-            actor_id = sender_id
-        actor_name = sender.get_name() if sender and hasattr(sender, 'get_name') else None
-
-        return ActorContext(
-            actor_type="user",
-            actor_id=str(actor_id) if actor_id is not None else None,
-            actor_name=actor_name,
-            metadata={},
-        )
-
-    @classmethod
-    def _build_subject_context(
-        cls,
-        query: pipeline_query.Query,
-    ) -> SubjectContext:
-        """Build SubjectContext from Query."""
-        message_chain = getattr(query, 'message_chain', None)
-        message_id = getattr(message_chain, 'message_id', None) if message_chain else None
-        if message_id == -1:
-            message_id = None
-
-        query_id = getattr(query, 'query_id', None)
-
-        # Safely get launcher_type
-        launcher_type = getattr(query, 'launcher_type', None)
-        launcher_type_value = None
-        if launcher_type is not None:
-            launcher_type_value = getattr(launcher_type, 'value', launcher_type)
-
-        return SubjectContext(
-            subject_type="message",
-            subject_id=str(message_id or query_id or ''),
-            data={
-                "launcher_type": launcher_type_value,
-                "launcher_id": getattr(query, 'launcher_id', None),
-                "sender_id": str(getattr(query, 'sender_id', '')) if getattr(query, 'sender_id', None) else None,
-                "bot_uuid": getattr(query, 'bot_uuid', None),
-            },
-        )
-
-    @classmethod
-    def _build_input(
-        cls,
-        query: pipeline_query.Query,
-    ) -> AgentInput:
-        """Build AgentInput from Query."""
-        text = None
-        text_parts: list[str] = []
-        contents: list[dict[str, typing.Any]] = []
-
-        user_message = getattr(query, 'user_message', None)
-        if user_message:
-            content = getattr(user_message, 'content', None)
-            if isinstance(content, list):
-                for elem in content:
-                    elem_dict = None
-                    if hasattr(elem, 'model_dump'):
-                        elem_dict = elem.model_dump(mode='json')
-                    elif isinstance(elem, dict):
-                        elem_dict = elem
-
-                    if not isinstance(elem_dict, dict):
-                        continue
-
-                    contents.append(elem_dict)
-                    if elem_dict.get('type') == 'text':
-                        elem_text = elem_dict.get('text')
-                        if elem_text:
-                            text_parts.append(elem_text)
-            elif content is not None:
-                text = str(content)
-                contents.append({'type': 'text', 'text': text})
-
-        if not contents:
-            message_chain = getattr(query, 'message_chain', None) or []
-            for component in message_chain:
-                if isinstance(component, platform_message.Plain):
-                    component_text = getattr(component, 'text', '')
-                    if component_text:
-                        text_parts.append(component_text)
-                        contents.append({'type': 'text', 'text': component_text})
-                elif isinstance(component, platform_message.Image):
-                    image_base64 = getattr(component, 'base64', None)
-                    image_url = getattr(component, 'url', None)
-                    if image_base64:
-                        contents.append({'type': 'image_base64', 'image_base64': image_base64})
-                    elif image_url:
-                        contents.append({'type': 'image_url', 'image_url': {'url': image_url}})
-
-        if text_parts:
-            text = ''.join(text_parts)
-
-        attachments = cls._build_attachments(query, contents)
-
-        return AgentInput(
-            text=text,
-            contents=contents,
-            attachments=attachments,
-        )
-
-    @classmethod
-    def _build_attachments(
-        cls,
-        query: pipeline_query.Query,
-        contents: list[dict[str, typing.Any]],
-    ) -> list[dict[str, typing.Any]]:
-        """Extract attachments from query."""
-        attachments: list[dict[str, typing.Any]] = []
-        seen_keys: dict[tuple[str, str, str], set[str]] = {}
-
-        def add_attachment(attachment: dict[str, typing.Any]) -> None:
-            key = cls._attachment_dedupe_key(attachment)
-            if key is not None:
-                source = str(attachment.get('source') or '')
-                sources = seen_keys.setdefault(key, set())
-                if source and sources and source not in sources:
-                    return
-                if source:
-                    sources.add(source)
-            attachments.append(attachment)
-
-        for elem in contents:
-            elem_type = elem.get('type')
-
-            if elem_type == 'image_url':
-                image_url = elem.get('image_url') or {}
-                add_attachment({
-                    'type': 'image',
-                    'source': 'url',
-                    'url': image_url.get('url') if isinstance(image_url, dict) else str(image_url),
-                })
-            elif elem_type == 'image_base64':
-                add_attachment({
-                    'type': 'image',
-                    'source': 'base64',
-                    'content': elem.get('image_base64'),
-                })
-            elif elem_type == 'file_url':
-                add_attachment({
-                    'type': 'file',
-                    'source': 'url',
-                    'url': elem.get('file_url'),
-                    'name': elem.get('file_name'),
-                })
-            elif elem_type == 'file_base64':
-                add_attachment({
-                    'type': 'file',
-                    'source': 'base64',
-                    'content': elem.get('file_base64'),
-                    'name': elem.get('file_name'),
-                })
-
-        message_chain = getattr(query, 'message_chain', None)
-        if message_chain:
-            try:
-                message_components = iter(message_chain)
-            except TypeError:
-                message_components = iter(())
-
-            for component in message_components:
-                if isinstance(component, platform_message.Image):
-                    image_id = component.image_id or None
-                    image_url = component.url or None
-                    image_base64 = component.base64 or None
-                    add_attachment({
-                        'type': 'image',
-                        'source': 'message_chain',
-                        'id': image_id,
-                        'url': image_url,
-                        'content': image_base64,
-                    })
-                elif isinstance(component, platform_message.File):
-                    add_attachment({
-                        'type': 'file',
-                        'source': 'message_chain',
-                        'id': component.id or None,
-                        'name': component.name or None,
-                        'url': component.url or None,
-                        'content': component.base64 or None,
-                    })
-                elif isinstance(component, platform_message.Voice):
-                    add_attachment({
-                        'type': 'voice',
-                        'source': 'message_chain',
-                        'id': component.voice_id or None,
-                        'url': component.url or None,
-                        'content': component.base64 or None,
-                    })
-
-        return attachments
-
-    @classmethod
-    def _attachment_dedupe_key(
-        cls,
-        attachment: dict[str, typing.Any],
-    ) -> tuple[str, str, str] | None:
-        """Return a stable key for the same attachment across content sources."""
-        attachment_type = attachment.get('type')
-        if not attachment_type:
-            return None
-        for field in ('id', 'url', 'content'):
-            value = attachment.get(field)
-            if value:
-                if field == 'content':
-                    value = hashlib.sha256(str(value).encode('utf-8')).hexdigest()
-                return str(attachment_type), field, str(value)
-        return None
-
-    @classmethod
-    def _build_delivery_context(
-        cls,
-        query: pipeline_query.Query,
-    ) -> DeliveryContext:
-        """Build DeliveryContext from Query."""
-        message_chain = getattr(query, 'message_chain', None)
-        return DeliveryContext(
-            surface="platform",
-            reply_target={
-                "message_id": getattr(message_chain, 'message_id', None),
-            },
-            supports_streaming=True,
-            supports_edit=False,
-            supports_reaction=False,
-            platform_capabilities={},
-        )
-
-    @classmethod
-    def _build_raw_ref(
-        cls,
-        query: pipeline_query.Query,
-    ) -> RawEventRef | None:
-        """Build RawEventRef from Query."""
-        # For now, we don't store raw event payload
-        return None
-
-    @classmethod
-    def _extract_allowed_models(
-        cls,
-        query: pipeline_query.Query,
-    ) -> list[str] | None:
-        """Extract allowed model UUIDs from query."""
-        model_uuids: list[str] = []
-        model_uuid = getattr(query, 'use_llm_model_uuid', None)
-        if model_uuid:
-            model_uuids.append(model_uuid)
-
-        variables = getattr(query, 'variables', None) or {}
-        for fallback_uuid in variables.get('_fallback_model_uuids', []) or []:
-            if fallback_uuid and fallback_uuid not in model_uuids:
-                model_uuids.append(fallback_uuid)
-
-        return model_uuids or None
-
-    @classmethod
-    def _extract_allowed_tools(
-        cls,
-        query: pipeline_query.Query,
-    ) -> list[str] | None:
-        """Extract allowed tool names from query."""
-        use_funcs = getattr(query, 'use_funcs', None)
-        if not use_funcs:
-            return None
-        try:
-            tool_names = []
-            for func in use_funcs:
-                if isinstance(func, dict):
-                    name = func.get('name')
-                elif hasattr(func, 'name'):
-                    name = func.name
-                else:
-                    continue
-                if name:
-                    tool_names.append(name)
-            return tool_names if tool_names else None
-        except (TypeError, AttributeError):
-            return None
-
-    @classmethod
-    def _extract_allowed_kbs(
-        cls,
-        query: pipeline_query.Query,
-    ) -> list[str] | None:
-        """Extract allowed knowledge base UUIDs from query."""
-        variables = getattr(query, 'variables', None)
-        if not variables:
-            return None
-        kb_uuids = variables.get('_knowledge_base_uuids')
-        if kb_uuids:
-            return kb_uuids
-        return None
-
-    @classmethod
-    def _extract_allowed_skills(
-        cls,
-        query: pipeline_query.Query,
-    ) -> list[str] | None:
-        """Extract pipeline-visible skill names from query."""
-        variables = getattr(query, 'variables', None)
-        if not variables or '_pipeline_bound_skills' not in variables:
-            return None
-        bound_skills = variables.get('_pipeline_bound_skills')
-        if bound_skills is None:
-            return None
-        if not isinstance(bound_skills, list):
-            return []
-        return [str(skill_name) for skill_name in bound_skills if skill_name]
@@ -1,273 +0,0 @@
-"""Agent runner registry for discovering and caching runner descriptors."""
-
-from __future__ import annotations
-
-import typing
-import asyncio
-
-from langbot_plugin.api.entities.builtin.agent_runner.manifest import (
-    AgentRunnerManifest,
-)
-
-from ...core import app
-from .descriptor import AgentRunnerDescriptor
-from .id import parse_runner_id, format_runner_id
-from .errors import RunnerNotFoundError, RunnerNotAuthorizedError
-
-
-class AgentRunnerRegistry:
-    """Registry for discovering and managing agent runners.
-
-    Responsibilities:
-    - Discover runners from plugin runtime via LIST_AGENT_RUNNERS
-    - Validate runner manifests (kind, metadata, spec)
-    - Cache discovered runners for performance
-    - Filter runners by bound plugins
-    - Handle manifest errors gracefully (log warning, skip runner)
-    """
-
-    ap: app.Application
-
-    _cache: dict[str, AgentRunnerDescriptor] | None
-    """Cached runner descriptors keyed by runner ID"""
-
-    _cache_lock: asyncio.Lock
-    """Lock for cache refresh operations"""
-
-    def __init__(self, ap: app.Application):
-        self.ap = ap
-        self._cache = None
-        self._cache_lock = asyncio.Lock()
-
-    async def _discover_runners(self) -> dict[str, AgentRunnerDescriptor]:
-        """Discover runners from plugin runtime.
-
-        Always discovers ALL runners (no bound_plugins filter).
-        The cache should contain unfiltered discovery results.
-
-        Returns:
-            Dict of runner descriptors keyed by runner ID
-        """
-        if not self.ap.plugin_connector.is_enable_plugin:
-            return {}
-
-        runners: dict[str, AgentRunnerDescriptor] = {}
-
-        try:
-            # Always list all runners (bound_plugins=None)
-            plugin_runners = await self.ap.plugin_connector.list_agent_runners(None)
-
-            for runner_data in plugin_runners:
-                try:
-                    descriptor = self._validate_and_build_descriptor(runner_data)
-                    if descriptor is not None:
-                        runners[descriptor.id] = descriptor
-                except Exception as e:
-                    plugin_author = runner_data.get('plugin_author', 'unknown')
-                    plugin_name = runner_data.get('plugin_name', 'unknown')
-                    runner_name = runner_data.get('runner_name', 'unknown')
-                    self.ap.logger.warning(
-                        f'Invalid runner manifest for plugin:{plugin_author}/{plugin_name}/{runner_name}: {e}'
-                    )
-                    continue
-
-        except Exception as e:
-            self.ap.logger.warning(f'Failed to list agent runners from plugin runtime: {e}')
-            return {}
-
-        return runners
-
-    def _validate_and_build_descriptor(self, runner_data: dict[str, typing.Any]) -> AgentRunnerDescriptor | None:
-        """Validate runner manifest and build descriptor.
-
-        Args:
-            runner_data: Raw runner data from plugin runtime with fields:
-                - plugin_author, plugin_name, runner_name
-                - manifest (typed AgentRunnerManifest)
-
-        Returns:
-            AgentRunnerDescriptor if valid, None if invalid
-        """
-        plugin_author = runner_data.get('plugin_author', '')
-        plugin_name = runner_data.get('plugin_name', '')
-        runner_name = runner_data.get('runner_name', '')
-
-        if not plugin_author or not plugin_name or not runner_name:
-            return None
-
-        manifest = runner_data.get('manifest', {})
-        runner_id = format_runner_id(
-            source='plugin',
-            plugin_author=plugin_author,
-            plugin_name=plugin_name,
-            runner_name=runner_name,
-        )
-
-        typed_manifest = AgentRunnerManifest.model_validate(manifest)
-        config_schema = [
-            item.model_dump(mode='json') for item in typed_manifest.config_schema
-        ]
-
-        return AgentRunnerDescriptor(
-            id=runner_id,
-            source='plugin',
-            label=typed_manifest.label,
-            description=typed_manifest.description,
-            plugin_author=plugin_author,
-            plugin_name=plugin_name,
-            runner_name=runner_name,
-            plugin_version=runner_data.get('plugin_version'),
-            config_schema=config_schema,
-            capabilities=typed_manifest.capabilities,
-            permissions=typed_manifest.permissions,
-            raw_manifest=manifest,
-        )
-
-    async def refresh(self) -> None:
-        """Refresh runner cache.
-
-        Always discovers ALL runners (no bound_plugins filter).
-        The cache contains unfiltered discovery results.
-        """
-        async with self._cache_lock:
-            self._cache = await self._discover_runners()
-
-    async def list_runners(
-        self,
-        bound_plugins: list[str] | None = None,
-        use_cache: bool = True,
-    ) -> list[AgentRunnerDescriptor]:
-        """List available runners.
-
-        Args:
-            bound_plugins: Optional filter for bound plugins (applied locally)
-            use_cache: Use cached data if available
-
-        Returns:
-            List of runner descriptors
-        """
-        if use_cache and self._cache is not None:
-            # Filter from cache
-            return self._filter_runners_by_bound_plugins(self._cache, bound_plugins)
-
-        # Discover fresh (always full list)
-        runners = await self._discover_runners()
-
-        # Update cache (full list, unfiltered)
-        async with self._cache_lock:
-            self._cache = runners
-
-        # Filter locally
-        return self._filter_runners_by_bound_plugins(runners, bound_plugins)
-
-    def _filter_runners_by_bound_plugins(
-        self,
-        runners: dict[str, AgentRunnerDescriptor],
-        bound_plugins: list[str] | None,
-    ) -> list[AgentRunnerDescriptor]:
-        """Filter runners by bound plugins.
-
-        Args:
-            runners: Dict of runner descriptors
-            bound_plugins: Optional filter (None means all plugins allowed)
-
-        Returns:
-            Filtered list of runner descriptors
-        """
-        if bound_plugins is None:
-            # All plugins allowed
-            return list(runners.values())
-
-        allowed_plugin_ids = set(bound_plugins)
-        filtered = []
-        for descriptor in runners.values():
-            plugin_id = descriptor.get_plugin_id()
-            if plugin_id in allowed_plugin_ids:
-                filtered.append(descriptor)
-
-        return filtered
-
-    async def get(
-        self,
-        runner_id: str,
-        bound_plugins: list[str] | None = None,
-    ) -> AgentRunnerDescriptor:
-        """Get a specific runner descriptor.
-
-        Args:
-            runner_id: Runner ID to lookup
-            bound_plugins: Optional bound plugins filter
-
-        Returns:
-            AgentRunnerDescriptor
-
-        Raises:
-            RunnerNotFoundError: If runner not found
-            RunnerNotAuthorizedError: If runner not in bound plugins
-        """
-        # Parse and validate runner ID format
-        try:
-            parse_runner_id(runner_id)
-        except ValueError as e:
-            raise RunnerNotFoundError(runner_id) from e
-
-        # Get from cache or discover (always full list)
-        if self._cache is None:
-            await self.refresh()
-
-        if self._cache is None:
-            raise RunnerNotFoundError(runner_id)
-
-        descriptor = self._cache.get(runner_id)
-        if descriptor is None:
-            raise RunnerNotFoundError(runner_id)
-
-        # Check authorization
-        if bound_plugins is not None:
-            plugin_id = descriptor.get_plugin_id()
-            if plugin_id not in bound_plugins:
-                raise RunnerNotAuthorizedError(runner_id, bound_plugins)
-
-        return descriptor
-
-    async def get_runner_metadata_for_pipeline(self) -> tuple[list[dict[str, typing.Any]], list[dict[str, typing.Any]]]:
-        """Get runner metadata for pipeline configuration UI.
-
-        Returns an (options, stages) tuple: runner options and their config-schema stages for the DynamicForm.
-        """
-        # Get all runners (no bound plugin filter for metadata listing)
-        runners = await self.list_runners(bound_plugins=None)
-
-        options = []
-        stages = []
-
-        for descriptor in runners:
-            config_schema = []
-            for index, config_item in enumerate(descriptor.config_schema):
-                item = dict(config_item)
-                if not item.get('id'):
-                    item_name = item.get('name') or str(index)
-                    item['id'] = f'{descriptor.id}.{item_name}'
-                config_schema.append(item)
-
-            # Add runner option
-            options.append(
-                {
-                    'name': descriptor.id,
-                    'label': descriptor.label,
-                    'description': descriptor.description,
-                }
-            )
-
-            # Add config schema as stage if not empty
-            if descriptor.config_schema:
-                stages.append(
-                    {
-                        'name': descriptor.id,
-                        'label': descriptor.label,
-                        'description': descriptor.description,
-                        'config': config_schema,
-                    }
-                )
-
-        return options, stages
@@ -1,319 +0,0 @@
-"""Agent resource builder for constructing authorized resources."""
-from __future__ import annotations
-
-import typing
-
-from ...core import app
-from .descriptor import AgentRunnerDescriptor
-from .context_builder import (
-    AgentResources,
-    ModelResource,
-    ToolResource,
-    KnowledgeBaseResource,
-    SkillResource,
-    StorageResource,
-)
-from . import config_schema
-from .host_models import AgentEventEnvelope, AgentBinding
-
-
-class AgentResourceBuilder:
-    """Builder for constructing run-scoped AgentResources with permission filtering.
-
-    Responsibilities:
-    - Apply manifest permissions intersected with binding resource policy
-    - Build models list from authorized models
-    - Build tools list from bound plugins/MCP servers
-    - Build knowledge_bases list from config
-    - Build storage access summary
-
-    Note: This only builds the resource declaration. The actual proxy actions
-    in handler.py must still validate against ctx.resources at runtime.
-
-    Resource field names match the plugin SDK payload:
-    - ModelResource: model_id, model_type, provider
-    - ToolResource: tool_name, tool_type, description
-    - KnowledgeBaseResource: kb_id, kb_name, kb_type
-    - SkillResource: skill_name, display_name, description
-    - StorageResource: plugin_storage, workspace_storage
-    """
-
-    ap: app.Application
-
-    def __init__(self, ap: app.Application):
-        self.ap = ap
-
-    async def build_resources_from_binding(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-    ) -> AgentResources:
-        """Build AgentResources from event and binding.
-
-        This is the main entry point for Protocol v1.
-
-        Args:
-            event: Event envelope
-            binding: Agent binding with resource policy
-            descriptor: Runner descriptor with capabilities, permissions, and config schema
-
-        Returns:
-            AgentResources dict with filtered resource lists
-        """
-        resource_policy = binding.resource_policy
-        runner_config = binding.runner_config
-        manifest_perms = descriptor.permissions
-
-        # Build each resource category
-        models = await self._build_models_from_binding(
-            manifest_perms, resource_policy, descriptor, runner_config
-        )
-        tools = await self._build_tools_from_binding(
-            manifest_perms, resource_policy, descriptor
-        )
-        knowledge_bases = await self._build_knowledge_bases_from_binding(
-            manifest_perms, resource_policy, descriptor, runner_config
-        )
-        skills = self._build_skills_from_binding(
-            resource_policy, descriptor
-        )
-        storage = self._build_storage_from_binding(manifest_perms, binding)
-
-        return {
-            'models': models,
-            'tools': tools,
-            'knowledge_bases': knowledge_bases,
-            'skills': skills,
-            'storage': storage,
-            'platform_capabilities': {},  # Reserved for EBA
-        }
-
-    async def _build_models_from_binding(
-        self,
-        manifest_perms: typing.Any,
-        resource_policy: typing.Any,
-        descriptor: AgentRunnerDescriptor,
-        runner_config: dict[str, typing.Any],
-    ) -> list[ModelResource]:
-        """Build models list from binding."""
-        models: list[ModelResource] = []
-        seen_model_ids: set[str] = set()
-
-        model_perms = set(manifest_perms.models)
-        include_llm = bool({'invoke', 'stream'} & model_perms)
-        include_rerank = 'rerank' in model_perms
-        llm_operations = [operation for operation in ('invoke', 'stream') if operation in model_perms]
-        if not include_llm and not include_rerank:
-            return models
-
-        # Get additional model UUID grants from resource policy.
-        allowed_uuids = resource_policy.allowed_model_uuids
-
-        # Add model resources from Agent/runner config schema
-        await self._append_config_declared_model_resources(
-            models=models,
-            seen_model_ids=seen_model_ids,
-            descriptor=descriptor,
-            runner_config=runner_config,
-            include_llm=include_llm,
-            include_rerank=include_rerank,
-            llm_operations=llm_operations,
-        )
-
-        # Add explicitly allowed models
-        if allowed_uuids and include_llm:
-            for model_uuid in allowed_uuids:
-                await self._append_llm_model_resource(models, seen_model_ids, model_uuid, llm_operations)
-
-        return models
-
-    async def _build_tools_from_binding(
-        self,
-        manifest_perms: typing.Any,
-        resource_policy: typing.Any,
-        descriptor: AgentRunnerDescriptor,
-    ) -> list[ToolResource]:
-        """Build tools list from binding."""
-        tools: list[ToolResource] = []
-        tool_perms = set(manifest_perms.tools)
-        if not ({'detail', 'call'} & tool_perms):
-            return tools
-
-        if not config_schema.uses_host_tools(descriptor):
-            return tools
-
-        # Get tool names from resource policy
-        allowed_names = resource_policy.allowed_tool_names
-        tool_operations = [operation for operation in ('detail', 'call') if operation in tool_perms]
-
-        # Prefill full tool schema (best-effort) so runners can build LLM tool
-        # definitions without a per-tool get_tool_detail round-trip. Degrades to
-        # None when no tool manager is available.
-        get_tool_schema = getattr(getattr(self.ap, 'tool_mgr', None), 'get_tool_schema', None)
-        if allowed_names:
-            for tool_name in allowed_names:
-                if get_tool_schema is not None:
-                    description, parameters = await get_tool_schema(tool_name)
-                else:
-                    description, parameters = None, None
-                tools.append({
-                    'tool_name': tool_name,
-                    'tool_type': None,
-                    'description': description,
-                    'operations': tool_operations,
-                    'parameters': parameters,
-                })
-
-        return tools
-
-    async def _build_knowledge_bases_from_binding(
-        self,
-        manifest_perms: typing.Any,
-        resource_policy: typing.Any,
-        descriptor: AgentRunnerDescriptor,
-        runner_config: dict[str, typing.Any],
-    ) -> list[KnowledgeBaseResource]:
-        """Build knowledge bases list from binding."""
-        kb_resources: list[KnowledgeBaseResource] = []
-        kb_perms = set(manifest_perms.knowledge_bases)
-        if not ({'list', 'retrieve'} & kb_perms):
-            return kb_resources
-        kb_operations = [operation for operation in ('list', 'retrieve') if operation in kb_perms]
-
-        if not config_schema.uses_host_knowledge_bases(descriptor):
-            return kb_resources
-
-        # Get KB UUID grants from schema-defined config fields.
-        kb_uuids = config_schema.extract_knowledge_base_uuids(descriptor, runner_config)
-
-        # Also include resource policy grants.
-        allowed_uuids = resource_policy.allowed_kb_uuids
-        if allowed_uuids:
-            kb_uuids = list(dict.fromkeys([*kb_uuids, *allowed_uuids]))
-
-        for kb_uuid in kb_uuids:
-            try:
-                kb = await self.ap.rag_mgr.get_knowledge_base_by_uuid(kb_uuid)
-                if kb:
-                    kb_resources.append({
-                        'kb_id': kb_uuid,
-                        'kb_name': kb.get_name(),
-                        'kb_type': kb.knowledge_base_entity.kb_type if hasattr(kb.knowledge_base_entity, 'kb_type') else None,
-                        'operations': kb_operations,
-                    })
-            except Exception as e:
-                self.ap.logger.warning(f'Failed to build knowledge base resource {kb_uuid}: {e}')
-
-        return kb_resources
-
-    def _build_skills_from_binding(
-        self,
-        resource_policy: typing.Any,
-        descriptor: AgentRunnerDescriptor,
-    ) -> list[SkillResource]:
-        """Build pipeline-visible skill resource facts.
-
-        Skills are exposed as authorized tools (activate / register_skill / native
-        exec), so skill facts are surfaced to every run that has a skill manager,
-        not gated by the ``skill_authoring`` capability. The capability is now a
-        semantic declaration only.
-        """
-        skill_mgr = getattr(self.ap, 'skill_mgr', None)
-        if skill_mgr is None:
-            return []
-
-        loaded_skills = getattr(skill_mgr, 'skills', {}) or {}
-        allowed_names = resource_policy.allowed_skill_names
-        if allowed_names is None:
-            names = sorted(loaded_skills.keys())
-        else:
-            names = sorted(name for name in allowed_names if name in loaded_skills)
-
-        skills: list[SkillResource] = []
-        for skill_name in names:
-            skill_data = loaded_skills.get(skill_name) or {}
-            skills.append({
-                'skill_name': skill_name,
-                'display_name': skill_data.get('display_name') or skill_data.get('name') or skill_name,
-                'description': skill_data.get('description') or None,
-            })
-        return skills
-
-    def _build_storage_from_binding(
-        self,
-        manifest_perms: typing.Any,
-        binding: AgentBinding,
-    ) -> StorageResource:
-        """Build storage access summary from manifest and binding policy."""
-        resource_policy = binding.resource_policy
-        storage_perms = set(manifest_perms.storage)
-
-        return {
-            'plugin_storage': 'plugin' in storage_perms and resource_policy.allow_plugin_storage,
-            'workspace_storage': 'workspace' in storage_perms and resource_policy.allow_workspace_storage,
-        }
-
-    async def _append_config_declared_model_resources(
-        self,
-        models: list[ModelResource],
-        seen_model_ids: set[str],
-        descriptor: AgentRunnerDescriptor,
-        runner_config: dict[str, typing.Any],
-        include_llm: bool,
-        include_rerank: bool,
-        llm_operations: list[str],
-    ) -> None:
-        """Authorize model-like values selected through DynamicForm fields."""
-        for model_type, model_uuid in config_schema.iter_config_model_refs(descriptor, runner_config):
-            if model_type == 'llm' and include_llm:
-                await self._append_llm_model_resource(models, seen_model_ids, model_uuid, llm_operations)
-            elif model_type == 'rerank' and include_rerank:
-                await self._append_rerank_model_resource(models, seen_model_ids, model_uuid)
-
-    async def _append_llm_model_resource(
-        self,
-        models: list[ModelResource],
-        seen_model_ids: set[str],
-        model_uuid: str | None,
-        operations: list[str],
-    ) -> None:
-        """Append an LLM model resource if it exists and has not been added."""
-        if not model_uuid or model_uuid == '__none__' or model_uuid in seen_model_ids:
-            return
-
-        try:
-            model = await self.ap.model_mgr.get_model_by_uuid(model_uuid)
-            if model and model.model_entity:
-                models.append({
-                    'model_id': model_uuid,
-                    'model_type': getattr(model.model_entity, 'model_type', None),
-                    'provider': getattr(model.provider_entity, 'name', None) if hasattr(model, 'provider_entity') else None,
-                    'operations': operations,
-                })
-                seen_model_ids.add(model_uuid)
-        except Exception as e:
-            self.ap.logger.warning(f'Failed to build LLM model resource {model_uuid}: {e}')
-
-    async def _append_rerank_model_resource(
-        self,
-        models: list[ModelResource],
-        seen_model_ids: set[str],
-        model_uuid: str | None,
-    ) -> None:
-        """Append a rerank model resource if it exists and has not been added."""
-        if not model_uuid or model_uuid == '__none__' or model_uuid in seen_model_ids:
-            return
-
-        try:
-            model = await self.ap.model_mgr.get_rerank_model_by_uuid(model_uuid)
-            if model and model.model_entity:
-                models.append({
-                    'model_id': model_uuid,
-                    'model_type': getattr(model.model_entity, 'model_type', 'rerank') or 'rerank',
-                    'provider': getattr(model.provider_entity, 'name', None) if hasattr(model, 'provider_entity') else None,
-                    'operations': ['rerank'],
-                })
-                seen_model_ids.add(model_uuid)
-        except Exception as e:
-            self.ap.logger.warning(f'Failed to build rerank model resource {model_uuid}: {e}')
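Review note on the removed resource builders: two small patterns recur in them. Declared permissions are filtered into an ordered operation list, and UUID grants from config and policy are merged with order-preserving de-duplication. A minimal standalone sketch follows; `granted_operations` and `merge_grants` are hypothetical names used only for illustration and are not part of the removed module.

```python
from typing import Iterable, Optional


def granted_operations(declared: Iterable[str], supported: tuple) -> list:
    """Keep only the supported operations that were declared, in the supported order."""
    declared_set = set(declared)
    return [operation for operation in supported if operation in declared_set]


def merge_grants(config_uuids: list, policy_uuids: Optional[list]) -> list:
    """Merge two grant lists, dropping duplicates while keeping first-seen order."""
    # dict.fromkeys preserves insertion order, so config-declared grants come first.
    return list(dict.fromkeys([*config_uuids, *(policy_uuids or [])]))


if __name__ == '__main__':
    print(granted_operations(['stream', 'rerank', 'invoke'], ('invoke', 'stream')))
    print(merge_grants(['kb-a', 'kb-b'], ['kb-b', 'kb-c']))
```

The ordered tuple of supported operations is what keeps the resulting list stable regardless of how the manifest lists its permissions.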
@@ -1,234 +0,0 @@
-"""Agent result normalizer for converting AgentRunResult to Pipeline messages."""
-from __future__ import annotations
-
-import typing
-
-import pydantic
-from langbot_plugin.api.entities.builtin.agent_runner.result import (
-    ActionRequestedPayload,
-    MessageCompletedPayload,
-    MessageDeltaPayload,
-    RunCompletedPayload,
-    RunFailedPayload,
-    StateUpdatedPayload,
-    ToolCallCompletedPayload,
-    ToolCallStartedPayload,
-)
-from langbot_plugin.api.entities.builtin.provider import message as provider_message
-
-from ...core import app
-from .descriptor import AgentRunnerDescriptor
-from .errors import RunnerExecutionError, RunnerProtocolError
-
-
-# Maximum size for a single result payload (prevent memory exhaustion)
-MAX_RESULT_SIZE_BYTES = 1024 * 1024  # 1 MB
-
-STRICT_RESULT_PAYLOADS: dict[str, type[pydantic.BaseModel]] = {
-    'message.delta': MessageDeltaPayload,
-    'message.completed': MessageCompletedPayload,
-    'tool.call.started': ToolCallStartedPayload,
-    'tool.call.completed': ToolCallCompletedPayload,
-    'state.updated': StateUpdatedPayload,
-    'action.requested': ActionRequestedPayload,
-    'run.completed': RunCompletedPayload,
-    'run.failed': RunFailedPayload,
-}
-
-
-class AgentResultNormalizer:
-    """Normalizer for converting AgentRunResult to Pipeline messages.
-
-    Responsibilities:
-    - Accept only supported result types (message.delta, message.completed, etc.)
-    - Map message.delta -> MessageChunk
-    - Map message.completed -> Message
-    - Map run.completed (with message) -> Message
-    - Handle run.failed as controlled error
-    - Ignore unknown types with warning
-    - Validate result size
-    - Validate message schema
-
-    Accepted result types:
-    - message.delta
-    - message.completed
-    - tool.call.started
-    - tool.call.completed
-    - state.updated
-    - run.completed
-    - run.failed
-    - action.requested (log only, don't execute)
-    """
-
-    ap: app.Application
-
-    def __init__(self, ap: app.Application):
-        self.ap = ap
-
-    async def normalize(
-        self,
-        result_dict: dict[str, typing.Any],
-        descriptor: AgentRunnerDescriptor,
-    ) -> provider_message.Message | provider_message.MessageChunk | None:
-        """Normalize AgentRunResult to Message or MessageChunk.
-
-        Args:
-            result_dict: Raw result dict from plugin runtime
-            descriptor: Runner descriptor for error context
-
-        Returns:
-            Message, MessageChunk, or None (for non-message events)
-
-        Raises:
-            RunnerExecutionError: On run.failed
-            RunnerProtocolError: On invalid result format
-        """
-        # Validate result type
-        result_type = result_dict.get('type')
-        if not result_type:
-            raise RunnerProtocolError(descriptor.id, 'Missing result type')
-
-        # Validate result size
-        try:
-            import json
-            result_json = json.dumps(result_dict)
-            if len(result_json) > MAX_RESULT_SIZE_BYTES:
-                self.ap.logger.warning(
-                    f'Runner {descriptor.id} result too large ({len(result_json)} bytes), truncating'
-                )
-                # Truncate content if possible
-                data = result_dict.get('data', {})
-                if 'chunk' in data or 'message' in data:
-                    content = data.get('chunk', {}).get('content', '') or data.get('message', {}).get('content', '')
-                    if isinstance(content, str) and len(content) > 10000:
-                        # Keep reasonable length
-                        data['chunk'] = {'role': 'assistant', 'content': content[:10000] + '...[truncated]'}
-        except Exception as e:
-            self.ap.logger.warning(f'Failed to validate runner {descriptor.id} result size: {e}')
-
-        # Handle each result type
-        data = result_dict.get('data', {})
-
-        if not self.validate_payload(result_type, data, descriptor):
-            return None
-
-        if result_type == 'message.delta':
-            return self._normalize_message_delta(data, descriptor)
-
-        elif result_type == 'message.completed':
-            return self._normalize_message_completed(data, descriptor)
-
-        elif result_type == 'tool.call.started':
-            # Log only, don't yield to pipeline
-            self.ap.logger.debug(
-                f'Runner {descriptor.id} tool call started: {data.get("tool_name", "unknown")}'
-            )
-            return None
-
-        elif result_type == 'tool.call.completed':
-            # Log only, don't yield to pipeline
-            self.ap.logger.debug(
-                f'Runner {descriptor.id} tool call completed: {data.get("tool_name", "unknown")}'
-            )
-            return None
-
-        elif result_type == 'state.updated':
-            # Log for telemetry, don't yield to pipeline
-            # Orchestrator already handles the actual PersistentStateStore update.
-            scope = data.get('scope', 'unknown')
-            key = data.get('key', 'unknown')
-            value_repr = repr(data.get('value', '...'))[:100]  # Truncate for log
-            self.ap.logger.debug(
-                f'Runner {descriptor.id} state.updated logged: scope={scope}, key={key}, value={value_repr}'
-            )
-            return None
-
-        elif result_type == 'run.completed':
-            # May include final message
-            if 'message' in data:
-                return self._normalize_message_completed(data, descriptor)
-            # If no message, it's just completion signal
-            return None
-
-        elif result_type == 'run.failed':
-            error_msg = data.get('error', 'Unknown error')
-            error_code = data.get('code', 'unknown')
-            retryable = data.get('retryable', False)
-            raise RunnerExecutionError(
-                descriptor.id,
-                f'{error_msg} (code: {error_code})',
-                retryable=retryable,
-            )
-
-        elif result_type == 'action.requested':
-            # Reserved for EBA - log only, don't execute
-            self.ap.logger.info(
-                f'Runner {descriptor.id} requested action (not executed in current phase): '
-                f'{data.get("action", "unknown")}'
-            )
-            return None
-
-        else:
-            # Unknown type - warn and ignore.
-            self.ap.logger.warning(
-                f'Runner {descriptor.id} returned unknown result type: {result_type}. '
-                f'Expected supported types (message.delta, message.completed, run.completed, run.failed, etc.)'
-            )
-            return None
-
-    def validate_payload(
-        self,
-        result_type: str,
-        data: typing.Any,
-        descriptor: AgentRunnerDescriptor,
-    ) -> bool:
-        """Validate typed payloads that affect Host state or delivery.
-
-        Tool-call telemetry stays intentionally loose so older runners can keep
-        emitting diagnostic fields. Unknown result types are handled by the
-        caller and are not validated here.
-        """
-        payload_model = STRICT_RESULT_PAYLOADS.get(result_type)
-        if payload_model is None:
-            return True
-
-        try:
-            payload_model.model_validate(data)
-            return True
-        except Exception as e:
-            self.ap.logger.warning(
-                f'Runner {descriptor.id} returned invalid {result_type} payload; dropping result: {e}'
-            )
-            return False
-
-    def _normalize_message_delta(
-        self,
-        data: dict[str, typing.Any],
-        descriptor: AgentRunnerDescriptor,
-    ) -> provider_message.MessageChunk:
-        """Normalize message.delta to MessageChunk."""
-        chunk_data = data.get('chunk', {})
-        if not chunk_data:
-            raise RunnerProtocolError(descriptor.id, 'message.delta missing chunk data')
-
-        try:
-            chunk = provider_message.MessageChunk.model_validate(chunk_data)
-            return chunk
-        except Exception as e:
-            raise RunnerProtocolError(descriptor.id, f'Invalid chunk schema: {e}')
-
-    def _normalize_message_completed(
-        self,
-        data: dict[str, typing.Any],
-        descriptor: AgentRunnerDescriptor,
-    ) -> provider_message.Message:
-        """Normalize message.completed to Message."""
-        message_data = data.get('message', {})
-        if not message_data:
-            raise RunnerProtocolError(descriptor.id, 'message.completed missing message data')
-
-        try:
-            msg = provider_message.Message.model_validate(message_data)
-            return msg
-        except Exception as e:
-            raise RunnerProtocolError(descriptor.id, f'Invalid message schema: {e}')
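Review note on the removed size guard in `normalize`: it serializes the result, compares the length against `MAX_RESULT_SIZE_BYTES`, and shortens string content longer than 10000 characters. The sketch below shows that idea in isolation under stated assumptions. `truncate_oversized`, `MAX_CONTENT_CHARS`, and `SUFFIX` are hypothetical names. Unlike the removed code, which always writes the shortened text under `chunk`, this sketch writes it back under the key it was read from; that is a choice made here for illustration, not the removed module's behaviour.

```python
import json

MAX_RESULT_SIZE_BYTES = 1024 * 1024  # same 1 MB limit as the removed module
MAX_CONTENT_CHARS = 10000            # hypothetical constant for the inline 10000
SUFFIX = '...[truncated]'


def truncate_oversized(result: dict, limit: int = MAX_RESULT_SIZE_BYTES) -> bool:
    """Shorten oversized string content in place; return True if anything changed."""
    # json.dumps escapes non-ASCII by default, so the string length equals its byte length.
    if len(json.dumps(result)) <= limit:
        return False
    data = result.get('data') or {}
    for key in ('chunk', 'message'):
        part = data.get(key)
        if not isinstance(part, dict):
            continue
        content = part.get('content')
        if isinstance(content, str) and len(content) > MAX_CONTENT_CHARS:
            part['content'] = content[:MAX_CONTENT_CHARS] + SUFFIX
            return True
    return False
```

Passing a small `limit` makes the guard easy to exercise in a unit test without building a 1 MB payload.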
@@ -1,412 +0,0 @@
-"""Run-side effects for AgentRunner executions."""
-
-from __future__ import annotations
-
-import typing
-
-from ...core import app
-from .descriptor import AgentRunnerDescriptor
-from .errors import RunnerProtocolError
-from .host_models import AgentBinding, AgentEventEnvelope
-from .persistent_state_store import PersistentStateStore, get_persistent_state_store
-from .run_ledger_store import RunLedgerStore
-
-
-class AgentRunJournal:
-    """Persist run events, transcript records, and state updates."""
-
-    ap: app.Application
-
-    _persistent_state_store: PersistentStateStore | None
-    _run_ledger_store: RunLedgerStore | None
-
-    def __init__(self, ap: app.Application):
-        self.ap = ap
-        self._persistent_state_store = None
-        self._run_ledger_store = None
-
-    def _get_run_ledger_store(self) -> RunLedgerStore:
-        if self._run_ledger_store is None:
-            self._run_ledger_store = RunLedgerStore(self.ap.persistence_mgr.get_db_engine())
-        return self._run_ledger_store
-
-    @staticmethod
-    def _to_plain_dict(value: typing.Any) -> dict[str, typing.Any]:
-        if hasattr(value, 'model_dump'):
-            value = value.model_dump(mode='json')
-        if isinstance(value, dict):
-            return dict(value)
-        return {}
-
-    @classmethod
-    def _sanitize_content_item(cls, value: typing.Any) -> typing.Any:
-        item = cls._to_plain_dict(value)
-        if not item:
-            return value
-        item_type = item.get('type')
-        if item_type == 'image_base64' and item.get('image_base64'):
-            item['image_base64'] = None
-            item['content_redacted'] = True
-        elif item_type == 'file_base64' and item.get('file_base64'):
-            item['file_base64'] = None
-            item['content_redacted'] = True
-        return item
-
-    @classmethod
-    def _sanitize_attachment_ref(cls, value: typing.Any) -> dict[str, typing.Any]:
-        item = cls._to_plain_dict(value)
-        if item.get('content'):
-            item['content'] = None
-            item['content_redacted'] = True
-        return item
-
-    @classmethod
-    def _sanitize_contents(cls, contents: typing.Iterable[typing.Any]) -> list[typing.Any]:
-        return [cls._sanitize_content_item(content) for content in contents]
-
-    @classmethod
-    def _sanitize_attachments(cls, attachments: typing.Iterable[typing.Any]) -> list[dict[str, typing.Any]]:
-        return [cls._sanitize_attachment_ref(attachment) for attachment in attachments]
-
-    async def create_run(
-        self,
-        *,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-        context: dict[str, typing.Any],
-        authorization: dict[str, typing.Any],
-    ) -> dict[str, typing.Any]:
-        """Create the Host-owned run ledger record."""
-        runtime = context.get('runtime') if isinstance(context, dict) else {}
-        return await self._get_run_ledger_store().create_run(
-            run_id=context['run_id'],
-            event_id=event.event_id,
-            binding_id=binding.binding_id,
-            runner_id=descriptor.id,
-            conversation_id=event.conversation_id,
-            thread_id=event.thread_id,
-            workspace_id=event.workspace_id,
-            bot_id=event.bot_id,
-            deadline_at=runtime.get('deadline_at') if isinstance(runtime, dict) else None,
-            authorization=authorization,
-            metadata={
-                'event_type': event.event_type,
-                'source': event.source,
-            },
-        )
-
-    async def append_run_result(
-        self,
-        *,
-        result_dict: dict[str, typing.Any],
-        run_id: str,
-        sequence: int,
-        source: str = 'runner',
-        metadata: dict[str, typing.Any] | None = None,
-    ) -> dict[str, typing.Any]:
-        """Persist one AgentRunResult in the run ledger."""
-        usage = result_dict.get('usage')
-        if hasattr(usage, 'model_dump'):
-            usage = usage.model_dump(mode='json')
-        return await self._get_run_ledger_store().append_event(
-            run_id=run_id,
-            sequence=sequence,
-            event_type=str(result_dict.get('type') or 'unknown'),
-            data=result_dict.get('data') if isinstance(result_dict.get('data'), dict) else {},
-            usage=usage if isinstance(usage, dict) else None,
-            source=source,
-            metadata=metadata,
-        )
-
-    async def finalize_run(
-        self,
-        *,
-        run_id: str,
-        status: str,
-        status_reason: str | None = None,
-        usage: dict[str, typing.Any] | None = None,
-        metadata: dict[str, typing.Any] | None = None,
-    ) -> dict[str, typing.Any] | None:
-        """Finalize or update the Host-owned run ledger record."""
-        return await self._get_run_ledger_store().finalize_run(
-            run_id=run_id,
-            status=status,
-            status_reason=status_reason,
-            usage=usage,
-            metadata=metadata,
-        )
-
-    async def get_run(self, run_id: str) -> dict[str, typing.Any] | None:
-        """Return the persisted run ledger record."""
-        return await self._get_run_ledger_store().get_run(run_id)
-
-    async def handle_state_updated_event(
-        self,
-        result_dict: dict[str, typing.Any],
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        descriptor: AgentRunnerDescriptor,
-        run_id: str | None = None,
-    ) -> None:
-        """Handle state.updated result in event-first mode."""
-        data = result_dict.get('data', {})
-
-        result_run_id = result_dict.get('run_id')
-        if run_id and result_run_id and result_run_id != run_id:
-            raise RunnerProtocolError(
-                descriptor.id,
-                f'state.updated run_id mismatch: expected {run_id}, got {result_run_id}',
-            )
-
-        scope = data.get('scope')
-        if not scope:
-            raise RunnerProtocolError(
-                descriptor.id,
-                'state.updated missing required field: scope',
-            )
-
-        key = data.get('key')
-        value = data.get('value')
-
-        if not key:
-            raise RunnerProtocolError(
-                descriptor.id,
-                'state.updated missing required field: key',
-            )
-
-        if self._persistent_state_store is None:
-            self._persistent_state_store = get_persistent_state_store(self.ap.persistence_mgr.get_db_engine())
-
-        success, error = await self._persistent_state_store.apply_update_from_event(
-            event=event,
-            binding=binding,
-            descriptor=descriptor,
-            scope=scope,
-            key=key,
-            value=value,
-            logger=self.ap.logger,
-        )
-
-        if success:
-            self.ap.logger.debug(f'Runner {descriptor.id} state.updated (event mode): scope={scope}, key={key}')
-        elif error:
-            self.ap.logger.warning(f'Runner {descriptor.id} state.updated rejected: {error}')
-
-    async def write_event_log(
-        self,
-        event: AgentEventEnvelope,
-        binding: AgentBinding,
-        run_id: str,
-        runner_id: str,
-        metadata: dict[str, typing.Any] | None = None,
-    ) -> str:
-        """Write incoming event to EventLog."""
-        import datetime
-
-        from .event_log_store import EventLogStore
-
-        store = EventLogStore(self.ap.persistence_mgr.get_db_engine())
-
-        input_summary = None
-        input_json = None
-        if event.input:
-            if event.input.text:
-                input_summary = event.input.text[:1000]
-            input_json = {
-                'text': event.input.text,
-                'contents': self._sanitize_contents(event.input.contents),
-                'attachments': self._sanitize_attachments(event.input.attachments),
-            }
-
-        return await store.append_event(
-            event_id=event.event_id,
-            event_type=event.event_type,
-            source=event.source,
-            bot_id=event.bot_id,
-            workspace_id=event.workspace_id,
-            conversation_id=event.conversation_id,
-            thread_id=event.thread_id,
-            actor_type=event.actor.actor_type if event.actor else None,
-            actor_id=event.actor.actor_id if event.actor else None,
-            actor_name=event.actor.actor_name if event.actor else None,
-            subject_type=event.subject.subject_type if event.subject else None,
-            subject_id=event.subject.subject_id if event.subject else None,
-            input_summary=input_summary,
-            input_json=input_json,
-            run_id=run_id,
-            runner_id=runner_id,
-            event_time=(
-                datetime.datetime.fromtimestamp(event.event_time, datetime.timezone.utc) if event.event_time else None
-            ),
-            metadata=metadata,
-        )
-
-    async def write_user_transcript(
-        self,
-        event: AgentEventEnvelope,
-        event_log_id: str,
-    ) -> None:
-        """Write user message to Transcript."""
-        from .transcript_store import TranscriptStore
-
-        store = TranscriptStore(self.ap.persistence_mgr.get_db_engine())
-
-        content = event.input.text if event.input else None
-        content_json = None
-        if event.input:
-            content_json = {
-                'role': 'user',
-                'content': self._sanitize_contents(event.input.contents) if event.input.contents else [],
-            }
-
-        attachment_refs = []
-        if event.input and event.input.attachments:
-            for a in event.input.attachments:
-                attachment_refs.append(self._sanitize_attachment_ref(a))
-
-        await store.append_transcript(
-            transcript_id=None,
-            event_id=event_log_id,
-            conversation_id=event.conversation_id,
-            role='user',
-            bot_id=event.bot_id,
-            workspace_id=event.workspace_id,
-            content=content,
-            content_json=content_json,
-            attachment_refs=attachment_refs if attachment_refs else None,
-            thread_id=event.thread_id,
-            item_type='message',
-            metadata={
-                'actor_type': event.actor.actor_type if event.actor else None,
-                'actor_id': event.actor.actor_id if event.actor else None,
-            },
-        )
-
-    async def write_steering_dropped_audits(
-        self,
-        items: list[dict[str, typing.Any]],
-        run_id: str,
-        runner_id: str,
-        *,
-        reason: str = 'run_ended',
-    ) -> None:
-        """Write terminal audit events for steering items left unconsumed."""
-        if not items:
-            return
-
-        import datetime
-        import uuid
-
-        from .event_log_store import EventLogStore
-
-        store = EventLogStore(self.ap.persistence_mgr.get_db_engine())
-
-        for item in items:
-            event = item.get('event') if isinstance(item.get('event'), dict) else {}
-            input_data = item.get('input') if isinstance(item.get('input'), dict) else {}
-            conversation = item.get('conversation') if isinstance(item.get('conversation'), dict) else {}
-            actor = item.get('actor') if isinstance(item.get('actor'), dict) else {}
-            subject = item.get('subject') if isinstance(item.get('subject'), dict) else {}
-
-            text = input_data.get('text')
-            input_summary = text[:1000] if isinstance(text, str) and text else 'Unconsumed steering input dropped'
-            event_time = None
-            raw_event_time = event.get('event_time')
-            if raw_event_time:
-                try:
-                    event_time = datetime.datetime.fromtimestamp(
-                        raw_event_time,
-                        datetime.timezone.utc,
-                    )
-                except (TypeError, ValueError, OSError):
-                    event_time = None
-
-            await store.append_event(
-                event_id=str(uuid.uuid4()),
-                event_type='steering.dropped',
-                source='host',
-                bot_id=conversation.get('bot_id'),
-                workspace_id=conversation.get('workspace_id'),
-                conversation_id=conversation.get('conversation_id'),
-                thread_id=conversation.get('thread_id'),
-                actor_type=actor.get('actor_type'),
-                actor_id=actor.get('actor_id'),
-                actor_name=actor.get('actor_name'),
-                subject_type=subject.get('subject_type'),
-                subject_id=subject.get('subject_id'),
-                input_summary=input_summary,
-                input_json={
-                    'text': text,
-                    'contents': self._sanitize_contents(input_data.get('contents') or []),
-                    'attachments': self._sanitize_attachments(input_data.get('attachments') or []),
-                },
-                run_id=run_id,
-                runner_id=runner_id,
-                event_time=event_time,
-                metadata={
-                    'steering': {
-                        'status': 'dropped',
-                        'reason': reason,
-                        'original_event_id': event.get('event_id'),
-                        'claimed_run_id': item.get('claimed_run_id'),
-                        'claimed_runner_id': item.get('runner_id'),
-                        'claimed_at': item.get('claimed_at'),
-                    },
-                },
-            )
-
-    async def write_assistant_transcript(
-        self,
-        result_dict: dict[str, typing.Any],
-        event: AgentEventEnvelope,
-        run_id: str,
-        runner_id: str,
-    ) -> None:
-        """Write assistant message to Transcript."""
-        import uuid
-
-        from .transcript_store import TranscriptStore
-
-        store = TranscriptStore(self.ap.persistence_mgr.get_db_engine())
-
-        data = result_dict.get('data', {})
-        message = data.get('message', {})
-
-        content = None
-        content_json = None
-
-        if isinstance(message.get('content'), str):
-            content = message['content']
-            content_json = message
-        elif isinstance(message.get('content'), list):
-            text_parts = []
-            for c in message['content']:
-                if isinstance(c, dict) and c.get('type') == 'text':
-                    text_parts.append(c.get('text', ''))
-            content = ' '.join(text_parts) if text_parts else None
-            content_json = {
-                **message,
-                'content': self._sanitize_contents(message['content']),
-            }
-
-        assistant_event_id = str(uuid.uuid4())
-
-        await store.append_transcript(
-            transcript_id=str(uuid.uuid4()),
-            event_id=assistant_event_id,
-            conversation_id=event.conversation_id,
-            role='assistant',
-            bot_id=event.bot_id,
-            workspace_id=event.workspace_id,
-            content=content,
-            content_json=content_json,
-            thread_id=event.thread_id,
-            item_type='message',
-            run_id=run_id,
-            runner_id=runner_id,
-            metadata={
-                'run_id': run_id,
-                'runner_id': runner_id,
-            },
-        )
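The removed `write_assistant_transcript` derives a plain-text `content` from a message whose `content` is either a string or a list of typed parts. Below is a standalone sketch of that flattening step only. `flatten_text` is a hypothetical helper name that does not appear in the diff, and the sanitizing of `content_json` is left out.

```python
from typing import Any, Dict, Optional


def flatten_text(message: Dict[str, Any]) -> Optional[str]:
    """Return the plain text of a message, or None if it has no text.

    Hypothetical helper mirroring the branch logic in the removed method:
    a str content is returned as-is; a list content keeps only the parts
    typed 'text' and joins them with single spaces.
    """
    content = message.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        ]
        # An all-non-text list yields None, matching the removed code.
        return " ".join(parts) if parts else None
    return None
```

Non-text parts such as images are skipped rather than stringified, so a message with only attachments is stored with `content=None`.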
Author	SHA1	Message	Date
huanghuoguoguo	9d877b41c2	test(skills): clarify manual QA perf gates	2026-06-25 20:46:31 +08:00
huanghuoguoguo	9b0f5b36f3	test(skills): add debug chat timing and isolation probes	2026-06-25 13:34:30 +08:00
huanghuoguoguo	7e36869494	test(skills): extend fake provider load profiles	2026-06-25 12:54:08 +08:00
huanghuoguoguo	d59b49ec55	test(skills): add debug chat load gate	2026-06-25 11:48:59 +08:00
huanghuoguoguo	8749a9b56f	test(skills): prepare user path performance gate	2026-06-25 10:07:04 +08:00
huanghuoguoguo	67437c2f5a	Add performance and reliability QA gates	2026-06-25 00:07:37 +08:00
RockChinQ	74a18191dd	docs(readme): default docker compose command starts the sandbox The plain `docker compose up -d` leaves the Box sandbox runtime off (it's gated behind the box/all profile), so sandbox tools, skill add/edit and stdio MCP don't work out of the box. Use `docker compose --profile all up -d` across all 9 README translations so the default quick-start brings up the sandbox-capable stack.	2026-06-21 13:18:44 -04:00
RockChinQ	a15c98eb06	fix(web): point plugin help links to working docs URL The in-product plugin/add-extension help links went through link.langbot.app/{lang}/docs/plugins, which now 404s (it resolved to the removed /usage/plugin/plugin-intro path). Point them directly at the current docs page docs.langbot.app/{lang}/plugin/plugin-intro (verified 200 for zh/en/ja).	2026-06-21 12:59:13 -04:00
RockChinQ	cbe17cde6c	fix(web): provider card overflow on mobile via grid/flex min-width floor The previous truncate/shrink-0 pass only touched leaf nodes, but the min-content floor was set by two ancestors: the flex-1 left group lacked min-w-0, and CardHeader is a CSS grid whose implicit single column defaults to min-content. Constrain both (min-w-0 on the header grid + explicit grid-cols-[minmax(0,1fr)], min-w-0 on the inner flex groups) so the provider name / base_url+key subtitle actually truncate instead of forcing the card — and the whole settings modal — wider than the viewport.	2026-06-21 12:54:24 -04:00
RockChinQ	876e8bf804	fix(web): mobile overflow in settings panels - PanelToolbar: allow wrapping and tighten padding on small screens so the primary action (e.g. "创建 API 密钥") no longer runs off the dialog edge. - ProviderCard header: let the provider name truncate and pin the model-count badge and right-side action group with shrink-0 so credits / + controls stay inside the card on narrow viewports.	2026-06-21 12:48:18 -04:00
RockChinQ	b3848c9d05	feat(web): make tooltips tap-toggleable on touch devices Radix tooltips open on hover/focus only and stay closed on touch input, so on mobile every hover tooltip was unreachable. Detect coarse/no-hover pointers via matchMedia and drive the tooltip's open state ourselves so a tap on the trigger toggles it. Desktop hover/focus behaviour is unchanged (we only intercept the tap when the device has no hover capability). Fixes all tooltips app-wide from the shared primitive.	2026-06-21 12:46:18 -04:00
RockChinQ	85743cc75f	fix(tests): make Postgres migration head test revision-agnostic The PostgreSQL migration test had the same hardcoded 0005 head assertion as the SQLite one; resolve the actual head from the Alembic ScriptDirectory so 0006 (and future migrations) don't break it.	2026-06-21 12:10:20 -04:00
RockChinQ	c689b10c0d	fix(mcp): ruff format remote-mode files; make migration head test revision-agnostic CI follow-up to the local/remote MCP work: - Apply ruff format to provider/tools/loaders/mcp.py and the 0006 normalize-remote-mode migration (Lint job failed on formatting). - test_migrations.py hardcoded the head revision as 0005_*, which broke once 0006 landed. Resolve the actual head from the Alembic ScriptDirectory so future migrations don't require editing the test.	2026-06-21 12:04:37 -04:00
RockChinQ	812b1fff4c	fix(web): stop spurious page refresh on account menu open; plugin log auto-refresh as switch Two unrelated frontend fixes: - LanguageSelector mounts each time the sidebar account dropdown opens and unconditionally called i18n.changeLanguage() on mount, emitting a languageChanged event even when the language was unchanged. That handed every useTranslation() consumer a fresh `t` reference, re-running effects keyed on `t` (e.g. the plugins page system-status fetch) and surfacing as a page "refresh". Guard the call so it only fires on an actual change. - Plugin logs auto-refresh control changed from a toggle Button to a Switch + Label; the on/off button i18n keys are replaced by a single static logsAutoRefresh label across all 8 locales.	2026-06-21 11:58:01 -04:00
RockChinQ	9daf22d661	feat(plugin-market): align recommendation carousel with Space (pause + countdown ring) Port the Space marketplace recommendation carousel UX into the in-app add-extension page: a 10s auto-advance driven by a smooth countdown ring that doubles as a pause/resume toggle, and manual prev/next now reset the countdown. Adds market.recommendation.{pause,resume} across 8 locales.	2026-06-21 11:48:39 -04:00
RockChinQ	42a2c70b14	style(plugin-market): widen marketplace cards via auto-fill min width Replace fixed grid-cols breakpoints (which forced up to 4 narrow cards on wide screens) with auto-fill columns and a 24rem minimum card width on both the main market grid and the featured recommendation rows. The featured rows already measure real column count via ResizeObserver, so pagination adapts automatically.	2026-06-21 11:21:52 -04:00
RockChinQ	64ed6d994b	feat(mcp): simplify external MCP server config to local/remote modes Replace the three-way transport choice (stdio / sse / httpstream) for connecting LangBot to external MCP servers with two modes: local (stdio) and remote. Remote servers only require a URL; the runtime auto-detects the transport (tries Streamable HTTP, falls back to SSE). - provider/tools/loaders/mcp.py: add _init_remote_server() with Streamable-HTTP-then-SSE probing; dispatch 'remote' lifecycle, keep legacy sse/http branches for back-compat - plugin/connector.py: normalize legacy http/sse marketplace modes to 'remote' on Space install, preserving connection params - entity/persistence/mcp.py: document mode as stdio, remote (legacy: sse, http) - alembic 0006: idempotent data migration mapping existing sse/http rows to remote (downgrade maps back to http) - api/http/service/mcp.py: stash runtime_info (status + tool list) into test task metadata before tearing down the temp session - web: collapse mode dropdown to local/remote, remote renders URL+timeout only, edit auto-maps legacy sse/http to remote; show tools after test in create mode from task metadata; remove dead plugins/mcp-server/ tree - i18n: local/remote labels + mode/url hints across 8 locales	2026-06-21 11:20:32 -04:00
RockChinQ	2ff854f79a	build(Dockerfile): install Node.js LTS so sandbox can run npx-based stdio MCP servers The final runtime image (used by langbot/plugin_runtime/box) shipped uv and docker-cli but no node, so any npx-launched stdio MCP server inside the box sandbox exited with return_code=127 (command not found). Install Node.js 22 LTS via NodeSource; node/npx land in /usr/bin, which is on the nsjail read-only mount whitelist (_READONLY_SYSTEM_MOUNTS) and is bound into the sandbox chroot automatically.	2026-06-21 08:15:02 -04:00
RockChinQ	52c096ea4c	chore(deps): patch Dependabot vulns (Python + JS) Python (pyproject.toml + uv.lock): - aiohttp 3.14.0 -> 3.14.1 (8 alerts: medium+low) - cryptography -> 49.0.0 (high, floor 48.0.1) - langchain -> 1.3.10 (medium, floor 1.3.9) - langsmith -> 0.8.18 (high) - starlette 1.2.1 -> 1.3.1 (high+low, transitive) - pydantic-settings 2.12.0 -> 2.14.2 (medium, transitive) - torch 2.10.0 -> 2.12.1 (low, transitive; py>=3.14 only) JS (web/, dual lockfile npm+pnpm in sync): - vite ^8.0.5 -> ^8.0.16 (high+medium) - js-yaml -> 4.2.0 (medium, override >=4.2.0 <5) - form-data -> 4.0.6 (high, override) Unfixable (no upstream patch, left + reported): - chromadb critical <=1.5.9 (1.5.9 is latest) - PyPDF2 medium (deprecated; needs pypdf migration) Verified: uv sync + import check, pnpm frozen-lockfile, vite build.	2026-06-21 07:43:54 -04:00
Junyan Chin	eda80030b5	Improve README_CN.md with clearer Star instructions Update instructions for starring and watching the repository.	2026-06-21 19:34:34 +08:00
RockChinQ	dfbd176e42	docs(readme): move Star & Watch CTA after Key Capabilities, host gif on langbot.app	2026-06-21 07:33:00 -04:00
RockChinQ	6ddd24ae68	docs(readme): restore Star & Watch CTA with star.gif across all locales	2026-06-21 07:25:46 -04:00
RockChinQ	a2cdbb621b	Merge branch 'fix/mobile-responsive-pages': mobile responsive fixes for dashboard, plugin detail & models dialog	2026-06-20 10:58:43 -04:00
RockChinQ	b92d54254b	fix(web): improve mobile responsiveness of dashboard, plugin detail & models dialog - monitoring: stack filters full-width, scrollable tab bar, reduce card/content padding on mobile - models dialog: provider form modal no longer overflows viewport on small screens; shared panel body padding shrinks on mobile - plugin logs: reduce horizontal padding on mobile	2026-06-20 10:42:35 -04:00
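
Commit 64ed6d994b describes the remote mode as auto-detecting the transport: it tries Streamable HTTP first and falls back to SSE. The sketch below illustrates that probe order only. The connector functions are hypothetical stand-ins, not the real MCP client calls or the actual `_init_remote_server()` body, and the URL is a placeholder.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

# A connector takes a URL and returns some session handle (a str here).
Connector = Callable[[str], Awaitable[str]]


async def connect_with_fallback(
    url: str, connectors: Sequence[Tuple[str, Connector]]
) -> Tuple[str, str]:
    """Try each transport in order; return (transport_name, session)."""
    errors: List[str] = []
    for name, connect in connectors:
        try:
            return name, await connect(url)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all transports failed: " + "; ".join(errors))


async def fake_streamable_http(url: str) -> str:
    # Stand-in: pretend the server does not speak Streamable HTTP.
    raise RuntimeError("405 Method Not Allowed")


async def fake_sse(url: str) -> str:
    # Stand-in: pretend the SSE handshake succeeded.
    return f"sse-session:{url}"


# Probe order taken from the commit message: Streamable HTTP, then SSE.
PROBE_ORDER = [("streamable_http", fake_streamable_http), ("sse", fake_sse)]
```

Calling `asyncio.run(connect_with_fallback("http://localhost:8000/mcp", PROBE_ORDER))` with these stand-ins selects the `sse` transport, because the first probe raises. If every probe fails, the collected errors are raised together as one `ConnectionError`.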