chore: commit workspace changes

2026-07-25 13:56:08 +00:00 · 2026-06-10 22:46:13 +08:00
parent d68e9b7c33
commit 14bcd90783
17 changed files with 483 additions and 231 deletions
@@ -2,7 +2,7 @@

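 This document explains the **design rationale** for context boundaries when AgentRunners are plugins. Conclusion first: LangBot should not become the final agentic context manager. It provides a context substrate, and the AgentRunner, or the runtime behind it, decides for itself how to manage history, compaction, recall, and the KV cache.

-> The data structures involved (`AgentRunContext`, `ContextAccess`, `AgentRunAPIProxy`, etc.) are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md). This document covers semantics and constraints only and does not copy the schemas. For implementation progress, see [PROGRESS.md](./PROGRESS.md).
+> The data structures involved (`AgentRunContext`, `ContextAccess`, `AgentRunAPIProxy`, etc.) are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md). This document covers semantics and constraints only and does not copy the schemas.

 ## 1. Design Principles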

@@ -77,17 +77,9 @@ await api.history_page(conversation_id=ctx.context.conversation_id,
                       limit=50, direction="backward", include_artifacts=False)
 ```

-Returns:
+Returns `HistoryPage` (schema in PROTOCOL_V1 §8).

-```python
-class HistoryPage(BaseModel):
-    items: list[TranscriptItem]
-    next_cursor: str | None
-    prev_cursor: str | None
-    has_more: bool
-```
-
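-Constraints: `limit` has a host hard cap. By default only the current conversation / thread can be read, and cross-conversation reads require manifest permission + binding policy. Artifact refs are returned; large file contents are not returned by default.
+Constraints: `limit` has a host hard cap. By default only the current conversation / thread can be read, and cross-conversation reads must be authorized by the binding policy / run authorization snapshot. Artifact refs are returned; large file contents are not returned by default.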

 ### 4.2 Search

@@ -102,7 +94,7 @@ Search can start with a database full-text index and later add embedding recall. It is host
 ### 4.3 Event / Artifact / State

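 - The Event API (`events.get` / `events.page`) reads non-message events, tool events, and system events. Agents should not treat every event as a user/assistant message.
- The Artifact API (`artifacts.metadata` / `read_range` / `open_stream`) must verify the conversation / run / binding the artifact belongs to and check MIME type / size / expiry / permissions. Large files are read by range/stream, and large tool results should also be stored as artifacts.
+- The Artifact API (`artifact_metadata` / `artifact_read` / `artifact_read_range`) must verify the conversation / run / binding the artifact belongs to and check MIME type / size / expiry / permissions. Large files are read by range/file-key, and large tool results should also be stored as artifacts.
 - The State API (`state.get` / `set`) is an optional hosted capability. Self-managed runtimes can ignore it entirely; official runners built on LangBot can use it, e.g. for `external.session_id` and `summary.checkpoint`.

 ### 4.4 Large Files and Tool Collaboration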
@@ -122,16 +114,15 @@ Runtimes such as Claude Code, Codex, and Kimi Code usually already have their own session,

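 The current Claude Code runner uses the schema `langbot.agent_runner.external_harness_context.v1` (current state in OFFICIAL_RUNNER_PLUGINS §7). This kind of projection means "handing LangBot's sources of truth and authorized resource handles to the harness". It does not mean "handing LangBot's resources themselves or its internal permissions to the harness", nor "letting LangBot decide the final model context".

-## 5. Context Declarations in the Runner Manifest
+## 5. Runner Context Boundary

-`AgentRunnerContextPolicy` (PROTOCOL_V1 §4.5) declares the runner's context capabilities: `supports_history_pull` / `supports_history_search` / `supports_artifact_pull` / `owns_compaction` / `wants_static_context_refs`. It means the Host provides only the current event and context handles; the runner decides for itself whether to pull history, whether to search, when to summarize, and how to build the final prompt.
+The Host provides only the current event, the current input, and context handles. Whether a runner can pull history, events, artifacts, state, or storage is determined at run time by `ctx.context.available_apis`. The runner decides for itself whether to pull history, whether to search, when to summarize, and how to build the final prompt.

 ## 6. KV-Cache-Friendly Context Management

 When supporting runtimes such as the Claude Code SDK, Codex, and the Pi Agent SDK, LangBot must avoid reassembling large prompt blocks on every turn:

 - Stable session key: `workspace/bot/binding/runner/conversation/thread`.
- Static content uses `ref + version/hash` (`ctx.runtime.static_refs`): system prompt, resource manifest, tool schema, platform policy.
 - Pass only deltas each turn: the current event, artifact refs, and a small amount of runtime metadata.
 - History is append-only: do not rewrite the same history text every turn.
 - Summary checkpoints are stable: a new checkpoint is produced only when compaction happens.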
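The stable-session-key and append-only bullets can be sketched as follows. `session_key`, `AppendOnlyTranscript`, and the chained SHA-256 digests are hypothetical helpers, not protocol APIs. The sketch also assumes all six key parts are present; a real runner might substitute a placeholder for a missing thread.

```python
import hashlib

SESSION_KEY_FIELDS = ("workspace", "bot", "binding", "runner", "conversation", "thread")


def session_key(ids: dict[str, str]) -> str:
    """Stable key: the same six ids always map to the same string."""
    missing = [name for name in SESSION_KEY_FIELDS if not ids.get(name)]
    if missing:
        raise ValueError(f"missing session key parts: {missing}")
    return "/".join(ids[name] for name in SESSION_KEY_FIELDS)


class AppendOnlyTranscript:
    """Hypothetical runner-side transcript that only ever grows."""

    def __init__(self) -> None:
        self._items: list[str] = []
        self._digests: list[str] = []  # digest i covers items[0..i]

    def append_delta(self, *items: str) -> None:
        # Each turn contributes only new items; earlier items are never rewritten.
        for item in items:
            prev = self._digests[-1] if self._digests else ""
            self._digests.append(hashlib.sha256((prev + item).encode()).hexdigest())
            self._items.append(item)

    def prefix_digest(self, n: int) -> str:
        # Digest of the first n items; unchanged across turns once written.
        return self._digests[n - 1] if n else ""
```

Earlier items are never rewritten, so the digest of any prefix stays the same from turn to turn. A runtime's KV cache relies on exactly this property.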
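@@ -240,13 +240,4 @@ External-service runners such as Dify, n8n, Coze, DashScope, Langflow, and Tbox are not

 ## 10. High-Value Historical Records

-Historical reports have been merged into this guide and are no longer kept as separate documents. To trace anything later, check the original execution reports under `langbot-skills/reports/` first.
-
-As of 2026-05-29, local smoke tests have shown that:
-
- `local-agent` can run through the pluggable `AgentRunOrchestrator` main path via Pipeline Debug Chat.
- The Claude Code runner can execute through the same `run(event, binding)` path.
- The Claude Code runner can:
  - read LangBot's event-first context;
  - access authorized resources through the SDK-owned MCP bridge / skill-backed scoped tools;
  - then write back `external.session_id` / `external.working_directory`.
- The Codex runner can execute through the same path and write the Codex `thread_id` back to host-owned state.
-
-These records only show that the local protocol loop works end to end; they do not mean release-grade security hardening is complete.
+High-value historical records and the current runner acceptance status are in [STATUS.md](./STATUS.md). This guide keeps only repeatable test steps and evidence requirements.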
@@ -1,6 +1,6 @@
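 # Event Based Agent Integration Design

-> This document records how EBA connects to the current AgentRunner Protocol v1 / Host foundation. EventGateway, EventRouter, and Event subscription/notification are implemented and integration-tested in the external EBA branch; this branch keeps only the event-first entry point and the envelope/binding models. For implementation progress, see [PROGRESS.md](./PROGRESS.md).
+> This document records how EBA connects to the current AgentRunner Protocol v1 / Host foundation. EventGateway, EventRouter, and Event subscription/notification are implemented and integration-tested in the external EBA branch; this branch keeps only the event-first entry point and the envelope/binding models.
 >
 > Data structures are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md) (visible to runners) and [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) (Host internal models); this document covers EBA semantics only and does not copy the schemas.
 > For the boundary with the current runner externalization branch and the later Agent Platform / Runtime Control Plane, see [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md).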
@@ -74,10 +74,10 @@ After EBA, `action.requested` (PROTOCOL_V1 §7.3, currently telemetry only, not executed
 { "type": "action.requested",
  "data": { "action": "friend.request.accept",
            "target": {"platform": "wechat", "request_id": "..."},
-            "reason": "policy matched" } }
+            "payload": {"reason": "policy matched"} } }
 ```

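-The Host must check whether the runner manifest declares the `platform_api` capability, whether the binding authorizes the action, whether the actor / bot / workspace allows it, and whether manual approval is required. EBA may also reserve `delivery.requested` (a request to deliver to a specific surface).
+The Host must check whether the binding / platform action policy authorizes the action, whether the actor / bot / workspace allows it, whether manual approval is required, and whether the current run session / caller identity matches. EBA may also reserve `delivery.requested` (a request to deliver to a specific surface).

 On delivery: an event does not necessarily reply to the current chat window. Message events usually carry a reply target. System events may have no default reply target; in that case the runner returns `action.requested`, or the binding's delivery policy decides where to deliver (`DeliveryContext` in PROTOCOL_V1 §5.7).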

@@ -28,7 +28,7 @@
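 | Scheduler / Automation | Not implemented. The docs treat `scheduler` only as a future event source. | EBA / Agent Platform. | Scheduled tasks fire a `schedule.triggered` host event, reusing EventGateway -> EventRouter -> `run(event, binding)`. | Do not call a specific runner plugin directly; do not bypass EventLog / authorization. |
 | Integration provider | Not implemented. The IM platform adapter remains the current platform integration system. | EBA / Agent Platform. | OAuth/webhook/outbound providers should first be converted into a canonical host event or platform action, then handed to the AgentRunner. | Do not spread provider-private payloads (Linear/Slack/GitHub, etc.) into the top level of the runner protocol. |
 | Platform action / delivery | `action.requested` is reserved but currently telemetry only, not executed. `DeliveryContext` is only a context/policy projection. | EBA / platform action executor. | A later executor acts only after checking runner capability, binding policy, actor/bot/workspace permissions, and approval. | Do not let runners call private platform adapter APIs directly; do not disguise platform actions as side effects of a text reply. |
-| Runtime registry / worker / task queue | Not implemented. Claude Code / Codex currently use a local-subprocess MVP path. | Runtime Control Plane v2. | The Host adds a runtime registry, heartbeat, task queue, daemon claim, and progress/audit; runners can opt into a runtime-managed execution mode. | Do not put heartbeat/task/warm pool into Protocol v1; do not let management plugins own the runtime/task source of truth. |
+| Runtime registry / worker / task queue | Not implemented. Claude Code / Codex currently use a local-subprocess MVP path. | Runtime Control Plane v2. | Phase one first adds Host-owned `AgentRun` / `AgentRunEvent` / run control primitives; a full runtime registry, heartbeat, task queue, daemon claim, and progress/audit are a later, optional phase. | Do not put heartbeat/task/warm pool into Protocol v1; do not let management plugins own the runtime/task source of truth. |
 | Warm pool / reconcile / diagnose | Not implemented. | Runtime Control Plane v2 / deployment layer. | Built as operational capabilities for tasks/runtimes around Host-owned runtime/task/audit tables. | Do not write runtime operations semantics into the ordinary runner protocol; do not leak pod/task details to ordinary runners. |
 | Agent memory | No general long-term memory product layer; provides the history/state/storage/artifact building blocks. | Agent Platform or a specific runner/plugin. | Platform memory can be built on Host storage/state or dedicated product tables; runners pull it through authorized APIs. | Do not build a general agentic memory strategy into Host core; do not inline all memory into the context by default. |
 | External harness native session | External session id / working directory state handoff and resource projection are supported. | Later enhancements in official runners; Runtime Control Plane v2 can take over execution. | One-shot CLI runners can keep using `runner.run(ctx)`. Long-lived/daemon modes serialize turns per external session key, and the reader owns the native stream exclusively. | Do not turn the Claude/Codex native wire format into the LangBot protocol; for global lock boundaries, see PROTOCOL_V1 §13. |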
@@ -3,7 +3,7 @@
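 This document describes LangBot's internal capabilities and layered architecture as an agent host, as well as the Host internal models.

 - The SDK ↔ Host protocol data structures (`AgentRunContext`, `AgentRunnerManifest`, `AgentRunResult`, `AgentRunAPIProxy`, etc.) are **defined only in** [PROTOCOL_V1.md](./PROTOCOL_V1.md); this document references them without copying.
- For implementation progress, see [PROGRESS.md](./PROGRESS.md).
+- For the test entry point and smoke records, see [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md); for the security release gate, see [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).
 - The Host internal models defined here (`AgentEventEnvelope`, `AgentBinding`, `AgentRunnerDescriptor`) are not SDK protocol fields.

 ## 1. Goals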
@@ -137,13 +137,17 @@ class AgentRunnerDescriptor(BaseModel):
    source: Literal["plugin"]
    label: I18nObject
    description: I18nObject | None = None
+    plugin_author: str
+    plugin_name: str
+    runner_name: str
     capabilities: AgentRunnerCapabilities    # see PROTOCOL_V1 §4.3
     permissions: AgentRunnerPermissions      # see PROTOCOL_V1 §4.4
    config_schema: list[DynamicFormItemSchema]
-    plugin: PluginRef | None = None
+    plugin_version: str | None = None
+    raw_manifest: dict[str, Any] = {}
 ```

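-Responsibilities: call `plugin_connector.list_agent_runners()` to fetch runners, validate manifests (`kind == AgentRunner`, `metadata.name/label` present, `spec.*` correctly typed), emit descriptors, cache discovery results, and provide `refresh()`. A failing manifest in one plugin only logs a warning and does not affect other runners. `plugin:author/name/runner` is the stable id format; for plugin instance boundaries, see PROTOCOL_V1 §13.
+Responsibilities: call `plugin_connector.list_agent_runners()` to fetch runners, validate the typed `AgentRunnerManifest`, emit descriptors, cache discovery results, and provide `refresh()`. A failing manifest in one plugin only logs a warning and does not affect other runners. `plugin:author/name/runner` is the stable id format; for plugin instance boundaries, see PROTOCOL_V1 §13.

 Host built-in runners / adapters cannot act as an `AgentRunnerDescriptor.source` to bypass the plugin
 runtime, `run_id`, `ctx.resources`, and the `AgentRunAPIProxy` permission chain. If needed,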
@@ -191,22 +195,24 @@ QueryEntryAdapter / EventRouter

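 `run_from_query()` is kept as the Query entry adapter's entry point, but internally it converts the query to event + binding and then goes through the unified `run()`. Constraints:
 - `ChatMessageHandler` does not parse `plugin:*`, does not instantiate wrappers, and knows nothing about runner component details.
 - `PipelineService` reads metadata from the registry and does not access the plugin runtime directly.
 - State persisted across requests must go through authorized storage / external services.

-### 4.5 Resource Authorization (three-layer narrowing)
+### 4.5 Resource Authorization

-Before each run, LangBot generates `ctx.resources` (PROTOCOL_V1 §6) from three layers of constraints:
+Before each run, LangBot generates `ctx.resources` (PROTOCOL_V1 §6) as the intersection of manifest permissions and binding policy, further narrowed by the current config and scope:

-1. The `permissions` declared in the runner manifest (the maximum capability).
+1. `descriptor.permissions` declares the upper bound of LangBot resource access the runner needs.
 2. The resource scope allowed by the binding / resource policy.
-3. The actual permissions of the current event / actor / bot / workspace.
+3. The models, knowledge bases, files, and other resources selected in the Agent/runner config.
+4. The actual permissions of the current event / actor / bot / workspace.
+5. The pull API capabilities exposed through `ctx.context.available_apis`.

 The result of this narrowing must be frozen into a run-scoped authorization snapshot and
 stored by `AgentRunSessionRegistry` under `run_id`. `ctx.resources` is the same authorization result
 projected for the runner to see. At run time, each proxy action is checked only against that snapshot: active
 run session, caller plugin identity, resource id, scope, payload size, rate
-limit, and deadline. Handlers should not re-run the three-layer narrowing; otherwise build-time and runtime
+limit, and deadline. Handlers should not re-run the authorization narrowing; otherwise build-time and runtime
 authorization logic will drift apart.

-Local validation on the SDK side is only for developer experience; the host-side run authorization snapshot is the security boundary.
+Local validation on the SDK side is only for developer experience; the host-side run authorization snapshot is the security boundary. `spec.capabilities` only helps the Host decide whether the runner needs resource projections such as tools / knowledge / skills; it cannot replace permissions or the binding policy.

 Resource narrowing should be generic, not hard-coded to local-agent. Example selector-to-resource mappings:
 - `model-fallback-selector` → primary/fallback LLM
 - `llm-model-selector` → LLM
 - `rerank-model-selector` → rerank model
 - `knowledge-base-multi-selector` → knowledge base

 When a selector is added, extend the resource builder in one place.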

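The following is a minimal sketch of freezing the narrowing result per `run_id` and checking proxy actions only against that frozen result. `AgentRunSessionRegistry` is named above, but `AgentRunSessionRegistrySketch`, `RunAuthorizationSnapshot`, and their fields are hypothetical. Scope, payload-size, and rate-limit checks are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunAuthorizationSnapshot:
    """Hypothetical frozen result of the narrowing, stored per run_id."""
    run_id: str
    plugin_id: str                # caller plugin identity expected for this run
    resource_ids: frozenset[str]  # same set that was projected into ctx.resources
    deadline_at: float


class AgentRunSessionRegistrySketch:
    """Stores snapshots by run_id; proxy actions are checked only against them."""

    def __init__(self) -> None:
        self._sessions: dict[str, RunAuthorizationSnapshot] = {}

    def freeze(self, snapshot: RunAuthorizationSnapshot) -> None:
        self._sessions[snapshot.run_id] = snapshot

    def check(self, run_id: str, caller_plugin: str, resource_id: str, now: float) -> None:
        snap = self._sessions.get(run_id)
        if snap is None:
            raise PermissionError("no active run session")
        if caller_plugin != snap.plugin_id:
            raise PermissionError("caller plugin identity mismatch")
        if now > snap.deadline_at:
            raise PermissionError("run deadline exceeded")
        if resource_id not in snap.resource_ids:
            raise PermissionError(f"resource {resource_id!r} not in snapshot")
        # Deliberately no re-narrowing against manifest / binding policy here.
```

Because `check` reads only the frozen snapshot, build-time projection and runtime enforcement cannot drift apart.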
@@ -238,9 +244,6 @@ The SDK component entry points are as follows; all data structure definitions are in PROTOCOL_V1.
 class AgentRunner(BaseComponent):
    __kind__ = "AgentRunner"

-    @classmethod
-    def get_capabilities(cls) -> AgentRunnerCapabilities: ...   # PROTOCOL_V1 §4.3
-
    @classmethod
    def get_config_schema(cls) -> list[dict]: ...

@@ -248,7 +251,7 @@ class AgentRunner(BaseComponent):
    # ctx: PROTOCOL_V1 §5.2 ; AgentRunResult: PROTOCOL_V1 §7
 ```

- Manifest / capabilities / permissions / context policy: PROTOCOL_V1 §4.
+- Manifest / capabilities / effective access: PROTOCOL_V1 §4. Capabilities come from `spec.capabilities` in the component manifest, not from an SDK base-class classmethod.
 - `AgentRunContext`: PROTOCOL_V1 §5.2. `messages` / `bootstrap` are not protocol fields.
 - `AgentRunResult`: PROTOCOL_V1 §7.
 - `AgentRunAPIProxy`: PROTOCOL_V1 §8. It is the only way a runner accesses host capabilities, and every request carries `run_id`.
@@ -1,6 +1,6 @@
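 # Official AgentRunner Plugin Migration Plan

-This document describes how the official runner plugins are organized, migrated, and accepted after the built-in `RequestRunner` moves out of LangBot. It is the downstream implementation plan for [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) and [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md), not a design premise of the LangBot host protocol. Acceptance status is in [PROGRESS.md](./PROGRESS.md); the QA entry point is [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md).
+This document describes how the official runner plugins are organized, migrated, and accepted after the built-in `RequestRunner` moves out of LangBot. It is the downstream implementation plan for [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) and [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md), not a design premise of the LangBot host protocol. The QA entry point and smoke records are in [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md).

 The official `local-agent` can be moved out or rewritten. The design goal is not to preserve the old built-in runner's internal structure. It is to verify that an official agent built on LangBot's host infrastructure works end to end. At the same time, the LangBot host protocol must serve runners that manage their own context/runtime, such as the Claude Code SDK, Codex, the Pi Agent SDK, and external Agent platforms. It must not be tied to the official plugins' implementation details.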

@@ -60,13 +60,6 @@ spec:
  config: []
   capabilities:        # field semantics: PROTOCOL_V1 §4.3
    streaming: true
-    event_context: true
-    stateful_session: true
-  permissions:         # field semantics: PROTOCOL_V1 §4.4
-    storage: ["plugin"]
-  context:             # field semantics: PROTOCOL_V1 §4.5
-    supports_history_pull: true
-    owns_compaction: true
 execution:
  python: { path: ./main.py, attr: DefaultAgentRunner }
 ```
@@ -83,7 +76,7 @@ execution:
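 - Pull the transcript through `AgentRunAPIProxy.history` instead of relying on the host to force a history window into every turn.
 - `ctx.input.contents` keeps multimodal content such as images/files; RAG only replaces/inserts the text parts and never drops images/files.
 - Never bypass `ctx.resources` to call unauthorized models, tools, or knowledge bases.
- The manifest declares self-managed context capabilities (`context.supports_history_pull/search`, `owns_compaction`, etc.).
+- The manifest declares only functional capabilities and the config form. Resource authorization comes from the binding resource policy, runner config, `ctx.context.available_apis`, and the Host run session snapshot.

 ### 5.1 Native Execution / Skills: Later Integration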

@@ -114,7 +107,7 @@ Runners such as Claude Code, Codex, and Kimi Code do not necessarily go through LangBot's model/

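 ## 7. Current Form of the Claude Code / Codex Runners

-`claude-code-agent` and `codex-agent` are a minimal runnable MVP / dev path, used to show that external harness runners can plug into the same AgentRunner protocol. Local smoke acceptance records are in [PROGRESS.md](./PROGRESS.md) and [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md).
+`claude-code-agent` and `codex-agent` are a minimal runnable MVP / dev path, used to show that external harness runners can plug into the same AgentRunner protocol. The local smoke acceptance entry point and records are in [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md).

 What MVP means: event-first context, resource projection, the result stream, and
 basic resume state have been verified end to end; it does not mean Docker production deployment, release-grade execution isolation,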
@@ -2,7 +2,7 @@

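 This document is the **single source of truth** for the protocol contract between the LangBot Host and the plugin SDK / Runtime / AgentRunner.

- This file describes "what the stable interface should be". It is a normative spec and does not mix in implementation progress. For implementation status, see [PROGRESS.md](./PROGRESS.md).
+- This file describes the current stable Protocol v1 contract and does not mix in acceptance logs. For the test entry point and smoke records, see [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md); for the security release gate, see [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).
 - No document other than this one **may redefine the data structures here**; other documents may only reference them, e.g. "see PROTOCOL_V1 §4.2".
 - Host internal models (`AgentEventEnvelope`, `AgentBinding`, Descriptor, the various Stores) are not part of the SDK protocol and are defined in [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md).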

@@ -57,27 +57,40 @@ The Host calls the Plugin Runtime to get the list of runners exposed by the current plugin; the request takes no

 ```python
 class ListAgentRunnersResponse(BaseModel):
-    runners: list[AgentRunnerManifest]
+    runners: list[AgentRunnerDiscovery]
+
+class AgentRunnerDiscovery(BaseModel):
+    plugin_author: str
+    plugin_name: str
+    runner_name: str
+    runner_description: I18nObject | None = None
+    manifest: AgentRunnerManifest
+    config: list[DynamicFormItemSchema] = []
 ```

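+`manifest` is the SDK's typed `AgentRunnerManifest`, which the Runtime parses from the plugin component manifest and validates before returning it. `plugin_author` / `plugin_name` / `runner_name` remain as transport addressing fields. The Host uses them to generate the stable runner id and checks that `manifest.id` equals `plugin:author/name/runner`. If a single runner manifest fails to parse, the Runtime/Host logs a warning and skips that runner. Runner discovery for the same plugin and for other plugins is unaffected.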
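The following sketch shows the id check and the skip-on-failure behavior described above. It assumes a simplified `DiscoveryEntry` whose `manifest` is a plain dict rather than the SDK's typed `AgentRunnerManifest`. The function names and the logger are hypothetical.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("runner_discovery_sketch")


@dataclass
class DiscoveryEntry:
    # Subset of AgentRunnerDiscovery; `manifest` reduced to a plain dict here.
    plugin_author: str
    plugin_name: str
    runner_name: str
    manifest: dict


def runner_id(entry: DiscoveryEntry) -> str:
    # The Host derives the stable id from the transport addressing fields.
    return f"plugin:{entry.plugin_author}/{entry.plugin_name}/{entry.runner_name}"


def discover(entries: list[DiscoveryEntry]) -> dict[str, dict]:
    """Return {runner_id: manifest}, skipping bad entries with a warning."""
    runners: dict[str, dict] = {}
    for entry in entries:
        rid = runner_id(entry)
        if entry.manifest.get("id") != rid:
            log.warning("skip %s: manifest.id=%r does not match", rid, entry.manifest.get("id"))
            continue  # one bad runner does not abort discovery
        runners[rid] = entry.manifest
    return runners
```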
+
 ### 4.2 AgentRunnerManifest

+Here, "manifest" means the typed runner manifest that the Runtime returns to the Host:
+
 ```python
 class AgentRunnerManifest(BaseModel):
    id: str
    name: str
    label: I18nObject
    description: I18nObject | None = None
-    capabilities: AgentRunnerCapabilities
-    permissions: AgentRunnerPermissions
-    context: AgentRunnerContextPolicy
+    capabilities: AgentRunnerCapabilities = AgentRunnerCapabilities()
+    permissions: AgentRunnerPermissions = AgentRunnerPermissions()
    config_schema: list[DynamicFormItemSchema] = []
    metadata: dict[str, Any] = {}
 ```

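- `id` must be stable, in the format `plugin:author/name/runner`.
+- The runner id is generated by the Host, in the format `plugin:author/name/runner`.
 - `name` is the runner name within the plugin, e.g. `default`.
 - `config_schema` only describes the binding config form; it does not represent plugin instance state.
+- `capabilities` is a typed bool model the Host uses for UI and resource projection; it is not a permission grant.
+- `permissions` is the upper bound of LangBot resource access the runner requests; the actual grant must still be intersected with the binding policy.
 - `metadata` holds only display, diagnostic, and non-stable extension information.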

 ### 4.3 Capabilities
@@ -89,29 +102,21 @@ class AgentRunnerCapabilities(BaseModel):
    knowledge_retrieval: bool = False
    multimodal_input: bool = False
    skill_authoring: bool = False
-    event_context: bool = True
-    platform_api: bool = False
    interrupt: bool = False
-    stateful_session: bool = False
-    self_managed_context: bool = True
-```

-Semantics:
+    model_config = ConfigDict(extra="forbid")
+```

 - `streaming`: the runner can return `message.delta`.
 - `tool_calling`: the runner may call the Host tool API.
 - `knowledge_retrieval`: the runner may call the Host knowledge API.
 - `multimodal_input`: the runner can handle non-plain-text input / artifacts.
 - `skill_authoring`: the runner needs the Host to provide skill facts and skill authoring tools, e.g. `activate` / `register_skill`.
- `event_context`: the runner understands event-first input.
- `platform_api`: the runner may request platform actions.
 - `interrupt`: the runner supports cancellation or interruption.
- `stateful_session`: the runner may maintain session state across runs.
- `self_managed_context`: the runner manages its own working context; the Host should not inline history by default.

-> All Capabilities fields are `bool`. Whether a runner keeps host-owned state is **not expressed in capabilities**; it is declared through `permissions.storage` (see §4.4), which avoids non-bool values.
+All Capabilities fields are `bool`, and unknown keys are rejected from the typed manifest. Early-draft keys such as `event_context`, `stateful_session`, and `self_managed_context` are not part of the Protocol v1 capabilities; their semantics are covered by the event-first context and runner-owned context principles.
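The following is a standard-library stand-in for the `ConfigDict(extra="forbid")` behavior. It assumes the six capability fields listed above, each defaulting to `False`; the defaults of `streaming` and `tool_calling` are not shown in this diff. The real model is a pydantic `BaseModel`, and `Capabilities.parse` is hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capabilities:
    # Protocol v1 capability flags; all assumed to default to False.
    streaming: bool = False
    tool_calling: bool = False
    knowledge_retrieval: bool = False
    multimodal_input: bool = False
    skill_authoring: bool = False
    interrupt: bool = False

    @classmethod
    def parse(cls, raw: dict) -> "Capabilities":
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            # Same effect as extra="forbid": reject unknown keys instead of ignoring them.
            raise ValueError(f"unknown capability keys: {sorted(unknown)}")
        for key, value in raw.items():
            if not isinstance(value, bool):
                raise TypeError(f"capability {key!r} must be bool")
        return cls(**raw)
```

A manifest that still carries an early-draft key such as `event_context` therefore fails loudly instead of being silently accepted.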

-### 4.4 Permissions
+### 4.4 Permissions and Effective Access

 ```python
 class AgentRunnerPermissions(BaseModel):
@@ -121,25 +126,30 @@ class AgentRunnerPermissions(BaseModel):
    history: list[Literal["page", "search"]] = []
    events: list[Literal["get", "page"]] = []
    artifacts: list[Literal["metadata", "read"]] = []
-    storage: list[Literal["plugin", "workspace", "binding"]] = []
+    storage: list[Literal["plugin", "workspace"]] = []
    files: list[Literal["config", "knowledge"]] = []
-    platform_api: list[str] = []
+
+    model_config = ConfigDict(extra="forbid")
 ```

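-Manifest permissions are the **maximum capability** a runner needs. The resources actually available are further narrowed by the Host binding policy and the current run scope (for the three-layer narrowing, see HOST_SDK §4.5).
+`platform_api` is not part of the current permissions. Until the platform action executor / EBA action branch lands, runners can only return `action.requested` telemetry, and the Host does not execute platform actions.

-### 4.5 Context Policy
+The LangBot resources actually available to a runner come from the authorization snapshot the Host freezes before the run: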

-```python
-class AgentRunnerContextPolicy(BaseModel):
-    supports_history_pull: bool = True
-    supports_history_search: bool = False
-    supports_artifact_pull: bool = True
-    owns_compaction: bool = True
-    wants_static_context_refs: bool = True
+```text
+effective_access = manifest.permissions ∩ binding.resource_policy ∩ current scope/config
 ```

-The Host does not use this declaration to inline a history window for the runner. Default principles:
+Concretely:
+
+1. `AgentResourceBuilder` first intersects manifest permissions with the binding resource policy / runner config to produce `ctx.resources`.
+2. `AgentContextBuilder` intersects manifest permissions with the binding state/storage policy to produce `ctx.context.available_apis`.
+3. `AgentRunSessionRegistry` freezes the run-scoped resources and available APIs.
+4. The Runtime handler / `AgentRunAPIProxy` checks every call against the active `run_id`, runner identity, caller plugin identity, resource id, scope, payload size, rate limit, and deadline.
+
+Non-guarantee: manifest permissions **only constrain access to resources LangBot holds**. They do not promise to restrict an external harness's native shell, file system, CLI, MCP, network, or local-machine permissions. The operator/runtime/sandbox constrains those separately (see HOST_SDK §4.8 and SECURITY_HARDENING).
+
+Default principles:

 - The Host must not inline the full history by default.
 - The Host inlines only the current event / input and context handles.
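The `effective_access` formula reduces to a set intersection per category. This sketch models each layer as a hypothetical `dict[str, set[str]]` holding permission values taken from `AgentRunnerPermissions`. The real `AgentResourceBuilder` / `AgentContextBuilder` work on typed models and richer binding policies.

```python
PermissionMap = dict[str, set[str]]


def intersect(*layers: PermissionMap) -> PermissionMap:
    """effective = layer_1 ∩ layer_2 ∩ ...; a category absent from any layer is denied."""
    if not layers:
        return {}
    categories = set(layers[0])
    for layer in layers[1:]:
        categories &= set(layer)
    result: PermissionMap = {}
    for cat in categories:
        allowed = set(layers[0][cat])
        for layer in layers[1:]:
            allowed &= layer[cat]
        if allowed:  # drop categories that end up empty
            result[cat] = allowed
    return result
```

A category missing from any layer is denied outright. This matches the rule that manifest permissions are only an upper bound, never a grant on their own.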
@@ -263,12 +273,11 @@ class AgentInput(BaseModel):
    text: str | None = None
    contents: list[ContentElement] = []
    attachments: list[ArtifactRef] = []
-    message_chain: dict[str, Any] | None = None
 ```

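 - Text, multimodal content, and attachments all belong to the current event input.
 - Large files, images, audio, and large tool results should be passed as artifact refs.
- `message_chain` is a platform compatibility field and should not become a long-term stable dependency.
+- The raw platform message chain is not part of the SDK `AgentInput`. When it is needed for diagnostics, it goes into the Host internal envelope or a one-off compatibility field in `ctx.adapter.extra`, and it is not part of the long-term runner contract.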

 ### 5.7 DeliveryContext

@@ -322,18 +331,12 @@ class ContextAPICapabilities(BaseModel):

 ```python
 class AgentRuntimeContext(BaseModel):
-    host: str = "langbot"
    langbot_version: str | None = None
    trace_id: str
    deadline_at: float | None = None
-    locale: str | None = None
-    timezone: str | None = None
-    static_refs: dict[str, StaticContextRef] = {}
    metadata: dict[str, Any] = {}
 ```

-`static_refs` holds KV-cache-friendly static context references (hash/version of the system policy, tool schema, and resource manifest). For the rationale, see AGENT_CONTEXT_PROTOCOL §6.
-
 ### 5.10 AgentRunState

 ```python
@@ -366,7 +369,7 @@ class AgentResources(BaseModel):

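 `skills` contains only the pipeline-visible skill facts for this run, such as `skill_name`, `display_name`, and `description`. The Host neither appends these facts to the system prompt nor weaves them into tool descriptions. The runner decides whether to put them in the model prompt, turn them into an MCP surface, or use them only in its own policy layer.

-The resource lists are the authorization result for this run. History / Event / Artifact access is controlled by permissions, `ctx.context.available_apis`, and Host-side run session checks, and is not exposed as an enumerable resource list. Runners can access these capabilities only through `AgentRunAPIProxy`.
+The resource lists are the authorization result for this run. History / Event / Artifact access is controlled by `ctx.context.available_apis` and Host-side run session checks, and is not exposed as an enumerable resource list. Runners can access these capabilities only through `AgentRunAPIProxy`.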

 ## 7. Result Stream

@@ -387,123 +390,37 @@ ResultType = Literal[
    "run.failed",
 ]

-class AgentRunResultBase(BaseModel):
+class AgentRunResult(BaseModel):
    run_id: str
+    type: AgentRunResultType
+    data: dict[str, Any] = {}
    sequence: int | None = None
    timestamp: int | None = None
-    metadata: dict[str, Any] = {}
 ```

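-`AgentRunResult` is a discriminated union of the typed results below. The Host must validate the `data` structure for each `type`; unknown `type` values are ignored with a warning, per the §3 versioning rules.
+The current SDK implementation uses a single envelope: a `type` enum plus a `data` dict. Payloads are built from SDK typed models and dumped, but the wire format is not changed to a discriminated union. This way, when old and new versions are out of step, the Host can still ignore unknown `type` values per §3.
+
+Tiered validation at the Host boundary:
+
+- `message.delta`, `message.completed`, `artifact.created`, `state.updated`, `action.requested`, `run.completed`, and `run.failed` are strict payloads that affect delivery or Host side effects; if validation fails, the result is dropped and a warning is logged.
+- `tool.call.started` and `tool.call.completed` are currently telemetry only, and their payloads are validated leniently.
+- Unknown `type` values are ignored and a warning is logged.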

 ### 7.2 Stable result payloads

-```python
-class AssistantMessageChunk(BaseModel):
-    role: Literal["assistant"] = "assistant"
-    content: str | None = None
-    contents: list[ContentElement] = []
-    metadata: dict[str, Any] = {}
+| type | `data` payload |
+| --- | --- |
+| `message.delta` | `{ "chunk": MessageChunk }` |
+| `message.completed` | `{ "message": Message }` |
+| `tool.call.started` | `{ "tool_call_id": str, "tool_name": str, "parameters": dict }` |
+| `tool.call.completed` | `{ "tool_call_id": str, "tool_name": str, "result": dict \| None, "error": str \| None }` |
+| `artifact.created` | `{ "artifact_type": str, "artifact_id"?: str, "mime_type"?: str, "name"?: str, "size_bytes"?: int, "sha256"?: str, "metadata"?: dict, "content_base64"?: str }` |
+| `state.updated` | `{ "scope": "conversation" \| "actor" \| "subject" \| "runner", "key": str, "value": JSONValue }` |
+| `action.requested` | `{ "action": str, "target": dict \| None, "payload": dict \| None }` |
+| `run.completed` | `{ "finish_reason": str, "message"?: Message }` |
+| `run.failed` | `{ "code": str, "error": str, "retryable": bool }` |

-class AssistantMessage(BaseModel):
-    role: Literal["assistant"] = "assistant"
-    content: str | None = None
-    contents: list[ContentElement] = []
-    artifacts: list[ArtifactRef] = []
-    metadata: dict[str, Any] = {}
-
-class MessageDeltaData(BaseModel):
-    chunk: AssistantMessageChunk
-
-class MessageCompletedData(BaseModel):
-    message: AssistantMessage
-
-class ToolCallStartedData(BaseModel):
-    tool_call_id: str
-    tool_name: str
-    parameters: dict[str, Any] = {}
-
-class ToolCallCompletedData(BaseModel):
-    tool_call_id: str
-    tool_name: str
-    result_preview: dict[str, Any] | None = None
-    error_code: str | None = None
-    error_message: str | None = None
-
-class ArtifactCreatedData(BaseModel):
-    artifact: ArtifactRef
-
-class StateUpdatedData(BaseModel):
-    scope: Literal["conversation", "actor", "subject", "runner", "binding", "workspace"]
-    key: str
-    value: JSONValue
-
-class ActionRequestedData(BaseModel):
-    action: str
-    target: dict[str, Any]
-    payload: dict[str, Any] = {}
-    idempotency_key: str | None = None
-    approval_hint: str | None = None
-
-class RunCompletedData(BaseModel):
-    finish_reason: str = "stop"
-    message: AssistantMessage | None = None
-    usage: dict[str, Any] = {}
-
-class RunFailedData(BaseModel):
-    code: str
-    message: str
-    retryable: bool = False
-    details: dict[str, Any] = {}
-
-class MessageDeltaResult(AgentRunResultBase):
-    type: Literal["message.delta"]
-    data: MessageDeltaData
-
-class MessageCompletedResult(AgentRunResultBase):
-    type: Literal["message.completed"]
-    data: MessageCompletedData
-
-class ToolCallStartedResult(AgentRunResultBase):
-    type: Literal["tool.call.started"]
-    data: ToolCallStartedData
-
-class ToolCallCompletedResult(AgentRunResultBase):
-    type: Literal["tool.call.completed"]
-    data: ToolCallCompletedData
-
-class ArtifactCreatedResult(AgentRunResultBase):
-    type: Literal["artifact.created"]
-    data: ArtifactCreatedData
-
-class StateUpdatedResult(AgentRunResultBase):
-    type: Literal["state.updated"]
-    data: StateUpdatedData
-
-class ActionRequestedResult(AgentRunResultBase):
-    type: Literal["action.requested"]
-    data: ActionRequestedData
-
-class RunCompletedResult(AgentRunResultBase):
-    type: Literal["run.completed"]
-    data: RunCompletedData
-
-class RunFailedResult(AgentRunResultBase):
-    type: Literal["run.failed"]
-    data: RunFailedData
-
-AgentRunResult = (
-    MessageDeltaResult
-    | MessageCompletedResult
-    | ToolCallStartedResult
-    | ToolCallCompletedResult
-    | ArtifactCreatedResult
-    | StateUpdatedResult
-    | ActionRequestedResult
-    | RunCompletedResult
-    | RunFailedResult
-)
-```
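+`artifact.created.content_base64` is the inline channel for small artifacts. The Host decodes it and writes it to the ArtifactStore, with a current hard cap of 1 MiB. Large artifacts should use external storage / a file key / a future upload channel and should not be stuffed into a result event.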
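The following sketch enforces the 1 MiB inline cap. It assumes the Host rejects oversized payloads rather than truncating them. `decode_inline_artifact` is hypothetical; only `base64.b64decode(..., validate=True)` is a real standard-library call. Every 4 base64 characters carry at most 3 bytes, so the early length check avoids decoding input that is clearly oversized.

```python
import base64
import binascii

MAX_INLINE_ARTIFACT_BYTES = 1024 * 1024  # 1 MiB hard cap stated above


def decode_inline_artifact(data: dict) -> bytes:
    """Decode artifact.created.content_base64, rejecting oversized or invalid input."""
    encoded = data.get("content_base64")
    if encoded is None:
        raise ValueError("no inline content; expected external storage / file key")
    # Cheap pre-check: decoded size is at most len // 4 * 3 (padding may shave up to 2).
    if len(encoded) // 4 * 3 > MAX_INLINE_ARTIFACT_BYTES + 2:
        raise ValueError("inline artifact exceeds 1 MiB cap")
    try:
        raw = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64: {exc}") from exc
    if len(raw) > MAX_INLINE_ARTIFACT_BYTES:  # exact check after decoding
        raise ValueError("inline artifact exceeds 1 MiB cap")
    return raw
```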

 ### 7.3 Stable result types

@@ -521,14 +438,14 @@ AgentRunResult = (

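 `action.requested` is a protocol surface reserved for EBA and the platform API. In this branch, the Host only records telemetry when it receives one and **does not execute** it, so runner authors should not rely on its side effects on the current Host foundation. The external EBA / platform action branch wires in the real executor; for the execution model, see EVENT_BASED_AGENT §6.

-The Host must validate the scope, key, value size, and JSON serializability of `state.updated`. If an `action.requested` asks for something that will have external side effects in the future, the runner must provide a stable `idempotency_key`; in this branch the Host still only records telemetry.
+The Host must validate the scope, key, value size, and JSON serializability of `state.updated`. In this branch, `action.requested` is still only recorded as telemetry.

 ### 7.4 Stream delivery semantics

 - The Host consumes results in Runtime stream order. v1 currently defines no cross-connection replay and does not promise at-least-once delivery; from the Host's point of view, each received result is applied at most once.
 - `sequence` is the result sequence number within a single `run_id`. Naturally ordered live streams such as in-process / stdio may omit it. Any transport that buffers, replays, uses cross-process queues, or runs runtime-managed tasks must provide a `sequence` that starts at 1 and increases strictly.
 - When a result carries `sequence`, the Host should detect duplicates by `(run_id, sequence)` and log a warning on gaps or out-of-order numbers. Unless the transport explicitly declares replay semantics, the Host should not wait for missing numbers in order to reorder user-visible output.
- `run.failed.data.retryable` only means the whole run could in principle be retried by an upper layer; Protocol v1 does not automatically retry runs or proxy actions. Any side-effecting action that is automatically retried in the future must rely on `idempotency_key` or an equivalent Host-owned deduplication key.
+- `run.failed.data.retryable` only means the whole run could in principle be retried by an upper layer; Protocol v1 does not automatically retry runs or proxy actions.
 - History / Event / Transcript cursors are opaque tokens. Runners must not parse cursors or assume they are comparable across APIs, conversations, threads, or retention windows. Even if the current implementation returns numeric strings, that is only an implementation detail.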

 ### 7.5 Examples
@@ -537,7 +454,7 @@ The Host must validate the scope, key, value size, and JSON
 { "type": "message.delta",     "data": { "chunk": { "role": "assistant", "content": "hel" } } }
 { "type": "message.completed", "data": { "message": { "role": "assistant", "content": "hello" } } }
 { "type": "state.updated",     "data": { "scope": "conversation", "key": "external.session_id", "value": "abc" } }
-{ "type": "action.requested",  "data": { "action": "message.edit", "target": {"message_id": "..."}, "payload": {"text": "..."}, "idempotency_key": "run_1:edit:msg_1" } }
+{ "type": "action.requested",  "data": { "action": "message.edit", "target": {"message_id": "..."}, "payload": {"text": "..."} } }
 ```

 ## 8. AgentRunAPIProxy
@@ -569,18 +486,108 @@ await api.event_page(before_cursor=None, limit=50)

 # Artifact (must support size limits, MIME checks, expiry, and authorization scope)
 await api.artifact_metadata(artifact_id)
+await api.artifact_read(artifact_id, offset=0, limit=None)
 await api.artifact_read_range(artifact_id, offset=0, length=65536)

 # State / Storage
 await api.state_get(scope, key);   await api.state_set(scope, key, value);   await api.state_delete(scope, key)
 await api.state_list(scope, prefix=None)
 await api.get_plugin_storage(key); await api.set_plugin_storage(key, value); await api.delete_plugin_storage(key)
+await api.get_plugin_storage_keys()
 await api.get_workspace_storage(key); await api.set_workspace_storage(key, value); await api.delete_workspace_storage(key)
+await api.get_workspace_storage_keys()
+
+# Files / Host info
+await api.get_file(file_key)
+await api.get_langbot_version()
 ```

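-Suggested boundary between `state` and `storage`: `state` holds small JSON (conversation / actor / runner / binding); `storage` holds blobs or larger data (plugin-private data, workspace data, checkpoints).
+Suggested boundary between `state` and `storage`: `state` holds small JSON (conversation / actor / subject / runner); `storage` holds blobs or larger data (plugin-private data, workspace data, checkpoints).

-For the returned data structures (such as `HistoryPage` and artifact metadata), see AGENT_CONTEXT_PROTOCOL §4.
+The proxy's return data structures are also part of this protocol: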
+
+```python
+class TranscriptItem(BaseModel):
+    transcript_id: str
+    event_id: str
+    conversation_id: str | None = None
+    thread_id: str | None = None
+    role: str
+    item_type: str = "message"
+    content: str | None = None
+    content_json: dict[str, Any] | None = None
+    artifact_refs: list[dict[str, Any]] = []
+    seq: int | None = None
+    cursor: str | None = None
+    created_at: int | None = None
+    metadata: dict[str, Any] = {}
+
+class HistoryPage(BaseModel):
+    items: list[TranscriptItem] = []
+    next_cursor: str | None = None
+    prev_cursor: str | None = None
+    has_more: bool = False
+    total_count: int | None = None
+
+class HistorySearchResult(BaseModel):
+    items: list[TranscriptItem] = []
+    total_count: int | None = None
+    query: str
+
+class AgentEventRecord(BaseModel):
+    event_id: str
+    event_type: str
+    event_time: int | None = None
+    source: str
+    bot_id: str | None = None
+    workspace_id: str | None = None
+    conversation_id: str | None = None
+    thread_id: str | None = None
+    actor_type: str | None = None
+    actor_id: str | None = None
+    actor_name: str | None = None
+    subject_type: str | None = None
+    subject_id: str | None = None
+    input_summary: str | None = None
+    input_ref: str | None = None
+    raw_ref: str | None = None
+    seq: int | None = None
+    cursor: str | None = None
+    created_at: int | None = None
+    metadata: dict[str, Any] = {}
+
+class EventPage(BaseModel):
+    items: list[AgentEventRecord] = []
+    next_cursor: str | None = None
+    prev_cursor: str | None = None
+    has_more: bool = False
+    total_count: int | None = None
+
+class ArtifactMetadata(BaseModel):
+    artifact_id: str
+    artifact_type: str
+    mime_type: str | None = None
+    name: str | None = None
+    size_bytes: int | None = None
+    sha256: str | None = None
+    source: str
+    conversation_id: str | None = None
+    run_id: str | None = None
+    runner_id: str | None = None
+    created_at: int | None = None
+    expires_at: int | None = None
+    metadata: dict[str, Any] = {}
+
+class ArtifactReadResult(BaseModel):
+    artifact_id: str
+    mime_type: str | None = None
+    size_bytes: int | None = None
+    offset: int = 0
+    length: int | None = None
+    content_base64: str | None = None
+    file_key: str | None = None
+    has_more: bool = False
+```

 ## 9. Error Model

@@ -605,7 +612,7 @@ class AgentAPIError(BaseModel):
 Runner failures use `run.failed`:

 ```json
-{ "type": "run.failed", "data": { "code": "runner.error", "message": "failed to call external agent", "retryable": false } }
+{ "type": "run.failed", "data": { "code": "runner.error", "error": "failed to call external agent", "retryable": false } }
 ```

 ## 10. Timeout and Cancellation
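@@ -624,7 +631,7 @@ Protocol v1's security boundary lives in the Host:
 - SDK-side local validation only improves the developer experience; it cannot replace Host validation.
 - All resource ids are opaque to the runner.
 - By default only the history of the current conversation / thread is accessible; cross-conversation and workspace-level access require additional authorization.
- Large payloads must be stored as artifacts.
+- Large payloads must be stored as artifacts; `artifact.created.content_base64` is only for small artifacts, and the current Host hard cap is 1 MiB.
 - The Host must record run_id, runner_id, action, resource, scope, and result.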
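On the runner side, the 1 MiB rule can be sketched as a size check before emitting `artifact.created` with inline `content_base64`. This rests on two assumptions: the cap is measured on the raw bytes rather than on the base64 text, and the `data` keys other than `content_base64` are borrowed from `ArtifactReadResult` purely for illustration. The authoritative payload schema lives in PROTOCOL_V1.

```python
import base64

# Assumption: the 1 MiB Host hard cap is measured on the raw bytes. The base64
# text is roughly 4/3 of that size.
INLINE_ARTIFACT_CAP_BYTES = 1024 * 1024


def artifact_created_payload(artifact_id: str, data: bytes, mime_type: str) -> dict:
    """Build an inline artifact.created result, refusing anything over the cap."""
    if len(data) > INLINE_ARTIFACT_CAP_BYTES:
        # Large payloads must go through a stored artifact instead of being inlined.
        raise ValueError(
            f"{len(data)} bytes exceeds the {INLINE_ARTIFACT_CAP_BYTES}-byte inline cap"
        )
    return {
        "type": "artifact.created",
        "data": {
            # Keys other than content_base64 are illustrative, borrowed from ArtifactReadResult.
            "artifact_id": artifact_id,
            "mime_type": mime_type,
            "size_bytes": len(data),
            "content_base64": base64.b64encode(data).decode("ascii"),
        },
    }
```

Failing loudly on the runner side is a convenience only. The Host still has to enforce the cap itself, because SDK-side checks cannot replace Host validation.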

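 The Host does not own business orchestration: it does not concatenate the full history, does not do prompt assembly on the runner's behalf, and does not build in agent memory, a tool loop, or a context compaction strategy. Those are implemented by official or third-party AgentRunner plugins.
@@ -659,14 +666,12 @@ The entry adapter is only a migration bridge. It is responsible for:
 - `AgentRunnerDescriptor.source` only allows `plugin`; a Host built-in adapter cannot act as a runner source to bypass the plugin / runtime / proxy permission chain.
 - `ctx.resources` and proxy action validation must come from the same run authorization snapshot; the runtime handler should not redo resource trimming.
 - v1 does not require global serialization per Agent, AgentRunner plugin instance, or runner id. Multiple bots / channels may reuse the same Agent; concurrency isolation relies on `run_id`, binding, conversation / thread scope, and the Host authorization snapshot.
- For a `stateful_session` runner whose external runtime does not support concurrent turns on the same session, the serialization granularity should be a stable external session key (e.g. workspace / bot / binding / runner / conversation / thread / external session id), not a global lock on the Agent or plugin instance.
 - External harness runners are currently an MVP / dev path: they prove the protocol can be integrated, not that a release-grade security boundary or Docker production readiness is complete.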
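The snapshot rule can be illustrated with a frozen object that is built once per run. The same object is consulted both when populating `ctx.resources` / `available_apis` and on every proxy action. The class, its fields, and the API names used below are hypothetical. The sketch only shows the shape of "decide once, check many times" together with default conversation scoping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunAuthorizationSnapshot:
    """Hypothetical per-run snapshot, frozen when the run starts."""

    run_id: str
    conversation_id: str
    available_apis: frozenset[str]
    # Only set when binding policy explicitly grants cross-conversation reads.
    cross_conversation: bool = False

    def authorize(self, run_id: str, api: str, conversation_id: str) -> None:
        """Raise PermissionError unless the call fits this run's snapshot."""
        if run_id != self.run_id:
            raise PermissionError(f"run {run_id!r} does not own this snapshot")
        if api not in self.available_apis:
            raise PermissionError(f"api {api!r} is not available to this run")
        if conversation_id != self.conversation_id and not self.cross_conversation:
            raise PermissionError("cross-conversation access is not granted")
```

Because the runtime handler only reads the frozen snapshot, it never has to redo resource trimming. Concurrent runs of the same Agent stay isolated simply because each one holds its own snapshot.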

 ## 14. Open Questions

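 - Whether `AgentBinding` should appear in the SDK docs as read-only diagnostic information or remain entirely Host-internal.
- How to define the minimal field set of `TranscriptItem`.
 - Whether ArtifactStore reuses the existing BinaryStorage backend or introduces a separate entity.
 - Whether the State / Storage boundary needs stronger typing.
- How to express the approval model for `platform_api` actions.
+- How to express the approval model for platform actions.
 - Whether Host-side scoped MCP / skill / workspace projection should move up from runner config into a first-class resource projection API.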
@@ -9,7 +9,7 @@
 ## Documentation Maintenance Principles (Single Source of Truth)

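 - **Protocol data structures (schemas) are defined only in [PROTOCOL_V1.md](./PROTOCOL_V1.md).** Other documents must not copy schemas; they may only reference them, e.g. "see PROTOCOL_V1 §4.2".
- **Implementation status is recorded only in [PROGRESS.md](./PROGRESS.md).** Specification documents do not maintain "current status / ✅" sections.
+- Current implementation status, spec gaps, and runner acceptance status belong in [STATUS.md](./STATUS.md); the test execution entry point belongs in [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md), and security release gates belong in [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).
 - Host-internal models (`AgentEventEnvelope`, `AgentBinding`, Descriptor, the various Stores) are defined in [HOST_SDK_INFRASTRUCTURE.md](./HOST_SDK_INFRASTRUCTURE.md) and are not part of the SDK protocol.
 - The remaining topic documents cover only "why / boundaries / how to use" and avoid repeating one another.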

@@ -36,7 +36,7 @@
 - **Event subscription / Event notification**: event subscription and push notifications
 - **BindingResolver persistence UI**: persistence UI for binding configuration and event router integration (if owned by another module)
 - **Scheduler / Background event source**: scheduled tasks and background event sources
- **Runtime control plane v2**: runtime registry, heartbeat, task queue, daemon claim, progress/cancel, and runtime audit
+- **Runtime control plane v2 / Run Ledger**: first add Host-owned `AgentRun` / `AgentRunEvent` / run control primitives; runtime registry, heartbeat, task queue, and daemon claim are later optional phases and are not part of the Protocol v1 mainline.

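 In this document, EventGateway / EventRouter is described as an **external EBA branch integration point**, provided and integration-tested by the external EBA branch. This branch only defines the host-side envelope/binding models and the `run(event, binding)` orchestrator entry point.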

@@ -54,7 +54,7 @@ In this document, EventGateway / EventRouter is described as an **external EBA branch integr

 The main entry point can still be triggered by the Pipeline, but internally it has been converted into an event-first path: `run_from_query()` uses `QueryEntryAdapter` to convert a `Query` into an `AgentEventEnvelope` + `AgentBinding`, then delegates to the unified `run(event, binding, ...)`. The Pipeline path therefore gains the event-first host capabilities (writes to EventLog / Transcript / ArtifactStore / PersistentStateStore, and the History / Event / Artifact / State pull APIs).

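-See [PROGRESS.md](./PROGRESS.md) for detailed implementation progress, accepted capabilities, and unfinished wrap-up work.
+See [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md) for the next round's test paths, status definitions, and smoke records.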

 ## Glossary

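@@ -69,8 +69,7 @@ In this document, EventGateway / EventRouter is described as an **external EBA branch integr
 | EBA | Event Based Agent: the integration direction that unifies messages, recalls, group joins, scheduled tasks, and similar triggers into host events; the full gateway and routing are integration-tested on the external EBA branch. |
 | harness runner | An adapter for an external runtime, such as Claude Code or Codex, that already has its own session / tool loop / MCP / compaction mechanisms. |
 | projection | The process by which the Host trims internal sources of truth, authorized resources, or configuration into a view that a runner / harness can consume. |
-| `static_refs` | KV-cache-friendly static context references, e.g. the hash/version of the system policy, tool schema, or resource manifest. |
-| Runtime Control Plane | The v2 Host capability layer responsible for runtime registry, heartbeat, task queue, progress/cancel, and audit; not part of the Protocol v1 mainline. |
+| Runtime Control Plane | The v2 Host capability layer. The first phase focuses on a Host-owned run/result ledger and control primitives; runtime registry, heartbeat, task queue, and daemon claim are later optional phases, not part of the Protocol v1 mainline. |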

 ## Design Documents

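@@ -81,11 +80,12 @@ In this document, EventGateway / EventRouter is described as an **external EBA branch integr
 | [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md) | The agent-owned context direction: what LangBot passes when an event arrives, how the agent pulls more history / artifacts / state on demand, and how KV-cache-friendly context management is supported. |
 | [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md) | Extension boundary matrix between AgentRunner externalization and the external EBA / Agent Platform / Runtime Control Plane, stating which parts are this branch's foundation and which are plugged in by external branches. |
 | [EVENT_BASED_AGENT.md](./EVENT_BASED_AGENT.md) | EBA integration boundary: event model, event sources, trigger bindings, and how non-message events reuse AgentRunner scheduling; the full EventGateway / EventRouter is integration-tested by the external EBA branch. |
-| [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md) | Agent Platform v2 / runtime control plane placeholder: the Host adds runtime registry, heartbeat, task queue, daemon execution, and audit; management plugins are built on top of these Host capabilities. **Marked as a future design note**. |
+| [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md) | Agent Platform v2 / runtime control plane decision: the first phase prioritizes making `AgentRun` / `AgentRunEvent` / run control a Host source of truth; full runtime registry / daemon control is a later optional phase. **Marked as a future design note**. |
 | [OFFICIAL_RUNNER_PLUGINS.md](./OFFICIAL_RUNNER_PLUGINS.md) | Migration of the official runner plugins, including local-agent and external runners. It is a downstream rollout plan, not a prerequisite constraint on LangBot's foundational capability design. |
+| [RUN_STEERING_AND_CHECKPOINT.md](./RUN_STEERING_AND_CHECKPOINT.md) | Design for two Host capability gaps: in-run message injection (steering / follow-up) and compaction summary persistence (compaction checkpoint), drawn from a gap analysis of local-agent against the Pi agent harness. **Marked as a future design note**. |
+| [STATUS.md](./STATUS.md) | Current implementation status, known gaps between spec and implementation, runner acceptance status, and high-value historical records. |
 | [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md) | Agent Runner QA guide: keeps the highest-value test paths and guides agents through the next round of WebUI / runner smoke verification. |
 | [SECURITY_HARDENING.md](./SECURITY_HARDENING.md) | Follow-up release gates for release-grade security hardening: path isolation, permission boundaries, secrets, resource quotas, MCP / skill projection, and audit. |
-| [PROGRESS.md](./PROGRESS.md) | **🔒 Sole source of truth for status**. Current implementation progress, accepted capabilities, unfinished wrap-up work, and items outside this branch's scope. |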

 ## Work Breakdown

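@@ -138,7 +138,7 @@ EBA dispatch cardinality and fan-out boundaries are still governed by PROTOCOL_V1 §13; this document
 ### 5. Runtime Control Plane v2 (Future)

 The current AgentRunner v1 mainline is responsible only for `event -> binding -> runner.run(ctx) -> result stream`.
-A later Agent Platform v2 could add runtime registry, heartbeat, task queue, daemon claim, progress/cancel, and runtime audit on the Host side.
+A later Agent Platform v2 should first add, on the Host side, persistent `AgentRun` / `AgentRunEvent`, result persistence, and general run control primitives such as cancel / finalize / query. The full runtime registry, heartbeat, task queue, daemon claim, and runtime audit should move into the Host as an optional phase only once reuse requirements are clear.

 On top of these Host capabilities, standalone agent control-plane plugins can be built. A plugin owns the UI, policy, and orchestration experience, while the Host continues to hold the source of truth for runtimes and tasks.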

@@ -2,7 +2,7 @@

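 This document records how LangBot continues to evolve into an Agent Platform infrastructure layer after AgentRunner pluginization. The subject is the Host capability layer: not `AgentRunner Protocol v2`, and not building any specific Agent Platform product into LangBot core.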

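-> This is the current decision version. Protocol data structures are still governed by [PROTOCOL_V1.md](./PROTOCOL_V1.md); see [PROGRESS.md](./PROGRESS.md) for implementation progress and [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md) for extension boundaries.
+> This is the current decision version. Protocol data structures are still governed by [PROTOCOL_V1.md](./PROTOCOL_V1.md); see [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md) for the test execution entry point and [EXTENSION_SCOPE_MATRIX.md](./EXTENSION_SCOPE_MATRIX.md) for extension boundaries.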

 ## 1. Current Decisions

@@ -0,0 +1,152 @@
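+# Run Steering and Compaction Checkpoint (Future Design Note)
+
+This document describes two Host capability gaps that have not landed yet: **in-run message injection (steering / follow-up)** and
+**compaction summary persistence (compaction checkpoint)**. Both come from a gap analysis comparing the official local-agent with the
+Pi agent harness (`pi-mono/packages/agent`, hereafter pi-agent-core).
+local-agent has already ported Pi's event lifecycle, parallel tool semantics, hook extension points, and compaction budget model,
+but the runner cannot close these two gaps on its own; they need cooperation from the Host protocol or authorization.
+
+> This is a design memo, not a schema source of truth. The data structures involved will ultimately land in
+> [PROTOCOL_V1.md](./PROTOCOL_V1.md); context-boundary semantics are governed by
+> [AGENT_CONTEXT_PROTOCOL.md](./AGENT_CONTEXT_PROTOCOL.md);
+> run persistence and control primitives are governed by [RUNTIME_CONTROL_PLANE_V2.md](./RUNTIME_CONTROL_PLANE_V2.md).
+
+## 1. Run Steering / Follow-up (In-Run Message Injection)
+
+### 1.1 Problem
+
+In IM scenarios, users very often send more messages while the agent is running (adding information, correcting course, "never mind, stop searching").
+The current mainline is `one event -> one AgentBinding -> one run_id -> one runner`
+(PROTOCOL_V1 §13): a new message in the same conversation either waits for the current run to finish and then triggers a new run,
+or triggers an independent run concurrently. Neither behavior delivers the new message into the **tool loop that is already executing**.
+From the user's point of view, the agent finishes an outdated task on its own and only then sees the new message.
+
+Cancel (PROTOCOL_V1 §10) does not solve this: cancel throws away work already done,
+while steering changes the subsequent direction and keeps the current progress.
+
+### 1.2 Pi's Reference Semantics
+
+pi-agent-core distinguishes two queues. Both inject at turn boundaries and never interrupt an in-progress model stream or tool execution:
+
+- **steering**: insertion while the run is in progress. After all tool calls of the current assistant message have completed and
+  before the next model call, queued user messages are injected; the model sees them in the next turn.
+- **follow-up**: queued follow-on work. It is checked only when there are no pending tool calls, no steering messages, and
+  the run is about to end naturally; if messages are queued, they are injected and the run continues with another turn instead of ending.
+
+Each queue supports a `one-at-a-time` mode (inject one message per check) and an `all` mode (inject everything at once).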
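Here is a minimal sketch of the two-queue semantics. It assumes string messages and a runner loop that calls `after_tool_batch()` at each turn boundary and `before_natural_stop()` when the run would otherwise finish. The class and method names are hypothetical and do not reflect pi-agent-core's actual API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Literal

DrainMode = Literal["one-at-a-time", "all"]


@dataclass
class TurnQueues:
    """Hypothetical model of the steering / follow-up queues described above."""

    steering_mode: DrainMode = "one-at-a-time"
    follow_up_mode: DrainMode = "one-at-a-time"
    steering: deque[str] = field(default_factory=deque)
    follow_up: deque[str] = field(default_factory=deque)

    @staticmethod
    def _drain(queue: deque[str], mode: DrainMode) -> list[str]:
        # one-at-a-time takes the oldest message; all empties the queue.
        if not queue:
            return []
        if mode == "one-at-a-time":
            return [queue.popleft()]
        drained = list(queue)
        queue.clear()
        return drained

    def after_tool_batch(self) -> list[str]:
        """Turn boundary: every tool call of the current assistant message has finished."""
        return self._drain(self.steering, self.steering_mode)

    def before_natural_stop(self, pending_tool_calls: int) -> list[str]:
        """Run would end; a non-empty result means 'inject these and run another turn'."""
        # Follow-up is only consulted once tool calls and steering are both exhausted.
        if pending_tool_calls or self.steering:
            return []
        return self._drain(self.follow_up, self.follow_up_mode)
```

Because both hooks fire only between turns, an in-flight model stream or tool call is never interrupted. This is the property that distinguishes steering from cancel.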
+
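+### 1.3 Design Direction
+
+The division of responsibilities follows the existing principle: the Host owns event routing and the conversation source of truth; the runner owns turn boundaries.
+
+- **Host side**: the BindingResolver / dispatch layer recognizes new message events for which "the same conversation has an active run
+  and the runner declares steering support", writes them into a run-scoped steering queue,
+  and marks each such event as claimed by the in-flight run (it no longer triggers a new run, so the §13 cardinality constraint holds).
+  The event still goes into EventLog / Transcript as usual: the source of truth is unchanged, and only the triggering behavior changes.
+- **Runner side**: at turn boundaries (after a tool batch completes and before the next model call, and right before the run
+  would end naturally), the runner pulls pending steering input through a run-scoped pull API
+  and injects it into the working context. local-agent's `AgentLoopHooks.prepare_next_turn` /
+  `should_stop_after_turn` already reserve the corresponding injection points.
+- **Capability negotiation**: the runner manifest declares a `steering` capability (see PROTOCOL_V1 §4.3);
+  runners that do not declare it keep the current behavior (new messages start a new run under the existing rules).
+- **Receipts**: events consumed by steering need an auditable attribution record (which run_id claimed the event,
+  and whether injection ultimately succeeded). It could take the form of a new result type or an EventLog record, to be decided when the protocol is written.
+
+New protocol surface required (final definitions belong in PROTOCOL_V1):
+
+1. Add a steering pull capability bit to `ContextAccess.available_apis`.
+2. Add a steering pull action to `AgentRunAPIProxy` (including a parameter for one-at-a-time / all semantics).
+3. Claim rules in the dispatch layer: which event types steering may absorb, and how to fall back when queued events are not pulled in time
+   (suggestion: when the run ends or its deadline expires, unconsumed queued events re-trigger a run as ordinary events).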
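The Host-side claim and fallback rules can be sketched as a small in-memory dispatcher. The sketch assumes event ids are plain strings and that a runner's `steering` capability is already known when its run becomes active. Everything in it (class name, fields, return values) is hypothetical. A real implementation would persist claims in EventLog and live in the BindingResolver / dispatch layer.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringDispatcher:
    """Hypothetical Host-side dispatch: absorb events into an active run or start a new run."""

    # conversation_id -> run_id of the active run whose runner declared `steering`
    active_steerable_runs: dict[str, str] = field(default_factory=dict)
    # run_id -> queued event ids waiting for the runner's next pull
    queues: dict[str, list[str]] = field(default_factory=dict)
    # event_id -> run_id: auditable attribution of claimed events
    claims: dict[str, str] = field(default_factory=dict)

    def dispatch(self, conversation_id: str, event_id: str) -> str:
        run_id = self.active_steerable_runs.get(conversation_id)
        if run_id is None:
            return "new_run"  # existing behavior: one event -> one run
        self.queues.setdefault(run_id, []).append(event_id)
        self.claims[event_id] = run_id
        return f"claimed:{run_id}"

    def pull(self, run_id: str) -> list[str]:
        """Run-scoped pull at a turn boundary (the 'all' variant)."""
        return self.queues.pop(run_id, [])

    def finish_run(self, conversation_id: str, run_id: str) -> list[str]:
        """Run ended or hit its deadline: release unconsumed events for normal dispatch."""
        self.active_steerable_runs.pop(conversation_id, None)
        leftovers = self.queues.pop(run_id, [])
        for event_id in leftovers:
            self.claims.pop(event_id, None)  # no longer attributed to this run
        return leftovers  # the caller re-dispatches these as ordinary events
```

Keeping the fallback in `finish_run` means that an event the runner never pulls is not lost. It simply reverts to the §13 rule and triggers its own run.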
+
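+### 1.4 Boundaries
+
+- The Host does not assemble prompts for the runner: it only delivers the queue, and the runner decides where and in what format to inject.
+- Not to be confused with observer / fan-out: steering is still an input supplement within a single run and does not spawn a second runner.
+- If the underlying session of a remote / external harness runner (claude-code, codex, etc.) has its own
+  steering capability, the adapter can forward to it directly; the protocol surface stays the same.
+
+## 2. Compaction Checkpoint Persistence
+
+### 2.1 Problem
+
+local-agent is currently a stateless runner: every run re-pulls the tail of the transcript
+(50 items by default), re-estimates tokens, and regenerates the compaction summary. As a result:
+
+- In long conversations compaction is recomputed on every run and the summary is regenerated each time, so its wording drifts between runs,
+  which is unfriendly to provider KV caches (AGENT_CONTEXT_PROTOCOL §"Summary checkpoint 稳定" (stable summary checkpoint)
+  already states the expectation: a new checkpoint is produced only when compaction actually happens).
+- Once history exceeds the fetch limit, earlier content becomes permanently invisible, because no checkpoint records
+  "how far compaction has reached and what it produced".
+
+pi-agent-core persists compaction entries into the session tree: each summary carries
+`tokensBefore` and the range it covers; later turns reuse it directly and compact incrementally only when the threshold is crossed again.
+
+### 2.2 Current Inventory
+
+The protocol surface is largely in place; what is missing is a consumption convention and authorization:
+
+- The State / Storage API is already defined (PROTOCOL_V1 §8 "State / Storage"),
+  and AGENT_CONTEXT_PROTOCOL already names `summary.checkpoint` as an intended use of state.
+- `ContextAccess.available_apis.state` defaults to `false` (PROTOCOL_V1 §5.8);
+  the Host does not yet enable it by default for local-agent bindings.
+- local-agent does not consume it at all: it neither reads nor writes a checkpoint (its README "Current Boundary"
+  section already declares this as expected future work).
+- LLM-generated summaries do **not** depend on this Host capability: the runner can produce them with the already-authorized `invoke_llm`,
+  so that part can be implemented first. This item only covers "store it and reuse it next time".
+
+### 2.3 Design Direction
+
+- **Storage location**: state, scope=`conversation` (a small JSON value, consistent with PROTOCOL_V1 §8's
+  guidance on the state/storage boundary). If summaries grow in the future, the overflow goes to storage and
+  state keeps a reference to it.
+- **Key convention**: `runner.compaction.checkpoint` (inside the runner namespace).
+- **Content convention** (the schema lands in PROTOCOL_V1 or the runner docs; only the semantics are listed here):
+  - `schema_version`
+  - `summary`: the compaction summary text (LLM-generated or deterministically generated)
+  - `covers_until`: the transcript cursor (seq / message id) up to which the summary already covers the history;
+    it is the anchor for incremental compaction and for "where to resume pulling history"
+  - `tokens_before` / `created_at`: for diagnostics and invalidation decisions
+- **Consumption flow**: at run start, read the checkpoint → pull only the transcript after `covers_until`
+  → when compaction triggers, generate a new summary incrementally from the old one and write back a new checkpoint.
+  If the checkpoint is missing or fails to parse, fall back to the current behavior (pull the full tail), which keeps backward compatibility.
+- **Invalidation rules**: if `covers_until` no longer exists in the Host transcript (the conversation was cleaned up / reset),
+  the checkpoint is void; the runner must not trust a checkpoint across conversations.
+- **Authorization**: the Host enables `available_apis.state` for runner bindings that declare they need state;
+  validation reuses the existing run-scoped state checks
+  (scope, key, value size, JSON serializability; see PROTOCOL_V1 §7.2's requirements for
+  `state.updated`).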
+
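+### 2.4 Related but Independent Work
+
+- **Tokenizer / usage metadata pass-through**: the runner currently estimates tokens with a chars/4 heuristic,
+  which undercounts CJK text by 3-4x, so compaction triggers systematically late. The Host should pass through, in the model response or
+  `ctx.runtime.metadata`, the provider usage (prompt/completion tokens) and the
+  model context window (LiteLLM model-info work). This item does not block checkpoint
+  rollout, but it determines how accurately compaction triggers.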
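To see why chars/4 undercounts CJK text, compare it with a crude script-aware estimate. The ratio of roughly one token per CJK character, and the Unicode ranges chosen, are assumptions for illustration only. Real counts depend on the provider tokenizer, and that dependence is why the text asks the Host to pass through provider usage instead.

```python
def _is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def naive_estimate(text: str) -> int:
    """The chars/4 heuristic described above."""
    return len(text) // 4


def cjk_aware_estimate(text: str) -> int:
    """Assumption: about one token per CJK character, chars/4 for everything else."""
    cjk = sum(1 for ch in text if _is_cjk(ch))
    return cjk + (len(text) - cjk) // 4
```

For a four-character Chinese string the naive estimate is 1 and the script-aware estimate is 4. That gap is the "3-4x late" compaction trigger in miniature.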
+
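+## 3. Implementation Breakdown
+
+| Item | Owner | Depends on |
+| --- | --- | --- |
+| steering queue, event claiming, timeout fallback | LangBot Host (dispatch / binding layer) | None |
+| steering pull API + capability bit | PROTOCOL_V1 + SDK proxy | Previous item |
+| Pull and inject at turn boundaries | langbot-local-agent (hooks already reserved) | Previous two items |
+| local-agent checkpoint read/write via the state API | langbot-local-agent | Host enables `available_apis.state` |
+| checkpoint key / content / invalidation convention | This document → PROTOCOL_V1 | None |
+| LLM compaction summary generation | langbot-local-agent | None (`invoke_llm` is already available) |
+| usage / context-window metadata pass-through | LangBot Host (model layer) | LiteLLM model-info |
+
+Suggested order: checkpoint first (the protocol surface already exists, and changes concentrate on authorization and runner consumption),
+steering second (it needs new protocol surface and a change in dispatch behavior).
+
+## 4. Open Questions
+
+- How messages injected via steering are distinguished from ordinary messages in the Transcript (auditing needs to tell
+  "triggered as a new run" apart from "absorbed by an in-flight run").
+- Who defines the merge semantics for multiple queued messages: does the Host hand everything to the runner, or does it support
+  one-at-a-time negotiation? Suggestion: the Host hands over everything and the runner decides its own consumption pace.
+- Under streaming delivery, after steering injection, how content already streamed in the previous turn joins the new turn's
+  output on the IM message-editing surface (this involves the `ctx.delivery` capability and will be settled as delivery evolves).
+- Whether checkpoints need active invalidation notifications from the Host (e.g. deleting the corresponding state key when a conversation is cleared),
+  or whether the runner validating `covers_until` on read is enough.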
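@@ -90,7 +90,7 @@ External harnesses such as Claude Code, Codex, and Kimi Code can continue to use their own perm
 | Item | Status | Currently in place | Remaining gaps / pre-release requirements |
 | --- | --- | --- | --- |
 | Path isolation | Partial | Local Claude / Codex runners normalize `working-directory` and reject the system root, the user's home directory, and nonexistent paths; the context directory must be a relative path inside the working directory, and absolute paths, `..`, and symlink escapes are rejected; the remote daemon uses relative paths + `realpath` containment for projected files and rejects absolute paths, `..`, and symlink writes that escape the workspace; ArtifactStore re-checks file artifacts with `realpath` + root containment. | Host-generated workspace / context / artifact roots still lack a unified allowlist, mount policy, TTL cleanup, and orphan cleanup; an administrator's explicit `working-directory` remains an operator-owned local directory, and LangBot does not promise to stop the external CLI from accessing any path the same OS user can access. |
-| Permission boundary | Partial | The Host already has runner manifest permissions, binding-level resource policy, a run-scoped authorization snapshot, and proxy action `caller_plugin_identity` validation; Claude Code `--dangerously-skip-permissions` is now an explicit setting that defaults to false; Codex defaults to `sandbox=read-only` and `approval_policy=never`, and filters user `mcp_servers.*` config overrides. | The external CLI's native file / process / tool capabilities remain operator-owned execution; a production default or managed runner needs container / VM / OS-level isolation, tool allow/deny, and auditable approval, and the runner manifest cannot be treated as the complete permission boundary of an external CLI. |
+| Permission boundary | Partial | The Host already has binding-level resource policy, a run-scoped authorization snapshot, `ctx.context.available_apis`, and proxy action `caller_plugin_identity` validation; Claude Code `--dangerously-skip-permissions` is now an explicit setting that defaults to false; Codex defaults to `sandbox=read-only` and `approval_policy=never`, and filters user `mcp_servers.*` config overrides. | The external CLI's native file / process / tool capabilities remain operator-owned execution; the current Protocol v1 does not implement runner manifest permissions; a production default or managed runner needs container / VM / OS-level isolation, tool allow/deny, and auditable approval, and the runner manifest cannot be treated as the complete permission boundary of an external CLI. |
 | Secret handling | Partial | Subprocesses no longer inherit the full LangBot / daemon environment and keep only allowlisted env for CLI auth, proxy, locale, CA, and similar; Codex `environment-json` may not override `HOME`, `PATH`, `CODEX_HOME`, `PYTHONPATH`, or `LANGBOT_*`; the per-run Codex `CODEX_HOME` inherits the runtime user's Codex auth/session and non-MCP provider config but strips global `mcp_servers`; LangBot-managed MCP is written to the per-run `CODEX_HOME/config.toml` with mode `0600`, and scoped secrets never enter argv; remote daemon MCP config / `mcp.json` use `0600`; stdout/stderr, errors, and diagnostic artifacts are redacted and output is truncated; related unit tests cover secret/env leakage. | Still missing: a unified Host-wide redaction policy, redaction rules for transcripts / artifact metadata / admin UI, a secret sourcing and rotation policy, and cross-runner auditing of config redaction. |
 | MCP policy | Partial | An SDK-owned per-run LangBot MCP bridge exists; the remote MCP channel has a per-run secret; the bridge exposes only the SDK-annotated tool surface; Codex managed MCP does not let users inject or override `mcp_servers.*` via `config-overrides` and does not inherit the runtime user's global `mcp_servers`; the remote Codex MCP secret never enters argv. | Missing: a Host / Admin-level allowlist of external MCP servers, scoped token lifecycle, tool allow / deny policy, approval for dangerous tools, and MCP call auditing. |
 | Skill access policy | Partial | The Host resource builder exposes skill-backed scoped tools according to runner capability and resource policy; the current code-agent runner no longer accepts user-written `skills-json`, preventing a runner binding from projecting arbitrary skills; skill tool paths and visibility have partial unit-test coverage. | Missing: release-grade skill source verification, version / hash recording, projection cleanup, and auditing for code-agent harnesses; if harness-native skill files are needed later, the Host / sandbox must likewise generate a restricted tool surface, and nothing may bypass the SDK runtime to access LangBot resources. |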
@@ -0,0 +1,49 @@
+# AgentRunner Pluginization Status
+
+This document is the status source of truth for `docs/agent-runner-pluginization/`. Protocol schemas are still governed by [PROTOCOL_V1.md](./PROTOCOL_V1.md), test steps by [AGENT_RUNNER_QA_GUIDE.md](./AGENT_RUNNER_QA_GUIDE.md), and security release gates by [SECURITY_HARDENING.md](./SECURITY_HARDENING.md).
+
+Status snapshot date: 2026-06-10.
+
+## Implementation Status
+
+| Area | Status | Notes |
+| --- | --- | --- |
+| SDK manifest schema | Done | `AgentRunnerManifest` contains typed `capabilities` / `permissions`; unknown capability / permission keys are not allowed into the typed model. |
+| Runner discovery | Done | The runtime returns a typed manifest; the Host registry validates each runner individually, and a failing runner is skipped with a warning without affecting the others. |
+| Host resource authorization | Done | `ctx.resources` and `ctx.context.available_apis` are generated by intersecting manifest permissions with binding policy / run scope. |
+| Run authorization snapshot | Done | The active run session freezes run-scoped resources and available APIs; the runtime handler validates pull APIs against the snapshot. |
+| Result payload validation | Done | The wire format stays `{type, data}`; the Host strictly validates delivery / side-effect payloads, is lenient with tool-call telemetry, and ignores unknown types with a warning. |
+| Old built-in runners | Done | The old `src/langbot/pkg/provider/runners/*` and `RequestRunner` paths have been removed from this branch. |
+| Official runner manifests | Done | `local-agent`, Claude Code / Codex, and the external-service runners now re-declare the LangBot resource permissions that actually take effect. |
+| Runtime Control Plane v2 | Future | The first phase is designed as a Host-owned Run Ledger; runtime registry / heartbeat / daemon claim are later optional phases. |
+| Full release security gate | Future | Self-host / container opt-in can continue; a managed/default external harness must pass the full SECURITY_HARDENING gate. |
+
+## Known Gaps Between Spec and Implementation
+
+- `action.requested` is still only a telemetry / reserved surface; the platform action executor does not run on this branch.
+- The full EventGateway / EventRouter implementation is integration-tested by the external EBA branch; this branch only provides the event-first host envelope / binding / run entry point.
+- The long-term type boundary between state and storage can still be narrowed; the current contract only requires JSON-safe state and a controlled storage API.
+- The native shell / filesystem / CLI / MCP permissions of external harnesses are not constrained by manifest permissions; manifest permissions only constrain access to resources held by LangBot.
+- For managed/cloud/default external harnesses, OS/process/network quotas, workspace GC, and full audit/admin control remain release gates, not capabilities already delivered by Protocol v1.
+
+## Runner Acceptance Status
+
+| Runner | Status | Latest evidence |
+| --- | --- | --- |
+| `plugin:langbot/local-agent/default` | Unit-pass; UI smoke pending | 2026-06-10: local pytest / ruff passed; WebUI smoke is run manually in a single batch. |
+| `plugin:langbot/claude-code-agent/default` | Unit-pass; CLI smoke environment-dependent | 2026-06-10: local pytest / ruff passed; a real CLI run depends on the local login state and proxy. |
+| `plugin:langbot/codex-agent/default` | Unit-pass; CLI smoke environment-dependent | 2026-06-10: local pytest / ruff passed; a real CLI run depends on the local login state and proxy. |
+| Dify / n8n / Coze / DashScope / Langflow / Tbox | Unit-pass; credential smoke optional | 2026-06-10: plugin layout / parser tests passed; smoke runs against real service credentials are not required every round. |
+
+## High-Value Historical Records
+
+Historical reports have been merged into this status page and the QA guide, and no separate progress document is kept anymore. For later tracing, check the original execution reports under `langbot-skills/reports/` first.
+
+As of 2026-05-29, local smoke runs have shown that:
+
+- `local-agent` can run through the pluginized `AgentRunOrchestrator` main path via Pipeline Debug Chat.
+- The Claude Code runner can execute through the same `run(event, binding)` path.
+- The Claude Code runner can read LangBot event-first context, access authorized resources through the SDK-owned MCP bridge / skill-backed scoped tools, and then write back `external.session_id` / `external.working_directory`.
+- The Codex runner can execute through the same path and write the Codex `thread_id` back to host-owned state.
+
+These records only prove that the local protocol loop works; they do not mean release-grade security hardening is complete.
@@ -204,18 +204,29 @@ def wrap_python_command_with_env(command: str, *, mount_path: str = '/workspace'
        fi

        if [ "$_LB_NEEDS_BOOTSTRAP" -eq 1 ]; then
+          if [ -d "$_LB_LOCK_DIR" ] && [ ! -f "$_LB_LOCK_DIR/pid" ]; then
+            echo "Clearing stale Python environment lock without owner: $_LB_LOCK_DIR" >&2
+            rm -rf "$_LB_LOCK_DIR" 2>/dev/null || true
+          fi
+
          _LB_LOCK_WAIT=0
          while ! mkdir "$_LB_LOCK_DIR" 2>/dev/null; do
            if [ "$_LB_LOCK_WAIT" -ge 120 ]; then
+              echo "Timed out waiting for Python environment lock, clearing stale lock: $_LB_LOCK_DIR" >&2
+              rm -rf "$_LB_LOCK_DIR" 2>/dev/null || true
+              if mkdir "$_LB_LOCK_DIR" 2>/dev/null; then
+                break
+              fi
              echo "Timed out waiting for Python environment lock: $_LB_LOCK_DIR" >&2
              exit 1
            fi
            sleep 1
            _LB_LOCK_WAIT=$((_LB_LOCK_WAIT + 1))
          done
+          printf '%s\\n' "$$" > "$_LB_LOCK_DIR/pid" 2>/dev/null || true

          _lb_cleanup_lock() {{
-            rmdir "$_LB_LOCK_DIR" >/dev/null 2>&1 || true
+            rm -rf "$_LB_LOCK_DIR" >/dev/null 2>&1 || true
          }}
          trap _lb_cleanup_lock EXIT INT TERM
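For reviewers, the same `mkdir`-based locking policy can be restated in Python. `mkdir` is atomic, so exactly one process creates the lock directory. A directory without a `pid` file is treated as stale. After the timeout, the waiter clears the lock and retries once. This is an illustrative sketch, not code from the patch, and the function names are hypothetical.

```python
import os
import shutil
import time


def acquire_dir_lock(lock_dir: str, timeout_s: float = 120.0, poll_s: float = 1.0) -> None:
    """mkdir-based lock mirroring the shell bootstrap (illustrative sketch)."""
    # A lock dir without a pid file has no recorded owner: treat it as stale.
    if os.path.isdir(lock_dir) and not os.path.isfile(os.path.join(lock_dir, "pid")):
        shutil.rmtree(lock_dir, ignore_errors=True)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            os.mkdir(lock_dir)  # atomic: only one process can succeed
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                # Same policy as the shell: clear the presumed-stale lock, retry once.
                shutil.rmtree(lock_dir, ignore_errors=True)
                try:
                    os.mkdir(lock_dir)
                    break
                except FileExistsError:
                    raise TimeoutError(f"timed out waiting for {lock_dir}") from None
            time.sleep(poll_s)
    # Record the owner so later waiters do not treat the lock as stale.
    with open(os.path.join(lock_dir, "pid"), "w", encoding="utf-8") as fh:
        fh.write(f"{os.getpid()}\n")


def release_dir_lock(lock_dir: str) -> None:
    shutil.rmtree(lock_dir, ignore_errors=True)
```

One trade-off carries over from the shell version. The owner writes `pid` only after `mkdir` succeeds, so a process that arrives in that window sees an ownerless directory and can remove a lock that is actually held.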

@@ -5,6 +5,7 @@ import asyncio
 import os
 import shutil
 import shlex
+import threading
 from typing import TYPE_CHECKING, Any

 import pydantic
@@ -25,6 +26,19 @@ if TYPE_CHECKING:
    from .mcp import RuntimeMCPSession


+_WORKSPACE_COPY_LOCKS: dict[str, threading.Lock] = {}
+_WORKSPACE_COPY_LOCKS_GUARD = threading.Lock()
+
+
+def _workspace_copy_lock(path: str) -> threading.Lock:
+    with _WORKSPACE_COPY_LOCKS_GUARD:
+        lock = _WORKSPACE_COPY_LOCKS.get(path)
+        if lock is None:
+            lock = threading.Lock()
+            _WORKSPACE_COPY_LOCKS[path] = lock
+        return lock
+
+
 class MCPSessionErrorPhase(enum.Enum):
    """Which phase of the MCP lifecycle failed."""

@@ -50,7 +64,7 @@ class MCPServerBoxConfig(pydantic.BaseModel):
    host_path: str | None = None
    host_path_mode: str = 'ro'  # Read-only by default; MCP servers get a read-write mount only when explicitly requested
    env: dict[str, str] = pydantic.Field(default_factory=dict)
-    startup_timeout_sec: int = 120  # Longer default to allow dependency bootstrap
+    startup_timeout_sec: int = 300  # First Docker bootstrap may need to build a venv and install MCP deps.
    cpus: float | None = None
    memory_mb: int | None = None
    pids_limit: int | None = None
@@ -257,14 +271,32 @@ class BoxStdioSessionRuntime:

    @staticmethod
    def _copy_workspace_tree(source_path: str, process_host_root: str, process_host_workspace: str) -> None:
-        shutil.rmtree(process_host_root, ignore_errors=True)
-        os.makedirs(process_host_root, exist_ok=True)
-        shutil.copytree(
-            source_path,
-            process_host_workspace,
-            symlinks=True,
-            ignore=shutil.ignore_patterns('.git', '__pycache__', '.pytest_cache', '.mypy_cache', '.ruff_cache'),
-        )
+        # Docker-backed bootstrap writes root-owned runtime directories such as
+        # .venv/.tmp into the staged workspace. The host process may not be able
+        # to delete them, so refresh source files in place and preserve runtime
+        # directories instead of rmtree'ing the whole staging root.
+        with _workspace_copy_lock(process_host_root):
+            os.makedirs(process_host_workspace, exist_ok=True)
+            shutil.copytree(
+                source_path,
+                process_host_workspace,
+                symlinks=True,
+                dirs_exist_ok=True,
+                ignore=shutil.ignore_patterns(
+                    '.git',
+                    '__pycache__',
+                    '.pytest_cache',
+                    '.mypy_cache',
+                    '.ruff_cache',
+                    '.venv',
+                    'venv',
+                    'env',
+                    '.env',
+                    '.cache',
+                    '.tmp',
+                    '.langbot',
+                ),
+            )

    async def _cleanup_staged_workspace(self) -> None:
        if not self.resolve_host_path():
@@ -75,7 +75,7 @@ class TestStartupFlow:
        """Test auth endpoint."""
        # First startup may allow initial setup
        response = e2e_client.post('/api/v1/user/auth', json={
-            'username': 'admin',
+            'user': 'admin',
            'password': 'admin',
        })

@@ -56,6 +56,10 @@ def test_wrap_python_command_with_env_contains_bootstrap_and_command():

    assert '_LB_SYSTEM_PYTHON="$(command -v python3 || command -v python || true)"' in command
    assert '"$_LB_SYSTEM_PYTHON" -m venv "$_LB_VENV_DIR"' in command
+    assert 'Clearing stale Python environment lock without owner: $_LB_LOCK_DIR' in command
+    assert 'clearing stale lock: $_LB_LOCK_DIR' in command
+    assert 'printf \'%s\\n\' "$$" > "$_LB_LOCK_DIR/pid"' in command
+    assert 'rm -rf "$_LB_LOCK_DIR"' in command
    assert 'export VIRTUAL_ENV="$_LB_VENV_DIR"' in command
    assert command.rstrip().endswith('python script.py')

@@ -180,7 +180,7 @@ class TestMCPServerBoxConfig:
        assert cfg.host_path is None
        assert cfg.host_path_mode == 'ro'
        assert cfg.env == {}
-        assert cfg.startup_timeout_sec == 120
+        assert cfg.startup_timeout_sec == 300
        assert cfg.cpus is None
        assert cfg.memory_mb is None
        assert cfg.pids_limit is None
@@ -534,6 +534,27 @@ class TestPythonWorkspacePreparation:
        assert 'exec python /workspace/.mcp/u1/workspace/server.py' in wrapped['args'][1]
        assert wrapped['cwd'] == '/workspace/.mcp/u1/workspace'

+    def test_staging_refresh_preserves_box_runtime_dirs(self, mcp_module, tmp_path):
+        source = tmp_path / 'source'
+        source.mkdir()
+        (source / 'server.py').write_text('print("new")\n', encoding='utf-8')
+        (source / 'requirements.txt').write_text('mcp==1.26.0\n', encoding='utf-8')
+
+        process_root = tmp_path / 'shared' / '.mcp' / 'u1'
+        workspace = process_root / 'workspace'
+        (workspace / '.venv' / 'bin').mkdir(parents=True)
+        (workspace / '.venv' / 'bin' / 'python').write_text('', encoding='utf-8')
+        (workspace / '.langbot').mkdir()
+        (workspace / '.langbot' / 'python-env.lock').mkdir()
+        (workspace / 'server.py').write_text('print("old")\n', encoding='utf-8')
+
+        mcp_module.BoxStdioSessionRuntime._copy_workspace_tree(str(source), str(process_root), str(workspace))
+
+        assert (workspace / 'server.py').read_text(encoding='utf-8') == 'print("new")\n'
+        assert (workspace / 'requirements.txt').read_text(encoding='utf-8') == 'mcp==1.26.0\n'
+        assert (workspace / '.venv' / 'bin' / 'python').exists()
+        assert (workspace / '.langbot' / 'python-env.lock').is_dir()
+

 # ── get_runtime_info_dict ───────────────────────────────────────────