refactor(agent-runner): use sandbox file model

2026-06-21 04:54:21 +00:00 · 2026-06-19 09:30:12 +08:00
parent 2c09af406e
commit 79a5fba06b
49 changed files with 203 additions and 3401 deletions
@@ -8,7 +8,7 @@

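 LangBot does not aim to host a strongly isolated platform for running untrusted code. AgentRunner plugins, especially external harnesses such as ACP / Claude Code / Codex / OpenCode / Kimi Code, are treated by default as **operator-owned execution**: the user or deployer configures them explicitly and accepts the risks of their filesystem, processes, network, workspace, provider login state, and native tools.

-What LangBot must do is protect **the resources LangBot itself holds**, including models, knowledge bases, LangBot tools, history, event, artifact, state, plugin/workspace storage, and so on. As long as access to these resources is run-scoped, permission-scoped, verifiable, and diagnosable, that is acceptable at the current stage.
+What LangBot must do is protect **the resources LangBot itself holds**, including models, knowledge bases, LangBot tools, history, event, state, plugin/workspace storage, sandbox/workspace file access, and so on. As long as access to these resources is run-scoped, permission-scoped, verifiable, and diagnosable, that is acceptable at the current stage.

 This means: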

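@@ -24,11 +24,11 @@ What LangBot must do is protect **the resources LangBot itself holds**, including

 - **Resource authorization**: build a snapshot of the resources this run may access from the runner manifest permissions, the binding resource policy, and the run scope.
 - **Runtime validation**: every SDK / Host action that carries a `run_id` must validate the active run session, caller plugin identity, resource id, and operation.
-- **Scoped projection**: project to the runner only the authorized resource summaries, MCP server config, context, artifact refs, and state snapshot.
-- **LangBot artifact path constraints**: file artifacts that LangBot itself registers and reads must stay inside the declared root, to prevent path escape.
+- **Scoped projection**: project to the runner only the authorized resource summaries, MCP server config, context, attachment/path refs, and state snapshot.
+- **LangBot file path constraints**: files that LangBot itself stages and reads must stay inside the declared root, to prevent path escape.
 - **Basic secret policy**: do not proactively project API keys / tokens / secrets held by LangBot to the runner; redact common secret fields in logs and errors.
 - **Basic runtime constraints**: provide basic capabilities for timeouts, cancellation propagation, output size limits, or error mapping.
-- **audit-lite**: record the event, run id, runner id, binding, authorized-resource summary, key failures, and state/artifact/transcript facts.
+- **audit-lite**: record the event, run id, runner id, binding, authorized-resource summary, key failures, and state/file/transcript facts.

 ### Runner Plugin responsibilities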

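@@ -72,7 +72,7 @@ External tools such as Claude Code, Codex, OpenCode, Kimi Code, and Gemini CLI continue to use
 - Runtime actions are validated by `run_id` + `caller_plugin_identity` + resource id + operation.
 - Manifest permissions constrain only LangBot-held resources, not the native tools of an external harness.

-The current implementation is heading in the right direction: `AgentRunSessionRegistry` stores the run-scoped snapshot, and `plugin/handler.py` performs runtime validation for model, tool, knowledge base, history, artifact, state, storage, and similar actions.
+The current implementation is heading in the right direction: `AgentRunSessionRegistry` stores the run-scoped snapshot, and `plugin/handler.py` performs runtime validation for model, tool, knowledge base, history, state, storage, and similar actions, while sandbox/workspace file access is controlled by the scoped tool boundary.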
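
The `run_id` + `caller_plugin_identity` + resource id + operation check described above could look roughly like the following. This is a minimal sketch, not the actual `AgentRunSessionRegistry`: the class names `RunSession`, `SessionRegistry`, and `RunAccessDenied`, and the `grants` mapping, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


class RunAccessDenied(Exception):
    """Raised when a runtime action falls outside the run's authorized snapshot."""


@dataclass(frozen=True)
class RunSession:
    # Hypothetical snapshot shape; the real registry entry may differ.
    run_id: str
    caller_plugin_identity: str
    # Maps resource id -> operations granted for this run only.
    grants: dict[str, frozenset[str]] = field(default_factory=dict)


class SessionRegistry:
    """Holds one snapshot per active run and validates actions against it."""

    def __init__(self) -> None:
        self._active: dict[str, RunSession] = {}

    def open(self, session: RunSession) -> None:
        self._active[session.run_id] = session

    def close(self, run_id: str) -> None:
        # After close, every action carrying this run_id is rejected.
        self._active.pop(run_id, None)

    def check(self, run_id: str, caller: str, resource_id: str, operation: str) -> None:
        session = self._active.get(run_id)
        if session is None:
            raise RunAccessDenied("no active run session")
        if session.caller_plugin_identity != caller:
            raise RunAccessDenied("caller plugin identity mismatch")
        if operation not in session.grants.get(resource_id, frozenset()):
            raise RunAccessDenied(f"{operation} on {resource_id} is not granted")
```

Keeping the four checks in one place means a rejection always carries a reason string, which is what an audit-lite record of the denial would need.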

 ### MCP / Asset Gateway Boundary

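@@ -93,9 +93,9 @@ The LangBot MCP / asset gateway exposes only the tool surface authorized for the current run:

 LangBot only needs to constrain the paths it manages:

-- Host-generated or Host-registered file artifacts must be validated with `realpath` and root containment.
-- Artifact metadata should not expose Host-only storage keys / host paths.
-- Context files and artifact files, if created by LangBot, should live in a location that can be cleaned up.
+- Host-staged files must be validated with `realpath` and root containment.
+- Attachment/file metadata should not expose Host-only storage keys / host paths.
+- Context files and sandbox/workspace files, if created by LangBot, should live in a location that can be cleaned up.

 The workspace a user configures for an ACP runner is outside LangBot's strict supervision. Under Docker/K8s it relies on the volume mount boundary; under plain-process deployment it relies on OS user permissions, with the user bearing the risk.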
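
The `realpath` plus root containment rule above can be sketched as follows. The function name `resolve_within_root` and the exception `PathEscapeError` are hypothetical, and the sketch assumes POSIX-style paths; it is not taken from the LangBot code base.

```python
import os


class PathEscapeError(Exception):
    """Raised when a requested path resolves outside the declared root."""


def resolve_within_root(root: str, relative_path: str) -> str:
    """Resolve relative_path under root, rejecting anything that escapes it."""
    # Resolve symlinks in the root itself so both sides are compared canonically.
    real_root = os.path.realpath(root)
    # realpath also collapses ".." segments and follows symlinks in the candidate.
    candidate = os.path.realpath(os.path.join(real_root, relative_path))
    # commonpath compares whole path components, so "/data/root2" is not
    # mistaken for a child of "/data/root" as a plain prefix test would do.
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise PathEscapeError(f"{relative_path!r} escapes the declared root")
    return candidate
```

Checking the resolved path rather than the raw string is what catches both `..` traversal and a symlink inside the root that points outside it.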

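@@ -107,7 +107,7 @@ LangBot only needs to constrain the paths it manages:

 - LangBot does not proactively project the secrets it holds to the runner, unless they are external-service credentials that the runner config explicitly requires.
 - The run token is short-lived and run-scoped; it should not be stored long term.
-- Logs, errors, transcripts, and artifact metadata should avoid printing common secret fields wherever possible.
+- Logs, errors, transcripts, and attachment/file metadata should avoid printing common secret fields wherever possible.
 - Configuration UI / API responses keep following the existing secret masking rules.

 The current stage does not require a full DLP, end-to-end sensitive-data tracking, secret lineage, or an automatic rotation system.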
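
As an illustration of the basic redaction of common secret fields mentioned above, here is a small sketch. The hint list `SECRET_KEY_HINTS`, the `MASK` value, and the `redact` helper are assumptions made for this example and do not reflect LangBot's existing masking rules.

```python
from typing import Any

# Hypothetical substrings that mark a key as secret-bearing.
SECRET_KEY_HINTS = ("api_key", "apikey", "token", "secret", "password", "authorization")
MASK = "***"


def _looks_secret(key: object) -> bool:
    # Normalize case and dashes so "Auth-Token" and "auth_token" match alike.
    name = str(key).lower().replace("-", "_")
    return any(hint in name for hint in SECRET_KEY_HINTS)


def redact(value: Any) -> Any:
    """Return a copy of value with secret-looking dict fields masked."""
    if isinstance(value, dict):
        return {k: MASK if _looks_secret(k) else redact(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists; that is fine for log output.
        return [redact(item) for item in value]
    return value
```

This masks by key name only. It is deliberately far short of DLP: a secret embedded in free text or stored under an unrelated key passes through unchanged.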
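@@ -119,7 +119,7 @@ LangBot needs to provide basic controllability:
 - Host run deadline / runner timeout.
 - Runner-side request timeout.
 - Generator close / cancel propagation.
-- Upper limits on output and artifact inline size.
+- Upper limits on output and inline payload size.
 - Errors mapped to a controlled runner failure.

 LangBot is not required to implement strong isolation of CPU, memory, disk, network, or the process tree for external harnesses. When these are needed, they are provided by Docker/K8s, systemd, a container platform, or the user's machine policy.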
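
The controls listed above (deadline, generator close, output size limit, error mapping) can be combined in one small wrapper. The sketch below uses only standard `asyncio`; the names `RunnerFailure` and `collect_output` and the error codes are hypothetical, and it assumes the runner output arrives as an async iterator of text chunks.

```python
import asyncio
from typing import AsyncIterator


class RunnerFailure(Exception):
    """Controlled failure surfaced to the Host instead of a raw exception."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


async def collect_output(
    stream: AsyncIterator[str], *, deadline_s: float, max_chars: int
) -> str:
    async def _drain() -> str:
        parts: list[str] = []
        total = 0
        async for chunk in stream:
            total += len(chunk)
            if total > max_chars:
                raise RunnerFailure("output_too_large", f"output exceeds {max_chars} chars")
            parts.append(chunk)
        return "".join(parts)

    try:
        # wait_for cancels the drain task when the deadline passes.
        return await asyncio.wait_for(_drain(), timeout=deadline_s)
    except asyncio.TimeoutError:
        raise RunnerFailure("timeout", f"run exceeded {deadline_s}s") from None
    finally:
        # Close the generator so its own cleanup (finally blocks) runs.
        aclose = getattr(stream, "aclose", None)
        if aclose is not None:
            await aclose()
```

This bounds only what the Host waits for and accepts. It does not limit CPU, memory, or processes of the harness itself, which stays with the deployment environment as stated above.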
@@ -144,7 +144,7 @@ LangBot needs to provide basic controllability:

 - run id, runner id, binding, event.
 - Authorized-resource summary.
-- state update, artifact created, transcript message.
+- state update, file write/read event, transcript message.
 - A warning when the MCP / pull API rejects a request.
 - steering queued / injected / dropped.

@@ -154,7 +154,7 @@ LangBot needs to provide basic controllability:

 | Item | Current requirement | Status |
 | --- | --- | --- |
-| Path isolation | Constrain only LangBot-managed artifact/context paths; the runner workspace belongs to the user / deployment environment. | Minimal required |
+| Path isolation | Constrain only LangBot-managed context/sandbox file paths; the runner workspace belongs to the user / deployment environment. | Minimal required |
 | Permission boundary | Must protect LangBot resources; does not constrain the native capabilities of external CLIs. | Required |
 | Secret handling | Basic non-projection, basic masking, short-lived run tokens. | Basic required |
 | MCP policy | run-scoped token + scoped tool surface; no complex approval. | Required |
@@ -163,7 +163,7 @@ LangBot needs to provide basic controllability:
 | State lifecycle | Scope isolation, JSON size limit, basic cleanup primitive. | Basic required |
 | Audit | Record run facts and rejection reasons. | Audit-lite |
 | UI / Admin control | Permission summary can be displayed; no approval flow required. | Optional |
-| Test matrix | Covers run auth, MCP token, permission deny, timeout, artifact path, state size. | Focused tests |
+| Test matrix | Covers run auth, MCP token, permission deny, timeout, sandbox path, state size. | Focused tests |

 ## Current implementation snapshot

@@ -172,8 +172,8 @@ LangBot needs to provide basic controllability:
 - SDK typed AgentRunner manifest, capabilities, permissions.
 - The Host resource builder generates `ctx.resources` from manifest permissions and the binding policy.
 - Active run session snapshot and `caller_plugin_identity` validation.
-- Run-scoped validation of history / event / artifact / state / tool / knowledge runtime actions.
-- Artifact file path `realpath` + root containment.
+- Run-scoped validation of history / event / state / tool / knowledge runtime actions.
+- Sandbox file path `realpath` + root containment.
 - Persistent state scope isolation and JSON size limit.
 - SDK-owned MCP bridge and long-lived asset gateway.
 - Dify / ACP runner integration with the LangBot asset gateway.
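@@ -183,7 +183,7 @@ LangBot needs to provide basic controllability:

 - Show a summary of the runner's LangBot resource permissions in the frontend.
 - Consolidate redaction of common secret fields into a single helper.
-- Scheduling of artifact/context TTL cleanup.
+- Scheduling of context/sandbox file TTL cleanup.
 - More complete auditing of MCP calls.
 - Better documentation hints: the ACP runner is operator-owned execution.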