docs(review): refresh box architecture review for feat/sandbox

Sync the docs/review/ suite to the current state of the feat/sandbox branch
(both LangBot and langbot-plugin-sdk), ~30 commits ahead of the prior review.

- box-architecture.md: rewrite for the new box.{backend,runtime,local,e2b}
  config schema, add E2B backend, 6 native tools (incl. glob/grep), Skill
  Tool Call activation, shared multi-process MCP container, SkillManager,
  BoxSkillStore (SDK), 25 actions, 9 error types, heartbeat/reconnect
- box-issues.md: move resolved items (reconnect, heartbeat, Windows, nsjail
  image conflict, frontend monitoring card) into a Resolved section; add
  new P0 (INIT/backend ordering), P1 (extra_mounts immutability after
  container creation), P2 (skill_store test gap, integration tests not in CI)
- box-session-scope.md: add §0 Implementation Status — Phase 1 shipped,
  MCP unification landed earlier than originally scoped
- box-test-coverage.md: realign file inventory (4,400 -> 6,500 LOC),
  add 7 new test files including SDK backend_selection/e2b/skill_store
- box-tob-analysis.md: connection recovery now meets basic requirements; add E2B and
  backend self-heal to capabilities; tick off Phase 1 reconnect/heartbeat
- box-vs-plugin-runtime.md: heartbeat/reconnect/Windows support now aligned
  with Plugin Runtime; revise remaining gaps (WS auth, shared base class)
Junyan Qin
2026-05-19 13:31:26 +08:00
parent d80972417e
commit 6351730891
6 changed files with 552 additions and 326 deletions
+24 -27
@@ -1,6 +1,6 @@
 # Box Runtime vs Plugin Runtime: Connection Architecture Comparison
-> Last updated: 2026-04-16
+> Last updated: 2026-05-19
 > Branch: `feat/sandbox` (LangBot + langbot-plugin-sdk)
 ---
@@ -10,10 +10,10 @@
 | Dimension | Plugin Runtime | Box Runtime |
 |------|---------------|-------------|
 | **Inheritance** | `PluginRuntimeConnector(ManagedRuntimeConnector)` | `BoxRuntimeConnector` (standalone class) |
-| **Transport branches** | 3 (Docker/WS, Win32/subprocess+WS, Unix/stdio) | 2 (local stdio, remote WS) |
-| **Heartbeat** | 20s ping loop | **None** |
-| **Reconnect** | WS mode: sleep 3s → re-initialize | **None** |
-| **Handler type** | `RuntimeConnectionHandler` (1132 lines, 25+ actions) | Base `Handler` (311 lines, 0 custom actions) |
+| **Transport branches** | 3 (Docker/WS, Win32/subprocess+WS, Unix/stdio) | 3 (local stdio, Win32/subprocess+WS, remote WS) |
+| **Heartbeat** | 20s ping loop | 20s ping loop (`_heartbeat_loop`) |
+| **Reconnect** | WS mode: sleep 3s → re-initialize | Handled by BoxService `_reconnect_loop`, with exponential backoff |
+| **Handler type** | `RuntimeConnectionHandler` (1132 lines, 25+ actions) | Base `Handler` + `BoxServerHandler` (25 actions on the SDK side) |
 | **Client abstraction** | Handler is the API | Standalone `ActionRPCBoxClient` wraps the Handler |
 | **Enable/disable** | `is_enable_plugin` toggle | No toggle (availability is determined by the initialization result) |
 | **Initialization failure** | Exception propagates | Silent degradation to `_available=False` |
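The Box heartbeat row above (a 20s ping loop named `_heartbeat_loop`) can be illustrated with a minimal asyncio sketch. It is not the connector's actual code: `HeartbeatingConnector`, `on_connected`, the injected `ping` callable and the `failures` counter are hypothetical. Only the 20s interval, the DEBUG-only failure handling, and cancellation in `dispose()` come from this document.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("box.heartbeat")

HEARTBEAT_INTERVAL = 20.0  # seconds, the interval stated in the comparison table


class HeartbeatingConnector:
    """Hypothetical connector showing a `_heartbeat_loop`-style background task."""

    def __init__(self, ping: Callable[[], Awaitable[None]],
                 interval: float = HEARTBEAT_INTERVAL) -> None:
        self._ping = ping  # assumed: async callable that sends one ping RPC
        self._interval = interval
        self._heartbeat_task: Optional[asyncio.Task] = None
        self.failures = 0

    def on_connected(self) -> None:
        # The loop is started only once a connection has been established.
        self._heartbeat_task = asyncio.create_task(self._heartbeat_loop())

    async def _heartbeat_loop(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            try:
                await self._ping()
            except Exception as exc:
                # A failed ping is only logged; reconnecting is left to the
                # connection-close path, so the loop simply keeps going.
                self.failures += 1
                logger.debug("heartbeat ping failed: %r", exc)

    async def dispose(self) -> None:
        # Cancel the loop; CancelledError is a BaseException, so the
        # `except Exception` above does not swallow it.
        if self._heartbeat_task is not None:
            self._heartbeat_task.cancel()
            try:
                await self._heartbeat_task
            except asyncio.CancelledError:
                pass
            self._heartbeat_task = None
```

Keeping the loop tolerant of ping failures means a single slow ping never tears down a healthy connection; detection of a dead link is delegated to the transport closing.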
@@ -48,6 +48,8 @@ else:
     await self._start_local_stdio()  # StdioClientController
 ```
+> History: the 2026-04-16 version of this document described Box as a 2-way decision (no Windows branch). It now matches Plugin's 3-way design.
 ### Decision matrix
 | Environment | Plugin | Box |
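The 3-way Box transport decision in this hunk (remote WS, Win32 subprocess + WS, otherwise local stdio via `_start_local_stdio()`) can be expressed as a pure function. This is a sketch under assumptions: `BoxTransport`, `choose_box_transport` and the `runtime_url` parameter are hypothetical, and the order in which the real connector evaluates its conditions is not visible in this diff beyond stdio being the final `else`.

```python
import sys
from enum import Enum
from typing import Optional


class BoxTransport(Enum):
    REMOTE_WS = "remote-ws"           # connect to an already running runtime over WebSocket
    WIN32_SUBPROCESS_WS = "win32-ws"  # spawn the runtime as a subprocess, then connect over WS
    LOCAL_STDIO = "local-stdio"       # fork the runtime and talk over stdin/stdout


def choose_box_transport(runtime_url: Optional[str],
                         platform: str = sys.platform) -> BoxTransport:
    """Hypothetical 3-way decision mirroring the Box column; names are assumptions."""
    if runtime_url:
        # A configured remote endpoint takes priority (assumed ordering).
        return BoxTransport.REMOTE_WS
    if platform == "win32":
        # Windows gets the subprocess + WS branch, as Plugin does.
        return BoxTransport.WIN32_SUBPROCESS_WS
    # Everything else falls through to local stdio, the final `else` branch.
    return BoxTransport.LOCAL_STDIO
```

Keeping the decision separate from the side effects (spawning, connecting) makes each branch testable without a real runtime.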
@@ -111,26 +113,21 @@ connector.initialize()
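 | Dimension | Plugin | Box |
 |------|--------|-----|
-| Heartbeat? | Yes (`connector.py:69-76`) | **No** |
-| Interval | 20s | N/A |
-| Failure handling | DEBUG log only, does not trigger reconnect | N/A |
-| Lifecycle | Entire application lifetime, survives reconnects | N/A |
+| Heartbeat? | Yes | Yes (`connector.py` `_heartbeat_loop`) |
+| Interval | 20s | 20s |
+| Failure handling | DEBUG log only, does not trigger reconnect | DEBUG log only; reconnect is triggered by connection close |
+| Lifecycle | Entire application lifetime | Started after the connection is established; cancelled in `dispose()` |
 ### Reconnect
 | Dimension | Plugin | Box |
 |------|--------|-----|
-| Docker/WS disconnect | `runtime_disconnect_callback` → sleep 3s → re-initialize | Handler loop exits, **permanently unavailable** |
-| WS connection failure | Same as above | Error stored → `initialize()` raises → `_available=False` |
-| stdio disconnect | Log only, no reconnect | Handler loop exits, permanently unavailable |
-| Reconnect backoff | Fixed 3s, no backoff | N/A |
+| Docker/WS disconnect | `runtime_disconnect_callback` → sleep 3s → re-initialize | `runtime_disconnect_callback` → `BoxService._reconnect_loop()` (exponential backoff) |
+| WS connection failure | Same as above | Same as above; `_available=False` on the initial failure, restored after a successful reconnect |
+| stdio disconnect | Log only, no reconnect | Hooked to the same callback; reconnecting stdio requires re-forking the subprocess |
+| Reconnect backoff | Fixed 3s, no backoff | Exponential backoff |
-**Effect chain after a Box disconnect**:
-1. `handler.run()` catches `ConnectionClosedError`
-2. `_disconnect_callback is None` → break
-3. `_handler_task` completes → `_make_connection_callback` returns
-4. Subsequent `client._call()` → `BoxRuntimeUnavailableError`
-5. Box functionality is permanently unavailable
+> History: the 2026-04-16 version of this document marked heartbeat and reconnect as missing from Box. Both have since been fixed in commits `2dfd9d5d` / `c6882cf` / `5029d9c` and others (see [box-issues.md, Resolved](./box-issues.md)).
 ---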
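The Box reconnect rows (exponential backoff in `BoxService._reconnect_loop()`, versus Plugin's fixed 3s sleep) can be sketched as a generic retry helper. `reconnect_with_backoff`, its parameters and their defaults are hypothetical; the document does not state Box's actual initial delay, cap, or growth factor. The `sleep` parameter is injectable only so the schedule can be checked without waiting.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("box.reconnect")


async def reconnect_with_backoff(
    connect: Callable[[], Awaitable[None]],
    initial_delay: float = 1.0,   # assumed first wait, in seconds
    max_delay: float = 60.0,      # assumed cap so waits stop growing
    factor: float = 2.0,          # delay multiplier per failed attempt
    max_attempts: Optional[int] = None,  # None = retry until cancelled
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Retry `connect` with exponential backoff; return the attempt that succeeded."""
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            await connect()
            return attempt
        except Exception as exc:
            # CancelledError is not an Exception subclass, so dispose() can
            # still cancel a task that is running this loop.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            logger.debug("reconnect attempt %d failed: %r; retrying in %.1fs",
                         attempt, exc, delay)
            await sleep(delay)
            delay = min(delay * factor, max_delay)
```

A caller would run this from the disconnect callback and flip `_available` back to `True` once it returns. The cap is what keeps a long outage from producing ever-longer waits, while the growing interval avoids the log flooding the P1 item attributes to a fixed 3s retry.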
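@@ -209,16 +206,16 @@ Box's RPC SHUTDOWN ensures the container is stopped correctly and never becomes an orphan. Plugin
 ### P0
-1. **Add reconnect to Box**: set `disconnect_callback` in `_make_connection_callback`; in WS mode, sleep 3s → re-initialize
-2. **Add heartbeat to Box**: 30s ping loop, modeled on `PluginRuntimeConnector.heartbeat_loop()`
+1. **Add WS authentication to both**: at minimum token authentication (issued at INIT, verified on connect)
 ### P1
-3. **Add Windows support to Box**: add a Win32 branch (subprocess + WS) as Plugin does
-4. **Consider having Box inherit ManagedRuntimeConnector**: reuse `_start_runtime_subprocess`/`_wait_until_ready`/`_dispose_subprocess`
-5. **Add WS authentication to both**: at minimum token authentication
+2. **Consider having Box inherit ManagedRuntimeConnector**: reuse `_start_runtime_subprocess` / `_wait_until_ready` / `_dispose_subprocess` to reduce duplicated code
+3. **Add backoff to Plugin reconnect**: a fixed 3s retry with no backoff can flood the logs; align with Box's exponential backoff
+4. **Unify the connection-management pattern**: event-based (Box) vs direct-await (Plugin); consider converging on one
-### P2
+### Done (since the previous round)
-6. **Add backoff to Plugin reconnect**: a fixed 3s retry with no backoff can flood the logs; exponential backoff is recommended
-7. **Unify the connection-management pattern**: event-based (Box) vs direct-await (Plugin); consider converging on one
+- ~~Add reconnect to Box~~ (commit `2dfd9d5d`)
+- ~~Add heartbeat to Box~~ (20s loop, consistent with Plugin)
+- ~~Add Windows support to Box~~ (commit `120817a` / `fafb7a4`)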
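The remaining P0 item (token authentication, issued at INIT and verified on connect) is a proposal, not shipped code. Below is a minimal sketch of the verifying side. It assumes the token is generated once with `secrets.token_urlsafe`, handed over during INIT, and presented by the connecting side on every later WebSocket connection; how it travels (for example a header or a first message) is left open, and `TokenRegistry` is a hypothetical name.

```python
import hmac
import secrets
from typing import Optional


class TokenRegistry:
    """Hypothetical holder for a per-runtime WS token: issue at INIT, check on connect."""

    def __init__(self) -> None:
        self._token: Optional[str] = None

    def issue(self) -> str:
        # A fresh random token; calling issue() again (e.g. on re-INIT) rotates it.
        self._token = secrets.token_urlsafe(32)
        return self._token

    def verify(self, presented: Optional[str]) -> bool:
        # Reject everything until a token has been issued.
        if self._token is None or presented is None:
            return False
        # Constant-time comparison; bytes avoid TypeError on non-ASCII input.
        return hmac.compare_digest(self._token.encode(), presented.encode())
```

`hmac.compare_digest` is used instead of `==` so the check does not reveal, through timing, how many leading characters of a guess were correct.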